Modern Portfolio Theory

Markowitz's mean–variance framework is the foundation of quantitative portfolio choice: given expected returns and a covariance matrix, it finds the portfolios that deliver the most return per unit of risk. This chapter builds the efficient frontier numerically for the Dow Jones constituents. Use the R | Python toggle to switch languages.

library(tidyverse)
library(tidyfinance)
library(scales)
library(ggrepel)
import pandas as pd
import numpy as np
import tidyfinance as tf
from plotnine import *
from mizani.formatters import percent_format

The asset universe

We download the index constituents and their daily prices, keep only stocks with a full history (so the covariance matrix is balanced), then aggregate to monthly returns. Each stock's mean return and volatility summarize its risk–return profile.

symbols <- download_data(
  type = "constituents",
  index = "Dow Jones Industrial Average"
)

prices_daily <- download_data(
  type = "stock_prices",
  symbol = symbols$symbol,
  start_date = "2000-01-01",
  end_date = "2024-12-31"
)

prices_daily <- prices_daily |>
  group_by(symbol) |>
  mutate(n = n()) |>
  ungroup() |>
  filter(n == max(n)) |>
  select(-n)

returns_monthly <- prices_daily |>
  mutate(date = floor_date(date, "month")) |>
  group_by(symbol, date) |>
  summarize(price = last(adjusted_close), .groups = "drop_last") |>
  mutate(ret = price / lag(price) - 1) |>
  drop_na(ret) |>
  select(-price)
symbols = tf.download_data(
    domain="constituents", index="Dow Jones Industrial Average"
)

prices_daily = tf.download_data(
    domain="stock_prices",
    symbols=symbols["symbol"].tolist(),
    start_date="2000-01-01",
    end_date="2023-12-31",
)

prices_daily = (prices_daily
    .groupby("symbol")
    .apply(lambda x: x.assign(counts=x["adjusted_close"].dropna().count()))
    .reset_index(drop=True)
    .query("counts == counts.max()")
)

returns_monthly = (prices_daily
    .assign(date=lambda x: x["date"].dt.to_period("M").dt.to_timestamp())
    .groupby(["symbol", "date"], as_index=False)
    .agg(adjusted_close=("adjusted_close", "last"))
    .assign(ret=lambda x: x.groupby("symbol")["adjusted_close"].pct_change())
)

The two inputs the optimizer needs are the vector of mean returns and the covariance matrix. We get the covariance by pivoting returns to wide form (one column per stock) and calling cov.

assets <- returns_monthly |>
  group_by(symbol) |>
  summarize(mu = mean(ret), sigma = sd(ret))

returns_wide <- returns_monthly |>
  pivot_wider(names_from = symbol, values_from = ret)

sigma <- returns_wide |>
  select(-date) |>
  cov()
assets = (returns_monthly
    .groupby("symbol", as_index=False)
    .agg(mu=("ret", "mean"), sigma=("ret", "std"))
)

returns_wide = returns_monthly.pivot(
    index="date", columns="symbol", values="ret"
).reset_index()

sigma = returns_wide.drop(columns=["date"]).cov()

The minimum-variance portfolio

The minimum-variance portfolio (MVP) ignores expected returns entirely — it just minimizes variance subject to the weights summing to one. Its closed form is the inverse covariance matrix times a vector of ones, renormalized so the weights sum to one.

iota <- rep(1, dim(sigma)[1])
sigma_inv <- solve(sigma)
omega_mvp <- as.vector(sigma_inv %*% iota) /
  as.numeric(t(iota) %*% sigma_inv %*% iota)
iota = np.ones(sigma.shape[0])
sigma_inv = np.linalg.inv(sigma.values)
omega_mvp = (sigma_inv @ iota) / (iota @ sigma_inv @ iota)

With the weights in hand, the MVP's expected return and volatility are simple quadratic forms — its location is the leftmost tip of the frontier.

mu <- assets$mu
summary_mvp <- tibble(
  mu = as.numeric(t(omega_mvp) %*% mu),
  sigma = as.numeric(sqrt(t(omega_mvp) %*% sigma %*% omega_mvp)),
  type = "Minimum-Variance Portfolio"
)
summary_mvp
mu = assets["mu"].values
mu_mvp = omega_mvp @ mu
sigma_mvp = np.sqrt(omega_mvp @ sigma.values @ omega_mvp)
summary_mvp = pd.DataFrame(
    {"mu": [mu_mvp], "sigma": [sigma_mvp],
     "type": ["Minimum-Variance Portfolio"]}
)
summary_mvp

Efficient portfolios

To trace the frontier we need a second efficient portfolio — say the one that achieves the highest single-stock mean return at least variance. Its weights have a closed form built from three scalars (C, D, E) that summarize the covariance structure, with a Lagrange multiplier lambda_tilde controlling how far we move from the MVP toward higher return.

mu_bar <- max(assets$mu)
C <- as.numeric(t(iota) %*% sigma_inv %*% iota)
D <- as.numeric(t(iota) %*% sigma_inv %*% mu)
E <- as.numeric(t(mu) %*% sigma_inv %*% mu)
lambda_tilde <- as.numeric(2 * (mu_bar - D / C) / (E - D^2 / C))
omega_efp <- as.vector(
  omega_mvp + lambda_tilde / 2 * (sigma_inv %*% mu - D * omega_mvp)
)
mu_bar = assets["mu"].max()
C = iota @ sigma_inv @ iota
D = iota @ sigma_inv @ mu
E = mu @ sigma_inv @ mu
lambda_tilde = 2 * (mu_bar - D / C) / (E - (D**2) / C)
omega_efp = omega_mvp + (lambda_tilde / 2) * (sigma_inv @ mu - D * omega_mvp)

The efficient frontier

The two-fund separation theorem says every efficient portfolio is a combination of two efficient portfolios. So sweeping a mixing weight a over the MVP and the efficient portfolio traces the entire frontier — for each a we record the portfolio's mean and volatility.

efficient_frontier <- tibble(
  a = seq(from = -1, to = 2, by = 0.01)
) |>
  mutate(
    omega = map(a, \(x) x * omega_efp + (1 - x) * omega_mvp),
    mu = map_dbl(omega, \(x) t(x) %*% mu),
    sigma = map_dbl(omega, \(x) sqrt(t(x) %*% sigma %*% x))
  )
efficient_frontier = (
    pd.DataFrame({"a": np.arange(-1, 2.01, 0.01)})
    .assign(omega=lambda x: x["a"].map(
        lambda a: a * omega_efp + (1 - a) * omega_mvp))
    .assign(
        mu=lambda x: x["omega"].map(lambda w: w @ mu),
        sigma=lambda x: x["omega"].map(lambda w: np.sqrt(w @ sigma @ w)),
    )
)

Plotting mean against volatility draws the frontier's hyperbola. Overlaying the individual stocks shows the diversification gain: the frontier sits well to the left of any single stock — the same expected return for less risk.

efficient_frontier |>
  ggplot(aes(x = sigma, y = mu)) +
  geom_point() +
  geom_point(data = assets, aes(x = sigma, y = mu)) +
  scale_x_continuous(labels = percent) +
  scale_y_continuous(labels = percent) +
  labs(
    x = "Volatility", y = "Expected return",
    title = "Efficient frontier for Dow Jones index constituents"
  )
frontier_figure = (
    ggplot(efficient_frontier, aes(x="sigma", y="mu"))
    + geom_point()
    + geom_point(data=assets, mapping=aes(x="sigma", y="mu"))
    + scale_x_continuous(labels=percent_format())
    + scale_y_continuous(labels=percent_format())
    + labs(x="Volatility", y="Expected return",
           title="Efficient frontier for Dow Jones index constituents")
)
frontier_figure.show()

Study notes following the Tidy Finance curriculum by Scheuch, Voigt, Weiss, and Frey. Prose is my own; the R/Python code is reproduced from the book's open-source source, licensed CC BY-NC-SA 4.0.