Capital Asset Pricing Model

The CAPM extends mean–variance theory to equilibrium: when every investor holds efficient portfolios and can also borrow or lend at a risk-free rate, only an asset's co-movement with the market portfolio is rewarded. This chapter builds the tangency portfolio, derives the security market line, and estimates CAPM alphas and betas by regression. Each step is shown first in R and then in Python.

library(tidyverse)
library(tidyfinance)
library(scales)
library(ggrepel)
import pandas as pd
import numpy as np
import tidyfinance as tf
from plotnine import *
from mizani.formatters import percent_format

Asset returns and volatilities

As in the previous chapter, we use the Dow Jones constituents: download prices, keep stocks with full history, and form monthly returns.

symbols <- download_data(
  type = "constituents",
  index = "Dow Jones Industrial Average"
)

prices_daily <- download_data(
  type = "stock_prices", symbols = symbols$symbol,
  start_date = "2000-01-01", end_date = "2024-12-31"
) |>
  select(symbol, date, adjusted_close)

prices_daily <- prices_daily |>
  group_by(symbol) |>
  mutate(n = n()) |>
  ungroup() |>
  filter(n == max(n)) |>
  select(-n)

returns_monthly <- prices_daily |>
  mutate(date = floor_date(date, "month")) |>
  group_by(symbol, date) |>
  summarize(price = last(adjusted_close), .groups = "drop_last") |>
  mutate(ret = price / lag(price) - 1) |>
  drop_na(ret) |>
  select(-price)
symbols = tf.download_data(
    domain="constituents", index="Dow Jones Industrial Average"
)

prices_daily = tf.download_data(
    domain="stock_prices", symbols=symbols["symbol"].tolist(),
    start_date="2000-01-01", end_date="2024-12-31"
)

prices_daily = (prices_daily
    .groupby("symbol")
    .apply(lambda x: x.assign(counts=x["adjusted_close"].dropna().count()))
    .reset_index(drop=True)
    .query("counts == counts.max()")
)

returns_monthly = (prices_daily
    .assign(date=prices_daily["date"].dt.to_period("M").dt.to_timestamp())
    .groupby(["symbol", "date"], as_index=False)
    .agg(adjusted_close=("adjusted_close", "last"))
    .assign(ret=lambda x: x.groupby("symbol")["adjusted_close"].pct_change())
    # Drop each stock's first month, which has no prior price (mirrors drop_na(ret) in R).
    .dropna(subset=["ret"])
)

The tangency portfolio and the risk-free asset

Introducing a risk-free asset changes the opportunity set: the efficient frontier becomes a straight line from the risk-free rate through one special risky portfolio, the tangency portfolio. We proxy the risk-free rate with the 13-week T-bill (^IRX), converting its annualized yield to a monthly rate.

risk_free_monthly <- download_data(
  type = "stock_prices", symbols = "^IRX",
  start_date = "2000-01-01", end_date = "2024-12-31"
) |>
  mutate(risk_free = (1 + adjusted_close / 100)^(1 / 12) - 1) |>
  select(date, risk_free) |>
  drop_na()

rf <- mean(risk_free_monthly$risk_free)
risk_free_monthly = (
    tf.download_data("stock_prices", symbols="^IRX",
                     start_date="2000-01-01", end_date="2024-12-31")
    .assign(risk_free=lambda x: (1 + x["adjusted_close"] / 100)**(1/12) - 1)
    .dropna()
)

rf = risk_free_monthly["risk_free"].mean()
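
To make the yield conversion concrete, here is a small worked example. It assumes a hypothetical quote of 5.00 (percent per year), and the helper name `annual_yield_to_monthly` is my own, not part of tidyfinance.

```python
def annual_yield_to_monthly(yield_percent: float) -> float:
    """Convert an annualized yield quoted in percent to a compounded monthly rate."""
    return (1 + yield_percent / 100) ** (1 / 12) - 1


# Hypothetical quote: 5.00 means 5% per year.
monthly_rate = annual_yield_to_monthly(5.0)
print(f"{monthly_rate:.6f}")  # 0.004074, slightly below the simple 5% / 12

# Compounding the monthly rate over 12 months recovers the annual yield.
annual_again = (1 + monthly_rate) ** 12 - 1
```

The geometric conversion matters little at low yields but keeps the monthly rate consistent with compounding, which is what the `(1 + y / 100)^(1 / 12) - 1` expression in the code above does.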

The tangency weights are proportional to the inverse covariance matrix times the vector of expected excess returns, rescaled to sum to one. The tangency portfolio's Sharpe ratio, its excess return divided by its volatility, is the slope of the efficient frontier and, by construction, the highest attainable.

assets <- returns_monthly |>
  group_by(symbol) |>
  summarise(mu = mean(ret), sigma = sd(ret))

sigma <- returns_monthly |>
  pivot_wider(names_from = symbol, values_from = ret) |>
  select(-date) |>
  cov()

mu <- returns_monthly |>
  group_by(symbol) |>
  summarise(mu = mean(ret)) |>
  pull(mu)

w_tan <- solve(sigma) %*% (mu - rf)
w_tan <- w_tan / sum(w_tan)

mu_w <- as.numeric(t(w_tan) %*% mu)
sigma_w <- as.numeric(sqrt(t(w_tan) %*% sigma %*% w_tan))
sharpe_ratio <- (mu_w - rf) / sigma_w
assets = (returns_monthly
    .groupby("symbol", as_index=False)
    .agg(mu=("ret", "mean"), sigma=("ret", "std"))
)

sigma = (returns_monthly
    .pivot(index="date", columns="symbol", values="ret")
    .cov()
)

mu = returns_monthly.groupby("symbol")["ret"].mean().values

w_tan = np.linalg.solve(sigma, mu - rf)
w_tan = w_tan / np.sum(w_tan)

mu_w = w_tan.T @ mu
sigma_w = np.sqrt(w_tan.T @ sigma @ w_tan)
sharpe_ratio = (mu_w - rf) / sigma_w
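
As a sanity check on the "maximum attainable" claim, the sketch below uses made-up inputs for three assets (the numbers in `mu`, `sigma`, and `rf` are illustrative assumptions, not estimates from the Dow data) and compares the tangency portfolio's Sharpe ratio with that of randomly drawn fully invested portfolios.

```python
import numpy as np

# Hypothetical monthly inputs for three assets (not estimated from data).
mu = np.array([0.010, 0.008, 0.012])
sigma = np.array([
    [0.0040, 0.0010, 0.0012],
    [0.0010, 0.0025, 0.0008],
    [0.0012, 0.0008, 0.0060],
])
rf = 0.003


def sharpe(weights):
    """Sharpe ratio of a fully invested portfolio with the given weights."""
    excess = weights @ mu - rf
    return excess / np.sqrt(weights @ sigma @ weights)


# Tangency weights: Sigma^{-1} (mu - rf), rescaled to sum to one.
w_tan = np.linalg.solve(sigma, mu - rf)
w_tan = w_tan / w_tan.sum()

# Random fully invested portfolios (short positions allowed) for comparison.
rng = np.random.default_rng(0)
random_weights = rng.normal(size=(5_000, 3))
random_weights /= random_weights.sum(axis=1, keepdims=True)
best_random = max(sharpe(w) for w in random_weights)

print(sharpe(w_tan) >= best_random)
```

No random draw should beat the tangency portfolio here. The rescaling step only preserves the maximum when the unscaled weights sum to a positive number, which holds for these inputs; if the sum were negative, dividing by it would flip the portfolio into the minimum-Sharpe one.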

The security market line

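Each asset's beta is its covariance with the tangency portfolio divided by that portfolio's variance. The CAPM's central equation says that an asset's expected excess return is linear in its beta, with slope equal to the excess return of the market (tangency) portfolio. Plotted as expected return against beta, this relation is the security market line, which starts at the risk-free rate.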

betas <- (sigma %*% w_tan) / as.numeric(t(w_tan) %*% sigma %*% w_tan)
assets <- assets |> mutate(beta = as.vector(betas))

price_of_risk <- as.numeric(t(w_tan) %*% mu - rf)

assets |>
  ggplot(aes(x = beta, y = mu)) +
  geom_point() +
  geom_abline(intercept = rf, slope = price_of_risk) +
  scale_y_continuous(labels = percent) +
  labs(x = "Beta", y = "Expected return", title = "Security market line")
betas = (sigma @ w_tan) / (w_tan.T @ sigma @ w_tan)
assets["beta"] = betas.values

price_of_risk = float(w_tan.T @ mu - rf)

assets_figure = (
    ggplot(assets, aes(x="beta", y="mu"))
    + geom_point()
    + geom_abline(intercept=rf, slope=price_of_risk)
    + scale_y_continuous(labels=percent_format())
    + labs(x="Beta", y="Expected return", title="Security market line")
)
assets_figure.show()
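
One point worth checking numerically: when betas are measured against the tangency portfolio built from the same means and covariances, the security market line holds exactly. Because the covariance matrix times the tangency weights is proportional to the excess returns, each beta reduces to the asset's excess return divided by the tangency portfolio's excess return. The sketch below illustrates this with the same kind of made-up three-asset inputs (all numbers are assumptions for illustration).

```python
import numpy as np

# Hypothetical monthly inputs for three assets (not estimated from data).
mu = np.array([0.010, 0.008, 0.012])
sigma = np.array([
    [0.0040, 0.0010, 0.0012],
    [0.0010, 0.0025, 0.0008],
    [0.0012, 0.0008, 0.0060],
])
rf = 0.003

w_tan = np.linalg.solve(sigma, mu - rf)
w_tan = w_tan / w_tan.sum()

# Beta of each asset with respect to the tangency portfolio.
betas = (sigma @ w_tan) / (w_tan @ sigma @ w_tan)

# Security market line: rf + beta * (tangency excess return).
price_of_risk = w_tan @ mu - rf
mu_implied = rf + betas * price_of_risk

print(np.allclose(mu_implied, mu))       # True: every asset sits on the line
print(np.isclose(w_tan @ betas, 1.0))    # True: the tangency portfolio has beta one
```

So a perfect fit in a plot built this way is an in-sample identity rather than evidence for the CAPM; the empirical content comes from testing against an observable market proxy.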

Estimating alpha and beta by regression

In equilibrium the tangency portfolio is the market portfolio, so empirically the CAPM is a regression of each asset's excess return on the market's excess return: the slope is beta and the intercept is alpha, a measure of risk-adjusted performance that should be zero if the CAPM holds. We use the Fama–French market factor as the market proxy and run one regression per stock on nested (R) or grouped (Python) data.

factors <- download_data(
  type = "factors_ff_5_2x3_monthly",
  start_date = "2000-01-01", end_date = "2024-09-30"
) |>
  select(date, mkt_excess, risk_free)

returns_excess_monthly <- returns_monthly |>
  left_join(factors, join_by(date)) |>
  mutate(ret_excess = ret - risk_free) |>
  select(symbol, date, ret_excess, mkt_excess)

estimate_capm <- function(data) {
  fit <- lm("ret_excess ~ mkt_excess", data = data)
  tibble(
    coefficient = c("alpha", "beta"),
    estimate = coefficients(fit),
    t_statistic = summary(fit)$coefficients[, "t value"]
  )
}

capm_results <- returns_excess_monthly |>
  nest(data = -symbol) |>
  mutate(capm = map(data, estimate_capm)) |>
  unnest(capm) |>
  select(symbol, coefficient, estimate, t_statistic)
import statsmodels.formula.api as smf

factors = tf.download_data(
    domain="factors_ff",
    dataset="F-F_Research_Data_5_Factors_2x3",
    start_date="2000-01-01", end_date="2024-09-30",
)

returns_excess_monthly = (returns_monthly
    .merge(factors, on="date", how="left")
    .assign(ret_excess=lambda x: x["ret"] - x["risk_free"])
)

def estimate_capm(data):
    model = smf.ols("ret_excess ~ mkt_excess", data=data).fit()
    return pd.DataFrame({
        "coefficient": ["alpha", "beta"],
        "estimate": model.params.values,
        "t_statistic": model.tvalues.values,
    })

capm_results = (returns_excess_monthly
    .groupby("symbol", group_keys=True)
    .apply(estimate_capm)
    .reset_index()
)
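
To see what the regression recovers, the following sketch simulates one stock from a known CAPM relation and estimates alpha, beta, and their t-statistics with plain NumPy. The data-generating parameters (`true_alpha`, `true_beta`, the volatilities, and the sample length) are assumptions chosen for illustration; the standard errors are the classical homoskedastic ones, the same kind `lm` and `smf.ols` report by default.

```python
import numpy as np

rng = np.random.default_rng(42)
n_months = 600

# Hypothetical data-generating process: alpha = 0, beta = 1.2.
true_alpha, true_beta = 0.0, 1.2
mkt_excess = rng.normal(0.006, 0.045, size=n_months)
noise = rng.normal(0.0, 0.03, size=n_months)
ret_excess = true_alpha + true_beta * mkt_excess + noise

# OLS with an intercept: design matrix columns are [1, mkt_excess].
X = np.column_stack([np.ones(n_months), mkt_excess])
coef, *_ = np.linalg.lstsq(X, ret_excess, rcond=None)

# Classical (homoskedastic) standard errors and t-statistics.
residuals = ret_excess - X @ coef
dof = n_months - X.shape[1]
residual_variance = residuals @ residuals / dof
standard_errors = np.sqrt(residual_variance * np.diag(np.linalg.inv(X.T @ X)))
t_statistics = coef / standard_errors

alpha_hat, beta_hat = coef
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.3f}")
print(f"t(alpha) = {t_statistics[0]:.2f}, t(beta) = {t_statistics[1]:.2f}")
```

With a true alpha of zero, the estimated alpha should be small and usually insignificant, while beta should land close to 1.2 with a large t-statistic.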

Plotting the estimated alphas, colored by statistical significance, summarizes the cross-section: if the CAPM holds, the alphas should be statistically indistinguishable from zero, apart from the small share of estimates that a 5%-level test flags by chance.

capm_results |>
  filter(coefficient == "alpha") |>
  mutate(is_significant = abs(t_statistic) >= 1.96) |>
  ggplot(aes(x = estimate, y = fct_reorder(symbol, estimate),
             fill = is_significant)) +
  geom_col() +
  scale_x_continuous(labels = percent) +
  labs(x = "Estimated asset alphas", y = NULL,
       fill = "Significant at 95%?",
       title = "Estimated CAPM alphas for Dow index constituents")
alphas = (capm_results
    .query("coefficient == 'alpha'")
    .assign(is_significant=lambda x: np.abs(x["t_statistic"]) >= 1.96)
)
alphas["symbol"] = pd.Categorical(
    alphas["symbol"],
    categories=alphas.sort_values("estimate")["symbol"], ordered=True
)

alphas_figure = (
    ggplot(alphas, aes(y="estimate", x="symbol", fill="is_significant"))
    + geom_col()
    + scale_y_continuous(labels=percent_format())
    + coord_flip()
    + labs(x="", y="Estimated asset alphas", fill="Significant at 95%?",
           title="Estimated CAPM alphas for Dow index constituents")
)
alphas_figure.show()
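
When reading such a plot, it helps to know how many significant alphas to expect from chance alone. The arithmetic below is a rough sketch: `n_stocks = 30` is a hypothetical count (the filtered sample above may contain fewer stocks), and the second calculation assumes independent tests, which correlated stock returns violate.

```python
n_stocks = 30   # hypothetical count; the filtered sample above may hold fewer
level = 0.05    # two-sided test, |t| >= 1.96

# Expected number of alphas flagged as significant even if every true alpha is zero.
expected_false_positives = n_stocks * level

# Probability of at least one false positive, assuming independent tests.
prob_at_least_one = 1 - (1 - level) ** n_stocks

print(expected_false_positives)     # 1.5
print(round(prob_at_least_one, 3))  # 0.785
```

One or two significant bars are therefore weak evidence against the CAPM on their own; a systematic pattern across many stocks is more telling.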

Shortcomings and extensions

The CAPM rarely survives the data cleanly: the true market portfolio is unobservable, betas drift over time, and documented anomalies such as size and value show that exposure to market risk alone does not explain the cross-section of returns. These failures motivate the multi-factor models (Fama–French three- and five-factor, Carhart momentum) that the later asset-pricing chapters build and test.


Study notes following the Tidy Finance curriculum by Scheuch, Voigt, Weiss, and Frey. Prose is my own; the R/Python code is reproduced from the book's open-source source, licensed CC BY-NC-SA 4.0.