Beta Estimation

Jun He May 31, 2026

This chapter estimates CAPM market betas for the whole CRSP cross-section, using rolling-window regressions of excess returns on the market factor. It is the first computation that runs over thousands of stocks, so the emphasis is on doing the rolling regression efficiently. Code reads the panel built in earlier chapters from the local database. Use the R | Python toggle to switch.

library(tidyverse)
library(tidyfinance)
library(scales)

import pandas as pd
import numpy as np
import sqlite3
from statsmodels.regression.rolling import RollingOLS
import statsmodels.formula.api as smf

Beta from monthly returns

We load the monthly CRSP panel and the market factor, join them, and run a single CAPM regression for one stock to fix ideas: the slope on mkt_excess is that stock's beta over the full sample.

crsp_monthly <- tbl(tidy_finance, "crsp_monthly") |>
  select(permno, date, industry, ret_excess) |>
  collect()

factors_ff3_monthly <- tbl(tidy_finance, "factors_ff3_monthly") |>
  select(date, mkt_excess) |>
  collect()

crsp_monthly <- crsp_monthly |>
  left_join(factors_ff3_monthly, join_by(date))

fit <- lm(ret_excess ~ mkt_excess,
  data = crsp_monthly |> filter(permno == "14593"))
summary(fit)

tidy_finance = sqlite3.connect(database="data/tidy_finance_python.sqlite")

crsp_monthly = (pd.read_sql_query(
    sql="SELECT permno, date, industry, ret_excess FROM crsp_monthly",
    con=tidy_finance, parse_dates={"date"})
  .dropna()
)

factors_ff3_monthly = pd.read_sql_query(
  sql="SELECT date, mkt_excess FROM factors_ff3_monthly",
  con=tidy_finance, parse_dates={"date"}
)

crsp_monthly = crsp_monthly.merge(factors_ff3_monthly, how="left", on="date")

model_beta = smf.ols(
    formula="ret_excess ~ mkt_excess",
    data=crsp_monthly.query("permno == 14593")
).fit()
model_beta.summary()

Rolling-window estimation

A single full-sample beta hides time variation. We instead estimate beta on a rolling window — here 60 months, requiring at least 48 observations. In R this is wrapped in a small estimate_capm helper called by slide_period_vec; in Python RollingOLS does the rolling fit directly.

estimate_capm <- function(data, min_obs = 1) {
  if (nrow(data) < min_obs) {
    beta <- as.numeric(NA)
  } else {
    fit <- lm(ret_excess ~ mkt_excess, data = data)
    beta <- as.numeric(coefficients(fit)[2])
  }
  beta
}

roll_capm_estimation <- function(data, months, min_obs) {
  data <- data |> arrange(date)
  betas <- slide_period_vec(
    .x = data, .i = data$date, .period = "month",
    .f = ~ estimate_capm(., min_obs),
    .before = months - 1, .complete = FALSE
  )
  tibble(date = unique(data$date), beta = betas)
}

window_size = 60
min_obs = 48

def roll_capm_estimation(data, window_size, min_obs):
    data = data.sort_values("date")
    result = (RollingOLS.from_formula(
        formula="ret_excess ~ mkt_excess",
        data=data, window=window_size,
        min_nobs=min_obs, missing="drop")
      .fit()
      .params.get("mkt_excess")
    )
    result.index = data.index
    return result

Estimation across the cross-section

Applying the rolling estimator to every stock is a grouped operation: nest each stock's data, map the rolling function over it, and unnest. The result is a permno–date–beta_monthly panel of time-varying betas.

beta_monthly <- crsp_monthly |>
  nest(data = c(date, ret_excess, mkt_excess)) |>
  mutate(beta = map(
    data, ~ roll_capm_estimation(., months = 60, min_obs = 48)
  )) |>
  unnest(c(beta)) |>
  select(permno, date, beta_monthly = beta) |>
  drop_na()

beta_monthly = (crsp_monthly
  .groupby("permno")
  .apply(lambda x: x.assign(
      beta=roll_capm_estimation(x, window_size, min_obs)))
  .reset_index(drop=True)
  .dropna()
  .rename(columns={"beta": "beta_monthly"})
)

Beta from daily returns

The same machinery runs on daily data, with a shorter window (three months, minimum 50 daily observations). Daily betas use far more observations per window but carry microstructure noise; comparing them with the monthly estimates shows how the frequency choice moves the answer.

crsp_daily <- tbl(tidy_finance, "crsp_daily") |>
  select(permno, date, ret_excess) |>
  collect()

factors_ff3_daily <- tbl(tidy_finance, "factors_ff3_daily") |>
  select(date, mkt_excess) |>
  collect()

beta_daily <- crsp_daily |>
  left_join(factors_ff3_daily, join_by(date)) |>
  nest(data = c(date, ret_excess, mkt_excess)) |>
  mutate(beta = map(
    data, ~ roll_capm_estimation(., months = 3, min_obs = 50)
  )) |>
  unnest(c(beta)) |>
  select(permno, date, beta_daily = beta) |>
  drop_na()

crsp_daily = pd.read_sql_query(
  sql="SELECT permno, date, ret_excess FROM crsp_daily",
  con=tidy_finance, parse_dates={"date"}
)

factors_ff3_daily = pd.read_sql_query(
  sql="SELECT date, mkt_excess FROM factors_ff3_daily",
  con=tidy_finance, parse_dates={"date"}
)

beta_daily = (crsp_daily
  .merge(factors_ff3_daily, how="left", on="date")
  .groupby("permno")
  .apply(lambda x: x.assign(
      beta=roll_capm_estimation(x, window_size=60, min_obs=50)))
  .reset_index(drop=True)
  .dropna()
  .rename(columns={"beta": "beta_daily"})
)

Study notes following the Tidy Finance curriculum by Scheuch, Voigt, Weiss, and Frey. Prose is my own; the R/Python code is reproduced from the book's open-source source, licensed CC BY-NC-SA 4.0.