Accessing and Managing Financial Data

Jun He May 31, 2026

This chapter sets up the data backbone: download the open-source datasets the book reuses, then store them in a local SQLite database so later chapters load fast and reproducibly. The downloads assume a start_date and end_date have been defined. Use the R | Python toggle to switch.

library(tidyverse)
library(tidyfinance)

import pandas as pd
import numpy as np
import tidyfinance as tf

Fama–French factors

The Fama–French factors (market, size, value, and the risk-free rate) are a single named download. These are the benchmark factors used throughout the asset-pricing chapters.

factors_ff3_monthly <- download_data(
  type = "factors_ff_3_monthly",
  start_date = start_date,
  end_date = end_date
)
factors_ff3_monthly

factors_ff3_monthly = tf.download_data(
  domain="factors_ff_3_monthly",
  start_date=start_date,
  end_date=end_date
)
factors_ff3_monthly

q-factors and macro predictors

The Hou–Xue–Zhang q-factors are an alternative factor set, downloaded the same way.

factors_q_monthly <- download_data(
  type = "factors_q5_monthly",
  start_date = start_date,
  end_date = end_date
)
factors_q_monthly

factors_q_monthly = tf.download_data(
  domain="factors_q5_monthly",
  start_date=start_date,
  end_date=end_date
)
factors_q_monthly

Goyal–Welch macro predictors (for return-predictability work) come from the same interface.

macro_predictors <- download_data(
  type = "macro_predictors_monthly",
  start_date = start_date,
  end_date = end_date
)

macro_predictors = tf.download_data(
  domain="macro_predictors_monthly",
  start_date=start_date,
  end_date=end_date
)

Other macroeconomic data (FRED)

Arbitrary FRED series — here the consumer price index — are available by passing the series code, which is handy for deflating nominal quantities.

cpi_monthly <- download_data(
  type = "fred",
  series = "CPIAUCNS",
  start_date = start_date,
  end_date = end_date
)

cpi_monthly = tf.download_data(
  domain="fred",
  series="CPIAUCNS",
  start_date=start_date,
  end_date=end_date
)

Setting up a database

Rather than re-downloading each session, we persist everything in a local SQLite file — the single source of truth for later chapters.

library(RSQLite)
library(dbplyr)

tidy_finance <- dbConnect(
  SQLite(),
  "data/tidy_finance_r.sqlite",
  extended_types = TRUE
)

import sqlite3

tidy_finance = sqlite3.connect(database="data/tidy_finance_python.sqlite")

Writing and reading tables

Writing a table persists it (overwriting any prior version); reading it back pulls it into memory. This write-once, read-many pattern is how every subsequent chapter gets its inputs.

factors_ff3_monthly |>
  dbWriteTable(
    conn = tidy_finance,
    name = "factors_ff3_monthly",
    value = _,
    overwrite = TRUE
  )

tidy_finance |>
  dbReadTable("factors_ff3_monthly")

(factors_ff3_monthly
  .to_sql(name="factors_ff3_monthly",
    con=tidy_finance,
    if_exists="replace",
    index=False
  )
)

pd.read_sql_query(
  sql="SELECT * FROM factors_ff3_monthly",
  con=tidy_finance,
  parse_dates={"date"}
)

Study notes following the Tidy Finance curriculum by Scheuch, Voigt, Weiss, and Frey. Prose is my own; the R/Python code is reproduced from the book's open-source source, licensed CC BY-NC-SA 4.0.