Accessing and Managing Financial Data
This chapter sets up the data backbone: download the open-source datasets the book reuses, then store them in a local SQLite database so later chapters load fast and reproducibly. The downloads assume a start_date and end_date have been defined. Use the R | Python toggle to switch.
library(tidyverse)
library(tidyfinance)
import pandas as pd
import numpy as np
import tidyfinance as tf
Fama–French factors
The Fama–French factors (market, size, value, and the risk-free rate) are a single named download. These are the benchmark factors used throughout the asset-pricing chapters.
factors_ff3_monthly <- download_data(
type = "factors_ff_3_monthly",
start_date = start_date,
end_date = end_date
)
factors_ff3_monthly
factors_ff3_monthly = tf.download_data(
domain="factors_ff_3_monthly",
start_date=start_date,
end_date=end_date
)
factors_ff3_monthly
q-factors and macro predictors
The Hou–Xue–Zhang q-factors are an alternative factor set, downloaded the same way.
factors_q_monthly <- download_data(
type = "factors_q5_monthly",
start_date = start_date,
end_date = end_date
)
factors_q_monthly
factors_q_monthly = tf.download_data(
domain="factors_q5_monthly",
start_date=start_date,
end_date=end_date
)
factors_q_monthly
Goyal–Welch macro predictors (for return-predictability work) come from the same interface.
macro_predictors <- download_data(
type = "macro_predictors_monthly",
start_date = start_date,
end_date = end_date
)
macro_predictors = tf.download_data(
domain="macro_predictors_monthly",
start_date=start_date,
end_date=end_date
)
Other macroeconomic data (FRED)
Arbitrary FRED series — here the consumer price index — are available by passing the series code, which is handy for deflating nominal quantities.
cpi_monthly <- download_data(
type = "fred",
series = "CPIAUCNS",
start_date = start_date,
end_date = end_date
)
cpi_monthly = tf.download_data(
domain="fred",
series="CPIAUCNS",
start_date=start_date,
end_date=end_date
)
Setting up a database
Rather than re-downloading each session, we persist everything in a local SQLite file — the single source of truth for later chapters.
library(RSQLite)
library(dbplyr)
tidy_finance <- dbConnect(
SQLite(),
"data/tidy_finance_r.sqlite",
extended_types = TRUE
)
import sqlite3
tidy_finance = sqlite3.connect(database="data/tidy_finance_python.sqlite")
Writing and reading tables
Writing a table persists it (overwriting any prior version); reading it back pulls it into memory. This write-once, read-many pattern is how every subsequent chapter gets its inputs.
factors_ff3_monthly |>
dbWriteTable(
conn = tidy_finance,
name = "factors_ff3_monthly",
value = _,
overwrite = TRUE
)
tidy_finance |>
dbReadTable("factors_ff3_monthly")
(factors_ff3_monthly
.to_sql(name="factors_ff3_monthly",
con=tidy_finance,
if_exists="replace",
index=False
)
)
pd.read_sql_query(
sql="SELECT * FROM factors_ff3_monthly",
con=tidy_finance,
parse_dates={"date"}
)
Study notes following the Tidy Finance curriculum by Scheuch, Voigt, Weiss, and Frey. Prose is my own; the R/Python code is reproduced from the book's open-source source, licensed CC BY-NC-SA 4.0.