WRDS Pseudo Data — Overview & Setup

These pages generate pseudo versions of the WRDS tables, so the surrounding workflows can be run without access to WRDS. Use the R | Python toggle at the top to switch languages.

The data is deliberately labelled pseudo: it is not meaningful and contains no samples of the original data — every column is filled with random numbers. The identifiers and the industry/exchange combinations are nonsensical and must never be used together with real CRSP or Compustat data.

Packages

# R
library(tidyverse)
library(arrow)

# Python
import numpy as np
import pandas as pd
from pathlib import Path
import string

Output directory

We store the generated tables in a local folder: data-r for R and data-python for Python. Careful: if you have already downloaded the real WRDS data into that folder, the pseudo tables will overwrite it.

# R
if (!dir.exists("data-r")) {
  dir.create("data-r")
}

# Python
Path("data-python").mkdir(exist_ok=True)
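Given the overwrite risk mentioned above, it can help to look at what is already in the folder before generating anything. The following is only a minimal sketch and not part of the original workflow: the helper name existing_files is hypothetical, and the folder name data-python is taken from the setup code above.

```python
from pathlib import Path


def existing_files(folder):
    """Return the sorted names of regular files already present in `folder`.

    Hypothetical helper for illustration only. Returns an empty list when
    the folder does not exist; subfolders are ignored.
    """
    path = Path(folder)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.is_file())


found = existing_files("data-python")
if found:
    print(f"data-python already contains {len(found)} file(s): {found}")
else:
    print("data-python is empty or missing; nothing can be overwritten.")
```

If the check reports files you want to keep, one simple option is to point the pseudo data at a different folder name.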

Seed and date vectors

We fix a seed so the random draws are reproducible, then build vectors of dates at yearly, monthly, and daily frequency over twenty years (2003 through 2022). These drive the yearly, monthly, and daily panels in the next chapters.

# R
set.seed(1234)

start_date <- as.Date("2003-01-01")
end_date <- as.Date("2022-12-31")

# year() comes from lubridate, which tidyverse 2.0.0 and later attaches
time_series_years <- seq(year(start_date), year(end_date), 1)
time_series_months <- seq(start_date, end_date, "1 month")
time_series_days <- seq(start_date, end_date, "1 day")

# Python
rng = np.random.default_rng(1234)

start_date = pd.Timestamp("2003-01-01")
end_date = pd.Timestamp("2022-12-31")

time_series_years = list(range(start_date.year, end_date.year + 1))
# "MS" is month start, so each monthly date is the first of the month
time_series_months = pd.date_range(start_date, end_date, freq="MS")
time_series_days = pd.date_range(start_date, end_date, freq="D")
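As a quick sanity check, the sketch below rebuilds the Python date vectors and prints their lengths, then shows that two generators created from the same seed return identical draws. It is an illustration only: the names draws_a and draws_b are made up for this example, and the normal distribution is an arbitrary choice.

```python
import numpy as np
import pandas as pd

start_date = pd.Timestamp("2003-01-01")
end_date = pd.Timestamp("2022-12-31")

time_series_years = list(range(start_date.year, end_date.year + 1))
time_series_months = pd.date_range(start_date, end_date, freq="MS")
time_series_days = pd.date_range(start_date, end_date, freq="D")

# 20 calendar years, 240 month starts, and 7305 days (five leap years included)
print(len(time_series_years), len(time_series_months), len(time_series_days))

# Two generators built from the same seed produce the same sequence of draws.
draws_a = np.random.default_rng(1234).normal(size=5)
draws_b = np.random.default_rng(1234).normal(size=5)
print(np.array_equal(draws_a, draws_b))  # True
```

Reproducibility holds within one language only: R's set.seed(1234) and NumPy's default_rng(1234) use different generators, so the R and Python pseudo tables will not contain the same numbers.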