CoskewACX has some gaps
aneuhierl opened this issue · 4 comments
aneuhierl commented
CoskewACX has some periods with zero valid observations. Below is some code to reproduce the issue.
# Andrew 2020 04
# creates what you need to do machine learning style stuff
# downloads downloadable wide signals
# downloads crsp predictors
# merges together and outputs
# takes about 5 minutes
# creates:
# temp/signed_predictors_all_wide.csv
# temp/SignalDocumentation.xlsx
# ==== ENVIRONMENT ====
rm(list = ls())
library(tidyverse)
library(googledrive)
library(data.table)
library(fst)
library(getPass)
library(RPostgres)
# root of April 2021 release on Gdrive
# pathRelease = 'https://drive.google.com/drive/folders/1I6nMmo8k_zGCcp9tUvmMedKTAkb9734R'
# root of March 2022 release on Gdrive
pathRelease = 'https://drive.google.com/drive/u/0/folders/1O18scg9iBTiBaDiQFhoGxdn4FdsbMqGo'
# login to gdrive
# this prompts a login
pathRelease %>% drive_ls()
dir.create('temp/', showWarnings = FALSE) # no warning if temp/ already exists
# download
target_dribble = pathRelease %>% drive_ls() %>%
  filter(name == 'Firm Level Characteristics') %>% drive_ls() %>%
  filter(name == 'Full Sets') %>% drive_ls() %>%
  filter(name == 'signed_predictors_dl_wide.zip')
dl = drive_download(target_dribble, path = 'temp/deleteme.zip', overwrite = TRUE)
# unzip, read, clean up
unzip('temp/deleteme.zip', exdir = 'temp')
wide_dl_raw = fread('temp/signed_predictors_dl_wide.csv')
file.remove('temp/signed_predictors_dl_wide.csv')
# check
rel_chars <- "CoskewACX"
count_dt <- wide_dl_raw[, lapply(.SD,function(x) sum(!is.na(x))), .SDcols=rel_chars, by=list(yyyymm)]
# which months have no obs
count_dt[CoskewACX==0 & yyyymm>=196301 & yyyymm<202201,][order(yyyymm)]
chenandrewy commented
Thanks @aneuhierl for the reproducible code! Here is the output for ease of reference
(CoskewACX is the # of observations in the month)
And here is a bit more detail of the time-series:
These are strange patterns. It might be due to line 67 in CoskewACX.do:
* exclude of more than five missing obs (just above eq B-7)
drop if nobs <= 252-5
We'll look into it but it might take some time since this code does rolling estimates using daily data.
chenandrewy commented
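For CoskewACX, the line drop if nobs <= 252-5
seems to be the problem. June 1968 is one of the bad months: the 12 months ending in June 1968 had only 246 trading days. Since 246 <= 252-5 = 247, no stock can have enough observations to survive the filter, so the whole month is dropped.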
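To make the arithmetic concrete, here is a minimal sketch in base R on made-up data. Only the numbers 252, 5, and 246 come from this thread. The function names, the toy observation counts, and the relative-threshold alternative are all hypothetical, and this is not necessarily how the fix in the commit works.

```r
# Toy illustration; nobs is the number of daily observations a stock has
# in the 12-month estimation window.

keep_fixed <- function(nobs) {
  # Mirrors `drop if nobs <= 252-5`: keep only stocks with more than 247 obs.
  nobs > 252 - 5
}

keep_relative <- function(nobs, days_in_window, max_missing = 5) {
  # Hypothetical alternative: count missing observations against the number
  # of trading days that actually occurred in the window.
  nobs >= days_in_window - max_missing
}

# Three made-up stocks in the window ending June 1968 (246 trading days).
nobs_1968 <- c(246, 244, 230)

keep_fixed(nobs_1968)          # FALSE FALSE FALSE: every stock is dropped
keep_relative(nobs_1968, 246)  # TRUE TRUE FALSE
```

With the fixed cutoff, even a stock with a return on every trading day (246 observations) is dropped, which is what produces a month with zero valid observations.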
chenandrewy commented
CoskewACX is fixed here: f5a6183. I made new issues for the other gaps.