OpenSourceAP/CrossSection

CoskewACX has some gaps

aneuhierl opened this issue · 4 comments

CoskewACX has some periods with zero valid observations. Below is some code to reproduce the issue.

```r
# Andrew 2020 04

# excerpt from a larger download script, trimmed to reproduce the issue:
#   downloads the downloadable wide signals from the March 2022 release
#   counts valid CoskewACX observations in each month
#   lists the months with zero observations

# ==== ENVIRONMENT ====
rm(list = ls())

library(tidyverse)    # %>% and filter()
library(googledrive)  # drive_ls(), drive_download(), as_id()
library(data.table)   # fread() and the by-month count

# root of April 2021 release on Gdrive
# pathRelease = as_id('https://drive.google.com/drive/folders/1I6nMmo8k_zGCcp9tUvmMedKTAkb9734R')

# root of March 2022 release on Gdrive
# as_id() marks the URL as a Drive id instead of a path
pathRelease = as_id('https://drive.google.com/drive/u/0/folders/1O18scg9iBTiBaDiQFhoGxdn4FdsbMqGo')

# login to gdrive
# this prompts a login
pathRelease %>% drive_ls()

dir.create('temp/', showWarnings = FALSE)

# download the zipped wide file of downloadable signals
target_dribble = pathRelease %>% drive_ls() %>%
  filter(name == 'Firm Level Characteristics') %>% drive_ls() %>%
  filter(name == 'Full Sets') %>% drive_ls() %>%
  filter(name == 'signed_predictors_dl_wide.zip')
dl = drive_download(target_dribble, path = 'temp/deleteme.zip', overwrite = TRUE)

# unzip, read, clean up
unzip('temp/deleteme.zip', exdir = 'temp')
wide_dl_raw = fread('temp/signed_predictors_dl_wide.csv')
file.remove('temp/signed_predictors_dl_wide.csv', 'temp/deleteme.zip')

# check: count non-missing CoskewACX values in each month
rel_chars <- "CoskewACX"
count_dt <- wide_dl_raw[, lapply(.SD, function(x) sum(!is.na(x))), .SDcols = rel_chars, by = yyyymm]

# which months have no obs
count_dt[CoskewACX == 0 & yyyymm >= 196301 & yyyymm < 202201][order(yyyymm)]
```
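To see what the check produces without downloading the release, here is a minimal base-R sketch of the same per-month count on a made-up toy panel (the `permno` values, months, and signal values are invented for illustration). A month in which every stock has `NA` for the signal gets a count of 0, which is what the query above filters on.

```r
# Toy wide panel: one row per stock-month (made-up data for illustration)
toy <- data.frame(
  permno    = c(1, 2, 1, 2, 1, 2),
  yyyymm    = c(196805, 196805, 196806, 196806, 196807, 196807),
  CoskewACX = c(0.10, -0.20, NA, NA, 0.05, NA)
)

# Count non-missing CoskewACX values per month (same idea as the data.table call)
counts <- tapply(!is.na(toy$CoskewACX), toy$yyyymm, sum)
count_df <- data.frame(yyyymm    = as.integer(names(counts)),
                       CoskewACX = as.integer(counts))

# Months with zero valid observations
zero_months <- count_df[count_df$CoskewACX == 0, ]
print(zero_months)
```

Here 196806 is the only month with zero observations, while 196807 still counts as covered because one stock has a value.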

Thanks @aneuhierl for the reproducible code! Here is the output for ease of reference:

[Image: output of the check, listing the months with zero CoskewACX observations]

(CoskewACX is the number of non-missing observations in the month.)

And here is a bit more detail on the time series:

[Images: three figures showing the CoskewACX time series in more detail]

These are strange patterns. It might be due to line 67 in CoskewACX.do:

```stata
* exclude of more than five missing obs (just above eq B-7)
drop if nobs <= 252-5
```

We'll look into it, but it may take some time since this code runs rolling estimates on daily data.

There are also in-sample gaps in the LS portfolios for ChNAnalyst, CoskewACX, iomom_supp, and DivYieldST:

[Image: in-sample gaps in the LS portfolios for these signals]

DivYieldST is only missing January 1939. In that month it has portfolios 01 and 03 but no portfolio 04, so no LS return can be formed.
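As a rough illustration of why one empty portfolio removes the whole LS return for a month, here is a base-R sketch. It assumes, for illustration only, that LS is the top portfolio (04 for DivYieldST) minus portfolio 01; the long-format layout, column names, and returns below are hypothetical and not the release's actual file format.

```r
# Hypothetical portfolio returns in long format: one row per month x portfolio
ports <- data.frame(
  yyyymm = c(193812, 193812, 193812, 193901, 193901),
  port   = c("01", "03", "04", "01", "03"),
  ret    = c(1.2, 0.8, 2.0, -0.5, 0.3)
)

# Assumed LS definition: top portfolio ("04") minus bottom portfolio ("01")
ls_return <- function(df, long = "04", short = "01") {
  r_long  <- df$ret[df$port == long]
  r_short <- df$ret[df$port == short]
  # If either leg is empty in the month, no LS return can be formed
  if (length(r_long) == 0 || length(r_short) == 0) return(NA_real_)
  r_long - r_short
}

months <- sort(unique(ports$yyyymm))
ls_ret <- sapply(months, function(m) ls_return(ports[ports$yyyymm == m, ]))
print(data.frame(yyyymm = months, LS = ls_ret))
```

The toy January 1939 row mirrors the situation described above: portfolios 01 and 03 exist, but without 04 the LS entry is `NA`.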

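For CoskewACX, `drop if nobs <= 252-5` seems to be the problem. June 1968 is one of the bad months, and the 12 months ending in June 1968 had only 246 trading days, so every stock falls at or below the 247-observation cutoff and is dropped, even with no missing days.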
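The arithmetic behind the gap fits in a few lines of R. With the fixed cutoff of 252 - 5 = 247, a 246-day window drops every stock. One possible alternative, shown purely as an assumption (commit f5a6183 may fix it differently), keeps the same rule but replaces the hard-coded 252 with the number of trading days actually in the window. The function names and the per-stock `nobs` values are hypothetical.

```r
# Fixed rule from CoskewACX.do: drop if nobs <= 252 - 5, i.e. keep if nobs > 247
keep_fixed <- function(nobs) nobs > 252 - 5

# Hypothetical alternative: same rule, but the hard-coded 252 is replaced
# by the number of trading days actually in the 12-month window
keep_relative <- function(nobs, window_days, max_missing = 5) {
  nobs > window_days - max_missing
}

# 12 months ending June 1968: 246 trading days (from the thread)
window_days <- 246
nobs <- c(246, 244, 240)  # made-up stocks with 0, 2 and 6 missing days

print(keep_fixed(nobs))                  # every stock is dropped
print(keep_relative(nobs, window_days))  # only the stock missing 6 days is dropped
```

In real data, `window_days` would have to come from the trading calendar for each window rather than a constant.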

CoskewACX is fixed here: f5a6183. I made new issues for the other gaps.