Nonprofit-Open-Data-Collective/irs-efile-master-concordance-file

f990-part-05 misses two new variables

Closed this issue · 7 comments

f990-part-05.csv doesn't include the two variables added in 2018 (questions 15 and 16 in Form 990, Part V):

15 Is the organization subject to the section 4960 tax on payment(s) of more than $1,000,000 in remuneration or excess parachute payment(s) during the year?
16 Is the organization an educational institution subject to the section 4968 excise tax on net investment income?

I need these variables for my research. Could someone help?

lecy commented

The IRS stopped releasing XML schemas publicly, unfortunately. It has taken some time to get access to the most recent XSD document definition files. There will be some updates to the concordance this fall as soon as I have the bandwidth.

In the meantime, if you need a couple of variables you can update the concordance pretty easily. Grab a couple of docs that have the information you need and create a list of the xpaths:

library( dplyr )        # data wrangling
library( xmltools )     # xml utilities
library( xml2 )         # xml utilities
library( XML )          # xml utilities 
library( knitr )        # formatting 


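# one e-filed 990 return from the NCCS mirror
url <- "https://nccs-efile.s3.us-east-1.amazonaws.com/xml/201300879349300235_public.xml"
doc <- xml2::read_xml( file(url) )
xml2::xml_ns_strip( doc )   # drop the default IRS namespace so plain xpaths match

# list the element paths in the document
doc %>% xmltools::xml_get_paths()

# or, with xml2 only: the xpath of every node
xx <- 
  doc %>% 
  xml_find_all("//*") %>% 
  xml_path()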
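A self-contained sketch of the same xpath listing, run on a small inline XML string instead of a downloaded return so it works offline. The two indicator element names are the line 15/16 xpaths reported later in this thread; the values and the TotalRevenueAmt sibling are filler, and the `Ind$` filter is just one assumed way to narrow the list down to yes/no indicator fields.

```r
library( xml2 )

# Toy stand-in for a 2018+ e-file return; values are made up.
# Like real returns, it declares a default namespace.
xml_txt <- '<Return xmlns="http://www.irs.gov/efile">
  <ReturnData>
    <IRS990>
      <TotalRevenueAmt>1000</TotalRevenueAmt>
      <SubjToTaxRmnrtnExPrchtPymtInd>false</SubjToTaxRmnrtnExPrchtPymtInd>
      <SubjectToExcsTaxNetInvstIncInd>false</SubjectToExcsTaxNetInvstIncInd>
    </IRS990>
  </ReturnData>
</Return>'

doc <- read_xml( xml_txt )
xml_ns_strip( doc )   # without this, xml_path() would return "/*/*/..." paths

# xpath of every element in the document
xx <- xml_path( xml_find_all( doc, "//*" ) )

# keep direct IRS990 children whose names end in "Ind" (assumed filter)
ind.paths <- xx[ grepl( "^/Return/ReturnData/IRS990/[^/]+Ind$", xx ) ]
print( ind.paths )
```

On a real return, swap the inline string for `read_xml( file(url) )` as above and adjust the pattern to the part of the form you are looking for.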

Grab the relevant xpaths and add new lines to the concordance file:

library( irs990efile )
source( "https://raw.githubusercontent.com/Nonprofit-Open-Data-Collective/irs990efile/main/data-raw/code-chunks/rdb-keys.R" )
url <- "https://raw.githubusercontent.com/Nonprofit-Open-Data-Collective/irs-efile-master-concordance-file/master/concordance.csv"
concordance <- read.csv( url )

# CREATE NEW CODE CHUNK 
create_code_chunks(  rdb.table="F9-P01-T00-SUMMARY", show=TRUE )

That will generate a new "chunk" script (parse all variables for a specific table).
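A rough sketch of the "add new lines to the concordance" step, using a trimmed-down, hypothetical concordance with only three illustrative columns (`variable_name`, `xpath`, `rdb_table`). The real concordance.csv has more fields with its own names, so check `names( concordance )` before copying this pattern. It assumes one row per variable/xpath pair.

```r
# Hypothetical, trimmed-down concordance; the existing row is a placeholder.
concordance <- data.frame(
  variable_name = "F9_05_EXAMPLE_X",                        # hypothetical
  xpath         = "/Return/ReturnData/IRS990/ExampleInd",   # hypothetical
  rdb_table     = "F9-P05-T00-OTHER-IRS-FILING",
  stringsAsFactors = FALSE
)

# one new row per (variable, xpath) pair
new.rows <- data.frame(
  variable_name = c( "F9_05_SUBJ_TO_4960_TAX_X", "F9_05_SUBJ_TO_4968_TAX_X" ),
  xpath         = c( "/Return/ReturnData/IRS990/SubjToTaxRmnrtnExPrchtPymtInd",
                     "/Return/ReturnData/IRS990/SubjectToExcsTaxNetInvstIncInd" ),
  rdb_table     = "F9-P05-T00-OTHER-IRS-FILING",
  stringsAsFactors = FALSE
)

concordance <- rbind( concordance, new.rows )

# all xpaths that feed the Part V table
p05 <- concordance[ concordance$rdb_table == "F9-P05-T00-OTHER-IRS-FILING", ]
print( p05 )
```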

Is your desired table this one? F9-P05-T00-OTHER-IRS-FILING


Alternatively, you can add the variables directly to the existing chunk:

https://github.com/Nonprofit-Open-Data-Collective/irs990efile/blob/main/R/CHUNKS-F9-P05-T00-OTHER-IRS-FILING.R

Test it out on a couple of cases to make sure it's parsing the variable correctly. I would grab a few forms from different years to make sure you are getting all of the variants of the xpaths.
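A minimal sketch of that check, assuming two toy returns in place of real downloads: one pre-2018 filing without lines 15 and 16 and one 2018+ filing that has them (all values made up). The `check_doc()` helper is hypothetical, not part of irs990efile; it returns the parsed value of each variable, or NA when the xpath matches nothing.

```r
library( xml2 )

# toy pre-2018 return: no line 15/16 elements
old.doc <- read_xml( '<Return><ReturnData><IRS990>
  <TotalRevenueAmt>500</TotalRevenueAmt>
</IRS990></ReturnData></Return>' )

# toy 2018+ return: both elements present
new.doc <- read_xml( '<Return><ReturnData><IRS990>
  <SubjToTaxRmnrtnExPrchtPymtInd>true</SubjToTaxRmnrtnExPrchtPymtInd>
  <SubjectToExcsTaxNetInvstIncInd>false</SubjectToExcsTaxNetInvstIncInd>
</IRS990></ReturnData></Return>' )

xpaths <- c(
  F9_05_SUBJ_TO_4960_TAX_X = "//Return/ReturnData/IRS990/SubjToTaxRmnrtnExPrchtPymtInd",
  F9_05_SUBJ_TO_4968_TAX_X = "//Return/ReturnData/IRS990/SubjectToExcsTaxNetInvstIncInd"
)

# hypothetical helper: one value per variable, NA when the xpath finds nothing
check_doc <- function( doc, xpaths ) {
  vapply( xpaths, function( xp ) {
    vals <- xml_text( xml_find_all( doc, xp ) )
    if( length( vals ) == 0 ) NA_character_ else paste( vals, collapse=";" )
  }, character(1) )
}

# rows are returns, columns are variables
results <- rbind( old = check_doc( old.doc, xpaths ),
                  new = check_doc( new.doc, xpaths ) )
print( results )
```

An unexpected NA on a return that should have the field is the signal that its schema version uses a different xpath variant.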

If you can grab those, I'll add them to the package so you can run everything in parallel.

Thanks so much, Jesse.
F9-P05-T00-OTHER-IRS-FILING is indeed my desired table. These two variables only exist from 2018 onward.
I found the xpaths for the two items:

//Return/ReturnData/IRS990/SubjToTaxRmnrtnExPrchtPymtInd
//Return/ReturnData/IRS990/SubjectToExcsTaxNetInvstIncInd

and wrote variable entries for them in the chunk-file format:

## VARIABLE NAME:  F9_05_SUBJ_TO_4960_TAX_X
## DESCRIPTION: Organization subject to the section 4960 tax on payment(s) of more than $1,000,000?
## LOCATION:  F990-PART-05-LINE-15
## TABLE:  F9-P05-T00-OTHER-IRS-FILING
## VARIABLE TYPE:  checkbox
## PRODUCTION RULE:  NA

V1 <- '//Return/ReturnData/IRS990/SubjToTaxRmnrtnExPrchtPymtInd'
V_SUBJ_TO_4960_TAX_X <- paste( V1, sep='|' )
F9_05_SUBJ_TO_4960_TAX_X <- xml2::xml_text( xml2::xml_find_all( doc, V_SUBJ_TO_4960_TAX_X ) )
if( length( F9_05_SUBJ_TO_4960_TAX_X ) > 1 )
{ 
  create_record( varname=F9_05_SUBJ_TO_4960_TAX_X, ein=ORG_EIN, year=TAX_YEAR, url=URL )
  F9_05_SUBJ_TO_4960_TAX_X <-  paste0( '{', F9_05_SUBJ_TO_4960_TAX_X, '}', collapse=';' ) 
} 

## VARIABLE NAME:  F9_05_SUBJ_TO_4968_TAX_X
## DESCRIPTION: Educational institution subject to the section 4968 tax on net investment income?
## LOCATION:  F990-PART-05-LINE-16
## TABLE:  F9-P05-T00-OTHER-IRS-FILING
## VARIABLE TYPE:  checkbox
## PRODUCTION RULE:  NA

V1 <- '//Return/ReturnData/IRS990/SubjectToExcsTaxNetInvstIncInd'
V_SUBJ_TO_4968_TAX_X <- paste( V1, sep='|' )
F9_05_SUBJ_TO_4968_TAX_X <- xml2::xml_text( xml2::xml_find_all( doc, V_SUBJ_TO_4968_TAX_X ) )
if( length( F9_05_SUBJ_TO_4968_TAX_X ) > 1 )
{ 
  create_record( varname=F9_05_SUBJ_TO_4968_TAX_X, ein=ORG_EIN, year=TAX_YEAR, url=URL )
  F9_05_SUBJ_TO_4968_TAX_X <-  paste0( '{', F9_05_SUBJ_TO_4968_TAX_X, '}', collapse=';' ) 
} 
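To see what the chunk logic does, here is a sketch that runs the same pattern on a toy return (made-up values) where the indicator appears twice. With a single `V1`, `paste( V1, sep='|' )` just returns `V1`; the `sep='|'` only matters when a chunk lists several xpath variants, which it joins into an XPath union. `V2` below is a hypothetical older element name, included only to show the union, and the package's `create_record()` call is left out because it depends on package internals (`ORG_EIN`, `TAX_YEAR`, `URL`).

```r
library( xml2 )

# toy return with the indicator repeated, to hit the multiple-match branch
doc <- read_xml( '<Return><ReturnData><IRS990>
  <SubjToTaxRmnrtnExPrchtPymtInd>true</SubjToTaxRmnrtnExPrchtPymtInd>
  <SubjToTaxRmnrtnExPrchtPymtInd>false</SubjToTaxRmnrtnExPrchtPymtInd>
</IRS990></ReturnData></Return>' )

V1 <- '//Return/ReturnData/IRS990/SubjToTaxRmnrtnExPrchtPymtInd'
V2 <- '//Return/ReturnData/IRS990/HypotheticalOlderNameInd'   # hypothetical variant
V_SUBJ_TO_4960_TAX_X <- paste( V1, V2, sep='|' )   # XPath union: matches either name

F9_05_SUBJ_TO_4960_TAX_X <- xml_text( xml_find_all( doc, V_SUBJ_TO_4960_TAX_X ) )
if( length( F9_05_SUBJ_TO_4960_TAX_X ) > 1 )
{
  # create_record() from the package is omitted here
  F9_05_SUBJ_TO_4960_TAX_X <- paste0( '{', F9_05_SUBJ_TO_4960_TAX_X, '}', collapse=';' )
}
print( F9_05_SUBJ_TO_4960_TAX_X )
```

Multiple matches are collapsed into a single `{...};{...}` string, so the table keeps one value per return.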

Is this what you need to add it to the package, or do you need more information?

lecy commented

Perfect

lecy commented

The chunk script is updated:

https://github.com/Nonprofit-Open-Data-Collective/irs990efile/blob/main/R/CHUNKS-F9-P05-T00-OTHER-IRS-FILING.R

You should be able to reinstall the package and build that table.

@lecy I believe the 2004-2020 schema files are available; there are more links on this Internet Archive page.

lecy commented

Thanks @jsfenfen !

I've got them, just haven't had a chance to convert to the xpath list yet. You've got a python routine that generates all valid xpaths, correct?

Umm, haven't done that step either yet...