Nonprofit-Open-Data-Collective/irs-efile-master-concordance-file

MCF: Truncated variable_names in "minus year" variables.

Closed this issue · 4 comments

In many cases, the variable_name is identical across variables that refer to different prior years. In some cases the "minus one year" entry is singular and the "minus two years" entry is plural.

Here's an example:

SA_02_PZ_ARDPCTYMYEAR | Current tax year minus one year
SA_02_PZ_ARDPCTYMYEAR | Current tax year minus two years
SA_02_PZ_ARDPCTYMYEAR | Current tax year minus three years
SA_02_PZ_ARDPCTYMYEAR | Current tax year minus four years
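A quick way to surface collisions like this is to group descriptions by variable_name and keep the names that map to more than one description. This is only a minimal sketch: it assumes the concordance rows have already been read into (variable_name, description) pairs, and `find_name_collisions` and `HYPOTHETICAL_OK_NAME` are made-up names for illustration, not part of the repo.

```python
from collections import defaultdict


def find_name_collisions(rows):
    """Return {variable_name: sorted descriptions} for names with >1 distinct description."""
    seen = defaultdict(set)
    for name, description in rows:
        seen[name].add(description)
    return {name: sorted(descs) for name, descs in seen.items() if len(descs) > 1}


# The four rows from the example above, plus one hypothetical non-colliding row.
rows = [
    ("SA_02_PZ_ARDPCTYMYEAR", "Current tax year minus one year"),
    ("SA_02_PZ_ARDPCTYMYEAR", "Current tax year minus two years"),
    ("SA_02_PZ_ARDPCTYMYEAR", "Current tax year minus three years"),
    ("SA_02_PZ_ARDPCTYMYEAR", "Current tax year minus four years"),
    ("HYPOTHETICAL_OK_NAME", "Some other description"),  # hypothetical row
]

collisions = find_name_collisions(rows)
for name, descriptions in collisions.items():
    print(name, len(descriptions))
```

Because this checks names against descriptions rather than searching for the word 'minus', it would also flag truncated names whose descriptions use different wording.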

Here's a text file of all the ones that I could find. I haven't verified that these are all problems, but at first blush it looks like they are.

Elsewhere these vars have the number of minus years as a suffix; maybe they got lopped off?

minus_year_issue.txt

Actually, there are more; the list above only included those with the word 'minus' in the description.

Hey @MiguelABarbosa, this looks like a systemic issue in variable name generation, where differences further up the xpath tree are ignored. When I did this, my approach was to generate all names in a given table from the last part of the xpath and then, if they weren't unique, use the last two parts, repeating until they were unique enough...
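One reading of the approach described above, as a rough sketch rather than the code actually used for the concordance: start every name at the last xpath segment, and deepen only the names that still collide. The function name `unique_suffix_names`, the `_` separator, and the sample xpaths are all assumptions made up for illustration.

```python
from collections import Counter


def unique_suffix_names(xpaths, sep="_"):
    """Name each xpath by the shortest trailing run of segments that is unique in the table."""
    parts = [xp.strip("/").split("/") for xp in xpaths]
    depth = [1] * len(parts)  # how many trailing segments each name currently uses
    while True:
        names = [sep.join(p[-d:]) for p, d in zip(parts, depth)]
        counts = Counter(names)
        changed = False
        for i, name in enumerate(names):
            # Only lengthen names that collide and still have parent segments left.
            if counts[name] > 1 and depth[i] < len(parts[i]):
                depth[i] += 1
                changed = True
        if not changed:
            # Either everything is unique, or the remaining duplicates are identical xpaths.
            return names


# Hypothetical xpaths, not taken from the concordance.
xpaths = [
    "/Return/Totals/MinusOneYear/Amount",
    "/Return/Totals/MinusTwoYears/Amount",
    "/Return/Header/TaxYear",
]
names = unique_suffix_names(xpaths)
print(names)
```

With this scheme the two `Amount` leaves pick up their parent segment, so the "minus N years" distinction further up the tree ends up in the name instead of being dropped.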

more_minus_years.txt

Partial fix here: 1cbb52a

closing, will handle remnants in #12