Nonprofit-Open-Data-Collective/irs-efile-master-concordance-file

Is the ReturnHeader 'filer' equivalent to the 990 'filer'?

Closed this issue · 14 comments

I noticed that the variable F9_00_HD_FILERFOR1 represents all of these xpaths:

/Return/ReturnHeader/Filer/ForeignAddress/AddressLine1
/Return/ReturnHeader/Filer/ForeignAddress/AddressLine1Txt
/Return/ReturnData/IRS990/ForeignAddress/AddressLine1Txt
/Return/ReturnData/IRS990/ForeignAddress/AddressLine1

In other words, it's both the filer that appears in the return header and the filer set in IRS990? I would have expected those to be two different variables. How would this work if these two xpath values were different in the same filing? I'm not sure that ever happens, or could. But it also raises the issue of having a "non-repeating" variable effectively repeat (which is more an implementation issue, I guess). The same thing seems to happen for all of the following vars:

F9_00_HD_FILERFOR1
F9_00_HD_FILERFOR2
F9_00_HD_FILERFORCITY
F9_00_HD_FILERFORCTRY
F9_00_HD_FILERFORPOST
F9_00_HD_FILERFORSTATE

If this is validated by IRS and they say so, I guess I could see making these vars the same--do they publish that?
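One way to find every variable with this problem is to scan the concordance for variable names whose xpaths fall under both /Return/ReturnHeader and /Return/ReturnData. The following is a minimal sketch, not the project's tooling: the `ROWS` list stands in for the concordance file (its real column layout is not assumed here), the helper names are made up, and `F9_00_HD_EXAMPLE` is a hypothetical variable included only as a contrast case.

```python
from collections import defaultdict

# Stand-in for concordance rows: (variable_name, xpath).
# The first four xpaths are the ones quoted above; the last row is hypothetical.
ROWS = [
    ("F9_00_HD_FILERFOR1", "/Return/ReturnHeader/Filer/ForeignAddress/AddressLine1"),
    ("F9_00_HD_FILERFOR1", "/Return/ReturnHeader/Filer/ForeignAddress/AddressLine1Txt"),
    ("F9_00_HD_FILERFOR1", "/Return/ReturnData/IRS990/ForeignAddress/AddressLine1Txt"),
    ("F9_00_HD_FILERFOR1", "/Return/ReturnData/IRS990/ForeignAddress/AddressLine1"),
    ("F9_00_HD_EXAMPLE", "/Return/ReturnHeader/Filer/EIN"),
]


def top_level_section(xpath):
    """Return the element directly under /Return, e.g. 'ReturnHeader'."""
    parts = xpath.strip("/").split("/")
    return parts[1] if len(parts) > 1 else ""


def variables_spanning_sections(rows):
    """List variable names whose xpaths sit under more than one section."""
    sections = defaultdict(set)
    for name, xpath in rows:
        sections[name].add(top_level_section(xpath))
    return sorted(name for name, found in sections.items() if len(found) > 1)


print(variables_spanning_sections(ROWS))
```

Any name this reports is a candidate for the "non-repeating variable that repeats" situation, since a single filing can carry both a header and a body element for it.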

lecy commented

It might be the same variable, but in different locations across form versions (990 vs 990-EZ vs 990-PF)?

If that's the case, then the issue you raise above would not be a problem.

It is also possible that these were incorrectly mapped, that the ReturnData version might represent an address for an accountant, contractor, or other entity?

We should find an instance that comes from the ReturnData version, and check the e-viewer rendering to see if that data originates from where we expect.

Sure, I found an instance where F9_00_HD_FILERUSCITY was repeated twice, once as /Return/ReturnData/IRS990/USAddress/CityNm and once as /Return/ReturnHeader/Filer/USAddress/CityNm (the value was the same, but still). In some ways the e-viewer isn't super helpful for this because it doesn't explicitly show the header vars that aren't rendered. My sense is it's better to have two different concordance variable names for these circumstances, is that right? Unless we have some guarantee they'll always be the same?
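A check like this can be scripted rather than read off the e-viewer. The sketch below pulls both CityNm values out of one filing with the standard library and compares them. The `SAMPLE` document is a made-up fragment, not a real filing, and the namespace URI is assumed to be the one used by the published e-file XML; adjust it if your files differ.

```python
import xml.etree.ElementTree as ET

# Assumed default namespace of the e-file XML.
NS = {"e": "http://www.irs.gov/efile"}

# Made-up fragment containing only the two elements under discussion.
SAMPLE = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader>
    <Filer><USAddress><CityNm>SPRINGFIELD</CityNm></USAddress></Filer>
  </ReturnHeader>
  <ReturnData>
    <IRS990><USAddress><CityNm>SPRINGFIELD</CityNm></USAddress></IRS990>
  </ReturnData>
</Return>"""


def city_values(xml_text):
    """Return (header_city, body_city); either is None if the element is absent."""
    root = ET.fromstring(xml_text)  # root is the <Return> element
    header = root.findtext(
        "e:ReturnHeader/e:Filer/e:USAddress/e:CityNm", namespaces=NS
    )
    body = root.findtext(
        "e:ReturnData/e:IRS990/e:USAddress/e:CityNm", namespaces=NS
    )
    return header, body


header_city, body_city = city_values(SAMPLE)
print("header:", header_city, "| body:", body_city, "| same:", header_city == body_city)
```

Running this over many filings and counting the cases where the two values differ would show whether "always the same" holds in practice.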

Hi folks,

My hunch is that this has to do with a version change, not a form difference. After I launch opendata.love, you'll be able to download every instance of each of those Xpaths easily, along with information about which schema versions they appeared in. By grouping by Xpath, version, and form type and doing a count, you can quickly get to the bottom of the difference. Stay tuned--the data and metadata files should be mighty handy for getting through issues like this.

Best,
David
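The group-and-count approach described above can be sketched in a few lines. Everything in `OBSERVATIONS` is invented for illustration: the tuple layout (xpath, schema version, form type) is an assumption about what such a download might contain, and the counts are not real data.

```python
from collections import Counter

# Invented observations: one tuple per occurrence of an xpath in a filing.
OBSERVATIONS = [
    ("/Return/ReturnHeader/Filer/ForeignAddress/AddressLine1", "2012v2.1", "IRS990"),
    ("/Return/ReturnHeader/Filer/ForeignAddress/AddressLine1", "2012v2.1", "IRS990"),
    ("/Return/ReturnHeader/Filer/ForeignAddress/AddressLine1Txt", "2014v5.0", "IRS990"),
    ("/Return/ReturnData/IRS990/ForeignAddress/AddressLine1Txt", "2014v5.0", "IRS990"),
]

# Group by (xpath, version, form type) and count occurrences per group.
counts = Counter(OBSERVATIONS)

for (xpath, version, form), n in sorted(counts.items()):
    print(f"{n:>3}  {version}  {form}  {xpath}")
```

If two xpaths for the same variable never share a schema version, the difference is a version change. If they do share one, as the last two invented rows do, both can occur in the same filing.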

lecy commented

"My sense is it's better to have two different concordance variable names for these circumstances, is that right? Unless we have some guarantee they'll always be the same?"

I agree here because (1) if the data could be entered on two different fields in theory it could be two values, and (2) even if they are the same value it would result in cardinality > 1 for that variable so scripts that assume unique relationships could run into problems.
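Point (2) is easy to reproduce. In the sketch below, a script flattens one filing into a variable-name-to-value mapping; the plain `dict` silently keeps only the last value, while a stricter (hypothetical) helper fails loudly. The city values are invented.

```python
# Invented (variable_name, value) pairs extracted from a single filing,
# where one variable name matched two different xpaths.
PAIRS = [
    ("F9_00_HD_FILERUSCITY", "SPRINGFIELD"),  # from the return header
    ("F9_00_HD_FILERUSCITY", "SHELBYVILLE"),  # from the IRS990 body
]


def flatten_strict(pairs):
    """Build a name -> value dict, refusing to overwrite an existing name."""
    out = {}
    for name, value in pairs:
        if name in out:
            raise ValueError(f"{name} occurs more than once in this filing")
        out[name] = value
    return out


# A naive flatten loses the first value without any warning.
naive = dict(PAIRS)
print(naive)

try:
    flatten_strict(PAIRS)
except ValueError as err:
    print("strict flatten failed:", err)
```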

This seems like an extreme edge case which results from really poor design of the form.

What about naming the variable something like VAR_NAME_V1 and VAR_NAME_v2 to signal they are the same construct, but the value might be located in different places and be repeated?

lecy commented

Are we sure these are the same constructs, though?

"Sure, I found an instance where F9_00_HD_FILERUSCITY was repeated twice, once as /Return/ReturnData/IRS990/USAddress/CityNm and once as /Return/ReturnHeader/Filer/USAddress/CityNm"

It could be asking for two different addresses (one for the nonprofit, one for the executive director, for example)? In some cases these would be the same?

"What about naming the variable something like VAR_NAME_V1 and VAR_NAME_v2 to signal they are the same construct, but the value might be located in different places and be repeated?"

@lecy So I think they should get renamed with similar sounding names (because they are very similar variables) but I wouldn't want to "overload" the variable names by having a formal _v1, _v2 convention.

To the extent that this is something that should be recorded, I'd want to describe it in a secondary file, although I'm not quite sure if this is that special. What I mean is that there are probably other variables that have the same relationship as these (i.e. they are recording the same thing in different forms) that already have different names, but I dunno that they need to be memorialized at this point. Also, now that I think about it, the thought of trying to correct already-existing variables covering the same things to have _vx notations on them sounds exhausting.

"My hunch is that this has to do with a version change, not a form difference"

@borenstein I looked into that, and that's not the issue here. See 201622239349302037 as an example, though there are many. There's a filer city given in the return header, distinct from the filer city given in the 'main body' of the IRS 990 form. It seems pretty clear these refer to the same thing, but they appear separately in the XML. If I wrote accounting software, I'd definitely only make the user enter this once and use that value in both locations. But that's different from saying that all filings in the past and the future will have the same values in both locations in the XML released by the IRS, right?

lecy commented

Jacob - In response to, "There's a filer city given in the return header, distinct from the filer city given in the 'main body' of the irs 990 form. It seems pretty clear these refer to the same thing, but they appear separately on the xml."

I am looking at the 990 form now - where is the filer city field in the main body?


@lecy: it's actually line F that we're seeing this on, i.e. the US city part of the address of the principal officer (the XML has an address field for line F that you can't really see in the form). In my description file, /IRS990/USAddress/CityNm is: [USAddress] Address of principal officer - US; [CityNm] City.
So that's one value of F9_00_HD_FILERUSCITY.

The other xpath value of F9_00_HD_FILERUSCITY for this version (post 2014) is
/Return/ReturnHeader/Filer/USAddress/CityNm
The value that it matches isn't shown (uh, I think, anyways) because the return header values aren't directly rendered. I think. But see the attached file.

In any case, going through this process has made me realize this shouldn't be the same variable, in part because one instance of F9_00_HD_FILERUSCITY is the address for the principal officer of the org, which even on the form is presented as distinct from the address of the business, and the other is the value given in the returnheader for the org.

201622239349302037_public.txt

lecy commented

@jsfenfen Ok, great. Thanks for double-checking.

The good news is that we know how to deal with this problem - the single variable needs to be split into two variables, filer_city and principal_officer_city.

It also is not completely irrational that the principal officer's address is in the xpath as ReturnData instead of ReturnHeader because it is not reported on the 990-EZ or 990-PF forms (it would appear separately under the list of officers on each form, but there is a separate variable to record that instance).

/Return/ReturnData/IRS990/USAddress/CityNm
/Return/ReturnHeader/Filer/USAddress/CityNm

@lecy in looking at this, I think the underlying error is the location code: "HEADER-OR-SIGNATURE-BLOCK". My argument would be that this case shows why HEADER should be a different location than SIGNATURE-BLOCK. The problem here is with xpaths with the same variable_name, one from the header, the other from the signature block.

I'm using 'header' to refer to stuff that actually appears in /Return/ReturnHeader/ as distinct from /Return/ReturnData/IRS990/ etc. My perspective on what the header is rests entirely on how the XML is structured, not on any notion of what 'looks' like header information on the form.

Also, I'm lazy, so from my perspective it's easier to change the variable name by disambiguating the location code portion of it. I guess I'd propose using HEADER only for variables that occur in /Return/ReturnHeader/ and SIGNATURE-BLOCK for those that are in /Return/ReturnData/. And so I'd use SG instead of HD for the location code for the signature-block variables? I'm trying to think why that wouldn't work.

Edit: Here's a diff of an edit I did to make this distinction in a different branch. Note that I messed it up slightly and included a fix for an unrelated var in there: F9_10_PC_LOANFROMOFFICEOY.
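The proposed rule is mechanical enough to sketch: keep HD when the xpath is under /Return/ReturnHeader/, otherwise switch the location code to SG. This is an illustration of the rule as stated, not the contents of the diff. The function names are made up, and it assumes variable names follow the FORM_PART_LOCATION_NAME pattern seen in this thread.

```python
def location_code_for(xpath):
    """HD for xpaths under /Return/ReturnHeader/, SG for everything else."""
    return "HD" if xpath.startswith("/Return/ReturnHeader/") else "SG"


def disambiguate(variable_name, xpath):
    """Rewrite the location code of an HD variable based on where its xpath lives.

    Variables whose location code is not HD are returned unchanged.
    """
    # e.g. "F9_00_HD_FILERUSCITY" -> ["F9", "00", "HD", "FILERUSCITY"]
    parts = variable_name.split("_", 3)
    if len(parts) == 4 and parts[2] == "HD":
        parts[2] = location_code_for(xpath)
    return "_".join(parts)


for xpath in (
    "/Return/ReturnHeader/Filer/USAddress/CityNm",
    "/Return/ReturnData/IRS990/USAddress/CityNm",
):
    print(disambiguate("F9_00_HD_FILERUSCITY", xpath), "<-", xpath)
```

The sketch changes only the location code. Renaming the descriptive part of the variable (FILER versus OFFICER, for instance) would still be a manual decision.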

lecy commented

@jsfenfen "I'd use SG instead of HD for the location code for the signature-block variables"

I was using that group to represent variables that are consistent across all three form versions (PC, EZ, and PF). But I have no problem splitting HD into HD and SG if you think that is useful. All of the SG variables are identical across forms, so it's a clean category.

I think a harder question is whether we change the prefix of the new variable F9_00_HD_OFFICERUSCITY (whatever the actual name is) to something else?

What would that be? The principal officer address only occurs on the 990-PC form, and does not appear in the signature box. So it doesn't really fit either category HD or SG as you define them here. I think keeping it as HD makes the most sense to me, unless the xpath is identical to another field on the body of the form.

@lecy I tried it with the location code unchanged, but the scope set to SG for xpaths outside of /Return/ReturnHeader/. Does that work? Is there a better approach? Separating this out in this manner saves like 13 variables, but I'm open to any rule-based approach we could take here.

WRT the changed var names, I changed the HD in them to SG as well, so
F9_00_HD_OFFICERUSCITY would be F9_00_SG_OFFICERUSCITY
where that occurs. Again, see the diff.

Agree this isn't perfect, in that some of the vars are somewhat misleadingly named, but at least they aren't duplicated. In general I'd add that stuff that is in HD is in the header and is thus submitted by everyone (not sure how well all the fields are populated).

lecy commented

@jsfenfen I think it sounds good! No good solution for the weird variables like principal officer address, and it makes sense to me to group that with signature variables. The SG designation also works. Thanks!

Great, I think that answers the question.