donboyd5/synpuf

Should we tweak variables to capture structural relationships between them?


We currently avoid synthesizing some invalid relationships, such as between wages and EITC, by calculating variables like EITC with Tax-Calculator rather than synthesizing them.

We also tweak some variables to work better in synthesis, modeling e00600 - e00650 and e01500 - e01700 rather than e00600 and e01500, respectively. This ensures that e00600 >= e00650 and e01500 >= e01700, as required by Tax-Calculator (see #17).
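A minimal sketch of that difference transform and its inverse, assuming records are plain dicts keyed by PUF variable names. The helper names (`to_synthesis_space`, `from_synthesis_space`) and the derived column names (e.g. `e00600_minus_e00650`) are hypothetical, not synpuf's actual code, and the synthesis step itself is not shown. The clipping at zero on the way back is an added assumption: it guards against a synthesizer that might emit a small negative difference.

```python
# Pairs (total, component) where Tax-Calculator requires total >= component.
PAIRS = [("e00600", "e00650"), ("e01500", "e01700")]


def to_synthesis_space(record):
    """Replace each total with the hypothetical column (total - component)."""
    out = dict(record)
    for total, component in PAIRS:
        diff = record[total] - record[component]
        if diff < 0:
            raise ValueError(f"{total} < {component} in input record")
        out[f"{total}_minus_{component}"] = diff
        del out[total]
    return out


def from_synthesis_space(record):
    """Rebuild each total as component + max(difference, 0) after synthesis."""
    out = dict(record)
    for total, component in PAIRS:
        diff = out.pop(f"{total}_minus_{component}")
        # Assumption: clip negative differences so total >= component always holds.
        out[total] = out[component] + max(diff, 0.0)
    return out


# Round trip on a valid record restores the original values.
rec = {"e00600": 500.0, "e00650": 300.0, "e01500": 1000.0, "e01700": 800.0}
assert from_synthesis_space(to_synthesis_space(rec)) == rec
```

Because the totals are rebuilt from the component plus a non-negative difference, any synthesized record satisfies the Tax-Calculator constraint by construction, rather than relying on the synthesizer to learn it.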

This issue is to explore whether we should engineer other features, like the latter example, to better capture relationships between variables during synthesis. It is motivated by a recent call with Benedetto and Stinson from Census, who recommended thinking through important structural relationships.

Interestingly, we are indeed synthesizing f6251 (Form 6251, Alternative Minimum Tax) and fded (Form of Deduction Code: itemized, standard, or neither), both of which Tax-Calculator requires (spreadsheet). Does Tax-Calculator actually need these, or could they be determined by whichever choice minimizes tax burden? That seems like it would be a valuable Tax-Calculator feature regardless of our project. @andersonfrailey
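To make the "whichever choice minimizes tax burden" idea concrete, here is a rough sketch of deriving the deduction form instead of synthesizing it. Everything here is hypothetical: `choose_deduction` is not a Tax-Calculator function, and `toy_tax` uses a made-up flat rate and standard deduction purely for illustration. In practice the tax function would be a Tax-Calculator run under each option.

```python
from typing import Callable, Dict, Sequence

Record = Dict[str, float]
# Hypothetical signature: tax liability for a record under a deduction option.
TaxFn = Callable[[Record, str], float]


def choose_deduction(record: Record, tax_fn: TaxFn,
                     options: Sequence[str] = ("standard", "itemized")) -> str:
    """Return the option with the lowest tax; ties go to the first option listed."""
    return min(options, key=lambda opt: tax_fn(record, opt))


# Toy parameters, hypothetical and not actual tax law.
STANDARD_DEDUCTION = 12_000.0
RATE = 0.10


def toy_tax(record: Record, deduction: str) -> float:
    """Flat-rate tax on income minus the chosen deduction, floored at zero."""
    ded = record["itemizable"] if deduction == "itemized" else STANDARD_DEDUCTION
    return RATE * max(record["income"] - ded, 0.0)


print(choose_deduction({"income": 50_000.0, "itemizable": 15_000.0}, toy_tax))
```

Listing "standard" first means a filer who is indifferent is assigned the standard deduction, which seems the more natural default. The same pattern could in principle cover f6251 by comparing liabilities with and without the AMT computation, though whether Tax-Calculator could support that is the open question above.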