iza-institute-of-labor-economics/gettsim

BUG: Replace `_tu` grouping

Closed this issue · 24 comments

Bug description

The grouping _tu should be replaced by the correct grouping (_sn, _bg, _fg, or _hh).

To get rid of the _tu groupings, we need the following new groups:

  • _spouses should group individuals with their partners based on p_id_ehepartner. A married indicator would be useful as well.
  • _eg (Einstandsgemeinschaft) should group individuals based on the already existing p_id_einstandspartner
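
A minimal sketch of how such a group ID could be derived from a partner pointer. The helper name `group_id_from_partner` and the toy data are hypothetical, not GETTSIM code. It assumes that a negative partner ID means "no partner" and that the pointers are symmetric (if A points to B, B points to A).

```python
def group_id_from_partner(p_id: list[int], p_id_partner: list[int]) -> list[int]:
    """Give each individual and their partner one shared group ID.

    Hypothetical helper. Partners share the smaller of their two p_ids;
    individuals without a partner (negative pointer) keep their own p_id.
    """
    return [
        min(own, partner) if partner >= 0 else own
        for own, partner in zip(p_id, p_id_partner)
    ]


# Toy data: 0 and 1 are married, 2 is single, 3 and 4 are married.
p_id = [0, 1, 2, 3, 4]
p_id_ehepartner = [1, 0, -1, 4, 3]

ehe_id = group_id_from_partner(p_id, p_id_ehepartner)
```

The same helper would work for _eg by passing p_id_einstandspartner instead of p_id_ehepartner.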

This affects the following taxes and transfers

eink_st_y_tu

  • Grouping should be replaced by _sn
  • This is also true for all variables relevant for eink_st_y_sn calculation
  • Also, parameters change to _sn: vorsorgeaufw_y_tu, sonderausgaben_betreuung_y_tu

soli_st_y_tu

  • Same as eink_st_y_tu

abgelt_st_y_tu

  • Same as eink_st_y_tu

Kinderbonus

  • kinderbonus_m_tu should be processed by the aggregate by p_id functions, not via a grouping

Lohnsteuer

  • anz_kinder_mit_kindergeld_tu changes to number of Kinderfreibetrag claims on the individual level

Freibeträge

  • In general, groupings change to _sn
  • alleinerz_tu should change to alleinerz_sn.
  • alleinerz_freib_y_tu after 2015 should use the number of Kindergeld claims as an input, not anz_kinder_tu
  • eink_st_sonderausgaben_y_tu uses anz_erwachsene_tu as an input. Should change to anz_erwachsene_sn: Couples who file taxes together (same sn) get two times the transfer of singles.
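
To illustrate the anz_erwachsene_sn idea: a rough sketch that counts adults per sn_id and broadcasts the count back to every member. The column names `sn_id` and `kind` and the toy data are assumptions for illustration only. Whether children share their parents' sn_id is not settled here; the mask just shows how non-adults would be excluded.

```python
from collections import Counter

# Toy data: sn 0 is a couple filing jointly, sn 1 is a single,
# sn 2 is a couple plus one child (illustrative only).
sn_id = [0, 0, 1, 2, 2, 2]
kind = [False, False, False, False, False, True]  # True marks a child

# Count adults within each sn, skipping children.
adults_per_sn = Counter(g for g, is_child in zip(sn_id, kind) if not is_child)

# Broadcast the group-level count back to the individual level.
anz_erwachsene_sn = [adults_per_sn[g] for g in sn_id]
```

With this, a per-adult amount multiplied by anz_erwachsene_sn is doubled for jointly filing couples and unchanged for singles.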

Vorsorgeaufwand

  • Same as eink_st_y_tu
  • gemeinsam_veranlagt_tu can go because anz_erwachsene_sn should yield the same result

zu_verst_eink

  • Same as eink_st_y_tu

Arbeitslosengeld

  • Switch anz_kinder_tu to eligibility for Kinderfreibetrag (§32 EStG)

Elterngeld

  • Net income calculation should be based on eink_st_y_sn, not eink_st_y_tu

Erziehungsgeld

  • anz_erwachsene_tu to anz_erwachsene_fg (See issue #670)
  • anz_kinder_mit_kindergeld_tu to anz_kinder_mit_kindergeld_fg

Grundrente

  • gemeinsam_veranlagt_tu can go; should instead be an indicator for being married (e.g., married)
  • _grundr_zuschlag_eink_vor_freibetrag_m_tu should not be aggregated on _tu level but on _ehe

Grundsicherung im Alter

  • grunds_im_alter_eink_m and its inputs should be on the _sn level, but aggregated to the _eg level when used as an input
  • In general, the correct grouping is _eg, not _hh
  • Correct grouping of grunds_im_alter_vermög_freib_hh and its inputs is _fg. When used as an input, use _fg as well.

Kindergeld

  • Remove aggregation dict at top of file.
  • Kindergeld should be created via the grouping by p_id function
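
For reference, a sketch of what "aggregate by p_id" means here: amounts attached to one individual (e.g. a child) are summed up on the row of the person a pointer column refers to (e.g. the parent receiving the payment). The function name, the pointer column `p_id_empfaenger`, and the amounts are made up for this example and are not the actual GETTSIM interface.

```python
def aggregate_by_p_id(
    values: list[float], p_id_target: list[int], p_id: list[int]
) -> list[float]:
    """Sum `values` onto the individual that `p_id_target` points to.

    Hypothetical helper. A negative target means "points to nobody";
    every non-negative target must be an existing p_id.
    """
    totals = dict.fromkeys(p_id, 0.0)
    for value, target in zip(values, p_id_target):
        if target >= 0:
            totals[target] += value
    return [totals[i] for i in p_id]


# Toy data: individuals 2 and 3 are children whose claim goes to parent 0.
p_id = [0, 1, 2, 3]
p_id_empfaenger = [-1, -1, 0, 0]
anspruch_m = [0.0, 0.0, 250.0, 250.0]

kindergeld_m = aggregate_by_p_id(anspruch_m, p_id_empfaenger, p_id)
```

Unlike a grouping, the result stays at the individual level: only parent 0 receives an amount, parent 1 in the same household does not.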

Unterhaltsvorschuss

  • Switch to alleinerz_sn.
  • The parental income check in unterhaltsvors_m should be on the single parent level only (no aggregation). Use aggregation by p_id mechanic here.
  • Nearly all _tu are replaced with no aggregation at all because only income of the single parent is considered

Wohngeld

  • Correct grouping in general: _hh
  • wohngeld_abzüge_st_sozialv_m and its inputs should be on the _sn level, but aggregated to the _hh level when it is used as an input.
  • Whenever anz_kinder_bis_10_tu is used, it should be number of Kindergeld claims on the individual level.
  • Correctly specify the Wohngeld Vorrang check: It is possible that for the same household some individuals receive Wohngeld and some receive ALG2.

ALG II

  • _arbeitsl_geld_2_alleinerz_mehrbedarf_m_bg, arbeitsl_geld_2_kindersatz_m_bg_bis_2010, arbeitsl_geld_2_kindersatz_m_bg_ab_2011, arbeitsl_geld_2_regelsatz_m_bg_bis_2010, arbeitsl_geld_2_regelsatz_m_bg_ab_2011, arbeitsl_geld_2_vor_vorrang_m_bg inputs should be on the _bg level as well

Kinderzuschlag

  • All _tu are replaced with _bg
  • Same applies for Kosten Unterkunft (not 100% sure, need to check again)
  • Occurrences of _hh have already been replaced with _bg in #662

Hopefully this is all correct, but let me know if something seems off @hmgaudecker

I'll update the list if something comes up.

kinderbonus_m_tu changes to kinderbonus_m_sn

I think that should just follow kindergeld

The PR for this likely closes #683, #670
Potentially also #270

Related to #606

anz_kinder_mit_kindergeld_tu changes to number of Kindergeld claims on _sn level

Does Kindergeld or Kinderfreibetrag matter here or both, @JakobWegmann? In any case, it should be the versions at individual level, if we have them.

Elterngeld

  • Net income calculation should be based on lohnst_m, not eink_st_y_tu
  • Note: Even then the calculation is slightly off compared with the simplified rules described here. Might be something for a different PR.

Yeah. Big one. I still believe this should go altogether and become an input we require from prior calculations. Typically $t-1$. In any case, don't worry much about it in this PR.

Grundsicherung im Alter
Income checks on the _tu level should be replaced with _sn. But this is just an approximation because i) there might be couples that do not file taxes together but are considered as a couple for Grundsicherung, ii) income of the partner is not considered if the partner cannot satisfy her own needs.

I wonder whether we should support another grouping, like "married" or "einstandspartner" ?

Since 2020, income of children is not considered, even if it is higher than 100.000 € (S. 5 §43 SGB XII).

Yeah, I think we can ignore that until somebody actually needs it. But maybe add a note in the function?

I wonder whether we should support another grouping, like "married" or "einstandspartner" ?

I agree, especially as we have p_id_einstandspartner as an input variable already. I just learned that this would be the correct grouping for Grundrente as well (it doesn't matter whether taxes are filed jointly).

alleinerz_tu should change to alleinerz_sn and can be determined endogenously.

Leave as is for now (AFAICT, alleinerz is the input variable), make endogenous in a different PR.

anz_kinder_mit_kindergeld_tu changes to number of Kindergeld claims on _sn level

Does Kindergeld or Kinderfreibetrag matter here or both, @JakobWegmann? In any case, it should be the versions at individual level, if we have them.

I'm 98% sure that only the Kinderfreibetrag matters. I think the law is very clear.

Elterngeld

  • Net income calculation should be based on lohnst_m, not eink_st_y_tu
  • Note: Even then the calculation is slightly off compared with the simplified rules described here. Might be something for a different PR.

Yeah. Big one. I still believe this should go altogether and become an input we require from prior calculations. Typically t−1. In any case, don't worry much about it in this PR.

The parental leave calculation in my Stata code matches the calculation, so I think I can at least fix the calculation in GETTSIM as soon as all these changes are implemented.

Yes, but given our annual structure, I suppose the better approximation is not to use concurrent income, right?

Yes, I agree it would be more intuitive and less error prone to have an additional input.

It should be an output, though! But not in this PR, @MImmesberger, can you open an issue for that, please?

It should be an output, though!

Just to make sure I understood that correctly: it should be an output for convenience (when calculating the inputs for t in t-1), i.e. it would not be used as an input for another function?

Not in the concurrent year, no. But in the subsequent year.

Say I have panel data for 2023 and 2024 and I want to calculate Elterngeld for kids born in 2024. This would proceed as follows:

  1. Run GETTSIM on 2023 data, output elterngeld_eink_relev_current_m.
  2. Call this variable elterngeld_eink_relev_lag_m and run GETTSIM on 2024 data using it as an input.

We'll need a suitable distinction between the names; this suggestion is bogus, of course.
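
The two steps could be wired together roughly like this. `run_model` is a stand-in with a placeholder rule and is explicitly not the GETTSIM API; the data layout (one dict per year, keyed by p_id) and the `bruttolohn_m` input are assumptions for the sketch. The variable names are the provisional ones from the steps above.

```python
def run_model(data: dict[int, dict[str, float]]) -> dict[int, dict[str, float]]:
    """Stand-in for a model run; NOT the GETTSIM API.

    Placeholder rule: the relevant income simply equals the gross wage.
    """
    return {
        p_id: {"elterngeld_eink_relev_current_m": row["bruttolohn_m"]}
        for p_id, row in data.items()
    }


# Toy panel data, keyed by p_id.
data_2023 = {0: {"bruttolohn_m": 3000.0}, 1: {"bruttolohn_m": 0.0}}
data_2024 = {0: {"bruttolohn_m": 0.0}, 1: {"bruttolohn_m": 1500.0}}

# Step 1: run on 2023 data and keep the "current" output.
result_2023 = run_model(data_2023)

# Step 2: attach it to the 2024 data under the "lag" name, matched on p_id.
data_2024_with_lag = {
    p_id: {
        **row,
        "elterngeld_eink_relev_lag_m": result_2023[p_id][
            "elterngeld_eink_relev_current_m"
        ],
    }
    for p_id, row in data_2024.items()
}
```

The point is only that the output of year t-1 becomes a plain input column in year t, so nothing in the year-t run depends on concurrent income.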

Great! This looks like a big step forward!

I wonder whether we should support another grouping, like "married" or "einstandspartner" ?

I agree, especially as we have p_id_einstandspartner as an input variable already. I just learned that this would be the correct grouping for Grundrente as well (it doesn't matter whether taxes are filed jointly).

AFAIK, unmarried partners are considered for Grundsicherung im Alter, but not for Grundrente. So I guess we need three groupings (all of which should be available from the input data):

  • sn; partners filing taxes together -> taxes
  • married partners -> Grundrente
  • einstandspartner/einstandsgemeinschaft; partners living together -> Grundsicherung im Alter

(I updated the original post following our discussion. Also, I added the two new groupings that are needed)

For ALG2 (_bg grouping), the Wohngeld priority check should still be on the _hh level for Wohngeld, correct? I'm referring to this function:

def wohngeld_vorrang_hh(
    wohngeld_nach_vermög_check_m_hh: float,
    arbeitsl_geld_2_vor_vorrang_m_bg: float,
) -> bool:
    """Check if housing benefit has priority.

    Parameters
    ----------
    wohngeld_nach_vermög_check_m_hh
        See :func:`wohngeld_nach_vermög_check_m_hh`.
    arbeitsl_geld_2_vor_vorrang_m_bg
        See :func:`arbeitsl_geld_2_vor_vorrang_m_bg`.

    Returns
    -------
    bool
        True if housing benefit is at least as high as ALG II before the
        priority check.

    """
    return wohngeld_nach_vermög_check_m_hh >= arbeitsl_geld_2_vor_vorrang_m_bg

Not quite. It is possible that some individuals in a household receive Wohngeld and others receive Bürgergeld.

To my understanding, Wohngeld is calculated at the household level but can be broken down to individual values. For the priority check, these should be aggregated at the _bg level and compared there. Does that seem correct, @mjbloemer @michaelhebsaker ? Would you have a reference of how to distribute Wohngeld across household members?
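
A sketch of what the _bg-level comparison could look like. How Wohngeld should be distributed across household members is exactly the open question above; the equal per-head split below is a placeholder assumption, not a legal rule. The helper `sum_by_group` and all numbers are made up.

```python
from collections import Counter, defaultdict


def sum_by_group(values: list[float], group_id: list[int]) -> list[float]:
    """Sum values within each group and broadcast the sum back to all members."""
    totals: defaultdict[int, float] = defaultdict(float)
    for value, group in zip(values, group_id):
        totals[group] += value
    return [totals[group] for group in group_id]


# Toy data: one household with three members, split into two Bedarfsgemeinschaften.
hh_id = [0, 0, 0]
bg_id = [0, 0, 1]
wohngeld_m_hh = [300.0, 300.0, 300.0]
arbeitsl_geld_2_vor_vorrang_m_bg = [150.0, 150.0, 200.0]

# ASSUMPTION: household-level Wohngeld is split equally per head.
hh_size = Counter(hh_id)
wohngeld_m = [w / hh_size[h] for w, h in zip(wohngeld_m_hh, hh_id)]

# Aggregate the individual shares to the _bg level and compare there.
wohngeld_m_bg = sum_by_group(wohngeld_m, bg_id)
wohngeld_vorrang_bg = [
    w >= a for w, a in zip(wohngeld_m_bg, arbeitsl_geld_2_vor_vorrang_m_bg)
]
```

In this toy case the first Bedarfsgemeinschaft would get Wohngeld priority and the second would not, which is the "mixed household" situation described above.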

A married indicator would be useful as well.

Should this be called married or verheirat? I'd wager this should go into demographic_vars.py, right?

I think I'd use ehe_id. Short enough and very clear.

So, ehe_id instead of spouse_id and no additional boolean married variable?

Ah, sorry, I had not read through @MImmesberger's updates to the main issue. Yes, I think ehe is clearer, also since we are using German identifiers for the other groupings.

Instead of an extra indicator we can check for p_id_ehepartner >= 0, right? Should be easy enough, rather not add an extra variable.

Yes, that was the implementation of married anyway. I'll remove it again.