BUG: Replace `_tu` grouping
Closed this issue · 24 comments
Bug description
The grouping `_tu` should be replaced by the correct grouping (`_sn`, `_bg`, `_fg`, or `_hh`).
To get rid of the `_tu` groupings, we need the following new groups:
- `_spouses` should group individuals with their partners based on `p_id_ehepartner`. A `married` indicator would be useful as well.
- `_eg` (Einstandsgemeinschaft) should group individuals based on the already existing `p_id_einstandspartner`.
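A minimal sketch of how such a grouping could be derived from a partner pointer. This is not GETTSIM code: the helper `group_id_from_partner` is hypothetical, and it assumes that a missing partner is coded as `-1` and that partner pointers are symmetric.

```python
def group_id_from_partner(p_id, p_id_partner):
    """Give each person and their partner one shared group id.

    Hypothetical helper. Assumes -1 means "no partner" and that
    partner pointers are symmetric (A points to B, B points to A).
    """
    group_of = {}
    next_id = 0
    result = []
    for person, partner in zip(p_id, p_id_partner):
        if partner >= 0 and partner in group_of:
            # The partner appeared earlier, so join their group.
            group_of[person] = group_of[partner]
        else:
            # Single person, or first member of a couple: open a new group.
            group_of[person] = next_id
            next_id += 1
        result.append(group_of[person])
    return result


p_id = [0, 1, 2, 3]
p_id_ehepartner = [1, 0, -1, -1]  # persons 0 and 1 are married
ehe_id = group_id_from_partner(p_id, p_id_ehepartner)
```

The same helper would work for `p_id_einstandspartner`, since only the pointer column changes.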
This affects the following taxes and transfers:
eink_st_y_tu
- Grouping should be replaced by `_sn`
- This is also true for all variables relevant for `eink_st_y_sn` calculation
- Also, parameters change to `_sn`: `vorsorgeaufw_y_tu`, `sonderausgaben_betreuung_y_tu`
soli_st_y_tu
- Same as eink_st_y_tu
abgelt_st_y_tu
- Same as eink_st_y_tu
Kinderbonus
- `kinderbonus_m_tu` should be processed by the aggregate by `p_id` functions, not via a grouping
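To illustrate what aggregating by `p_id` means here, as opposed to aggregating over a group: each row's value is summed onto the person that row points to. This is a rough sketch only. The function `sum_by_p_id`, the pointer column name `p_id_kindergeld_empf`, the column `kinderbonus_anspruch_m`, and the amounts are all assumptions for illustration, not the actual GETTSIM mechanism.

```python
def sum_by_p_id(values, p_id_target, p_id):
    """Sum each row's value onto the person that row points to.

    Hypothetical helper. A target of -1 means "points to nobody".
    """
    totals = {person: 0.0 for person in p_id}
    for value, target in zip(values, p_id_target):
        if target >= 0:
            totals[target] += value
    return [totals[person] for person in p_id]


p_id = [0, 1, 2, 3]
# Assumed pointer column: both children (2 and 3) point to parent 0.
p_id_kindergeld_empf = [-1, -1, 0, 0]
# Made-up per-child amounts.
kinderbonus_anspruch_m = [0.0, 0.0, 150.0, 150.0]

kinderbonus_m = sum_by_p_id(kinderbonus_anspruch_m, p_id_kindergeld_empf, p_id)
```

The result lives on the individual level (the recipient), so no `_tu` or `_sn` suffix is needed.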
Lohnsteuer
- `anz_kinder_mit_kindergeld_tu` changes to number of Kinderfreibetrag claims on the individual level
Freibeträge
- In general, groupings change to `_sn`
- `alleinerz_tu` should change to `alleinerz_sn`.
- `alleinerz_freib_y_tu` after 2015 should use the number of Kindergeld claims as an input, not `anz_kinder_tu`
- `eink_st_sonderausgaben_y_tu` uses `anz_erwachsene_tu` as an input. Should change to `anz_erwachsene_sn`: Couples who file taxes together (same `sn`) get twice the amount singles get.
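A small sketch of the last point, counting adults per `sn` and scaling a per-adult amount. The helper name, the `kind` flag, the toy data (including which `sn` the child is assigned to), and the value of `pauschbetrag` are illustrative assumptions, not GETTSIM parameters.

```python
from collections import Counter


def count_adults_per_sn(sn_id, kind):
    """Number of adults in each person's sn, broadcast back to persons."""
    counts = Counter(g for g, is_child in zip(sn_id, kind) if not is_child)
    return [counts[g] for g in sn_id]


sn_id = [0, 0, 1, 0]                 # persons 0 and 1 file jointly
kind = [False, False, False, True]   # last person is a child (toy data)
pauschbetrag = 36.0                  # illustrative per-adult amount

anz_erwachsene_sn = count_adults_per_sn(sn_id, kind)
sonderausgaben_y_sn = [pauschbetrag * n for n in anz_erwachsene_sn]
```

With this, a separate `gemeinsam_veranlagt` flag is not needed for the doubling: it falls out of the adult count.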
Vorsorgeaufwand
- Same as eink_st_y_tu
- `gemeinsam_veranlagt_tu` can go because `anz_erwachsene_sn` should yield the same result
zu_verst_eink
- Same as eink_st_y_tu
Arbeitslosengeld
- Switch `anz_kinder_tu` to eligibility for Kinderfreibetrag (§32 EStG)
Elterngeld
- Net income calculation should be based on `eink_st_y_sn`, not `eink_st_y_tu`
Erziehungsgeld
- `anz_erwachsene_tu` to `anz_erwachsene_fg` (See issue #670)
- `anz_kinder_mit_kindergeld_tu` to `anz_kinder_mit_kindergeld_fg`
Grundrente
- `gemeinsam_veranlagt_tu` can go; should instead be an indicator for being married (e.g. `married`)
- `_grundr_zuschlag_eink_vor_freibetrag_m_tu` should not be aggregated on `_tu` level but on `_ehe`
Grundsicherung im Alter
- `grunds_im_alter_eink_m` and its inputs should be on `_sn` level, but on `_eg` level when used as an input
- In general, the correct grouping is `_eg`, not `_hh`
- Correct grouping of `grunds_im_alter_vermög_freib_hh` and its inputs is `_fg`. When used as an input, use `_fg` as well.
Kindergeld
- Remove aggregation dict at top of file.
- Kindergeld should be created via the grouping by `p_id` function
Unterhaltsvorschuss
- Switch to `alleinerz_sn`.
- The parental income check in `unterhaltsvors_m` should be on the single parent level only (no aggregation). Use aggregation by `p_id` mechanic here.
- Nearly all `_tu` are replaced with no aggregation at all because only income of the single parent is considered
Wohngeld
- Correct grouping in general: `_hh`
- `wohngeld_abzüge_st_sozialv_m` and its inputs should be on the `_sn` level, but aggregated to the `_hh` level when it is used as an input.
- Whenever `anz_kinder_bis_10_tu` is used, it should be the number of Kindergeld claims on the individual level.
- Correctly specify the Wohngeld Vorrang check: It is possible that for the same household some individuals receive Wohngeld and some receive ALG2.
ALG II
- `_arbeitsl_geld_2_alleinerz_mehrbedarf_m_bg`, `arbeitsl_geld_2_kindersatz_m_bg_bis_2010`, `arbeitsl_geld_2_kindersatz_m_bg_ab_2011`, `arbeitsl_geld_2_regelsatz_m_bg_bis_2010`, `arbeitsl_geld_2_regelsatz_m_bg_ab_2011`, `arbeitsl_geld_2_vor_vorrang_m_bg` inputs should be on the `_bg` level as well
Kinderzuschlag
- All `_tu` are replaced with `_bg`
- Same applies for Kosten Unterkunft (not 100% sure, need to check again)
- Occurrences of `_hh` have already been replaced with `_bg` in #662
Hopefully this is all correct, but let me know if something seems off @hmgaudecker
I'll update the list if something comes up.
`kinderbonus_m_tu` changes to `kinderbonus_m_sn`
I think that should just follow kindergeld
anz_kinder_mit_kindergeld_tu changes to number of Kindergeld claims on _sn level
Does Kindergeld or Kinderfreibetrag matter here or both, @JakobWegmann? In any case, it should be the versions at individual level, if we have them.
Elterngeld
- Net income calculation should be based on lohnst_m, not eink_st_y_tu
- Note: Even then the calculation is slightly off compared with the simplified rules described here. Might be something for a different PR.
Yeah. Big one. I still believe this should go altogether and become an input we require from prior calculations. Typically t−1. In any case, don't worry much about it in this PR.
Grundsicherung im Alter
Income checks on the _tu level should be replaced with _sn. But this is just an approximation because i) there might be couples that do not file taxes together but are considered as a couple for Grundsicherung, ii) income of the partner is not considered if the partner cannot satisfy her own needs.
I wonder whether we should support another grouping, like "married" or "einstandspartner" ?
Since 2020, income of children is not considered, even if it is higher than 100.000 € (S. 5 §43 SGB XII).
Yeah, I think we can ignore that until somebody actually needs it. But maybe add a note in the function?
I wonder whether we should support another grouping, like "married" or "einstandspartner" ?
I agree, especially as we have `p_id_einstandspartner` as an input variable already. Just learned that this would be the correct grouping for Grundrente as well (doesn't matter if taxes are filed jointly).
alleinerz_tu should change to alleinerz_sn and can be determined endogenously.
Leave as is for now (AFAICT, alleinerz is the input variable), make endogenous in a different PR.
anz_kinder_mit_kindergeld_tu changes to number of Kindergeld claims on _sn level
Does Kindergeld or Kinderfreibetrag matter here or both, @JakobWegmann? In any case, it should be the versions at individual level, if we have them.
I'm 98% sure that only the Kinderfreibetrag matters. I think the law is very clear.
Elterngeld
- Net income calculation should be based on lohnst_m, not eink_st_y_tu
- Note: Even then the calculation is slightly off compared with the simplified rules described here. Might be something for a different PR.
Yeah. Big one. I still believe this should go altogether and become an input we require from prior calculations. Typically t−1. In any case, don't worry much about it in this PR.
The parental leave calculation in my Stata code matches the calculation, so I think I can at least fix the calculation in GETTSIM as soon as all these changes are implemented.
Yes, but given our annual structure, I suppose the better approximation is not to use concurrent income, right?
Yes, I agree it would be more intuitive and less error prone to have an additional input.
It should be an output, though! But not in this PR, @MImmesberger, can you open an issue for that, please?
It should be an output, though!
Just to make sure I understood that correctly: it should be an output for convenience (when calculating the inputs for `t` in `t-1`), i.e. it would not be used as an input for another function?
Not in the concurrent year, no. But in the subsequent year.
Say I have panel data for 2023 and 2024 and I want to calculate Elterngeld for kids born in 2024. This would proceed as follows:
- Run GETTSIM on 2023 data, output `elterngeld_eink_relev_current_m`.
- Call this variable `elterngeld_eink_relev_lag_m` and run GETTSIM on 2024 data using it as an input.
We'll need to have a suitable distinction of the names, this suggestion is bogus, of course.
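The hand-over between the two runs could look roughly like the sketch below. It only shows the renaming and matching step, not a GETTSIM call. The helper `carry_forward` is hypothetical, the column names are the placeholder names from the two steps above, and the numbers are made up.

```python
def carry_forward(outputs_prev, inputs_next, current_name, lag_name):
    """Attach last year's output to this year's inputs under a lagged name.

    Hypothetical helper. Rows are matched on p_id; persons who were not
    in last year's data get None.
    """
    prev = dict(zip(outputs_prev["p_id"], outputs_prev[current_name]))
    merged = {column: list(values) for column, values in inputs_next.items()}
    merged[lag_name] = [prev.get(person) for person in inputs_next["p_id"]]
    return merged


# Stand-in for the output of the 2023 run (made-up numbers).
outputs_2023 = {
    "p_id": [0, 1],
    "elterngeld_eink_relev_current_m": [2000.0, 1500.0],
}
# Inputs for the 2024 run; person 2 is new in the panel.
inputs_2024 = {"p_id": [1, 0, 2]}

inputs_2024_with_lag = carry_forward(
    outputs_2023,
    inputs_2024,
    current_name="elterngeld_eink_relev_current_m",
    lag_name="elterngeld_eink_relev_lag_m",
)
```

Matching on `p_id` rather than on row position matters because the panel may be sorted differently or gain members between years.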
Great! This looks like a big step forward!
I wonder whether we should support another grouping, like "married" or "einstandspartner" ?
I agree, especially as we have p_id_einstandspartner as an input variable already. Just learned, that this would be the correct grouping for Grundrente as well (doesn't matter if taxes are filed jointly).
AFAIK, unmarried partners are considered for Grundsicherung im Alter, but not Grundrente. So I guess we need three groupings (which should all be available from the input data):
- `sn`; filing tax together -> taxes
- married partners -> Grundrente
- `einstandspartner`/`einstandsgemeinschaft`; partners living together -> Grundsicherung im Alter
(I updated the original post following our discussion. Also, I added the two new groupings that are needed)
For ALG2 (`_bg` grouping), the Wohngeld priority check should still be on the `_hh` level for Wohngeld, correct? I'm referring to this function:

    def wohngeld_vorrang_hh(
        wohngeld_nach_vermög_check_m_hh: float,
        arbeitsl_geld_2_vor_vorrang_m_bg: float,
    ) -> bool:
        """Check if housing benefit has priority.

        Parameters
        ----------
        wohngeld_nach_vermög_check_m_hh
            See :func:`wohngeld_nach_vermög_check_m_hh`.
        arbeitsl_geld_2_vor_vorrang_m_bg
            See :func:`arbeitsl_geld_2_vor_vorrang_m_bg`.

        Returns
        -------

        """
        return wohngeld_nach_vermög_check_m_hh >= arbeitsl_geld_2_vor_vorrang_m_bg

For ALG2 (`_bg` grouping), the Wohngeld priority check should still be on the `_hh` level for Wohngeld, correct?
Not quite. It is possible that some individuals in a household receive Wohngeld and others receive Bürgergeld.
To my understanding, Wohngeld is calculated at the household level but can be broken down to individual values. For the priority check, these should be aggregated at the `_bg` level and compared there. Does that seem correct, @mjbloemer @michaelhebsaker? Would you have a reference of how to distribute Wohngeld across household members?
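For discussion, here is a rough sketch of what a `_bg`-level comparison could look like. The equal per-capita split of household Wohngeld is purely a placeholder assumption, since the actual distribution rule is exactly the open question above. The function name and the toy numbers are hypothetical as well.

```python
from collections import Counter, defaultdict


def wohngeld_vorrang_bg(hh_id, bg_id, wohngeld_m_hh, arbeitsl_geld_2_m_bg):
    """Compare Wohngeld and ALG2 at the bg level, one result per person.

    Placeholder assumption: household Wohngeld is split equally across
    household members before being summed up per bg.
    """
    hh_size = Counter(hh_id)
    # Step 1: break the household amount down to individuals.
    wohngeld_m = [w / hh_size[h] for w, h in zip(wohngeld_m_hh, hh_id)]
    # Step 2: aggregate the individual shares to the bg level.
    wohngeld_m_bg = defaultdict(float)
    for share, b in zip(wohngeld_m, bg_id):
        wohngeld_m_bg[b] += share
    # Step 3: compare per bg and broadcast back to persons.
    return [
        wohngeld_m_bg[b] >= alg2 for b, alg2 in zip(bg_id, arbeitsl_geld_2_m_bg)
    ]


hh_id = [0, 0, 0]                     # one household with three members
bg_id = [0, 0, 1]                     # two Bedarfsgemeinschaften in it
wohngeld_m_hh = [300.0, 300.0, 300.0]
arbeitsl_geld_2_vor_vorrang_m_bg = [150.0, 150.0, 400.0]

vorrang = wohngeld_vorrang_bg(
    hh_id, bg_id, wohngeld_m_hh, arbeitsl_geld_2_vor_vorrang_m_bg
)
```

In this toy household the first `bg` would take Wohngeld and the second would not, which is the mixed case a pure `_hh`-level check cannot represent.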
A `married` indicator would be useful as well.
Should this be called `married` or `verheirat`? I'd wager this should go into `demographic_vars.py`, right?
I think I'd use `ehe_id`. Short enough and very clear.
So, `ehe_id` instead of `spouse_id` and no additional boolean `married` variable?
Ah, sorry, I had not read through @MImmesberger's updates to the main issue. Yes, I think `ehe` is clearer, also since we are using German identifiers for the other groupings.
Instead of an extra indicator we can check for `p_id_ehepartner >= 0`, right? Should be easy enough, rather not add an extra variable.
Yes, that was the implementation of `married` anyway. I'll remove it again.