Some codebook tables get rows repeated in DB
Example:
> nhanesCodebook("WHQ")$WHD120$WHD120
  Code.or.Value Value.Description Count Cumulative Skip.to.Item
1     66 to 400   Range of Values  4019       4019         <NA>
2         77777           Refused     2       4021         <NA>
3         99999        Don't know   239       4260         <NA>
4             .           Missing  1784       6044         <NA>
5     66 to 400   Range of Values  4019       4019         <NA>
6         77777           Refused     2       4021         <NA>
7         99999        Don't know   239       4260         <NA>
8             .           Missing  1784       6044         <NA>
whereas other variables in the same table seem fine:
> nhanesCodebook("WHQ")$WHD130$WHD130
  Code.or.Value Value.Description Count Cumulative Skip.to.Item
1      39 to 79   Range of Values  2270       2270         <NA>
2          7777           Refused     1       2271         <NA>
3          9999        Don't know   146       2417         <NA>
4             .           Missing  3627       6044         <NA>
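For anyone who wants to check a table for this kind of repetition, here is a minimal sketch using base R's `duplicated()`. The data frame is typed in by hand from the WHD120 output above rather than fetched from the DB, and `has_repeated_rows` is a hypothetical helper name, not part of nhanesA.

```r
# Hand-copied from the WHD120 output above (not fetched from the DB).
whd120 <- data.frame(
  Code.or.Value     = rep(c("66 to 400", "77777", "99999", "."), times = 2),
  Value.Description = rep(c("Range of Values", "Refused", "Don't know", "Missing"),
                          times = 2),
  Count             = rep(c(4019L, 2L, 239L, 1784L), times = 2),
  Cumulative        = rep(c(4019L, 4021L, 4260L, 6044L), times = 2),
  Skip.to.Item      = NA_character_,
  stringsAsFactors  = FALSE
)

# Hypothetical helper: TRUE if any row is an exact copy of an earlier row.
has_repeated_rows <- function(tbl) any(duplicated(tbl))

has_repeated_rows(whd120)   # TRUE for the table above

# One possible cleanup: keep only the first copy of each row.
deduped <- unique(whd120)
nrow(deduped)               # 4
```

Dropping exact duplicates with `unique()` is only one option. It would also collapse rows that are repeated in the CDC source page itself, which may or may not be what we want to store.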
The source looks fine:
https://wwwn.cdc.gov/nchs/nhanes/1999-2000/WHQ.htm#WHD120
and the non-DB version of nhanesCodebook() also looks OK.
I think this is the only table with this problem.
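To confirm that rather than eyeball it, something like the sketch below could scan a whole codebook. It assumes each variable's value table sits at `codebook[[v]][[v]]`, which is inferred only from the `$WHD120$WHD120` access pattern above. `vars_with_repeated_rows` and `mock_codebook` are made-up names, and the mock list is a hand-built stand-in, not real `nhanesCodebook()` output.

```r
# Hypothetical helper: names of variables whose value table has repeated rows.
# Assumes codebook[[v]][[v]] holds the table, as in cb$WHD120$WHD120.
vars_with_repeated_rows <- function(codebook) {
  flagged <- vapply(names(codebook), function(v) {
    tbl <- codebook[[v]][[v]]
    # Variables without a value table (NULL here) are never flagged.
    is.data.frame(tbl) && any(duplicated(tbl))
  }, logical(1))
  names(codebook)[flagged]
}

# Small hand-built stand-in for a codebook (illustrative values only).
ok <- data.frame(
  Code.or.Value    = c("1", "2"),
  Count            = c(10L, 5L),
  stringsAsFactors = FALSE
)
mock_codebook <- list(
  VAR_A = list(VAR_A = rbind(ok, ok)),             # rows repeated
  VAR_B = list(VAR_B = ok),                        # fine
  VAR_C = list(Description = "no value table")     # nothing to check
)

vars_with_repeated_rows(mock_codebook)
```

If the assumed structure holds, `vars_with_repeated_rows(nhanesCodebook("WHQ"))` would presumably list `WHD120` when run against the DB version, and looping over table names would show whether other tables are affected.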
In some other cases the rows are duplicated in the source webpage itself. I am not sure what we should store in those cases, but if anyone wants to look at an example:
https://wwwn.cdc.gov/Nchs/Nhanes/2003-2004/KIQ_U_C.htm
@sam-pullman - let @rsgoncalves know if you'd rather this be addressed in the metadata. He identified it there too.
@Genoa-HMS I suggest we discuss this during the Epiconductor exploration in-person meeting. We need to identify the root of this issue before we can assign a team to it: it could be caused by the translation code, the metadata, nhanesCodebook(), or the raw CDC data.