This is a preliminary exploratory analysis of the National Practitioner Data Bank (NPDB) data.
We recommend opening the notebook in Colab, where you can run all cells and make your own modifications: https://colab.research.google.com/github/jokerkeny/NPDBresearch/blob/main/NPDBresearchEDA.ipynb The notebook is already run and rendered, so you can also just read it without running anything.
If you preview it on GitHub, some formatting and plots may not display correctly.
The output contains some warnings and runtime errors, but they don't affect the results.
Data source: https://www.npdb.hrsa.gov/resources/publicData.jsp
NPDBresearchEDA.ipynb: the notebook containing the EDA
PublicUseDataFile-Format.pdf: the file with detailed explanations of the variables