Juniper Lovato, Philip Mueller, Parisa Suchdev, Peter Dodds
The datasets below were derived from the original Amos et al. corpus to make analysis easier. For each one, the source data and the scripts or notebooks used to produce it are listed.
Based on: Original SQLite database
Additional processing steps:
extract/extract_from_sqlite.py
extract/join.py
Based on: Full Corpus CSV Data
Additional processing steps:
split_by_year.py
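As a rough illustration of this step, here is a minimal sketch of grouping rows of a corpus CSV by year. It is not the code in `split_by_year.py`; the column names (`domain`, `year`, `policy_text`), the helper `split_rows_by_year`, and the sample rows are assumptions made up for the example.

```python
import csv
import io
from collections import defaultdict


def split_rows_by_year(csv_text, year_column="year"):
    """Group CSV rows by the value in `year_column` (a hypothetical column name)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[year_column]].append(row)
    return dict(groups)


# Tiny made-up sample, not real corpus data.
sample = (
    "domain,year,policy_text\n"
    "a.example,1997,text one\n"
    "b.example,1998,text two\n"
    "c.example,1997,text three\n"
)
by_year = split_rows_by_year(sample)
```

Each key of `by_year` could then be written out to its own per-year file.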
Based on: By-Year Data
Additional processing steps:
negation_filtered_policytext_PII.ipynb
Based on: By-Year Data
Additional processing steps:
negation_filtered_policytext_PII.ipynb
Based on: Negation and PII Filtered By-Year
Additional processing steps:
generate_cooccurrence_network.py
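For readers unfamiliar with co-occurrence networks, the following is a minimal sketch of how edge weights can be counted. It assumes each document is already tokenized and treats two terms as co-occurring when they appear in the same document; `generate_cooccurrence_network.py` may use a different unit (sentence, window) or weighting. The function name and sample tokens are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents each unordered pair of terms appears in together."""
    counts = Counter()
    for tokens in documents:
        # Deduplicate and sort so each pair is counted once per document
        # and is always stored in the same order.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts


# Made-up token lists, not real corpus data.
docs = [["email", "address", "cookie"], ["email", "cookie"], ["address"]]
edges = cooccurrence_counts(docs)
```

The resulting pairs and counts can be read as the weighted edges of an undirected graph.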
Based on: Negation and PII Filtered By-Year (1997 only)
Additional processing steps:
generate_cooccurrence_network.py
Based on: Negation and PII Filtered By-Year Data
Additional processing steps:
SBM_topic-Model_Privacy_Policy_Paper_2023.ipynb
Based on: By-Year Data
Additional processing steps:
SBM_topic-Model_Privacy_Policy_Paper_2023.ipynb
countwords_uniquewords_Privacy_Policy_Paper_2023.ipynb
The prevalence in each year of the topics extracted from the whole corpus.
Based on: Negation and PII Filtered By-Year Data, SBM Topic Model Topics (whole corpus)
Additional processing steps:
calculate_topic_timeseries.py -s
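One simple way to think about per-year topic prevalence is the share of a year's documents assigned to each topic. The sketch below illustrates that definition only; it assumes one topic label per document, which may not match how `calculate_topic_timeseries.py` measures prevalence. The function name and records are hypothetical.

```python
from collections import Counter, defaultdict


def topic_prevalence_by_year(records):
    """Return {year: {topic: share of that year's documents assigned to topic}}."""
    per_year = defaultdict(Counter)
    for year, topic in records:
        per_year[year][topic] += 1
    return {
        year: {topic: n / sum(counts.values()) for topic, n in counts.items()}
        for year, counts in per_year.items()
    }


# Made-up (year, topic) pairs, not real results.
records = [(1997, "cookies"), (1997, "cookies"), (1997, "email"), (1998, "email")]
prevalence = topic_prevalence_by_year(records)
```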
Based on: Negation and PII Filtered By-Year
Additional processing steps:
Viz_PII_Frequency_Privacy_Policy_Paper_2023.ipynb
Based on: Topic Prevalence Over Time
Additional processing steps:
plot_topic_timeseries.py
Based on: Co-Occurrence Networks
Additional processing steps:
analyze_coocurrence_graph.py