This repository hosts Jupyter Notebooks introducing text data mining with Python on the newspaper collection. The work is part of two projects:
- Using Newspapers as Data for Collaborative Pedagogy: A Multidisciplinary Interrogation of the Borderlands in Undergraduate Classrooms, funded in part by the Mellon Foundation through the Collections as Data program. More information about the project is available at https://libguides.library.arizona.edu/newspapers-as-data.
- Reporting on Race and Ethnicity in the Borderlands (1882-1924): A Data-Driven Digital Storytelling Hub, funded by the Mellon Foundation through the Digital Borderlands program.
If you are looking for an introduction explaining the concept of text data mining, check out the StoryMap at https://storymaps.arcgis.com/stories/cd7e273c42cd4ab6b6ce3fa89c13132c.
The scripts responsible for downloading and assembling daily volumes are available in a separate repository, at https://github.com/jcoliver/borderlands-newspapers.
The collection includes the following newspapers:
- Arizona Citizen, one of Arizona's earliest newspapers, published in Tucson
- Arizona Post, a Tucson newspaper by and for the Jewish community
- Arizona Sun, an African American newspaper published in Phoenix
- Apache Sentinel, published by African American soldiers stationed at Fort Huachuca
- Bisbee Daily Review, a newspaper published in Bisbee, a mining town at that time
- Border Vidette, a newspaper published in Nogales, Arizona, on the border with Nogales, Mexico
- Phoenix Tribune, the first African American newspaper published in Arizona
- El Fronterizo, a weekly Tucson Spanish-language paper
- El Mosquito, a Tucson paper including local news and news from Mexico
- El Sol, a Spanish-language, Mexican American newspaper published in Phoenix
- El Tucsonense, a Spanish-language, Mexican American newspaper published in Tucson
- The Daily Morning Oasis, a daily English paper from Nogales, Arizona
- The Oasis, an English-language paper published in Nogales, Arizona
- The Weekly Orb, a weekly paper from Bisbee, Arizona
- Tucson Citizen, a continuation of the Tucson newspaper, Arizona Citizen
The text for most of these newspapers is available at Chronicling America. The texts were downloaded through the Chronicling America API, documented at https://chroniclingamerica.loc.gov/about/api/. The entire data set is available from the UArizona Research Data Repository at https://doi.org/10.25422/azu.data.12735992.v3.
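As a rough illustration of what requesting one page of text from Chronicling America might look like, here is a minimal Python sketch. It is not the download code used for this data set (that lives in the separate repository linked above). It assumes the page-level URL pattern `/lccn/<LCCN>/<date>/ed-<edition>/seq-<page>/ocr.txt`; check the API documentation before relying on it. The function names and the LCCN `sn00000000` are hypothetical placeholders.

```python
from urllib.request import urlopen

# Base address of Chronicling America (taken from the API documentation URL).
BASE_URL = "https://chroniclingamerica.loc.gov"


def ocr_text_url(lccn, date, edition=1, page=1):
    """Build the URL for the plain-text OCR of one newspaper page.

    Assumes the URL pattern /lccn/<LCCN>/<YYYY-MM-DD>/ed-<n>/seq-<n>/ocr.txt;
    verify it against the current API documentation.
    """
    return f"{BASE_URL}/lccn/{lccn}/{date}/ed-{edition}/seq-{page}/ocr.txt"


def fetch_page_text(lccn, date, edition=1, page=1, timeout=30):
    """Download the OCR text of one page and return it as a string."""
    url = ocr_text_url(lccn, date, edition, page)
    with urlopen(url, timeout=timeout) as response:
        # OCR output may contain odd bytes; replace them rather than fail.
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # "sn00000000" is a placeholder, not a real title identifier.
    print(ocr_text_url("sn00000000", "1910-01-01"))
```

To use it, replace the placeholder with a real title's LCCN and call `fetch_page_text` for each page of interest. For most purposes the assembled data set in the UArizona Research Data Repository is likely the easier starting point.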
Lessons for using these data in text data mining are available as Jupyter Notebooks. All lessons are licensed under CC-BY-4.0 (2020) by Jeffrey C. Oliver. Translation of the Spanish version of the Text Mining Template was aided in part by a Python script by Fernando Marcos Wittmann, available at https://github.com/WittmannF/jupyter-translate.