dig-coll-borderlands

Repository for text data mining of Borderlands newspapers

Primary language: Jupyter Notebook. License: MIT.

Borderlands newspaper data mining

This repository hosts Jupyter Notebooks that introduce text data mining with Python, using the Borderlands newspaper collection described below. The work is part of two projects:

  • Using Newspapers as Data for Collaborative Pedagogy: A Multidisciplinary Interrogation of the Borderlands in Undergraduate Classrooms, funded in part by the Mellon Foundation through the Collections as Data program. More information about the project is available at https://libguides.library.arizona.edu/newspapers-as-data.
  • Reporting on Race and Ethnicity in the Borderlands (1882-1924): A Data-Driven Digital Storytelling Hub, funded by the Mellon Foundation through the Digital Borderlands program.

If you are looking for an introduction explaining the concept of text data mining, check out the StoryMap at https://storymaps.arcgis.com/stories/cd7e273c42cd4ab6b6ce3fa89c13132c.

The scripts responsible for downloading and assembling daily volumes are available in a separate repository, at https://github.com/jcoliver/borderlands-newspapers.
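As a rough illustration of what assembling a daily volume involves, the sketch below joins the OCR text of each page of one issue, in page order, into a single text file. It is not the code from that repository: the function names and the `<title>_<YYYY-MM-DD>.txt` file-naming scheme are hypothetical, and it assumes the page texts have already been downloaded and keyed by page sequence number.

```python
from pathlib import Path


def assemble_daily_volume(page_texts: dict[int, str]) -> str:
    """Join one issue's page texts, in page-sequence order, into one string."""
    # Sort by sequence number so pages downloaded out of order still line up.
    ordered = [page_texts[seq].strip() for seq in sorted(page_texts)]
    # Separate pages with a blank line.
    return "\n\n".join(ordered)


def write_daily_volume(page_texts: dict[int, str], out_dir: Path,
                       title: str, issue_date: str) -> Path:
    """Write the assembled issue to a hypothetical <title>_<YYYY-MM-DD>.txt file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{title}_{issue_date}.txt"
    out_path.write_text(assemble_daily_volume(page_texts), encoding="utf-8")
    return out_path
```

Keeping one file per issue date makes it straightforward to group volumes by title and year in later analyses.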

The work focuses on the following titles:

  • Arizona Citizen, one of Arizona's earliest newspapers, published in Tucson
  • Arizona Post, a Tucson newspaper by and for the Jewish community
  • Arizona Sun, an African American newspaper published in Phoenix
  • Apache Sentinel, published by African American soldiers stationed at Fort Huachuca
  • Bisbee Daily Review, a newspaper published in Bisbee, then a mining town
  • Border Vidette, a newspaper published in Nogales, Arizona, on the border with Nogales, Mexico
  • Phoenix Tribune, the first African American newspaper published in Arizona
  • El Fronterizo, a weekly Tucson Spanish-language paper
  • El Mosquito, a Tucson paper including local news and news from Mexico
  • El Sol, a Spanish-language, Mexican American newspaper published in Phoenix
  • El Tucsonense, a Spanish-language, Mexican American newspaper published in Tucson
  • The Daily Morning Oasis, a daily English paper from Nogales, Arizona
  • The Oasis, an English-language paper published in Nogales, Arizona
  • The Weekly Orb, a weekly paper from Bisbee, Arizona
  • Tucson Citizen, a continuation of the Tucson newspaper Arizona Citizen

The text for most of these newspapers is available at Chronicling America. The texts were downloaded through the Chronicling America API, documented at https://chroniclingamerica.loc.gov/about/api/. The entire data set is available from the UArizona Research Data Repository at https://doi.org/10.25422/azu.data.12735992.v3.
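A minimal sketch of requesting one page's OCR text, assuming the legacy Chronicling America page URL pattern `/lccn/<LCCN>/<YYYY-MM-DD>/ed-<n>/seq-<n>/ocr.txt`. The LCCN below is a placeholder, not one of the titles listed above, and the helper names are hypothetical. The site and its API may have changed since these lessons were written, so check the API documentation before relying on this pattern.

```python
from datetime import date
from urllib.request import urlopen

# Legacy Chronicling America host (assumption: confirm against the API docs).
BASE_URL = "https://chroniclingamerica.loc.gov"


def ocr_text_url(lccn: str, issue_date: date, page: int, edition: int = 1) -> str:
    """Build the URL of the plain-text OCR for one page of one issue."""
    return (
        f"{BASE_URL}/lccn/{lccn}/{issue_date.isoformat()}"
        f"/ed-{edition}/seq-{page}/ocr.txt"
    )


def fetch_ocr_text(url: str) -> str:
    """Download the OCR text at `url` (requires network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Placeholder LCCN for illustration only; substitute a real title's LCCN.
url = ocr_text_url("sn00000000", date(1917, 1, 1), page=1)
print(url)
```

Passing that URL to `fetch_ocr_text` would download the page text once the placeholder is replaced with a real LCCN; looping over page numbers and issue dates yields the page-level texts that make up a daily volume.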

Text data mining lessons

Lessons for using these data in text data mining are available as Jupyter Notebooks. All lessons are licensed under CC-BY-4.0 by Jeffrey C. Oliver (2020). Translation of the Spanish version of the Text Mining Template was aided in part by a Python script by Fernando Marcos Wittmann, available at https://github.com/WittmannF/jupyter-translate.

| Name | Launch | Description |
|---|---|---|
| Introduction to text mining (short) | Binder | A brief lesson introducing relative word frequencies and the visual display of word use over time. Uses a subset of three titles for the three-year period 1917-1919. |
| Introduction to text mining (long) | Binder | An extended version of the short lesson above. The lesson takes approximately two hours to complete. |
| Text mining template | Binder | A relatively lightweight notebook for exploring text mining analyses on the full data set of 15 titles. |
| Plantilla de Minería de Texto | Binder | Spanish-language version of the text mining template: a relatively lightweight notebook for exploring text mining analyses on the full data set of 15 titles. (DRAFT) |
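The relative word frequency used in the introductory lessons is the number of times a word occurs divided by the total number of words. A minimal sketch in plain Python, assuming the texts have already been grouped by year; the function names are hypothetical, and the notebooks themselves may tokenize and normalize text differently.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of Unicode letters (digits and punctuation are
    # dropped), so Spanish words such as "año" stay whole.
    return re.findall(r"[^\W\d_]+", text.lower())


def relative_frequency_by_year(texts_by_year: dict[int, list[str]],
                               word: str) -> dict[int, float]:
    """For each year, return occurrences of `word` divided by total words."""
    word = word.lower()
    result = {}
    for year in sorted(texts_by_year):
        counts = Counter()
        for text in texts_by_year[year]:
            counts.update(tokenize(text))
        total = sum(counts.values())
        # Guard against years with no text to avoid dividing by zero.
        result[year] = counts[word] / total if total else 0.0
    return result
```

The resulting year-to-frequency mapping is what a line plot of word use over time would display, for example with matplotlib's `plt.plot(list(freqs), list(freqs.values()))`. Dividing by total words, rather than using raw counts, keeps years with more surviving pages from dominating the comparison.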