/pq_parser

Script to parse text file downloads from ProQuest's Global Newsstream database into a CSV file of metadata and full text.


Parse ProQuest Metadata

This notebook includes a Python function that parses newspaper articles downloaded from ProQuest Newsstream into a pandas DataFrame (and saves it to CSV) with metadata and, when available, full text.
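As a rough illustration of the idea (not the notebook's actual code), the sketch below splits an export into records and collects `Field: value` lines into one dictionary per article. It rests on assumptions about the export layout: that records are separated by a line of underscores, and that each field starts a line in the form `Field name: value`. The names `parse_proquest_text`, `RECORD_SEPARATOR` and `FIELD_LINE` are hypothetical. Note that a line of body text that happens to look like `Word: text` would be misread as a new field.

```python
import re

# Assumption: records are separated by a line made up of many underscores.
RECORD_SEPARATOR = re.compile(r"^_{20,}[ \t]*$", flags=re.MULTILINE)
# Assumption: each metadata field starts a line as "Field name: value".
FIELD_LINE = re.compile(r"^([A-Z][A-Za-z ]{1,40}): (.*)$")


def parse_proquest_text(raw_text):
    """Split an export into records and return one dict per article."""
    records = []
    for chunk in RECORD_SEPARATOR.split(raw_text):
        record = {}
        current_field = None
        for line in chunk.splitlines():
            match = FIELD_LINE.match(line)
            if match:
                # Start of a new field.
                current_field = match.group(1).strip()
                record[current_field] = match.group(2).strip()
            elif current_field and line.strip():
                # Continuation line, e.g. a later paragraph of the full text.
                record[current_field] += "\n" + line.strip()
        if "Title" in record:  # skip chunks that are not articles
            records.append(record)
    return records


# Made-up sample in the assumed layout; the second record has no full text.
sample = (
    "____________________________________________________________\n"
    "\n"
    "Title: Example headline\n"
    "\n"
    "Publication title: Example Gazette\n"
    "\n"
    "Full text: First paragraph.\n"
    "\n"
    "Second paragraph.\n"
    "\n"
    "____________________________________________________________\n"
    "\n"
    "Title: Metadata only\n"
    "\n"
    "Publication title: Example Gazette\n"
)
articles = parse_proquest_text(sample)
```

The resulting list of dictionaries can be passed to `pandas.DataFrame(articles)` and written out with `DataFrame.to_csv`; fields missing from a record, such as the full text of a metadata-only article, would then appear as empty cells.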

Created by Cody Hennesy and David Naughton (University of Minnesota, Twin Cities, Libraries). Email Cody (chennesy@umn.edu) with any questions.

For an alternative approach using R and saving documents as HTML files, see Jae Yeon Kim's Tidy Ethnic News parser.

See also: Factiva parser