Primary language: Jupyter Notebook

USDACitation

  1. “test extension reports.zip”: A set of 10 randomly selected PDF files from the following extension programs:
     • https://4hansci.osu.edu/
     • https://4hansci.osu.edu/
     • https://agsafety.tamu.edu/
     • https://extension.osu.edu
     • https://ohioaglaw.wordpress.com
     • https://ohioaglaw.wordpress.com
     • https://woodlandstewards.osu.edu
     • https://vegfruit.wordpress.com/
     • https://treesmatter.osu.edu/
     • https://sheepandgoat.com/
  2. “PDF2TEXT_TEST_ExtensionReports_V2 .ipynb”: A Python notebook that converts PDF files (from extension programs) into clean .txt files for use in the prediction algorithm. The notebook is commented with usage instructions.
  3. “USDA_Scraping_Classification5_Level1_Offline_V2 .ipynb”: A Python notebook for the (missed) opportunity prediction.
  4. “CTEST_102.xlsx”: The cleaned text file extracted from the extension reports (first level).
  5. “df_pdf.pdf”: The cleaned text file extracted from 21 USDA official reports.
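
The PDF-to-text step described in item 2 could look roughly like the sketch below. This is not the notebook's actual code: the use of the `pypdf` library, the function names `clean_text` and `pdf_to_txt`, and the specific cleaning rules (Unicode normalization, rejoining hyphenated line breaks, collapsing whitespace) are all assumptions made for illustration.

```python
import re
import unicodedata
from pathlib import Path


def clean_text(raw: str) -> str:
    """Normalize raw text extracted from a PDF (hypothetical cleaning rules)."""
    # Fold ligatures and compatibility characters (e.g. the "fi" ligature) to plain text.
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words hyphenated across a line break: "exten-\nsion" -> "extension".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse every run of whitespace (including newlines) into a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def pdf_to_txt(pdf_path: Path, out_dir: Path) -> Path:
    """Extract text from one PDF and write it as a cleaned .txt file."""
    # Assumption: pypdf is installed; the original notebook may use another library.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() can return an empty result for image-only pages.
    pages = [page.extract_text() or "" for page in reader.pages]
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / (pdf_path.stem + ".txt")
    out_path.write_text(clean_text("\n".join(pages)), encoding="utf-8")
    return out_path


# Example usage (the folder names are hypothetical):
#   for pdf in sorted(Path("test extension reports").glob("*.pdf")):
#       pdf_to_txt(pdf, Path("clean_txt"))
```

Scanned reports without a text layer would come out empty under this approach and would need OCR instead.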