Madhour/SeemsPhishy

Data retrieval pipeline

Closed this issue · 4 comments

  • Generate a Bing query using advanced search operators (e.g., filetype:pdf site:apple.com)
    • Implement browsing through the search results (limited to the first n pages)
  • Download the PDFs linked from the search results
  • Run the downloaded PDFs through PDF-Miner
  • Created a class for enumerating files and links (3fc8c1d)
    • Search is currently limited to the first 50 pages at most
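
For reference, the query and enumeration steps above could look roughly like the sketch below. It is not the code from 3fc8c1d. The function and class names are hypothetical, and the `first` offset parameter and the page size of 10 are assumptions about Bing's result URLs. The link filter simply keeps anchors whose path ends in `.pdf`.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin, urlparse

RESULTS_PER_PAGE = 10  # assumption: default number of results per Bing page


def build_query(site, filetype="pdf"):
    """Combine the advanced search operators into one query string."""
    return f"filetype:{filetype} site:{site}"


def result_page_urls(query, max_pages=50):
    """Return one search URL per result page, capped at max_pages."""
    urls = []
    for page in range(max_pages):
        # assumption: 'first' is the 1-based index of the first result on the page
        params = {"q": query, "first": page * RESULTS_PER_PAGE + 1}
        urls.append("https://www.bing.com/search?" + urlencode(params))
    return urls


class PdfLinkParser(HTMLParser):
    """Collect absolute, de-duplicated links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.links:
            self.links.append(absolute)


def extract_pdf_links(html, base_url):
    """Parse one result page and return the PDF links found in it."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Keeping the page cap as a parameter (`max_pages`) makes the "first n pages" limit explicit without hard-coding 50.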

The remaining features (download and merge with PDF-Miner) are implemented in 26f3f08.
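
A minimal sketch of what the download and text-extraction step might look like, for anyone reading along. This is not the code from 26f3f08. The helper names and the file-naming scheme are hypothetical, and it assumes the `pdfminer.six` package, whose `pdfminer.high_level.extract_text` function is used for the extraction. The `fetch` argument is injectable so that the download logic can be exercised without network access.

```python
import os
import urllib.request
from urllib.parse import urlparse


def fetch_bytes(url):
    """Download the raw bytes behind a URL."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_pdfs(urls, dest_dir, fetch=fetch_bytes):
    """Save each URL into dest_dir and return the paths that were written."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        # hypothetical naming scheme: index prefix keeps duplicate names apart
        name = os.path.basename(urlparse(url).path) or "document.pdf"
        path = os.path.join(dest_dir, f"{index:03d}_{name}")
        try:
            data = fetch(url)
        except OSError:
            # urllib's URLError is an OSError; skip links that cannot be fetched
            continue
        with open(path, "wb") as handle:
            handle.write(data)
        saved.append(path)
    return saved


def extract_texts(paths):
    """Map each saved PDF path to its extracted text (needs pdfminer.six)."""
    # imported lazily so the download step works without pdfminer installed
    from pdfminer.high_level import extract_text
    return {path: extract_text(path) for path in paths}
```

Typical use would be `extract_texts(download_pdfs(pdf_links, "downloads"))`, where `pdf_links` is whatever list of PDF URLs the search step produced.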

OWDSC commented

done