article-extraction

There are 3 repositories under the article-extraction topic.

  • ieg-dhr/NLP-Course4Humanities_2024

    This repository is part of an NLP course for the humanities and cultural studies. The course uses historical newspapers as its source material and applies NLP methods to them. NLP tasks covered: tokenization, lemmatization, TF-IDF, part-of-speech tagging, semantic search with transformers, article extraction and OCR post-correction with LLMs, named entity recognition (NER), and text classification.

    Language: Jupyter Notebook
  • dstark5/gnews-scraper

    GNewsScraper is a TypeScript package that scrapes article data from Google News for a given keyword or phrase. It returns the results as an array of JSON objects, which makes the scraped information easy to access and use.

    Language: TypeScript
  • UtrechtUniversity/dataQuest

    A configurable pipeline for extracting and filtering articles from large corpora. It is tailored to the Delpher Kranten corpus and supports features such as keyword filtering and TF-IDF-based relevance scoring.

    Language: Python
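The dataQuest entry mentions keyword filtering followed by TF-IDF-based relevance scoring. The following is a minimal sketch of that general technique and is not dataQuest's actual code. The function names `tokenize`, `keyword_filter` and `tfidf_scores` are hypothetical. The tokenizer (lowercase, letters only) and the TF-IDF variant are also assumptions: term frequency is normalized by document length, and idf = ln(N / df) + 1, so a query term that appears in every document still contributes.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase and keep runs of Unicode letters only
    return re.findall(r"[^\W\d_]+", text.lower())


def keyword_filter(articles: list[str], keywords: list[str]) -> list[str]:
    # Keep only articles that contain at least one of the keywords as a token
    wanted = {k.lower() for k in keywords}
    return [a for a in articles if wanted & set(tokenize(a))]


def tfidf_scores(articles: list[str], query_terms: list[str]) -> list[float]:
    # Score each article by summing tf * idf over the query terms
    docs = [tokenize(a) for a in articles]
    n = len(docs)
    df = Counter()  # document frequency: in how many articles each term occurs
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        length = len(d) or 1  # avoid division by zero for empty articles
        score = 0.0
        for term in (t.lower() for t in query_terms):
            if df[term] == 0:
                continue  # term absent from the whole collection
            idf = math.log(n / df[term]) + 1.0  # assumed idf variant
            score += (tf[term] / length) * idf
        scores.append(score)
    return scores


if __name__ == "__main__":
    articles = [
        "De haven van Rotterdam was druk.",
        "Nieuws over de oogst in Friesland.",
        "Rotterdam haven staking haven.",
    ]
    hits = keyword_filter(articles, ["haven"])
    for score, text in sorted(zip(tfidf_scores(hits, ["haven", "rotterdam"]), hits), reverse=True):
        print(f"{score:.3f}  {text}")
```

Filtering first keeps the scoring step cheap on a large corpus, because document frequencies are computed only over the candidate articles. Note that this makes the idf values depend on the filtered subset rather than on the full corpus.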