article-extraction
There are 3 repositories under article-extraction topic.
ieg-dhr/NLP-Course4Humanities_2024
This repository is part of an NLP course for humanities and cultural studies. This course uses historical newspapers as a source and applies NLP methods to them. NLP tasks: Tokenization, Lemmatization, TF-IDF, Part-of-speech tagging, semantic search with transformers, article extraction and OCR post-correction with LLMs, NER and text classification
dstark5/gnews-scraper
GNewsScraper is a TypeScript package that scrapes article data from Google News based on a keyword or phrase. It returns the results as an array of JSON objects, making it convenient to access and use the scraped information
UtrechtUniversity/dataQuest
A configurable pipeline for extracting and filtering articles from large corpora, tailored for the Delpher Kranten corpus, with support for features like keyword filtering and tf-idf-based relevance scoring.