text-as-data
There are 28 repositories under the text-as-data topic.
JasonKessler/scattertext
Beautiful visualizations of how language differs among document types.
MilaNLProc/contextualized-topic-models
A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
jboynyc/textnets
Text analysis with networks.
ryanjgallagher/shifterator
Interpretable data visualizations for understanding how texts differ at the word level
JasonKessler/Scattertext-PyData
Notebooks for the Seattle PyData 2017 talk on Scattertext
chkla/CSS-Events
Summer/winter schools, workshops, and conferences in computational social science 🫂
umanlp/SemScale
A tool for Semantic Scaling of Political Text (branch of Topfish, a suite of tools for Political Text Analysis)
davidycliao/redguards
A package for replicating the estimates and findings in the article "Factionalism and the Red Guards under Mao's China: Ideal Point Estimation Using Text Data".
chkla/Populism-Text-Analysis
Literature 📄 and datasets 📚 on automatic populism detection
fedenanni/Computational-Text-Analysis-2018-19
2018 Computational Text Analysis Notebooks, University of Mannheim
tweedmann/3x8emotions
Code and models for 3 different tools to measure appeals to 8 discrete emotions in German political text
wesslen/summer2017-socialmedia
Summer 2017 Social Media Analytics Workshop Series
cjerzak/LinkOrgs-software
LinkOrgs: An R package for linking records on organizations using half a billion open-collaborated records from LinkedIn
davidycliao/bisCrawler
An automated web crawler for extracting central bankers' speeches
thieled/dictvectoR
'dictvectoR' measures the similarity between a concept dictionary and documents, using fastText word vectors. Implements the "Distributed-Dictionary-Representation" (Garten et al. 2018) method in R.
aflueckiger/KED2022
The ABC of Computational Text Analysis. BA Seminar, Spring 2022, University of Lucerne
adamlauretig/gensim_in_R
Code for estimating word embeddings with gensim in R.
WZBSocialScienceCenter/tm_corona
A small showcase for topic modeling with the tmtoolkit Python package. I use a corpus of articles from the German online news website Spiegel Online (SPON) to create a topic model for before and during the COVID-19 pandemic.
jfjelstul/regular-expressions-tutorial
A tutorial on using regular expressions in R
thelautiff/UN_meeting_records
From using xpdf, rvest, and quanteda on United Nations Digital Library search results to applying dictionaries to speeches in United Nations meeting records
aflueckiger/KED2021
The ABC of Computational Text Analysis. BA Seminar, Spring 2021, University of Lucerne
BenjaminFReese/american_constitutional_praxis
This repository uses text-as-data methods alongside traditional primary source reading to analyze early American state constitutions. The R scripts create a function to scrape and clean the constitutional text, run sentiment analysis, calculate tf-idf, and perform LDA. This is a work-in-progress.
CT-P/portuguese_open_data
Empirical framework applied to parliament discourses and Twitter data, with a Discourse Polarization Index.
ivansabik/chairum-corpus
Collection of text corpora for publicly available speeches from Mexican president Andres Manuel Lopez Obrador (AMLO) sourced from YouTube. The dataset includes his daily morning conferences (conferencias mañaneras) 😴🪿
graceadcox/Refugee-Text-as-Data
Original corpus of articles relating to refugees scraped from the Tennessee newspaper The Chattanoogan, along with simple code for a text-as-data word cloud.
Sam-Gartenstein/Machine-Learning-for-the-Social-Sciences
Material from my Machine Learning for the Social Sciences course
Jszabo16/NCSR_transcript_webscrapping
Replication script for web scraping transcripts of the parliamentary debates in the National Council of the Slovak Republic (1994-2023) and the ensuing sentiment analysis
smkerr/news-israel-gaza
🇮🇱🇵🇸 News coverage of Israel-Hamas War 🇵🇸🇮🇱
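Several entries in this list measure how close a document is to a concept by comparing word vectors, as in the Distributed Dictionary Representation approach named in the dictvectoR description. The idea can be sketched in Python as follows. This is a minimal illustration and not the dictvectoR implementation: the vectors in `TOY_VECTORS` are invented, and the function names `mean_vector`, `cosine`, and `ddr_score` are hypothetical. A real application would load pretrained vectors such as fastText instead.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real application would load pretrained vectors (e.g. fastText).
TOY_VECTORS = {
    "people": [0.9, 0.1, 0.0],
    "elite": [0.8, 0.2, 0.1],
    "corrupt": [0.7, 0.3, 0.0],
    "budget": [0.0, 0.9, 0.2],
    "tax": [0.1, 0.8, 0.3],
}


def mean_vector(words, vectors):
    """Average the vectors of all words that have an entry; None if none do."""
    found = [vectors[w] for w in words if w in vectors]
    if not found:
        return None
    dim = len(found[0])
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ddr_score(dictionary, document, vectors):
    """Similarity between a concept dictionary and a document.

    Both are represented by the mean of their word vectors.
    Tokenization here is a naive lowercase whitespace split (an assumption).
    Returns None when either side has no known words.
    """
    dict_vec = mean_vector(dictionary, vectors)
    doc_vec = mean_vector(document.lower().split(), vectors)
    if dict_vec is None or doc_vec is None:
        return None
    return cosine(dict_vec, doc_vec)


concept = ["people", "elite"]
close_doc = ddr_score(concept, "The corrupt elite", TOY_VECTORS)
distant_doc = ddr_score(concept, "The tax budget", TOY_VECTORS)
```

With these toy vectors, the document that shares vocabulary with the concept dictionary scores higher than the one about taxes and budgets. Words without a vector are simply skipped, which is one possible design choice among several.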