text-as-data

There are 28 repositories under the text-as-data topic.

  • JasonKessler/scattertext

    Beautiful visualizations of how language differs among document types.

    Language: Python
  • MilaNLProc/contextualized-topic-models

    A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).

    Language: Python
  • jboynyc/textnets

    Text analysis with networks.

    Language: Python
  • ryanjgallagher/shifterator

    Interpretable data visualizations for understanding how texts differ at the word level

    Language: Python
  • JasonKessler/Scattertext-PyData

    Notebooks for the Seattle PyData 2017 talk on Scattertext

    Language: HTML
  • chkla/CSS-Events

    Summer/winter schools, workshops, and conferences in computational social science 🫂

  • umanlp/SemScale

    A tool for Semantic Scaling of Political Text (branch of Topfish, a suite of tools for Political Text Analysis)

    Language: Python
  • davidycliao/redguards

    A package for replicating the estimates and findings in the article "Factionalism and the Red Guards under Mao's China: Ideal Point Estimation Using Text Data."

    Language: R
  • chkla/Populism-Text-Analysis

    Literature 📄 and datasets 📚 on automatic populism detection

  • fedenanni/Computational-Text-Analysis-2018-19

    2018 Computational Text Analysis Notebooks, University of Mannheim

    Language: Jupyter Notebook
  • tweedmann/3x8emotions

    Code and models for 3 different tools to measure appeals to 8 discrete emotions in German political text

    Language: Jupyter Notebook
  • wesslen/summer2017-socialmedia

    Summer 2017 Social Media Analytics Workshop Series

    Language: HTML
  • cjerzak/LinkOrgs-software

    LinkOrgs: An R package for linking records on organizations using half a billion open-collaborated records from LinkedIn

    Language: R
  • davidycliao/bisCrawler

    An automated web crawler for extracting central bankers' speeches

    Language: Python
  • thieled/dictvectoR

    'dictvectoR' measures the similarity between a concept dictionary and documents, using fastText word vectors. Implements the "Distributed-Dictionary-Representation" (Garten et al. 2018) method in R.

    Language: R
  • aflueckiger/KED2022

    The ABC of Computational Text Analysis. BA Seminar, Spring 2022, University of Lucerne

    Language: HTML
  • adamlauretig/gensim_in_R

    Code for estimating word embeddings with gensim in R.

  • WZBSocialScienceCenter/tm_corona

    A small showcase for topic modeling with the tmtoolkit Python package. I use a corpus of articles from the German online news website Spiegel Online (SPON) to create a topic model for before and during the COVID-19 pandemic.

    Language: Jupyter Notebook
  • jfjelstul/regular-expressions-tutorial

    A tutorial on using regular expressions in R

  • thelautiff/UN_meeting_records

    From using xpdf, rvest, and quanteda on United Nations Digital Library search results to applying dictionaries to speeches in United Nations meeting records

    Language: R
  • aflueckiger/KED2021

    The ABC of Computational Text Analysis. BA Seminar, Spring 2021, University of Lucerne

    Language: HTML
  • BenjaminFReese/american_constitutional_praxis

    This repository uses text-as-data methods alongside traditional primary source reading to analyze early American state constitutions. The R scripts create a function to scrape and clean the constitutional text, run sentiment analysis, calculate tf-idf, and perform LDA. This is a work-in-progress.

    Language: HTML
  • CT-P/portuguese_open_data

    An empirical framework applied to parliamentary discourse and Twitter data, with a Discourse Polarization Index.

    Language: Jupyter Notebook
  • ivansabik/chairum-corpus

    Collection of text corpora for publicly available speeches from Mexican president Andres Manuel Lopez Obrador (AMLO) sourced from YouTube. The dataset includes his daily morning conferences (conferencias mañaneras) 😴🪿

    Language: Python
  • graceadcox/Refugee-Text-as-Data

    Original corpus of articles relating to refugees, scraped from the Tennessee newspaper The Chattanoogan, along with simple code for a text-as-data word cloud.

    Language: R
  • Sam-Gartenstein/Machine-Learning-for-the-Social-Sciences

    Material from my Machine Learning for the Social Sciences course

    Language: Jupyter Notebook
  • Jszabo16/NCSR_transcript_webscrapping

    Replication script for web scraping the transcripts of the parliamentary debates in the National Council of the Slovak Republic (1994-2023) and for the ensuing sentiment analysis

    Language: R
  • smkerr/news-israel-gaza

    🇮🇱🇵🇸 News coverage of Israel-Hamas War 🇵🇸🇮🇱

    Language: R