PaschalisAg
Quantifying language, plotting it, looking at my creations, and shouting "it's alive!"
Donostia International Physics Center (DIPC)
Donostia-San Sebastian, Basque Country
PaschalisAg's Stars
brightmart/text_classification
All kinds of text classification models and more with deep learning
kk7nc/Text_Classification
Text Classification Algorithms: A Survey
peng-yiwen/WiKC
A cleaned version of Wikidata taxonomy - Refined using Large Language Models
jwngr/sdow
Six Degrees of Wikipedia
kavgan/nlp-in-practice
Starter code to solve real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings and more.
Grasia/wiki-scripts
Miscellaneous scripts to gather and process data of wikis.
optuna/optuna
A hyperparameter optimization framework
mantasu/cs231n
Shortest solutions for CS231n 2021-2024
diffbot/knowledge-net
KnowledgeNet: A Benchmark Dataset for Knowledge Base Population
ericmjl/Network-Analysis-Made-Simple
An introduction to network analysis and applied graph theory using Python and NetworkX
alonnir/snacks
Snack size awesome list for Social Network Analysis resources
maxpumperla/hyperas
Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization
stanford-oval/WikiChat
WikiChat is an improved RAG. It stops the hallucination of large language models by retrieving data from a corpus.
PieterBeullens/medtrans_stylo
Files for stylometric analysis of medieval translators
AlexMoreo/diff-vectors
Diff-Vectors for Authorship Analysis
SupervisedStylometry/SuperStyl
Supervised Stylometry
urchade/GLiNER
Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
attardi/wikiextractor
A tool for extracting plain text from Wikipedia dumps
sknetwork-team/scikit-network
Graph Algorithms
aditya-grover/node2vec
josh-ashkinaze/Normalized-Google-Distance
A Python script to calculate Normalized Google Distance (NGD), a semantic similarity metric based on Google search results.
qcrit/DSH-2018-LatinProseVerse
Replication code for Chaudhuri et al., "A small set of stylometric features differentiates Latin prose and verse," Digital Scholarship in the Humanities 2018
tesserae/tesserae
The Tesserae project aims to provide a flexible and robust web interface for exploring intertextual parallels. Selecting two poems shows a list of lines sharing two or more words (regardless of inflectional changes).
karpathy/minbpe
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
stanford-oval/wikidata-emnlp23
WikiSP, a semantic parser for Wikidata. WikiWebQuestions, a SPARQL-annotated dataset on Wikidata
MaartenGr/PolyFuzz
Fuzzy string matching, grouping, and evaluation.
keithmcnulty/ona_book
Handbook of Graphs and Networks in People Analytics
CambridgeUniversityPress/FirstCourseNetworkScience
Tutorials, datasets, and other material associated with textbook "A First Course in Network Science" by Menczer, Fortunato & Davis
practical-nlp/practical-nlp-code
Official Repository for Code associated with 'Practical Natural Language Processing' book by O'Reilly Media
kasparvonbeelen/ghi_python
Programming for Historians