Pinned Repositories
audio.whisper
Transcribe audio files using the "Whisper" Automatic Speech Recognition model from R
BTM
Biterm Topic Modelling for Short Text with R
image
Computer Vision and Image Recognition algorithms for R users
taskscheduleR
Schedule R scripts/processes with the Windows task scheduler.
udpipe
R package for Tokenization, Part-of-Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
word2vec
Distributed Representations of Words using word2vec
ETLUtils
Utilities for easily loading big data from relational databases directly into ffdf objects in R.
Myrrix-R-interface
Let R talk to Myrrix. Myrrix is a complete, real-time, scalable clustering and recommender system, evolved from Apache Mahout.
RMOA
Connect R to MOA for massive online data stream mining
udpipe-spacy-comparison
Compare the accuracy of udpipe models and spaCy models that can be used for NLP annotation
jwijffels's Repositories
jwijffels/ETLUtils
Utilities for easily loading big data from relational databases directly into ffdf objects in R.
jwijffels/page_dewarp
Text page dewarping using a "cubic sheet" model
jwijffels/activelearning.nlp
Active learning for training NLP models in R
jwijffels/ANMS-Codes
Efficient adaptive non-maximal suppression algorithms for homogeneous spatial keypoint distribution
jwijffels/av
Working with Video in R
jwijffels/berkeley-stat-157
Homepage for STAT 157 at UC Berkeley
jwijffels/Bi-Sent2Vec
Robust Cross-lingual Embeddings from Parallel Sentences
jwijffels/cloudfront-authorization-at-edge-keycloak
jwijffels/cpp-fstlib
A single-file, header-only C++17 library for Minimal Acyclic Subsequential Transducers, or Finite State Transducers
jwijffels/DETM
jwijffels/dhSegment
Generic framework for historical document processing
jwijffels/EasyOCR
Ready-to-use OCR with 40+ languages supported including Chinese, Japanese, Korean and Thai
jwijffels/fuzzy-search
Fuzzy search modules for searching lists of words in low quality OCR and HTR text.
jwijffels/koan
A word2vec negative sampling implementation with correct CBOW update.
jwijffels/librnnvad
Voice activity detection (VAD) library, based on WebRTC's VAD engine
jwijffels/LSTM-CRF-pytorch-faster
A parallelized LSTM-CRF implementation, more than 1000X faster, modified from the slower version in the official PyTorch tutorial (URL: https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html).
jwijffels/neural-acoustic-distance
Code associated with the paper: Neural Representations for Modeling Variation in English Speech.
jwijffels/nnutils
CPU & CUDA implementation of several neural network utils
jwijffels/ocr-fileformat
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
jwijffels/open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
jwijffels/pageDistanceBasedContourGenerator
Program that calculates the extraction polygon of present text lines given an existing baseline in the page file
jwijffels/phonfieldwork
R package for phonetic research and experimenting
jwijffels/rticles
LaTeX Journal Article Templates for R Markdown
jwijffels/sent2vec
General purpose unsupervised sentence representations
jwijffels/speech-representations
Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020)
jwijffels/test_doc2vec
Compare the doc2vec R implementation (PVDM, PVDBOW) with the mean of word embeddings in a classification task.
jwijffels/text_analysis_for_social_science
Code for the book on Python for social scientists
jwijffels/v4py.github.io
E-learning materials for the V4Py summer school (Python for linguists).
jwijffels/wav2letter
Facebook AI Research's Automatic Speech Recognition Toolkit
jwijffels/weirdai
Weird A.I. Yankovic neural-net based lyrics parody generator