andy-wagner
Serial entrepreneur, data, information retrieval and machine learning geek
CTO@searchhub.io
Germany
Pinned Repositories
byteseek
A Java library for byte pattern matching and searching
cqengine
Ultra-fast SQL-like queries on Java collections
danny
Dictionary Based Approach To Approximate Nearest Neighbors
elasticsearch-dynarank
This plugin provides a feature to change the top N documents in a search result.
flashtext-java
A Java port of https://github.com/vi3k6i5/flashtext: extract keywords from sentences or replace keywords in sentences
hnswlib
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
homonym
A mini web crawler to get hundreds of websites' content based on a list of keywords.
hyperscan-java
Match tens of thousands of regular expressions within milliseconds - Java bindings for Intel's Hyperscan 5
simple-unified-product-ontology-format
Draft of a simplified ontology format specially designed from a search and retrieval perspective
vec4ir
Word Embeddings for Information Retrieval
andy-wagner's Repositories
andy-wagner/big-phoney
Get phonetic spellings and syllable counts for any English word. Works with made-up and non-dictionary words
andy-wagner/bplustree
B+-tree in Java that stores to disk using memory-mapped files, supports range queries and duplicate keys
andy-wagner/browser-core
Cliqz features, shared across products including Cliqz browsers for Windows, Mac, Android and iOS
andy-wagner/clust4j
A suite of classification and clustering algorithm implementations for Java. A number of partitional, hierarchical and density-based algorithms including DBSCAN, k-Means, k-Medoids, MeanShift, Affinity Propagation, HDBSCAN and more.
andy-wagner/elasticsearch-record-linkage
Elasticsearch plugin to expose scoring metrics useful for record linkage and deduplication
andy-wagner/grobid-quantities
GROBID extension for identifying and normalizing physical quantities.
andy-wagner/Interactive-Dictionary
In this program, the user interacts with a dictionary. The user can input a word and a part of speech, and filter the dictionary by part of speech. The Java program pulls its data from an enum. A few fixes remain to be made, but the program works overall.
andy-wagner/IRBandits
Java library for interactive recommendation.
andy-wagner/kenlm-jni
A Java JNI wrapper for KenLM: Faster and Smaller Language Model Queries
andy-wagner/liblevenshtein-java
Various utilities regarding Levenshtein transducers. (Java)
andy-wagner/montysolr
Solr for Astrophysics Data System
andy-wagner/nlprule
Rule-based grammatical error correction through parsing LanguageTool rules in Rust, with bindings for Python.
andy-wagner/pyate
PYthon Automated Term Extraction
andy-wagner/query-suggestions
Produces meaningful completions for partial queries given by the user. Semester project for the course "Information Retrieval" at the University of Tübingen in the winter semester 2016/17.
andy-wagner/Recommendation-System
Hybrid RecSys, CF-based RecSys, Model-based RecSys, Content-based RecSys, Finding similar items using Jaccard similarity
andy-wagner/revizor
Ecommerce product title recognition package
andy-wagner/SCStemmers
A collection of stemmers for Serbian and Croatian
andy-wagner/search-solved-public-seo
andy-wagner/seldon-core
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
andy-wagner/spaczz
Fuzzy matching and more functionality for spaCy.
andy-wagner/spell-correction-gingerit-demo
Tutorial on creating a spelling correction Python application using Gingerit and Streamlit
andy-wagner/SpellGCN
SpellGCN
andy-wagner/Sudachi
A Japanese Tokenizer for Business
andy-wagner/Sux4J
Sux4J is an effort to bring succinct data structures to Java.
andy-wagner/tinspin-indexes
Spatial index library with R*Tree, STR-Tree, Quadtree, CritBit, KD-Tree, CoverTree
andy-wagner/tinyStats
Statistics about data (cardinality estimation, frequent item detection, approximate counting, ...)
andy-wagner/universal-recommender
Java™ Programming Language™ library for recommendation engine implementation and scientific evaluation (2009–2010)
andy-wagner/vectorai
Vector AI — A platform for building vector based applications. Encode, query and analyse data using vectors.
andy-wagner/wildcard-trie
String trie that supports wildcard search
andy-wagner/words-grouping
Tool for listing the most common words in a file, with a given tolerance for each group (using Levenshtein distance)