Pinned Repositories
dwtc-geo-parser
google-play-dataset-import
Script to import data from a Google Play Store Apps dataset to a PostgreSQL database (Dataset URL: https://www.kaggle.com/lava18/google-play-store-apps)
open-food-facts-postgresql-import
Script to import data from the Open Food Facts dataset to PostgreSQL (Dataset URL: https://www.kaggle.com/openfoodfacts/world-food-facts)
postgres-retrofit
Tools to create database-specific text value embeddings from word embedding datasets
postgres-word2vec
Utils for using word embedding models such as word2vec in a PostgreSQL database
table-embeddings
Tools for training schema-aware Web table embedding for unsupervised and supervised machine learning on tabular data
the-movie-database-import
Script to import data from The Movie Database to PostgreSQL (Dataset URL: https://www.kaggle.com/rounakbanik/the-movies-dataset)
late-chunking
Code for explaining and evaluating late chunking (chunked pooling)
SQID
A tool to analyse, browse and query Wikidata
Wikidata-Toolkit
Java library to interact with Wikibase
guenthermi's Repositories
guenthermi/postgres-word2vec
Utils for using word embedding models such as word2vec in a PostgreSQL database
guenthermi/table-embeddings
Tools for training schema-aware Web table embedding for unsupervised and supervised machine learning on tabular data
guenthermi/the-movie-database-import
Script to import data from The Movie Database to PostgreSQL (Dataset URL: https://www.kaggle.com/rounakbanik/the-movies-dataset)
guenthermi/postgres-retrofit
Tools to create database-specific text value embeddings from word embedding datasets
guenthermi/dwtc-geo-parser
guenthermi/docarray
🧬 The data structure for unstructured multimodal data · Neural Search · Vector Search · Document Store
guenthermi/google-play-dataset-import
Script to import data from a Google Play Store Apps dataset to a PostgreSQL database (Dataset URL: https://www.kaggle.com/lava18/google-play-store-apps)
guenthermi/open-food-facts-postgresql-import
Script to import data from the Open Food Facts dataset to PostgreSQL (Dataset URL: https://www.kaggle.com/openfoodfacts/world-food-facts)
guenthermi/fast_minh
Python package for fast MinHash calculation and operations
guenthermi/mteb
MTEB: Massive Text Embedding Benchmark
guenthermi/NLP-OSS
Democratizing NLP!
guenthermi/SimilarityMeasure
Compute the most similar node for a given node in a graph
guenthermi/test-gradient-cache
Small test script of gradient cache (https://github.com/luyug/GradCache) applied to train a model for a retrieval task on the SciFact dataset (https://allenai.org/data/scifact)