Pinned Repositories
chunker
IPAddressZipCodeStateCountryLuceneJavaSearch
Automatically downloads a database of known IP addresses and other location data, then builds a Lucene index for spatial search of IP addresses within a specific range (or by other criteria). Stores lat/lon, zip code, city, country, and IP address for fast Lucene search. The completed index is 1 GB.
java-wget
Dead simple Java wget. Just one static class and a status enum. No dependencies.
JavaMongoDBOpLogReader
Simple MongoDB oplog reader written in Java. Made to read the oplog from multiple sources.
rag-models
RAG models for protocol buffers
search-api
Search API for the vector-based search engine ecosystem
search_indexer
RAG search engine based on Wikipedia
solr-semantic-importer
Takes a collection from Solr and creates a new collection with vectors calculated on the specified fields.
tika-parser
Wrapper for Tika made to work with the RAG model ecosystem.
vectorizer
REST + gRPC + Swagger UI version of a sentence embedding service, run via Docker or standalone
krickert's Repositories
krickert/search_indexer
RAG search engine based on Wikipedia
krickert/rag-models
RAG models for protocol buffers
krickert/vectorizer
REST + gRPC + Swagger UI version of a sentence embedding service, run via Docker or standalone
krickert/chunker
krickert/collector-http
Norconex Web Crawler (or spider) is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or Intranet) to various data repositories such as search engines.
krickert/grpc-client-discover-bug-recreate-krick
krickert/search-api
Search API for the vector-based search engine ecosystem
krickert/solr-semantic-importer
Takes a collection from Solr and creates a new collection with vectors calculated on the specified fields.
krickert/tika-parser
Wrapper for Tika made to work with the RAG model ecosystem.
krickert/crawler-job-creator
Creates jobs to launch crawls for Selenium.
krickert/crawler-manager
Web crawler that works over Selenium and extracts the text from the plain HTML.
krickert/grpc_micronaut_420_broken_example
Micronaut gRPC serialization is broken in 4.2.0. Example code that demonstrates this behavior.
krickert/markdown-parser
Takes in Markdown documents and outputs well-structured text. Meant as a precursor to chunking in a text processing pipeline.
krickert/micronaut-grpc
Integration between Micronaut and gRPC
krickert/micronaut-kafka-container-test-example
Simple Micronaut Kafka container test for Kafka unit testing. Examples include Kafka serialization with strings, Avro, and protocol buffers.
krickert/MSMARC_vectors
A place to store the vectorized documents for Solr search
krickert/nlp-ner
NLP Named Entity Recognition Text Processor Microservice
krickert/pipeline-processor
Takes in a PipeDocument for a PipeService and runs it through the configured stages.
krickert/search-front-end
Front end to the gRPC search-api interface
krickert/search_presentation_helper
Introduction to search: from query to results
krickert/solr
Apache Solr open-source search software
krickert/solr-grpc-plugin
Use a gRPC service to parse a query or index a doc - used for semantic search
krickert/solr_dense_search_example
An example which indexes and queries a sample data set for dense vector searching
krickert/testcontainers_presentation
Presentation on testcontainers
krickert/tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
krickert/vector-embedder-grpc
Simple demo of how to use semantic search on Solr in pure Java
krickert/wiki-article-to-pipedocument-processor
krickert/wiki-download-dump-file-processor
krickert/wiki-download-request-creator
krickert/wiki-dump-file-to-wiki-article-processor