jamesdunham's Stars
atuinsh/atuin
✨ Magical shell history
stanfordnlp/dspy
DSPy: The framework for programming—not prompting—language models
deepset-ai/haystack
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning.
johnkerl/miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
tobymao/sqlglot
Python SQL Parser and Transpiler
opendatalab/PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
jupyter-book/jupyter-book
Create beautiful, publication-quality books and documents from computational content.
quadratichq/quadratic
Spreadsheet with AI, Code, Connections
jasonjmcghee/rem
An open source approach to locally record and enable searching everything you view on your Mac.
huggingface/datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
pocketpy/pocketpy
Portable Python 3.x Interpreter in Modern C for Game Scripting
nlpodyssey/spago
Self-contained Machine Learning and Natural Language Processing library in Go
bugbakery/audapolis
an editor for spoken-word audio with automatic transcription
yuchenlin/LLM-Blender
[ACL2023] We introduce LLM-Blender, an innovative ensembling framework to attain consistently superior performance by leveraging the diverse strengths of multiple open-source LLMs. LLM-Blender cuts the weaknesses through ranking and integrates the strengths through fusing generation to enhance the capability of LLMs.
langchain-ai/langsmith-cookbook
integrii/flaggy
Idiomatic Go input parsing with subcommands, positional values, and flags at any position. No required project or package layout and no external dependencies.
macbre/sql-metadata
Uses tokenized query returned by python-sqlparse and generates query metadata
langchain-ai/weblangchain
LangChain-powered web researcher chatbot. Searches for sources on the web and cites them in generated answers.
james-bowman/nlp
Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang
puhitaku/mtplvcap
Nikon to USB Webcam. Supports older models that Nikon WU does not. Windows/macOS/Linux. No HDMI capture dongle is needed. Ask me on Twitter @puhitaku
explosion/tokenizations
Robust and fast tokenization alignment library for Rust and Python https://tamuhey.github.io/tokenizations/
explosion/spacy-experimental
🧪 Cutting-edge experimental spaCy components and features
malcolmbarrett/dagtex
lightly opinionated LaTeX DAGs in R
explosion/spacy-alignments
💫 A spaCy package for Yohei Tamura's Rust tokenizations library
GU-DataLab/gdtm
A Python Package containing wrappers for topic models, including TND, NLDA, GTM, and temporal topic-noise models.
Curtin-Open-Knowledge-Initiative/open-metadata-report
Update of the Open Metadata Report looking at value add to Crossref from MAG, OpenAlex and others
ourresearch/openalex-dags
Airflow DAGs for OpenAlex
iNoBo/scinobo-fos-taxonomy
The taxonomy has 6 levels (L1-L6). Levels L1-L3 are static and stem from the OECD and ScienceMetrix taxonomy. The remaining levels are constructed algorithmically using publication-to-publication and venue-to-venue citation graphs as well as clustering and topic modelling algorithms.
musurca/narwhal
A whale of an sqlite3 ORM for Python