rostandk's Stars
mlabonne/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
linexjlin/GPTs
Leaked prompts of GPTs
conductor-oss/conductor
Conductor is an event-driven orchestration platform
NielsRogge/Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
microsoft/SynapseML
Simple and Distributed Machine Learning
PaddlePaddle/PaddleRec
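Recommendation Algorithm: a large-scale recommendation algorithm library, including classic and state-of-the-art recommender system algorithms such as LR, Wide&Deep, DSSM, TDM, MIND, Word2Vec, Bert4Rec, DeepWalk, SSR, AITM, DSIN, SIGN, IPREC, GRU4Rec, Youtube_dnn, NCF, GNN, FM, FFM, DeepFM, DCN, DIN, DIEN, DLRM, MMOE, PLE, ESMM, ESCMM, MAML, xDeepFM, DeepFEFM, NFM, AFM, RALM, DMR, GateNet, NAML, DIFM, Deep Crossing, PNN, BST, AutoInt, FGCNN, FLEN, Fibinet, ListWise, DeepRec, ENSFM, TiSAS, and AutoFIS, as well as classic recommender system datasets such as criteo and movielens.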
benfred/implicit
Fast Python Collaborative Filtering for Implicit Feedback Datasets
microsoft/hummingbird
Hummingbird compiles trained ML models into tensor computation for faster inference.
grantjenks/python-diskcache
Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.
pykeen/pykeen
🤖 A Python library for learning and evaluating knowledge graph embeddings
lucidrains/soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
jupyter-incubator/sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
NVIDIA-Merlin/NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data, designed to quickly and easily manipulate terabyte-scale datasets used to train deep learning-based recommender systems.
NVIDIA-Merlin/Merlin
NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
LucaCanali/sparkMeasure
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
facebookresearch/SONAR
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
grai-io/grai-core
NVIDIA-Merlin/models
Merlin Models is a collection of deep learning recommender system model reference implementations
nicholasmireles/DotDict
A simple Python library to make chained attributes possible.
adidas/lakehouse-engine
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
microsoft/MSMARCO-Question-Answering
MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension and question answering.
haxsaw/hikaru
Move smoothly between Kubernetes YAML and Python for creating/updating/componentizing configurations.
cerndb/spark-dashboard
Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an Apache Spark Performance Dashboard using container technology.
NVIDIA-Merlin/systems
Merlin Systems provides tools for combining recommendation models with other elements of production recommender systems (like feature stores, nearest neighbor search, and exploration strategies) into end-to-end recommendation pipelines that can be served with Triton Inference Server.
outlines-dev/functions
A collection of Outlines functions
benchsci/tinsel
PySpark schema generator
javiber/scrat
Persistent Caching of Expensive Function Results
NVIDIA-Merlin/core
Core Utilities for NVIDIA Merlin
truskovskiyk/ml-in-production-webinars