Pinned Repositories
amundsen
Amundsen is a metadata-driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
annoy
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
argo
Argo Workflows: Get stuff done with Kubernetes.
awesome-public-datasets
A topic-centric list of HQ open datasets.
awesome-workflow-engines
A curated list of awesome open source workflow engines
cortex
Deploy machine learning models to production
cryptdb
A database system that can process SQL queries over encrypted data.
cyberprobe
Capturing, analysing and responding to cyber attacks
dagster
A Python library for building data applications: ETL, ML, Data Pipelines, and more.
deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
sanjoy-bose's Repositories
sanjoy-bose/amundsen
Amundsen is a metadata-driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
sanjoy-bose/annoy
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
sanjoy-bose/argo
Argo Workflows: Get stuff done with Kubernetes.
sanjoy-bose/awesome-public-datasets
A topic-centric list of HQ open datasets.
sanjoy-bose/awesome-workflow-engines
A curated list of awesome open source workflow engines
sanjoy-bose/cortex
Deploy machine learning models to production
sanjoy-bose/cyberprobe
Capturing, analysing and responding to cyber attacks
sanjoy-bose/dagster
A Python library for building data applications: ETL, ML, Data Pipelines, and more.
sanjoy-bose/deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
sanjoy-bose/detect-secrets
An enterprise-friendly way of detecting and preventing secrets in code.
sanjoy-bose/faas
OpenFaaS - Serverless Functions Made Simple
sanjoy-bose/fairlearn
A Python package to assess and improve fairness of machine learning models.
sanjoy-bose/flintrock
A command-line tool for launching Apache Spark clusters.
sanjoy-bose/forecasting
Time Series Forecasting Best Practices & Examples
sanjoy-bose/forwardsecrecy
The project aims to simplify the use of the Curve25519 elliptic curve with Diffie-Hellman key exchange. The work is in line with the Account Aggregator Specification.
sanjoy-bose/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
sanjoy-bose/indicnlp_catalog
A collaborative catalog of resources for Indian language NLP
sanjoy-bose/ludwig
Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without writing code.
sanjoy-bose/machine-learning-systems-design
A booklet on machine learning systems design with exercises
sanjoy-bose/marquez
Collect, aggregate, and visualize a data ecosystem's metadata
sanjoy-bose/mediapipe
MediaPipe is the simplest way for researchers and developers to build world-class ML solutions and applications for mobile, edge, cloud and the web.
sanjoy-bose/ml-readings
A list of papers / videos / tutorials / blog posts on machine learning
sanjoy-bose/MLOps_VideoAnomalyDetection
Operationalize a video anomaly detection model with Azure ML
sanjoy-bose/open-data-registry
A registry of publicly available datasets on AWS
sanjoy-bose/rtb-papers
A collection of research and survey papers on real-time bidding (RTB) based display advertising techniques.
sanjoy-bose/sagemaker-spark
A Spark library for Amazon SageMaker.
sanjoy-bose/snowplow
Cloud-native web, mobile and event analytics, running on AWS and GCP
sanjoy-bose/sope
Apache Spark ETL Utilities
sanjoy-bose/tink
Tink is a multi-language, cross-platform, open source library that provides cryptographic APIs that are secure, easy to use correctly, and hard(er) to misuse.
sanjoy-bose/vaex
Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀