Pinned Repositories
awesome-opensource-data-engineering
An Awesome List of Open-Source Data Engineering Projects
da-rnn
Dual-Stage Attention-Based Recurrent Neural Net for Time Series Prediction
data-sentry
A project to build a machine learning pipeline to detect personally identifiable information (PII)
E2E-TBSA
A Unified Model for Opinion Target Extraction and Target Sentiment Prediction (AAAI 2019)
extremitypathfinder
Python package for fast shortest path computation on 2D grid or polygon maps
goodtables-py
Validate tabular data in Python
piicatcher
A data catalog for database tables and columns to track PII and PHI.
pydqc
Python toolkit for automatic data quality checks
scrubadub
Clean personally identifiable information from dirty dirty text.
WaveRL
bballamudi's Repositories
bballamudi/100DaysOfCode
PyBites #100DaysOfCode
bballamudi/amundsenfrontendlibrary
Front-end service library for Amundsen
bballamudi/awesome
😎 Awesome lists about all kinds of interesting topics
bballamudi/aws-toolbox
A collection of DevOps tools including shell & python scripts that automate the boring stuff in AWS.
bballamudi/BERT-for-RRC-ABSA
code for our NAACL 2019 paper: "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis"
bballamudi/cards-pytest
Project task tracking / todo list
bballamudi/cs-video-courses
List of Computer Science courses with video lectures.
bballamudi/data-discovery-api
bballamudi/data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
bballamudi/datacatalog-tag-manager
Python package to manage Google Cloud Data Catalog tags, loading metadata from external sources -- currently supports the CSV file format
bballamudi/DataProfiler
What's in your data? Extract schema, statistics and entities from datasets
bballamudi/Edator
A Python package that performs exploratory data analysis for users. Additionally, it generates 3 output files: a cleaned CSV, plots, and a text report.
bballamudi/entity_resolution
Example entity resolution workflow using PySpark
bballamudi/example-fastapi
bballamudi/incubator-superset
Apache Superset is a Data Visualization and Data Exploration Platform
bballamudi/kpi-dashboard-plotly-dash
bballamudi/marquez-airflow
Airflow support for Marquez
bballamudi/marquez-python
Python client for Marquez
bballamudi/medium-search-app
A simple search engine for Medium stories, built with Streamlit and Elasticsearch.
bballamudi/mobydq
🐳 Tool to automate data quality checks on data pipelines
bballamudi/multi-data-lineage-capture-py
IBM Multi-Lineage Data System
bballamudi/news-feed
bballamudi/practice
bballamudi/python-deequ
Python API for Deequ
bballamudi/rumbl
bballamudi/solar-irradiance
bballamudi/spark-deequ
bballamudi/streamlit-apps
bballamudi/text_similarity
An NLP library for text similarity based on Transformer models
bballamudi/zero-administration-inference-with-aws-lambda-for-hugging-face
spacy-ner-aws-lambda 🤗