Pinned Repositories
ads-privacy
airflow-plugin-demo
Example for an airflow plugin
airflow-rest-api-plugin-1
A plugin for Apache Airflow that exposes REST endpoints for the Command Line Interfaces
al-folio
A beautiful, simple, clean, and responsive Jekyll theme for academics
amundsen
Amundsen is a metadata-driven application for improving the productivity of data analysts, data scientists, and engineers when interacting with data.
apache-ranger-s3-plugin
Apache Ranger Plugin for S3
arrow-data-source
Spark DataSource plugin for reading files from various formats like Parquet into Arrow-compatible columnar vectors.
arrow-datafusion
Apache Arrow DataFusion and Ballista query engines
arx
ARX is a comprehensive open source data anonymization tool aiming to provide scalability and usability. It supports various anonymization techniques, methods for analyzing data quality and re-identification risks, and well-known privacy models such as k-anonymity, l-diversity, t-closeness and differential privacy.
cloudmapper
CloudMapper helps you analyze your Amazon Web Services (AWS) environments.
yangchenghuang's Repositories
yangchenghuang/al-folio
A beautiful, simple, clean, and responsive Jekyll theme for academics
yangchenghuang/Best-README-Template
An awesome README template to jumpstart your projects!
yangchenghuang/boostFSLBert
Boosting BERT performance in an FSL context
yangchenghuang/cadence
Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way.
yangchenghuang/climatenets-1
yangchenghuang/db-queue
Worker-queue implementation on top of Java and a database
yangchenghuang/deid2_dpsyn
yangchenghuang/flask-vuejs-template
Flask + Vue JS Template
yangchenghuang/git-secrets
Prevents you from committing secrets and credentials into git repositories
yangchenghuang/gitrob
Reconnaissance tool for GitHub organizations
yangchenghuang/h3
Hexagonal hierarchical geospatial indexing system
yangchenghuang/hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
yangchenghuang/kq
Kafka-based Job Queue for Python
yangchenghuang/lightning
Lightning In-Memory Object Store
yangchenghuang/loglizer
A log analysis toolkit for automated anomaly detection [ISSRE'16]
yangchenghuang/nist-synthetic-data-2021
Source code for the second place submission in the third round of the 2021 NIST differential privacy temporal map challenge
yangchenghuang/OpenLineage
An Open Standard for lineage metadata collection
yangchenghuang/presto
The official home of the Presto distributed SQL query engine for big data
yangchenghuang/pytorch-widedeep
A flexible package for multimodal deep learning to combine tabular data with text and images using Wide and Deep models in PyTorch
yangchenghuang/rclone
"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Yandex Files
yangchenghuang/sato
Code and data for Sato https://arxiv.org/abs/1911.06311.
yangchenghuang/spacy-clausie
Implementation of the ClausIE information extraction system for Python + spaCy
yangchenghuang/spark-atlas-connector
A Spark Atlas connector to track data lineage in Apache Atlas
yangchenghuang/spark-monitoring
Monitoring Azure Databricks jobs
yangchenghuang/spark-ocr-workshop
Public runnable examples of using John Snow Labs' OCR for Apache Spark.
yangchenghuang/sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
yangchenghuang/swarm-learning
A simplified library for decentralized, privacy-preserving machine learning
yangchenghuang/tab-transformer-pytorch
Implementation of TabTransformer, an attention network for tabular data, in PyTorch
yangchenghuang/templateNER
Source code for template-based NER
yangchenghuang/web-relationextraction