- LinkedIn’s Gobblin https://gobblin.apache.org/
- Uber’s Marmaray https://eng.uber.com/marmaray-hadoop-ingestion-open-source/
- LinkedIn on popular metadata architectures https://engineering.linkedin.com/blog/2020/datahub-popular-metadata-architectures-explained
- LinkedIn’s DataHub https://engineering.linkedin.com/blog/2019/data-hub
- Uber’s Databook https://eng.uber.com/metadata-insights-databook/
- Netflix’s Metacat https://netflixtechblog.com/metacat-making-big-data-discoverable-and-meaningful-at-netflix-56fb36a53520
- Uber Architecture https://www.infoq.com/articles/data-driven-privacy-architecture/
- Facebook https://engineering.fb.com/2020/07/21/security/data-classification-system/
- Airbnb on Spark partitioning strategies and small files https://medium.com/airbnb-engineering/on-spark-hive-and-small-files-an-in-depth-look-at-spark-partitioning-strategies-a9a364f908
- BigID on data discovery and data catalogs https://medium.com/bigid-on-id-privacy/two-heads-are-better-than-one-when-data-discovery-meets-data-catalogs-bigid-8fc114aa4084
-
Entity Matching/Resolution:
-
Right to Delete:
-
POS Tagging:
-
BERT:
-
Keyword Extraction: