sonalgoyal
Building Zingg - open source data mastering, deduplication and entity resolution with ML
https://github.com/zinggAI/zingg
India
Pinned Repositories
categorizer
crux
Crux is a reporting application for HBase. Crux provides a simple web-based graphical interface to access HBase, query data and create reports. Crux is open sourced under Apache Software Foundation License v2.0.
customer-er
Translating text attributes (like name, address, phone number) into quantifiable numerical representations; training ML models to determine if these numerical labels form a match; scoring the confidence of each match.
dataengineeringweekly
Weekly Data Engineering Newsletter
explore
Community-curated topic and collection pages on GitHub
github-repo-stats
GitHub Action for advanced repository traffic analysis and reporting
github-traffic
Save information about traffic to a GitHub repository
hiho
Hadoop Data Integration with various databases, FTP servers, Salesforce. Incremental update, dedup, append, merge your data on Hadoop.
zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
sonalgoyal's Repositories
sonalgoyal/crux
Crux is a reporting application for HBase. Crux provides a simple web-based graphical interface to access HBase, query data and create reports. Crux is open sourced under Apache Software Foundation License v2.0.
sonalgoyal/hiho
Hadoop Data Integration with various databases, FTP servers, Salesforce. Incremental update, dedup, append, merge your data on Hadoop.
sonalgoyal/categorizer
sonalgoyal/customer-er
Translating text attributes (like name, address, phone number) into quantifiable numerical representations; training ML models to determine if these numerical labels form a match; scoring the confidence of each match.
sonalgoyal/dataengineeringweekly
Weekly Data Engineering Newsletter
sonalgoyal/explore
Community-curated topic and collection pages on GitHub
sonalgoyal/github-repo-stats
GitHub Action for advanced repository traffic analysis and reporting
sonalgoyal/github-traffic
Save information about traffic to a GitHub repository
sonalgoyal/pyspark-ai
English SDK for Apache Spark
sonalgoyal/sonalgoyal