Pinned Repositories
ds-algo
Data Structures and Algorithms for Coding Interviews (Java & Scala)
e-commerce-marketing-pipeline
Data Pipeline examples using Oozie, Spark and Hive on Cloudera VM and AWS EC2 (branch aws-ec2)
geo-search-spark
Apache Spark v2.0.0 application written in Scala that maps given latitude/longitude values to the nearest latitude/longitude values in a given set, using broadcast indexes of the available geo coordinates.
hive-migration
Migrating Hive Tables from one Hadoop Cluster to another and across versions
spark-indexed-dedup
Using hash-table-based indexes to optimise joins in Apache Spark
spark-site-catalyst
Spark Data Source package for reading data warehouse exports from Site Catalyst. Written for Apache Spark v1.6 and earlier, and compatible with Spark 2.0 and above.
spark-skew-join-examples
Simple examples of techniques for handling skewed data in Spark 2.0
spark2-etl-examples
A project with examples of using a few commonly used data manipulation/processing/transformation APIs in Apache Spark 2.0.0
scio
A Scala API for Apache Beam and Google Cloud Dataflow.
anish749's Repositories
anish749/hive-migration
Migrating Hive Tables from one Hadoop Cluster to another and across versions
anish749/spark-site-catalyst
Spark Data Source package for reading data warehouse exports from Site Catalyst. Written for Apache Spark v1.6 and earlier, and compatible with Spark 2.0 and above.
anish749/geo-search-spark
Apache Spark v2.0.0 application written in Scala that maps given latitude/longitude values to the nearest latitude/longitude values in a given set, using broadcast indexes of the available geo coordinates.
anish749/arx
ARX is a comprehensive open source data anonymization tool that has been designed from the ground up to provide high scalability and ease of use. It supports risk-based anonymization, methods for analyzing data quality and re-identification risks, as well as privacy models such as k-anonymity, l-diversity, t-closeness and differential privacy.
anish749/atlas
This repository is to help with the Partner Demonstration of the Apache Atlas project.
anish749/flashback
Mock the internet
anish749/geo-search-mapreduce
MapReduce code that maps given latitude/longitude values to the nearest latitude/longitude values in a given set, using a map-side join of two data sets.
anish749/incubator-atlas
Mirror of Apache Atlas (Incubating)
anish749/spark-sql-pipelines
An approach for building clean, readable, testable Spark SQL data pipelines using Scala implicit classes
anish749/spray-io-vs-akka-http
A comparison of use cases for Spray IO (on Akka Actors) and Akka HTTP (on Akka Streams) for creating REST APIs
anish749/tz-offset
Hive UDF to find the timezone offset in hours for a particular timezone and date.
anish749/validate-bigdata
Validation framework to impose popular relational database constraints on BigData / Hive tables.