Pinned Repositories
Analyze-sentiment-FLUME-HIVE
Twitter sentiment analysis using Flume
awesome-github-wiki
:neckbeard: Awesome list GitHub Wikis
databricks-crt020-notes
Docs, code, and resources to prepare for the CRT020: Databricks Certified Associate Developer for Apache Spark 2.4 with Python 3 certification
DataStructuresAndAlgorithmsInScala
This project contains Scala code snippets for various problems from LeetCode and HackerRank, along with implementations of the data structures and algorithms required to solve those problems.
leetcode-patterns
A pattern-based approach for learning technical interview questions
LeetCode-SQL-50
MonitoredStructuredStreaming
Repository for Spark structured streaming use case implementations.
PythonLearningSnippets
Spark_Incremental_Load_Automated_POC
This repository contains a project for automated Spark incremental data ingestion from a file system to HDFS. The inbound folder contains the input CSV files. When you trigger the Spark job, the following steps take place. Spark automatically picks the most recently arrived file in the inbound folder, then validates, processes, and ingests it into HDFS. If validation finds that the file has already been loaded to HDFS, you can request a new load through optional spark-submit parameters, which are implemented with Scala's scopt library. When you set the new-load flag, the Scala script fetches a new file from an external location (because this is a POC, it is simulated as a directory other than inbound within the same file system) into the inbound folder and loads that file into the HDFS table. Once the data is read and validated, it is inserted into the given parameterized Avro table, or the table is overwritten if it already exists.
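The file-selection flow in the description above can be sketched in a few lines. This is a minimal illustration in plain Python, not the repository's Scala/Spark code: the function names, the `loaded` set used to track already-ingested files, the `new_load` argument standing in for the optional spark-submit flag, and the use of modification time to decide which file arrived last are all assumptions made for the example.

```python
import shutil
from pathlib import Path
from typing import Optional, Set


def latest_csv(folder: Path) -> Optional[Path]:
    """Return the most recently modified .csv file in folder, or None if there is none."""
    candidates = [p for p in folder.glob("*.csv") if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def choose_file_to_load(inbound: Path, external: Path, loaded: Set[str],
                        new_load: bool = False) -> Optional[Path]:
    """Pick the file to ingest, or return None when there is nothing to do.

    loaded   -- names of files already ingested (hypothetical bookkeeping)
    new_load -- stands in for the optional "request a new load" flag
    """
    candidate = latest_csv(inbound)
    if candidate is not None and candidate.name not in loaded:
        return candidate  # a fresh file is waiting in inbound
    if not new_load:
        return None  # latest file was already loaded and no new load was requested
    # Simulate fetching a new file from the external location into inbound.
    fresh = latest_csv(external)
    if fresh is None or fresh.name in loaded:
        return None
    return Path(shutil.copy2(fresh, inbound / fresh.name))
```

A caller would pass the returned path to the validation and ingestion steps. Returning `None` keeps the "nothing to load" case explicit instead of raising an error.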
rajeshsantha's Repositories
rajeshsantha/DataStructuresAndAlgorithmsInScala
This project contains Scala code snippets for various problems from LeetCode and HackerRank, along with implementations of the data structures and algorithms required to solve those problems.
rajeshsantha/Spark_Incremental_Load_Automated_POC
This repository contains a project for automated Spark incremental data ingestion from a file system to HDFS. The inbound folder contains the input CSV files. When you trigger the Spark job, the following steps take place. Spark automatically picks the most recently arrived file in the inbound folder, then validates, processes, and ingests it into HDFS. If validation finds that the file has already been loaded to HDFS, you can request a new load through optional spark-submit parameters, which are implemented with Scala's scopt library. When you set the new-load flag, the Scala script fetches a new file from an external location (because this is a POC, it is simulated as a directory other than inbound within the same file system) into the inbound folder and loads that file into the HDFS table. Once the data is read and validated, it is inserted into the given parameterized Avro table, or the table is overwritten if it already exists.
rajeshsantha/databricks-crt020-notes
Docs, code, and resources to prepare for the CRT020: Databricks Certified Associate Developer for Apache Spark 2.4 with Python 3 certification
rajeshsantha/leetcode-patterns
A pattern-based approach for learning technical interview questions
rajeshsantha/MonitoredStructuredStreaming
Repository for Spark structured streaming use case implementations.
rajeshsantha/PythonLearningSnippets
rajeshsantha/Analyze-sentiment-FLUME-HIVE
Twitter sentiment analysis using Flume
rajeshsantha/awesome-github-wiki
:neckbeard: Awesome list GitHub Wikis
rajeshsantha/awesome-guidelines
A curated list of high quality coding style conventions and standards.
rajeshsantha/KafkaSparkStructuredStreaming
rajeshsantha/LeetCode-SQL-50
rajeshsantha/Scala-for-the-Impatient-my-solutions
rajeshsantha/coding-interview-university
A complete computer science study plan to become a software engineer.
rajeshsantha/computer-science
:mortar_board: Path to a free self-taught education in Computer Science!
rajeshsantha/covid19
Resources for the Udemy Course - Azure Data Factory For Data Engineers - Project on Covid19 by Ramesh Retnasamy
rajeshsantha/data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
rajeshsantha/Data-Science--Cheat-Sheet
Cheat Sheets
rajeshsantha/DatabricksIntegration
This repo is integrated with Databricks for course learning.
rajeshsantha/fpinscala
Code, exercises, answers, and hints to go along with the book "Functional Programming in Scala"
rajeshsantha/free-programming-books
:books: Freely available programming books
rajeshsantha/githubDemo
A demo repository for practicing advanced Git commands.
rajeshsantha/gitPracticeDir
delete later
rajeshsantha/Hive_Poc
POC created from labs; covers Hive data loading with UDFs and UDAFs.
rajeshsantha/InputData
This repo contains sample datasets and data required for POCs and assignments.
rajeshsantha/IntellijAssignments
rajeshsantha/JupyterRepo
rajeshsantha/LearningScala
My journey to learn Scala.
rajeshsantha/Notes
This repo contains preparation notes and logs for POC programs.
rajeshsantha/professional-programming
A collection of full-stack resources for programmers.
rajeshsantha/terraform-azure
terraform - azure - build