SreekarJammula
Computer Science graduate interested in data analysis and engineering. Google Certified Professional Data Engineer. Big Data | Google Cloud | Hadoop | Apache Spark
Houston, TX
Pinned Repositories
Comparing-Translation-APIs
In the fiercely competitive cloud market, both Microsoft and Google have introduced translation APIs as services on their cloud platforms, so I decided to try them both and compare them. In this short blog post, I compare them on ease of use, punctuation handling, accuracy, and speed.
DataflowSDK-examples
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines. This repository hosts a few example pipelines to get you started with Dataflow.
Depth-Perception-with-Intel-Realsense
Identification of surgery markers using the Intel RealSense and Google Tango SDKs
ETL-BigQuery
This project processes large volumes of weather data provided by NOAA (National Oceanic and Atmospheric Administration) to find the hottest, coldest, and windiest states in the United States.
Find-the-root
The purpose of this project is to design an Android app that solves polynomial equations and helps users find their roots in a user-friendly, accessible way.
Flight-Data-Analytics-
Python scripts that use the PySpark API to convert a large data set (about 3.5 GB) of flight data into various storage formats such as CSV, JSON, and sequence files.
python-docs-samples
Code samples used on cloud.google.com
SreekarJammula-Multi-Threading-protocol-for-restricted-access
• Developed a proof of concept that uses POSIX threads and semaphores to restrict access for user-defined threads
• The aim was to prevent two dissimilar threads from accessing the same place at the same time
• The major challenge was ensuring that the threads never deadlock
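The restriction described above can be sketched with the classic "lightswitch" pattern. This is only an illustration in Python's `threading` module rather than POSIX threads in C, and it is not the repository's code: the thread kinds `A` and `B`, the `Lightswitch` class, and `run_demo` are hypothetical names chosen for the example. Threads of the same kind may share the room, while the first thread of a kind must wait until the other kind has left.

```python
import threading
import time


class Lightswitch:
    """First thread of a kind claims the room; the last one out frees it."""

    def __init__(self):
        self._count = 0
        self._mutex = threading.Lock()

    def enter(self, room):
        with self._mutex:
            self._count += 1
            if self._count == 1:
                room.acquire()  # blocks while the other kind holds the room

    def leave(self, room):
        with self._mutex:
            self._count -= 1
            if self._count == 0:
                room.release()  # last thread of this kind frees the room


def run_demo(threads_per_kind=10):
    """Run threads of kinds A and B; return (violations, final occupancy)."""
    room = threading.Semaphore(1)
    switches = {"A": Lightswitch(), "B": Lightswitch()}
    inside = {"A": 0, "B": 0}
    state_lock = threading.Lock()
    violations = []

    def worker(kind):
        other = "B" if kind == "A" else "A"
        switches[kind].enter(room)
        with state_lock:
            inside[kind] += 1
            if inside[other] > 0:  # a dissimilar thread is present
                violations.append(kind)
        time.sleep(0.001)  # simulated work inside the shared place
        with state_lock:
            inside[kind] -= 1
        switches[kind].leave(room)

    threads = [threading.Thread(target=worker, args=(kind,))
               for kind in "AB" * threads_per_kind]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return violations, inside


violations, inside = run_demo()
```

Each kind takes only its own mutex and then the single room semaphore, so no thread ever waits on two locks held in opposite orders, which is what keeps this sketch free of deadlock. It does not prevent starvation: a steady stream of one kind can keep the other waiting.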
tf-idf-
This assignment is to write Python scripts for Apache Spark. The tasks are divided into three parts:
• WordCount: count the occurrences of words on a per-book basis and compare the results with those of Assignment 1.
• pyspark.ml.feature: compute the tf-idf values for unigrams and bigrams using the pyspark.ml.feature package of Spark's MLlib library, and measure the execution time using 5, 10, and 15 reducers.
• Word2Vec: find the feature vectors of words using the Word2Vec class of the MLlib library.
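To make the tf-idf task concrete, here is a minimal plain-Python sketch of the quantity being computed for unigrams and bigrams. It does not use Spark and is not the assignment's code. The function names and sample documents are hypothetical, raw term counts stand in for term frequency, and the smoothed IDF `log((N + 1) / (df + 1))` is an assumption chosen to mirror the form given in Spark's MLlib documentation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (joined by spaces) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs, n=1):
    """docs maps doc_id -> list of tokens; returns doc_id -> {term: score}."""
    # Term frequency: raw count of each n-gram within each document.
    term_counts = {doc_id: Counter(ngrams(tokens, n))
                   for doc_id, tokens in docs.items()}

    # Document frequency: number of documents containing each n-gram.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    # Smoothed inverse document frequency (assumed form, see lead-in).
    num_docs = len(docs)
    idf = {term: math.log((num_docs + 1) / (df + 1))
           for term, df in doc_freq.items()}

    return {doc_id: {term: count * idf[term] for term, count in counts.items()}
            for doc_id, counts in term_counts.items()}


docs = {
    "book_a": ["the", "cat", "sat"],
    "book_b": ["the", "dog", "sat"],
}
unigram_scores = tf_idf(docs, n=1)
bigram_scores = tf_idf(docs, n=2)
```

A term that appears in every document, such as "the" here, gets a score of zero, while terms unique to one document score highest.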
Word-Count
The task is to compute word counts over a large number of books in the Hadoop MapReduce environment. The assignment is split into three tasks:
• Count the occurrences of words on a per-book basis.
• Count the number of books in which a particular word occurs.
• Measure the execution times of the above programs with 2, 5, and 10 reducers.
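The first two tasks can be illustrated with a small map/reduce-style sketch in plain Python. It runs in a single process rather than on Hadoop, so it says nothing about reducer counts or execution times, and the function names, tokenizer, and sample books are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def map_book(book_id, text):
    """Map step: emit ((book_id, word), 1) for every word in one book."""
    for word in tokenize(text):
        yield (book_id, word), 1


def reduce_counts(pairs):
    """Reduce step: sum the values that share a key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def per_book_counts(books):
    """Task 1: occurrences of each word, keyed by (book_id, word)."""
    pairs = (pair
             for book_id, text in books.items()
             for pair in map_book(book_id, text))
    return reduce_counts(pairs)


def books_per_word(per_book):
    """Task 2: number of distinct books in which each word occurs."""
    return reduce_counts((word, 1) for (_book_id, word) in per_book)


books = {
    "book_a": "The cat sat on the mat",
    "book_b": "The dog sat",
}
per_book = per_book_counts(books)
doc_freq = books_per_word(per_book)
```

The second task reuses the output of the first: each `(book_id, word)` key appears once per book, so counting keys by word gives the number of books that contain it.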