hadoop-filesystem
There are 102 repositories under the hadoop-filesystem topic.
H1B_VisaProject
This repository contains an H1B visa applicants data analysis project/case study using Hadoop, undertaken during training at NIIT. MapReduce, Hive, Pig, Sqoop, and shell scripting are the technologies used.
quickorc
An easy way to write Java objects to Apache ORC files.
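quickorc's own API isn't reproduced here; as a reference point, a minimal sketch of the plain ORC core Java writer it would wrap (schema, file name, and values are illustrative):

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class OrcWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Schema for the rows to write (illustrative).
        TypeDescription schema = TypeDescription.fromString("struct<name:string,age:int>");
        Writer writer = OrcFile.createWriter(new Path("people.orc"),
                OrcFile.writerOptions(conf).setSchema(schema));

        VectorizedRowBatch batch = schema.createRowBatch();
        BytesColumnVector name = (BytesColumnVector) batch.cols[0];
        LongColumnVector age = (LongColumnVector) batch.cols[1];

        String[] names = {"alice", "bob"};
        int[] ages = {30, 25};
        for (int i = 0; i < names.length; i++) {
            int row = batch.size++;
            name.setVal(row, names[i].getBytes(StandardCharsets.UTF_8));
            age.vector[row] = ages[i];
            if (batch.size == batch.getMaxSize()) { // flush a full batch
                writer.addRowBatch(batch);
                batch.reset();
            }
        }
        if (batch.size > 0) writer.addRowBatch(batch); // flush the remainder
        writer.close();
    }
}
```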
Project_1-Spark-using-Scala-API-
Problem statement: compute the revenue and number of orders from order_items on a daily basis.
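A minimal Spark sketch of that computation (in Java, not the repo's Scala): it assumes retail_db-style inputs where order_date lives in orders and order_item_subtotal in order_items, with hypothetical HDFS paths:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.*;

public class DailyRevenue {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("DailyRevenue").getOrCreate();

        // Assumed schemas: orders(order_id, order_date, ...),
        // order_items(order_item_order_id, order_item_subtotal, ...).
        Dataset<Row> orders = spark.read().option("header", "true")
                .option("inferSchema", "true").csv("hdfs:///data/retail_db/orders");
        Dataset<Row> items = spark.read().option("header", "true")
                .option("inferSchema", "true").csv("hdfs:///data/retail_db/order_items");

        // Join, then aggregate revenue and distinct order count per day.
        Dataset<Row> daily = orders
                .join(items, orders.col("order_id").equalTo(items.col("order_item_order_id")))
                .groupBy(col("order_date"))
                .agg(round(sum("order_item_subtotal"), 2).alias("revenue"),
                     countDistinct("order_id").alias("num_orders"))
                .orderBy(col("order_date"));

        daily.show(10, false);
        spark.stop();
    }
}
```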
Airbnb-Big-Data-Management
Develops an Airbnb database and a pipeline using MongoDB and a Hadoop architecture to ease managing, loading, processing, querying, and analyzing Airbnb data by location.
scala-dfs-lib
DFS-Lib is a Scala-flavoured API over the Hadoop Java FileSystem API.
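DFS-Lib's Scala surface isn't shown here, but the underlying Hadoop Java FileSystem API it wraps looks roughly like this minimal sketch (paths are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsExample {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path dir = new Path("/tmp/dfs-lib-demo"); // hypothetical path
        fs.mkdirs(dir);

        // List the filesystem root.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath() + " " + status.getLen());
        }

        fs.delete(dir, true); // recursive delete
        fs.close();
    }
}
```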
BIG-DATA-HADOOP-MAPREDUCE-PROJECT
A Hadoop MapReduce implementation of an average letter-count program for three languages (English, French, Spanish), with the results compared using Python matplotlib.
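The counting core of such a project is a standard letter-frequency MapReduce job; a minimal sketch follows (the per-language averaging and the matplotlib comparison are out of scope here):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LetterCount {
    public static class LetterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text letter = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Emit (letter, 1) for every alphabetic character in the line.
            for (char c : value.toString().toLowerCase().toCharArray()) {
                if (Character.isLetter(c)) {
                    letter.set(String.valueOf(c));
                    ctx.write(letter, ONE);
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "letter count");
        job.setJarByClass(LetterCount.class);
        job.setMapperClass(LetterMapper.class);
        job.setCombinerClass(SumReducer.class); // sums are associative, so reuse as combiner
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```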
Flume-Service
Collecting tweets with the Flume service and analyzing them.
UK-Climate-Analysis
The aim is to give users the flexibility to develop their own hypotheses about climate, validate whether they hold, and project a forecast using a machine learning algorithm.
mastering-spark
Mastering Spark.
SearchEngine
A search engine implemented with Hadoop MapReduce using TF-IDF.
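For reference, one common TF-IDF formulation (among several variants) that such an engine computes per term and document; the method below is a hypothetical helper, not code from the repo:

```java
/**
 * TF-IDF score, assuming normalized term frequency and log-scaled inverse
 * document frequency; docsContainingTerm must be > 0 (the term occurs somewhere).
 */
static double tfIdf(int termCountInDoc, int docLength, long totalDocs, long docsContainingTerm) {
    double tf = (double) termCountInDoc / docLength;          // term frequency in this document
    double idf = Math.log((double) totalDocs / docsContainingTerm); // rarity across the corpus
    return tf * idf;
}
```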
HadoopEcosystem
The Hadoop ecosystem.
ApacheHadoop
Exercise files for Apache Hadoop Big Data Training
EEG_ClientGUI
A Java Swing GUI for building EEG data analysis workflows
hdfs-secure-erase
Secure Erase utility for HDFS
Hadoop
Hadoop and Hive fundamental commands
hadoop-crud-api
A Java API for interacting with the Hadoop Distributed File System (HDFS). The API provides functionality for reading and writing data in HDFS.
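A minimal sketch of such read/write operations with the standard HDFS Java client (file path and contents are illustrative):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/demo.txt"); // hypothetical path

        // Write: create() returns an FSDataOutputStream (true = overwrite).
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read: open() returns an FSDataInputStream.
        try (FSDataInputStream in = fs.open(file);
             BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}
```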
COVID-19-MapReduce-project
Design and implementation of MapReduce jobs used to analyze a COVID-19 dataset created by Our World in Data.
Hadoop
Work on Hadoop file streaming.
Insurance_marketplace_analytics
The project aims to help consumers find the most suitable health insurance plans from the available pool. It uses Hadoop and HiveQL to store, process, and analyze large amounts of health insurance marketplace data, with a reported 40% increase in data processing efficiency.
BLM5127_Big_Data_Analytics
Average temperature computation with a Hadoop mapper and reducer.
Ethereum-analysis
Ethereum (ETH) analysis for the QMUL Big Data Processing module, intended to promote analysis of data retrieved via big data processing.
WebHDFSClient
Big data project: a web client for HDFS that runs in the terminal and can manipulate both local and Hadoop storage.
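A client like this typically talks to the WebHDFS REST API; a minimal Java sketch of a directory-listing call (host, path, and user are assumptions; 9870 is the default NameNode HTTP port in Hadoop 3):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsList {
    public static void main(String[] args) throws Exception {
        // WebHDFS LISTSTATUS call; user.name assumes simple (non-Kerberos) auth.
        URL url = new URL("http://namenode:9870/webhdfs/v1/tmp?op=LISTSTATUS&user.name=hadoop");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON FileStatuses response
            }
        }
        conn.disconnect();
    }
}
```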
terraform-azurerm-hdinsight
Terraform module to create Azure HDInsight, a managed, full-spectrum, open-source analytics service. The module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP), and Apache Kafka clusters.
hadoop.3-config
My Apache Hadoop 3 config files.
hadoop-mr-example-currency
Hadoop MapReduce: reads currency.txt, with a driver, mapper, and reducer.
Shridoop
A simulated distributed file system.
spark_streaming_of_twitter_data
A Spark Streaming pipeline that ingests Twitter data for a particular hashtag via the Twitter API into a CSV file in the Hadoop filesystem in real time, then creates a Hive external table over the CSV file.
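A minimal Structured Streaming stand-in for the ingestion half of such a pipeline (the repo may well use DStreams and Twitter auth; here a socket source substitutes for the tweet feed, and all paths are assumptions):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class StreamToHdfsCsv {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("StreamToHdfsCsv").getOrCreate();

        // Socket source stands in for the Twitter feed (auth and hashtag filtering omitted).
        Dataset<Row> tweets = spark.readStream()
                .format("socket")
                .option("host", "localhost")
                .option("port", 9999)
                .load(); // yields a single "value" string column

        // Append each micro-batch as CSV files under an HDFS path (paths are assumptions).
        StreamingQuery query = tweets.writeStream()
                .format("csv")
                .option("path", "hdfs:///data/tweets")
                .option("checkpointLocation", "hdfs:///checkpoints/tweets")
                .outputMode("append")
                .start();

        query.awaitTermination();
    }
}
```

A Hive external table can then be declared over hdfs:///data/tweets so the landed CSV files are queryable in place.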
batch_processing_of_twitter_data
A batch-processing data pipeline that ingests Twitter data for a particular hashtag via the Twitter API into CSV files, loaded in batches into the Hadoop filesystem.
Hadoop-project-Map-Reduce-project-NCDC-data-set
Implement & Evaluate performance of MySQL, Hadoop MapReduce and Sqoop with HDFS for functions like max temperature on NCDC dataset for large data (20GB).
MapReduce
This repo contains implementations of MapReduce programs over a large text corpus in an Apache Hadoop environment | Nilufa Yeasmin | https://www.linkedin.com/in/nilufayeasmin/
apache-hbase
apache-hbase imports data from CSV files, including creating tables and fetching relevant data.
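A minimal sketch of that flow with the HBase 2.x Java client (table name, column family, and CSV layout are assumptions, not the repo's actual schema):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class CsvToHBase {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            TableName name = TableName.valueOf("people"); // hypothetical table
            if (!admin.tableExists(name)) {
                admin.createTable(TableDescriptorBuilder.newBuilder(name)
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("d"))
                        .build());
            }

            // Assume CSV rows of the form: id,name,age
            try (Table table = conn.getTable(name);
                 BufferedReader reader = new BufferedReader(new FileReader("people.csv"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] f = line.split(",");
                    Put put = new Put(Bytes.toBytes(f[0])); // row key = id
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("name"), Bytes.toBytes(f[1]));
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("age"), Bytes.toBytes(f[2]));
                    table.put(put);
                }
            }
        }
    }
}
```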
COL733-Cloud-Computing
Collection of assignments offered under COL733 - Cloud Computing by Prof. Suresh Chand Gupta
Running-a-Spark-Job-on-AWS-Cluster
When dealing with huge datasets, the code is unlikely to execute successfully on a personal desktop. You need either a locally installed clustered environment, i.e. Hadoop MapReduce, or a cloud such as AWS. Here's an example of running such a job on the AWS cloud.