hadoop-filesystem
There are 102 repositories under the hadoop-filesystem topic.
H1B_VisaProject
This repository contains an H1B visa applicants data analysis project/case study using Hadoop, undertaken during training at NIIT. MapReduce, Hive, Pig, Sqoop, and shell scripting are the technologies used.
quickorc
An easy way to write Java objects to Apache ORC files.
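quickorc's own API isn't reproduced here; as a reference point, a minimal sketch of the plain ORC core Java writer it would wrap (schema, file name, and values are illustrative):

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class OrcWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Schema for the rows to write (illustrative).
        TypeDescription schema = TypeDescription.fromString("struct<name:string,age:int>");
        Writer writer = OrcFile.createWriter(new Path("people.orc"),
                OrcFile.writerOptions(conf).setSchema(schema));

        VectorizedRowBatch batch = schema.createRowBatch();
        BytesColumnVector name = (BytesColumnVector) batch.cols[0];
        LongColumnVector age = (LongColumnVector) batch.cols[1];

        String[] names = {"alice", "bob"};
        int[] ages = {30, 25};
        for (int i = 0; i < names.length; i++) {
            int row = batch.size++;
            name.setVal(row, names[i].getBytes(StandardCharsets.UTF_8));
            age.vector[row] = ages[i];
            if (batch.size == batch.getMaxSize()) { // flush a full batch
                writer.addRowBatch(batch);
                batch.reset();
            }
        }
        if (batch.size > 0) writer.addRowBatch(batch); // flush the remainder
        writer.close();
    }
}
```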
Project_1-Spark-using-Scala-API-
Problem statement: compute the revenue and number of orders from order_items on a daily basis.
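A minimal Spark sketch of that computation (in Java, not the repo's Scala): it assumes retail_db-style inputs where order_date lives in orders and order_item_subtotal in order_items, with hypothetical HDFS paths:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.*;

public class DailyRevenue {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("DailyRevenue").getOrCreate();

        // Assumed schemas: orders(order_id, order_date, ...),
        // order_items(order_item_order_id, order_item_subtotal, ...).
        Dataset<Row> orders = spark.read().option("header", "true")
                .option("inferSchema", "true").csv("hdfs:///data/retail_db/orders");
        Dataset<Row> items = spark.read().option("header", "true")
                .option("inferSchema", "true").csv("hdfs:///data/retail_db/order_items");

        // Join, then aggregate revenue and distinct order count per day.
        Dataset<Row> daily = orders
                .join(items, orders.col("order_id").equalTo(items.col("order_item_order_id")))
                .groupBy(col("order_date"))
                .agg(round(sum("order_item_subtotal"), 2).alias("revenue"),
                     countDistinct("order_id").alias("num_orders"))
                .orderBy(col("order_date"));

        daily.show(10, false);
        spark.stop();
    }
}
```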
Airbnb-Big-Data-Management
Develops an Airbnb database and a pipeline using MongoDB and a Hadoop architecture to ease managing, loading, processing, querying, and analyzing Airbnb data by location.
scala-dfs-lib
DFS-Lib is a Scala-flavoured API over the Hadoop Java FileSystem API.
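DFS-Lib's Scala surface isn't shown here, but the underlying Hadoop Java FileSystem API it wraps looks roughly like this minimal sketch (paths are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsExample {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path dir = new Path("/tmp/dfs-lib-demo"); // hypothetical path
        fs.mkdirs(dir);

        // List the filesystem root.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath() + " " + status.getLen());
        }

        fs.delete(dir, true); // recursive delete
        fs.close();
    }
}
```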
BIG-DATA-HADOOP-MAPREDUCE-PROJECT
A Hadoop MapReduce implementation of an average letter-count program for three languages (English, French, Spanish), with the results compared using Python matplotlib.
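The counting core of such a project is a standard letter-frequency MapReduce job; a minimal sketch follows (the per-language averaging and the matplotlib comparison are out of scope here):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LetterCount {
    public static class LetterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text letter = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Emit (letter, 1) for every alphabetic character in the line.
            for (char c : value.toString().toLowerCase().toCharArray()) {
                if (Character.isLetter(c)) {
                    letter.set(String.valueOf(c));
                    ctx.write(letter, ONE);
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "letter count");
        job.setJarByClass(LetterCount.class);
        job.setMapperClass(LetterMapper.class);
        job.setCombinerClass(SumReducer.class); // sums are associative, so reuse as combiner
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```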
Flume-Service
Collecting tweets with the Flume service and analyzing them.
UK-Climate-Analysis
The aim is to give users the flexibility to develop their own hypotheses about climate, validate whether they hold, and project a forecast using a machine learning algorithm.
mastering-spark
Mastering Spark.
SearchEngine
A search engine implemented with Hadoop MapReduce using TF-IDF.
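For reference, one common TF-IDF formulation (among several variants) that such an engine computes per term and document; the method below is a hypothetical helper, not code from the repo:

```java
/**
 * TF-IDF score, assuming normalized term frequency and log-scaled inverse
 * document frequency; docsContainingTerm must be > 0 (the term occurs somewhere).
 */
static double tfIdf(int termCountInDoc, int docLength, long totalDocs, long docsContainingTerm) {
    double tf = (double) termCountInDoc / docLength;          // term frequency in this document
    double idf = Math.log((double) totalDocs / docsContainingTerm); // rarity across the corpus
    return tf * idf;
}
```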
HadoopEcosystem
The Hadoop ecosystem.
ApacheHadoop
Exercise files for Apache Hadoop Big Data Training
EEG_ClientGUI
A Java Swing GUI for building EEG data analysis workflows
hdfs-secure-erase
Secure Erase utility for HDFS
Hadoop
Hadoop and Hive fundamental commands
hadoop-crud-api
A Java API for interacting with the Hadoop Distributed File System (HDFS). The API provides functionality for reading and writing data in HDFS.
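A minimal sketch of such read/write operations with the standard HDFS Java client (file path and contents are illustrative):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/demo.txt"); // hypothetical path

        // Write: create() returns an FSDataOutputStream (true = overwrite).
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read: open() returns an FSDataInputStream.
        try (FSDataInputStream in = fs.open(file);
             BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}
```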
COVID-19-MapReduce-project
Design and implementation of MapReduce jobs used to analyze a COVID-19 dataset created by Our World in Data.
Hadoop
Work on Hadoop file streaming.
Insurance_marketplace_analytics
The project aims to help consumers find the most suitable health insurance plans from the available pool. It uses Hadoop and HiveQL to store, process, and analyze large amounts of health insurance marketplace data, with a reported 40% increase in data processing efficiency.
BLM5127_Big_Data_Analytics
Average temperature computation with a Hadoop mapper and reducer.
Ethereum-analysis
Ethereum (ETH) analysis for the QMUL Big Data Processing module, intended to promote analysis of data retrieved via big data processing.
WebHDFSClient
Big data project: a web client for HDFS that runs in the terminal and can manipulate both local and Hadoop storage.
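A client like this typically talks to the WebHDFS REST API; a minimal Java sketch of a directory-listing call (host, path, and user are assumptions; 9870 is the default NameNode HTTP port in Hadoop 3):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsList {
    public static void main(String[] args) throws Exception {
        // WebHDFS LISTSTATUS call; user.name assumes simple (non-Kerberos) auth.
        URL url = new URL("http://namenode:9870/webhdfs/v1/tmp?op=LISTSTATUS&user.name=hadoop");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON FileStatuses response
            }
        }
        conn.disconnect();
    }
}
```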
terraform-azurerm-hdinsight
Terraform module to create Azure HDInsight, a managed, full-spectrum, open-source analytics service. The module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP), and Apache Kafka clusters.
hadoop.3-config
My Apache Hadoop 3 config files.
hadoop-mr-example-currency
Hadoop MapReduce: reads currency.txt, with a driver, mapper, and reducer.
Shridoop
A simulated distributed file system.
spark_streaming_of_twitter_data
A Spark Streaming pipeline that ingests Twitter data for a particular hashtag via the Twitter API into a CSV file in the Hadoop filesystem in real time, then creates a Hive external table over the CSV file.
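A minimal Structured Streaming stand-in for the ingestion half of such a pipeline (the repo may well use DStreams and Twitter auth; here a socket source substitutes for the tweet feed, and all paths are assumptions):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class StreamToHdfsCsv {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("StreamToHdfsCsv").getOrCreate();

        // Socket source stands in for the Twitter feed (auth and hashtag filtering omitted).
        Dataset<Row> tweets = spark.readStream()
                .format("socket")
                .option("host", "localhost")
                .option("port", 9999)
                .load(); // yields a single "value" string column

        // Append each micro-batch as CSV files under an HDFS path (paths are assumptions).
        StreamingQuery query = tweets.writeStream()
                .format("csv")
                .option("path", "hdfs:///data/tweets")
                .option("checkpointLocation", "hdfs:///checkpoints/tweets")
                .outputMode("append")
                .start();

        query.awaitTermination();
    }
}
```

A Hive external table can then be declared over hdfs:///data/tweets so the landed CSV files are queryable in place.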
batch_processing_of_twitter_data
A batch-processing data pipeline that ingests Twitter data for a particular hashtag via the Twitter API into CSV files, loaded in batches into the Hadoop filesystem.
Hadoop-project-Map-Reduce-project-NCDC-data-set
Implement & Evaluate performance of MySQL, Hadoop MapReduce and Sqoop with HDFS for functions like max temperature on NCDC dataset for large data (20GB).
MapReduce
This repo contains implementations of MapReduce programs over a large text corpus in an Apache Hadoop environment | Nilufa Yeasmin | https://www.linkedin.com/in/nilufayeasmin/
apache-hbase
apache-hbase imports data from CSV files, including creating tables and fetching relevant data.
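A minimal sketch of that flow with the HBase 2.x Java client (table name, column family, and CSV layout are assumptions, not the repo's actual schema):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class CsvToHBase {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            TableName name = TableName.valueOf("people"); // hypothetical table
            if (!admin.tableExists(name)) {
                admin.createTable(TableDescriptorBuilder.newBuilder(name)
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("d"))
                        .build());
            }

            // Assume CSV rows of the form: id,name,age
            try (Table table = conn.getTable(name);
                 BufferedReader reader = new BufferedReader(new FileReader("people.csv"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] f = line.split(",");
                    Put put = new Put(Bytes.toBytes(f[0])); // row key = id
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("name"), Bytes.toBytes(f[1]));
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("age"), Bytes.toBytes(f[2]));
                    table.put(put);
                }
            }
        }
    }
}
```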
COL733-Cloud-Computing
Collection of assignments offered under COL733 - Cloud Computing by Prof. Suresh Chand Gupta
Running-a-Spark-Job-on-AWS-Cluster
When dealing with huge datasets, the code is unlikely to execute successfully on a personal desktop. You need either a locally installed clustered environment, i.e. Hadoop MapReduce, or a cloud such as AWS. Here's an example of running such a job on the AWS cloud.