hadoop-filesystem

There are 102 repositories under the hadoop-filesystem topic.

  • BigDataAnalytics

    Language:Jupyter Notebook
  • popular-baby-names

    Language:PigLatin
  • H1B_VisaProject

This repository contains the H1B Visa Applicants Data Analysis project/case study using Hadoop, undertaken during training at NIIT. MapReduce, Hive, Pig, Sqoop, and shell scripting are the technologies used.

    Language:Shell
  • quickorc

    An easy way to write Java objects to Apache ORC files.

    Language:Java
  • Project_1-Spark-using-Scala-API-

    Problem statement: get the revenue and number of orders from order_items on a daily basis.
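
    A minimal plain-Python sketch of the aggregation this problem statement asks for; the order_items row layout used here (order date, order id, item subtotal) is an assumption about the dataset, and the repo itself solves it with the Spark Scala API:

```python
from collections import defaultdict

# Hypothetical order_items rows: (order_date, order_id, item_subtotal).
# The real dataset's schema is not shown in the repo description.
order_items = [
    ("2014-01-01", 1, 299.98),
    ("2014-01-01", 2, 199.99),
    ("2014-01-01", 2, 129.99),
    ("2014-01-02", 3, 49.98),
]

def daily_revenue_and_orders(rows):
    revenue = defaultdict(float)
    orders = defaultdict(set)
    for date, order_id, subtotal in rows:
        revenue[date] += subtotal      # sum item subtotals per day
        orders[date].add(order_id)     # track distinct orders per day
    return {d: (round(revenue[d], 2), len(orders[d])) for d in revenue}

print(daily_revenue_and_orders(order_items))
# {'2014-01-01': (629.96, 2), '2014-01-02': (49.98, 1)}
```

    In Spark the same result would come from grouping order_items by date and combining a sum with a distinct count.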

  • Airbnb-Big-Data-Management

    To develop an Airbnb database and create a pipeline using MongoDB and Hadoop architecture to ease the process of managing, loading, processing, querying, and analyzing Airbnb data based on location

    Language:Jupyter Notebook
  • scala-dfs-lib

    DFS-Lib is a Scala-flavoured API over the Hadoop Java filesystem API.

    Language:Scala
  • BIG-DATA-HADOOP-MAPREDUCE-PROJECT

    Implementation of an average letter-count program for three languages (English, French, Spanish) in Hadoop MapReduce, with a comparison plotted using Python's matplotlib.

    Language:HTML
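
    The counting step behind such a program can be illustrated without a cluster. A plain-Python sketch, where the toy corpora and the exact "average alphabet count" metric (here read as each letter's share of all letters) are assumptions:

```python
from collections import Counter

# Toy stand-ins for the per-language corpora; the repo's real input
# files and its exact averaging are not shown, so this computes each
# letter's share of all letters as one plausible reading of the metric.
corpora = {
    "English": "the quick brown fox jumps over the lazy dog",
    "French":  "portez ce vieux whisky au juge blond qui fume",
    "Spanish": "el veloz murcielago hindu comia feliz cigarra",
}

def letter_shares(text):
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    return {ch: counts[ch] / len(letters) for ch in counts}

for lang, text in corpora.items():
    shares = letter_shares(text)
    print(lang, "distinct letters seen:", len(shares))
```

    The per-language dictionaries produced here are what a matplotlib bar chart would then compare side by side.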
  • Flume-Service

    Getting tweets using the Flume service and analyzing them.

  • UK-Climate-Analysis

    The aim is to give users the flexibility to develop their own hypotheses about climate, validate whether they hold, and project a forecast using machine learning algorithms.

    Language:Jupyter Notebook
  • mastering-spark

    mastering spark

    Language:Java
  • SearchEngine

    A search engine implemented with Hadoop MapReduce using TF-IDF.

    Language:Java
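
    The ranking idea can be sketched in a few lines of plain Python; the toy documents and the specific TF-IDF variant (raw term frequency, unsmoothed idf) are assumptions, not the repo's exact Java implementation:

```python
import math

# Three toy documents standing in for the crawled corpus.
docs = {
    "d1": "hadoop stores data in hdfs",
    "d2": "mapreduce processes data in hadoop",
    "d3": "spark can also read hdfs data",
}

def tf_idf(term, doc_id):
    words = docs[doc_id].split()
    tf = words.count(term) / len(words)          # term frequency
    df = sum(1 for d in docs.values() if term in d.split())
    idf = math.log(len(docs) / df) if df else 0.0  # inverse document freq.
    return tf * idf

def search(query):
    # Score every document as the sum of its query-term TF-IDF weights.
    scores = {d: sum(tf_idf(t, d) for t in query.split()) for d in docs}
    return sorted(scores, key=scores.get, reverse=True)

print(search("hdfs"))  # documents mentioning "hdfs" rank first
```

    In the MapReduce version, the df counts come from one job over the corpus and the per-document tf weights from another, joined at query time.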
  • HadoopEcosystem

    The Hadoop ecosystem.

    Language:JavaScript
  • ApacheHadoop

    Exercise files for Apache Hadoop Big Data Training

  • EEG_ClientGUI

    A Java Swing GUI for building EEG data analysis workflows

    Language:Java
  • hdfs-secure-erase

    Secure Erase utility for HDFS

    Language:Java
  • Hadoop

    Hadoop and Hive fundamental commands

    Language:Shell
  • hadoop-crud-api

    A Java API for interacting with the Hadoop Distributed File System (HDFS). It provides functionality for reading and writing data in HDFS.

    Language:Java
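
    Such read/write calls ultimately map onto HDFS's WebHDFS REST endpoints. A sketch of how those request URLs are formed; the namenode hostname is a placeholder, and 9870 (the Hadoop 3 namenode HTTP default) is an assumption about the deployment:

```python
# Build WebHDFS request URLs for basic CRUD operations against HDFS.
# The host is hypothetical; 9870 is the Hadoop 3 namenode HTTP default.
NAMENODE = "http://namenode.example.com:9870"

def webhdfs_url(path, op, **params):
    # WebHDFS paths are rooted under /webhdfs/v1; `op` selects the
    # action, e.g. CREATE (write), OPEN (read), DELETE, LISTSTATUS.
    query = "&".join([f"op={op}"] + [f"{k}={v}" for k, v in params.items()])
    return f"{NAMENODE}/webhdfs/v1{path}?{query}"

print(webhdfs_url("/user/data/file.txt", "OPEN"))
print(webhdfs_url("/user/data/file.txt", "CREATE", overwrite="true"))
```

    The Java API in the repo presumably wraps the equivalent `FileSystem.open`/`FileSystem.create` calls rather than raw HTTP, but the operations correspond one to one.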
  • COVID-19-MapReduce-project

    Design and implementation of different MapReduce jobs used to analyze a COVID-19 dataset created by Our World in Data.

    Language:Java
  • Hadoop

    Worked on Hadoop file streaming

    Language:Python
  • Insurance_marketplace_analytics

    The project aimed to help consumers find the most suitable health insurance plans among the available pool. Hadoop and HiveQL were used to store, process, and analyze large amounts of health insurance marketplace data, resulting in a 40% increase in data processing efficiency.

  • BLM5127_Big_Data_Analytics

    Average Temperature - Hadoop - Mapper - Reducer

    Language:Scala
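
    The mapper/reducer pair for an average-temperature job can be simulated in-process. A plain-Python sketch (the repo itself is in Scala, and the station-id/temperature record layout here is an assumption about the input):

```python
from collections import defaultdict

# In-process simulation of the mapper/reducer pair: the mapper emits
# (station, temperature) pairs and the reducer averages per key.
records = [
    "st01,12.5",
    "st01,13.5",
    "st02,-3.0",
    "st02,1.0",
]

def mapper(line):
    station, temp = line.split(",")
    yield station, float(temp)

def reducer(key, values):
    return key, sum(values) / len(values)

def run_job(lines):
    groups = defaultdict(list)          # shuffle: group values by key
    for line in lines:
        for k, v in mapper(line):
            groups[k].append(v)
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))

print(run_job(records))  # {'st01': 13.0, 'st02': -1.0}
```

    The real job differs only in scale: Hadoop performs the grouping step across the cluster instead of in a local dictionary.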
  • Ethereum-analysis

    ETH analysis using big data for the QMUL Big Data Processing module. Intended to promote analysis of data retrieved via big data processing

    Language:Jupyter Notebook
  • WebHDFSClient

    Big data project: a web client for HDFS that works in the terminal and can manipulate both local and Hadoop storage.

    Language:Python
  • terraform-azurerm-hdinsight

    Terraform module to create Azure HDInsight, a managed, full-spectrum, open-source analytics service. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP), and Apache Kafka clusters.

    Language:HCL
  • hadoop.3-config

    My Apache Hadoop 3 config files.

    Language:Shell
  • hadoop-mr-example-currency

    A Hadoop MapReduce example that reads currency.txt, with driver, mapper, and reducer classes.

    Language:Java
  • Shridoop

    A simulated Distributed File-System

    Language:Java
  • spark_streaming_of_twitter_data

    A Spark Streaming pipeline that ingests Twitter data for a particular hashtag via the Twitter API into a CSV file in the Hadoop filesystem in real time, then creates a Hive external table over the CSV file.

    Language:Scala
  • batch_processing_of_twitter_data

    A batch-processing data pipeline that ingests Twitter data for a particular hashtag via the Twitter API into CSV files in the Hadoop filesystem in batches.

    Language:Scala
  • Hadoop-project-Map-Reduce-project-NCDC-data-set

    Implement and evaluate the performance of MySQL, Hadoop MapReduce, and Sqoop with HDFS for functions like maximum temperature on a large (20 GB) NCDC dataset.

    Language:Java
  • MapReduce

    This repo contains implementations of MapReduce programs over a large text corpus in an Apache Hadoop environment. | Nilufa Yeasmin | https://www.linkedin.com/in/nilufayeasmin/

    Language:CSS
  • apache-hbase

    apache-hbase imports data from CSV files, including creating tables and fetching relevant data.

    Language:Java
  • COL733-Cloud-Computing

    Collection of assignments offered under COL733 - Cloud Computing by Prof. Suresh Chand Gupta

    Language:Python
  • Running-a-Spark-Job-on-AWS-Cluster

    When dealing with huge datasets, it is quite unlikely that the code will execute successfully on your personal desktop. You need either a locally installed clustered environment, i.e. Hadoop MapReduce, or a cloud such as AWS. Here's an example of running such a job on the AWS cloud.

    Language:Python