apache-hadoop

There are 85 repositories under the apache-hadoop topic.

mahmoudparsian/data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Language:Java1.1k 126 26659
mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Language:HTML161 26 0143
tencentyun/hadoop-cos
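hadoop-cos (the CosN file system) provides integration support for big data computing frameworks such as Apache Hadoop, Spark, and Tez, so that data stored on Tencent Cloud COS can be read and written just like data on HDFS. It also supports serving as Deep Storage for query and analysis engines such as Druid.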
Language:Java91 40 2154
s911415/apache-hadoop-3.1.0-winutils
HADOOP 3.1.0 winutils
Language:Batchfile76 1 0102
PBWebMedia/yarn-prometheus-exporter
Export Hadoop YARN (resource-manager) metrics in prometheus format
Language:Go56 2 320
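For readers new to the Prometheus format, here is a minimal sketch of what such an export involves. It assumes the metrics have already been fetched as the `clusterMetrics` JSON object returned by the ResourceManager REST endpoint `/ws/v1/cluster/metrics`. The `yarn_cluster_` prefix and the helper names are hypothetical and are not taken from yarn-prometheus-exporter (which is written in Go). For simplicity, every numeric field is exported as a gauge, although cumulative fields such as submitted-application counts would more properly be counters.

```python
"""Sketch: render YARN ResourceManager cluster metrics in the Prometheus
text exposition format.

Assumption: `cluster_metrics` is the dict under the "clusterMetrics" key of
the /ws/v1/cluster/metrics response. The "yarn_cluster_" prefix is a
hypothetical naming choice, not the exporter's actual metric names.
"""
import re


def to_snake_case(name: str) -> str:
    # "appsRunning" -> "apps_running", "totalMB" -> "total_mb"
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


def render_prometheus(cluster_metrics: dict, prefix: str = "yarn_cluster_") -> str:
    lines = []
    for key in sorted(cluster_metrics):
        value = cluster_metrics[key]
        # Only numeric fields become samples; bool is excluded explicitly
        # because it is a subclass of int in Python.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        name = prefix + to_snake_case(key)
        lines.append(f"# HELP {name} YARN ResourceManager metric {key}")
        lines.append(f"# TYPE {name} gauge")  # simplification: all gauges
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = {"appsRunning": 3, "activeNodes": 5, "totalMB": 16384}
    print(render_prometheus(sample), end="")
```

In a real exporter this text would be served over HTTP at a `/metrics` path for Prometheus to scrape.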
mohammadtavakoli78/Cloud-Computing
Projects for a Cloud Computing course
Language:Python11 1 02
realtimedatalake/hive-metastore-docker
Containerized Apache Hive Metastore for horizontally scalable Hive Metastore deployments
Language:Dockerfile9 1 09
Guru107/hadoop-small-files-merger
A Spark application to merge small files on Hadoop
Language:Scala8 2 23
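One way to think about merging small files is as a packing problem: group files so that each merged output stays close to, but under, a target size. The sketch below illustrates that planning step with first-fit decreasing bin packing. The 128 MiB target (a common HDFS block size) and the function name are assumptions for illustration; this is not the algorithm used by hadoop-small-files-merger.

```python
"""Sketch: plan groups of small files to merge, keeping each group's total
size at or under a target. Uses first-fit decreasing bin packing, an
illustrative choice rather than the merger project's actual method.
"""
from typing import Dict, List

TARGET_BYTES = 128 * 1024 * 1024  # assumed target size for a merged file


def plan_merge_groups(file_sizes: Dict[str, int], target: int = TARGET_BYTES) -> List[List[str]]:
    groups: List[List[str]] = []
    group_sizes: List[int] = []
    # Largest files first, so big files claim groups before small ones fill gaps;
    # ties are broken by path to keep the plan deterministic.
    for path, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        for i, used in enumerate(group_sizes):
            if used + size <= target:
                groups[i].append(path)
                group_sizes[i] += size
                break
        else:
            # No existing group has room, so start a new one
            # (a file larger than the target ends up alone in its group).
            groups.append([path])
            group_sizes.append(size)
    return groups


if __name__ == "__main__":
    mib = 1024 * 1024
    sizes = {"a": 100 * mib, "b": 60 * mib, "c": 30 * mib, "d": 20 * mib, "e": 200 * mib}
    print(plan_merge_groups(sizes))  # [['e'], ['a', 'd'], ['b', 'c']]
```

Each planned group could then be read and rewritten as a single file, for example with Spark's `DataFrame.coalesce(1)` before writing.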
Coursal/Hadoop-Examples
Some simple, kinda introductory projects based on Apache Hadoop to be used as guides in order to make the MapReduce model look less weird or boring.
Language:Java6 1 11
jagdish4501/Network-intrusion-Detection
This repository provides a guide to preprocess and analyze the network intrusion data set using NumPy, Pandas, and matplotlib, and implement a random forest classifier machine learning model using Scikit-learn.
Language:Jupyter Notebook6 1 00
kowaalczyk/spark-minimal-algorithms
A Python implementation of Minimal MapReduce Algorithms for Apache Spark
Language:Python5 2 00
RBC-DSAI-IITM/DCEIL
A fast, scalable and distributed community detection algorithm based on CEIL scoring function.
Language:Scala5 4 03
bdoepf/aws-emr-prometheus
Language:HCL4 1 11
chriskery/hadoop-operator
Kubernetes operator for managing the lifecycle of Apache Hadoop YARN tasks on Kubernetes.
Language:Go4 1 01
haodemon/HadoopStreaming
Set of Input Formats for Hadoop Streaming
Language:Java4 1 00
nghoanglongde/spark-cluster-with-docker
An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster using Docker
Language:Shell4 1 02
Ren294/ECommerce-Insights-LakeHoues
This repository showcases a Medallion Architecture Data Lakehouse designed for both batch and real-time processing of e-commerce and marketing data. It supports comprehensive data analysis, reporting, and monitoring, providing a scalable solution for deriving insights from integrated datasets.
Language:Jupyter Notebook4 1 00
whoami-anoint/EasyHadoop
Simplified Hadoop Setup and Configuration Automation
Language:Shell4 1 01
felidsche/mail-spam-filter
An email spam filter using Apache Spark’s ML library
Language:Python3 1 01
Jordan396/Giraph-1.2.0-Installation
Instructions for Installing Giraph-1.2.0
3 0 00
sawadogosalif/Big-Data-Technologies
Big Data Technologies can be defined as software tools for analyzing, processing, and extracting data from extremely complex and large data sets that traditional data management tools cannot handle.
Language:Python3 1 00
Abdelhakim-gh/BigData_Project
This project aims to establish a data streaming pipeline with storage, processing, and visualization
Language:Python2 1 00
Coursal/Text-Sentiment-Analysis-In-Hadoop-And-Spark
The source code developed and used for my thesis of the same title, under the guidance of my supervisor, Professor Vasilis Mamalis, for the Department of Informatics and Computer Engineering of the University of West Attica.
Language:Java2 1 12
Lucass97/FlightAnalysis
This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.
Language:Jupyter Notebook2 1 01
Narius2030/Hive-DataWarehouse-Analysis
Implements a Hive data warehouse to store meaningful data and applies machine learning techniques such as clustering and regression to business problems
Language:Jupyter Notebook2 1 03
saitejavishalj/Hotspot-analysis-of-Geospatial-data
Built a large-scale distributed data processing system for streaming analytics using the Hadoop ecosystem (Apache Spark and HDFS) in the cloud for real-time spatial analytics.
Language:Scala2 1 01
surbhitawasthi/MiniProject-AadharCensusDataValidation
A small program to validate census data against Aadhar data
Language:Java2 0 00
yingzhuo/logback-flume-appender
Logback appender for Apache Flume
Language:Java2 2 00
aaqib-ahmed-nazir/Naive_Search_Engine
This repository aims to develop a basic search engine utilizing Hadoop's MapReduce framework to index and process extensive text corpora efficiently. The dataset used for this project is a subset of the English Wikipedia dump, totaling 5.2 GB in size. The project focuses on implementing a naive search algorithm to address challenges in information.
Language:Jupyter Notebook1 1 00
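To make the indexing idea concrete, the sketch below simulates, in a single process, how an inverted index maps onto the MapReduce model: the map phase emits `(term, doc_id)` pairs, the shuffle groups them by term, and the reduce phase builds a sorted posting list per term. The tokenizer, the AND-style `search` helper, and all function names are hypothetical illustrations, not code from Naive_Search_Engine, which runs on Hadoop rather than in plain Python.

```python
"""Toy single-process simulation of building an inverted index with the
MapReduce model. Names and the tokenizer are illustrative assumptions.
"""
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric words; a deliberately naive tokenizer
    return re.findall(r"[a-z0-9]+", text.lower())


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    # Emit one (term, doc_id) pair per distinct term in the document
    for term in set(tokenize(text)):
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    # Group values by key, as the framework does between map and reduce
    grouped: Dict[str, List[str]] = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    # Posting list: sorted, de-duplicated document ids for the term
    return term, sorted(set(doc_ids))


def build_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(t, ids) for t, ids in shuffle(pairs).items())


def search(index: Dict[str, List[str]], query: str) -> List[str]:
    # Naive AND query: documents that contain every query term
    terms = tokenize(query)
    if not terms:
        return []
    result: Set[str] = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    docs = {
        "d1": "Hadoop stores data in HDFS",
        "d2": "Spark can read data from HDFS",
        "d3": "MapReduce jobs run on Hadoop",
    }
    index = build_index(docs)
    print(index["hdfs"])                 # ['d1', 'd2']
    print(search(index, "hadoop data"))  # ['d1']
```

On a real cluster, the map and reduce functions would run as separate tasks and the framework would perform the shuffle across machines.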
Abdelhakim-gh/Spark_MinProject
The goal of this project is to learn data processing with Spark through practical examples on datasets, and to practice programming in Scala.
Language:HTML1 1 00
bdbao/Hadoop-VM
This project sets up a Hadoop (v3.2.3) cluster on a virtual machine (Multipass) on macOS. It includes instructions for configuring HDFS, YARN, and uploading files via command-line and web interface.
Language:Shell1 1 0
carlosemsantana/docker-hadoop
Preparation of a development and testing environment for Apache Hadoop.
1 1 00
esakik/data-engineering-essentials
Samples related to data engineering, e.g., Spark, Embulk, and Airflow.
Language:Python1 1 01
TrentBrunson/TrentBrunson.github.io
My portfolio | under development
Language:HTML1 1 0
Trisha11r/covid_data_analysis_mapreduce
COVID-19 data analysis with MapReduce
Language:Java1 1 00
tspannhw/links
Links
Language:Scala1 2 01