hadoop-hdfs

There are 303 repositories under the hadoop-hdfs topic.

seaweedfs/seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
Language: Go 23.3k 537 2.9k 2.3k
OBenner/data-engineering-interview-questions
More than 2,000 data engineering interview questions.
1.2k 21 2423
Morphl-AI/MorphL-Community-Edition
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
Language: Python 262 35 036
linkedin/dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Language: Java 131 18 4834
IBM/sparksql-for-hbase
Learn how to use Spark SQL and HSpark connector package to create / query data tables that reside in HBase region servers
69 31 727
groda/big_data
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Language: Jupyter Notebook 67 4 226
vim89/datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Language: Python 53 6 137
maniram-yadav/Big_DataHadoop_Projects
Big data projects implemented by Maniram Yadav
Language: PigLatin 51 4 036
AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka
Data Engineering Project with Hadoop HDFS and Kafka
Language: Python 38 2 07
jarlor/TravelWebsite_BigDataAnalysis
Big data analysis of a travel website (partial Ctrip data): a Hadoop course project (undergraduate coursework level)
Language: Java 32 1 21
hokstack/hok-helm
HokStack - Run Hadoop Stack on Kubernetes
Language: Shell 24 4 88
hundredlabs/console
Open source data infrastructure platform. Designed for developers, built for speed.
Language: TypeScript 22 1 44
hadoop-sandbox/hadoop-sandbox
A fully functional Hadoop YARN cluster as a docker-compose deployment.
Language: Shell 17 3 15
SepehrImanian/ansible-hadoop-hdfs
Ansible playbook for setting up Hadoop HDFS
Language: Jinja 17 2 02
lucas91batista/twitter-hashtag-graph
Twitter + Flume + Hadoop (HDFS, MapReduce) + Neo4j + Python
Language: JavaScript 15 4 00
pfisterer/apache-hadoop-helm
Helm chart for Apache Hadoop using multi-arch docker images
Language: Dockerfile 15 3 012
PChou/marayarn
Marathon on YARN
Language: Java 13 3 07
alagrede/HdfsClient
A Java HDFS client example and a full Kerberos example for calling Hadoop commands directly from Java code or on your local machine.
Language: Java 12 4 29
waltherg/distributable_docker_sql_on_hadoop
Toy Hadoop cluster combining various SQL-on-Hadoop variants
Language: Shell 12 5 04
Areesha-Tahir/Hadoop-MapReduce-Sentiment-Analysis-Through-Keywords
A MapReduce program to conduct sentiment analysis of a keyword from a list of comments.
Language: Java 11 1 00
jodth07/hadoop-installation
Instructions on setting up Hadoop, HDFS, Java, sbt, Kafka, Scala, Spark, and Flume on Ubuntu 18.04
Language: Shell 8 2 015
leibniz21c/mammoth
Mammoth is a container-based log analyzer for Hadoop distributed systems. Sponsored by Mantech and Naver Cloud Platform.
Language: Dart 8 1 45
LMAPcoder/Hadoop-on-Colab
Installation and configuration of Hadoop on Google Colaboratory
Language: Jupyter Notebook 7 1 06
mgarralda/hadoop-spark-cluster
Repository containing Docker images for creating a Spark cluster on Hadoop YARN.
Language: Dockerfile 7 2 03
aadishgoel/Hadoop-Codes
A neat and handy place for all Hadoop code
Language: Java 6 0 03
HxnDev/Hadoop-MapReduce-to-Analyze-Sentiment-of-Keyword
In this task, we had to write a MapReduce program to analyze the sentiment of a keyword from a list of comments. This was done using Hadoop HDFS.
Language: Java 6 1 0
Mahmoud-nfz/football-big-data
This is a comprehensive solution for real-time football analytics, leveraging Apache Spark running on YARN for both streaming and batch processing, Hadoop HDFS for distributed storage, Kafka for real-time data ingestion, RethinkDB for live data updates, a custom-built search engine, and Next.js for data visualization.
Language: TypeScript 6 1 92
prabal03/python-automation-in-linux
Python automation in Linux
Language: Python 6 1 01
Ren294/Covid-Data-Process
This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.
Language: Shell 6 1 00
berksudan/Distributed-Environment-Installation-Guide
Install Hadoop, HDFS, YARN, and Spark on 3 Ubuntu 18.04 machines
5 1 00
briandi26/Machine-Learning-for-Forest-Fire-Prediction
Machine Learning for Forest Fire Prediction using the Hadoop ecosystem and Spark tools (PySpark)
Language: Python 5 1 03
HxnDev/Finding-Average-Temperature-of-Each-Year-using-Hadoop-HDFS
In this task, we had to calculate the average temperature for each year from the given dataset using Hadoop HDFS. We had to create a MapReduce function to perform this task.
Language: Java 5 1 0
Ren294/Log-Analysis-Project
This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.
Language: Python 5 2 00
Reza-Marzban/Vehicle-Fuel-Hadoop-MapReduce
Vehicle Fuel Hadoop MapReduce
Language: Java 5 1 04
prithvianilk/rdfs
An attempt to make a reliable, distributed file system inspired by HDFS
Language: Java 4 1 32
waikeungt/hdfs-spring-boot-starter
A starter for quickly using HDFS with Spring Boot
Language: Java 4 1 00
