hadoop-hdfs
There are 303 repositories under the hadoop-hdfs topic.
seaweedfs/seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
OBenner/data-engineering-interview-questions
More than 2000 data engineering interview questions.
Morphl-AI/MorphL-Community-Edition
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
linkedin/dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
IBM/sparksql-for-hbase
Learn how to use Spark SQL and HSpark connector package to create / query data tables that reside in HBase region servers
groda/big_data
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
vim89/datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
maniram-yadav/Big_DataHadoop_Projects
Big data projects implemented by Maniram Yadav
AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka
Data Engineering Project with Hadoop HDFS and Kafka
jarlor/TravelWebsite_BigDataAnalysis
Big data analysis of a travel website (partial Ctrip data); a Hadoop course project (undergraduate coursework level)
hokstack/hok-helm
HokStack - Run Hadoop Stack on Kubernetes
hundredlabs/console
Open source data infrastructure platform. Designed for developers, built for speed.
hadoop-sandbox/hadoop-sandbox
A fully functional Hadoop YARN cluster as a docker-compose deployment.
SepehrImanian/ansible-hadoop-hdfs
Ansible playbook for setting up Hadoop HDFS
lucas91batista/twitter-hashtag-graph
Twitter + Flume + Hadoop (HDFS, MapReduce) + Neo4j + Python
pfisterer/apache-hadoop-helm
Helm chart for Apache Hadoop using multi-arch docker images
PChou/marayarn
Marathon on YARN
alagrede/HdfsClient
A Java HDFS client example and a full Kerberos example for calling Hadoop commands directly from Java code or on your local machine.
waltherg/distributable_docker_sql_on_hadoop
Toy Hadoop cluster combining various SQL-on-Hadoop variants
Areesha-Tahir/Hadoop-MapReduce-Sentiment-Analysis-Through-Keywords
A MapReduce program to conduct sentiment analysis of a keyword from a list of comments.
jodth07/hadoop-installation
Instructions on setting up Hadoop, HDFS, Java, sbt, Kafka, Scala, Spark, and Flume on Ubuntu 18.04
leibniz21c/mammoth
Mammoth is a container-based log analyzer for Hadoop distributed systems. Sponsored by Mantech and Naver Cloud Platform.
LMAPcoder/Hadoop-on-Colab
Installation and configuration of Hadoop on Google Colaboratory
mgarralda/hadoop-spark-cluster
Repository containing Docker images for creating a Spark cluster on Hadoop YARN.
aadishgoel/Hadoop-Codes
Neat and Handy Place for all Hadoop codes
HxnDev/Hadoop-MapReduce-to-Analyze-Sentiment-of-Keyword
In this task, we had to write a MapReduce program to analyze the sentiment of a keyword from a list of comments. This was done using Hadoop HDFS.
Mahmoud-nfz/football-big-data
This is a comprehensive solution for real-time football analytics, leveraging Apache Spark on YARN for both streaming and batch processing, Hadoop HDFS for distributed storage, Kafka for real-time data ingestion, RethinkDB for live data updates, a custom-built search engine, and Next.js for data visualization.
prabal03/python-automation-in-linux
Python automation in Linux
Ren294/Covid-Data-Process
This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.
berksudan/Distributed-Environment-Installation-Guide
Install Hadoop, HDFS, YARN, and Spark on 3 Ubuntu 18.04 machines
briandi26/Machine-Learning-for-Forest-Fire-Prediction
Machine Learning for Forest Fire Prediction using the Hadoop ecosystem and Spark tools (PySpark)
HxnDev/Finding-Average-Temperature-of-Each-Year-using-Hadoop-HDFS
In this task, we had to calculate the average temperature for each year from the given dataset using Hadoop HDFS. We had to create a MapReduce function to perform this task.
Ren294/Log-Analysis-Project
This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.
Reza-Marzban/Vehicle-Fuel-Hadoop-MapReduce
Vehicle Fuel Hadoop MapReduce
prithvianilk/rdfs
An attempt to make a reliable, distributed file system inspired by HDFS
waikeungt/hdfs-spring-boot-starter
A starter for quick and convenient use of HDFS in Spring Boot