hadoop-hdfs
There are 323 repositories under the hadoop-hdfs topic.
seaweedfs/seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, xDC replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding. Enterprise version is at seaweedfs.com.
OBenner/data-engineering-interview-questions
More than 2000 data engineering interview questions.
Morphl-AI/MorphL-Community-Edition
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
linkedin/dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka
Data Engineering Project with Hadoop HDFS and Kafka
groda/big_data
Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.
IBM/sparksql-for-hbase
Learn how to use Spark SQL and HSpark connector package to create / query data tables that reside in HBase region servers
vim89/datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
maniram-yadav/Big_DataHadoop_Projects
Big data projects implemented by Maniram Yadav
jarlor/TravelWebsite_BigDataAnalysis
Big data analysis of a travel website (partial Ctrip data): Hadoop course project (undergraduate coursework level)
Smart-Shaped/chaM3Leon
By Smart Shaped s.r.l. (https://www.smartshaped.com/)
hokstack/hok-helm
HokStack - Run Hadoop Stack on Kubernetes
hadoop-sandbox/hadoop-sandbox
A fully functional Hadoop YARN cluster as a docker-compose deployment.
SepehrImanian/ansible-hadoop-hdfs
Ansible playbook for setting up Hadoop HDFS
torqbit/databox
Open source data infrastructure platform. Designed for developers, built for speed.
pfisterer/apache-hadoop-helm
Helm chart for Apache Hadoop using multi-arch docker images
lucas91batista/twitter-hashtag-graph
Twitter + Flume + Hadoop (HDFS, MapReduce) + Neo4j + Python
PChou/marayarn
Marathon on YARN
alagrede/HdfsClient
A Java HDFS client example and a full Kerberos example for calling Hadoop commands directly from Java code or on your local machine.
waltherg/distributable_docker_sql_on_hadoop
Toy Hadoop cluster combining various SQL-on-Hadoop variants
Areesha-Tahir/Hadoop-MapReduce-Sentiment-Analysis-Through-Keywords
A MapReduce program to conduct sentiment analysis of a keyword from a list of comments.
ds2-lab/LambdaFS
λFS: an elastic, high-performance, serverless-function-based metadata service for large-scale distributed file systems (ACM ASPLOS'23)
Mahmoud-nfz/football-big-data
A comprehensive solution for real-time football analytics, using Apache Spark on YARN for both streaming and batch processing, Hadoop HDFS for distributed storage, Kafka for real-time data ingestion, RethinkDB for live data updates, a custom-built search engine, and Next.js for data visualization.
Amir2244/movies-rating
"movies-rating" is a recommendation system project built on distributed frameworks, with services such as Hadoop NameNode, Hadoop DataNode, Spark Master, Spark Worker, and Redis.
jodth07/hadoop-installation
Instructions for setting up Hadoop, HDFS, Java, sbt, Kafka, Scala, Spark, and Flume on Ubuntu 18.04
leibniz21c/mammoth
Mammoth is a container-based Hadoop distributed system log analyzer. Sponsored by Mantech and Naver Cloud Platform.
mgarralda/hadoop-spark-cluster
Repository containing Docker images for creating a Spark cluster on Hadoop YARN.
LMAPcoder/Hadoop-on-Colab
Installation and configuration of Hadoop on Google Colaboratory
Ren294/Covid-Data-Process
This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.
briandi26/Machine-Learning-for-Forest-Fire-Prediction
Machine learning for forest fire prediction using the Hadoop ecosystem and Spark tools (PySpark)
hadoop-sandbox/hadoop-sandbox-images
Docker image builds for Hadoop sandbox.
HxnDev/Hadoop-MapReduce-to-Analyze-Sentiment-of-Keyword
In this task, we had to write a MapReduce program to analyze the sentiment of a keyword from a list of comments. This was done using Hadoop HDFS.
prabal03/python-automation-in-linux
Python automation in Linux
Ren294/Log-Analysis-Project
This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.
berksudan/Distributed-Environment-Installation-Guide
Install Hadoop, HDFS, YARN, and Spark on 3 Ubuntu 18.04 machines
HxnDev/Finding-Average-Temperature-of-Each-Year-using-Hadoop-HDFS
In this task, we had to calculate the average temperature for each year from the given dataset using Hadoop HDFS. We had to create a MapReduce function to perform this task.
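For readers unfamiliar with the pattern the last entry describes, here is a minimal sketch of a per-year average computed in MapReduce style. It runs in plain Python with no Hadoop dependency and is not taken from that repository. The input layout (`YYYY-MM-DD,temperature`) and all function names are assumptions made for illustration only.

```python
from collections import defaultdict


def map_record(line):
    """Map step: emit a (year, temperature) pair for one input line.

    Assumes a hypothetical 'YYYY-MM-DD,temperature' CSV layout.
    """
    date, temp = line.strip().split(",")
    return date[:4], float(temp)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for year, temp in pairs:
        groups[year].append(temp)
    return groups


def reduce_year(year, temps):
    """Reduce step: average all temperatures collected for one year."""
    return year, sum(temps) / len(temps)


def average_by_year(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (map_record(line) for line in lines if line.strip())
    return dict(reduce_year(y, t) for y, t in shuffle(pairs).items())


# Hypothetical sample records, not real data.
sample = ["2001-01-15,10.0", "2001-07-15,20.0", "2002-03-01,5.0"]
print(average_by_year(sample))  # {'2001': 15.0, '2002': 5.0}
```

On a real cluster the map and reduce functions would typically run as separate tasks (for example through Hadoop Streaming), with HDFS holding the input and output files; the in-memory `shuffle` here only imitates that grouping.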