hadoop-filesystem
There are 101 repositories under the hadoop-filesystem topic.
treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
GoogleCloudDataproc/hadoop-connectors
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
linkedin/dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
mmolimar/kafka-connect-fs
Kafka Connect FileSystem Connector
jingw/pyhdfs
Python HDFS client
longshilin/HDFS-Netdisc
A distributed cloud storage system based on Hadoop :palm_tree:
vivek2319/Learn-Hadoop-and-Spark
This repository gathers a curated list of resources to learn Hadoop for FREE.
palantir/hadoop-crypto
Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.
ExpediaGroup/datasqueeze
Hadoop utility to compact small files
pfisterer/apache-hadoop-helm
Helm chart for Apache Hadoop using multi-arch docker images
averyzhong/hdfs-over-sftp
An SFTP server that runs on top of HDFS. It is based on Apache sshd and lets you access and operate on HDFS through the SFTP protocol.
procter-gamble-oss/octopufs
The OctopuFS library helps manage cloud storage, specifically ADLS Gen2. It lets you operate on files (moving, copying, setting ACLs) in a very efficient manner. Designed to work on Databricks, but should work on any other platform as well.
waltherg/distributable_docker_sql_on_hadoop
Toy Hadoop cluster combining various SQL-on-Hadoop variants
AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka
Data Engineering Project with Hadoop HDFS and Kafka
Tapad/sbt-hadoop-oss
An sbt plugin for publishing artifacts to HDFS.
fasouto/webhdfspy
Python wrapper to access Hadoop HDFS REST API
christopherkindl/twitter-data-pipeline-using-airflow-and-apache-spark
Data pipeline to process and analyse Twitter data in a distributed fashion using Apache Spark and Airflow in an AWS environment
jazzwang/hadoop_labs
MapReduce Java Code Examples to learn Hadoop
aadishgoel/Hadoop-Codes
Neat and Handy Place for all Hadoop codes
TritonDataCenter/hadoop-manta
Hadoop Filesystem Driver for Manta
HxnDev/Finding-Average-Temperature-of-Each-Year-using-Hadoop-HDFS
In this task, we had to calculate the average temperature for each year from the given dataset using Hadoop HDFS. We had to create a MapReduce function to perform this task.
CUBigDataClass/soccer-tweet-analysis
Ingestion pipeline to analyze soccer tweets
HxnDev/Hadoop-MapReduce-to-Find-Average-Length-of-Comments
In this task, we had to find the average length of comments given in the dataset. It was done using Hadoop MapReduce and Hadoop HDFS.
Mohammed-siddiq/hadoop-XMLInputFormatWithMultipleTags
Mahout's XMLInputFormat with support for multiple input and output tags.
swan815/MyFirstHadoopYunpan
A simple cloud drive implementation based on Hadoop
tchaye59/Hadoop-Perfect-File
A fast-access container for small files
rshad/OpenCCML
Category: Cloud Computing and Machine Learning Application - Subject: A cloud platform for data processing with machine learning algorithms, built on OpenStack, using Spark for data distribution and the Hadoop Filesystem for data storage
SarahAyaz/YouTube_Data_Analysis
Analysis of YouTube data using the Hadoop MapReduce framework in Java.
alex-ber/docker-hive
Single-node Hadoop Docker image of an EMR 5.25.0 cluster, with Amazon Linux, Hadoop 2.8.5, and Hive 2.3.5
huangyueranbbc/hadoop05_pagerank
pagerank hadoop
humanbeeng/hadoop-auto-install
A small helper script that can save your valuable time during installation of Apache Hadoop.
HwaiTengTeoh/Airbnb-Big-Data-Management
Develops an Airbnb database and a pipeline using MongoDB and the Hadoop architecture to ease managing, loading, processing, querying, and analyzing Airbnb data based on location
mikeroyal/Apache-Hadoop-Guide
Apache Hadoop Guide
Niranjankumar-c/DataAnalytics_using_ClickstreamData
Case study completed as part of Big Data training from analytix labs
Rohit9314/my-hadoop
Set up a Hadoop cluster manually and automatically
samarthtambad/big-data-pl
Analysing programming languages by community characteristics on GitHub and Stack Overflow