hdfs
There are 1,027 repositories under the hdfs topic.
seaweedfs/seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, xDC replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding. Enterprise version is at seaweedfs.com.
heibaiying/BigData-Notes
A beginner's guide to big data :star:
ceph/ceph
Ceph is a distributed object, block, and file storage platform
juicedata/juicefs
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
wangzhiwubigdata/God-Of-BigData
Focused on big data study and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/HBase/Hive...
piskvorky/smart_open
Utils for streaming large files (S3, HDFS, gzip, bz2...)
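The library exposes a drop-in replacement for the builtin open() that accepts remote URIs and handles compression transparently. A minimal sketch, assuming a hypothetical S3 bucket and key:

```python
# Minimal smart_open sketch; the bucket and key are placeholders.
from smart_open import open

# open() mirrors the builtin but accepts remote URIs (s3://, hdfs://, ...)
# and decompresses .gz/.bz2 on the fly, streaming line by line.
with open("s3://example-bucket/logs/2024-01-01.log.gz", "r") as fin:
    for line in fin:
        print(line.rstrip())
```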
TileDB-Inc/TileDB
The Universal Storage Engine
collabH/bigdata-growth
A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.
water8394/BigData-Interview
:dart: :star2: [Big Data Interview Questions] A collection of big data interview questions gathered from around the web, together with the author's own answer summaries. Currently covers interview questions for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.
colinmarc/hdfs
A native Go client for HDFS
wgzhao/Addax
A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL seamlessly
spotify/snakebite
A pure Python HDFS client
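Snakebite talks to the namenode directly over Hadoop's protobuf RPC protocol, so no local Hadoop installation is needed. A minimal sketch, assuming a hypothetical namenode address (note the project dates from the Python 2 era):

```python
# Minimal snakebite sketch; the namenode host/port are placeholders.
from snakebite.client import Client

# The client speaks the namenode's RPC protocol directly.
client = Client("namenode.example.com", 8020, use_trash=False)
for entry in client.ls(["/user"]):  # ls() yields one dict per entry
    print(entry["path"])
```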
HariSekhon/DevOps-Python-tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
sunnyandgood/BigData
💎🔥 Big data study notes
Stratio/sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
lensesio/kafka-connect-ui
Deprecated - See Lenses.io Community Edition
dromara/CloudEon
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.
fabiogjardim/bigdata_docker
Big Data Ecosystem Docker
uber/storagetapper
StorageTapper is a scalable, real-time MySQL change data streaming, logical backup, and logical replication service
tirthajyoti/Spark-with-Python
Fundamentals of Spark with Python (using PySpark), with code examples
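In that spirit, a minimal PySpark sketch; the HDFS input path and the "category" column are placeholders:

```python
# Minimal PySpark sketch; the input path and column name are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("example").getOrCreate()

# Read a CSV from HDFS into a DataFrame and run a simple aggregation.
df = spark.read.csv("hdfs:///data/input.csv", header=True, inferSchema=True)
df.groupBy("category").count().show()

spark.stop()
```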
datawhalechina/juicy-bigdata
🎉🎉🐳 Datawhale's introductory tutorial on big data processing | an opening course for the big data track 🎉🎉
hegongshan/File-System-Paper
Must-read Papers for File System (FS)
Eugene-Mark/bigdata-file-viewer
A cross-platform (Windows, macOS, Linux) desktop application to view common big data binary formats like Parquet, ORC, AVRO, etc. Supports local file systems, HDFS, AWS S3, Azure Blob Storage, etc.
wradlib/wradlib
Weather radar data processing - Python package
divolte/divolte-collector
Divolte Collector
mtth/hdfs
API and command line interface for HDFS
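A minimal sketch using the library's WebHDFS client; the namenode URL, user, and paths are placeholders:

```python
# Minimal HdfsCLI sketch; URL, user, and paths are placeholders.
from hdfs import InsecureClient

# InsecureClient talks to the namenode's WebHDFS REST endpoint.
client = InsecureClient("http://namenode.example.com:9870", user="hadoop")
print(client.list("/user/hadoop"))               # directory listing
with client.read("/user/hadoop/data.csv") as reader:
    content = reader.read()                      # stream file contents
```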
RumbleDB/rumble
Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy datasets (JSON, text, CSV, Parquet, Delta...) | Data Lakehouse with Updates, Scripting, Declarative Machine Learning and more
breuner/elbencho
A distributed storage benchmark for file systems, object stores & block devices with support for GPUs
helyim/helyim
SeaweedFS implemented in pure Rust
TileDB-Inc/TileDB-Py
Python interface to the TileDB storage engine
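A minimal sketch writing and slicing a dense array; the array URI is a local placeholder (TileDB also accepts s3:// and other storage backends):

```python
# Minimal TileDB-Py sketch; the array URI is a placeholder.
import numpy as np
import tiledb

uri = "example_dense_array"
tiledb.from_numpy(uri, np.arange(12).reshape(3, 4))  # create and write

with tiledb.open(uri) as A:
    print(A[:, 1:3])  # slices come back as NumPy arrays
```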
PaddlePaddle/ElasticCTR
ElasticCTR, the PaddlePaddle elastic-computing recommendation system, is an enterprise-grade open-source recommendation solution based on Kubernetes. It combines high-accuracy CTR models continuously refined in Baidu's production scenarios, the large-scale distributed training capability of the open-source PaddlePaddle framework, and an industrial-grade elastic scheduling service for sparse parameters, letting users deploy a recommendation system in a Kubernetes environment with one click. It offers high performance, industrial-grade deployment, and an end-to-end experience, and as an open-source suite it supports further in-depth customization.
apssouza22/big-data-pipeline-lambda-arch
A hybrid Big Data pipeline architecture that combines a real-time streaming layer with a batch layer to process large datasets (Lambda Architecture)
d2iq-archive/dcos-commons
DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
marcelmay/hadoop-hdfs-fsimage-exporter
Exports Hadoop HDFS content statistics to Prometheus
megvii-research/megfile
Megvii FILE Library - work with files in Python the same way as with the standard library
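A minimal sketch of the path-agnostic smart_* helpers; the S3 URI is a placeholder:

```python
# Minimal megfile sketch; the S3 URI is a placeholder.
from megfile import smart_exists, smart_open

# The same smart_* calls work for local paths, s3://, and other protocols.
if smart_exists("s3://example-bucket/data.txt"):
    with smart_open("s3://example-bucket/data.txt", "r") as f:
        print(f.read())
```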
avast/hdfs-shell
HDFS Shell is an HDFS manipulation tool that works with the functions integrated in Hadoop DFS