hdfs
There are 1,027 repositories under the hdfs topic.
seaweedfs/seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, xDC replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding. Enterprise version is at seaweedfs.com.
heibaiying/BigData-Notes
A beginner's guide to big data :star:
ceph/ceph
Ceph is a distributed object, block, and file storage platform
juicedata/juicefs
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
wangzhiwubigdata/God-Of-BigData
Focused on big data study and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/HBase/Hive...
piskvorky/smart_open
Utils for streaming large files (S3, HDFS, gzip, bz2...)
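The library exposes a drop-in replacement for the builtin open() that accepts remote URIs and handles compression transparently. A minimal sketch, assuming a hypothetical S3 bucket and key:

```python
# Minimal smart_open sketch; the bucket and key are placeholders.
from smart_open import open

# open() mirrors the builtin but accepts remote URIs (s3://, hdfs://, ...)
# and decompresses .gz/.bz2 on the fly, streaming line by line.
with open("s3://example-bucket/logs/2024-01-01.log.gz", "r") as fin:
    for line in fin:
        print(line.rstrip())
```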
TileDB-Inc/TileDB
The Universal Storage Engine
collabH/bigdata-growth
A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.
water8394/BigData-Interview
:dart: :star2: [Big Data Interview Questions] A collection of big data interview questions gathered from around the web, together with the author's own answer summaries. Currently covers interview questions for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.
colinmarc/hdfs
A native Go client for HDFS
wgzhao/Addax
A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL seamlessly
spotify/snakebite
A pure Python HDFS client
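Snakebite talks to the namenode directly over Hadoop's protobuf RPC protocol, so no local Hadoop installation is needed. A minimal sketch, assuming a hypothetical namenode address (note the project dates from the Python 2 era):

```python
# Minimal snakebite sketch; the namenode host/port are placeholders.
from snakebite.client import Client

# The client speaks the namenode's RPC protocol directly.
client = Client("namenode.example.com", 8020, use_trash=False)
for entry in client.ls(["/user"]):  # ls() yields one dict per entry
    print(entry["path"])
```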
HariSekhon/DevOps-Python-tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
sunnyandgood/BigData
💎🔥 Big data study notes
Stratio/sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
lensesio/kafka-connect-ui
Deprecated - See Lenses.io Community Edition
dromara/CloudEon
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.
fabiogjardim/bigdata_docker
Big Data Ecosystem Docker
uber/storagetapper
StorageTapper is a scalable, real-time MySQL change data streaming, logical backup, and logical replication service
tirthajyoti/Spark-with-Python
Fundamentals of Spark with Python (using PySpark), with code examples
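In that spirit, a minimal PySpark sketch; the HDFS input path and the "category" column are placeholders:

```python
# Minimal PySpark sketch; the input path and column name are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("example").getOrCreate()

# Read a CSV from HDFS into a DataFrame and run a simple aggregation.
df = spark.read.csv("hdfs:///data/input.csv", header=True, inferSchema=True)
df.groupBy("category").count().show()

spark.stop()
```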
datawhalechina/juicy-bigdata
🎉🎉🐳 Datawhale's introductory tutorial on big data processing | an opening course for the big data track 🎉🎉
hegongshan/File-System-Paper
Must-read Papers for File System (FS)
Eugene-Mark/bigdata-file-viewer
A cross-platform (Windows, macOS, Linux) desktop application to view common big data binary formats like Parquet, ORC, AVRO, etc. Supports local file systems, HDFS, AWS S3, Azure Blob Storage, etc.
wradlib/wradlib
Weather radar data processing - Python package
divolte/divolte-collector
Divolte Collector
mtth/hdfs
API and command line interface for HDFS
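A minimal sketch using the library's WebHDFS client; the namenode URL, user, and paths are placeholders:

```python
# Minimal HdfsCLI sketch; URL, user, and paths are placeholders.
from hdfs import InsecureClient

# InsecureClient talks to the namenode's WebHDFS REST endpoint.
client = InsecureClient("http://namenode.example.com:9870", user="hadoop")
print(client.list("/user/hadoop"))               # directory listing
with client.read("/user/hadoop/data.csv") as reader:
    content = reader.read()                      # stream file contents
```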
RumbleDB/rumble
Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy datasets (JSON, text, CSV, Parquet, Delta...) | Data Lakehouse with Updates, Scripting, Declarative Machine Learning and more
breuner/elbencho
A distributed storage benchmark for file systems, object stores & block devices with support for GPUs
helyim/helyim
SeaweedFS implemented in pure Rust
TileDB-Inc/TileDB-Py
Python interface to the TileDB storage engine
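A minimal sketch writing and slicing a dense array; the array URI is a local placeholder (TileDB also accepts s3:// and other storage backends):

```python
# Minimal TileDB-Py sketch; the array URI is a placeholder.
import numpy as np
import tiledb

uri = "example_dense_array"
tiledb.from_numpy(uri, np.arange(12).reshape(3, 4))  # create and write

with tiledb.open(uri) as A:
    print(A[:, 1:3])  # slices come back as NumPy arrays
```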
PaddlePaddle/ElasticCTR
ElasticCTR, the PaddlePaddle elastic-computing recommendation system, is an enterprise-grade open-source recommendation solution based on Kubernetes. It combines high-accuracy CTR models continuously refined in Baidu's production scenarios, the large-scale distributed training capability of the open-source PaddlePaddle framework, and an industrial-grade elastic scheduling service for sparse parameters, letting users deploy a recommendation system in a Kubernetes environment with one click. It offers high performance, industrial-grade deployment, and an end-to-end experience, and as an open-source suite it supports further in-depth customization.
apssouza22/big-data-pipeline-lambda-arch
A hybrid Big Data pipeline architecture that combines a real-time streaming layer with a batch layer to process large datasets (Lambda Architecture)
d2iq-archive/dcos-commons
DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
marcelmay/hadoop-hdfs-fsimage-exporter
Exports Hadoop HDFS content statistics to Prometheus
megvii-research/megfile
Megvii FILE Library - work with files in Python the same way as with the standard library
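A minimal sketch of the path-agnostic smart_* helpers; the S3 URI is a placeholder:

```python
# Minimal megfile sketch; the S3 URI is a placeholder.
from megfile import smart_exists, smart_open

# The same smart_* calls work for local paths, s3://, and other protocols.
if smart_exists("s3://example-bucket/data.txt"):
    with smart_open("s3://example-bucket/data.txt", "r") as f:
        print(f.read())
```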
avast/hdfs-shell
HDFS Shell is an HDFS manipulation tool that works with the functions integrated in Hadoop DFS