hdfs
There are 918 repositories under the hdfs topic.
seaweedfs/seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
heibaiying/BigData-Notes
A beginner's guide to big data :star:
ceph/ceph
Ceph is a distributed object, block, and file storage platform
juicedata/juicefs
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
wangzhiwubigdata/God-Of-BigData
Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
piskvorky/smart_open
Utils for streaming large files (S3, HDFS, gzip, bz2...)
TileDB-Inc/TileDB
The Universal Storage Engine
water8394/BigData-Interview
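:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from the web, with the author's own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.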
colinmarc/hdfs
A native Go client for HDFS
collabH/bigdata-growth
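A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.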
wgzhao/Addax
Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.
spotify/snakebite
A pure Python HDFS client
HariSekhon/DevOps-Python-tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
sunnyandgood/BigData
💎🔥 Big data study notes
Stratio/sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
lensesio/kafka-connect-ui
Web tool for Kafka Connect
confluentinc/kafka-connect-hdfs
Kafka Connect HDFS connector
dromara/CloudEon
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.
fabiogjardim/bigdata_docker
Big Data Ecosystem Docker
uber/storagetapper
StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
tirthajyoti/Spark-with-Python
Fundamentals of Spark with Python (using PySpark), code examples
divolte/divolte-collector
Divolte Collector
Eugene-Mark/bigdata-file-viewer
A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
mtth/hdfs
API and command line interface for HDFS
wradlib/wradlib
Weather radar data processing - Python package
datawhalechina/juicy-bigdata
🎉🎉🐳 Datawhale tutorial: an introduction to big data processing | The opening course for the big data technology track 🎉🎉
RumbleDB/rumble
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
hegongshan/File-System-Paper
Must-read Papers for File System (FS)
PaddlePaddle/ElasticCTR
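ElasticCTR, the PaddlePaddle elastic computing recommendation system, is an open-source, enterprise-grade recommendation system solution based on Kubernetes. It combines high-accuracy CTR models continuously refined in Baidu's business scenarios, the large-scale distributed training capability of the open-source PaddlePaddle framework, and an industrial-grade elastic scheduling service for sparse parameters, helping users deploy a recommendation system on Kubernetes in one click. It offers high performance, industrial-grade deployment, and an end-to-end experience, and, as an open-source suite, it supports in-depth custom development.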
TileDB-Inc/TileDB-Py
Python interface to the TileDB storage engine
mesosphere/dcos-commons
DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
breuner/elbencho
A distributed storage benchmark for file systems, object stores & block devices with support for GPUs
avast/hdfs-shell
HDFS Shell is an HDFS manipulation tool that works with the functions integrated in Hadoop DFS
jcrist/skein
A tool and library for easily deploying applications on Apache YARN
marcelmay/hadoop-hdfs-fsimage-exporter
Exports Hadoop HDFS content statistics to Prometheus
mullerhai/HsunTzu
HDFS compress tar zip snappy gzip uncompress untar codec hadoop spark