hdfs

There are 1,027 repositories under the hdfs topic.

  • seaweedfs

    seaweedfs/seaweedfs

    SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes, scaling to billions of files. The blob store offers O(1) disk seeks and cloud tiering. The Filer supports Cloud Drive, xDC replication, Kubernetes, POSIX FUSE mount, the S3 API, an S3 Gateway, Hadoop, WebDAV, encryption, and Erasure Coding. An enterprise version is available at seaweedfs.com.

    Language:Go25.8k5303.2k2.5k
  • heibaiying/BigData-Notes

    A beginner's guide to big data

    Language:Java16.7k448424.3k
  • ceph

    ceph/ceph

    Ceph is a distributed object, block, and file storage platform

    Language:C++15.5k64206.2k
  • juicefs

    juicedata/juicefs

    JuiceFS is a distributed POSIX file system built on top of Redis and S3.

    Language:Go12.2k1131.7k1.1k
  • wangzhiwubigdata/God-Of-BigData

    Focused on big data learning and interview preparation: start your journey to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...

  • piskvorky/smart_open

    Utils for streaming large files (S3, HDFS, gzip, bz2...)

    Language:Python3.4k46415386
  • TileDB-Inc/TileDB

    The Universal Storage Engine

    Language:C++2k70951197
  • collabH/bigdata-growth

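    A big data knowledge base covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.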

    Language:Shell1.7k344384
  • water8394/BigData-Interview

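    [Big data interview questions] A collection of big data interview questions gathered online, along with my own summarized answers. It currently covers interview questions on the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.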

  • colinmarc/hdfs

    A native go client for HDFS

    Language:Go1.4k36198357
  • Addax

    wgzhao/Addax

    A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL seamlessly

    Language:Java1.3k35320318
  • spotify/snakebite

    A pure python HDFS client

    Language:Python857128132216
  • HariSekhon/DevOps-Python-tools

    80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

    Language:Python806416348
  • sunnyandgood/BigData

    💎🔥 Big data study notes

    Language:Java681302231
  • Stratio/sparta

    Real Time Analytics and Data Pipelines based on Spark Streaming

    Language:Scala528136542196
  • lensesio/kafka-connect-ui

    Deprecated - See Lenses.io Community Edition

    Language:JavaScript5152464132
  • dromara/CloudEon

    CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.

    Language:FreeMarker47913104119
  • fabiogjardim/bigdata_docker

    Big Data Ecosystem Docker

    Language:VBA423259327
  • uber/storagetapper

    StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service

    Language:Go360222866
  • Spark-with-Python

    tirthajyoti/Spark-with-Python

    Fundamentals of Spark with Python (using PySpark), code examples

    Language:Jupyter Notebook352100273
  • datawhalechina/juicy-bigdata

    🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course for the big data technology track 🎉🎉

    Language:Python3365343
  • hegongshan/File-System-Paper

    Must-read Papers for File System (FS)

  • Eugene-Mark/bigdata-file-viewer

    A cross-platform (Windows, macOS, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

    Language:Java30133155
  • wradlib/wradlib

    Weather radar data processing: a Python package

    Language:Python2882324885
  • divolte/divolte-collector

    Divolte Collector

    Language:Java2822914576
  • mtth/hdfs

    API and command line interface for HDFS

    Language:Python27414151103
  • RumbleDB/rumble

    Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy datasets (JSON, text, CSV, Parquet, Delta...) | Data Lakehouse with Updates, Scripting, Declarative Machine Learning and more

    Language:Java2272434584
  • elbencho

    breuner/elbencho

    A distributed storage benchmark for file systems, object stores & block devices with support for GPUs

    Language:C++225144528
  • helyim/helyim

    seaweedfs implemented in pure Rust

    Language:Rust20263222
  • TileDB-Inc/TileDB-Py

    Python interface to the TileDB storage engine

    Language:Python1972796436
  • PaddlePaddle/ElasticCTR

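    ElasticCTR, the PaddlePaddle Elastic Computing Recommender System, is an enterprise-grade open-source recommender system solution built on Kubernetes. It combines high-accuracy CTR models continuously refined in Baidu's business scenarios, the large-scale distributed training capability of the PaddlePaddle open-source framework, and an industrial-grade elastic scheduling service for sparse parameters, helping users deploy a recommender system in a Kubernetes environment with a single click. It offers high performance, industrial-grade deployment, and an end-to-end experience, and as an open-source suite it supports further in-depth customization.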

    Language:Python1858344
  • apssouza22/big-data-pipeline-lambda-arch

    A hybrid Big Data pipeline architecture that combines a real-time streaming layer with a batch layer to process large datasets (Lambda Architecture)

    Language:Java1829283
  • d2iq-archive/dcos-commons

    DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.

    Language:Java157890170
  • marcelmay/hadoop-hdfs-fsimage-exporter

    Exports Hadoop HDFS content statistics to Prometheus

    Language:Java15758948
  • megvii-research/megfile

    Megvii FILE Library: work with files in Python the same way as with the standard library

    Language:Python15365619
  • avast/hdfs-shell

    HDFS Shell is an HDFS manipulation tool that works with functions integrated in Hadoop DFS

    Language:Java151301436
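The seaweedfs entry above credits its blob store with O(1) disk seeks. One common way to get that property is to append blobs to a single large volume file and keep an in-memory index from blob id to (offset, size), so a read costs one dictionary lookup plus one seek. The sketch below illustrates that general idea only; the `Volume` and `NeedleLocation` names and the layout are hypothetical and are not SeaweedFS's actual on-disk format.

```python
import io
from dataclasses import dataclass


@dataclass
class NeedleLocation:
    """Where one blob lives inside the volume file (hypothetical layout)."""
    offset: int
    size: int


class Volume:
    """Toy append-only volume with an in-memory index (illustrative only).

    Blobs are appended to one large file, and a dict maps each blob id to
    (offset, size), so a read needs a single seek instead of walking
    directories or inodes.
    """

    def __init__(self, f: io.BufferedIOBase) -> None:
        self._file = f
        self._index: dict[int, NeedleLocation] = {}

    def put(self, blob_id: int, data: bytes) -> None:
        # Append at the current end of the file and remember the location.
        offset = self._file.seek(0, io.SEEK_END)
        self._file.write(data)
        self._index[blob_id] = NeedleLocation(offset, len(data))

    def get(self, blob_id: int) -> bytes:
        # One dict lookup plus one seek, independent of how many blobs exist.
        loc = self._index[blob_id]
        self._file.seek(loc.offset)
        return self._file.read(loc.size)


# Usage with an in-memory buffer standing in for a volume file on disk.
vol = Volume(io.BytesIO())
vol.put(1, b"hello")
vol.put(2, b"world!")
print(vol.get(2))  # b'world!'
```

A real file opened with `open(path, "w+b")` could replace the `BytesIO`; a production store would also persist the index and handle deletes, which this sketch leaves out.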
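The smart_open entry above describes utilities for streaming large files, including gzip-compressed ones. smart_open's own API is not reproduced here; as a hedged stand-in, the sketch below uses only Python's standard `gzip` module to show the underlying idea of decompressing lazily and iterating line by line instead of loading the whole file into memory. `stream_lines` is a hypothetical helper name.

```python
import gzip
import io
from typing import Iterator


def stream_lines(fileobj: io.BufferedIOBase) -> Iterator[str]:
    """Yield decoded lines from a gzip stream one at a time.

    gzip.open decompresses incrementally as the text wrapper reads, so
    memory use is bounded by the read buffer, not by the file size.
    Closing the wrapper does not close the caller's fileobj.
    """
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as text:
        for line in text:
            yield line.rstrip("\n")


# Build a small in-memory .gz payload to stand in for a large remote file.
raw = io.BytesIO()
with gzip.open(raw, mode="wt", encoding="utf-8") as out:
    out.write("first\nsecond\nthird\n")
raw.seek(0)

for n, line in enumerate(stream_lines(raw), start=1):
    print(n, line)
```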
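The apssouza22/big-data-pipeline-lambda-arch entry above names the Lambda Architecture, which pairs a batch layer with a real-time streaming layer. A minimal sketch of how a query merges the two views, assuming simple per-key event counting; all function and class names here are hypothetical and not taken from that repository.

```python
from collections import Counter


def batch_view(events: list[tuple[int, str]], cutoff: int) -> Counter:
    """Batch layer: recompute counts from scratch for events up to `cutoff`."""
    return Counter(key for ts, key in events if ts <= cutoff)


class SpeedLayer:
    """Speed layer: update counts incrementally as each newer event arrives."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, key: str) -> None:
        self.counts[key] += 1


def query(batch: Counter, realtime: Counter, key: str) -> int:
    """Serving layer: merge the batch view with the real-time view."""
    return batch[key] + realtime[key]


# (timestamp, key) events; the most recent batch run covered timestamps <= 3.
events = [(1, "click"), (2, "view"), (3, "click"), (4, "click"), (5, "view")]
cutoff = 3
batch = batch_view(events, cutoff)

speed = SpeedLayer()
for ts, key in events:
    if ts > cutoff:  # only events the batch layer has not processed yet
        speed.on_event(key)

print(query(batch, speed.counts, "click"))  # 3
print(query(batch, speed.counts, "view"))   # 2
```

When the next batch run advances the cutoff, the speed-layer counts for the newly covered events would be discarded, keeping the real-time view small.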