hdfs

There are 918 repositories under the hdfs topic.

  • seaweedfs/seaweedfs

    SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.

    Language: Go
  • heibaiying/BigData-Notes

    A beginner's guide to big data ⭐

    Language: Java
  • ceph/ceph

    Ceph is a distributed object, block, and file storage platform

    Language: C++
  • juicedata/juicefs

    JuiceFS is a distributed POSIX file system built on top of Redis and S3.

    Language: Go
  • wangzhiwubigdata/God-Of-BigData

    Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

  • piskvorky/smart_open

    Utils for streaming large files (S3, HDFS, gzip, bz2...)

    Language: Python
  • TileDB-Inc/TileDB

    The Universal Storage Engine

    Language: C++
  • water8394/BigData-Interview

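    🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from around the web, with the author's own summarized answers. Currently covers interview questions and knowledge summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.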

  • colinmarc/hdfs

    A native Go client for HDFS

    Language: Go
  • collabH/bigdata-growth

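    A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.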

    Language: Shell
  • wgzhao/Addax

    Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.

    Language: Java
  • spotify/snakebite

    A pure Python HDFS client

    Language: Python
  • HariSekhon/DevOps-Python-tools

    80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

    Language: Python
  • sunnyandgood/BigData

    💎🔥 Big data study notes

    Language: Java
  • Stratio/sparta

    Real Time Analytics and Data Pipelines based on Spark Streaming

    Language: Scala
  • lensesio/kafka-connect-ui

    Web tool for Kafka Connect

    Language: JavaScript
  • confluentinc/kafka-connect-hdfs

    Kafka Connect HDFS connector

    Language: Java
  • dromara/CloudEon

    CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.

    Language: Java
  • fabiogjardim/bigdata_docker

    Big Data Ecosystem Docker

    Language: VBA
  • uber/storagetapper

    StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service

    Language: Go
  • tirthajyoti/Spark-with-Python

    Fundamentals of Spark with Python (using PySpark), code examples

    Language: Jupyter Notebook
  • divolte/divolte-collector

    Divolte Collector

    Language: Java
  • Eugene-Mark/bigdata-file-viewer

    A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

    Language: Java
  • mtth/hdfs

    API and command line interface for HDFS

    Language: Python
  • wradlib/wradlib

    weather radar data processing - python package

    Language: Python
  • datawhalechina/juicy-bigdata

    🎉🎉🐳 Datawhale's introductory tutorial on big data processing | The opening course in the big data technology track 🎉🎉

    Language: Python
  • RumbleDB/rumble

    ⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

    Language: Java
  • hegongshan/File-System-Paper

    Must-read Papers for File System (FS)

  • PaddlePaddle/ElasticCTR

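    ElasticCTR, PaddlePaddle's elastic-computing recommendation system, is an open-source, enterprise-grade recommendation system solution based on Kubernetes. It combines high-accuracy CTR models refined continuously in Baidu's business scenarios, the large-scale distributed training capability of the open-source PaddlePaddle framework, and an industrial-grade elastic scheduling service for sparse parameters, helping users deploy a recommendation system on Kubernetes in one click. It offers high performance, industrial-grade deployment, and an end-to-end experience, and as an open-source suite it supports in-depth secondary development.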

    Language: Python
  • TileDB-Inc/TileDB-Py

    Python interface to the TileDB storage engine

    Language: Python
  • mesosphere/dcos-commons

    DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.

    Language: Java
  • breuner/elbencho

    A distributed storage benchmark for file systems, object stores & block devices with support for GPUs

    Language: C++
  • avast/hdfs-shell

    HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS

    Language: Java
  • jcrist/skein

    A tool and library for easily deploying applications on Apache YARN

    Language: Python
  • marcelmay/hadoop-hdfs-fsimage-exporter

    Exports Hadoop HDFS content statistics to Prometheus

    Language: Java
  • mullerhai/HsunTzu

    HDFS compress tar zip snappy gzip uncompress untar codec hadoop spark

    Language: Scala