hadoop

There are 3,343 repositories under the hadoop topic.

  • Nagios-Plugins

    450+ Nagios plugins for AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera, etc.

    Language: Python · Stars: 1.1k
  • Addax

    Addax is an open-source ETL tool that transfers data between various RDBMS and NoSQL databases, making it well suited to data migration.

    Language: Java · Stars: 1.1k
  • kylo

    Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

    Language: Java · Stars: 1.1k
  • UserActionAnalyzePlatform

    Big data platform for e-commerce user behavior analysis

    Language: Java · Stars: 934
  • data-engineering-interview-questions

    More than 2,000 data engineer interview questions.

  • hadoop_study

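    Regularly updated documentation on commonly used big data components in the Hadoop ecosystem, focusing, in order, on Flink, Solr, Spark SQL, ES, Scala, Kafka, HBase/Phoenix, Redis, and Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and anonymized training code. Continuously updated!)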

    Language: Java · Stars: 912
  • ozone

    Scalable, redundant, and distributed object store for Apache Hadoop

    Language: Java · Stars: 783
  • DevOps-Python-tools

    80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

    Language: Python · Stars: 736
  • TonY

    TonY is a framework to natively run deep learning frameworks on Apache Hadoop.

    Language: Java · Stars: 698
  • BigData

    💎🔥 Big data study notes

    Language: Java · Stars: 664
  • WeDataSphere

    WeDataSphere is a financial-grade, one-stop big data platform suite.

  • Data-Science-EBooks

    Data Science E-books, Interview Resources and Cheat-sheets

  • dist-keras

    Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

    Language: Python · Stars: 623
  • spline

    Data Lineage Tracking And Visualization Solution

    Language: Scala · Stars: 584
  • gis-tools-for-hadoop

    The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.

  • xichuan_note

    xichuan's study notes, covering Java, Spring, other common Java frameworks, big data components, and more 📚

    Language: Java · Stars: 488
  • BiSheServer

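    This system is my graduation project, titled "Design and Implementation of a Movie Recommendation System Based on User Profiles." It is a recommendation system built on Django using the MTV pattern, with MongoDB, MySQL, and Redis as its databases and movie data scraped from Douban as the base data source. User tags are derived mainly from users' basic information and behavioral data such as usage and activity records, and the data are analyzed and processed with the Hadoop and Spark big data components. The admin interface is Django's built-in admin site, restyled with simpleui.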

    Language: Python · Stars: 477
  • kafka-connect-hdfs

    Kafka Connect HDFS connector

    Language: Java · Stars: 476
  • marmaray

    Generic Data Ingestion & Dispersal Library for Hadoop

    Language: Java · Stars: 472
  • iceberg

    Iceberg is a table format for large, slow-moving tabular data

    Language: Java · Stars: 467
  • tez

    Apache Tez

    Language: Java · Stars: 465
  • big_data_architect_skills

    Skills a big data architect should master

  • venice

    Venice, Derived Data Platform for Planet-Scale Workloads.

    Language: Java · Stars: 432
  • shopzz

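    The backend is developed with SpringCloud Alibaba, the mobile app is built with React Native, and the admin console is built with Arco Design. Payments accept digital currencies (Bitcoin, Ethereum USDT, and a platform token). The backend uses big data frameworks such as Hadoop and Flink to build real-time and offline computing systems.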

    Language: Java · Stars: 403
  • CloudEon

    CloudEon uses Kubernetes to install and deploy open-source big data components, running an open-source big data platform in containers. This lets you spend less effort on underlying resource management and maintenance.

    Language: Java · Stars: 399
  • bigdata_docker

    Big Data Ecosystem Docker

    Language: VBA · Stars: 375
  • cloudbreak

    CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.

    Language: Java · Stars: 351
  • ytk-learn

    Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).

    Language: Java · Stars: 347
  • cascading

    Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster.

    Language: Java · Stars: 342
  • caelus

    A set of Kubernetes solutions that reuse idle node resources by running extra batch jobs.

    Language: Go · Stars: 337
  • elasticluster

    Create clusters of VMs on the cloud and configure them with Ansible.

    Language: Python · Stars: 335
  • cascading

    All development now happens here: https://github.com/cwensel/cascading. Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on various cluster computing platforms.

    Language: Java · Stars: 331
  • Spark-with-Python

    Fundamentals of Spark with Python (using PySpark), code examples

    Language: Jupyter Notebook · Stars: 324
  • compass

    Compass is a task diagnosis platform for big data.

    Language: Java · Stars: 321
  • big-whale

    Scheduling of offline jobs and monitoring of real-time jobs for Spark, Flink, and similar engines

    Language: Java · Stars: 294
  • hadoop-mini-clusters

    hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE

    Language: Java · Stars: 289