hadoop

There are 3,343 repositories under the hadoop topic.

Nagios-Plugins
450+ Nagios plugins for AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera, etc.
Language: Python · 1.1k
Addax
Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.
Language: Java · 1.1k
kylo
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Language: Java · 1.1k
UserActionAnalyzePlatform
Big data platform for analyzing e-commerce user behavior
Language: Java · 934
data-engineering-interview-questions
More than 2,000 data engineering interview questions.
917
hadoop_study
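Regularly updated documentation for commonly used big data components in the Hadoop ecosystem, prioritized in this order: Flink, Solr, Spark SQL, ES, Scala, Kafka, HBase/Phoenix, Redis, Kerberos. (The project includes Hadoop mind maps and Evernote notes, simple Scala demos, common utility classes, and desensitized training code; continuously updated.)
Language: Java · 912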
ozone
Scalable, redundant, and distributed object store for Apache Hadoop
Language: Java · 783
DevOps-Python-tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Language: Python · 736
TonY
TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Language: Java · 698
BigData
💎🔥 Big data study notes
Language: Java · 664
WeDataSphere
WeDataSphere is a financial-grade, one-stop big data platform suite.
639
Data-Science-EBooks
Data Science E-books, Interview Resources and Cheat-sheets
623
dist-keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Language: Python · 623
spline
Data Lineage Tracking And Visualization Solution
Language: Scala · 584
gis-tools-for-hadoop
The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
514
xichuan_note
xichuan's study summary notes, covering Java, Spring, other commonly used Java frameworks, and big data components 📚
Language: Java · 488
BiSheServer
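This system is my graduation project, titled "Design and Implementation of a Movie Recommendation System Based on User Profiles". It uses Django as the base framework with the MTV pattern, and MongoDB, MySQL, and Redis as databases. Movie data crawled from the Douban platform serves as the base data source. User tags are derived mainly from users' basic information and behavioral information such as usage records, and the Hadoop and Spark big data components handle analysis and processing. The admin system is Django's built-in admin, restyled with simpleui.
Language: Python · 477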
kafka-connect-hdfs
Kafka Connect HDFS connector
Language: Java · 476
marmaray
Generic Data Ingestion & Dispersal Library for Hadoop
Language: Java · 472
iceberg
Iceberg is a table format for large, slow-moving tabular data
Language: Java · 467
tez
Apache Tez
Language: Java · 465
big_data_architect_skills
Skills a big data architect should master
458
venice
Venice, Derived Data Platform for Planet-Scale Workloads.
Language: Java · 432
shopzz
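The backend is developed with SpringCloud Alibaba, the mobile app is built with React Native, and the admin console is built with Arco Design. Payments support digital currencies (Bitcoin, Ethereum UDST, and a platform token), and the backend uses big data frameworks such as Hadoop and Flink to build real-time and offline computing systems.
Language: Java · 403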
CloudEon
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.
Language: Java · 399
bigdata_docker
Big Data Ecosystem Docker
Language: VBA · 375
cloudbreak
CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.
Language: Java · 351
ytk-learn
Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Language: Java · 347
cascading
Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster.
Language: Java · 342
caelus
Set of Kubernetes solutions for reusing idle resources of nodes by running extra batch jobs
Language: Go · 337
elasticluster
Create clusters of VMs on the cloud and configure them with Ansible.
Language: Python · 335
cascading
All development now happens at https://github.com/cwensel/cascading. Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on various cluster computing platforms.
Language: Java · 331
Spark-with-Python
Fundamentals of Spark with Python (using PySpark), code examples
Language: Jupyter Notebook · 324
compass
Compass is a task diagnosis platform for big data.
Language: Java · 321
big-whale
Scheduling of offline jobs and monitoring of real-time jobs for Spark, Flink, and similar engines
Language: Java · 294
hadoop-mini-clusters
hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE
Language: Java · 289