mapreduce
There are 1347 repositories under the mapreduce topic.
donnemartin/data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
heibaiying/BigData-Notes
A beginner's guide to big data :star:
PowerJob/PowerJob
Enterprise job scheduling middleware with distributed computing ability.
douban/dpark
Python clone of Spark, a MapReduce-like framework in Python
water8394/BigData-Interview
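:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from around the web, together with the author's own summarized answers. Currently covers interview questions and knowledge summaries for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.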
collabH/bigdata-growth
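A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.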
mahmoudparsian/data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
microsoft/Mobius
C# and F# language binding and extensions to Apache Spark
happyer/distributed-computing
distributed_computing includes mapreduce, kvstore, etc.
cdapio/cdap
An open source framework for building data analytic applications.
bcongdon/corral
🐎 A serverless MapReduce framework written for AWS Lambda
sunnyandgood/BigData
💎🔥 Big data study notes
grailbio/bigslice
A serverless cluster computing system for the Go programming language
apache/incubator-uniffle
Uniffle is a high performance, general purpose Remote Shuffle Service.
CamDavidsonPilon/tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
RedisGears/RedisGears
Dynamic execution framework for your Redis data
cubefs/compass
Compass is a task diagnosis platform for bigdata
cwensel/cascading
Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing flows locally or on a cluster.
datawhalechina/juicy-bigdata
🎉🎉🐳 Datawhale's Introduction to Big Data Processing tutorial | The opening course for the big data technology track 🎉🎉
DigitalPebble/behemoth
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Tencent/Firestorm
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
BWbwchen/MapReduce
An easy-to-use MapReduce parallel-computing framework in Go, inspired by the 2021 6.824 lab 1. It currently supports multiple worker threads and multiple processes on a single machine.
xingdl2007/6.824-2017
:zap: 6.824: Distributed Systems (Spring 2017). A course that presents abstractions and implementation techniques for engineering distributed systems.
mahmoudparsian/data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
lynnlangit/learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
kevwan/mapreduce
An in-process MapReduce library to help you optimize service response time or concurrent task processing.
mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
razertory/MIT6.824-Java
The distributed systems course (MIT 6.824) implemented in Java
touero/ctenopharyngodon-idella
Distributed crawling of data from all Chinese universities with Hadoop and MapReduce.
mimecast/dtail
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
CocaineCong/tangseng
Tangseng search engine, including full-text search and vector search, based on Golang. A Go-based search engine and information retrieval system.
asakusafw/asakusafw
Asakusa Framework
miguno/avro-hadoop-starter
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
feng-li/Distributed-Statistical-Computing
Teaching Materials for Distributed Statistical Computing (teaching materials for distributed big data computing)
maxis42/Big-Data-Engineering-Coursera-Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Refefer/Dampr
Python Data Processing library