mapreduce

There are 1,413 repositories under the mapreduce topic.

donnemartin/data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Language:Python28.6k 1.6k 418k
heibaiying/BigData-Notes
A beginner's guide to big data :star:
Language:Java16.7k 448 424.3k
PowerJob/PowerJob
Enterprise job scheduling middleware with distributed computing ability.
Language:Java7.6k 123 9871.3k
douban/dpark
Python clone of Spark, a MapReduce-like framework in Python
Language:Python2.7k 263 61530
collabH/bigdata-growth
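A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.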
Language:Shell1.7k 34 4386
water8394/BigData-Interview
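:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from around the web, with the author's own answer summaries. Currently covers interview questions on the Hadoop, Hive, Spark, Flink, HBase, Kafka, and ZooKeeper frameworks.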
1.6k 53 3446
mahmoudparsian/data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Language:Java1.1k 126 26659
microsoft/Mobius
C# and F# language binding and extensions to Apache Spark
Language:C#940 139 195208
happyer/distributed-computing
distributed_computing includes MapReduce, a key-value store, etc.
Language:Go843 26 8214
cdapio/cdap
An open source framework for building data analytic applications.
Language:Java783 93 0349
bcongdon/corral
🐎 A serverless MapReduce framework written for AWS Lambda
Language:Go694 20 1040
sunnyandgood/BigData
💎🔥 Big data study notes
Language:Java681 29 2229
grailbio/bigslice
A serverless cluster computing system for the Go programming language
Language:Go556 24 2335
apache/uniffle
Uniffle is a high performance, general purpose Remote Shuffle Service.
Language:Java429 17 1.1k161
CamDavidsonPilon/tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
Language:Python403 9 3654
cubefs/compass
Compass is a task diagnosis platform for big data
Language:Java403 18 148148
RedisGears/RedisGears
Dynamic execution framework for your Redis data
Language:Rust382 18 27668
cwensel/cascading
Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster.
Language:Java352 33 23219
datawhalechina/juicy-bigdata
🎉🎉🐳 Datawhale introductory tutorial on big data processing | The opening course of the big data technology track 🎉🎉
Language:Python343 4 345
DigitalPebble/behemoth
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Language:Java283 40 4259
Tencent/Firestorm
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
Language:Java257 9 4871
BWbwchen/MapReduce
An easy-to-use MapReduce parallel-computing framework in Go, inspired by the 2021 6.824 lab1. It currently supports multiple worker threads on a single machine and multiple processes on a single machine.
Language:Go225 8 113
mahmoudparsian/data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Language:Python223 12 497
xingdl2007/6.824-2017
:zap: 6.824: Distributed Systems (Spring 2017). A course that presents abstractions and implementation techniques for engineering distributed systems.
214 4 277
lynnlangit/learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Language:HTML202 16 0167
kevwan/mapreduce
An in-process MapReduce library to help you optimize service response time or concurrent task processing.
Language:Go174 2 124
mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Language:HTML161 26 0143
razertory/MIT6.824-Java
A Java implementation of the distributed systems course (MIT 6.824)
Language:Java152 2 827
CocaineCong/tangseng
Tangseng is a Go-based search engine and information retrieval system with full-text search and vector search.
Language:Go135 0 1241
touero/ctenopharyngodon-idella
Uses MapReduce's Java interface to crawl data on Chinese universities in a distributed fashion and to learn the basics of HDFS.
Language:Java135 1 11
mimecast/dtail
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
Language:Go130 8 1110
asakusafw/asakusafw
Asakusa Framework
Language:Java117 12 47014
miguno/avro-hadoop-starter
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Language:Java115 18 283
feng-li/Distributed-Statistical-Computing
Teaching Materials for Distributed Statistical Computing (teaching materials for distributed big data computing)
Language:HTML108 4 066
maxis42/Big-Data-Engineering-Coursera-Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Language:Jupyter Notebook102 2 474
Refefer/Dampr
Python Data Processing library
Language:Python102 7 06
