Big Data

Spark

How-to: Tune Your Apache Spark Jobs

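Spark Performance Optimization: Development Tuning

Spark Performance Optimization: Resource Tuning

Spark Performance Optimization: Data Skew Tuning

Spark Performance Optimization: Shuffle Tuning

Spark Performance Optimization Guide: Basics, by 李雪蕤

Spark Performance Optimization Guide: Advanced, by 李雪蕤

Spark on Angel: A Core Accelerator for Spark Machine Learning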

Zeppelin

Demo notebooks for Apache Zeppelin

Data App development with Zeppelin & AngularJS

A comprehensive comparison of Jupyter vs. Zeppelin

Kafka

An Accessible Introduction to Distributed Message Queues Based on Kafka and ZooKeeper

ZooKeeper

What Role Does ZooKeeper Play in the Kafka Architecture?

ML & DL

SGD

Gradient Descent Explained in Depth, with an Implementation

Evaluation

Bias and Variance in Machine Learning

Machine Learning Basic

Mathematics in Machine Learning (3): Model Combining with Boosting and Gradient Boosting

Learning Rate

Understanding Learning Rates and How It Improves Performance in Deep Learning

Feature Embedding

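Recommender Systems Meet Deep Learning (1): FM Model Theory and Practice

Recommender Systems Meet Deep Learning (2): FFM Model Theory and Practice

Recommender Systems Meet Deep Learning (3): DeepFM Model Theory and Practice

Recommender Systems Meet Deep Learning (4): Embedding Solutions for Multi-Valued Discrete Features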

Machine Learning

LightGBM

Complete Guide to Parameter Tuning in Gradient Boosting (GBM) in Python

GBM vs xgboost vs lightGBM

Feisky

Machine Learning & Deep Learning Resources (Chapter 1)

Machine Learning & Deep Learning Resources (Chapter 2)

Machine Learning in Action

Spark Random Forest: Algorithm Principles, Source Code Analysis, and Case Study

Introduction to and Comparison of Three K-means Improvements (K-means++, ISODATA, Kernel K-means)

Deep Learning

TensorFlow Machine Learning Cookbook

斗大的熊猫

Auto ML

Auto-sklearn

AUTOML

TPOT

H2O Auto-ML

Penn AI

Hyperopt

Hyperparameter optimization

How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras

AutoML comparison

Xcessiv

Serving

ModelDB: A system to manage ML models

NLP

displaCy Named Entity Visualizer

Natural Language Processing With Apache Spark

码农场

Document Similarity

WMD (Word Mover's Distance)

Similarity measure of textual documents

similarities.docsim: Document Similarity Queries

Compute sentence similarity using Wordnet

Text Topic Models: Latent Semantic Indexing (LSI)

OpenKG.CN

Face Recognition

OpenFace

FaceNet

Joint Bayesian Face Verification Algorithm: Detailed Explanation and Implementation

Micro Service & Container

Kubernetes

Migrating applications, clusters, and Kubernetes to etcd v3

TensorFlow on Kubernetes

Monitoring Kubernetes performance metrics

Tutorial: Getting Started with Kubernetes on your Windows Laptop with Minikube

Micro Service

The Twelve-Factor App

The Twelve-Factor App (Chinese edition)

Front-End

StaticGen

Docs

H2O Tutorials

DMTK

Dataiku

Products

Deep Learning Studio

Architecture

Stack Overflow: The Architecture - 2016 Edition

Raft

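Raft: Understandable Distributed Consensus (visual demo)

SOFAJRaft Implementation Principles: Dissecting the Storage Module of a Production-Grade Raft Library

Ant Financial Open-Sources SOFAJRaft: A Detailed Look at a Production-Grade, High-Performance Java Implementation

SOFAJRaft Implementation Principles: How SOFAJRaft-RheaKV Uses Raft

Raft Theoretical Foundations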

Gitbook

A Beginner's Guide to Text Data Analysis in Python

SparkInternals@JerryLead

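Spark Programming Guide (Simplified Chinese edition)

Spark Machine Learning Algorithms: Research and Source Code Analysis

Kubernetes Chinese Documentation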

Kubernetes Handbook

Paper

Distributed TensorFlow with MPI

A summary of deep models for face recognition

Awesome

awesome-nlp

awesome-chinese-nlp

awesome-deeplearning-resources

awesome-streaming

awesome-cheatsheets

Slides

MLlib and All-pairs Similarity

NoSQL

NoSQL Databases: a Survey and Decision Guidance

Cloud

Selecting a Cloud Provider

Java

Vectorization

Code vectorization in the JVM: Auto-vectorization, intrinsics, Vector API

JAVA and SIMD

In-Depth Dissection of the Java Virtual Machine - 26 | Vectorization

JEP 338: Vector API (Incubator)

Project Panama Early-Access Builds

Vector API Developer Program for Java* Software

Concurrent Programming

Java-concurrency

Java-concurrency Knowledge Map

I/O

Zero Copy

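Understanding Zero Copy with Just 8 Diagrams

Recommender Systems

Online Serving Platform

The Evolution and Practice of an Algorithm Platform's Online Serving System

Experimentation Platform

Design and Implementation of the Meituan-Dianping Performance Advertising Experiment Configuration Platform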