README

References / Articles / Tools

References for describing the Hadoop history, visualization, and structure design

  1. comp5349 week 8, week 9, week 10, etc. (in each case the examples in the second half)
  2. Understanding your Apache Spark Application Through Visualization
  3. Spark Basics : RDDs,Stages,Tasks and DAG

Environment

S3 Storage

EMR

m4.xlarge: 8 cores, 16 GB

Hadoop, Spark, Livy, TensorFlow

Stanford stop words, NLTK

Build project May 10th

Date: May 12th 2020

Implemented the first task of 3.1

Note

CC_Hadoop_history is the record of the first half of 3.1

Code: PySpark (Python), in Jupyter notebook form

Date: May 13th 2020