/rdd

A single-machine Python implementation of Spark-style streaming, iterative computation.

RDD

RDD library: write Python code the easy way.

Example: counting words in a file

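from .DStream import RDD

def parse_line(line):
    # Split one line of text into whitespace-separated words.
    return line.strip().split()

# Word count: read the file, split each line into words, turn the words
# into (word, 1) pairs, sum the counts per word, sort by count, and write
# one "word,count" line per word to 'result'.
(RDD.TextFile('/home/wangjinxiang/workspace/RDD/dict.txt')
    .map(parse_line)
    .flatmap(lambda x: (x, 1))
    .reduceBykey(lambda x, y: x + y)
    .sort(key=lambda x: x[1])
    .map(lambda x: "%s,%d\n" % (x[0], x[1]))
    .saveAsTextFile('result'))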
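
For comparison, here is a rough sketch of what the pipeline above is meant to compute, written with the standard library only. It does not use this library's API. The helper names `word_count` and `format_pairs` are hypothetical, and it assumes that `sort` orders by ascending count the way Python's `sorted` does.

```python
from collections import Counter

def parse_line(line):
    """Split one line of text into whitespace-separated words."""
    return line.strip().split()

def word_count(lines):
    """Return (word, count) pairs sorted by ascending count.

    Hypothetical helper: mirrors the map / flatmap / reduceBykey / sort
    steps of the example, under the assumption of an ascending sort.
    """
    counts = Counter()
    for line in lines:
        counts.update(parse_line(line))
    # sorted() is stable, so words with equal counts keep first-seen order.
    return sorted(counts.items(), key=lambda pair: pair[1])

def format_pairs(pairs):
    """Render each (word, count) pair as a 'word,count' line."""
    return ["%s,%d\n" % (word, count) for word, count in pairs]

if __name__ == "__main__":
    sample = ["a b a", "b a c"]
    for out_line in format_pairs(word_count(sample)):
        print(out_line, end="")
```

Any iterable of strings works as input, so an open file object can be passed to `word_count` directly in place of the sample list.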