LSDS2021

Large Scale Distributed Systems 2022

Seed project for Lab1

Use this seed project for your submission of Lab1.

Benchmark

Language: es. Output file: C:\Users\aleix\Desktop\output-es.txt. Destination bucket: pg.cm.am.lsds

Processing: Eurovision3.json There are 23848 in that language es

Processing: Eurovision4.json There are 78433 in that language es

Processing: Eurovision5.json There are 45800 in that language es

Processing: Eurovision6.json There are 71677 in that language es

Processing: Eurovision7.json There are 54969 in that language es

Processing: Eurovision8.json There are 38805 in that language es

Processing: Eurovision9.json There are 26244 in that language es

Processing: Eurovision10.json There are 169659 in that language es

Duration filter files: 141 s

Language: hu. Output file: C:\Users\aleix\Desktop\output-hu.txt. Destination bucket: pg.cm.am.lsds

Processing: Eurovision3.json There are 37 in that language hu

Processing: Eurovision4.json There are 227 in that language hu

Processing: Eurovision5.json There are 116 in that language hu

Processing: Eurovision6.json There are 244 in that language hu

Processing: Eurovision7.json There are 171 in that language hu

Processing: Eurovision8.json There are 142 in that language hu

Processing: Eurovision9.json There are 19 in that language hu

Processing: Eurovision10.json There are 101 in that language hu

Duration filter files: 121 s

Language: pt. Output file: C:\Users\aleix\Desktop\output-pt.txt. Destination bucket: pg.cm.am.lsds

Processing: Eurovision3.json There are 624 in that language pt

Processing: Eurovision4.json There are 2663 in that language pt

Processing: Eurovision5.json There are 1726 in that language pt

Processing: Eurovision6.json There are 2370 in that language pt

Processing: Eurovision7.json There are 4114 in that language pt

Processing: Eurovision8.json There are 5999 in that language pt

Processing: Eurovision9.json There are 3611 in that language pt

Processing: Eurovision10.json There are 16516 in that language pt

Duration filter files: 124 s

CONCLUSION

As the benchmark shows, the run time is similar for all three languages (141 s for es, 121 s for hu and 124 s for pt), because every run has to read all the input files line by line regardless of the language being filtered. The main difference is the amount of data written to the output file: the es run matches roughly 509,000 entries, against about 1,000 for hu and 38,000 for pt, and it takes around 20 s longer. One limitation is that the files are read sequentially; splitting each file into blocks and reading the blocks independently should reduce the total time.
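The block-splitting idea can be sketched as follows. This is a minimal illustration, not the code used for the benchmark above. It assumes each input file holds one JSON object per line with a `lang` field, and the helper names (`block_ranges`, `count_lang_in_block`, `count_lang`) are hypothetical. Each worker handles one byte range and counts only the lines that start inside that range, so no line is counted twice or skipped.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


def block_ranges(path, n_blocks):
    """Split the file into at most n_blocks byte ranges [start, end)."""
    size = os.path.getsize(path)
    step = max(1, -(-size // n_blocks))  # ceiling division
    return [(s, min(s + step, size)) for s in range(0, size, step)]


def count_lang_in_block(path, start, end, lang):
    """Count entries with the given language among lines starting in [start, end)."""
    count = 0
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and consume up to the next newline. If the
            # block starts exactly at a line start, only the newline is
            # consumed; otherwise the partial line is left to the block
            # in which it started.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            try:
                entry = json.loads(line)
            except ValueError:
                continue  # blank or malformed line: skip it
            if isinstance(entry, dict) and entry.get("lang") == lang:
                count += 1
    return count


def count_lang(path, lang, n_blocks=4):
    """Count matching entries by processing the blocks concurrently."""
    ranges = block_ranges(path, n_blocks)
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        futures = [
            pool.submit(count_lang_in_block, path, start, end, lang)
            for start, end in ranges
        ]
        return sum(f.result() for f in futures)
```

A call such as `count_lang("Eurovision3.json", "es", n_blocks=8)` would then process eight blocks of the file concurrently. Threads keep the sketch short; in CPython the JSON parsing is CPU-bound, so a real speedup would probably require `ProcessPoolExecutor` or a runtime with true thread parallelism.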