PStormScheduler

Apache Storm: a custom timestamp-based spout (with an example topology)



  • Includes a special spout that reads events as rows from a CSV file and emits them according to the timestamp column of the input file [SampleSpout in the samples folder]

  • Takes an X-factor argument (e.g. 0.5x, 3x) to increase or decrease the input rate of the spout using the same CSV file as input (pass 0.5 to double the rate)
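A minimal sketch of the rate-scaling idea described above (not this repo's actual spout code; the class and method names are illustrative): each inter-event gap taken from the timestamp column is multiplied by the X-factor, so a factor of 0.5 halves every gap and therefore doubles the emit rate.

```java
// Illustrative sketch only -- names are hypothetical, not from this project.
public class ScaledReplay {

    /** Sleep time in ms before emitting the next event, given two row timestamps. */
    static long scaledDelayMs(long prevTsMs, long nextTsMs, double factor) {
        long gap = Math.max(0L, nextTsMs - prevTsMs); // guard against out-of-order rows
        return (long) (gap * factor);                 // factor 0.5 => half the gap => 2x rate
    }

    public static void main(String[] args) {
        // Millisecond timestamps with 1 s and 2 s gaps between consecutive rows.
        long[] ts = {1443033000000L, 1443033001000L, 1443033003000L};
        double factor = 0.5; // 0.5x => double the input rate
        for (int i = 1; i < ts.length; i++) {
            System.out.println(scaledDelayMs(ts[i - 1], ts[i], factor)); // prints 500 then 1000
        }
    }
}
```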

  • How to use:

    In local mode, arguments are passed in the IDE.

    Example:

    
    <Topo-mode L-local,C-cluster>   <topo name>   <input CSV file path>    <datatype-ExperimentID>  <rate>   <output log file path>
    
    L   IdentityTopology   /Users/anshushukla/Downloads/Incomplete/stream/PStormScheduler/src/test/java/operation/output/eventDist.csv     PLUG-210  1.0   /Users/anshushukla/data/output/temp
    

    In cluster mode, arguments are passed on the command line.

    Example:

    
    <storm command>  jar  <jar file path>  <class path>  <Topo-mode>  <topo name>  <CSV data path>  <PLUG-expid>  <rate>  <log file path>
     
    storm jar $stormJarpath  in.dream_lab.bm.uidai.auth.topology.$i  C  $i  $plugdatapath   PLUG-$expNum   1.0     $outputpath
    
    
  • Limitations

    • The experiment ID must be prefixed with the PLUG keyword (e.g. PLUG-210); the prefix is used to identify the data type while emitting (this will be fixed soon)

    • Timestamps must be in milliseconds

    • Sample CSV file entries (for more columns, add them after these two columns):

      | msgId | timestamp  |
      | ----- | ---------- |
      | 1     | 1443033000 |
      | 2     | 1443033001 |
      | 3     | 1443033002 |
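Rows in this format can be parsed with plain string splitting; a sketch (the extra payload columns and their values here are hypothetical, not from the sample file):

```java
public class RowParse {
    public static void main(String[] args) {
        // Hypothetical row: the first two columns are fixed (msgId, timestamp);
        // any additional payload columns follow them.
        String line = "1,1443033000,sensorA,22.5";
        String[] cols = line.split(",", -1);
        long msgId = Long.parseLong(cols[0]);
        long timestamp = Long.parseLong(cols[1]);
        // Everything after the first two columns is free-form payload.
        String[] payload = java.util.Arrays.copyOfRange(cols, 2, cols.length);
        System.out.println(msgId + " @ " + timestamp + " with "
                + payload.length + " payload fields");
        // prints: 1 @ 1443033000 with 2 payload fields
    }
}
```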

    • After cloning the code, comment out the line below in pom.xml to run in the IDE, and uncomment it again to run in cluster mode:

    <scope>provided</scope>  <!-- IMPORTANT -->
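For context, a scope line like this typically sits on the Storm dependency in pom.xml. A hedged sketch of such a dependency block (the exact artifact and version in this repo's pom.xml may differ):

```xml
<!-- Sketch only: a typical Storm dependency where the provided scope applies. -->
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-core</artifactId>
  <version>0.9.5</version>
  <!-- "provided" for cluster mode (the cluster supplies the Storm jars);
       comment it out so the IDE puts Storm on the runtime classpath -->
  <scope>provided</scope>
</dependency>
```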
    
    • Note: the input CSV file must be present at the same path on all cluster nodes, because with the default scheduler we do not know which node the spout will run on