PStormScheduler

Apache Storm: a custom timestamp-based spout (with an example topology)



  • Includes a special spout that reads events as rows from a CSV file and emits them according to the timestamp column of the input file [SampleSpout in the samples folder]

  • Takes an X-factor argument (e.g. 0.5x, 3x) to increase or decrease the input rate of the spout using the same CSV file as input (pass 0.5 to double the rate)
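A minimal sketch of the rate-scaling idea described above (not this repo's actual spout code; the class and method names are illustrative): each inter-event gap taken from the timestamp column is multiplied by the X-factor, so a factor of 0.5 halves every gap and therefore doubles the emit rate.

```java
// Illustrative sketch only -- names are hypothetical, not from this project.
public class ScaledReplay {

    /** Sleep time in ms before emitting the next event, given two row timestamps. */
    static long scaledDelayMs(long prevTsMs, long nextTsMs, double factor) {
        long gap = Math.max(0L, nextTsMs - prevTsMs); // guard against out-of-order rows
        return (long) (gap * factor);                 // factor 0.5 => half the gap => 2x rate
    }

    public static void main(String[] args) {
        // Millisecond timestamps with 1 s and 2 s gaps between consecutive rows.
        long[] ts = {1443033000000L, 1443033001000L, 1443033003000L};
        double factor = 0.5; // 0.5x => double the input rate
        for (int i = 1; i < ts.length; i++) {
            System.out.println(scaledDelayMs(ts[i - 1], ts[i], factor)); // prints 500 then 1000
        }
    }
}
```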

  • How to use:

    In local mode, arguments are passed in the IDE.

    Example:

    
    <Topo-mode L-local,C-cluster>   <topo name>   <input CSV file path>    <datatype-ExperimentID>  <rate>   <output log file path>
    
    L   IdentityTopology   /Users/anshushukla/Downloads/Incomplete/stream/PStormScheduler/src/test/java/operation/output/eventDist.csv     PLUG-210  1.0   /Users/anshushukla/data/output/temp
    

    In cluster mode, arguments are passed on the command line.

    Example:

    
    <storm command>  jar  <jar file path>  <class path>  <Topo-mode>  <topo name>  <CSV data path>  <PLUG-expid>  <rate>  <log file path>
     
    storm jar $stormJarpath  in.dream_lab.bm.uidai.auth.topology.$i  C  $i  $plugdatapath   PLUG-$expNum   1.0     $outputpath
    
    
  • Limitations

    • The experiment ID must be prefixed with the PLUG keyword (e.g. PLUG-210); the prefix is used to identify the data type while emitting (this will be fixed soon)

    • Timestamps must be in milliseconds

    • Sample CSV file entries (for more columns, add them after these two columns):

      | msgId | timestamp  |
      | ----- | ---------- |
      | 1     | 1443033000 |
      | 2     | 1443033001 |
      | 3     | 1443033002 |
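Rows in this format can be parsed with plain string splitting; a sketch (the extra payload columns and their values here are hypothetical, not from the sample file):

```java
public class RowParse {
    public static void main(String[] args) {
        // Hypothetical row: the first two columns are fixed (msgId, timestamp);
        // any additional payload columns follow them.
        String line = "1,1443033000,sensorA,22.5";
        String[] cols = line.split(",", -1);
        long msgId = Long.parseLong(cols[0]);
        long timestamp = Long.parseLong(cols[1]);
        // Everything after the first two columns is free-form payload.
        String[] payload = java.util.Arrays.copyOfRange(cols, 2, cols.length);
        System.out.println(msgId + " @ " + timestamp + " with "
                + payload.length + " payload fields");
        // prints: 1 @ 1443033000 with 2 payload fields
    }
}
```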

    • After cloning the code, comment out the line below in pom.xml to run in the IDE, and uncomment it again to run in cluster mode:

    <scope>provided</scope>  <!-- IMPORTANT -->
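For context, a scope line like this typically sits on the Storm dependency in pom.xml. A hedged sketch of such a dependency block (the exact artifact and version in this repo's pom.xml may differ):

```xml
<!-- Sketch only: a typical Storm dependency where the provided scope applies. -->
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-core</artifactId>
  <version>0.9.5</version>
  <!-- "provided" for cluster mode (the cluster supplies the Storm jars);
       comment it out so the IDE puts Storm on the runtime classpath -->
  <scope>provided</scope>
</dependency>
```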
    
    • Note: the input CSV file must be present at the same path on all cluster nodes, because with the default scheduler we do not know which node the spout will run on