/Hydra

Distributed processing framework for search solutions

Primary LanguageJavaOtherNOASSERTION

Hydra Processing Framework

Getting started
1. Install MongoDB, and start it
2. Check out and build hydra-core
3. Start the built jar with 
	java -jar hydra-core-jar-with-dependencies.jar
	or just run the Main class from your IDE
	
You now have a running pipeline system. Try adding some documents using the API classes or the input-stages and play around with it.

For a basic pipeline you will need the following jars:
	hydra-core-jar-with-dependencies - the framework itself
	basic-stages-jar-with-dependencies - a library of reusable stages
	solr-out-jar-with-dependencies - a library containing the SolrOutputStage which sends documents to Solr

Configuring your pipeline								
Let's say we want to change the field source in all documents that have been touched by the stage extractPDFMetadata to pdf. It should modify documents in the database intranet. Since our stage makes static changes to fields, we can use the stage SetStaticFieldStage (included in basic-stages) instead of building our own. Since all configuration is done using Json, you'll need a Json representation of your configuration for the stage. It could look like this:
	
{
	stageClass: "com.findwise.hydra.stage.SetStaticFieldStage",
	queryOptions: ["touched(extractTitles,true)"],
	fieldNames: ["source"],
	fieldValues: ["web"]
}
	
The easiest way to do this without coding is to use the CmdlineInserter-class found in the examples library:
	
1. Add the library with the stage you want to add, in this case the basic-stages jar:
	Program arguments: -a -p pipeline -l -i myLibaryId basic-stages-jar-with-dependencies.jar 
	You should see the following output:
	> Added stage library with id: myLibaryId
2. Now, add your stage, by referencing the id to your library of stages, again using CmdlineInserter:
	Program arguments: -a -p pipeline -s -i myLibraryId -n myStage myStage.properties
	In this example, myStage.properties contains the configuration of the stage.

It is also possible to combine these two operations into one:
	Program arguments: -a -p pipeline -l -i myLibraryId basic-stages-jar-with-dependencies.jar -s -n myStage myStage.properties

For more information on the CmdlineInserter class, try invoking it and it should be fairly self-explanatory.

For further information, refer to the Wiki on http://github.com/Findwise/Hydra/wiki