This project is a simple wrapper around the excellent and robust Apache Tika text extraction Java library.
## Building TikaOnDotNet ##
This project uses Rake for build automation.

- Install Ruby.
- Install Bundler: `gem install bundler`
- Run `bundle install`
- Run `rake`

If successful, this builds the project and runs the Tika text extraction integration tests.

Bundler ensures that all the required gems are installed; it should be installed and set up automatically the first time you run `rake` on the project. NuGet dependencies are managed with a tool called Ripple, but you should not need to worry about it unless you are updating dependencies.
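
The build steps above can also be scripted. The following is a minimal Python sketch, not part of this repository, that assumes Ruby, Bundler and Rake are already installed and on your `PATH`, and that it is run from the repository root. `BUILD_STEPS` and `run_build` are hypothetical names chosen for illustration.

```python
import shutil
import subprocess
import sys

# Hypothetical: the build commands described above, in the order they run.
BUILD_STEPS = [
    ["bundle", "install"],  # install the required gems
    ["rake"],               # build and run the integration tests
]


def run_build(steps=BUILD_STEPS, runner=subprocess.run, which=shutil.which):
    """Run each step in order and stop at the first failure.

    Returns the failing step's exit code, 127 if a tool is missing,
    or 0 if every step succeeds.
    """
    for step in steps:
        # Resolve the full path so Windows wrappers such as bundle.bat are found.
        exe = which(step[0])
        if exe is None:
            print(f"'{step[0]}' was not found on PATH", file=sys.stderr)
            return 127
        result = runner([exe, *step[1:]])
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_build()` from the repository root runs `bundle install` and then `rake`, skipping `rake` if the gem installation fails. The `runner` and `which` parameters exist only so the commands can be swapped out, for example in a dry run.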
## Updating the IKVM NuGet dependency ##
Run the following, replacing `{version}` with the IKVM version you want to use:

```
ripple update -n IKVM -V {version}
```
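
If you script dependency updates, a small guard can reject a malformed version before Ripple runs. This Python sketch only builds the `ripple update` command shown above; it assumes `ripple` is on your `PATH`, and the version check (two to four dot-separated numbers) is an assumption about IKVM's version scheme, not something Ripple itself enforces. `ripple_update_args` is a hypothetical helper.

```python
import re
import subprocess

# Assumed IKVM version format: two to four dot-separated numbers, such as 1.2.3.4.
VERSION_PATTERN = re.compile(r"\d+(\.\d+){1,3}")


def ripple_update_args(version, package="IKVM"):
    """Return the argument list for `ripple update -n <package> -V <version>`."""
    if not VERSION_PATTERN.fullmatch(version):
        raise ValueError(f"unexpected version string: {version!r}")
    return ["ripple", "update", "-n", package, "-V", version]


def ripple_update(version, runner=subprocess.run):
    """Run the update and return Ripple's exit code."""
    return runner(ripple_update_args(version)).returncode
```

For example, `ripple_update(version)` with your target IKVM version runs the same command as above but fails fast on a typo such as a missing number.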
## Building the Tika-App .NET Assembly ##
You should only need this step when upgrading the version of Tika used by this project.
At its core, this project simply wraps the Java Tika library. To accomplish this, `tika-app-{version}.jar` is compiled into a .NET assembly with the IKVM compiler, `ikvmc`. Make sure the version of your IKVM binaries matches the version of the IKVM NuGet dependency.

```
ikvmc.exe -target:library -assembly:tika-app -classloader:ikvm.runtime.AppDomainAssemblyClassLoader tika-app-{version}.jar
```

The result is a .NET assembly, `tika-app.dll`, which is stored in this repository's `lib` directory. The tika-app `.jar` file can be downloaded from the Tika download page.
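
The `ikvmc` invocation above can be wrapped so the Tika version is supplied in one place and the prerequisites are checked before compiling. This is a minimal Python sketch, assuming `ikvmc.exe` is on your `PATH` and the downloaded jar sits in the current directory; `ikvmc_args` and `compile_tika` are hypothetical helpers, and how you look up the two IKVM version strings is left to you.

```python
import subprocess
from pathlib import Path


def ikvmc_args(tika_version):
    """Build the ikvmc command that compiles tika-app-<version>.jar into tika-app.dll."""
    return [
        "ikvmc.exe",
        "-target:library",
        "-assembly:tika-app",
        "-classloader:ikvm.runtime.AppDomainAssemblyClassLoader",
        f"tika-app-{tika_version}.jar",
    ]


def compile_tika(tika_version, ikvm_binaries_version, ikvm_nuget_version,
                 runner=subprocess.run):
    """Check the prerequisites, run ikvmc, and return its exit code."""
    # The IKVM binaries must match the IKVM NuGet dependency's version.
    if ikvm_binaries_version != ikvm_nuget_version:
        raise ValueError(
            f"IKVM binaries {ikvm_binaries_version} do not match "
            f"NuGet dependency {ikvm_nuget_version}"
        )
    args = ikvmc_args(tika_version)
    jar = Path(args[-1])
    if not jar.is_file():
        raise FileNotFoundError(f"{jar} not found; download it from the Tika download page")
    return runner(args).returncode
```

Checking the version match first means a mismatched IKVM install is reported before any compilation work starts; the resulting `tika-app.dll` still has to be copied into `lib` by hand.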
## Releasing TikaOnDotNet ##
A handy `release.bat` script creates a release build and packages the NuGet package. The resulting package is placed in the `artifacts` directory.
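
After running `release.bat`, you may want to locate the generated package from a script, for example before uploading it. A minimal Python sketch, assuming the packages are written as `.nupkg` files directly inside `artifacts`; `find_packages` is a hypothetical helper.

```python
from pathlib import Path


def find_packages(artifacts_dir="artifacts"):
    """Return the .nupkg files in artifacts_dir, newest first (empty if the directory is missing)."""
    directory = Path(artifacts_dir)
    if not directory.is_dir():
        return []
    packages = [p for p in directory.glob("*.nupkg") if p.is_file()]
    # Sort by modification time so the package from the latest release comes first.
    return sorted(packages, key=lambda p: p.stat().st_mtime, reverse=True)
```

Run from the repository root, `find_packages()[0]` would be the most recently built package, provided at least one exists.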