
Use the Java Tika text extraction library on the .NET platform

Primary language: C#. License: Apache License 2.0 (Apache-2.0).

Developer's Guide to Tika on .NET

This project is a simple wrapper around the excellent and robust Tika text extraction Java library.

##Building TikaOnDotNet##

This project uses rake for build automation.

  1. Install Ruby
  2. Install Bundler: `gem install bundler`
  3. Run `bundle install` to install the required gems
  4. Run `rake`

If successful, this builds the project and runs the Tika text extraction integration tests.

Bundler is used to ensure that all required gems are installed; it should be installed and set up automatically the first time you run rake on the project. NuGet dependencies are managed with a tool called Ripple, but you should not need to worry about it unless you are updating dependencies.

##Updating the IKVM NuGet dependency##

    ripple update -n IKVM -V {version}

##Building the Tika-App .NET Assembly##

This step is only needed when upgrading the version of Tika used by this project.

At its core, this project simply wraps the Java Tika library. To accomplish this, the tika-app-{version}.jar is compiled into a .NET assembly using the IKVM compiler (ikvmc).

Please ensure that the version of your IKVM binaries matches the version of the IKVM NuGet dependency.

    ikvmc.exe -target:library -assembly:tika-app -classloader:ikvm.runtime.AppDomainAssemblyClassLoader tika-app-{version}.jar

The result of this process is a .NET assembly, tika-app.dll, which is stored in this repo's lib directory.
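
Since the build already relies on Ruby, the ikvmc invocation above could be assembled by a small helper so that only the Tika version changes between upgrades. The following is a minimal sketch, not code from this repo: the method name `ikvmc_command` and its `ikvmc:` keyword are hypothetical, and it assumes `ikvmc.exe` is reachable on your PATH. The flags are copied verbatim from the command above.

```ruby
# Hypothetical helper (not part of this repo): returns the ikvmc argument
# list for a given Tika version, mirroring the command shown above.
def ikvmc_command(version, ikvmc: "ikvmc.exe")
  # Refuse an empty version so we never ask ikvmc for "tika-app-.jar".
  raise ArgumentError, "version must not be empty" if version.to_s.strip.empty?

  [
    ikvmc,
    "-target:library",                                        # emit a .dll
    "-assembly:tika-app",                                     # assembly name
    "-classloader:ikvm.runtime.AppDomainAssemblyClassLoader", # as in the README
    "tika-app-#{version}.jar"                                 # input jar
  ]
end
```

Passing the returned array to `system(*ikvmc_command(version))` would run the compiler without going through a shell; joining it with spaces reproduces the command line shown above for manual use.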

The tika-app .jar file can be downloaded from the Tika Download page.

##Releasing TikaOnDotNet##

The release.bat script creates a release build and packages it as a NuGet package. The resulting package is placed in the artifacts directory.