Goose

HTML content / article extractor in Java, open-sourced by Gravity Labs - http://gravity.com


Try it out online!
http://jimplush.com/blog/goose


Please view the wiki pages for all the details on the project :)

The wiki can be found by clicking the Wiki link or going here: https://github.com/jiminoc/goose/wiki

If you find Goose useful or have issues, please drop me a line. I'd love to hear how you're using it or which features should be improved.

Goose is licensed by Gravity.com under the Apache 2.0 license; see the LICENSE file for more details.

To use goose from the command line:

cd into the goose directory
mvn compile
MAVEN_OPTS="-Xms256m -Xmx2000m" mvn exec:java -Dexec.mainClass=com.jimplush.goose.TalkToMeGoose -Dexec.args="http://techcrunch.com/2011/05/13/native-apps-or-web-apps-particle-code-wants-you-to-do-both/" -e -q > ~/Desktop/gooseresult.txt
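
If you would rather drive the same command from Java than from a shell, the sketch below shows one way it might be done. It is not part of Goose: the class name GooseCommand and its build/run helpers are hypothetical, and it assumes that mvn is on your PATH and that the goose directory has already been compiled. It only reuses the main class, Maven flags, and MAVEN_OPTS value from the command above.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical helper (not part of Goose): wraps the README's mvn command.
final class GooseCommand {
    // Main class and JVM options copied from the README's command line.
    static final String MAIN_CLASS = "com.jimplush.goose.TalkToMeGoose";
    static final String MAVEN_OPTS = "-Xms256m -Xmx2000m";

    /** Builds the argument list for `mvn exec:java` for one article URL. */
    static List<String> build(String articleUrl) {
        // No shell quoting is needed: each list element is passed as one argument.
        return List.of(
                "mvn", "exec:java",
                "-Dexec.mainClass=" + MAIN_CLASS,
                "-Dexec.args=" + articleUrl,
                "-e", "-q");
    }

    /** Runs the command inside gooseDir and writes its stdout to outputFile. */
    static int run(File gooseDir, String articleUrl, File outputFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(build(articleUrl));
        pb.directory(gooseDir);                            // same as `cd` into the goose directory
        pb.environment().put("MAVEN_OPTS", MAVEN_OPTS);    // same as the MAVEN_OPTS prefix
        pb.redirectOutput(outputFile);                     // same as `> gooseresult.txt`
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // keep Maven errors visible
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: GooseCommand <goose-dir> <article-url> <output-file>");
            System.exit(2);
        }
        System.exit(run(new File(args[0]), args[1], new File(args[2])));
    }
}
```

The command is split into build and run so that the argument list can be inspected or logged without starting Maven. On Windows the executable name would likely need to be mvn.cmd rather than mvn.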