ziqizhang/jate

API?


I'm wondering whether there is any way to use it as a library in my application. Could you provide some basic example code in the Wiki? I just want to use the algorithms.

Thank you.

Sorry for the lack of documentation. JATE2 can be used as a library without much effort.

As mentioned in Quick Start, you can either 1) download the jar from the Maven repository or 2) add the following configuration to your Maven project, along with Dragontools.

<dependency>
    <groupId>uk.ac.shef.dcs</groupId>
    <artifactId>jate</artifactId>
    <version>2.0-beta.1</version>
</dependency>

Once you have set up the JATE2 libraries, you can use all the available ATE algorithms in your application or project. Our App* classes show how to use and integrate the ATE algorithms with Apache Solr. All available ATE implementations are subclasses of uk.ac.shef.dcs.jate.algorithm.Algorithm in the package uk.ac.shef.dcs.jate.algorithm.*. The current interface should be fairly straightforward to use: you provide a list of candidate terms and the corresponding features, and the method returns ranked terms modelled by uk.ac.shef.dcs.jate.model.JATETerm, with scores and other features/metadata. Since JATE2 relies on Solr for pre-processing and feature extraction, you have to either implement your own method or use Solr or our embedded Solr implementation (i.e., App*) to parse your corpus and extract candidates and features from it.

We will add more documentation in the near future.

Thanks for your interest.

I tried

AppCValue.main(("uk.ac.shef.dcs.jate.app.AppCValue -corpusDir " + corpusDir + " -o cvalue-terms.json " + solrDir + "/testdata/solr-testbed ACLRDTEC").split(" "));

but got:
uk.ac.shef.dcs.jate.JATEException: Cannot find expected field: jate_ngraminfo
at uk.ac.shef.dcs.jate.util.SolrUtil.getTermVector(SolrUtil.java:36)
at uk.ac.shef.dcs.jate.feature.FrequencyTermBasedFBMaster.build(FrequencyTermBasedFBMaster.java:39)
at com.scholarfriend.maven.Epollo.Tools.AppCValue.extract(AppCValue.java:93)
at com.scholarfriend.maven.Epollo.Tools.AppCValue.extract(AppCValue.java:85)
at uk.ac.shef.dcs.jate.app.App.extract(App.java:285)

I have PDF, TXT, and HTML files in the folder.

Logger: com.softcorporation.util.Logger
Mon Feb 27 01:33:46 EST 2017 loading exception data for lemmatiser...
Mon Feb 27 01:33:46 EST 2017 loading exception data for lemmatiser...
Mon Feb 27 01:33:47 EST 2017 loading exception data for lemmatiser...
Mon Feb 27 01:33:47 EST 2017 loading done
Mon Feb 27 01:33:47 EST 2017 loading done
Mon Feb 27 01:33:47 EST 2017 loading done
Mon Feb 27 01:33:47 EST 2017 loading exception data for lemmatiser...
Mon Feb 27 01:33:48 EST 2017 loading exception data for lemmatiser...
Mon Feb 27 01:33:48 EST 2017 loading exception data for lemmatiser...
Mon Feb 27 01:33:48 EST 2017 loading done
Mon Feb 27 01:33:48 EST 2017 loading done
2017-02-27 01:33:48 ERROR SolrCore:525 - [jateCore] Solr index directory 'A:\eclipse\lib\jate-master\testdata\solr-testbed\jateCore\data\index/' is locked. Throwing exception.
2017-02-27 01:33:48 ERROR CoreContainer:740 - Error creating core [jateCore]: Index locked for write for core 'jateCore'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
org.apache.solr.common.SolrException: Index locked for write for core 'jateCore'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:727)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:447)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:438)
at java.util.concurrent.FutureTask.run(Unknown Source)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.lucene.store.LockObtainFailedException: Index locked for write for core 'jateCore'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:528)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761)
... 9 more
Mon Feb 27 01:33:48 EST 2017 loading done
2017-02-27 01:33:48 ERROR SolrCore:525 - [GENIA] Solr index directory 'A:\eclipse\lib\jate-master\testdata\solr-testbed\GENIA\data\index/' is locked. Throwing exception.
2017-02-27 01:33:48 ERROR CoreContainer:740 - Error creating core [GENIA]: Index locked for write for core 'GENIA'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
org.apache.solr.common.SolrException: Index locked for write for core 'GENIA'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:727)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:447)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:438)
at java.util.concurrent.FutureTask.run(Unknown Source)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.lucene.store.LockObtainFailedException: Index locked for write for core 'GENIA'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:528)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761)
... 9 more
2017-02-27 01:33:48 INFO AppCValue:72 - Start CValue term ranking and filtering for whole index ...
uk.ac.shef.dcs.jate.JATEException: Cannot find expected field: jate_ngraminfo
at uk.ac.shef.dcs.jate.util.SolrUtil.getTermVector(SolrUtil.java:36)
at uk.ac.shef.dcs.jate.feature.FrequencyTermBasedFBMaster.build(FrequencyTermBasedFBMaster.java:39)
at uk.ac.shef.dcs.jate.app.AppCValue.extract(AppCValue.java:86)
at uk.ac.shef.dcs.jate.app.AppCValue.extract(AppCValue.java:77)
at uk.ac.shef.dcs.jate.app.App.extract(App.java:285)
at uk.ac.shef.dcs.jate.app.AppCValue.main(AppCValue.java:48)

I removed all the files in the data folder but still got these messages.

To run AppCValue programmatically, pass the run-time parameters to the main method as a string array, in the same order as on the command line.

The problem with your code is that you should not pass the class name as a parameter when you call AppCValue.main directly.

So try with the following:

AppCValue.main(("-corpusDir " + corpusDir + " -o cvalue-terms.json " + solrDir + "/testdata/solr-testbed ACLRDTEC").split(" "));

To make it clearer, you can try the following code:

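// Same order as on the command line, without the class name.
String[] cvalueArgs = new String[6];
cvalueArgs[0] = "-corpusDir";
cvalueArgs[1] = "<YOUR_CORPUS_DIR>";      // replace with your corpus directory
cvalueArgs[2] = "-o";
cvalueArgs[3] = "<YOUR_JSON_FILE_PATH>";  // replace with the output JSON file path
cvalueArgs[4] = "<YOUR_SOLR_HOME_PATH>";  // replace with your Solr home path
cvalueArgs[5] = "<YOUR_SOLR_CORE_NAME>";  // replace with your Solr core name

AppCValue.main(cvalueArgs);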
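
One caveat with the one-line split(" ") variant: if the corpus directory or Solr home path contains a space, the path is broken into several arguments. Building the array directly avoids that. Below is a minimal sketch of this idea; CValueArgs and build are hypothetical names used only for illustration (they are not part of JATE), and the paths are made-up examples. The call to AppCValue.main is left as a comment so the snippet compiles without JATE on the classpath.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of JATE) that assembles the AppCValue arguments. */
final class CValueArgs {

    /**
     * Returns the arguments in the same order as on the command line,
     * without the class name. Each value stays a single array element,
     * so paths that contain spaces are not split.
     */
    static String[] build(String corpusDir, String outputFile,
                          String solrHome, String coreName) {
        return new String[] {
            "-corpusDir", corpusDir,
            "-o", outputFile,
            solrHome,
            coreName
        };
    }

    public static void main(String[] args) {
        // Example values only; replace them with your own paths and core name.
        String[] cvalueArgs = build("/data/my corpus", "cvalue-terms.json",
                                    "/opt/solr home", "ACLRDTEC");
        System.out.println(Arrays.toString(cvalueArgs));

        // With JATE on the classpath, the array would then be passed on:
        // AppCValue.main(cvalueArgs);
    }
}
```

The array always has six elements, whatever characters the paths contain, which matches the six-slot layout shown above.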

Hope it helps.

Thank you it works.