A package for targeted topic modeling for focused analysis.
- A Java implementation of targeted topic modeling (TTM);
- Designed for focused analysis;
- Obtains the topics related to a target (aspect) word that you specify.
We would be glad if the package helps your projects or research. If you use it, please cite our paper as follows. You are welcome to contact Shuai Wang (shuaiwanghk@gmail.com) if you have any questions.
Shuai Wang, Zhiyuan Chen, Geli Fei, Bing Liu and Sherry Emery, "Targeted Topic Modeling for Focused Analysis", SIGKDD 2016.
## Input Data Format
(1) docs.txt
Each file (document) in a given domain corpus is stored in docs.txt in the following format.
Line 1: #numOfSent, the number of sentences in the document (always one for a tweet from Twitter).
Line 2: a dummy field. It is just a placeholder and is not used for modeling, but it currently still has to be present in the raw data. It was used for my debugging; forgive me for not having removed it yet. I might fix this and release a new version later.
Lines 3 to (3+#numOfSent-1): the content/text of the sentences, one sentence per line.
(repeating the above format for all files)
Example:
3 // number of sentences in review 1
0 // dummy field
1 2 3 // sentence 1 (in review 1)
4 5 // sentence 2 (in review 1)
6 7 8 // sentence 3 (in review 1)
2 // number of sentences in review 2
0 // dummy field
3 4 // sentence 1 (in review 2)
5 8 // sentence 2 (in review 2)
....
Values such as "1 2 3" and "4 5" are word indexes that correspond to line numbers in wordlist.txt.
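To check an input file against this layout, here is a minimal Java sketch of a docs.txt parser. `DocsReader`, `parse` and `read` are hypothetical helpers written for illustration, not part of the TTM code base; the sketch assumes the real file holds only numbers (no `//` comments as in the example) separated by whitespace.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical reader for the docs.txt layout (not part of TTM). */
final class DocsReader {

    /** One document = list of sentences; each sentence = word indexes into wordlist.txt. */
    static List<List<int[]>> parse(List<String> lines) {
        List<List<int[]>> docs = new ArrayList<>();
        int i = 0;
        while (i < lines.size()) {
            String header = lines.get(i).trim();
            if (header.isEmpty()) { i++; continue; }      // tolerate blank lines between documents
            int numOfSent = Integer.parseInt(header);     // Line 1: #numOfSent
            i += 2;                                       // skip Line 1 and the dummy field (Line 2)
            List<int[]> sentences = new ArrayList<>();
            for (int s = 0; s < numOfSent; s++, i++) {
                if (i >= lines.size()) {
                    throw new IllegalArgumentException(
                            "Document truncated: expected " + numOfSent + " sentences");
                }
                String text = lines.get(i).trim();
                int[] words = text.isEmpty()
                        ? new int[0]
                        : Arrays.stream(text.split("\\s+")).mapToInt(Integer::parseInt).toArray();
                sentences.add(words);
            }
            docs.add(sentences);
        }
        return docs;
    }

    static List<List<int[]>> read(Path docsFile) throws IOException {
        return parse(Files.readAllLines(docsFile));
    }

    public static void main(String[] args) {
        // The example above, without the explanatory "//" comments.
        List<String> lines = List.of("3", "0", "1 2 3", "4 5", "6 7 8", "2", "0", "3 4", "5 8");
        List<List<int[]>> docs = parse(lines);
        System.out.println(docs.size() + " documents");           // 2 documents
        System.out.println(Arrays.toString(docs.get(1).get(1)));  // [5, 8]
    }
}
```

A `NumberFormatException` or `IllegalArgumentException` from this sketch usually points to a missing dummy line or a #numOfSent that does not match the number of sentence lines.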
(2) wordlist.txt
a. This is a vocabulary file that indexes the words of a given domain.
b. Stop words and infrequent words have already been removed from it.
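The sketch below shows one possible way to produce the contents of wordlist.txt and docs.txt from already tokenized documents, removing stop words and words below a frequency threshold. `CorpusEncoder`, `encode` and the `minFreq` cutoff are hypothetical and not part of TTM. In particular, `INDEX_BASE = 0` assumes word indexes count wordlist.txt lines from 0, which you should verify against the TTM loading code before use. The `record` requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical preprocessing sketch (not part of TTM): tokens -> wordlist.txt and docs.txt lines. */
final class CorpusEncoder {
    // ASSUMPTION: word index = 0-based line number in wordlist.txt; verify against the TTM loader.
    static final int INDEX_BASE = 0;

    record Encoded(List<String> wordlist, List<String> docLines) {}

    /** Each document is a list of sentences; each sentence is a list of normalized tokens. */
    static Encoded encode(List<List<List<String>>> docs, Set<String> stopWords, int minFreq) {
        // 1. Corpus-wide term frequencies, ignoring stop words.
        Map<String, Integer> freq = new HashMap<>();
        for (List<List<String>> doc : docs)
            for (List<String> sent : doc)
                for (String w : sent)
                    if (!stopWords.contains(w)) freq.merge(w, 1, Integer::sum);

        // 2. Vocabulary in first-seen order, keeping only words with freq >= minFreq.
        Map<String, Integer> index = new LinkedHashMap<>();
        for (List<List<String>> doc : docs)
            for (List<String> sent : doc)
                for (String w : sent) {
                    Integer f = freq.get(w);   // null for stop words
                    if (f != null && f >= minFreq && !index.containsKey(w))
                        index.put(w, index.size() + INDEX_BASE);
                }

        // 3. docs.txt blocks: #numOfSent, dummy field "0", then one line of indexes per sentence.
        List<String> docLines = new ArrayList<>();
        for (List<List<String>> doc : docs) {
            List<String> sentLines = new ArrayList<>();
            for (List<String> sent : doc) {
                String line = sent.stream()
                        .filter(index::containsKey)
                        .map(w -> String.valueOf(index.get(w)))
                        .collect(Collectors.joining(" "));
                if (!line.isEmpty()) sentLines.add(line);    // drop sentences emptied by filtering
            }
            if (sentLines.isEmpty()) continue;               // skip documents with no remaining words
            docLines.add(String.valueOf(sentLines.size()));  // Line 1: #numOfSent
            docLines.add("0");                                // Line 2: dummy field
            docLines.addAll(sentLines);                       // Lines 3..: sentences
        }
        // wordlist.txt: one word per line, in index order.
        return new Encoded(new ArrayList<>(index.keySet()), docLines);
    }

    public static void main(String[] args) {
        List<List<List<String>>> docs = List.of(
                List.of(List.of("the", "battery", "life", "is", "great"), List.of("battery", "died")),
                List.of(List.of("great", "screen"), List.of("the", "battery")));
        Encoded e = encode(docs, Set.of("the", "is"), 2);
        System.out.println(e.wordlist());            // [battery, great]
        e.docLines().forEach(System.out::println);   // 2, 0, 0 1, 0, 2, 0, 1, 0 (one per line)
    }
}
```

One design choice here: sentences that lose all their words after filtering are dropped and #numOfSent counts only the remaining ones, while documents left with no words are skipped entirely, so every block has at least one sentence line.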
## Settings and Tasks
(1) Parameter settings<br > The parameters and argument settings are set in argument -> ProgramArgument.java.
Among them, the most important settings are:<br > a. domainName (the domain/dataset name)<br > b. targetWord (the keyword of the targeted aspect)<br > c. tTopicNum (the number of targeted topics)<br > Please refer to ProgramArgument.java for details.
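Because stop words and infrequent words are removed from wordlist.txt, it can be worth checking that the chosen targetWord survived preprocessing before launching a run. The helper below is a hypothetical sketch (`TargetWordCheck` is not part of TTM); it assumes wordlist.txt holds one word per line and that indexes are 0-based line numbers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical pre-flight check (not part of TTM): is the target word in the vocabulary? */
final class TargetWordCheck {
    // ASSUMPTION: 0-based line numbers as word indexes; verify against the TTM loader.
    static final int INDEX_BASE = 0;

    /** Returns the word index of targetWord in the wordlist lines, or -1 if it is absent. */
    static int indexOf(List<String> wordlist, String targetWord) {
        for (int line = 0; line < wordlist.size(); line++) {
            if (wordlist.get(line).trim().equals(targetWord)) return line + INDEX_BASE;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TargetWordCheck <path/to/wordlist.txt> <targetWord>");
            return;
        }
        List<String> wordlist = Files.readAllLines(Path.of(args[0]));
        int idx = indexOf(wordlist, args[1]);
        System.out.println(idx >= 0
                ? "targetWord '" + args[1] + "' has index " + idx
                : "targetWord '" + args[1] + "' is not in the vocabulary (stop word or infrequent?)");
    }
}
```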
(2) Single task<br > The task file is located at task -> Execute TTMwithOneSingleTask and runs a single task. The corresponding model is also saved.
(3) Multiple tasks/threads<br > We also provide a multiple tasks/threads entry so that different aspects can be targeted in parallel. The task file is located at task -> RunTTMwithMultiTasks and runs multiple tasks.
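The multi-task idea, one independent run per target (aspect) word, maps naturally onto a Java thread pool. The sketch below is not the code in RunTTMwithMultiTasks: `ParallelTargets.runAll` is a hypothetical helper, and the `task` function stands in for whatever trains one model for a given targetWord.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch (not TTM code): one independent task per target word on a fixed thread pool. */
final class ParallelTargets {

    static <R> List<R> runAll(List<String> targetWords, Function<String, R> task, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> jobs = new ArrayList<>();
            for (String word : targetWords) jobs.add(() -> task.apply(word));
            List<R> results = new ArrayList<>();
            // invokeAll blocks until every job finishes; futures come back in input order.
            for (Future<R> f : pool.invokeAll(jobs)) results.add(f.get());
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder task: a real one would build and train a model for this targetWord.
        Function<String, String> fakeTask = w -> w + ": done on " + Thread.currentThread().getName();
        runAll(List.of("battery", "screen", "price"), fakeTask, 2).forEach(System.out::println);
    }
}
```

Since each task builds its own model, tasks should not share mutable state; the pool size only bounds how many run at once.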
## Output File
An output file containing the targeted topic-word distribution is generated under the data/output folder. Note that I have rewritten my code with some optimization and restructuring, so the results it produces might differ slightly from those I reported previously.
## Run Demo/Entry File
(1) Run in an IDE. Two entry files are provided, and you should be able to run them once the libraries in the lib folder are added to the project. They are:
src -> launcher -> TTMSingleTaskEntry
src -> launcher -> TTMMultipleTasksEntry
(2) Run in a terminal from the command line.
From the TTM root directory on Windows:
java -cp "bin;lib/*" launcher.TTMSingleTaskEntry
From the TTM root directory on Unix/Linux:
java -cp "bin:lib/*" launcher.TTMSingleTaskEntry
Have fun!
- Author: Shuai Wang
- Affiliation: University of Illinois at Chicago
- Email: shuaiwanghk@gmail.com