reisepass/ETHz_HeadlineGenerator

NLP 2013 INF.ETHz

Java

Issues

Out of bounds in FeatureBasedSummary.getTopEntity()
#20 opened 11 years ago by reisepass
2
ArticleTopicNGramSum and MostProbSentBasedOnTopicDocProb used to have a constructor that did not require specifying a TreeMap corpus. This was based on test data I made inside the class. We must change all these instantiations to also pass the corpus as a T
#21 opened 11 years ago by reisepass
0
Index out of bounds error in DocNGramSimple on line 59: ngramWords[i] = words[i]; see below
#19 opened 11 years ago by reisepass
8
getTopicDocProb sometimes throws a null pointer exception. I suggest we let it return zero when it can't find the topic-doc probability. See comment
#18 opened 11 years ago by reisepass
1
Regex not doing what we want when parsing News200 corp
#17 opened 11 years ago by reisepass
0
Change the most likely ngram sentence summary to use TreeMap.lowerEntry(key == WildCard + query) instead of Collections.sort
#16 opened 12 years ago by reisepass
0
Change all the Comparator implementations for the ngram TreeMap<ArrayList<String>, Double> stuff so that they ignore letter case, punctuation, and whitespace
#15 opened 12 years ago by reisepass
0
//TODO Append all the ngrams from the original document into the outNgram. Add frequencies if an ngram already exists. Weigh down the ngrams from the query doc because its frequencies are not weighted by that probability of topic to doc which all the ngr
#14 opened 12 years ago by reisepass
0
Hmmm, it looks like Doc.annotation is not necessarily set. Maybe we should change it so that doc.getAno() checks whether the annotation is null and, if it is, does all the Stanford NLP stuff
#13 opened 12 years ago by reisepass
0
Configuration
#5 opened 12 years ago by jarednieder
1
Import an external corpus similar to our own. Derive a set of categories using (LSI | NMF | LDA), and create a method which does not change the categories but just classifies a new article into one of them.
#12 opened 12 years ago by reisepass
0
Create a cross-validation script which runs through all Summarizers and over all their parameters, executing the rouge script each time. Store these values in a sorted list for optimization.
#11 opened 12 years ago by reisepass
0
In summary method 2 (NeFreq and NounFreq), include information about how often two NEs, or an NE and a noun, occur together in a sentence.
#10 opened 12 years ago by reisepass
0
Create a new summarizer which is like the NeFreq one but also considers non-NE words in the list of most used words.
#9 opened 12 years ago by reisepass
0
Adjust the NE count functions to include prepositions which refer to the NE. Use this: http://nlp.stanford.edu/software/dcoref.shtml
#8 opened 12 years ago by reisepass
0
Implement Significant phrase annotation
#6 opened 12 years ago by reisepass
0
Extractive summarizer based on NE frequency
#7 opened 12 years ago by reisepass
0
Sentence Trimming
#3 opened 12 years ago by jarednieder
1
Clauses
#4 opened 12 years ago by jarednieder
0
Implement Summarizer : Naive, First Sentence
#1 opened 12 years ago by reisepass
0
Implement Summarizer : Naive, First Sentence
#2 opened 12 years ago by reisepass
0
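
Issue #15 above asks for n-gram comparators that ignore letter case, punctuation, and whitespace. A minimal sketch of one way to do that is shown here. The class name NormalizedNGramComparator and its methods are hypothetical and not taken from the repository, and the key type is assumed to be a list of string tokens mapped to a Double frequency.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical helper: names are illustrative, not taken from the repository.
final class NormalizedNGramComparator implements Comparator<List<String>> {

    // Lower-cases a token and drops every character that is not a letter
    // or a digit, which removes punctuation and whitespace in one pass.
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }

    @Override
    public int compare(List<String> a, List<String> b) {
        // Compare token by token on the normalized form.
        int shared = Math.min(a.size(), b.size());
        for (int i = 0; i < shared; i++) {
            int c = normalize(a.get(i)).compareTo(normalize(b.get(i)));
            if (c != 0) {
                return c;
            }
        }
        // If one n-gram is a prefix of the other, the shorter one sorts first.
        return Integer.compare(a.size(), b.size());
    }

    // Creates an n-gram frequency map whose keys are matched after normalization.
    static TreeMap<List<String>, Double> newNGramMap() {
        return new TreeMap<>(new NormalizedNGramComparator());
    }
}
```

With such a map, `map.merge(Arrays.asList("new ", "YORK!"), 2.0, Double::sum)` would add to an existing entry stored under `["New", "York"]` instead of creating a second key. One limitation of this sketch: a token that consists only of punctuation still counts as an (empty) token, so `["Hello", ","]` and `["hello"]` remain distinct keys.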