neulab/compare-mt

A tool for holistic analysis of language generation systems

Python · BSD-3-Clause

Issues

Ability to click through to individual examples
#26 opened 6 years ago by neubig
1
Consider model variance in bootstrap resampling test
#126 opened 3 years ago by odashi
6
Unit tests are failing
#130 opened 3 years ago by neubig
1
pip install errors
#127 opened 3 years ago by pfliu-nlp
6
pip install does not auto-install requirements
#118 opened 3 years ago by neubig
1
Bug in sentence level BLEU comparison
#114 opened 5 years ago by madaan
1
bootstrap sample size
#124 opened 4 years ago by ozancaglayan
0
stand-alone command line scorer
#119 opened 5 years ago by neubig
1
Some measure of bucketed statistic reliability
#88 opened 5 years ago by neubig
2
Multiple references
#7 opened 6 years ago by zdou0830
3
Bootstrap resampling without replacement
#108 opened 5 years ago by pmichel31415
5
Errors resulting from command line formatting can be opaque
#87 opened 5 years ago by neubig
0
"external program" Scorer
#27 opened 6 years ago by neubig
6
Ability to name analyses
#85 opened 6 years ago by neubig
1
Ability to bucket sentences by external label
#89 opened 6 years ago by neubig
0
No way to specify p-value threshold for significance tests
#83 opened 6 years ago by pmichel31415
1
Sentence report should de-duplicate identical sentences
#84 opened 6 years ago by neubig
0
Formatting table results
#14 opened 6 years ago by rooa
2
Undefined name 'sys_names' in compare_mt_main.py
#81 opened 6 years ago by cclauss
0
Implementation of Word Error Rate
#79 opened 6 years ago by neubig
1
Make number format consistent and configurable
#50 opened 6 years ago by neubig
2
Upload to PyPI
#64 opened 6 years ago by pmichel31415
3
ROUGE evaluation measure
#66 opened 6 years ago by zdou0830
8
Encoding issues in non-ASCII output?
#70 opened 6 years ago by neubig
2
Ability to analyze just one system
#54 opened 6 years ago by zdou0830
1
Ability to set custom bucket_cutoffs
#22 opened 6 years ago by danishpruthi
1
Output source sentences when printing sentence examples
#53 opened 6 years ago by zdou0830
1
Bug in plotting
#57 opened 6 years ago by pmichel31415
0
Ability to print LaTeX tables
#24 opened 6 years ago by neubig
1
Ability to name systems
#38 opened 6 years ago by neubig
1
No longer possible to run `python compare_mt.py`
#41 opened 6 years ago by neubig
0
Ability to specify more than 2 systems to analyze
#39 opened 6 years ago by neubig
3
Ability to print graphical (HTML?) reports
#6 opened 6 years ago by neubig
1
Faster significance tests
#25 opened 6 years ago by neubig
0
Make it a python module
#9 opened 6 years ago by pmichel31415
4
Word frequency analysis doesn't tell where frequencies came from
#28 opened 6 years ago by neubig
0
RIBES doesn't match official implementation's scores
#35 opened 6 years ago by neubig
1
NLTK BLEU can probably be removed
#31 opened 6 years ago by neubig
1
Case-insensitive option
#13 opened 6 years ago by zdou0830
2
More explicit implementation of alignment between reference and system output
#17 opened 6 years ago by neubig
0
Analysis of reordering errors
#16 opened 6 years ago by neubig
2
Analysis over word likelihoods
#4 opened 6 years ago by neubig
1
FreqWordBucketer doesn't build the frequency counts correctly from a corpus file
#20 opened 6 years ago by danishpruthi
3
Consider alignments, making it possible to analyze source words
#2 opened 6 years ago by neubig
2
Statistical significance tests for scores
#5 opened 6 years ago by neubig
2
Allow analysis over abstract tags
#3 opened 6 years ago by neubig
1
Refactoring
#1 opened 6 years ago by neubig
1