Issues
Ability to click through to individual examples
#26 opened by neubig
Consider model variance in bootstrap resampling test
#126 opened by odashi
Unit tests are failing
#130 opened by neubig
pip install errors
#127 opened by pfliu-nlp
pip install does not auto-install requirements
#118 opened by neubig
Bug in sentence level BLEU comparison
#114 opened by madaan
bootstrap sample size
#124 opened by ozancaglayan
stand-alone command line scorer
#119 opened by neubig
Some measure of bucketed statistic reliability
#88 opened by neubig
Multiple references
#7 opened by zdou0830
Bootstrap resampling without replacement
#108 opened by pmichel31415
"external program" Scorer
#27 opened by neubig
Ability to name analyses
#85 opened by neubig
Ability to bucket sentences by external label
#89 opened by neubig
Formatting table results
#14 opened by rooa
Undefined name 'sys_names' in compare_mt_main.py
#81 opened by cclauss
Implementation of Word Error Rate
#79 opened by neubig
Make number format consistent and configurable
#50 opened by neubig
Upload to PyPI
#64 opened by pmichel31415
ROUGE evaluation measure
#66 opened by zdou0830
Encoding issues in non-ASCII output?
#70 opened by neubig
Ability to analyze just one system
#54 opened by zdou0830
Ability to set custom bucket_cutoffs
#22 opened by danishpruthi
Bug in plotting
#57 opened by pmichel31415
Ability to print LaTeX tables
#24 opened by neubig
Ability to name systems
#38 opened by neubig
No longer possible to run `python compare_mt.py`
#41 opened by neubig
Ability to specify more than 2 systems to analyze
#39 opened by neubig
Ability to print graphical (HTML?) reports
#6 opened by neubig
Faster significance tests
#25 opened by neubig
Make it a python module
#9 opened by pmichel31415
NLTK BLEU can probably be removed
#31 opened by neubig
Case-insensitive option
#13 opened by zdou0830
Analysis of reordering errors
#16 opened by neubig
Analysis over word likelihoods
#4 opened by neubig
FreqWordBucketer doesn't build the frequency counts correctly from a corpus file
#20 opened by danishpruthi
Statistical significance tests for scores
#5 opened by neubig
Allow analysis over abstract tags
#3 opened by neubig
Refactoring
#1 opened by neubig