lochenchou/MOSNet

How to comprehend Output?

Closed this issue · 3 comments

I am using speechmetrics (https://github.com/aliutkus/speechmetrics), which is a kind of wrapper for your repository, to evaluate my results. The output is shown below. I don't understand why a single input produces an array of five values.

{'mosnet': array([4.98537636, 4.95263338, 4.69211102, 5.06538916, 5.01724768])}

Without diving into the speechmetrics repo, keep in mind that the model can report both frame-level and utterance-level MOS values. You could be receiving the frame-wise scores. The utterance score is simply the average of the frame-level scores. See data_generator in utils.py in the MOSNet repo for reference.
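If that is what is happening, you can collapse the array into one number yourself. This is only a sketch: it assumes the five values are per-frame (or per-segment) scores for the same file, and it reuses the dictionary you posted rather than calling speechmetrics.

```python
import numpy as np

# The output posted above, copied verbatim.
scores = {
    "mosnet": np.array(
        [4.98537636, 4.95263338, 4.69211102, 5.06538916, 5.01724768]
    )
}

# Assumption: each entry is a partial score for the same input,
# so the overall score is their arithmetic mean.
utterance_score = float(np.mean(scores["mosnet"]))

print(f"utterance-level MOS: {utterance_score:.4f}")
```

For the values above this gives roughly 4.94.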

Can we run the code on an Intel® Core™ i7-8700 CPU @ 3.20GHz × 12 instead of the GPU (GeForce RTX 2080 Ti) specified in the requirements?

Sure. Just install tensorflow instead of tensorflow-gpu and remove the GPU management code in train.py, particularly the memory growth block.
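If you would rather keep one script that works on both kinds of machine, an alternative to deleting the block is to guard it. The sketch below is not the code from train.py; the function name enable_memory_growth_if_gpu is made up for illustration, and it assumes a TensorFlow 2.x install where tf.config.experimental is available.

```python
def enable_memory_growth_if_gpu():
    """Turn on GPU memory growth when a GPU exists; do nothing on CPU.

    Hypothetical helper, not part of MOSNet. Returns the number of GPUs found.
    """
    # Imported here so that merely defining the helper does not need TensorFlow.
    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # Must run before TensorFlow initializes the GPU,
        # otherwise TensorFlow raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Calling it once at the top of the training script, before any model is built, should return 0 on a CPU-only machine and leave everything else unchanged.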