Make sure reconstruction test is runnable end to end

Question

Make sure reconstruction test is runnable end to end

Opened this issue 5 years ago · 8 comments

@soroushzargar perhaps you could do this.. it would be better if the reconstruction-test were easier to run. For example, have a script that downloads and suitably formats the appropriate data and then runs the test. Right now there is some hacky stuff going on with a temp dir that probably isn't in the source tarball.

I was only printing out the bits in the file as a debugging step. But I think we should be displaying the file sizes or bits per unit time in some form alongside the various compression methods; this will make comparisons more meaningful.

I just set the bits per sample to 6 in reconstruction-test.py but we may want to test various settings. What I would really like is some info about PSNR's and file sizes formatted in a way that's easily readable. IMO a simple Python program that prints out a summary would be preferable to this CSV stuff, in that it would be clearer. But as long as you do print out something readable at some point (maybe as a post processing step that might be in a separate script) it's OK with me. Make sure it's all documented with a top-level run.sh or something.

Answer 1 · 2019-10-22T07:28:38.000Z

@soroushzargar perhaps you could do this.. it would be better if the reconstruction-test were easier to run. For example, have a script that downloads and suitably formats the appropriate data and then runs the test. Right now there is some hacky stuff going on with a temp dir that probably isn't in the source tarball.

Ok I think a better way is that the file itself should be a runnable file with passing arguments; something like

`./reconstruction-test --dataset-dir --sample-rate

arguments passed by should be

--dataset-dir: An address to a dataset directory which includes some .wav files.
--sample-rate: The sample rate in which the comparison is supposed to work with.
--release-df: If this flag is appeared, then the procedure will create a set of pandas dataframes saved in .csv format which could be used with pandas later.
--release-log It prints out a tab delimited log file which will show the output of comparison.

This will shorten the code and also reduces dependancies.

I was only printing out the bits in the file as a debugging step. But I think we should be displaying the file sizes or bits per unit time in some form alongside the various compression methods; this will make comparisons more meaningful.

The output of the file can be in following shape

Results on sampling rate = <sampling rate>
<Filename>    <algorithm>   <bitrate>   <psnr> [repeating for each compression algorithm]

I just set the bits per sample to 6 in reconstruction-test.py

By that you mean 16? Sorry here I just got confused.

but we may want to test various settings. What I would really like is some info about PSNR's and file sizes formatted in a way that's easily readable.

Did I cover that in previous part?

IMO a simple Python program that prints out a summary would be preferable to this CSV stuff, in that it would be clearer. But as long as you do print out something readable at some point (maybe as a post processing step that might be in a separate script) it's OK with me. Make sure it's all documented with a top-level run.sh or something.

Ok I think a top level .sh with man documentation is what you want here. Right?

Answer 2 · 2019-10-22T20:14:29.000Z

`./reconstruction-test --dataset-dir --sample-rate

Good idea.. but be sure to have a top-level run.sh that runs the complete test.

arguments passed by should be

--dataset-dir: An address to a dataset directory which includes some .wav files.

--sample-rate: The sample rate in which the comparison is supposed to work with.

--release-df: If this flag is appeared, then the procedure will create a set of pandas dataframes saved in .csv format which could be used with pandas later.

--release-log It prints out a tab delimited log file which will show the output of comparison.

Sure.

I just set the bits per sample to 6 in reconstruction-test.py

By that you mean 16? Sorry here I just got confused.

No I meant 6- look at the code. The bits-per-sample used by lilcom to compress.

but we may want to test various settings. What I would really like is some info about PSNR's and file sizes formatted in a way that's easily readable.

Did I cover that in previous part?
Yes, I think so.

IMO a simple Python program that prints out a summary would be preferable to this CSV stuff, in that it would be clearer. But as long as you do print out something readable at some point (maybe as a post processing step that might be in a separate script) it's OK with me. Make sure it's all documented with a top-level run.sh or something.

Ok I think a top level .sh with man documentation is what you want here. Right?

Yes, a top-level sh. Just put comments in where needed. man documentation is super overkill for a test script.

Answer 3 · 2019-10-24T15:03:44.000Z

`./reconstruction-test --dataset-dir --sample-rate

Good idea.. but be sure to have a top-level run.sh that runs the complete test.

We can make the python as a runnable shell file just with ! /bin/python or sth, but I think you want sth more than just adding this and saving as a sh file. Can you say what you expect from the sh file which this additional line can not do?

I just set the bits per sample to 6 in reconstruction-test.py

By that you mean 16? Sorry here I just got confused.
No I meant 6- look at the code. The bits-per-sample used by lilcom to compress.

Yes found it. we can add another passing parameter to let the user change this value. Is it Ok or sth else is needed.

Ok I think a top level .sh with man documentation is what you want here. Right?

Yes, a top-level sh. Just put comments in where needed. man documentation is super overkill for a test script.

The argument parser gives a help function in which the user can easily find out how to work with the test.

Answer 4 · 2019-10-24T17:46:28.000Z

The idea of run.sh is that it should just run automatically when you run
./run.sh
so the user does not have to spend time figuring out documentation.

Answer 5 · 2019-10-25T05:46:56.000Z

Ok I got it so there is now difference here.

…

On Thu, 24 Oct 2019, 21:16 Daniel Povey, ***@***.***> wrote: The idea of run.sh is that it should just run automatically when you run ./run.sh so the user does not have to spend time figuring out documentation. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#16?email_source=notifications&email_token=AB7HSSDIMLJZFXHT4GM4SQLQQHNPLA5CNFSM4JDKVTA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECF3SUY#issuecomment-546027859>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB7HSSFJAGA34H23IFTKKEDQQHNPLANCNFSM4JDKVTAQ> .

Answer 6 · 2019-11-11T10:31:09.000Z

In addition to the assumption that MP3 will compress with the exact bitrate passed also there would be an issue with sample rate. Let me test it on long audio too and I'll return the code with true bitrates for mp3.

Do you think we should substitute pydub and librosa with sox. If it is necessary I try to figure it out.

Answer 7 · 2019-11-11T19:58:16.000Z

Sox is OK, it's just for testing.

…

On Mon, Nov 11, 2019 at 2:31 AM Soroush Zargar ***@***.***> wrote: In addition to the assumption that MP3 will compress with the exact bitrate passed also there would be an issue with sample rate. Let me test it on long audio too and I'll return the code with true bitrates for mp3. Do you think we should substitute pydub and librosa with sox. If it is necessary I try to figure it out. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#16?email_source=notifications&email_token=AAZFLO3QSJNLAXGPCOO6SMDQTEX63A5CNFSM4JDKVTA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDWL36Q#issuecomment-552386042>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAZFLOYTTG4ZAVXTP2R3UQDQTEX63ANCNFSM4JDKVTAQ> .

Answer 8 · 2019-11-11T19:59:21.000Z

Oh, you mean the other way round. I think the way it is now is fine. mp3 seems to ignore the bitrate passed for some sample rates. You need to find the size on disk using os.path.[something].

…

On Mon, Nov 11, 2019 at 11:58 AM Daniel Povey ***@***.***> wrote: Sox is OK, it's just for testing. On Mon, Nov 11, 2019 at 2:31 AM Soroush Zargar ***@***.***> wrote: > In addition to the assumption that MP3 will compress with the exact > bitrate passed also there would be an issue with sample rate. Let me test > it on long audio too and I'll return the code with true bitrates for mp3. > > Do you think we should substitute pydub and librosa with sox. If it is > necessary I try to figure it out. > > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub > <#16?email_source=notifications&email_token=AAZFLO3QSJNLAXGPCOO6SMDQTEX63A5CNFSM4JDKVTA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDWL36Q#issuecomment-552386042>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/AAZFLOYTTG4ZAVXTP2R3UQDQTEX63ANCNFSM4JDKVTAQ> > . >