Ujjawal-K-Panchal/coshnet

Problem with shear levels

Closed this issue · 24 comments

I'm visualizing the generated shearlets, and either I'm doing something wrong or the shearing is not being applied as intended.

grid_imag
grid_real

From debugging, it seems the shearing itself does work, but it shears the shearlets in the wrong direction: horizontal shearlets are sheared horizontally and vertical shearlets are sheared vertically, which makes the shear basically a no-op.

Have you checked this works in the current repository?

Switching the axis of the roll in construct_shearlet() seems to fix the shearing. Here's what it looks like without the Hilbert matrix applied, here with 2 shear_levels and 6 shearlets:

grid_real_s2
grid_imag_s2
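To illustrate why the roll axis matters, here is a minimal NumPy sketch. It is not the coshrem or shnetutil code: `shear_by_roll` is a hypothetical helper that shears a 2D array by rolling each row (or column) by an offset that grows with its distance from the center. Rolling a horizontal bar along its own rows leaves it unchanged, while rolling along the other axis tilts it.

```python
import numpy as np

def shear_by_roll(arr, axis):
    """Hypothetical helper: shear a 2D array with per-slice np.roll.

    axis=1 shifts row i along the columns by (i - center);
    axis=0 shifts column j along the rows by (j - center).
    """
    out = np.empty_like(arr)
    if axis == 1:
        center = arr.shape[0] // 2
        for i in range(arr.shape[0]):
            out[i, :] = np.roll(arr[i, :], i - center)
    else:
        center = arr.shape[1] // 2
        for j in range(arr.shape[1]):
            out[:, j] = np.roll(arr[:, j], j - center)
    return out

n = 9
horizontal = np.zeros((n, n))
horizontal[n // 2, :] = 1.0                    # a horizontal bar through the center

same = shear_by_roll(horizontal, axis=1)       # rolled along the bar: nothing changes
tilted = shear_by_roll(horizontal, axis=0)     # rolled across the bar: becomes a diagonal

print(np.array_equal(same, horizontal))        # the "no-op" case
print(np.array_equal(tilted, np.eye(n)))       # the sheared case
```

In this toy setup the bar only tilts when the roll runs across its orientation, which is the same symptom described above for horizontal and vertical shearlets.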

I wonder if the Hilbert matrix should be applied before the shearing though... It looks weird when applied afterwards.

Here's with the Hilbert matrix applied as it is currently in the code. Real & Imag components respectively:

grid_real_h
grid_imag_h

It doesn't look right to me. Shouldn't the shearing also be applied to the Hilbert matrix?

As far as I can tell, the shearing bug is in the coshrem library, not this one. But it seems to affect the results.

For the Hilbert matrix, I'm still trying to understand how it should be applied from the papers...

Hi @alexjc, I am not entirely sure what you are visualizing there. Are those the shearlet filters produced by construct_shearlet? Keep in mind that those shearlets are represented in the Fourier domain, which means that shearings or rotations in the spatial domain appear in the opposite direction in the frequency domain. The function we use in the backend to generate shearlets, _single_shearlet, is also part of the coshrem library. I also find it odd that the Hilbert matrix is applied afterwards rather than being sheared first. This construction was originally presented in the paper by the authors of coshrem. At the end of Section 3.1 they say: "However, Theorem 3.1 also shows that changing the symmetry properties of a generator by applying the Hilbert transform reduces smoothness in the Fourier domain and thus localization in the time domain."
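For reference, the frequency-domain remark about shearing can be made precise with the standard Fourier scaling relation. The symbols $f$, $A$, and $S_s$ below are notation introduced here for illustration; they are not taken from the coshrem paper.

```latex
% Linear change of variables under the 2D Fourier transform
g(x) = f(Ax)
\quad\Longrightarrow\quad
\hat{g}(\xi) = \frac{1}{|\det A|}\,\hat{f}\!\left(A^{-\mathsf{T}}\xi\right)

% Special case: a shear of the first coordinate
S_s = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix},
\qquad
S_s^{-\mathsf{T}} = \begin{pmatrix} 1 & 0 \\ -s & 1 \end{pmatrix}
```

So a shear along the first coordinate in the spatial domain shows up as a shear along the second coordinate, with the opposite sign, in the frequency domain. When reading a visualization of Fourier-domain filters, this is the direction to compare against.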

Just to make clear, this shearlet construction has shown high performance in oriented edge detection, which is the main reason we have chosen it over the classical real-valued shearlets.

Now, if you found out (as I saw in #5) that the reconstruction is better when the Hilbert matrix is sheared and the axis of the roll in construct_shearlet is switched, then the authors of coshrem should try to follow this approach and adapt their theory accordingly.

@arsenal9971 Yes, I'm visualizing the shearlets as produced by the code that's available in this repository.

I can't reproduce the results from the paper with this repository, the scores are off by 2% to 3%. I suspect it's because of the underlying shearlets implementation, I have fixed multiple bugs so far and still can't match the paper.

What code did you use to obtain the results in the paper? If it's this repository, can you confirm that installing and running from source right now gives you the results in the paper?

I'm strongly in favour of your theories and the motivations in the paper... but if you used the coshrem library from the code in this repository to obtain the results, then fixing the bugs could likely make the results even better ;-)

If you can confirm the results of the paper, then something is wrong in my installation. If not, then something is wrong in the code. How can I best help?

I created a new environment from scratch with clean repository, and my attempts to reproduce the results are 10% off:

python test_fashion.py --dataset=fashion --recipe CoShCVNN --epochs 20 --trset_size 60000 --batchsize 192

Results of testing:

>>>>>>Test(60000): , CoShCVNN
(A) 88.5% (4.99e-01)
INFO: <trace>: Confusion Matrix:
[[5083   19  164  183   26    6  464    0   55    0]
 [  19 5883    7   65    9    0    4    0   13    0]
 [ 107    4 5307   70  370    0  114    0   28    0]
 [ 133  115   68 5340  233    0   89    0   19    3]
 [  17   22  538  149 5109    0  138    0   26    1]
 [   3    2    0    5    1 5747    0  149   28   65]
 [ 743   24  889  228  721    0 3324    0   70    1]
 [   0    2    0    0    0   78    0 5690    6  224]
 [  43   12   22   23   30   14   26    5 5821    4]
 [   0    0    0    5    2   45    1  142    8 5797]]
INFO: <trace>: (A) 88.5% (4.99e-01)
INFO: <trace>: (P) [ 85%  98%  88%  89%  85%  96%  55%  95%  97%  97%]
INFO: <trace>: (R) [ 83%  97%  76%  88%  79%  98%  80%  95%  96%  95%]
time spend on test time method = 1.34s

These are the shearlets generated by a clean repository in a fresh environment.

Top two rows are real components, bottom two are imaginary:
shearlets

Does this look correct to you? The shearing is ever so slightly visible and looks nothing like the visualizations in paper, which leads me to believe it was maybe applied to the wrong dimension? Fixing that made the reconstructions work...

Also, please note that _single_shearlet() and getcomplexshearlets2D() are in the included shnetutil library which has copyright by Ujjawal.K.Panchal & Manny Ko from 2020... so at least part of the code that's being used to produce the shearlets is by the same author(s) as the paper.

Hi @alexjc, I helped implement the shearlet part of the paper, but @Ujjawal-K-Panchal ran the last reported experiments, and I am currently not fully versed in the latest version of the repo. Maybe he could share with you his exact source code. The function _single_shearlet() is an adaptation of the one from the original library, where the roll is done in the same dimension, so that exact bug is inherited from the original implementation. We will also run our training with the roll dimension fixed and report back to you about possible improvements. Thanks a lot for the help!

It's not quite as trivial as fixing the roll direction. That only works for shear_level=2. Beyond that, the shearlets have the incorrect angle too, so other parts of the function need to be changed. Also, when you do that the Hilbert matrix applied after shearing makes the results look very weird.

I have not fully fixed it yet, I suspect I'm missing the root cause of this bug...

Maybe he could share with you his exact source code.

Yes, please! If the repository is linked from the paper, then I would assume it's possible to reproduce the described results by running this code. (This may come up during reviews if that's still on the horizon for this paper.)

Hi @alexjc, thanks for the issue. I just reran the best configuration mentioned in the paper (60k20E) for Fashion and MNIST. I am able to reproduce the paper results; in fact I am getting 0.2% higher, perhaps due to some dependency change.
Here are my results:

>>>>>>Test(10000): , CoShCVNN
(A) 92.4% (4.38e-01)
INFO: <trace>: Confusion Matrix:
[[869   0  23  19   1   1  80   0   7   0]
 [  2 986   1   7   1   1   2   0   0   0]
 [ 15   0 861  12  46   0  66   0   0   0]
 [ 10   3   6 940  22   1  16   0   2   0]
 [  1   0  36  24 888   0  48   0   3   0]
 [  0   0   0   0   0 984   0  13   0   3]
 [ 72   1  37  34  58   0 796   0   2   0]
 [  0   0   0   0   0  10   0 976   1  13]
 [  3   0   0   6   2   2   3   1 983   0]
 [  0   0   1   0   0   8   0  30   0 961]]
INFO: <trace>: (A) 92.4% (4.38e-01)
INFO: <trace>: (P) [ 87%  99%  86%  94%  89%  98%  80%  98%  98%  96%]
INFO: <trace>: (R) [ 89% 100%  89%  90%  87%  98%  79%  96%  98%  98%]
time spend on test time method = 0.50s

Please use the following command:
python test_fashion.py --trset train --validate 1 --testmode 2

FYI, the cmdline args do the following:

--validate sets the validation interval.
--testmode if 1 (default), use the model acquired after the final epoch of training. if 2 use the best model found.
--trset : accepts test (default) or train. This is what is supposed to be used to set the training set size. --trset replaced trset_size in a later

Explanation for the results you are seeing:
Since you are not using the --trset parameter, your training set is actually the test set of Fashion MNIST.

The --trset_size parameter actually selects a subset of the training set, which in your run is the 10k-sample Fashion MNIST test set. So even if you pass it 60000, the set has only 10k samples and stays that size. Passing a value larger than the number of elements doesn't throw an exception, which perhaps we should have put in place so that users can understand the parameter properly.
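The silent clamping described above is easy to reproduce in plain Python. The sketch below assumes the subset is taken with ordinary slicing, which this thread does not confirm; `select_subset` is a hypothetical helper, not a function from this repository, showing the kind of guard that would surface the problem.

```python
def select_subset(dataset, size):
    """Hypothetical guard: return the first `size` items, refusing oversized requests."""
    if size > len(dataset):
        raise ValueError(
            f"requested {size} samples but the dataset has only {len(dataset)}"
        )
    return dataset[:size]

test_split = list(range(10_000))         # stand-in for the 10k test split

# Plain slicing never complains: asking for 60000 quietly yields 10000 items.
silently_clamped = test_split[:60_000]
print(len(silently_clamped))

# The guarded version fails loudly instead.
try:
    select_subset(test_split, 60_000)
except ValueError as err:
    print(err)
```

A check like this would turn a confusing accuracy gap into an immediate error message.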

Btw, thanks for getting me to run it again, running this for you, I actually found a bug elsewhere in the codebase; mentioned here: #7.

For --trset I would set that to train by default because that's what people would expect.

Dear alex, so great to see that you find this repo interesting. It will be super interesting to see what the results become once you have fixed all the 'bugs' in coshrem. May I suggest you first try to reproduce our results with coshrem as is? It sounds like you have managed to do so now. The command lines for the main results are in the readme. CoShNet's identity is its parameter efficiency, and 98% of our results and tests were done using the 10k set as training data. Yes, I was concerned about it causing confusion.

Dear alex, the original authors of coshrem have renamed it to SymFD on their current repo/project page: https://github.com/rgcda/SymFD. It might be a helpful resource.

Yes, I just reproduced MNIST too with 99.4%.

There's only one plausible bug and it's not major. For the rest, I don't yet understand the intended behaviour well enough. I will add more visualizations in different domains, maybe that will help ;-)

I'm closing this thread as I think the problems are in dependencies or other libraries. I'm not sure if my intuitions are theoretically correct either, and the resulting improvements are not significant.

Thank you everyone for helping reproduce the results in the paper!

plausible bug and it's not major.

@alexjc can you share details about the minor plausible bug that you mentioned, the one that is not in the coshrem repo but exists in our repo, i.e. coshnet?

Also, I highly recommend opening an issue about this on coshrem repo. Even if it is somehow not a bug, and if authors have some reason for the same, it will still aid the community's understanding on this issue.

Sure @Ujjawal-K-Panchal! It could be that I'm using coshrem wrong or in unexpected ways. It's possible to get correct output from the library, so I need to experiment a bit more, then report it.

Do you know of any other downstream users of the library? I don't see a very big community around it, and there have been no changes in 5 years.

You are right. We are few and far between. I had some difficulty finding them. Perhaps if you're interested in this small compilation:

  1. PyCoShREM: repo (Not active).
  2. AuScalable: repo (Active).
  3. Automatic Fracture Detection by Prabhakaran: repo | paper (Not active).
  4. Stegano framework: repo (Not active).

Probable user (found from used by page of pycoshrem):
5. CartoonX: repo | paper (Active).

Dear alex, I have been going back to the source literature, especially the work on phase congruency, to see whether shearlets/Fourier should be applied before or after Hilbert. I consulted:

"Image Features from Phase Congruency", Kovesi 1999.
"On the classification of image features", Venkatesh & Owens 1990.
"Feature detection in human vision: a phase-dependent energy model", Morrone & Burr 1988.

All seem to suggest that the Hilbert transform should be applied to the Fourier coefficients or to the log-Gabor filter responses expanded in a Fourier series. CoShRem uses \alpha-molecules, which are basically log-Gabor filters as far as I can tell. Venkatesh & Owens is a very good read.

A sanity check is to see whether the current CoShRem filters are still orthogonal in phase to each other (a Hilbert quadrature pair). Based on my reading of the theory, that is the most important characteristic. One should be even-symmetric and the other odd-symmetric (antisymmetric).
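The sanity check above could be prototyped on a 1D toy filter before trying it on the real ones. This is only a sketch under stated assumptions: it uses a Gaussian-windowed cosine as a stand-in (not a CoShRem filter), and `analytic_signal` is a small helper written here that builds the Hilbert pair with NumPy's FFT.

```python
import numpy as np

def analytic_signal(x):
    """Return x + i*H[x] by zeroing negative frequencies (FFT-based Hilbert transform)."""
    n = x.size
    gain = np.zeros(n)
    gain[0] = 1.0                         # keep the DC term
    if n % 2 == 0:
        gain[n // 2] = 1.0                # keep the Nyquist term
        gain[1:n // 2] = 2.0              # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

# Toy even-symmetric filter on a grid centered at zero (odd length, so flipping
# the array is a reflection about the center sample).
x = np.arange(-128, 129)
even = np.exp(-(x / 20.0) ** 2) * np.cos(2 * np.pi * x / 16.0)

pair = analytic_signal(even)
odd = pair.imag                           # Hilbert transform of the even filter

is_even = np.allclose(even, even[::-1])             # even-symmetric
is_odd = np.allclose(odd, -odd[::-1], atol=1e-9)    # odd-symmetric
inner = float(np.dot(even, odd))                    # near zero for a quadrature pair

print(is_even, is_odd, abs(inner) < 1e-9)
```

For the actual 2D filters the same three checks would apply along the filter's orientation, but how to extract that profile from the generated shearlets is left open here.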

cheers

Thank you @mannykao. So far I've been looking at this from an engineering perspective; does this make sense code-wise? How do the features look? How does it affect results? I don't know how that helps but I learned a few things for myself!

I will try to read more about the theory before I take this further, I'm glad to see there's a solid body of knowledge.

It is great that you are doing this kind of detailed and sensible analysis. It is something we should have done more of. The handling of phase using odd and even filters dates back to Bracewell's seminal textbook. I copied a few paragraphs from it, which might be useful:

Hilbert-xform.pdf
odd-even.pdf