tf.keras implementation
Hello,
Is there possibly a tf.keras implementation of the ordinal layer, or any interest in implementing one? I am hoping to use the CORAL algorithm in an item response theory-based multitask model for hate speech measurement (https://hatespeech.berkeley.edu), but all of our code is in Keras currently. I don't know that we have the low-level technical capacity to make the conversion ourselves unfortunately.
Thanks,
Chris
That's a nice project, and I would be happy to assist with getting CORAL to work for this.
It has been a long time since I used Keras for more serious projects, and I am not experienced with customizing it. However, I just wrote a quick recipe here that outlines the different aspects of a "regular" deep neural network classifier that need to be changed: https://github.com/Raschka-research-group/coral-cnn/blob/master/coral-implementation-recipe.ipynb
Maybe this is sufficient for you to take it from here and implement CORAL. Otherwise, @vmirly and I could potentially look into converting one of our CORAL PyTorch models from this repository to tf.keras some time this summer. I think it would be useful to have an example in this repo, as others may find it helpful as well.
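In short, the key change is the output layer: a single shared weight column plus `num_classes - 1` independent bias units. Here is a rough tf.keras sketch of that idea (my quick illustration, not tested code from the recipe; the name `CoralLayer` is made up):

```python
import tensorflow as tf

class CoralLayer(tf.keras.layers.Layer):
    """Rough sketch of a CORAL output layer: one shared weight
    column plus num_classes - 1 independent bias units."""

    def __init__(self, num_classes, **kwargs):
        super().__init__(**kwargs)
        self.num_classes = num_classes

    def build(self, input_shape):
        # One weight column shared by all K-1 binary tasks.
        self.shared_w = self.add_weight(
            name="shared_weights", shape=(int(input_shape[-1]), 1),
            initializer="glorot_uniform", trainable=True)
        # One independent bias per binary task.
        self.biases = self.add_weight(
            name="biases", shape=(self.num_classes - 1,),
            initializer="zeros", trainable=True)

    def call(self, inputs):
        # Broadcasting adds each bias to the same shared projection,
        # giving K-1 logits per example.
        return tf.matmul(inputs, self.shared_w) + self.biases
```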
Thank you @rasbt for such a fast and thorough response, this is incredibly helpful. The notebook really improves my understanding of the PyTorch implementation, so I think there is some possibility that I can figure out enough tf.keras details to get a port working in the short term. I can update this issue with any solutions that seem to work, or additional questions.
And a backup option of an official implementation sometime later in the summer sounds great, since I might end up running out of time/capacity on my end.
Much appreciated,
Chris
Ok, without fully internalizing how the math works, I believe I was able to successfully convert steps 1 and 2 to tf.keras so far (Colab notebook):
https://drive.google.com/file/d/1-jkKxUOrXBya_dDkWQN6qgI-R5B3Q2AQ/view?usp=sharing
I will see if I can get 3 and 4 to work in an actual model - to be determined.
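For anyone following along, the label-to-levels conversion in steps 1 and 2 boils down to something like this (a hypothetical sketch; the notebook has the full version):

```python
import numpy as np

def label_to_levels(label, num_classes):
    # Extend an ordinal label into num_classes - 1 binary levels,
    # e.g. label 2 with 5 classes -> [1., 1., 0., 0.]
    return np.array([1.0] * label + [0.0] * (num_classes - 1 - label),
                    dtype=np.float32)

label_to_levels(2, 5)  # -> array([1., 1., 0., 0.], dtype=float32)
```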
I think I have part 3 ported ~correctly to create a custom CORAL output layer:
https://colab.research.google.com/drive/1-jkKxUOrXBya_dDkWQN6qgI-R5B3Q2AQ#scrollTo=KNEsk3_0eWc7
Would it be possible to create a simple test case for parts 3/4 in pytorch that I can try to port over and confirm that I get the right answer?
Nice, let me take a look!
So, I just extended parts of the Jupyter Notebook at https://github.com/Raschka-research-group/coral-cnn/blob/master/coral-implementation-recipe.ipynb:

- I added a "Test with pre-defined weights" subsection to Section 3. It just runs the layer to ensure it creates the expected outputs.
- The above example is continued in Section 4 to get the class labels.
- I added a section "5) Example Run and Code Checks" that runs a very simple CNN on MNIST. I think getting your implementation to run on MNIST as well (and looking at the sanity checks) would be a good starting point before using it in a more complex context.
Let me know if you have any questions!
Ok, it seems like parts 3 and 4 are working correctly, phew. https://colab.research.google.com/drive/1-jkKxUOrXBya_dDkWQN6qgI-R5B3Q2AQ#scrollTo=IdyKXnR30yrQ
On parts 3 and 4, when the bias terms are modified, shouldn't there only be 4 bias weights for the test data, since we have 5 classes? Apologies if I'm just not following correctly; this is stretching my brain.
> shouldn't there only be 4 bias weights
Oh, sorry about that, my bad. Hope it didn't cause too much head scratching! I was adding this section after adding the MNIST example at the bottom and must have been thinking of the 10 classes in MNIST (hence the 9 bias units 0...8). It's a weird coincidence that this didn't cause a dimension mismatch issue. I just fixed it.
Thanks, that's what I figured.
Part 5 is ported but not yet giving the correct results - there must be some bugs still to uncover. Pretty close I think though. https://colab.research.google.com/drive/1-jkKxUOrXBya_dDkWQN6qgI-R5B3Q2AQ#scrollTo=iQUU0hEhGNxN
Hm, I was just running your notebook to see if there's something I could help with. What's concerning me is that the loss becomes negative. This shouldn't be possible (it should converge to 0) -- it's weird. E.g., if I choose values that are "extremely good", I can see that it works. But then, I am really confused about why it becomes negative when it is plugged into the network.
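For reference, a minimal sketch of how I'd write the loss in tf.keras (leaving out the optional task-importance weights): every term is a binary cross-entropy, so the total should never go below zero.

```python
import tensorflow as tf

def coral_loss(levels, logits):
    """Sketch of the CORAL loss without task-importance weights.
    Each term is a binary cross-entropy, so the loss is >= 0."""
    log_prob = tf.math.log_sigmoid(logits)
    # log(1 - sigmoid(x)) == log_sigmoid(x) - x (numerically stable)
    term = log_prob * levels + (log_prob - logits) * (1.0 - levels)
    return -tf.reduce_mean(tf.reduce_sum(term, axis=1))
```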
Thanks for giving it a try! Yeah, I think something is wrong with the model architecture and/or training loop. There are unfortunately a lot of areas where I may have made mistakes in translating the pytorch to tf.keras. This is my first time using a subclassed tf.keras model, doing a custom training loop, or using a tf dataset, so some of the options or integration must be off. But probably only a few fixes away from being correct.
Ok, it seems to have been an issue in converting the labels to levels during the training loop. I resolved it by converting a tensor to a numpy array as a quick fix. It appears to be working now!
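For reference, a graph-friendly alternative to the numpy round-trip might be `tf.sequence_mask` (an untested sketch):

```python
import tensorflow as tf

def labels_to_levels(labels, num_classes):
    # Build the num_classes - 1 binary levels directly on the graph,
    # e.g. label 2 with 5 classes -> [1., 1., 0., 0.]
    return tf.cast(tf.sequence_mask(labels, maxlen=num_classes - 1),
                   tf.float32)
```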
Nice!
Regarding parts like:

```python
self.pool_2 = tf.keras.layers.MaxPooling2D(pool_size=(2, 2),
                                           strides=(2, 2),
                                           # Not sure if this is right.
                                           padding="valid")
```
So, this is just an arbitrary CNN. You can replace the CNN with any architecture you want (we used ResNet-34 in the paper), given that you keep the modifications of the last layer (the CORAL layer).
Thanks, yep I follow that. I'll be initially using the approach for NLP so it will be integrated into a transformer-based architecture (RoBERTa currently). It will be doubly multitask in that we have 10 ordinal outcome variables to predict (survey items).
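Something along these lines is what I have in mind (a rough sketch; the encoder dimension, class count, and a `CoralLayer` like the one sketched earlier are all placeholders):

```python
import tensorflow as tf

# Hypothetical multitask setup: one shared encoder feeding ten CORAL
# heads, one per ordinal survey item. Shapes and sizes are placeholders.
inputs = tf.keras.Input(shape=(768,))  # e.g. a pooled transformer embedding
shared = tf.keras.layers.Dense(256, activation="relu")(inputs)
outputs = [CoralLayer(num_classes=5, name=f"item_{i}")(shared)
           for i in range(10)]
model = tf.keras.Model(inputs=inputs, outputs=outputs)
```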
Regarding rank prediction, I was wondering why the predicted rank takes the sum of the binary indicators rather than choosing the rank with maximum predicted probability. Is there any intuition for why the former is preferred?
> I was wondering why the predicted rank takes the sum of the binary indicators rather than choosing the rank with maximum predicted probability. Is there any intuition for why the former is preferred?
That's a good question! Initially, we did that because it was done in the same manner in both the Niu et al. and the Li & Lin papers, which also used a binary-task approach. Wenzhi Cao, with whom I worked on this paper, also suggested looking at the max-predicted-probability approach. It was a quick modification, and I evaluated all the models, but I remember that the MAE was approximately the same, so we kept using the sum.
Gotcha, thanks for the background. It does seem that they would often give the same prediction, and the sum version might be a bit more robust with a large number of ranks.
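For concreteness, the two decoding rules being compared look roughly like this (a sketch, assuming logits from a CORAL output layer):

```python
import tensorflow as tf

def rank_by_sum(logits):
    # CORAL-style: count how many of the K-1 binary tasks fire above 0.5.
    probs = tf.sigmoid(logits)
    return tf.reduce_sum(tf.cast(probs > 0.5, tf.int32), axis=1)

def rank_by_argmax(logits):
    # Alternative: turn the cumulative P(y > k) estimates into per-rank
    # probabilities P(y = k) = P(y > k-1) - P(y > k) and take the argmax.
    # (These differences are only proper probabilities when the cumulative
    # estimates are monotone, which CORAL's shared weights encourage.)
    c = tf.sigmoid(logits)  # shape (batch, K-1)
    ones = tf.ones_like(c[:, :1])
    zeros = tf.zeros_like(c[:, :1])
    rank_probs = tf.concat([ones, c], axis=1) - tf.concat([c, zeros], axis=1)
    return tf.argmax(rank_probs, axis=1)
```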
I converted the Keras code into a quick python package (https://github.com/ck37/coral-ordinal) and condensed Colab notebook (https://colab.research.google.com/drive/1AQl4XeqRRhd7l30bmgLVObKt5RFPHttn#scrollTo=s2HQ89oVs5TS) so I think we're in good shape! Now to apply it to my own data and try to wrap up this preprint.
Awesome, glad to hear it! There is a small re-org I want to do in the repo in the next couple of days, and I will link your Keras code then, because I think it will be very helpful to others as well! Good luck with the rest of your project.
Sounds great to me, and thanks again for such rapid assistance on this. I already received my first user bug report email this morning so there may indeed be some interest in the Keras port.
Thanks for putting this together! I hope it will be useful to you and others! Added a link to the README file.