ocaml/v2.ocaml.org

Filter email list to not show `unsubscribe` emails.

gs0510 opened this issue · 16 comments

On the https://ocaml.org/community/ page, the recent email threads show all emails sent to the list. Filter the list so that unsubscribe emails are not displayed.

Screenshot from 2021-04-13 13-37-16

@Ndipbanyan since you were looking for a medium issue, you can go ahead and work on this one.

@gs0510 Alright. Thank you. I will begin working on it and reach out for any help or clarifications that I might need.

@Ndipbanyan Have you been able to make any progress? Do you have any questions? Thanks!

@gs0510 I have found the code that generates this list in rss2html.ml in the script directory, and I am trying to understand the function that does it to see if I can modify it to filter the list. The drawback I currently have is my limited understanding of the OCaml language. However, I am still going through tutorials to catch up.

okay, let me know if you run into any problems! Thanks!

@gs0510 I came up with a solution and want to be clear about it before creating a PR. Let me try to explain. The API that is consumed to display the emails under recent email threads returns a result containing items, and each item has a title tag that holds the subject of the email and the email address of the sender. Below is what I am referring to:
Screenshot 2021-04-18 at 17 55 55

generated from https://sympa.inria.fr/sympa/rss/latest_arc/caml-list?count=40

Looking at the above, you will notice that the item with the title <title>[Caml-list] - ulugbekna@gmail.com</title> has its email subject as "[Caml-list]", item with title <title>[Caml-list] [CFP] Logical Frameworks and Meta-Languages: Theory and Practice - enrico.tassi@inria.fr</title> has its email subject as "[Caml-list] [CFP] Logical Frameworks and Meta-Languages: Theory and Practice" and item with the title <title>[Caml-list] unsubscribe - jean-denis.eiden@orange.fr</title> has "[Caml-list] unsubscribe" as its subject.
Now, in the code base, line 595 of /script/rss2html.ml contains a regex written to exclude "Re:" and anything between [ ], which is used to match the subject (the text between the <title> </title> tags). This results in [Caml-list] and [CFP] being removed from the above titles, leaving only the remaining part of each title to be displayed. So in the case of <title>[Caml-list] - ulugbekna@gmail.com</title>, there is no subject left after [Caml-list] has been removed, so "- ulugbekna@gmail.com" is displayed. Given all this, my implementation adds unsubscribe to the regex, which ends up displaying <title>[Caml-list] unsubscribe - jean-denis.eiden@orange.fr</title> as "- jean-denis.eiden@orange.fr" in the recent email threads.
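
For readers following along, the behaviour described above can be reproduced with a small sketch. The regexes and the helper name strip below are assumptions made for illustration and are not copied from rss2html.ml, so the actual pattern on line 595 may differ. The sketch needs the str library linked.

```ocaml
(* Requires the str library (for example: ocamlfind ocamlopt -package str -linkpkg). *)

(* Hypothetical pattern: "Re:" or any bracketed tag such as [Caml-list],
   followed by optional whitespace. Not the real regex from rss2html.ml. *)
let tag_re = Str.regexp "\\(Re:\\|\\[[^]]*\\]\\)[ \t]*"

(* The same hypothetical pattern with "unsubscribe" added as an alternative. *)
let tag_or_unsub_re =
  Str.regexp "\\(Re:\\|\\[[^]]*\\]\\|unsubscribe\\)[ \t]*"

(* Delete every match of [re] from [title] and trim surrounding whitespace. *)
let strip re title = String.trim (Str.global_replace re "" title)

let () =
  (* The post still appears; only the word "unsubscribe" is gone. *)
  print_endline
    (strip tag_or_unsub_re
       "[Caml-list] unsubscribe - jean-denis.eiden@orange.fr")
```

Under these assumptions, adding the word to the normalizing regex only shortens the displayed title. The post itself stays in the list.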

This has become rather long :). However, the point of all my explanations is to check whether my implementation is what you intended or whether you meant something entirely different. Thank you for taking the time to help me with this.

Hi @Ndipbanyan! You are almost right :) We want to stop displaying the threads that say unsubscribe on the email feed, not remove unsubscribe from the title. The function normalize_title just normalizes titles (so removing the [CFP] etc.). What we want to do is remove the unsubscribe post from the posts list, so you can go through the list to see if there's a post with unsubscribe in its title and remove that post from the list. Hope this helps!

Let me know if anything is unclear, or if there's anything OCaml related that you don't understand :)
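
As a rough illustration of the suggestion above, filtering the list rather than editing titles might look like the sketch below. The post record, contains_substring, is_unsubscribe and remove_unsubscribe are all hypothetical names invented for this example. The real data structures in rss2html.ml are not shown in this thread and are likely different.

```ocaml
(* Hypothetical post type, for illustration only. *)
type post = { title : string; author : string }

(* True if [sub] occurs anywhere in [s]. Standard library only. *)
let contains_substring ~sub s =
  let n = String.length sub and m = String.length s in
  let rec go i = i + n <= m && (String.sub s i n = sub || go (i + 1)) in
  go 0

(* A post counts as an unsubscribe request if its title mentions the word,
   in any letter case. *)
let is_unsubscribe p =
  contains_substring ~sub:"unsubscribe" (String.lowercase_ascii p.title)

(* Drop the matching posts and keep everything else unchanged. *)
let remove_unsubscribe posts =
  List.filter (fun p -> not (is_unsubscribe p)) posts

let sample =
  [ { title = "[Caml-list]"; author = "ulugbekna@gmail.com" };
    { title = "[Caml-list] unsubscribe";
      author = "jean-denis.eiden@orange.fr" };
    { title = "[Caml-list] [CFP] Logical Frameworks and Meta-Languages: \
               Theory and Practice";
      author = "enrico.tassi@inria.fr" } ]

let kept = remove_unsubscribe sample
```

The titles of the remaining posts are untouched, so normalize_title can keep doing its own job separately.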

Thank you @gs0510 for the clarity. I will look into implementing this and let you know when I run into any issue understanding anything. Thanks

@gs0510 I have been having issues in trying to run make or make production since I installed the ocaml platform extension on vscode. Below was the error I was getting
Screenshot 2021-04-23 at 07 50 09
I uninstalled the extension, but then cohttp-server-lwt ./ocaml.org wouldn't start anymore, and running make gives the error below:
Screenshot 2021-04-23 at 08 50 02

Please can you help me detect what the problem is?

@Ndipbanyan Both errors are related to omd. Can you run opam show omd to see what version of omd you have?

cohttp-server-lwt ./ocaml.org will work only if your make command is successful.

After running opam show omd I got this:
Screenshot 2021-04-23 at 12 12 03

The website doesn't work with the latest version of omd (see issue #1321). You need to downgrade omd to 1.3.1, and it should be okay after that :)

Yes! It works now. Thanks. Got me stuck there for a while.

Also, I think I have been able to filter the emails now. My implementation is as follows:
I wrote a regex (for the word unsubscribe) and added an else if branch in the must_keep function to exclude any post whose title matches the regex. Is this implementation okay?

Before:

Screenshot 2021-04-23 at 15 01 24

After:

Screenshot 2021-04-23 at 15 02 39

Code snippet (lines 592 and 614)

Screenshot 2021-04-23 at 15 05 46

This looks good @Ndipbanyan. You can make the regex case-insensitive so that all kinds of unsubscribes are filtered out. You should also open a PR. :)

Great! I've opened a PR. I used Str.regexp_case_fold as opposed to just Str.regexp so I believe that makes it case agnostic.
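
For anyone comparing the two functions mentioned above, here is a minimal sketch of the difference. Str.regexp, Str.regexp_case_fold and Str.search_forward are real functions from the str library. The helper matches and the sample titles are made up for illustration and are not the code from the PR.

```ocaml
(* Requires the str library (for example: ocamlfind ocamlopt -package str -linkpkg). *)

(* True if [re] matches somewhere in [s]. Str.search_forward raises
   Not_found when there is no match. *)
let matches re s =
  try
    ignore (Str.search_forward re s 0);
    true
  with Not_found -> false

(* Case-sensitive: only matches the exact lowercase spelling. *)
let sensitive = Str.regexp "unsubscribe"

(* Case-insensitive: also matches "Unsubscribe", "UNSUBSCRIBE", and so on. *)
let insensitive = Str.regexp_case_fold "unsubscribe"

let () =
  let title = "[Caml-list] UNSUBSCRIBE" in
  Printf.printf "sensitive: %b, insensitive: %b\n"
    (matches sensitive title) (matches insensitive title)
```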