essandess/adblock2privoxy

easylist go.*. rule breaks many sites

Closed this issue · 5 comments

Taking care of double rules is not enough, as there are even single rules which, by using .*., break more sites than intended.

#ab2p-block-request-R1304
{+client-header-tagger{ab2p-block-request-R1304} \
}
# |http://go.$domain=nowvideo.sx (easylist.txt: 46984)
go.*.

The rule above ends up setting the header tag for sites such as imasdk.googleapis.com.

WORKAROUND:
Use sed -i -e '/^go\.\*\./s/^/#/' /etc/privoxy/ab2p.action to disable this rule
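To see what the workaround touches before editing the real file, the same sed expression can be run on a small sample without -i. This is only a sketch: the sample text is made up from the excerpt above, and the good.example.*. line is a hypothetical rule added to show that only lines starting with a literal go.*. are commented out.

```shell
# Hypothetical excerpt standing in for /etc/privoxy/ab2p.action
sample='# |http://go.$domain=nowvideo.sx (easylist.txt: 46984)
go.*.
good.example.*.'

# Same expression as the workaround, but writing to stdout instead of editing in place
result=$(printf '%s\n' "$sample" | sed -e '/^go\.\*\./s/^/#/')

# Only the second line gains a leading "#"; the other two are unchanged
printf '%s\n' "$result"
```

Once the output looks right, the -i form from the workaround applies the same change to the action file itself.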

P.S. The rulesets I generated after all fixes/workarounds so far still use .*. about 1200 times. Almost all of the others actually seem less harmful, with the exception of promo.*., which also comes from easylist.txt.

While trying to fix this issue I did some testing, and this is what I found:

||log. - original adblock record
^log.*. - converted with fix from #23

This is still not right. After the fix it no longer catches hostnames such as blog.mypage.com, but it still catches things like loggingintothepage.mypage.com.

The only proper combination I found was ^log\.(*PRUNE).*? as this would catch log.mypage.com, but not loggingintothepage.mypage.com.
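The difference between the unescaped and the escaped dot can be checked roughly with grep. This is a sketch under assumptions: grep -E (POSIX extended regex) is used here as a stand-in for Privoxy's own matcher, and the (*PRUNE).*? tail is left out because it is PCRE syntax that grep -E does not understand. Only the effect of writing \. instead of . is shown.

```shell
# The three hostnames discussed above
hosts='log.mypage.com
blog.mypage.com
loggingintothepage.mypage.com'

# Unescaped dot: "." matches any character, so "logging..." is caught too
unescaped=$(printf '%s\n' "$hosts" | grep -E '^log.*.')

# Escaped dot: "log" must be followed by a literal "."
escaped=$(printf '%s\n' "$hosts" | grep -E '^log\.')

printf 'unescaped pattern matches:\n%s\n' "$unescaped"
printf 'escaped pattern matches:\n%s\n' "$escaped"
```

With the escaped dot only log.mypage.com is left, which is the behaviour the original ||log. record asks for.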

The proposed solution is to change all instances of . into \. in hostnames as well, not only in patterns as is done now, and to change = lst : "*." into = lst : "(*PRUNE).*?"

Changing the latter was easy in the adblock2privoxy code, but as I do not know Haskell I am not sure how to make the change for dots within the code. I was only able to do it partially, afterwards, with sed -i -e '/\./{/^\^/s/\./\\./}', which changes a dot into \. but only on lines starting with ^.
My attempts to fix this in the code have failed so far, and help fixing it is welcome.
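Why the sed post-processing is only a partial fix can be seen on a few sample lines. In this sketch the three rule lines are hypothetical, and the expression is written without the outer /\./ guard and braces, which should behave the same because s/\./\\./ does nothing on a line without a dot. Since the substitution has no g flag, only the first dot on each matching line is escaped.

```shell
# Hypothetical converted rules; the last one does not start with "^"
rules='^log.(*PRUNE).*?
^ads.example.(*PRUNE).*?
plain.example.com'

# Escape the first dot on lines starting with "^"
fixed=$(printf '%s\n' "$rules" | sed -e '/^\^/s/\./\\./')

printf '%s\n' "$fixed"
```

The single-label rule comes out as intended, but in the second line the dot after example stays unescaped, and the third line is not touched at all. That is the gap a fix inside the converter would have to close.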

After a bit of trial and error I came up with this. Not only does it compile, it also seems to work just as expected :) It is combined with the previous patch for #23.

diff -Naur adblock2privoxy-9999.old/adblock2privoxy/src/PatternConverter.hs adblock2privoxy-9999/adblock2privoxy/src/PatternConverter.hs
--- adblock2privoxy-9999.old/adblock2privoxy/src/PatternConverter.hs    2018-07-23 14:45:40.829753697 +0200
+++ adblock2privoxy-9999/adblock2privoxy/src/PatternConverter.hs        2018-07-23 14:47:28.325970392 +0200
@@ -34,20 +34,22 @@
             | otherwise = "/"
         host' = case host of
                     "" -> ""
-                    _  -> changeFirst.changeLast $ host
+                    _  -> changeFirst.changeMiddle.changeLast $ host
                     where
                     changeLast []     = []
                     changeLast [lst]
                         | lst == '|' || lst `elem` hostSeparators   =  []
-                        | lst == '*' || lst == '\0'                 =  "*."
-                        | otherwise                                 =  lst : "*."
+                        | lst == '*' || lst == '\0'                 =  "(*PRUNE).*?"
+                        | otherwise                                 =  lst : "(*PRUNE).*?"
                     changeLast (c:cs) = c : changeLast cs
 
+                    changeMiddle = replace "." "\\."
+
                     changeFirst []    = []
                     changeFirst (first:cs)
                         | first == '*'                       =       '.' :  '*'  : cs
                         | bindStart == Hard || proto /= ""   =             first : cs
-                        | bindStart == Soft                  =       '.' : first : cs
+                        | bindStart == Soft                  =       '^' : first : cs
                         | otherwise                          = '.' : '*' : first : cs
 
         query' = case query of

@wmyrda I’m honestly still swamped with other projects, but am starting to think about thinking about addressing all the great issues you’ve raised. Rather than work through these linearly, would you please triage what you believe to be the most important issues?

Also, you raised compiler issues in another thread. That one perhaps is the most fundamental because the code refactoring should be done in such a way that it isn’t undone by a version upgrade.

It looks like this may be one of the highest-priority issues to address. Would you please weigh in? Note that in markdown you can refer to stuff easily with e.g. #19 links.

Please do not feel like I am pushing you to do stuff; you may definitely address them whenever you like.
To make it easier to follow what is important, I will create another issue which summarizes all open bugs along with my subjective importance (low/medium/severe) and the scope of required work (trivial/normal/high).

As for the compiler issue, I think help is coming.

Fixed. See comments in #10.