essandess/adblock2privoxy

Version string of filter is truncated if it is not composed only of digits

Closed this issue · 6 comments

This may be trivial, but in case it is somehow connected to #15, it seems wise to report it as well.
The version in adblock_anty-dotacje.txt is 100.2, but adblock2privoxy accepts only digits. As a result, it truncates the version to 100.
Likewise, the current version in adblock_adguard.txt is 364.2, and the version in the resulting file is 364.

EDIT: Looking at the source, the version number is treated as an Integer, which must be a whole number. Switching to floating point here is probably required to accept digits after the dot.
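To make the Integer observation concrete, here is a small sketch using only readMaybe from base. The helper names asInteger and asDouble are made up for illustration and are not part of adblock2privoxy. It also suggests that a floating-point field would only cover two-component versions such as 100.2, not something like 8.4.3.

```haskell
import Text.Read (readMaybe)

-- Hypothetical helpers, only for illustration.
asInteger :: String -> Maybe Integer
asInteger = readMaybe

asDouble :: String -> Maybe Double
asDouble = readMaybe

main :: IO ()
main = do
  print (asInteger "100")    -- Just 100
  print (asInteger "100.2")  -- Nothing: not a whole number
  print (asDouble "100.2")   -- Just 100.2
  print (asDouble "8.4.3")   -- Nothing: not a single floating-point literal
```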

I added this code to address the issue, 6fec1a3:

    (<++>) a b = (++) <$> a <*> b
    (<:>) a b = (:) <$> a <*> b
    number = many1 digit
    subnumber = char '.' <:> number
    versionnumber = number <|> number <++> subnumber
    versionParser = (\x -> info{_version = read x}) <$> (string "Version: " *> versionnumber)
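One thing worth checking in the versionnumber line above: Parsec's <|> returns the left alternative as soon as it succeeds, and since the user-defined <++> has no fixity declaration it binds tighter than <|>. The sketch below, assuming plain String parsers from Text.Parsec.String, compares that line with one possible reordering. The names asPosted and reordered are invented here and do not come from the commit.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

(<++>) :: Parser [a] -> Parser [a] -> Parser [a]
a <++> b = (++) <$> a <*> b

(<:>) :: Parser a -> Parser [a] -> Parser [a]
a <:> b = (:) <$> a <*> b

number :: Parser String
number = many1 digit

subnumber :: Parser String
subnumber = char '.' <:> number

-- Same shape as the posted line: parsed as number <|> (number <++> subnumber).
asPosted :: Parser String
asPosted = number <|> number <++> subnumber

-- One possible alternative: always read the digits, then an optional ".digits".
reordered :: Parser String
reordered = number <++> option "" subnumber

main :: IO ()
main = do
  print (parse asPosted "" "100.2")   -- Right "100"
  print (parse reordered "" "100.2")  -- Right "100.2"
  print (parse reordered "" "100")    -- Right "100"
```

Note that reordered still stops after the second component, so "8.4.3" would come back as "8.4".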

Ideally, the parser should be many1 digit `sepBy` char '.' to handle versions like 8.4.3, but this gives the following type error, which I haven't tracked down yet:

    • Couldn't match type ‘[Char]’ with ‘Char’
      Expected type: Text.Parsec.Prim.ParsecT s u m String
        Actual type: Text.Parsec.Prim.ParsecT s u m [[Char]]
    • In the second argument of ‘(<$>)’, namely
        ‘(string "Version: " *> (many1 digit `sepBy` char '.'))’
      In the expression:
        (\ x -> info {_version = read x})
          <$> (string "Version: " *> (many1 digit `sepBy` char '.'))
      In an equation for ‘versionParser’:
          versionParser
            = (\ x -> info {_version = read x})
                <$> (string "Version: " *> (many1 digit `sepBy` char '.'))
   |
93 |         versionParser = (\x -> info{_version = read x}) <$> (string "Version: " *> (many1 digit `sepBy` char '.'))
   |                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@qrilka If you have a moment, I have an educational Haskell question I’ve been unable to tackle.

We would like to parse version numbers like 8.4.3.

The original code’s parser used many1 digit, and would grab the 8, but not the .4.3.

I thought that the obvious fix would be to replace many1 digit with many1 digit `sepBy` char '.'. But this fails to compile because of a type mismatch between String and [[Char]]. So I hacked in the code block above, which would parse 8.4 and omit the .3.

Why doesn’t the sepBy work here?

@essandess I'm not sure I understand what you mean by "not work" here. sepBy works just as it's supposed to: it builds a list of the parsed values with the separators excluded, i.e. for "8.4.3" you'll get ["8","4","3"]. I see _version is an Integer, and I wonder how you could store a multicomponent version there; something like [Int] would be more sensible if the number of version components is not fixed (though I don't yet know how you use that information).
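The behaviour described above can be checked with a minimal sketch, assuming Parsec's String parsers. The name components is made up for illustration. The result type is [String] rather than String, which is why feeding it straight to read (which expects a single String) produces the [[Char]] versus String mismatch in the error message.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Each many1 digit yields a String; sepBy collects them into a list
-- and drops the '.' separators.
components :: Parser [String]
components = many1 digit `sepBy` char '.'

main :: IO ()
main = print (parse components "" "8.4.3")  -- Right ["8","4","3"]
```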

Thanks again for the Haskell n00b pointers @qrilka!

What I mean is that the code fragment many1 digit `sepBy` char '.' does not compile in this statement:

-- versionnumber = many1 digit -- this compiles!
versionnumber = many1 digit `sepBy` char '.'  -- this doesn't compile!!!
-- versionnumber = (++) <$> many1 digit `sepBy` char '.'  -- this doesn't compile either!!!
versionParser = (\x -> info{_version = read x}) <$> (string "Version: " *> versionnumber)

What's the correct sepBy (or equivalent) parser that will grab the string "8.4.3" from a line that looks like:

Version: 8.4.3

Just to keep it simple, I'd like to parse the string alone, and ignore the fact that it is made up of parts that could each be read as an Int.

In Python, this would be something like '.'.join(["8","4","3"]), after the parser found the "Version: 8.4.3" and sepBy converted it to ["8","4","3"].

@qrilka Thanks again for the pointer. I got it:

intercalate "." <$> many1 digit `sepBy` char '.'
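For readers who want to try this end to end, here is a self-contained sketch. It assumes _version is stored as a String; the Info record below is a hypothetical stand-in, not the project's actual type, and versionParser takes the record as an argument only to keep the example standalone. It uses sepBy1 instead of sepBy so that an empty version is rejected, which is a choice made here rather than something from the thread.

```haskell
import Data.List (intercalate)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in record; assumes the version is kept as text.
newtype Info = Info { _version :: String }

-- Parse dot-separated digit groups and glue them back together with ".".
versionNumber :: Parser String
versionNumber = intercalate "." <$> many1 digit `sepBy1` char '.'

versionParser :: Info -> Parser Info
versionParser info =
  (\x -> info { _version = x }) <$> (string "Version: " *> versionNumber)

main :: IO ()
main = do
  let result = parse (versionParser (Info "")) "" "Version: 8.4.3"
  print (_version <$> result)  -- Right "8.4.3"
```

Because the value is no longer passed through read, nothing forces the version to be numeric; whether that suits the rest of the code depends on how _version is used elsewhere.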

I didn't write anything here today :)
In theory you could do the joining inside the parser itself, though it does not look very pleasant:

λ> parse (do{d1 <- many1 digit; dotDs <- many1 $ (:) <$> char '.' <*> many1 digit; return $ concat (d1:dotDs)}) "" "18.4.3"
Right "18.4.3"