fimad/scalpel

Negate Selector?

Closed this issue · 10 comments

How do I select tags that lack a certain class, or in general fails to satisfy an AttributePredicate?
This would be simple if there was some function like,

notP :: AttributePredicate -> AttributePredicate
notP (MkAttributePredicate f) = MkAttributePredicate (not . f)

but MkAttributePredicate isn't exposed.

Is there another way to do this, or should I make a pull request about adding something like notP?

That's what I've done for now:

cls <- attr "class" 
guard $ cls /= "collapse_topic"

Because the AttributePredicates don't expose their booeans, I can't use them directly and so I had to redo the check myself. In fact, I didn't do a very good job here because I didn't split the class by spaces and check if my class was an element in that list, which goes to show that this isn't trivial.

Also, I don't think this is as clean as

chroot (span :@ [ :!: hasClass "collapse_topic"])

or something like that. It doesn't make sense to me to duplicate the functionality already implemented in the AttributePredicates every time you want to negate something.

jezen commented

@jbaum98 Ah yes, I see the point. I made this test case:

#!/usr/local/bin/stack
-- stack runghc --resolver lts-6.24 --install-ghc --package scalpel-0.4.0

{-# LANGUAGE OverloadedStrings #-}

import Text.HTML.Scalpel

exampleHtml :: String
exampleHtml = "<html>\
\    <body>\
\      <ul class='comments'>\
\        <li class='comment'>This comment should appear</li>\
\        <li class='comment'>We also want this one</li>\
\        <li class='comment bad-comment'>We don't like this one</li>\
\        <li class='comment bad-comment'>This is also bad</li>\
\      </ul>\
\    </body>\
\</html>"

main :: IO ()
main = print $ scrapeStringLike exampleHtml goodComments

goodComments :: Scraper String [String]
goodComments = texts $ "li" @: [ hasClass "comment", (not $ hasClass "bad-comment") ]

…and instead of seeing either correct or incorrect output, we get a compiler error:

scalpel.hs:24:55:
    Couldn't match expected type ‘AttributePredicate’
                with actual type ‘Bool’
    In the expression: (not $ hasClass "bad-comment")
    In the second argument of ‘(@:)’, namely
      ‘[hasClass "comment", (not $ hasClass "bad-comment")]’
    In the second argument of ‘($)’, namely
      ‘"li" @: [hasClass "comment", (not $ hasClass "bad-comment")]’

scalpel.hs:24:61:
    Couldn't match expected type ‘Bool’
                with actual type ‘AttributePredicate’
    In the second argument of ‘($)’, namely ‘hasClass "bad-comment"’
    In the expression: (not $ hasClass "bad-comment")

👍 for a PR, but @fimad seems a little inactive at the moment.

fimad commented

👍 I'd be happy to accept a pull request adding negation of predicates.

I think the reason that I didn't originally expose MkAttributePredicate was that I was concerned about the forall in the type definition and how that would force users to have to deal with TagSoup string conversion.

data AttributePredicate
        = MkAttributePredicate
                (forall str. TagSoup.StringLike str => TagSoup.Attribute str
                                                    -> Bool)

I'm having a problem with the pull request. Because the AttributePredicates hold functions that take both an attribute key and value, a simple negation won't do the trick:

checkPred (hasClass "foo") ("class", "foo") -- True ✓
checkPred (notP $ hasClass "foo") ("class", "foo") -- False ✓
checkPred (notP $ hasClass "foo") ("class", "bar") -- True ✓
checkPred (notP $ hasClass "foo") ("id", "blah") -- True ✗

Attributes can fail to match for two reasons: either they have the correct attribute key but the value fails to satisfy the meaning of the predicate, or the attribute key is incorrect and the predicate doesn't apply. I'm not sure how to get around the fact that these are tied up without somehow changing MkAttributePredicate.

jezen commented

@jbaum98 I may be deeply confused, but I'm not sure why you've indicated failure for the last assertion. It seems to be doing what I expect.

Sorry, I think I was confused about the underlying problem with my failing test. I'll be upfront: I have the following failing test cases:

scrapeTest
    "<a>foo</a><a class=\"a b\">bar</a><a class=\"b\">baz</a>"
    (Just ["foo", "baz"])
    (texts ("a" @: [notP $ hasClass "a"]))
scrapeTest
    "<a>foo</a><a B=C>bar</a><a B=D>baz</a>"
    (Just ["foo", "baz"])
    (texts ("a" @: [notP $ "b" @= "C"]))

Both are giving only "baz", which means both are failing on the node with no attributes. However, if you add some nonsense attribute like d=e to it, the test passes.

Now what I think is happening is that in this line, it's applying or to the result of the predicate applied to all the attributes, which in this case is an empty list, so we get or [] = False. Usually, this is the desired behavior because we want a node without any attributes to fail assertions that it possesses some desired attribute like hasClass. However, in this case, we want it to pass unless it possesses the undesired attribute. I'm not sure how to fix this because notP isn't even getting called because there are no attributes. Any thoughts?

fimad commented

Hrm it seems like we would want to apply the not after the or but before the and of that line.

While the representation of a predicate is Attribute -> Bool from an API perspective it has semantics more like [Attribute] -> Bool. You could make the representation match the semantics and pull the or [checkPred p attr | attr <- attrs] bit into the definition of each of the existing attribute predicates. Then when you go to negate a predicate the not will get placed in between the and and the or calculations.

fimad commented

Pushed new release (0.4.1) with these changes.