Text-based matching

Question

Opened this issue 7 months ago · 12 comments

Basically, I'd like to be able to write this with auto_assert: assert capture_log(fn -> Logger.error(msg) end) =~ msg

Answer 1 · 2024-05-12T17:08:23.000Z

The crux of this is somehow integrating with the text-match operator =~.

I think that auto_assert_capture_log is too specific. If support is added for text matching, I'd like it to be usable for any string comparison.

One option would be to allow a <~ operator to be used with auto_assert. Whereas =~ works as text =~ substring or regex, <~ would be substring or regex <~ text.
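
To make the proposed argument order concrete, here is a minimal sketch in plain Elixir. It is not how Mneme implements anything: the FlippedMatchSketch module is a hypothetical name, and the only assumption is that <~ behaves like =~ with its operands swapped.

```elixir
defmodule FlippedMatchSketch do
  # Hypothetical helper: `pattern <~ text` is `text =~ pattern` with the
  # operands swapped. Elixir parses `<~` but does not define it, so a module
  # is free to give it a meaning.
  def pattern <~ text when is_binary(text), do: text =~ pattern
end

import FlippedMatchSketch

log = "\e[31m\n12:50:27.941 [error] some error message\n\e[0m"

# A substring on the left, the full text on the right.
substring_found = "[error] some error message" <~ log

# A regex works as well, because =~ accepts either on its right-hand side.
regex_found = ~r/\[error\] some \w+ message/ <~ log

# A pattern that no longer occurs in the text does not match.
changed_found = "[error] some error MESSAGE" <~ log

IO.inspect({substring_found, regex_found, changed_found})
```

Putting the pattern on the left keeps the part you edit by hand in the same position as the pattern in a regular auto_assert pattern <- expr.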

Here's a quick worked example. The goal is to test that "some error message" occurs in the logs.

# initial state
auto_assert capture_log(fn -> Logger.error("some error message") end)

# after first run
auto_assert "\e[31m\n12:50:27.941 [error] some error message\n\e[0m" <-
              capture_log(fn -> Logger.error("some error message") end)

# manually edit to use <~ and match less of the message
auto_assert "[error] some error message" <~
              capture_log(fn -> Logger.error("some error message") end)

The biggest issue with this is that it's not obvious what to do if the value no longer matches. Possibly the best we can do is replace it with the entire captured log again and require you to rewrite it.

# message changed
auto_assert "[error] some error message" <~
              capture_log(fn -> Logger.error("some error MESSAGE") end)

# after running
auto_assert "\e[31m\n12:50:27.941 [error] some error MESSAGE\n\e[0m" <~
              capture_log(fn -> Logger.error("some error MESSAGE") end)

# still have to manually edit to remove timestamp and only assert what you care about
auto_assert "[error] some error MESSAGE" <~
              capture_log(fn -> Logger.error("some error MESSAGE") end)

There are some heuristics that could be used to guess which part of the string you care about and suggest a more intelligent alternative. I might be able to leverage String.myers_difference/2 to find better suggestions, e.g. if the myers_difference "pattern" is [:ins, :eq, ..., :eq, :ins], that means that something in the middle of the asserted text changed, and that's probably the bit you care about. Might also be able to use regular expressions to omit common prefixes/suffixes like escape sequences and timestamps.
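
As a rough illustration of the raw material such a heuristic would have to work with, the sketch below calls String.myers_difference/2 on an old pattern and a new value and extracts the sequence of operations. The variable names are made up for this example, and no particular output shape is assumed, since the exact chunking depends on how the diff aligns the two strings.

```elixir
old_pattern = "[error] some error message"
new_value = "\e[31m\n12:50:27.941 [error] some error MESSAGE\n\e[0m"

# String.myers_difference/2 returns a keyword list of :eq, :ins and :del
# chunks describing how to turn the first string into the second.
diff = String.myers_difference(old_pattern, new_value)

# The "shape" is the sequence of operations with the text dropped.
shape = Keyword.keys(diff)

# Sanity check: :eq and :del chunks rebuild the old string,
# :eq and :ins chunks rebuild the new one.
rebuilt_old = for {op, chunk} <- diff, op in [:eq, :del], into: "", do: chunk
rebuilt_new = for {op, chunk} <- diff, op in [:eq, :ins], into: "", do: chunk

IO.inspect(diff, label: "diff")
IO.inspect(shape, label: "shape")
```

A heuristic along the lines described above would then pattern match on shape to decide which chunks are likely the part worth asserting.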

Given that, perhaps any string value that starts and ends with likely-ignorable formatting content could result in a pattern suggestion using <~. For instance:

defp example do
  "\e[31m\n12:50:27.941 [error] some error MESSAGE\n\e[0m"
end

test "example/0" do
  auto_assert example()

  # running yields the following two suggestions
  auto_assert "[error] some error MESSAGE" <~ example()
  auto_assert "\e[31m\n12:50:27.941 [error] some error MESSAGE\n\e[0m" <- example()
end
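
A minimal sketch of how stripping "likely-ignorable" content from both ends might look. Everything here is an assumption for illustration: the IgnorableSketch module, the suggest/1 function, and the regular expressions for escape sequences and timestamps are hypothetical and are not Mneme's actual rules.

```elixir
defmodule IgnorableSketch do
  # Returns {:substring, core} when ignorable content was stripped from either
  # end, or :exact when nothing was stripped (or nothing would be left).
  def suggest(value) when is_binary(value) do
    core =
      value
      # Leading whitespace, ANSI escapes like "\e[31m", and times like "12:50:27.941".
      |> String.replace(~r/\A(?:\s|\x1b\[[0-9;]*m|\d{2}:\d{2}:\d{2}\.\d{3})+/, "")
      # Trailing whitespace and ANSI escapes like "\e[0m".
      |> String.replace(~r/(?:\s|\x1b\[[0-9;]*m)+\z/, "")

    if core in ["", value], do: :exact, else: {:substring, core}
  end
end

logged = "\e[31m\n12:50:27.941 [error] some error MESSAGE\n\e[0m"

IO.inspect(IgnorableSketch.suggest(logged))
IO.inspect(IgnorableSketch.suggest("plain value"))
```

A value with nothing to strip falls back to :exact, which corresponds to suggesting only the full <- pattern.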

Answer 2 · 2024-05-14T16:48:15.000Z

@tcoopman I have a somewhat minimal version of this implemented in the text-match-operator branch. If you have time to try it out, I'd really appreciate any feedback!

# dep
{:mneme, github: "zachallaun/mneme", ref: "text-match-operator"}

Answer 3 · 2024-05-14T18:43:17.000Z

I'll try to look at it tomorrow and keep you posted.

Answer 4 · 2024-05-15T09:09:11.000Z

Some feedback:

1. when you match the full string you switch from <~ to <-. I didn't notice that at first.
2. on the full match you also start using """, but """ don't seem to work with <~

So that was weird / unexpected for me.

For the rest it feels nice, maybe adding regexes could be useful, but on the other hand I'm not sure it's worth the extra value.

Answer 5 · 2024-05-15T13:29:55.000Z

> Some feedback:
>
> 1. when you match the full string you switch from `<~` to `<-`. I didn't notice that at first.

I agree that the difference between the two operators is subtle.

After some reflection, an alternative I think I like more is to introduce "matchers" that go on the left-hand side of <- and change the behavior of the match. Concretely:

auto_assert text("bar") <- "foo bar baz"

This is also consistent with how I'm planning to handle file snapshots (#72).

> 2. on the full match you also start using `"""`, but `"""` don't seem to work with `<~`

I think the issue here is that the """ string you're using is actually equivalent to "multiple workshops found for\n", but that trailing newline isn't present in the text being matched, so the match fails and Mneme regenerates the result. Try adding a \ to the end of the last line, which suppresses the newline:

auto_assert """
            multiple workshops found for\
            """
            <~ """
            [error] multiple workshops found for .....
            """

> For the rest it feels nice, maybe adding regexes could be useful, but on the other hand I'm not sure it's worth the extra value.

Regexes do currently work, but you have to add them yourself and Mneme doesn't generate them. I don't plan to add regex generation -- that seems like a can of worms that I don't want to open.
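
Returning to the heredoc point earlier in this comment, the newline behavior can be checked in plain Elixir without Mneme. The logged string below is just a stand-in built from the truncated message in the example, with the trailing dots kept as written.

```elixir
# A heredoc keeps the newline before the closing delimiter...
with_newline = """
multiple workshops found for
"""

# ...unless the last line ends with a backslash, which escapes that newline.
without_newline = """
multiple workshops found for\
"""

# Stand-in for the logged text (the rest of the message is elided).
logged = "[error] multiple workshops found for ....."

IO.inspect(with_newline)
IO.inspect(without_newline)

# Only the version without the trailing newline is a substring of the log line.
IO.inspect(logged =~ with_newline)
IO.inspect(logged =~ without_newline)
```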

Answer 6 · 2024-05-15T13:50:57.000Z

I'm not sure yet whether text/1 is the right name, but I'm about to push a change that removes substring <~ expr in favor of text(substring) <- expr.

Answer 7 · 2024-05-15T13:59:01.000Z

Is it intentional that the text matcher is not used for exact matches?

Answer 8 · 2024-05-15T14:07:12.000Z

Yes, that's intentional (for now). The idea is that if you're doing an exact match, you want to know if anything in the value changes. If Mneme generated text() for exact string matches and the string then changed because something was prepended or appended to it, the test case would still succeed.

The current "rules" for when text() is generated by Mneme are:

1. The expression evaluates to a string
2. The string has "ignorable content" at the beginning or end, where ignorable content is currently things like whitespace, dates/timestamps, and terminal escape characters
3. After stripping "ignorable content", the remaining content is a single line

At least that last one is likely to change because there's nothing that fundamentally prevents multi-line """ strings inside text(), but there might be some additional restrictions like no ignorable content in the middle of the text. For instance, in this case, there's no good way to "strip" the escape characters from the middle of the text:

In these cases, what you'd likely want to do instead is split the captured log on newlines and then do regular assertions, like:

logged = capture_log(...) |> String.split("\n")

assert Enum.any?(logged, &(&1 =~ "this is a warning"))
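
A self-contained version of the same idea, using a hard-coded string in place of capture_log(...) so it runs without ExUnit. The log content is made up for illustration and only mimics the format shown earlier in the thread.

```elixir
# Made-up stand-in for captured log output, with escape codes between lines.
logged =
  "\e[33m\n12:50:27.941 [warning] this is a warning\n\e[0m" <>
    "\e[31m\n12:50:27.942 [error] some error message\n\e[0m"

lines = String.split(logged, "\n")

# Each check only needs one line to contain the substring, so escape codes
# and timestamps on other lines do not matter.
has_warning = Enum.any?(lines, &(&1 =~ "this is a warning"))
has_error = Enum.any?(lines, &(&1 =~ "[error] some error message"))
has_other = Enum.any?(lines, &(&1 =~ "not logged"))

IO.inspect({has_warning, has_error, has_other})
```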

Answer 9 · 2024-05-15T14:08:36.000Z

Though, if you really want to stay in Mneme-land, we could theoretically introduce a new contains() as well that asserts some pattern is present in an enumerable, so the above could be:

auto_assert contains(text("this is a warning")) <- capture_log(...) |> String.split("\n")

Answer 10 · 2024-05-15T14:11:47.000Z

What about: exact_match, matches, substring_match, text_match?

To be clear, removing text() when you have an exact match is fine for me as well, but you'll need to document it :-)

Answer 11 · 2024-05-15T14:17:28.000Z

I could get behind substring_match as a better and more obvious/explicit name than text, for sure!

I don't know about exact_match or matches since they would be a no-op, i.e. the following would be exactly the same:

auto_assert exact_match("foo") <- "foo"
auto_assert "foo" <- "foo"

Agreed that it should be documented! Right now there are some docs about generated patterns, but they're not comprehensive. I should write up a guide about how and why patterns are generated/changed that I can link to from various places.

Answer 12 · 2024-05-15T16:18:27.000Z

I wrote a new guide on pattern generation and selection to replace the small section it had in the overview. This will be a good place to add documentation for this feature.