JayBizzle/Crawler-Detect

Separate repo for the regular expressions?

joshua-bn opened this issue · 1 comment

The commonality across all of the fork projects is, or at least should be, the regular expressions. Of course different languages use different regex libraries, but they are often similar enough that they don't require much to make them work.

In my case, I would love to have these rules for my WAF and Nginx, but I don't want to curate the list of rules and exclusions myself.

With the expressions in a shared repo, each project only has to be concerned with converting them to its own regular expression library and checking them against a user agent.

Ultimately, we all share the same goal of identifying bots. Very often that's so we can reduce or eliminate them from our sites and combat spam. Lessening the work for maintainers helps everyone.

There are two formats that I can think of that would work well. The first is the simplest: newline-delimited PCRE expressions. I can't think of a good reason not to go down this route.
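To give a rough idea of how little a consuming project would need with the newline-delimited format, here is a minimal sketch in Python. The sample patterns and the names `load_patterns`, `compile_patterns` and `is_crawler` are made up for illustration and are not taken from the project's actual list; Python's `re` module stands in for whatever regex library a fork uses.

```python
import re

# Hypothetical sample of a newline-delimited pattern file.
# The real list would come from the shared repo.
SAMPLE_PATTERNS = """\
Googlebot
bingbot
curl
python-requests
"""


def load_patterns(text):
    """Return one regex per non-blank line, leaving each line untouched."""
    return [line for line in text.splitlines() if line.strip()]


def compile_patterns(patterns):
    """Join all patterns into a single case-insensitive alternation."""
    joined = "|".join(f"(?:{p})" for p in patterns)
    return re.compile(joined, re.IGNORECASE)


def is_crawler(user_agent, compiled):
    """True if any pattern matches anywhere in the user agent string."""
    return compiled.search(user_agent) is not None


CRAWLERS = compile_patterns(load_patterns(SAMPLE_PATTERNS))
```

Something like `is_crawler(request_user_agent, CRAWLERS)` would then be the whole check. Patterns that rely on PCRE-only syntax would still need converting per library.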

The other is a JSON array of objects. JSON is usable in every language, and it could carry additional data. For instance, if the parent project wanted to support something other than POSIX regex, one property would hold the POSIX regex and another the alternative form. It could have modifiers or something else. It could also have a name or information about the bot added to it. Not sure there's much use in it, but it's an option.
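As a sketch of what the JSON option might look like from a consumer's side, the Python below reads an array of objects and reports which bot matched. The property names `name`, `regex` and `flags`, the sample entries, and the functions `load_rules` and `match_bot` are all assumptions for illustration; nothing here reflects a format the project has defined.

```python
import json
import re

# Hypothetical JSON export: each object carries a bot name, a pattern,
# and optional modifiers ("i" meaning case-insensitive in this sketch).
SAMPLE_JSON = """
[
  {"name": "Googlebot", "regex": "Googlebot", "flags": "i"},
  {"name": "curl", "regex": "^curl/", "flags": ""}
]
"""


def load_rules(text):
    """Parse the JSON array into (name, compiled regex) pairs."""
    rules = []
    for entry in json.loads(text):
        flags = re.IGNORECASE if "i" in entry.get("flags", "") else 0
        rules.append((entry["name"], re.compile(entry["regex"], flags)))
    return rules


def match_bot(user_agent, rules):
    """Return the name of the first matching rule, or None if none match."""
    for name, pattern in rules:
        if pattern.search(user_agent):
            return name
    return None


RULES = load_rules(SAMPLE_JSON)
```

The extra properties cost a little parsing code, but they let a consumer log which bot was seen instead of only a yes/no answer.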

Hi,

Thanks for the feedback.

This has been discussed briefly before. It is definitely a decent idea. For some background, this project started out of our own need and grew from there, but the motivation for us to split this repo up isn't high enough for us to go that route.

We did, however, start exporting the regex patterns so people could pull them into their project as required, for example like this...

We would definitely accept pull requests to add other export formats as required.

See export.php

Thanks again 👍