Library providing compliance with the Web Robots Exclusion Protocol (robots.txt). Forked to support parsing of Googlebot-style pattern-matching rules.
- Fixed a bug that caused the test cases to fail.
- Added robots.txt caching.
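Googlebot-style matching extends the original exclusion protocol with `*` wildcards and `$` end-of-path anchors. An illustrative robots.txt (the paths are made up) that this fork can parse:

```
User-agent: *
# Block any URL path ending in .pdf
Disallow: /*.pdf$
# Block anything under directories whose names start with "private"
Disallow: /private*/
Allow: /
```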
To check whether a URL may be fetched on behalf of a given user agent:

import com.trigonic.jrobotx.RobotExclusion;
// ...
RobotExclusion robotExclusion = new RobotExclusion();
if (robotExclusion.allows(url, userAgentString)) {
    // do something with url
}
To cache downloaded robots.txt files in a local folder, pass the directory to the constructor:

import java.io.File;
import com.trigonic.jrobotx.RobotExclusion;
// ...
File cacheDir = ...
// ...
RobotExclusion robotExclusion = new RobotExclusion(cacheDir);
if (robotExclusion.allows(url, userAgentString)) {
    // do something with url
}