A .NET 8 C# parser for robots.txt and for the Robots Meta Tag / X-Robots-Tag HTTP response header.
- Creates an instance from a string or a stream
- Parses standard, extended, and custom fields:
  - User-agent
  - Disallow
  - Crawl-delay
  - Sitemap
  - Allow (toggleable; can be ignored if needed)
  - Others (e.g. Host)
- Tolerates misspelled field names
- Matches wildcards in paths (`*` and `$`); see the sketch after this list
- Parses custom fields
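For instance, wildcard matching lets a rule such as `Disallow: /*.gif$` block every path ending in `.gif`. Here is a minimal sketch using the `Load` and `IsAllowed` calls shown in the usage snippets below:

```csharp
using Toimik.RobotsProtocol;

// Block every path that ends in ".gif" (* matches any characters;
// $ anchors the match to the end of the path).
var robotsTxt = new RobotsTxt();
_ = robotsTxt.Load("User-agent: *\nDisallow: /*.gif$");

var gifAllowed = robotsTxt.IsAllowed("autobot", "/images/photo.gif");   // false; matches /*.gif$
var htmlAllowed = robotsTxt.IsAllowed("autobot", "/images/photo.html"); // true; no rule matches
```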
Via Package Manager Console:

PM> Install-Package Toimik.RobotsProtocol

Via .NET CLI:

> dotnet add package Toimik.RobotsProtocol
The snippets below illustrate basic usage. Refer to the demo programs in the samples folder for complete source code.
using Toimik.RobotsProtocol;

var robotsTxt = new RobotsTxt();
// Load content of a robots.txt from a String
var content = "...";
_ = robotsTxt.Load(content);
// Load content of a robots.txt from a Stream
// var stream = ...;
// _ = await robotsTxt.Load(stream);
var isAllowed = robotsTxt.IsAllowed("autobot", "/folder/file.htm");
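For loading from a stream, here is a minimal sketch that assumes the awaitable `Load(Stream)` overload suggested by the commented lines above:

```csharp
using System.IO;
using System.Text;
using Toimik.RobotsProtocol;

// Wrap the robots.txt content in a MemoryStream for illustration; any
// readable Stream (e.g. an HTTP response stream) should work the same way.
var robotsTxt = new RobotsTxt();
using var stream = new MemoryStream(
    Encoding.UTF8.GetBytes("User-agent: *\nDisallow: /private/"));
_ = await robotsTxt.Load(stream);

var isAllowed = robotsTxt.IsAllowed("autobot", "/private/file.htm"); // false
```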
var robotsTag = new RobotsTag();
// This data is retrieved from either a Robots Meta Tag (e.g. <meta name="badbot"
// content="none">) or an X-Robots-Tag HTTP response header (e.g. X-Robots-Tag:
// otherbot: index, nofollow).
var data = ...;
// Words treated as the names of directives that take values (e.g. max-snippet: 10).
var specialWords = new HashSet<string>
{
"max-snippet",
"max-image-preview",
// ... Add accordingly
};
// Load the data to parse. This extracts each directive into its own Tag instance.
_ = robotsTag.Load(data, specialWords);
var hasNone = robotsTag.HasTag("autobot", "none");
var hasNoIndex = robotsTag.HasTag("autobot", "noindex");
var isIndexable = !hasNone && !hasNoIndex;
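Note that `none` is shorthand for `noindex, nofollow`, which is why `isIndexable` checks for both tags. As a hypothetical end-to-end sketch, assuming `Load` accepts the raw directive text as a string (the snippet above leaves the type of `data` elided):

```csharp
using System.Collections.Generic;
using Toimik.RobotsProtocol;

// Hypothetical input: the value of an X-Robots-Tag response header.
var robotsTag = new RobotsTag();
var data = "otherbot: index, nofollow";
var specialWords = new HashSet<string> { "max-snippet", "max-image-preview" };
_ = robotsTag.Load(data, specialWords);

var canIndex = !robotsTag.HasTag("otherbot", "none")
    && !robotsTag.HasTag("otherbot", "noindex"); // true
var canFollow = !robotsTag.HasTag("otherbot", "nofollow"); // false
```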