RSS Parser

A high-performance, generic RSS parser for Rust that supports streaming parsing from any async input source including files, TCP streams, HTTP responses, and more.

Features

  • 🚀 Generic AsyncRead Support: Parse RSS from files, TCP streams, HTTP responses, or any AsyncRead source
  • 🔄 Streaming Interface: Implements tokio_stream::Stream for memory-efficient processing of large feeds
  • 📊 Gradual Parsing: Process RSS items one at a time without loading the entire feed into memory
  • 🏷️ CDATA Support: Handles both regular text content and CDATA sections
  • 🔤 Case Insensitive: Robust parsing of RSS feeds with inconsistent tag casing
  • ⚡ Async/Await: Built on tokio for high-performance async I/O
  • 🛡️ Type Safe: Leverages Rust's type system with custom RSS item structures

Quick Start

Add this to your Cargo.toml:

[dependencies]
rss-parser = "0.1.0"
tokio = { version = "1.0", features = ["full"] }
quick-xml = "0.31"
tokio-stream = "0.1"

Usage

Define Your RSS Item Structure

use rss_parser::{GradualRssItem, XmlNode};

#[derive(Debug)]
struct Article {
    title: Option<String>,
    description: Option<String>,
    link: Option<String>,
    pub_date: Option<String>,
}

impl GradualRssItem for Article {
    fn init() -> Self {
        Article {
            title: None,
            description: None,
            link: None,
            pub_date: None,
        }
    }

    fn populate(&mut self, node: XmlNode) {
        // Tag names arrive lowercased (see XmlNode), so <pubDate> is matched as "pubdate".
        match node.tag.as_str() {
            "title" => self.title = node.value.or(node.cdata),
            "description" => self.description = node.value.or(node.cdata),
            "link" => self.link = node.value.or(node.cdata),
            "pubdate" => self.pub_date = node.value.or(node.cdata),
            _ => {}
        }
    }
}

Parse from File

use rss_parser::RssParser;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut parser = RssParser::<Article, _>::from_file("feed.xml").await?;
    
    while let Some(article) = parser.next().await {
        println!("Title: {:?}", article.title);
        println!("Link: {:?}", article.link);
    }
    
    Ok(())
}

Parse from HTTP Response

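// Needs reqwest with the "stream" feature and tokio-util with the "io" feature.
use rss_parser::RssParser;
use tokio_stream::StreamExt; // provides `map` on the bytes stream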

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let response = reqwest::get("https://example.com/feed.xml").await?;
    let stream = response.bytes_stream();
    
    // Convert bytes stream to AsyncRead
    let reader = tokio_util::io::StreamReader::new(
        stream.map(|result| result.map_err(std::io::Error::other))
    );
    
    let mut parser = RssParser::<Article, _>::new(reader).await?;
    
    while let Some(article) = parser.next().await {
        println!("Article: {:?}", article);
    }
    
    Ok(())
}

Parse from TCP Stream

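use rss_parser::RssParser;
use tokio::net::TcpStream;
use tokio_stream::StreamExt; // provides `collect`

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // No request is written to the socket here, so this example appears to
    // assume a peer that sends raw RSS XML as soon as the connection opens.
    let stream = TcpStream::connect("example.com:80").await?;
    let parser = RssParser::<Article, _>::from_tcp(stream).await?;
    
    // Drain the stream into a Vec
    let articles: Vec<Article> = parser.collect().await;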
    
    println!("Parsed {} articles", articles.len());
    Ok(())
}

Using as a Stream

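// `for_each` comes from the futures crate's StreamExt; tokio_stream::StreamExt
// does not provide it, so this example also needs `futures` in Cargo.toml.
use futures::StreamExt;
use rss_parser::RssParser;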

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let parser = RssParser::<Article, _>::from_file("feed.xml").await?;
    
    // Process articles as they're parsed
    parser
        .for_each(|article| async move {
            println!("Processing: {:?}", article.title);
            // Process article...
        })
        .await;
    
    Ok(())
}

Advanced Usage

Custom Input Sources

The parser accepts any type implementing AsyncRead + Unpin:

use std::io::Cursor;
use rss_parser::RssParser;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let rss_data = r#"<?xml version="1.0"?>
    <rss version="2.0">
        <channel>
            <item>
                <title>Example Article</title>
                <description>This is an example</description>
            </item>
        </channel>
    </rss>"#;
    
    let cursor = Cursor::new(rss_data.as_bytes());
    let mut parser = RssParser::<Article, _>::new(cursor).await?;
    
    if let Some(article) = parser.next().await {
        println!("Parsed: {:?}", article);
    }
    
    Ok(())
}

Filtering and Processing

use rss_parser::RssParser;
use tokio_stream::StreamExt;

// Fragment: run inside an async fn that returns a Result.
let parser = RssParser::<Article, _>::from_file("feed.xml").await?;

let recent_articles: Vec<Article> = parser
    .filter(|article| {
        // Filter articles based on some criteria
        article.title.as_ref().map_or(false, |title| title.contains("Rust"))
    })
    .take(10)  // Take only first 10 matching articles
    .collect()
    .await;
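
The predicate passed to filter can also be factored into a plain function, which makes it testable without a feed or a runtime. The sketch below is only an illustration: `title_mentions` is a hypothetical helper, not part of rss_parser, and `Article` is a local copy of the struct from the usage section (with `Default` derived for convenience).

```rust
// Local copy of the `Article` struct from the usage section, so the sketch
// compiles on its own. `Default` is derived here only for convenience.
#[derive(Debug, Default)]
pub struct Article {
    pub title: Option<String>,
    pub description: Option<String>,
    pub link: Option<String>,
    pub pub_date: Option<String>,
}

/// Hypothetical helper (not part of rss_parser): returns true when the
/// article has a title that contains `keyword`, ignoring letter case.
pub fn title_mentions(article: &Article, keyword: &str) -> bool {
    let keyword = keyword.to_lowercase();
    article
        .title
        .as_deref()
        .map_or(false, |title| title.to_lowercase().contains(&keyword))
}
```

With this in place, the filter call would read `.filter(|article| title_mentions(article, "rust"))`.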

API Reference

RssParser<T, R>

The main parser struct, generic over:

  • T: Your RSS item type implementing GradualRssItem
  • R: The input source implementing AsyncRead + Unpin

Methods

  • async new(input: R) -> Result<Self, std::io::Error>: Create a parser from any AsyncRead source
  • async from_file(path: &str) -> Result<Self, std::io::Error>: Convenience constructor for files
  • async from_tcp(stream: TcpStream) -> Result<Self, std::io::Error>: Convenience constructor for TCP streams
  • async next(&mut self) -> Option<T>: Parse and return the next RSS item
  • Implements Stream<Item = T> for use with tokio-stream

All constructors and next() are awaited in the usage examples.

GradualRssItem Trait

Implement this trait for your RSS item structures:

pub trait GradualRssItem {
    fn init() -> Self;
    fn populate(&mut self, node: XmlNode);
}
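
How the parser drives this trait is not spelled out here. One plausible reading, sketched below, is that init() runs once per <item> and populate() runs once per child element. The trait and XmlNode are redeclared locally so the sketch compiles without the crate; `Headline` and `build_item` are hypothetical names used only for illustration.

```rust
// Local stand-ins mirroring the definitions in this README, so the sketch
// compiles without the crate. In real code, import them from `rss_parser`.
pub struct XmlNode {
    pub tag: String,
    pub value: Option<String>,
    pub cdata: Option<String>,
}

pub trait GradualRssItem {
    fn init() -> Self;
    fn populate(&mut self, node: XmlNode);
}

// Hypothetical item type that keeps only the title.
#[derive(Debug, PartialEq)]
pub struct Headline {
    pub title: Option<String>,
}

impl GradualRssItem for Headline {
    fn init() -> Self {
        Headline { title: None }
    }

    fn populate(&mut self, node: XmlNode) {
        // Nodes with any other tag are ignored.
        if node.tag == "title" {
            self.title = node.value.or(node.cdata);
        }
    }
}

// Hypothetical driver: one `init()` per item, one `populate()` per child node.
// This is an assumption about the calling pattern, not the crate's actual code.
pub fn build_item<T: GradualRssItem>(nodes: Vec<XmlNode>) -> T {
    let mut item = T::init();
    for node in nodes {
        item.populate(node);
    }
    item
}
```

Under that reading, an item type only has to decide which tags it cares about; anything it does not match is dropped.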

XmlNode

Represents a parsed XML node:

pub struct XmlNode {
    pub tag: String,        // The XML tag name (lowercase)
    pub value: Option<String>,   // Text content
    pub cdata: Option<String>,   // CDATA content
}
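
Whether the parser ever fills both value and cdata, or delivers whitespace-only text, is not stated here. A populate() implementation that wants to be defensive could funnel every node through one helper. The sketch below redeclares XmlNode locally so it compiles alone; `node_text` and `non_blank` are hypothetical helpers, not part of rss_parser.

```rust
// Local stand-in mirroring the `XmlNode` definition above.
pub struct XmlNode {
    pub tag: String,
    pub value: Option<String>,
    pub cdata: Option<String>,
}

// Trim the text and treat empty or whitespace-only content as missing.
fn non_blank(text: Option<String>) -> Option<String> {
    text.map(|t| t.trim().to_string()).filter(|t| !t.is_empty())
}

/// Hypothetical helper: use the plain text content when it is non-blank,
/// otherwise fall back to the CDATA content.
pub fn node_text(node: XmlNode) -> Option<String> {
    let XmlNode { value, cdata, .. } = node;
    non_blank(value).or_else(|| non_blank(cdata))
}
```

A match arm would then read `"title" => self.title = node_text(node)`, which needs the match to own or copy the tag first (for example `let tag = node.tag.clone();`).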

Performance

The parser is designed for high performance and low memory usage:

  • Streaming: Processes RSS items one at a time without loading the entire feed into memory
  • Low allocation: Minimizes string allocations where possible (XmlNode fields are owned Strings, so parsing is not strictly zero-copy)
  • Async: Non-blocking I/O for handling multiple feeds concurrently

Error Handling

The parser uses Rust's standard error handling patterns:

  • Constructor methods return Result<RssParser<T, R>, std::io::Error>
  • next() returns Option<T>; None means either end of feed or a parse error, and the return value does not distinguish the two
  • Malformed XML is handled gracefully, skipping problematic sections when possible
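
Because next() carries no error detail and every field of the example Article is optional, a caller may want to check completeness itself before using an item. One way, sketched below under those assumptions, is a TryFrom conversion into a stricter type. `ValidArticle` and `MissingField` are hypothetical names, and `Article` is a local copy of the struct from the usage section (with `Default` derived for convenience).

```rust
use std::convert::TryFrom;
use std::fmt;

// Local copy of the `Article` struct from the usage section.
#[derive(Debug, Default)]
pub struct Article {
    pub title: Option<String>,
    pub description: Option<String>,
    pub link: Option<String>,
    pub pub_date: Option<String>,
}

// Hypothetical stricter form: title and link are required.
#[derive(Debug, PartialEq)]
pub struct ValidArticle {
    pub title: String,
    pub link: String,
}

// Hypothetical error naming the first required field that was absent.
#[derive(Debug, PartialEq)]
pub struct MissingField(pub &'static str);

impl fmt::Display for MissingField {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "RSS item is missing required field `{}`", self.0)
    }
}

impl std::error::Error for MissingField {}

impl TryFrom<Article> for ValidArticle {
    type Error = MissingField;

    fn try_from(article: Article) -> Result<Self, Self::Error> {
        Ok(ValidArticle {
            title: article.title.ok_or(MissingField("title"))?,
            link: article.link.ok_or(MissingField("link"))?,
        })
    }
}
```

Inside a parsing loop this could be used as `match ValidArticle::try_from(article) { Ok(a) => ..., Err(e) => eprintln!("{e}") }`, so incomplete items are reported rather than silently passed on.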

Requirements

  • Rust 1.75+
  • tokio runtime
  • quick-xml for XML parsing
  • tokio-stream for Stream implementation

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Running Tests

cargo test

Running Examples

cargo run --example basic_usage
cargo run --example http_parsing
cargo run --example stream_processing

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

v0.1.0

  • Initial release
  • Generic AsyncRead support
  • Stream trait implementation
  • File and TCP convenience constructors
  • CDATA support
  • Case-insensitive parsing

Related Projects

  • quick-xml - Fast XML parser used internally
  • tokio - Async runtime
  • feed-rs - Alternative feed parser with more format support

Note: This parser is specifically designed for RSS feeds. For Atom feeds or other syndication formats, consider using a more comprehensive feed parsing library.