/adaptive-log-parser

🚀 An intelligent, LLM-enhanced log parser pipeline that converts multi-format raw logs into structured JSON, learns from missed patterns, and evolves using Drain3 and open-source LLMs.

Primary language: Python · License: MIT

🧭 Adaptive Log Parser System with LLM-Driven Intelligence

A smart log processing pipeline where logs, regardless of source, structure, or format, are:

✅ Automatically analyzed and understood
🧠 Matched against known or discovered structures
📦 Converted into clean JSON for downstream use (RAG, dashboards, alerts)
🔁 Continuously improved by learning from what it fails to parse


🚀 Phase-Wise Implementation Roadmap

✅ Phase 1: Rule-Based Multi-Pattern Log Parser

Status: ✅ Implemented

  • Uses manually defined regex patterns for known formats (Apache, Syslog, SSH, etc.)
  • Converts matching log lines into JSONL
  • Logs that do not match are skipped and stored separately
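As a rough illustration of the Phase 1 behaviour described above, the sketch below tries each regex in a small pattern bank and separates matched lines (as JSONL strings) from skipped ones. The pattern names, the regexes, and the `parse_line` / `parse_lines` helpers are assumptions made for this example and are not taken from `parse_logs.py`.

```python
import json
import re

# Hypothetical pattern bank: names and regexes are illustrative only.
PATTERNS = {
    "apache_access": re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
    ),
    "syslog": re.compile(
        r'(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) '
        r'(?P<process>[^\[:]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)'
    ),
}


def parse_line(line):
    """Return a dict for the first pattern that matches, or None."""
    for name, pattern in PATTERNS.items():
        match = pattern.match(line)
        if match:
            return {"format": name, **match.groupdict()}
    return None


def parse_lines(lines):
    """Split lines into parsed JSONL strings and skipped raw lines."""
    parsed, skipped = [], []
    for line in lines:
        line = line.rstrip("\n")
        record = parse_line(line)
        if record is None:
            skipped.append(line)
        else:
            parsed.append(json.dumps(record))
    return parsed, skipped
```

Patterns are tried in insertion order, so more specific formats should come first in the bank.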

🔄 Phase 2: Feedback-Aware Parser with Skipped Log Tracker

Goal: Track all unmatched lines for improvement

Features:

  • Saves unparsed lines to SkippedLogs/
  • Records file name and line number for traceability
  • Enables continuous learning and correction
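One possible way to keep skipped lines traceable is to append each one as a JSON record carrying its source file and line number. This is a minimal sketch: the `record_skipped` and `load_skipped` helpers, the per-source `.skipped.jsonl` naming scheme, and the record fields are assumptions, not the project's actual on-disk format.

```python
import json
from pathlib import Path


def record_skipped(skipped_dir, source_file, line_number, line):
    """Append one unmatched line, with its origin, to a JSONL file."""
    out_dir = Path(skipped_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # One skipped-log file per source file; this naming scheme is an assumption.
    out_path = out_dir / (Path(source_file).name + ".skipped.jsonl")
    entry = {"file": str(source_file), "line_number": line_number, "raw": line}
    with out_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return out_path


def load_skipped(path):
    """Read skipped entries back for later reprocessing."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(row) for row in handle if row.strip()]
```

Storing the raw line next to its origin lets a later phase reprocess the entry and still point back to the exact place in the input.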

🧠 Phase 3: LLM-Assisted Pattern Discovery

Goal: Dynamically extract structure from unknown log formats using open-source LLMs like Mistral, Gemma, or LLaMA3.

Steps:

  • Pass skipped lines to an LLM with a prompt like:
    You are a log analysis assistant. Given the following log line, extract:
    - timestamp
    - level
    - message
    Return the output as JSON.
    
  • Cache and validate LLM outputs
  • Add to training or deployable pattern bank

Benefits:

  • Reduces the need to hand-write a regex for every new format
  • Handles unstructured, unknown, or mixed-format logs
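The "cache and validate" step from the list above could look roughly like the sketch below. The model call is left as a hypothetical `ask_llm` callable (prompt in, response text out) so the example does not depend on any particular client. The required-key check and the SHA-256 cache key are likewise assumptions made for illustration.

```python
import hashlib
import json

REQUIRED_KEYS = {"timestamp", "level", "message"}

PROMPT_TEMPLATE = (
    "You are a log analysis assistant. Given the following log line, extract:\n"
    "- timestamp\n- level\n- message\n"
    "Return the output as JSON.\n\nLog line: {line}"
)


def validate_llm_output(raw_text):
    """Return the parsed dict if it is JSON with the required keys, else None."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


def extract_with_cache(line, ask_llm, cache):
    """Ask the model about a line once and reuse valid answers afterwards.

    `ask_llm` is a hypothetical callable (prompt -> response text); plug in
    whichever client you use. `cache` is any dict-like store.
    """
    key = hashlib.sha256(line.encode("utf-8")).hexdigest()
    if key in cache:
        return cache[key]
    result = validate_llm_output(ask_llm(PROMPT_TEMPLATE.format(line=line)))
    if result is not None:
        cache[key] = result  # only validated outputs are cached
    return result
```

Rejecting anything that is not a JSON object with all three fields keeps malformed model responses out of the pattern bank, and caching only validated results means a bad answer can be retried later.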

🧬 Phase 4: Self-Training Log Template Miner (Drain3 / Spell)

Goal: Automatically learn templates and clusters from logs

Features:

  • Use Drain3 to:
    • Discover static and dynamic fields
    • Group logs into clusters
    • Mine templates like User * logged in from *
  • Store mined templates for downstream use or learning
  • Use clustering insights to guide new pattern or anomaly detection
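To make the idea of template mining concrete, here is a deliberately simplified, standard-library-only sketch. It is not Drain3 and does not use its parse tree. It only groups lines with the same token count and replaces the positions that differ with `*`. The `mine_templates` helper and the 0.5 similarity threshold are assumptions for illustration.

```python
def merge_template(template, tokens):
    """Replace positions where the tokens differ with the '*' wildcard."""
    return [t if t == tok else "*" for t, tok in zip(template, tokens)]


def similarity(template, tokens):
    """Fraction of positions that agree; wildcards agree with anything."""
    same = sum(1 for t, tok in zip(template, tokens) if t == "*" or t == tok)
    return same / len(template)


def mine_templates(lines, threshold=0.5):
    """Cluster lines with equal token counts and return (template, size) pairs."""
    clusters = []  # each cluster: {"template": [tokens], "size": int}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        for cluster in clusters:
            template = cluster["template"]
            if len(template) == len(tokens) and similarity(template, tokens) >= threshold:
                cluster["template"] = merge_template(template, tokens)
                cluster["size"] += 1
                break
        else:
            # No sufficiently similar cluster: start a new one.
            clusters.append({"template": tokens, "size": 1})
    return [(" ".join(c["template"]), c["size"]) for c in clusters]
```

Feeding it `User alice logged in from 10.0.0.1` and `User bob logged in from 10.0.0.2` yields the single template `User * logged in from *`. Static tokens survive and dynamic fields collapse into wildcards, which is the same intuition behind the Drain3 clusters mentioned above.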

♻️ Phase 5: Autonomous Parser Evolution Engine

Goal: Build a self-improving parser system

How:

  • Reprocess skipped lines periodically
  • Generate new patterns from LLM or Drain3
  • Validate outputs with scoring or confidence thresholds
  • Add verified patterns to live_parser_patterns.json
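The validate-then-promote loop above might be sketched as follows. The scoring rule (fraction of a sample of skipped lines that a candidate regex matches), the 0.8 default threshold, and the assumption that `live_parser_patterns.json` holds a flat name-to-regex object are all illustrative choices, not the project's confirmed design.

```python
import json
import re
from pathlib import Path


def score_pattern(regex, skipped_lines):
    """Fraction of skipped lines the candidate regex matches (0.0 if none)."""
    if not skipped_lines:
        return 0.0
    try:
        compiled = re.compile(regex)
    except re.error:
        return 0.0  # an invalid regex can never be promoted
    hits = sum(1 for line in skipped_lines if compiled.match(line))
    return hits / len(skipped_lines)


def promote_pattern(name, regex, skipped_lines, patterns_path, threshold=0.8):
    """Add the candidate to the live pattern file if it scores high enough."""
    score = score_pattern(regex, skipped_lines)
    if score < threshold:
        return False, score
    path = Path(patterns_path)
    # Assumed file layout: a JSON object mapping pattern name -> regex string.
    patterns = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    patterns[name] = regex
    path.write_text(json.dumps(patterns, indent=2), encoding="utf-8")
    return True, score
```

In practice the sample passed as `skipped_lines` would probably be the lines from one cluster or one source rather than the whole backlog, since a single pattern is not expected to cover every unknown format.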

📈 Optional Enhancements

| Feature | Description |
|---|---|
| 🧪 Accuracy scoring | Manual or LLM-assisted evaluation |
| 🧠 Confidence thresholds | Auto-accept LLM outputs above threshold |
| 📊 Parsing dashboard | Visualize logs parsed, templates learned, anomalies |
| 🔐 Secure fine-tuning | Handle PII-sensitive logs privately |
| 💬 RAG-based querying | Ask questions from logs via embedded vector DB |

✅ Log Intelligence Pipeline Diagram

graph TD
  A[Raw Logs] --> B[Regex-based Parser]
  B -->|Parsed| C[JSONL Logs]
  B -->|Skipped| D[SkippedLogs/]
  D --> E[LLM Analysis & Labeling]
  D --> F[Drain3 Template Mining]
  E --> G[Auto-Generated Patterns]
  F --> G
  G --> H[Updated Parser Patterns]
  H --> B
  C --> I[RAG / Vector DB]

📁 Suggested Folder Structure

log-parser-intelligent/
├── logs/                  # Raw input logs
├── ParsedLogs/            # Parsed JSONL files
├── SkippedLogs/           # Unmatched logs with trace info
├── Anomalies/             # Drain3-flagged anomalies
├── Patterns/
│   ├── live_parser_patterns.json
│   └── learned_templates.json
├── llm_prompts/
│   └── log_schema_extraction.txt
├── vectorstore/           # For RAG embeddings
├── drain3_snapshot.json   # Template cluster snapshot
└── README.md              # This file

🛠️ Setup & Usage

  1. Clone this repo
  2. Install dependencies:
    pip install drain3 openai chromadb
  3. Run the multi-parser:
    python parse_logs.py --input ./logs --output ./ParsedLogs
  4. Run LLM-assist:
    python enrich_with_llm.py --input ./SkippedLogs --output ./ParsedLogs

🙋 Contributing

Want to add new patterns, LLM prompt styles, or vector search capabilities?
Feel free to fork and raise a PR.


🧠 Credits & Stack

  • Drain3
  • ChromaDB
  • Open-source LLMs: Mistral / Gemma / LLaMA3 via Ollama
  • Inspired by real-world log intelligence & observability challenges

📬 Contact

Feel free to connect with ideas, issues, or collaborations:

  • Maintainer: @mrsahiljaiswal
  • Email: sahiljaiswal757@gmail.com (Replace with your real contact)