AutoFocus

AutoFocus is a companion tool designed to work alongside AutoRecon, providing enhanced analysis of reconnaissance data through AI-powered processing. AutoFocus streamlines the initial engagement phase of penetration testing by focusing on detecting version numbers and identifying vulnerabilities with minimal manual effort.

Features

Automatic File Enumeration: AutoFocus recursively enumerates all files generated by AutoRecon, which are organised into target directories by IP address.
AI-Powered Chunked Data Processing: AutoFocus uses a local Large Language Model (LLM) to analyse files in manageable chunks. This allows for efficient handling of large datasets, especially those generated during extensive reconnaissance efforts.
Task-Based Analysis: The tool utilises configurable tasks, specified via YAML, to define the specific data points that need analysis, such as version numbers and vulnerabilities. Each task is designed to target specific objectives in the data.
Structured JSON Output: All findings are saved in a structured JSON format, with results organised by IP address and task type. This structured output allows for easy integration into other tools or streamlined reporting.
Progress Updates and Error Handling: Provides continuous progress updates during the analysis process and includes error-handling mechanisms to ensure reliable operation during data processing.
Extensible and Customisable: AutoFocus is flexible and can easily be adapted to meet the needs of different reconnaissance scenarios. Users can add or modify tasks by editing the tasks.yaml configuration file.
Advanced Deduplication: Implements sophisticated deduplication techniques, including text normalisation, domain-specific rules, and fuzzy matching to ensure unique and relevant results.
HTML Reporting: Generates an interactive HTML report alongside the JSON output, providing a user-friendly interface for reviewing analysis results.
Logging System: Incorporates a comprehensive logging system for better debugging and tracking of the analysis process.
Blacklist and Whitelist Support: Allows users to specify directories and file types to exclude or include in the analysis process.
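
To make the deduplication feature listed above more concrete, here is a minimal sketch of text normalisation followed by fuzzy matching. It is not AutoFocus's own DeduplicationAgent: the function names, the 0.9 threshold, and the use of difflib from the standard library are all assumptions made for illustration, and the domain-specific rules are left out.

```python
import re
from difflib import SequenceMatcher
from typing import List


def normalise(text: str) -> str:
    """Lower-case, collapse whitespace, and trim stray edge punctuation."""
    text = re.sub(r"\s+", " ", text.lower().strip())
    return text.strip(" .,;:")


def is_duplicate(candidate: str, seen: List[str], threshold: float = 0.9) -> bool:
    """Return True if candidate closely matches any finding already kept.

    The threshold is a hypothetical value, not one taken from AutoFocus.
    """
    norm = normalise(candidate)
    return any(
        SequenceMatcher(None, norm, normalise(other)).ratio() >= threshold
        for other in seen
    )


def deduplicate(findings: List[str], threshold: float = 0.9) -> List[str]:
    """Keep the first occurrence of each finding, dropping near-duplicates."""
    unique: List[str] = []
    for finding in findings:
        if not is_duplicate(finding, unique, threshold):
            unique.append(finding)
    return unique
```

Fuzzy matching on its own can merge strings that differ only in a final character, such as two adjacent version numbers, which is presumably why the tool pairs it with domain-specific rules.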

How It Works

AutoFocus employs a multi-agent system to process and analyse reconnaissance data:

File Enumeration: The script recursively walks through the directory structure created by AutoRecon, identifying files for analysis.
Data Chunking: Each file is processed using a sliding-window approach (default window size of 6000 characters with a step size of 3000 characters, adjustable with --window_size and --step_size), inspired by data preparation techniques used in forecasting models. Because consecutive windows overlap, large files are handled in manageable pieces while context is preserved across chunk boundaries.
Initial Analysis: The InitialAnalysisAgent processes each chunk of data for each defined task. It uses the Ollama API to interact with a local LLM for initial analysis.
Deduplication: The DeduplicationAgent employs advanced techniques to check for duplicate or highly similar results, ensuring that only unique findings are reported. This includes text normalisation, domain-specific rules, and fuzzy matching.
Consolidation: The ConsolidationAgent combines the results for each target and task, updating the JSON output file after processing each file. It also generates an HTML report for easy result visualisation.
Results Output: The final results are saved in both structured JSON format and an interactive HTML report, organised by target IP and task type.
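
The data chunking step above can be sketched as a small generator. This is an illustrative sketch rather than the code in autofocus.py: the function name sliding_chunks is hypothetical, and the 6000/3000 defaults are taken from the description above (the --help text lists different defaults, so treat the numbers as placeholders and check your version).

```python
from typing import Iterator


def sliding_chunks(text: str, window_size: int = 6000, step_size: int = 3000) -> Iterator[str]:
    """Yield overlapping chunks of text using a sliding window.

    Each chunk starts step_size characters after the previous one, so a
    step smaller than the window makes consecutive chunks overlap.
    """
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    if not text:
        return
    start = 0
    while True:
        yield text[start:start + window_size]
        # Stop once the current window has reached the end of the text.
        if start + window_size >= len(text):
            break
        start += step_size
```

With a window of 4 and a step of 2, the string "abcdefghij" yields "abcd", "cdef", "efgh", and "ghij". A step larger than the window would skip characters between chunks, so the step is normally kept at or below the window size.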

Getting Started

Prerequisites

Python 3.8 or later
Ollama, with a local large language model installed to enable offline processing
AutoRecon for initial data collection

Installation

Clone the repository:

git clone https://github.com/LiterallyBlah/AutoFocus.git
cd AutoFocus

Install the required Python packages:

pip install -r requirements.txt

Usage

Ensure AutoRecon has generated the necessary directories for each target IP.

Run AutoFocus:

python autofocus.py --input /path/to/autorecon/output --output /path/to/output/directory --tasks /path/to/tasks.yml

Configure analysis tasks by editing the YAML file passed to --tasks, where you define the tasks you want AutoFocus to perform.

Command-line Options

AutoFocus supports various command-line options to customise its behaviour. You can view these options by running:

python autofocus.py --help

This will display the following help message:

usage: autofocus.py [-h] -i INPUT [-o OUTPUT] -t TASKS [-b [BLACKLIST ...]]
                    [-bt [BLACKLIST_FILE_TYPES ...]] [-wt [WHITELIST_FILE_TYPES ...]]
                    [-w WINDOW_SIZE] [-s STEP_SIZE]

AutoFocus Agent-Based Analysis System for Recon Data Processing

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to the input directory containing recon data
  -o OUTPUT, --output OUTPUT
                        Directory to save the analysis results (default: output)
  -t TASKS, --tasks TASKS
                        Path to the tasks.yml file containing analysis tasks
  -b [BLACKLIST ...], --blacklist [BLACKLIST ...]
                        List of directories to blacklist from analysis (default: exploit, loot, report)
  -bt [BLACKLIST_FILE_TYPES ...], --blacklist_file_types [BLACKLIST_FILE_TYPES ...]
                        List of file types to blacklist from analysis (e.g., .log, .tmp)
  -wt [WHITELIST_FILE_TYPES ...], --whitelist_file_types [WHITELIST_FILE_TYPES ...]
                        List of file types to whitelist for analysis (e.g., .txt, .json)
  -w WINDOW_SIZE, --window_size WINDOW_SIZE
                        Size of the data chunk window for analysis (default: 500 characters)
  -s STEP_SIZE, --step_size STEP_SIZE
                        Step size for moving through data chunks (default: 400 characters)

Example Command

python autofocus.py --input /path/to/recon/data --output /path/to/output --tasks /path/to/tasks.yml --blacklist exploit loot report --blacklist_file_types .log .tmp

This command processes the reconnaissance data from the specified input directory, saves the results in the output directory, performs the tasks specified in the tasks file, and excludes certain directories and file types from analysis.
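
As a rough illustration of how directory and file-type filtering of this kind can work, the sketch below walks an input directory and skips blacklisted directories and extensions. It is an assumption-laden example, not the tool's actual enumeration code: the function name enumerate_files and its parameters are hypothetical, and matching on directory names and lower-cased file extensions is a guess at the intended behaviour.

```python
import os
from typing import Iterator, Optional, Sequence


def enumerate_files(
    root: str,
    blacklist_dirs: Sequence[str] = ("exploit", "loot", "report"),
    blacklist_types: Sequence[str] = (),
    whitelist_types: Optional[Sequence[str]] = None,
) -> Iterator[str]:
    """Recursively yield file paths under root, applying the filters."""
    skip_dirs = set(blacklist_dirs)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into blacklisted directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in sorted(filenames):
            ext = os.path.splitext(name)[1].lower()
            if ext in blacklist_types:
                continue
            # An empty or missing whitelist means "accept every file type".
            if whitelist_types and ext not in whitelist_types:
                continue
            yield os.path.join(dirpath, name)
```

For example, enumerate_files("/path/to/recon/data", blacklist_types=(".log", ".tmp")) would mirror the filters in the command above.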

Task Configuration

Tasks are specified in a YAML file, and each task requires careful configuration and prompt engineering to achieve optimal results:

name: A unique identifier for the task.
description: A brief description of what the task checks for. This should be clear and specific to guide the LLM's focus.
response: Instructions for the LLM on what to return (e.g., version numbers, vulnerabilities). These instructions need to be fine-tuned to elicit precise and relevant responses.
output: The structure of the result in the final JSON output.
regex (optional): A regular expression pattern to validate and extract structured data from the LLM's response.
blacklist (optional): A list of words to exclude from the results.

Careful prompt engineering and iterative tuning of the prompts are crucial for each task:

Precision: Craft prompts that encourage the LLM to provide specific, targeted information.
Consistency: Ensure prompts maintain a consistent format across tasks for easier processing.
Context: Provide enough context in the description to guide the LLM's understanding of the task's purpose.
Iterative Improvement: Regularly review and refine prompts based on the quality of results obtained.
Avoid Ambiguity: Use clear, unambiguous language to prevent misinterpretation by the LLM.

Example (tasks.yaml):

tasks:
  - name: "version_check"
    description: "Identify software version numbers for correlation with known issues. Do not provide any information of the tools used to find the vulnerabilities or the checks performed. Also consider the context of the scan to determine if this is not applicable."
    response: "Return the software name and version numbers in the format: software_name:version_number."
    output: "version_numbers"
    regex: '([A-Z][a-zA-Z]*(?:\s+[A-Z][a-zA-Z]*)*):([A-Za-z0-9!@#$%^&*()_+.,-]+)'
    blacklist: ["nmap", "nikto", "whatweb", "wpscan"]
  - name: "vulnerability_scan"
    description: "The tool output you are given might have highlighted vulnerabilities. Extract the vulnerabilities. Ignore anything that is not highlighted as a vulnerability, such as the time of the scan or uptime of the target."
    response: "Return the following format: The name of the vulnerability:a concise description of the vulnerability:Extract the evidence of the vulnerability."
    output: "vulnerabilities"
    regex: '([A-Za-z0-9\s]+):([^:]+):(.+)'
    blacklist: ["notBefore", "notAfter", "Uptime", "Service Info", "ERROR", "NT_STATUS_ACCESS_DENIED"]
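
To show how the regex and blacklist fields might be applied to an LLM response, here is a minimal sketch using the version_check values from the example above. The function extract_findings is hypothetical, and treating the blacklist as a case-insensitive substring filter on each match is an assumption; the tool itself may apply these fields differently.

```python
import re
from typing import List, Sequence, Tuple

# Values copied from the version_check task in the example tasks file.
VERSION_REGEX = r'([A-Z][a-zA-Z]*(?:\s+[A-Z][a-zA-Z]*)*):([A-Za-z0-9!@#$%^&*()_+.,-]+)'
VERSION_BLACKLIST = ["nmap", "nikto", "whatweb", "wpscan"]


def extract_findings(
    response: str, pattern: str, blacklist: Sequence[str]
) -> List[Tuple[str, ...]]:
    """Return the regex groups of every match that contains no blacklisted word."""
    blocked = [word.lower() for word in blacklist]
    findings: List[Tuple[str, ...]] = []
    for match in re.finditer(pattern, response):
        matched_text = match.group(0).lower()
        # Assumption: a blacklisted word anywhere in the match discards it.
        if any(word in matched_text for word in blocked):
            continue
        findings.append(match.groups())
    return findings


response = "Apache:2.4.41\nNmap:7.80\nOpenSSH:8.2p1"
findings = extract_findings(response, VERSION_REGEX, VERSION_BLACKLIST)
# findings == [("Apache", "2.4.41"), ("OpenSSH", "8.2p1")]
```

The Nmap entry is dropped because the scanning tool's own version is not a finding about the target. Note that the version character class includes commas, so comma-separated responses would capture a trailing comma; one finding per line avoids that.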

Configuration

AutoFocus uses environment variables for configuration. These can be set in a .env file in the root directory of the project. Here's an example of the contents of the .env file:

OLLAMA_MODEL=qwen2.5
OLLAMA_TIMEOUT=5

OLLAMA_MODEL: Specifies the Ollama model to use for analysis. Default is 'qwen2.5'.
OLLAMA_TIMEOUT: Sets the timeout (in seconds) for Ollama API requests. Default is 5 seconds.

You can adjust these values to suit your specific requirements and the capabilities of your local setup.
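
As an illustration of how these two settings could feed a request to a local Ollama server, here is a minimal sketch using only the standard library. The helper names are hypothetical, and the sketch assumes the variables are already present in the environment (it does not parse the .env file itself) and that Ollama is listening at its usual local address, http://localhost:11434. It is not necessarily how autofocus.py talks to Ollama.

```python
import json
import os
import urllib.request
from typing import Mapping, Optional


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the Ollama settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "model": env.get("OLLAMA_MODEL", "qwen2.5"),
        "timeout": float(env.get("OLLAMA_TIMEOUT", "5")),
    }


def build_generate_request(
    prompt: str, config: dict, host: str = "http://localhost:11434"
) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": config["model"], "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, config: dict) -> str:
    """Send the prompt and return the model's text (needs a running Ollama server)."""
    request = build_generate_request(prompt, config)
    with urllib.request.urlopen(request, timeout=config["timeout"]) as reply:
        return json.loads(reply.read())["response"]
```

If requests time out with larger models or on modest hardware, raising OLLAMA_TIMEOUT is a reasonable first thing to try.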

Contribution

Contributions are welcome! If you have ideas for improvements, new features, or bug fixes, feel free to open a pull request or raise an issue.

Licence

This project is licensed under the MIT Licence.