Executable script to recursively collect file contents and display a file tree, copying data to the clipboard and reporting token usage.

Collect Files Content Utility

This utility recursively scans the current directory, collects the content of files, and copies the combined content along with a file tree structure to your clipboard. It counts tokens with the tiktoken-go library and reports the total number used. You can specify which files to include or ignore through command-line arguments.

Features

  • Recursive Scanning: Walks through directories starting from the current location.
  • File Inclusion/Exclusion: Supports specifying patterns to include or ignore.
  • Configurable Token Limits: Total token limit and maximum tokens per file are configurable via command-line arguments.
  • Token Counting: Ensures the collected content doesn't exceed token limits.
  • Clipboard Copying: Automatically copies the collected content to your clipboard.
  • File Tree Generation: Generates a file tree structure of the collected files.
  • Concurrency: Processes files concurrently using goroutines.

Installation

Prerequisites

  • Go Language: Version 1.16 or higher
  • Git: To clone the repository

macOS

  1. Install Go (if not already installed):

    brew install go
  2. Clone the Repository:

    git clone https://github.com/2mawi2/collect.git
    cd collect/
  3. Build the Executable:

    go build -o collect
  4. Move Executable to PATH:

    sudo mv collect /usr/local/bin/
  5. Make Executable Globally Accessible:

    Ensure /usr/local/bin is in your PATH. For Fish Shell, add it with:

    set -Ua fish_user_paths /usr/local/bin

Windows

  1. Install Go:

    Download and install Go from the official website.

  2. Clone the Repository:

    git clone https://github.com/2mawi2/collect.git
    cd collect
  3. Build the Executable:

    go build -o collect.exe
  4. Add Executable to PATH:

    Move collect.exe to a directory that's in your PATH, or add the directory containing collect.exe to your PATH environment variable.

Usage

Run the collect command in the directory you want to scan.

collect [options]

Options

  • -include: (Optional) Comma-separated list of file extensions or patterns to include.

    Example:

    collect -include=".go,.txt"
  • -ignore: (Optional) Comma-separated list of patterns to ignore.

    Example:

    collect -ignore="testdata,*.md"
  • -gitignore: (Optional) Parse .gitignore files and exclude the patterns they list. Defaults to true; set to false to skip .gitignore parsing.

    Example:

    collect -gitignore=false
  • -maxTotalTokens: (Optional) Maximum total tokens allowed in the collected content. Defaults to 128000.

    Example:

    collect -maxTotalTokens=50000
  • -maxFileTokens: (Optional) Maximum tokens allowed per file. Files exceeding this limit will be skipped. Defaults to 50000.

    Example:

    collect -maxFileTokens=20000
  • -tree: (Optional) Show a tree view of token usage per file and directory, including percentages.

    Example:

    collect -tree
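The options above map naturally onto Go's standard flag package. This is a minimal sketch, not the tool's actual code: the options struct, parseOptions and splitList are hypothetical names, and only the flag names and defaults are taken from this README.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// options mirrors the documented command-line flags (field names are hypothetical).
type options struct {
	include        []string
	ignore         []string
	useGitignore   bool
	maxTotalTokens int
	maxFileTokens  int
	tree           bool
}

// splitList turns "a, b,,c" into ["a" "b" "c"], trimming spaces and dropping empty entries.
func splitList(s string) []string {
	var out []string
	for _, p := range strings.Split(s, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// parseOptions parses args using the defaults documented above.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("collect", flag.ContinueOnError)
	include := fs.String("include", "", "comma-separated extensions or patterns to include")
	ignore := fs.String("ignore", "", "comma-separated patterns to ignore")
	gitignore := fs.Bool("gitignore", true, "parse .gitignore files")
	maxTotal := fs.Int("maxTotalTokens", 128000, "maximum total tokens")
	maxFile := fs.Int("maxFileTokens", 50000, "maximum tokens per file")
	tree := fs.Bool("tree", false, "show token usage tree")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return options{
		include:        splitList(*include),
		ignore:         splitList(*ignore),
		useGitignore:   *gitignore,
		maxTotalTokens: *maxTotal,
		maxFileTokens:  *maxFile,
		tree:           *tree,
	}, nil
}

func main() {
	opts, err := parseOptions([]string{"-include=.go,.txt", "-gitignore=false", "-maxFileTokens=20000"})
	if err != nil {
		panic(err)
	}
	fmt.Println(opts.include, opts.useGitignore, opts.maxTotalTokens, opts.maxFileTokens)
}
```

Using a dedicated FlagSet with ContinueOnError, rather than the global flag.Parse, makes the parsing testable with arbitrary argument lists.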

Example Commands

  • Collect all files with default token limits:

    collect
  • Set a custom total token limit:

    collect -maxTotalTokens=100000
  • Set a custom maximum tokens per file:

    collect -maxFileTokens=30000
  • Include only specific file types and set token limits:

    collect -include=".go,.md,.txt" -maxTotalTokens=80000 -maxFileTokens=25000
  • Ignore specific directories or files:

    collect -ignore="vendor,node_modules,*.test.go"
  • Do not parse .gitignore:

    collect -gitignore=false
  • Show token usage tree:

    collect -tree

    This will display a tree view like:

    📁 ./ (100%)
      📁 src/ (46.7%)
        📄 main.go (2000 tokens, 26.7%)
        📄 utils.go (1500 tokens, 20.0%)
      📁 tests/ (53.3%)
        📄 main_test.go (4000 tokens, 53.3%)
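The percentages in the tree are shares of the total token count. A minimal sketch of that calculation, assuming slash-separated relative paths and the hypothetical helpers dirTotals and percent, adds each file's tokens to every ancestor directory:

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// percent returns part as a percentage of total (0 when total is 0).
func percent(part, total int) float64 {
	if total == 0 {
		return 0
	}
	return float64(part) * 100 / float64(total)
}

// dirTotals adds each file's token count to every ancestor directory,
// up to and including the root ".", and returns the grand total.
func dirTotals(files map[string]int) (map[string]int, int) {
	totals := map[string]int{}
	grand := 0
	for p, n := range files {
		grand += n
		for d := path.Dir(p); ; d = path.Dir(d) {
			totals[d] += n
			if d == "." || d == "/" { // stop at the root
				break
			}
		}
	}
	return totals, grand
}

func main() {
	files := map[string]int{
		"src/main.go":        2000,
		"src/utils.go":       1500,
		"tests/main_test.go": 4000,
	}
	dirs, total := dirTotals(files)
	names := make([]string, 0, len(dirs))
	for d := range dirs {
		names = append(names, d)
	}
	sort.Strings(names)
	for _, d := range names {
		fmt.Printf("%s/ (%.1f%%)\n", d, percent(dirs[d], total))
	}
}
```

With files of 2000, 1500 and 4000 tokens (7500 in total), this prints ./ at 100.0%, src/ at 46.7% and tests/ at 53.3%.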
    

How It Works

  1. Scanning: The script walks through the current directory recursively, respecting the include and ignore patterns provided.

  2. File Processing:

    • Skips directories and files matching ignore patterns.
    • Includes files matching the include patterns.
    • Skips binary files and files larger than 1 MB.
    • Reads file content and counts tokens using tiktoken-go.
    • Skips files exceeding the maxFileTokens limit.
  3. Token Counting:

    • Uses tiktoken-go to tokenize file content.
    • Ensures the total tokens do not exceed maxTotalTokens.
  4. Content Collection:

    • Builds a string containing the file tree and contents.
    • Formats each file with its relative path and content.
  5. Copy to Clipboard:

    • Copies the collected content to the system clipboard.
    • Supports both macOS (pbcopy) and Linux (xclip).
  6. Output:

    • Prints the total number of tokens used.
    • Alerts if the token limit is reached or files are skipped.

Notes

  • Minimal Dependencies: Apart from Go and the tiktoken-go library, no additional installations are required, except xclip for clipboard support on Linux.

  • Clipboard Support:

    • On macOS, pbcopy is used (which is available by default).
    • On Windows, clipboard copying is not implemented in this script. Integration can be added if needed.
    • On Linux systems, xclip is required. Install it via your package manager.
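A minimal sketch of the clipboard step, assuming the external command is run through os/exec with the collected text on standard input. clipboardCommand and copyToClipboard are hypothetical names, and passing -selection clipboard to xclip is an assumption about how the tool invokes it (that option targets the desktop clipboard rather than xclip's default primary selection).

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"strings"
)

// clipboardCommand returns the external command for each platform:
// pbcopy on macOS, xclip on Linux, and an error elsewhere (e.g. Windows).
func clipboardCommand(goos string) (string, []string, error) {
	switch goos {
	case "darwin":
		return "pbcopy", nil, nil
	case "linux":
		return "xclip", []string{"-selection", "clipboard"}, nil
	default:
		return "", nil, fmt.Errorf("clipboard copying not supported on %s", goos)
	}
}

// copyToClipboard pipes text into the platform's clipboard command.
func copyToClipboard(text string) error {
	name, args, err := clipboardCommand(runtime.GOOS)
	if err != nil {
		return err
	}
	cmd := exec.Command(name, args...)
	cmd.Stdin = strings.NewReader(text)
	return cmd.Run() // fails if the command is missing from PATH
}

func main() {
	if err := copyToClipboard("hello from collect"); err != nil {
		fmt.Fprintln(os.Stderr, "clipboard:", err)
		os.Exit(1)
	}
}
```

Keeping the platform choice in a pure function separates it from the side effect, so the selection logic can be checked without touching the real clipboard.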

Troubleshooting

  • Clipboard Not Working:

    • Ensure pbcopy (macOS) or xclip (Linux) is installed and accessible.

    • On Debian or Ubuntu, install xclip:

      sudo apt-get install xclip
  • Token Limit Reached:

    • Adjust the -maxTotalTokens command-line argument to set a higher limit.
    • Include fewer files or more specific patterns.
  • Files Skipped Due to Token Size:

    • Adjust the -maxFileTokens command-line argument to set a higher per-file token limit.
  • Binary Files Detected as Text:

    • Exclude such files explicitly with -ignore (for example, -ignore="*.bin,*.dat").
    • Modify the isBinaryFile function if necessary.

Customization

  • Change Default Token Limits:

    Modify the defaultMaxTotalTokens and defaultMaxFileTokens constants at the top of the script.

    const (
        defaultMaxTotalTokens = 128000 // Adjust as needed
        defaultMaxFileTokens  = 50000  // Adjust as needed
    )
  • Adjust Max File Size:

    Modify the maxFileSize constant.

    const maxFileSize = 1 * 1024 * 1024 // 1 MB
  • Default Ignore Patterns:

    Update the defaultIgnorePatterns slice with any additional patterns you wish to ignore by default.

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

License

This project is licensed under the MIT License.


Disclaimer: Ensure you comply with your organization's policies and any relevant laws when collecting and copying file content.