This utility recursively scans the current directory, collects the contents of the files it finds, and copies the combined content, along with a file tree structure, to your clipboard. It reports the total number of tokens in the collected content, counted with the tiktoken-go library. You can specify which files to include or ignore through command-line arguments.
- Recursive Scanning: Walks through directories starting from the current location.
- File Inclusion/Exclusion: Supports specifying patterns to include or ignore.
- Configurable Token Limits: Total token limit and maximum tokens per file are configurable via command-line arguments.
- Token Counting: Ensures the collected content doesn't exceed token limits.
- Clipboard Copying: Automatically copies the collected content to your clipboard.
- File Tree Generation: Generates a file tree structure of the collected files.
- Concurrency: Efficient processing using goroutines.
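As a rough illustration of the concurrency feature, the sketch below counts several inputs in parallel with goroutines and a `sync.WaitGroup`. The function name `countAll` and the byte-length counter are hypothetical stand-ins, not the tool's actual code or its tiktoken-go tokenizer.

```go
package main

import (
	"fmt"
	"sync"
)

// countAll runs count on every input concurrently and returns the results
// in the same order as the inputs. (Hypothetical helper for illustration.)
func countAll(inputs []string, count func(string) int) []int {
	results := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			results[i] = count(in) // each goroutine writes only its own slot
		}(i, in)
	}
	wg.Wait() // block until every goroutine has finished
	return results
}

func main() {
	// Stand-in counter: byte length instead of real token counting.
	byteLen := func(s string) int { return len(s) }
	fmt.Println(countAll([]string{"package main", "hello"}, byteLen))
}
```

Because each goroutine writes to a distinct slice index, no mutex is needed in this sketch; a real implementation would probably also cap the number of goroutines running at once.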
- Go Language: Version 1.16 or higher
- Git: To clone the repository
- Install Go (if not already installed):
  brew install go
- Clone the Repository:
  git clone https://github.com/2mawi2/collect.git
  cd collect/
- Build the Executable:
  go build -o collect
- Move Executable to PATH:
  sudo mv collect /usr/local/bin/
- Make Executable Globally Accessible:
  If you're using Fish Shell or any other shell, ensure /usr/local/bin is in your PATH. For Fish Shell:
  set -Ua fish_user_paths /usr/local/bin
- Install Go:
  Download and install Go from the official website.
- Clone the Repository:
  git clone https://github.com/yourusername/collect-files-content.git
  cd collect-files-content
- Build the Executable:
  go build -o collect.exe
- Add Executable to PATH:
  Move collect.exe to a directory that's in your PATH, or add the directory containing collect.exe to your PATH environment variable.
Run the collect command in the directory you want to scan:
collect [options]
- -include: (Optional) Comma-separated list of file extensions or patterns to include. Example:
  collect -include=".go,.txt"
- -ignore: (Optional) Comma-separated list of patterns to ignore. Example:
  collect -ignore="testdata,*.md"
- -gitignore: (Optional) Parse .gitignore files to exclude patterns. Defaults to true. Set to false to ignore .gitignore. Example:
  collect -gitignore=false
- -maxTotalTokens: (Optional) Maximum total tokens allowed in the collected content. Defaults to 128000. Example:
  collect -maxTotalTokens=50000
- -maxFileTokens: (Optional) Maximum tokens allowed per file. Files exceeding this limit will be skipped. Defaults to 50000. Example:
  collect -maxFileTokens=20000
- -tree: (Optional) Show a tree view of token usage per file and directory, including percentages. Example:
  collect -tree
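The options above map naturally onto Go's standard `flag` package. The following is a minimal sketch of how they could be parsed, assuming the flag names and defaults documented here; the `options` struct and the `parseOptions` and `splitList` helpers are hypothetical names and may not match the tool's source.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// options mirrors the documented command-line flags (hypothetical struct).
type options struct {
	include        []string
	ignore         []string
	gitignore      bool
	maxTotalTokens int
	maxFileTokens  int
	tree           bool
}

// splitList turns "a, b,c" into ["a" "b" "c"], dropping empty entries.
func splitList(s string) []string {
	var out []string
	for _, p := range strings.Split(s, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// parseOptions parses args (without the program name) into options.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("collect", flag.ContinueOnError)
	include := fs.String("include", "", "comma-separated extensions or patterns to include")
	ignore := fs.String("ignore", "", "comma-separated patterns to ignore")
	fs.BoolVar(&o.gitignore, "gitignore", true, "parse .gitignore files")
	fs.IntVar(&o.maxTotalTokens, "maxTotalTokens", 128000, "maximum total tokens")
	fs.IntVar(&o.maxFileTokens, "maxFileTokens", 50000, "maximum tokens per file")
	fs.BoolVar(&o.tree, "tree", false, "show token usage tree")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	o.include = splitList(*include)
	o.ignore = splitList(*ignore)
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```

Note that Go's `flag` package only accepts the `-gitignore=false` form for boolean flags; writing `-gitignore false` would leave the flag set to true and treat `false` as a positional argument.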
- Collect all files with default token limits:
  collect
- Set a custom total token limit:
  collect -maxTotalTokens=100000
- Set a custom maximum tokens per file:
  collect -maxFileTokens=30000
- Include only specific file types and set token limits:
  collect -include=".go,.md,.txt" -maxTotalTokens=80000 -maxFileTokens=25000
- Ignore specific directories or files:
  collect -ignore="vendor,node_modules,*.test.go"
- Do not parse .gitignore:
  collect -gitignore=false
- Show token usage tree:
  collect -tree
This will display a tree view like:
📁 ./ (100%)
  📁 src/ (46.7%)
    📄 main.go (2000 tokens, 26.7%)
    📄 utils.go (1500 tokens, 20.0%)
  📁 tests/ (53.3%)
    📄 main_test.go (4000 tokens, 53.3%)
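The percentages in such a tree can be derived by summing each file's tokens into all of its parent directories and dividing by the root total. The sketch below shows one way to do that; `dirTotals`, `percent`, and the sample file names are hypothetical and are not taken from the tool's source.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// percent returns part as a percentage of total, or 0 when total is 0.
func percent(part, total int) float64 {
	if total == 0 {
		return 0
	}
	return 100 * float64(part) / float64(total)
}

// dirTotals adds each file's token count to every ancestor directory of its
// slash-separated relative path. The root directory is keyed as ".".
func dirTotals(files map[string]int) map[string]int {
	totals := make(map[string]int)
	for p, n := range files {
		d := path.Dir(p)
		for {
			totals[d] += n
			if d == "." || d == "/" {
				break // reached the top of the path
			}
			d = path.Dir(d)
		}
	}
	return totals
}

func main() {
	// Illustrative token counts per file.
	files := map[string]int{
		"cmd/app.go":    300,
		"cmd/flags.go":  100,
		"docs/guide.md": 600,
	}
	totals := dirTotals(files)
	dirs := make([]string, 0, len(totals))
	for d := range totals {
		dirs = append(dirs, d)
	}
	sort.Strings(dirs)
	for _, d := range dirs {
		fmt.Printf("%s/ %d tokens (%.1f%%)\n", d, totals[d], percent(totals[d], totals["."]))
	}
}
```

With these sample numbers the root holds 1000 tokens, so `cmd/` accounts for 40.0% and `docs/` for 60.0%.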
- Scanning: The script walks through the current directory recursively, respecting the include and ignore patterns provided.
- File Processing:
  - Skips directories and files matching ignore patterns.
  - Includes files matching the include patterns.
  - Skips binary files and files larger than 1 MB.
  - Reads file content and counts tokens using tiktoken-go.
  - Skips files exceeding the maxFileTokens limit.
- Token Counting:
  - Uses tiktoken-go to tokenize file content.
  - Ensures the total tokens do not exceed maxTotalTokens.
- Content Collection:
  - Builds a string containing the file tree and contents.
  - Formats each file with its relative path and content.
- Copy to Clipboard:
  - Copies the collected content to the system clipboard.
  - Supports both macOS (pbcopy) and Linux (xclip).
- Output:
  - Prints the total number of tokens used.
  - Alerts if the token limit is reached or files are skipped.
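The per-file and total token limits described above can be sketched as a single loop. This is an illustration under stated assumptions, not the tool's implementation: the `file` and `result` types, the `--- path ---` output format, the choice to stop at the first file that would exceed the total limit, and the whitespace-based `wordCount` stand-in for tiktoken-go are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// file is a hypothetical in-memory representation of a scanned file.
type file struct {
	path    string
	content string
}

// result holds the collected text plus the bookkeeping used for reporting.
type result struct {
	content      string
	total        int
	skipped      []string
	limitReached bool
}

// wordCount is a stand-in for a real tokenizer: it counts whitespace-separated words.
func wordCount(s string) int { return len(strings.Fields(s)) }

// collect appends files in order. A file whose count exceeds maxFile is
// skipped; collection stops before the running total would exceed maxTotal.
func collect(files []file, count func(string) int, maxFile, maxTotal int) result {
	var res result
	var b strings.Builder
	for _, f := range files {
		n := count(f.content)
		if n > maxFile {
			res.skipped = append(res.skipped, f.path)
			continue
		}
		if res.total+n > maxTotal {
			res.limitReached = true
			break
		}
		fmt.Fprintf(&b, "--- %s ---\n%s\n", f.path, f.content)
		res.total += n
	}
	res.content = b.String()
	return res
}

func main() {
	files := []file{
		{"a.txt", "one two"},
		{"big.txt", "1 2 3 4 5 6"},
		{"b.txt", "three four five"},
	}
	res := collect(files, wordCount, 5, 4)
	fmt.Print(res.content)
	fmt.Println("total:", res.total, "skipped:", res.skipped, "limit reached:", res.limitReached)
}
```

Passing the counter as a function keeps the budget logic independent of the tokenizer, so the same loop can be exercised with a cheap stand-in.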
- No External Dependencies: Aside from Go and tiktoken-go, no additional installations are required, except xclip on Linux for clipboard support.
- Clipboard Support:
  - On macOS, pbcopy is used (which is available by default).
  - On Windows, clipboard copying is not implemented in this script. Integration can be added if needed.
  - On Linux systems, xclip is required. Install it via your package manager.
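The platform differences above could be handled by choosing the clipboard command from `runtime.GOOS` and piping the text to it. This is a minimal sketch: `clipboardCommand` and `copyToClipboard` are hypothetical names, and the `-selection clipboard` arguments for xclip are an assumption about how the tool invokes it.

```go
package main

import (
	"fmt"
	"os/exec"
	"runtime"
	"strings"
)

// clipboardCommand returns the clipboard program for the given OS name
// (as reported by runtime.GOOS). ok is false for unsupported systems.
func clipboardCommand(goos string) (name string, args []string, ok bool) {
	switch goos {
	case "darwin":
		return "pbcopy", nil, true
	case "linux":
		// Assumption: target the regular clipboard rather than the X primary selection.
		return "xclip", []string{"-selection", "clipboard"}, true
	default:
		return "", nil, false
	}
}

// copyToClipboard writes text to the clipboard program's standard input.
func copyToClipboard(text string) error {
	name, args, ok := clipboardCommand(runtime.GOOS)
	if !ok {
		return fmt.Errorf("clipboard copying is not supported on %s", runtime.GOOS)
	}
	cmd := exec.Command(name, args...)
	cmd.Stdin = strings.NewReader(text)
	return cmd.Run()
}

func main() {
	if err := copyToClipboard("hello from collect"); err != nil {
		fmt.Println("clipboard error:", err)
	}
}
```

Keeping the OS lookup in its own function makes it easy to add a Windows branch later without touching the code that runs the command.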
- Clipboard Not Working:
  - Ensure pbcopy (macOS) or xclip (Linux) is installed and accessible.
  - For Linux, install xclip:
    sudo apt-get install xclip
- Token Limit Reached:
  - Adjust the -maxTotalTokens command-line argument to set a higher limit.
  - Include fewer files or more specific patterns.
- Files Skipped Due to Token Size:
  - Adjust the -maxFileTokens command-line argument to set a higher per-file token limit.
- Binary Files Detected as Text:
  - Ensure that binary files have appropriate extensions or are properly detected.
  - Modify the isBinaryFile function if necessary.
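If binary detection needs adjusting, one common heuristic is to look for a NUL byte near the start of the file. The sketch below shows that idea only; `looksBinary` is a hypothetical function, and it is not known whether the tool's `isBinaryFile` works this way.

```go
package main

import (
	"bytes"
	"fmt"
)

// looksBinary reports whether data is probably binary, using a simple
// heuristic: a NUL byte within the first sniffLen bytes.
func looksBinary(data []byte) bool {
	const sniffLen = 8000 // only inspect the beginning of large files
	if len(data) > sniffLen {
		data = data[:sniffLen]
	}
	return bytes.IndexByte(data, 0) >= 0
}

func main() {
	fmt.Println(looksBinary([]byte("package main\n")))           // plain source text
	fmt.Println(looksBinary([]byte{0x89, 'P', 'N', 'G', 0x00})) // bytes containing a NUL
}
```

This check is cheap but imperfect: UTF-16 text contains NUL bytes and would be classified as binary, while a binary format without NUL bytes near the start would pass as text.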
- Change Default Token Limits:
  Modify the defaultMaxTotalTokens and defaultMaxFileTokens constants at the top of the script.
  const (
      defaultMaxTotalTokens = 128000 // Adjust as needed
      defaultMaxFileTokens  = 50000  // Adjust as needed
  )
- Adjust Max File Size:
  Modify the maxFileSize constant.
  const maxFileSize = 1 * 1024 * 1024 // 1 MB
- Default Ignore Patterns:
  Update the defaultIgnorePatterns slice with any additional patterns you wish to ignore by default.
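To see how entries in an ignore list might be applied, the sketch below tests each path component against every pattern with `path.Match`, so a plain name such as `vendor` excludes a whole directory and a glob such as `*.md` excludes matching files. The `shouldIgnore` function and the sample patterns are hypothetical; the tool's actual matching rules and default patterns may differ.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"strings"
)

// shouldIgnore reports whether any component of relPath matches any pattern.
// Patterns use path.Match glob syntax and are compared one component at a time.
func shouldIgnore(relPath string, patterns []string) bool {
	parts := strings.Split(filepath.ToSlash(relPath), "/")
	for _, pat := range patterns {
		for _, part := range parts {
			// A malformed pattern returns an error and is treated as no match.
			if ok, err := path.Match(pat, part); err == nil && ok {
				return true
			}
		}
	}
	return false
}

func main() {
	// Illustrative patterns only, not the tool's real defaults.
	patterns := []string{"vendor", "node_modules", "*.md"}
	for _, p := range []string{"vendor/lib/a.go", "docs/README.md", "src/main.go"} {
		fmt.Printf("%-18s ignored=%v\n", p, shouldIgnore(p, patterns))
	}
}
```

This component-wise matching is simpler than full .gitignore semantics, which also support negation, anchored patterns, and `**`.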
Contributions are welcome! Please open an issue or submit a pull request on GitHub.
This project is licensed under the MIT License.
Disclaimer: Ensure you comply with your organization's policies and any relevant laws when collecting and copying file content.