/file2md

File2MD: A Micro File Parser Service. Friendly to LLM.

Primary LanguagePythonOtherNOASSERTION

File2MD

English | δΈ­ζ–‡

MedicNex AI Python FastAPI Docker License PaddleOCR GitHub contributors GitHub Repo stars

File2MD is a FastAPI-based microservice that converts 123 file formats (Word, PDF, PowerPoint, Excel, CSV, images, audio, video, Apple iWork suite, 82 programming languages, etc.) into unified Markdown code block format, which is LLM-friendly.

Features

  • πŸ” API Key Authentication: Support for multiple API key management
  • πŸ“„ Comprehensive Format Support: Supports 123 file formats with 16 parser types
  • πŸ’» Code File Support: Supports 82 programming languages file conversion, covering mainstream, functional, scripting, and configuration languages
  • πŸ–ΌοΈ Smart Image Recognition: Integrated Vision API and PaddleOCR, supports SVG to PNG recognition
  • ⚑ High-Performance Async: Based on FastAPI async framework
  • πŸš€ Queue Processing Mode: Supports batch document conversion, limit the maximum number of concurrent tasks (configurable via .env)
  • 🎯 Concurrent Image Processing: Multiple images in documents are processed simultaneously with PaddleOCR and AI vision recognition, improving processing speed by 2-10x. The limit can be configured via MAX_IMAGES_PER_DOC (-1 for no limit).
  • 🐳 Containerized Deployment: Provides Docker and Docker Compose support
  • πŸ“Š Unified Output Format: All file types output in unified code block format

WebUI (Beta)

  • πŸ–₯️ Built-in WebUI: Out-of-the-box web interface for file upload, API Key management, document conversion, and OCR recognition.
  • 🌐 Language Switch: Supports both English and δΈ­ζ–‡ interface.
  • πŸ“¦ No extra deployment: Access at http://localhost:8999/webui after backend is started.

Demo:

WebUI Demo


πŸ“‘ Table of Contents

πŸ“‹ Unified Output Format

All file conversion results use a unified code block format for easy LLM understanding and processing:

πŸ“ File Type 🏷️ Output Format πŸ“„ Content Example
🎞️ Slideshow Files ```slideshow PowerPoint/Keynote presentation content
πŸ–ΌοΈ Image Files ```image OCR text recognition + AI visual description
πŸ“ Plain Text Files ```text Original text content
πŸ“„ Document Files ```document Word/PDF/Pages structured content
πŸ“Š Spreadsheet Files ```sheet CSV/Excel/Numbers data tables
🎡 Audio Files ```audio Speech transcription + timeline information
🎬 Video Files ```video SRT subtitles + audio transcription
πŸ’» Code Files ```python/javascript/... etc Syntax highlighted code blocks

πŸ“‚ Supported File Formats

πŸ“„ Document and Data Files

Format Extensions Parser Output Format Description
Plain Text .txt, .md, .markdown, .text PlainParser text Direct text content reading
Word Documents .docx DocxParser document Extract text, tables, and formatting, concurrent image processing
Word Documents .doc DocParser document Convert via mammoth, concurrent image processing
RTF Documents .rtf RtfParser document Support RTF format, prefer Pandoc, fallback to striprtf
ODT Documents .odt OdtParser document OpenDocument text, support tables and lists
PDF Documents .pdf PdfParser document Extract text and images, concurrent image processing
Keynote Presentations .key KeynoteParser slideshow Apple Keynote presentations, extract metadata and structure
Pages Documents .pages PagesParser document Apple Pages word processing documents, extract metadata and structure
Numbers Spreadsheets .numbers NumbersParser sheet Apple Numbers spreadsheets, support table data extraction
PowerPoint Presentations .ppt, .pptx PptxParser slideshow Extract slide text content (no vision model)
Excel Spreadsheets .xls, .xlsx ExcelParser sheet Convert to HTML table format and statistics, concurrent image processing
CSV Data .csv CsvParser sheet Convert to HTML table format and data analysis
Image Files .png, .jpg, .jpeg, .gif, .bmp, .tiff, .webp, .ico, .tga ImageParser image PaddleOCR and vision recognition
SVG Files .svg SvgParser svg Recognize both code structure and visual features, convert to PNG for PaddleOCR and AI vision analysis
Audio Files .wav, .mp3, .mp4, .m4a, .flac, .ogg, .wma, .aac AudioParser audio Smart speech analysis and ASR conversion with intelligent segmentation
Video Files .mp4, .avi, .mov, .wmv, .mkv, .webm, .3gp AudioParser video Video audio extraction and subtitle generation with SRT format output

πŸ’» Code Files (82 Languages)

Language Category Supported Extensions Output Format
Mainstream Languages .py, .js, .ts, .java, .cpp, .c, .cs, .go, .rs, .php, .rb Corresponding language code blocks
Frontend Technologies .html, .css, .scss, .sass, .less, .vue, .jsx, .tsx, .svelte Corresponding language code blocks
Script Languages .r, .lua, .perl, .pl, .sh, .bash, .zsh, .fish, .ps1 Corresponding language code blocks
Configuration Files .json, .yaml, .yml, .toml, .xml, .ini, .cfg, .conf Corresponding language code blocks
Database and Others .sql, .dockerfile, .makefile, .cmake, .gradle, .proto, .graphql Corresponding language code blocks
Functional Languages .hs, .lhs, .clj, .cljs, .elm, .erl, .ex, .exs, .fs, .fsx Corresponding language code blocks
System and Tools .vim, .vimrc, .env, .gitignore, .gitattributes, .editorconfig Corresponding language code blocks

Complete List: Python, JavaScript, TypeScript, Java, C/C++, C#, Go, Rust, PHP, Ruby, R, HTML, CSS, SCSS, Sass, Less, Vue, React(JSX), Svelte, JSON, YAML, XML, SQL, Shell scripts, PowerShell, Dockerfile, Makefile, Haskell, Clojure, Elm, Erlang, Elixir, F#, Swift, Kotlin, Dart, Julia, MATLAB, LaTeX, Vim, and 82 other languages.

πŸš€ Quick Start

We provide four deployment options to choose from:

🐳 Docker Image One-Click Deployment (Recommanded)

Download the latest docker image medicnex-file2md.tar from the GitHub Releases, configure .env in the same directory, then run the following command:

#!/bin/bash

# Check if image exists
if ! docker images | grep -q "medicnex-file2md:latest"; then
    echo "Importing image..."
    docker load -i medicnex-file2md.tar
fi

# Stop and remove old container (if exists)
docker stop medicnex-file2md 2>/dev/null || true
docker rm medicnex-file2md 2>/dev/null || true

# Start new container
docker run -d --name medicnex-file2md -p 8999:8999 \
  -v $(pwd)/.env:/app/.env \
  medicnex-file2md:latest

echo "Service started, visit http://localhost:8999/docs"

or

chmod +x docker_image_deploy.sh
./docker_image_deploy.sh

This will deploy and start Docker with one click.

Check Health Status

curl http://localhost:8999/v1/health

View Real-time Logs

docker logs -f medicnex-file2md

Stop Docker

docker stop medicnex-file2md

Restart Docker

docker restart medicnex-file2md

🐳 Using Docker Compose

A simple deployment method with one-click automated deployment:

  1. Clone the project:
git clone https://github.com/MedicNex/file2md.git
cd file2md
  1. One-click deployment:
# Automated deployment (recommended)
./docker-deploy.sh

This script will automatically:

  • Check Docker environment
  • Generate secure API keys and Redis password
  • Build Docker images
  • Start all services (API + Redis + optional Nginx)
  • Perform health checks
  1. Access services:
  1. Manage services:
# Check service status
./docker-deploy.sh status

# View real-time logs
./docker-deploy.sh logs

# Restart services
./docker-deploy.sh restart

# Stop services
./docker-deploy.sh stop

πŸ’» Manual Docker Compose Deployment

If you need custom configuration:

  1. Configure environment variables:
# Copy environment variable template
cp .env.example .env

# Edit .env file, set your configurations
API_KEY=your-api-key-1,your-api-key-2
VISION_API_KEY=your-vision-api-key  # Optional, for image recognition
REDIS_PASSWORD=your-redis-password
  1. Start services:
# Basic deployment
docker-compose up -d

# Include Nginx reverse proxy
docker-compose --profile with-nginx up -d

πŸ› οΈ One-Click Deployment Script (Traditional Method)

Suitable for direct deployment on Linux servers:

  1. Configure environment variables:
cp .env.example .env
# Edit .env file
  1. Execute deployment:
# Ubuntu 24.04 server deployment
sudo ./deploy.sh
  1. View logs:
./monitor_logs.sh

πŸ’» Local Development Environment

Standard Method

  1. Install dependencies:
pip install -r requirements.txt
  1. Install system dependencies:

Ubuntu/Debian:

# PaddleOCR will automatically download required models on first use
# No additional OCR system dependencies needed, PaddleOCR is pure Python

# SVG vision recognition support (ImageMagick recommended)
sudo apt-get install -y imagemagick libmagickwand-dev pkg-config

# Audio processing support
sudo apt-get install -y ffmpeg libavcodec-extra

# Python development tools
sudo apt-get install -y python3-dev python3-pip build-essential

macOS:

# SVG vision recognition support (choose one)
brew install freetype imagemagick  # ImageMagick support
# or
brew install cairo pkg-config  # Cairo support

# Audio processing support
brew install ffmpeg  # Audio format conversion and processing
  1. Set environment variables:
export API_KEY="dev-test-key-123"
export VISION_API_KEY="your-vision-api-key"  # optional
  1. Start the service:
python -m uvicorn app.main:app --host 0.0.0.0 --port 8999 --reload

macOS Quick Deployment (Without Docker)

If you're experiencing slow Docker deployment on macOS, you can use these steps for direct local deployment:

  1. Create virtual environment:
python -m venv venv
source venv/bin/activate
  1. Install dependencies:
# First install base tools
pip install --upgrade pip setuptools wheel

# Install core dependencies individually (to avoid version conflicts)
pip install fastapi uvicorn pydantic python-multipart starlette
pip install loguru python-dotenv

# Install specific versions of PaddleOCR and PaddlePaddle (to resolve compatibility issues)
pip install paddlepaddle==2.5.2
pip install paddleocr==2.7.0

# Then install other dependencies
pip install -r requirements.txt --no-deps
  1. Install system dependencies:
# SVG vision recognition support (choose one)
brew install freetype imagemagick  # ImageMagick support
# or
brew install cairo pkg-config  # Cairo support

# Audio processing support
brew install ffmpeg  # Audio format conversion and processing

# Note: PaddleOCR will automatically download required models on first use
# On macOS, PaddleOCR is a pure Python implementation with no additional system dependencies
# However, it will download approximately 1GB of model files on first run, ensure good network connection
  1. Configure environment variables: Create a .env file in the project root directory with necessary configurations:
DEBUG=true
PORT=8999
MAX_CONCURRENT=5
LOG_LEVEL=INFO
REDIS_CACHE_ENABLED=false  # Set to false if Redis cache is not needed
API_KEY=your_api_key_here  # If API key authentication is required
# If you need vision API functionality, add the following configuration
# VISION_API_KEY=your_vision_api_key
  1. Start the service:
python -m app.main

Or start directly with uvicorn:

uvicorn app.main:app --host 0.0.0.0 --port 8999 --reload

On first startup, PaddleOCR will automatically download and cache required model files (approximately 1GB), which may take some time depending on your network speed. Subsequent starts will be faster once the models are cached.

Note: If you encounter an Unknown argument: use_gpu error during startup, this is due to PaddleOCR version compatibility issues. Use these specific versions to resolve:

pip uninstall -y paddleocr paddlepaddle
pip install paddlepaddle==2.5.2
pip install paddleocr==2.7.0
  1. Optional: Redis cache: If you need Redis cache functionality, install Redis using Homebrew:
brew install redis
brew services start redis

Then enable Redis in your .env file:

REDIS_CACHE_ENABLED=true
REDIS_HOST=localhost
REDIS_PORT=6379

🎡 Audio and Video Processing Features

Audio File Processing

Supported Formats: .wav, .mp3, .mp4, .m4a, .flac, .ogg, .wma, .aac (8 formats)

Core Features:

  • 🎯 Smart Audio Preprocessing: Automatic conversion to 16kHz mono, apply 80Hz high-pass filter to remove low-frequency noise
  • πŸ“Š RMS Energy Analysis: Calculate RMS of audio signal for precise voice activity detection
  • πŸ”„ Adaptive Threshold Detection: Dynamic threshold based on 10th percentile + 3dB, adapts to different recording environments
  • βœ‚οΈ Smart Segmentation: 300ms minimum silence duration, automatic merging of short segments
  • ⚑ Concurrent ASR Conversion: Multiple audio segments processed simultaneously for ASR, significantly improving processing speed
  • πŸ“ˆ Quality Assessment: Confidence scores based on average energy calculation

Video File Processing

Supported Formats: .mp4, .avi, .mov, .wmv, .mkv, .webm, .3gp (7 formats)

Core Features:

  • 🎬 Automatic Audio Extraction: Smart detection and extraction of audio tracks from video files
  • πŸ“ SRT Subtitle Generation: Generate standard timestamp format subtitles (HH:MM:SS,mmm)
  • πŸ”„ Unified Processing Pipeline: Reuse audio analysis algorithms for consistent processing quality
  • πŸ“Š Timeline Synchronization: Precise timestamp correspondence ensuring subtitle-video sync

Technical Configuration

Environment Variable Configuration:

# ASR service configuration
ASR_MODEL=whisper-1                    # ASR model name
ASR_API_BASE=https://api.openai.com/v1 # ASR API base URL
ASR_API_KEY=your-openai-api-key        # ASR API key

# Audio processing parameters
MAX_FILE_SIZE=100                      # Maximum file size (MB)
AUDIO_CONCURRENT_LIMIT=5               # Concurrent ASR requests

System Dependencies:

# Audio processing libraries (required)
pip install pydub numpy librosa

# Audio format support (optional, for more formats)
# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS  
brew install ffmpeg

Performance Optimization

  • Concurrent Processing: Multiple audio segments processed simultaneously for ASR, 3-5x speed improvement
  • Smart Segmentation: Avoid cutting in the middle of words, improving recognition accuracy
  • Adaptive Threshold: Dynamically adjust detection parameters based on audio characteristics
  • Memory Optimization: Stream processing for large files, avoiding memory overflow
  • Error Recovery: Automatic fallback to time-based segmentation when ASR fails

πŸ”— API Usage Guide

πŸ“€ Single File Conversion (Sync Mode)

curl -X POST "https://your-domain/v1/convert" \
  -H "Authorization: Bearer your-api-key" \
  -F "file=@example.py"

Response example (Python file):

{
  "filename": "example.py",
  "size": 1024,
  "content_type": "text/x-python",
  "content": "```python\ndef hello_world():\n    print('Hello, World!')\n```",
  "duration_ms": 150
}

πŸ“¦ Batch File Conversion (Async Queue Mode)

Use queue mode to batch submit multiple files. The number of concurrent connections can be controlled by MAX_CONCURRENT in .env:

curl -X POST "https://your-domain/v1/convert-batch" \
  -H "Authorization: Bearer your-api-key" \
  -F "files=@document1.docx" \
  -F "files=@image1.png" \
  -F "files=@script.py"

Response example:

{
  "submitted_tasks": [
    {
      "task_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "message": "Task submitted to conversion queue",
      "filename": "document1.docx",
      "status": "pending"
    }
  ],
  "total_count": 3,
  "success_count": 3,
  "failed_count": 0
}

πŸ“‹ Query Task Status

curl -X GET "https://your-domain/v1/task/{task_id}" \
  -H "Authorization: Bearer your-api-key"

πŸ“Š Query Queue Status

curl -X GET "https://your-domain/v1/queue/info" \
  -H "Authorization: Bearer your-api-key"

Response example:

{
  "max_concurrent": 5,
  "queue_size": 2,
  "active_tasks": 3,
  "total_tasks": 10,
  "pending_count": 2,
  "processing_count": 3,
  "completed_count": 4,
  "failed_count": 1
}

Response example (Image file):

{
  "filename": "chart.png", 
  "size": 204800,
  "content_type": "image/png",
  "content": "```image\n# OCR:\nChart Title: Sales Data Analysis\n\n# Visual_Features:\nThis is a bar chart showing monthly sales trends...\n```",
  "duration_ms": 2500
}

Response example (SVG file):

{
  "filename": "icon.svg",
  "size": 1024,
  "content_type": "image/svg+xml",
  "content": "```svg\n# Code\n<code class=\"language-svg\">\n<svg xmlns=\"http://www.w3.org/2000/svg\" viewBox=\"0 0 24 24\">\n  <path d=\"M12 2l3.09 6.26L22 9.27l-5 4.87 1.18 6.88L12 17.77l-6.18 3.25L7 14.14 2 9.27l6.91-1.01L12 2z\"/>\n</svg>\n</code>\n\n# Visual_Features: This is a five-pointed star icon with clean line design, perfectly symmetrical star shape, suitable for rating or favorite functionality.\n```",
  "duration_ms": 3200
}

πŸ” Image OCR Recognition (OCR Only)

curl -X POST "https://your-domain/v1/ocr" \
  -H "Authorization: Bearer your-api-key" \
  -F "file=@document.png"

Description:
Recognize text in uploaded images using OCR only (PaddleOCR), without calling Vision API.

Supported formats: JPG, JPEG, PNG, BMP, TIFF, TIF, GIF, WEBP

Request parameters:

Name Type Required Description
file File Yes Image file to recognize

Response example:

{
  "filename": "document.png",
  "size": 204800,
  "content_type": "image/png",
  "ocr_text": "This is the recognized text from the image\nSupports multiple lines\nSupports English and Chinese",
  "duration_ms": 1200,
  "from_cache": false
}

Response fields:

  • filename: Original file name
  • size: File size (bytes)
  • content_type: MIME type
  • ocr_text: Recognized text content
  • duration_ms: Processing time (ms)
  • from_cache: Whether the result is from cache

Error example:

{
  "code": "UNSUPPORTED_TYPE",
  "message": "Unsupported file type: .pdf, only image formats are supported: .jpg, .jpeg, .png, .bmp, .tiff, .tif, .gif, .webp"
}

πŸ”— API Endpoints Overview

Endpoint Method Description
/v1/convert POST Single file synchronous conversion
/v1/ocr POST Image OCR recognition (OCR only)
/v1/convert-batch POST Batch file asynchronous submission
/v1/task/{task_id} GET Query task status
/v1/queue/info GET Query queue status
/v1/queue/cleanup POST Clean up expired tasks
/v1/supported-types GET Get supported file types
/v1/health GET Health check (with queue status)

Health Check

curl -X GET "https://your-domain/v1/health"

βš™οΈ Configuration

πŸ”§ Environment Variables

Variable Description Default Required
API_KEY API key list (comma-separated) dev-test-key-123 Yes
VISION_API_KEY Vision API key - No
VISION_API_BASE Vision API base URL https://api.openai.com/v1 No
VISION_MODEL Vision recognition model name gpt-4o-mini No
ASR_API_KEY ASR speech recognition API key - Required for audio features
ASR_API_BASE ASR API base URL https://api.openai.com/v1 No
ASR_MODEL ASR model name whisper-1 No
PORT Service port 8999 No
LOG_LEVEL Log level INFO No
MAX_IMAGES_PER_DOC Max images processed per document (-1 = unlimited) 5 No

πŸ”‘ API Key Management

  • Supports multiple API keys, separated by commas
  • Use Bearer <API_KEY> format in the Authorization header

❌ Error Handling

HTTP Status Code Error Code Description
401 INVALID_API_KEY Invalid or missing API Key
415 UNSUPPORTED_TYPE Unsupported file type
422 PARSE_ERROR File parsing failed
422 INVALID_FILE Invalid file

⚑ Performance Optimization

  • Asynchronous processing for file uploads and parsing
  • Concurrent Image Processing: Multiple images in documents are processed simultaneously with OCR and AI vision recognition
    • Supported file types: PDF, DOC, DOCX, Excel
    • Performance improvement: 2-10x processing speed (depending on image count and network conditions)
    • Technical implementation: Use asyncio.gather() for concurrent PaddleOCR and vision model calls
  • Automatic temporary file cleanup
  • Memory-optimized streaming processing
  • Support for large file processing
  • Smart encoding detection

πŸ”’ Security Features

  • API Key authentication mechanism
  • File type whitelist validation
  • Secure temporary file cleanup
  • Run as non-root user

πŸ“Š Monitoring and Logging

  • Structured JSON logging
  • Health check endpoints
  • Processing time statistics
  • Error tracking and reporting

πŸ“š More Resources

πŸš€ Redis Deployment Guide

πŸ“– Feature Documentation

πŸ”§ Extension Development

πŸ“ Adding New File Parsers

  1. Inherit from BaseParser class
  2. Implement parse() method
  3. Register in ParserRegistry

Example:

from app.parsers.base import BaseParser

class CustomParser(BaseParser):
    @classmethod
    def get_supported_extensions(cls):
        return ['.custom']
    
    async def parse(self, file_path: str) -> str:
        # Read file content
        with open(file_path, 'r') as f:
            content = f.read()
        
        # Format as code block
        return f"```custom\n{content}\n```"

πŸ—οΈ Architecture Design

πŸ“ Project Structure

medicnex-file2md/
β”œβ”€β”€ 🐳 Docker deployment files
β”‚   β”œβ”€β”€ Dockerfile                    # Docker image build file
β”‚   β”œβ”€β”€ docker-compose.yml           # Docker Compose service orchestration
β”‚   β”œβ”€β”€ docker-deploy.sh             # One-click Docker deployment script
β”‚   └── .dockerignore                # Docker build ignore file
β”œβ”€β”€ πŸ› οΈ Traditional deployment files
β”‚   β”œβ”€β”€ deploy.sh                    # Ubuntu server one-click deployment 
β”‚   └── monitor_logs.sh              # Log monitoring script
β”œβ”€β”€ βš™οΈ Configuration files
β”‚   β”œβ”€β”€ .env.example                 # Environment variable template
β”‚   β”œβ”€β”€ requirements.txt             # Python dependencies
β”‚   β”œβ”€β”€ LICENSE                      # Apache License 2.0
β”‚   └── README.md                    # Project documentation (this file)
└── πŸ“± Application core
    └── app/
        β”œβ”€β”€ main.py                  # FastAPI application entry
        β”œβ”€β”€ config.py                # Configuration management
        β”œβ”€β”€ auth.py                  # API Key authentication
        β”œβ”€β”€ models.py                # Pydantic data models
        β”œβ”€β”€ vision.py                # Vision recognition service
        β”œβ”€β”€ queue_manager.py         # Queue manager
        β”œβ”€β”€ cache.py                 # Redis cache management
        β”œβ”€β”€ utils.py                 # Utility functions
        β”œβ”€β”€ exceptions.py            # Exception handling
        β”œβ”€β”€ routers/
        β”‚   └── convert.py           # Conversion API routes
        └── parsers/                 # πŸ”§ Parser modules (16 parsers)
            β”œβ”€β”€ base.py              # Parser base class
            β”œβ”€β”€ registry.py          # Parser registry
            β”œβ”€β”€ audio.py             # Audio/video parser (smart chunking + ASR)
            β”œβ”€β”€ code.py              # Code file parser (82 languages)
            β”œβ”€β”€ pdf.py               # PDF parser
            β”œβ”€β”€ doc.py               # Word DOC parser (legacy)
            β”œβ”€β”€ docx.py              # Word DOCX parser
            β”œβ”€β”€ excel.py             # Excel parser
            β”œβ”€β”€ pptx.py              # PowerPoint parser
            β”œβ”€β”€ csv.py               # CSV parser
            β”œβ”€β”€ numbers.py           # Apple Numbers parser
            β”œβ”€β”€ keynote.py           # Apple Keynote parser
            β”œβ”€β”€ pages.py             # Apple Pages parser
            β”œβ”€β”€ image.py             # Image parser
            β”œβ”€β”€ svg.py               # SVG parser
            β”œβ”€β”€ markdown.py          # Markdown parser
            β”œβ”€β”€ odt.py               # OpenDocument text parser
            β”œβ”€β”€ rtf.py               # RTF document parser
            └── txt.py               # Text parser

🐳 Containerized Architecture

Docker Service Components:

  • file2md-api: Main API service, integrating PaddleOCR and all parsers
  • redis: Cache service, improving conversion performance and queue management
  • nginx: Reverse proxy service (optional, recommended for production)

Data Persistence:

  • paddleocr_models: PaddleOCR model files persistence
  • redis_data: Redis data persistence
  • temp_files: Temporary file storage
  • app_logs: Application log persistence

πŸ“„ License

This project is released under the Apache License 2.0.

Copyright 2025 MedicNex

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

🀝 Contributing

We warmly welcome community contributions! Here's how you can participate:

πŸ› Report Issues

  • Report bugs on the Issues page
  • Provide detailed error information and reproduction steps
  • Include your environment information (OS, Python version, etc.)

πŸ’‘ Feature Suggestions

  • Propose new features on the Issues page
  • Describe use cases and expected effects
  • Discuss feasibility of implementation approaches

πŸ”§ Code Contributions

  1. Fork this repository
  2. Create feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -m 'Add some amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Submit Pull Request

🎯 Contribution Areas

  • πŸ”§ New Parsers: Add support for new file formats
  • πŸš€ Performance Optimization: Improve processing speed and memory efficiency
  • πŸ“š Documentation Improvement: Enhance usage guides and API documentation
  • 🐳 Deployment Optimization: Improve Docker and deployment scripts
  • πŸ§ͺ Test Enhancement: Increase test coverage

For more detailed information, please refer to Contributing Guide.

Thank you for your attention and contribution to the File2MD project! πŸ™

πŸŽ‰Contributers


πŸ“ˆ Latest Updates

v2.6.0 (Latest)

  • 🐳 Complete Docker Support: Brand new containerized deployment solution
    • Dockerfile: Optimized image based on Ubuntu 24.04, including all PaddleOCR dependencies
    • docker-compose.yml: Complete service orchestration including API, Redis, Nginx
    • docker-deploy.sh: One-click automated deployment script with automatic secure key generation
    • Data Persistence: Persistent storage for PaddleOCR models, Redis data, and logs
    • Health Checks: Built-in service health monitoring and automatic recovery
    • Resource Limits: Reasonable memory and CPU limit configurations
    • Security Configuration: Non-root user execution, automatic strong key generation
  • πŸ“‹ Documentation Optimization: Reorganized deployment guide with three deployment options
  • πŸ”§ Architecture Description: Updated project structure description with clear Docker-related files

v2.5.0

  • OCR engine switched from Tesseract to PaddleOCR, improving recognition accuracy

v2.4.0

  • 🎡 Audio and Video Processing Features: Brand new audio/video file processing support
    • Audio Format Support: .wav, .mp3, .mp4, .m4a, .flac, .ogg, .wma, .aac (8 formats)
    • Video Format Support: .mp4, .avi, .mov, .wmv, .mkv, .webm, .3gp (7 formats)
    • Smart Audio Preprocessing: 16kHz mono conversion, 80Hz high-pass filtering for noise removal
    • RMS Energy Analysis: Precise voice detection based on signal RMS
    • Adaptive Threshold: 10th percentile + 3dB dynamic threshold, adapts to different environments
    • Smart Segmentation Algorithm: 300ms minimum silence detection, automatic short segment merging
    • Concurrent ASR Conversion: Multiple audio segments simultaneously processed for speech recognition, 3-5x speed improvement
    • SRT Subtitle Generation: Automatic standard timestamp subtitle generation for video files
    • Quality Assessment: Confidence calculation and quality metrics based on energy
  • πŸ“Š Statistics Update: Supported formats increased from 109 to 123, added AudioParser
  • πŸ”§ Dependency Enhancement: Added pydub, numpy, librosa audio processing library support

v2.3.0

  • πŸ“± Apple iWork Support: Added support for Apple iWork suite
    • Keynote (.key): Presentation files, extract metadata and structure, output as slideshow format
    • Pages (.pages): Word processing documents, extract metadata and structure, output as document format
    • Numbers (.numbers): Spreadsheet files, support table data extraction, output as sheet format
    • Smart Parsing: Numbers files prioritize numbers-parser library for complete table data extraction, fallback to basic parsing
  • πŸ“Š Statistics Update: Supported formats increased from 106 to 109, parsers from 13 to 16
  • πŸ”§ Dependency Update: Added numbers-parser==4.4.6 dependency for Numbers file parsing

v2.2.0

  • πŸ“Š Data Update: Complete testing and updated supported format list
    • 109 File Formats: Complete validation of all supported extensions
    • 16 Parsers: Optimized classification and statistics
    • New Documentation: Created detailed Supported Formats List
  • πŸ”§ API Enhancement: /v1/supported-types endpoint returns accurate format information
  • πŸ–ΌοΈ SVG Features: Enhanced SVG to PNG visual recognition (ImageMagick support)
  • πŸ›‘οΈ Security Improvements: Health check API removes sensitive information exposure

v2.1.0

  • ✨ New: Concurrent image processing functionality
    • Multiple images in PDF, DOC, DOCX, Excel documents can now be processed concurrently
    • OCR and AI vision recognition run simultaneously, dramatically improving processing speed
    • Processing speed improved 2-10x (depending on image count)
  • πŸ”§ Optimization: Improved exception handling and error recovery mechanisms
  • πŸ› Fix: Resolved memory issues with large document image processing

πŸš€ File2MD

Star History Chart

Efficient and intelligent file conversion microservice, making AI better understand your documents