EXCEL-TO-MARKDOWN is a robust Python tool designed to convert Excel files (.xlsx and .xls) into well-formatted Markdown tables. Leveraging a modular architecture, this tool offers enhanced table detection capabilities, interactive prompts for handling complex Excel layouts, and seamless integration with various project workflows.
- Automated Table Detection: Identifies the first fully populated row as the table header, ensuring accurate Markdown conversion.
- Interactive Mode: Prompts users to specify table regions when automatic detection fails, handling complex and irregular Excel structures.
- Modular Design: Organized into distinct modules for detection, parsing, Markdown generation, and utilities, promoting maintainability and scalability.
- Supports Multiple Sheets: Processes all sheets within an Excel file, generating separate Markdown files for each.
- Flexible Column Specification: Allows users to define column ranges using both letter-based (e.g., A:D) and number-based (e.g., 1-4) inputs.
- Unit Tested: Comprehensive unit tests ensure reliability and facilitate future enhancements.
- Easy Integration: Compatible with Poetry for dependency management and can be integrated into larger projects or CI/CD pipelines.
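As an illustration of the target output, the following is a minimal sketch of rendering a header and data rows as a Markdown table. It is not the code in markdown_generator.py (which works on pandas DataFrames); the function name `to_markdown_table` and the choice to escape pipe characters are assumptions made for this example, and it uses plain Python lists so that it runs without extra dependencies.

```python
from typing import Any, Sequence


def to_markdown_table(header: Sequence[Any], rows: Sequence[Sequence[Any]]) -> str:
    """Render a header and rows as a pipe-delimited Markdown table (hypothetical helper)."""

    def cell(value: Any) -> str:
        # Empty cells become empty strings; pipes are escaped so they do not
        # split the cell, and newlines are flattened to spaces.
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ")

    lines = [
        "| " + " | ".join(cell(h) for h in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_markdown_table(["Name", "Qty"], [["Widget", 3], ["Gadget", None]]))
```

The separator row of `---` cells is what marks the first row as the table header in Markdown renderers that support tables.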
EXCEL-TO-MARKDOWN
│
├── .venv
├── data
│   ├── input
│   └── output
├── docs
├── excel_to_markdown
│   ├── __init__.py
│   ├── main.py
│   ├── detector.py
│   ├── parser.py
│   ├── markdown_generator.py
│   └── utils.py
├── src
├── tests
│   ├── test_detector.py
│   ├── test_parser.py
│   ├── test_markdown_generator.py
│   └── test_main.py
├── .gitignore
├── LICENSE
├── poetry.lock
├── pyproject.toml
└── readme.md
- excel_to_markdown/
  - main.py: Entry point of the application. Handles argument parsing, orchestrates the workflow, and manages file I/O.
  - detector.py: Contains functions related to detecting the table start within Excel sheets.
  - parser.py: Handles parsing user inputs, such as column specifications.
  - markdown_generator.py: Responsible for converting pandas DataFrames to Markdown format.
  - utils.py: Utility functions like column letter to index conversion and filename sanitization.
- tests/
  - test_detector.py
  - test_parser.py
  - test_markdown_generator.py
  - test_main.py
Each test file contains unit tests for its respective module, ensuring functionality and reliability.
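The two utilities mentioned for utils.py (column letter to index conversion and filename sanitization) could look roughly like the sketch below. The function names, the 0-based index convention, and the set of characters treated as unsafe are assumptions for illustration, not the project's actual implementation.

```python
import re


def column_letter_to_index(letters: str) -> int:
    """Convert an Excel column label ('A', 'Z', 'AA', ...) to a 0-based index.

    Hypothetical helper: column labels are a bijective base-26 number.
    """
    label = letters.strip().upper()
    if not re.fullmatch(r"[A-Z]+", label):
        raise ValueError(f"Invalid column label: {letters!r}")
    index = 0
    for ch in label:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def sanitize_filename(name: str) -> str:
    """Replace runs of characters outside [A-Za-z0-9._-] with one underscore.

    Hypothetical helper: the allowed character set is an assumption.
    """
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")


if __name__ == "__main__":
    print(column_letter_to_index("D"))
    print(sanitize_filename("Sales Data"))
```

Under these assumptions, a sheet named 'Sales Data' maps to 'Sales_Data', which matches the output filename pattern shown in the sample interaction (report1_Sales_Data.md).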
- Python 3.7+: Ensure you have Python installed. You can download it from python.org.
- Poetry: Python dependency management tool. Install it using the following command:
  curl -sSL https://install.python-poetry.org | python3 -
git clone https://github.com/yourusername/EXCEL-TO-MARKDOWN.git
cd EXCEL-TO-MARKDOWN
Poetry manages virtual environments automatically. To install dependencies:
poetry install
To activate the virtual environment:
poetry shell
- Input Directory: Place all your Excel files (.xlsx or .xls) in the data/input directory.
- Output Directory: The converted Markdown files will be saved in the data/output directory by default. If this directory doesn't exist, the script will create it.
- data/input: Directory containing your Excel files.
- data/output: (Optional) Directory where Markdown files will be saved. If not specified, an output folder will be created inside the input directory.
You can also start a localhost server for real-time editing using:
poetry run app
This will start a server on your localhost, allowing you to make edits to your spreadsheets locally and see immediate updates.
Execute the main script from the command line using the following command:
python -m excel_to_markdown.main data/input data/output
For each sheet in each Excel file:
- Automatic Detection:
  - The script attempts to detect the header row based on the enhanced logic (first fully populated row).
  - If successful, it proceeds to convert without prompts.
- Manual Specification:
  - If automatic detection fails, you'll be prompted to enter:
    - Header Row Number: The row where your table headers are located (1-based index).
    - Columns to Include: Specify the range of columns, e.g., A:D or 1-4.
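The "first fully populated row" rule can be sketched as follows. This is an illustration under stated assumptions rather than the logic in detector.py: the names `is_blank` and `find_header_row` are hypothetical, a sheet is modeled as a list of equal-width rows, and a cell counts as empty if it is None, NaN, or a whitespace-only string.

```python
from typing import Any, Optional, Sequence


def is_blank(cell: Any) -> bool:
    """Treat None, NaN, and whitespace-only strings as empty (assumed definition)."""
    if cell is None:
        return True
    if isinstance(cell, float) and cell != cell:  # NaN is the only float unequal to itself
        return True
    return isinstance(cell, str) and cell.strip() == ""


def find_header_row(rows: Sequence[Sequence[Any]]) -> Optional[int]:
    """Return the 1-based number of the first row with no empty cells, or None."""
    for number, row in enumerate(rows, start=1):
        if row and not any(is_blank(cell) for cell in row):
            return number
    return None  # the caller would fall back to manual specification


if __name__ == "__main__":
    sheet = [
        ["Q3 Report", None, None],   # title row, only partly filled
        ["Name", "Qty", "Price"],    # first fully populated row
        ["Widget", 3, 9.5],
    ]
    print(find_header_row(sheet))
```

Returning None instead of raising keeps the fallback to manual specification a simple conditional in the calling code.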
Sample Interaction:
Processing sheet: 'Sales Data' in file 'report1.xlsx'
Automatically detected table starting at row 2.
Markdown file 'report1_Sales_Data.md' for sheet 'Sales Data' has been created successfully.
Processing sheet: 'Summary' in file 'report1.xlsx'
Automatic table detection failed.
Enter the header row number (1-based index): 5
Enter the columns to include (e.g., A:D or 1-4): B:E
Markdown file 'report1_Summary.md' for sheet 'Summary' has been created successfully.
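The column prompt above accepts either a letter range (A:D) or a number range (1-4). A minimal sketch of parsing both forms into 0-based, inclusive column indices is shown below. The function name `parse_column_range`, the return type, and the validation rules are assumptions for illustration, not the actual behavior of parser.py.

```python
import re
from typing import Tuple


def _letters_to_index(letters: str) -> int:
    """Convert an upper-case column label such as 'A' or 'AA' to a 0-based index."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def parse_column_range(spec: str) -> Tuple[int, int]:
    """Parse 'A:D' or '1-4' into (start, end) 0-based inclusive indices (hypothetical)."""
    text = spec.strip().upper()

    letters = re.fullmatch(r"([A-Z]+)\s*:\s*([A-Z]+)", text)
    numbers = re.fullmatch(r"(\d+)\s*-\s*(\d+)", text)

    if letters:
        start, end = (_letters_to_index(part) for part in letters.groups())
    elif numbers:
        # User-facing column numbers are 1-based, so shift to 0-based.
        start, end = (int(part) - 1 for part in numbers.groups())
    else:
        raise ValueError(f"Unrecognized column range: {spec!r}")

    if start < 0 or end < start:
        raise ValueError(f"Invalid column range: {spec!r}")
    return start, end


if __name__ == "__main__":
    print(parse_column_range("B:E"))
    print(parse_column_range("1-4"))
```

With this convention, A:D and 1-4 select the same four columns, so the rest of the pipeline only has to handle one representation.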
Contributions are welcome! To contribute:
- Fork the Repository
- Create a Feature Branch
  git checkout -b feature/YourFeatureName
- Commit Your Changes
  git commit -m "Add some feature"
- Push to the Branch
  git push origin feature/YourFeatureName
- Open a Pull Request
Please ensure that your contributions adhere to the existing code style and include relevant tests.
Unit tests are located in the tests/ directory. To run the tests:
poetry run pytest
Ensure that you have the virtual environment activated via Poetry.
This project is licensed under the GPLv3.
For any inquiries or support, please contact devin.r.liu@gmail.com.
Happy Converting!