A fast, efficient, and flexible CSS tokenizer that is fail-safe and W3C compliant. Designed for maximum performance, Comot-CSS parses CSS input while handling edge cases gracefully. It is an ideal choice for embedded systems, web tools, or any application needing a reliable and robust CSS tokenizer.
- Efficient CSS Tokenization: A high-performance tokenizer that parses CSS input with minimal overhead.
- Fail-Safe Design: Comot-CSS is built to handle malformed or unexpected CSS gracefully without crashing, making it suitable for production environments where stability is key.
- W3C Compliant: Fully compliant with the W3C CSS specification, ensuring that it accurately tokenizes valid CSS as well as handles invalid or malformed CSS according to the specification.
- Error Handling: Comprehensive error tracking and reporting mechanisms that log parsing issues with detailed information on line and column numbers.
- Extensibility: Designed with modularity in mind, allowing for easy extensions and modifications.
- Memory Efficiency: Optimized for handling large CSS files with minimal memory usage, leveraging arena allocation for fast memory management.
- Fuzz and Unit Testing: Includes both unit and fuzz tests to ensure stability and correctness across a variety of edge cases.
- Cross-Platform: Compatible with all platforms that support C, including Linux, macOS, and Windows.
Before you begin, ensure you have the following dependencies:
- CMake (v3.15 or higher) for building the project
- Git to clone the repository
- A C Compiler (GCC, Clang, etc.)
This project uses arena-allocator as a submodule for efficient memory allocation, which will be fetched automatically via CMake.
To install and build Comot-CSS, follow these steps:
-
Clone the repository:
git clone https://github.com/dev-charles15531/comot-css.git cd comot-css -
Create a build directory:
mkdir build cd build -
Run CMake to configure the project:
cmake ..
-
Build the project:
cmake --build . -
(Optional) To build tests:
cmake --build . --target tests
The Comot-CSS project is organized as follows:
comot-css/
├── CMakeLists.txt # Root CMake configuration
├── LICENSE # Project license
├── README.md # Project documentation
├── examples/ # Example projects using the tokenizer
│ ├── main.c # Main example code
│ └── CMakeLists.txt # CMake configuration for examples
├── include/ # Public headers
│ ├── comot-css/ # All the public header files
│ │ ├── diag.h
│ │ ├── error.h
│ │ ├── tokenizer.h
│ │ └── tokens.h
├── src/ # Core tokenizer implementation
│ ├── CMakeLists.txt # CMake configuration for source files
│ ├── tokenizer/ # Tokenizer-related files
│ └── utils/ # Utility functions (error handling, etc.)
├── tests/ # Unit and fuzz tests
│ ├── unit/ # Unit tests for the tokenizer
│ ├── fuzz/ # Fuzz testing for edge cases
├── run_fuzz_test.sh # Script to run fuzz tests
└── run_unit_test.sh # Script to run unit tests
include/contains public header files for the tokenizer and utility functions.src/includes the core implementation of the tokenizer and utility code.tests/contains unit and fuzz tests to ensure correctness and robustness.
The core of Comot-CSS is the Tokenizer—a finite state machine (FSM) designed to tokenize CSS input in compliance with the W3C specification. The tokenizer ensures that the CSS is broken down into tokens that represent valid CSS syntax (e.g., selectors, properties, values) and can also handle errors gracefully, logging them without causing a crash.
Comot-CSS ensures full W3C compliance, adhering strictly to the CSS tokenization rules as defined in the specification. In addition, it has been designed to be fail-safe—even if the input is invalid or malformed, the tokenizer handles such cases without crashing. Instead, it will gracefully log the error and continue processing, providing robust handling of edge cases and malformed CSS.
The Comot-CSS tokenizer supports all the token types defined by the W3C CSS specification. Below is the complete list of supported token types:
TOKEN_IDENT: Identifiers, e.g.,body,#main.TOKEN_FUNCTION: Function names, e.g.,rgb(),url().TOKEN_AT_KEYWORD: At-rule keywords, e.g.,@media,@import.TOKEN_HASH: Hashes (used for IDs), e.g.,#element.TOKEN_STRING: String literals, e.g.,"Hello World",'Hello'.TOKEN_BAD_STRING: Malformed string literals, e.g.,"Hello World.TOKEN_URL: URL values, e.g.,url('image.png').TOKEN_BAD_URL: Malformed URL values, e.g.,url( image.png ).TOKEN_DELIM: Delimiters, e.g.,;,:,,,(),[].TOKEN_NUMBER: Numeric values, e.g.,1,1px,100%.TOKEN_PERCENTAGE: Percentage values, e.g.,50%,100%.TOKEN_DIMENSION: Dimension values (numbers with units), e.g.,10px,2em.TOKEN_WHITESPACE: Whitespace, e.g., spaces, tabs, newlines.TOKEN_COMMENT: CSS comments, e.g.,/* This is a comment */.TOKEN_COLON: Colon used in property definitions, e.g.,color: red;.TOKEN_SEMICOLON: Semicolon used in separating declarations, e.g.,color: red;.TOKEN_COMMA: Comma used in value lists, e.g.,red, blue, green.TOKEN_LEFT_PAREN: Left parenthesis(.TOKEN_RIGHT_PAREN: Right parenthesis).TOKEN_LEFT_SQUARE: Left square bracket[.TOKEN_RIGHT_SQUARE: Right square bracket].TOKEN_LEFT_CURLY: Left curly brace{.TOKEN_RIGHT_CURLY: Right curly brace}.TOKEN_CDO: Character sequence<!--(start of comment).TOKEN_CDC: Character sequence-->(end of comment).TOKEN_EOF: End of file, indicating the end of the input.TOKEN_ERROR: Error token, generated when the tokenizer encounters an invalid or unrecognized character sequence.
This project includes both unit tests and fuzz tests to ensure the correctness of the tokenizer.
To run the unit tests, simply execute the following script:
./run_unit_test.shThis script will automatically build the project (if necessary) and execute the unit tests.
To run the fuzz tests, execute the following script:
./run_fuzz_test.shThis script will automatically build the project (if necessary) and run the fuzz tests, ensuring that edge cases are handled correctly.
We welcome contributions to Comot-CSS. To get started, please fork the repository and create a pull request. Ensure your changes are covered by tests, and adhere to the following guidelines:
- Code Quality: Follow C coding conventions and style guidelines.
- Documentation: Update the README or comments when introducing new features or significant changes.
- Testing: Ensure that your code is covered by unit or fuzz tests.
This project is licensed under the MIT License - see the LICENSE file for details.
- arena-allocator: A fast and efficient memory allocator for C that is used to manage memory in this project. GitHub Repository
Feel free to open an issue or reach out if you have any questions or suggestions! dev-charles15531