Collection of custom nodes for ComfyUI.
Simply clone the repo into the custom_nodes
directory with this command:
git clone https://github.com/ForeignGods/ComfyUI-Mana-Nodes.git
and install the requirements using:
.\python_embeded\python.exe -s -m pip install -r requirements.txt --user
If you are using a venv, make sure you have it activated before installation and use:
pip install -r requirements.txt
(Demo video: speech2text.mp4)
- Font to Image Batch Animation
- Split Video to Frames and Audio
- Speech-to-Text Conversion
- SVG Loader/Animator
- Font to Image Alpha Channel
- Keyframe model/LoRA switcher for AnimateDiff
- Animation process for transitioning from pictures to videos
- Font support for other languages
Configure the font2img node by setting the following parameters in ComfyUI:
- font_file: Font file from the custom_nodes\ComfyUI-Mana-Nodes\font directory, e.g. example_font.ttf (supports .ttf, .otf, .woff, .woff2).
- font_color: Color of the text. (https://www.w3.org/wiki/CSS3/Color/Extended_color_keywords)
- background_color: Background color of the image.
- border_color: Color of the border around the text.
- border_width: Width of the text border.
- shadow_color: Color of the text shadow.
- shadow_offset_x: Horizontal offset of the shadow.
- shadow_offset_y: Vertical offset of the shadow.
- line_spacing: Spacing between lines of text.
- kerning: Spacing between the characters of the font.
- padding: Padding between image border and font.
- frame_count: Number of frames (images) to generate.
- image_width: Width of the generated images.
- image_height: Height of the generated images.
- transcription_mode: Mode of text transcription ('word', 'line', 'fill').
- text_alignment: Alignment of the text in the image.
- text_interpolation_options: Mode of text interpolation ('strict', 'interpolation', 'cumulative').
- text: The text to render in the images. (Ignored when the optional transcription input is provided.)
- animation_reset: Defines when the animation resets ('word', 'line', 'never').
- animation_easing: Easing function for animation (e.g., 'linear', 'exponential').
- animation_duration: Duration of the animation.
- start_font_size, end_font_size: Starting and ending size of the font.
- start_x_offset, end_x_offset, start_y_offset, end_y_offset: Offsets for text positioning.
- start_rotation, end_rotation: Rotation angles for the text.
- rotation_anchor_x, rotation_anchor_y: Offset of the rotation anchor point, relative to the text's initial position.
- input_images: Text will be overlaid on input_images instead of background_color.
- transcription: Transcription from the speech2text node; contains a dict with timestamps, frame rate and transcribed words.
- images: The generated images with the specified text and configurations.
- transcription_framestamps: Outputs a string containing the framestamps; new lines are inserted based on the image width. (Can be useful for manually correcting speech-recognition mistakes.)
- Example: Save this output with string2file -> correct mistakes -> remove transcription input from font2img -> paste corrected framestamps into text input field of font2img node.
- Specifies the text to be rendered on the images. Supports multiline text input for rendering on separate lines.
- For simple text: Input the text directly as a string.
- For frame-specific text (in modes like 'strict' or 'cumulative'): Use a JSON-like format where each line specifies a frame number and the corresponding text. Example:
"1": "Hello", "10": "World", "20": "End"
- Defines the mode of text interpolation between frames.
- strict: Text is only inserted at the specified frames.
- interpolation: Gradually interpolates text characters between frames.
- cumulative: Text set for a frame persists until updated in a subsequent frame.
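As a rough illustration of how these modes differ, here is a minimal sketch (not the node's actual implementation; resolve_text and its signature are hypothetical) of how frame-keyed text could be resolved per frame. The character-level 'interpolation' mode is omitted for brevity:

```python
def resolve_text(keyed_text, frame_count, mode):
    """keyed_text: dict mapping 1-indexed frame numbers to text."""
    frames = [""] * frame_count
    last = ""
    for i in range(1, frame_count + 1):
        if mode == "strict":
            # Text appears only on the exact frames it was keyed to.
            frames[i - 1] = keyed_text.get(i, "")
        elif mode == "cumulative":
            # Keyed text persists until a later frame replaces it.
            last = keyed_text.get(i, last)
            frames[i - 1] = last
    return frames
```

For example, resolve_text({1: "Hello", 3: "World"}, 4, "cumulative") keeps "Hello" on frame 2, while "strict" leaves frames 2 and 4 blank.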
- Sets the starting and ending offsets for text positioning on the X and Y axes, allowing for text transition across the image.
- Input as integers. Example: start_x_offset = 10, end_x_offset = 50 moves the text from 10 pixels from the left to 50 pixels from the left across frames.
- Defines the starting and ending rotation angles for the text, enabling it to rotate between these angles.
- Input as integers in degrees. Example: start_rotation = 0, end_rotation = 180 rotates the text from 0 to 180 degrees across frames.
- Sets the starting and ending font sizes for the text, allowing the text size to dynamically change across frames.
- Input as integers representing the font size in points. Example: start_font_size = 12, end_font_size = 24 gradually increases the text size from 12 to 24 points across the frames.
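All of these start/end pairs boil down to interpolating a value across the frame range. A minimal sketch, assuming simple linear blending (the function name is hypothetical, and the node additionally applies the chosen easing curve):

```python
def interpolate(start, end, frame, total_frames):
    # Linear blend from start at frame 0 to end at the last frame.
    t = frame / max(total_frames - 1, 1)
    return start + (end - start) * t
```

With start_font_size = 12 and end_font_size = 24 over 5 frames, interpolate(12, 24, 0, 5) gives 12 on the first frame and interpolate(12, 24, 4, 5) gives 24 on the last.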
- Dictates when the animation effect resets to its starting conditions.
- word: Resets animation with each new word.
- line: Resets animation at the beginning of each new line of text.
- never: The animation does not reset, but continues throughout.
- Controls the pacing of the animation.
- Examples include linear, exponential, quadratic, cubic, elastic, bounce, back, ease_in_out_sine, ease_out_back, ease_in_out_expo.
- Each option provides a different acceleration curve for the animation, affecting how the text transitions and rotates.
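For reference, plausible shapes for a few of the listed easings are sketched below; the node's exact curves may differ, so treat these as illustrative only. Each maps a normalized progress t in [0, 1] to an eased progress:

```python
import math

def linear(t):
    return t

def ease_in_out_sine(t):
    # Slow start and end, fastest in the middle.
    return -(math.cos(math.pi * t) - 1) / 2

def ease_in_out_expo(t):
    # Very slow start/end with a sharp acceleration around the midpoint.
    if t == 0 or t == 1:
        return t
    if t < 0.5:
        return 2 ** (20 * t - 10) / 2
    return (2 - 2 ** (-20 * t + 10)) / 2
```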
- The length of time each animation takes to complete, measured in frames.
- A larger value means a slower, more gradual transition, while a smaller value results in a quicker animation.
- Determines how the transcribed text is applied across frames.
- word: Each word appears on its corresponding frame based on the transcription timestamps.
- line: Similar to word, but text is added line by line.
- fill: Continuously fills the frame with text, adding new words at their specific timestamps.
Extracts frames and audio from a video file.
- video: Path to the video file.
- frame_limit: Maximum number of frames to extract from the video.
- frame_start: Starting frame number for extraction.
- filename_prefix: Prefix for naming the extracted audio file. (relative to .\ComfyUI-Mana-Nodes)
- frames: Extracted frames as image tensors.
- frame_count: Total number of frames extracted.
- audio: Path of the extracted audio file.
- fps: Frames per second of the video.
- height, width: Dimensions of the extracted frames.
Converts spoken words in an audio file to text using a deep learning model.
- audio: Audio file path or URL.
- wav2vec2_model: The Wav2Vec2 model used for speech recognition. (https://huggingface.co/models?search=wav2vec2)
- spell_check_language: Language for the spell checker.
- framestamps_max_chars: Maximum number of characters allowed before a new framestamp line is started.
- fps: Frames per second, used for synchronizing with video. (Default set to 30)
- transcription: Text transcription of the audio. (Should only be used as font2img transcription input)
- raw_string: Raw string of the transcription without timestamps.
- framestamps_string: Frame-stamped transcription.
- timestamps_string: Transcription with timestamps.
- raw_string: Returns the transcribed text as one line.
THE GREATEST TRICK THE DEVIL EVER PULLED WAS CONVINCING THE WORLD HE DIDN'T EXIST
- framestamps_string: Depending on the framestamps_max_chars parameter, the sentence is cleared and builds up again until max_chars is reached.
- In this example framestamps_max_chars is set to 25.
"27": "THE",
"31": "THE GREATEST",
"43": "THE GREATEST TRICK",
"73": "THE GREATEST TRICK THE",
"77": "DEVIL",
"88": "DEVIL EVER",
"94": "DEVIL EVER PULLED",
"127": "DEVIL EVER PULLED WAS",
"133": "CONVINCING",
"150": "CONVINCING THE",
"154": "CONVINCING THE WORLD",
"167": "CONVINCING THE WORLD HE",
"171": "DIDN'T",
"178": "DIDN'T EXIST",
- timestamps_string: Returns all transcribed words with their start_time and end_time, as a JSON-formatted string.
[
{
"word": "THE",
"start_time": 0.9,
"end_time": 0.98
},
{
"word": "GREATEST",
"start_time": 1.04,
"end_time": 1.36
},
{
"word": "TRICK",
"start_time": 1.44,
"end_time": 1.68
},
{
"word": "THE",
"start_time": 2.42,
"end_time": 2.5
},
{
"word": "DEVIL",
"start_time": 2.58,
"end_time": 2.82
},
{
"word": "EVER",
"start_time": 2.92,
"end_time": 3.04
},
{
"word": "PULLED",
"start_time": 3.14,
"end_time": 3.44
},
{
"word": "WAS",
"start_time": 4.22,
"end_time": 4.34
},
{
"word": "CONVINCING",
"start_time": 4.44,
"end_time": 4.92
},
{
"word": "THE",
"start_time": 5.0,
"end_time": 5.06
},
{
"word": "WORLD",
"start_time": 5.12,
"end_time": 5.42
},
{
"word": "HE",
"start_time": 5.58,
"end_time": 5.62
},
{
"word": "DIDN'T",
"start_time": 5.7,
"end_time": 5.88
},
{
"word": "EXIST",
"start_time": 5.94,
"end_time": 6.28
}
]
Writes a given string to a text file.
- string: The string to be written to the file.
- filename_prefix: Prefix for naming the text file. (relative to .\ComfyUI-Mana-Nodes)
Combines a sequence of images (frames) with an audio file to create a video.
- audio: Audio file path or URL.
- frames: Sequence of images to be used as video frames.
- filename_prefix: Prefix for naming the video file. (relative to .\ComfyUI-Mana-Nodes)
- fps: Frames per second for the video.
- video_file_path: Path to the created video file.
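Under the hood, this kind of frames-plus-audio muxing is typically done with ffmpeg. A hypothetical equivalent invocation, built as an argv list (paths, codec and frame pattern are placeholder assumptions, not necessarily what the node uses):

```python
def ffmpeg_mux_command(frame_pattern, audio_path, fps, out_path):
    # Build an ffmpeg argv that encodes numbered frames plus an audio
    # track into a single video file.
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", frame_pattern,    # e.g. frames/%05d.png
        "-i", audio_path,
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",  # widest player compatibility
        "-shortest",            # stop at the shorter input
        out_path,
    ]
```

The resulting list can be passed directly to subprocess.run.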
Example workflows are included in the example_workflows directory.
- Personal Use: The included fonts are for personal, non-commercial use. Please refrain from using these fonts in any commercial project without obtaining the appropriate licenses.
- License Compliance: Each font may come with its own license agreement. It is the responsibility of the user to review and comply with these agreements. Some fonts may require a license for commercial use, modification, or distribution.
- Removing Fonts: If any font creator or copyright holder wishes their font to be removed from this repository, please contact us, and we will promptly comply with your request.
- https://www.dafont.com/akira-expanded.font
- https://www.dafont.com/aurora-pro.font
- https://www.dafont.com/another-danger.font
- https://www.dafont.com/doctor-glitch.font
- https://www.dafont.com/ghastly-panic.font
- https://www.dafont.com/metal-gothic.font
- https://www.dafont.com/the-constellation.font
- https://www.dafont.com/the-augusta.font
- https://www.dafont.com/vogue.font
- https://www.dafont.com/wreckside.font
Your contributions to improve Mana Nodes are welcome! If you have suggestions or enhancements, feel free to fork this repository, apply your changes, and create a pull request. For significant modifications or feature requests, please open an issue first to discuss what you'd like to change.