AllTalk v1 Minor updates/bug fixes list
erew123 opened this issue · 1 comments
AllTalk minor updates/bug fixes/new features (This is for version 1.x of AllTalk)
If you have an issue, I will be keeping a list here of any minor updates that I make to fix those issues. If the issue you are experiencing is in the list, please follow the updating instructions.
Help with known problems can be found here
Changelog
15th August 2024
- AllTalk - Requirements - Tweaked the requirements file for text-gen-webui to deal with the PIP changes in v24.x of pip.
19th June 2024
- AllTalk - docker - Merged in updates for the docker build files #249
14th June 2024
- AllTalk - atsetup.sh - Forced PyTorch version back to being 2.2.1 on standalone install for v1.9 of AllTalk.
14th May 2024
- AllTalk - Finetuning - Updated file path handling to deal with Gradio 4.xx versions.
1st May 2024
- AllTalk - Narrator - Low VRAM & Narrator, when used together should now be faster. Changed the way the AI model is moved about in this circumstance, which should reduce generation time where Low VRAM is used, shaving X seconds off the standard length generation.
28th April 2024
- AllTalk - TTS Generator - Amended text splitting to catch outlier scenarios.
- AllTalk - Streaming - Added an endpoint for stopping Streaming mid generation (in testing currently).
26th April 2024
- AllTalk - Finetuning - Custom folders for saving models. Learning rate selectable. Updated documentation.
8th April 2024
- AllTalk - atsetup.sh - Linux specific. Corrected PyTorch download URL, due to a missing digit on the path.
5th April 2024
- AllTalk - TTS Generator - Export WAV, updated crunker to a new version as it fixes export performance across a network.
- AllTalk - atsetup.bat - Altered the way Windows 10 is handled as the old way appeared to have an issue on Windows 10.
- AllTalk - Compacted some images down just to improve performance.
1st April 2024
- AllTalk - Sorry I fluffed up DeepSpeed on the ATsetup (not joking though I know this is an April 1st update). Please git pull, or follow the instructions directly below. The updated ATsetup should clear the fault now! Sorry sorry.
- AllTalk - Vastly improved the start-up screen. Added additional useful information/checks. Removed large DeepSpeed text output. Added a Github last updated check, so people know when AllTalk had its last change made. This is detailed in the help section under Understanding the AllTalk start-up screen.
29th March 2024
- AllTalk - DeepSpeed setup issue with ATSetup.bat has been corrected.
  1. In your alltalk_tts folder, perform a git pull.
  2. Run atsetup.bat.
  3. Select option 2, AllTalk as a Standalone Application.
  4. Select option 3, Re-Apply/Update the requirements file.

  If you cannot git pull (AllTalk installed from a ZIP file) you can download the updated atsetup.bat from here, saving it over the top of the one in the alltalk_tts folder, then follow from step 2 above.
28th March 2024
- AllTalk & Finetuning - Streamlined the interface. Added additional Pre-flight checks and terminal/console warnings. Improved the documentation throughout. Removed the need to install the Nvidia CUDA Toolkit v11.8, though it will still be required for compiling DeepSpeed on Linux systems.
- AllTalk - TTS Generator - Squashed various interface bugs. The ID list buttons are now dynamic and enable/disable as necessary. Resolved a few outlier issues with editing the text of generated TTS when trying to regenerate.
- AllTalk - TTS Generator - Added 2x new features. TTSDiff will compare the original text to the generated TTS and let you know which generated TTS's are bad (on a best effort basis). TTSSRT can be used to generate subtitle files, for things such as video audiobooks.
- AllTalk - Moved up Python requirements. Simplified the requirements installation files. Updated ATsetup. Updated Diagnostics. Re-wrote all documentation and applied to both Github & built in documentation. Cleaned up the whole file structure. Tested across Windows & Linux. All in about 12-14 days work.
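The TTSSRT entry above mentions generating subtitle files. As a rough illustration of what that involves, here is a minimal sketch that builds SubRip (SRT) text from a list of generated segments. This is not AllTalk's actual code: the function names and the `(text, duration_seconds)` input shape are assumptions made for the example.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(segments):
    """Build SRT text from (text, duration_seconds) pairs in playback order.

    The input shape is hypothetical; each cue starts where the previous one ended.
    """
    cues = []
    start = 0.0
    for index, (text, duration) in enumerate(segments, start=1):
        end = start + duration
        cues.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
        start = end
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

Each cue is a sequence number, a `start --> end` line and the text, which is the layout most video players expect from an `.srt` file.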
25th March 2024
- AllTalk & Finetuning - Further cleaned up instructions. Cleaned up Pre-flight check. Added a refresh to the dropdowns on step 3.
22nd March 2024
- AllTalk - Documentation - Provided a large update to the Github documented instructions. Built in documentation to be done at a later date.
17th March 2024
- AllTalk - TTS Generator - Stopped currently playing ID number from resetting to 1 on pressing Stop.
16th March 2024
- AllTalk & Finetuning - Added a custom tokenizer for Japanese. Also added nice big warning messages if you don't pass the Pre-flight check.
- AllTalk & DeepSpeed - Built the DeepSpeed v14 wheel for CUDA 11.8 with PyTorch 2.2.1. Option added to windows atsetup.bat.
16th March 2024
- AllTalk & Finetuning - Finetuning now has a pre-flight check system with additional help documentation to ensure your system is configured correctly for Finetuning. The interface was given a little overhaul. There is more yet to add/change, but figured this is a good start. TTS is now version 0.22.0 across the board of all of AllTalk's apps.
- AllTalk - Diagnostics were updated to make things a little cleaner to look through and understand.
11th March 2024
- AllTalk & Finetuning - Reduced thread count when working with Japanese language training (limitation of external training scripts). Improved some documentation. Moved Whisper models to 32bit floats, allowing older non RTX cards to work (no noticeable impact on speed).
- AllTalk & Text-gen-webui - Specifically when used with the Stable Diffusion Plugin. AllTalk will now strip any images before TTS generation, generate the TTS, then re-insert the image back into the chat when handing the audio and text string back to Text-generation-webui. It should also be noted that the Stable Diffusion Plugin will remove text from the generation, so you need to consider the load order of plugins (link to details here).
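The strip, generate, re-insert flow described for the Stable Diffusion Plugin can be sketched roughly as below. This is an assumption-laden illustration rather than AllTalk's implementation: it assumes images arrive as HTML `<img>` tags in the chat string, and the helper names and the order in which the pieces are reassembled are hypothetical.

```python
import re

# Matches a single HTML <img ...> tag (assumed image format for this sketch).
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def split_images(chat_text: str):
    """Return (text_without_images, image_tags) so only the text is sent to TTS."""
    images = IMG_TAG.findall(chat_text)
    text_only = IMG_TAG.sub("", chat_text).strip()
    return text_only, images


def reassemble(audio_html: str, text_only: str, images):
    """Hand back audio, images and text as one string (ordering is an assumption)."""
    return audio_html + "".join(images) + text_only
```

The point of separating the two steps is that the TTS engine never sees the image markup, while the chat still receives the image once the audio has been generated.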
7th March 2024
- AllTalk Kobold Streaming support - Thanks to @LostRuins and @illtellyoulater PR here
21st Feb 2024
- AllTalk Currently only for Text-generation-webui - Attempted to allow Chinese character set to pass through (not sure if it will or won't work with the Narrator).
5th Feb 2024
- AllTalk extended API text length to 2000 characters.
24th Jan 2024
- AllTalk A new version of Transformers has been released 4.37.1 which now resolves the prior loading issue.
- AllTalk Documentation/Github Added another link to some sample new voices (as yet unsure of the quality).
22nd Jan 2024
- AllTalk A new version of Transformers has been released, 4.37 (https://github.com/huggingface/transformers/releases), which causes a load/import problem: ImportError: cannot import name 'SampleOutput' from 'transformers.generation.utils'. At this time, I'm unsure if this is a bug in their code as I cannot find any breaking changes currently. I have forced pip install transformers==4.36.2 in the requirements files.
21st Jan 2024
- AllTalk Simplified changing the start-up duration allowed for people with older machines. At the top of script.py is now startup_wait_time = 120. Help documentation updated accordingly.
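To show how a setting like `startup_wait_time` is typically used, here is a minimal sketch of a polling loop that gives a slow machine up to that many seconds to become ready. How script.py actually consumes the value is not shown in this changelog, so the `wait_until_ready` helper, its `is_ready` callback and the polling interval are all assumptions.

```python
import time

startup_wait_time = 120  # seconds; the value named in the entry above


def wait_until_ready(is_ready, timeout=startup_wait_time, poll_interval=1.0):
    """Poll is_ready() until it returns True or the timeout elapses.

    is_ready is a hypothetical zero-argument callable, e.g. a check that the
    TTS server has finished loading. Returns True if ready, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(poll_interval)
    return False
```

Raising the value simply extends the deadline, which is why older machines with slower model loading benefit from a larger number.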
19th Jan 2024
- AllTalk - TTS Generator removed the hard coding to the IP of the AllTalk server so that it will dynamically point to the correct location on generation requests.
18th Jan 2024
- AllTalk - Text-gen-webui temperature and repetition sliders now within the main interface.
- AllTalk - SillyTavern Posted a couple of changes to AllTalk's ST extension; ST have approved the extension into the staging area.
15th Jan 2024
- AllTalk - TTS Generator Added a warning message if people try to run it from its disk location vs its URL address.
- AllTalk - TTS Generator Push mimetype of application/javascript (for exporting on TTS generator).
13th Jan 2024
- AllTalk - Version update Added additional API endpoints for streaming generation and server status. Added atsetup utility for Windows & Linux systems to streamline installation & maintenance both with Text-gen-webui and Standalone installation routines. Added SillyTavern support (yet to send PR to SillyTavern). Documentation created along with installation videos for the atsetup utility. Text-gen-webui interface cleaned up to look a bit nicer. Cleaned up a few console outputs. Added cutlet and unidic-lite to assist generation on non-Japanese enabled computers with Japanese TTS.
- AllTalk - TTS API Corrected missing _combined.wav on non-timestamped narrator generations.
- AllTalk - TTS Generator/API Timestamps Corrected by applying a short UUID to the timestamp, to avoid the same filename being generated twice within the same second for short sentences, which resulted in one of the files being overwritten by the other.
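The timestamp fix above can be illustrated with a small sketch: a second-resolution timestamp alone collides when two short sentences finish within the same second, so a short random suffix is appended. This is not AllTalk's code; the `unique_output_name` helper, the `output` prefix and the 8-character suffix length are assumptions for the example.

```python
import time
import uuid


def unique_output_name(prefix="output", ext="wav"):
    """Build a filename from a whole-second timestamp plus a short UUID fragment.

    The timestamp alone repeats for files created in the same second; the
    random suffix makes an accidental overwrite very unlikely.
    """
    timestamp = int(time.time())
    short_id = uuid.uuid4().hex[:8]  # suffix length is an arbitrary choice
    return f"{prefix}_{timestamp}_{short_id}.{ext}"
```

Two calls made back to back share the same timestamp portion but differ in the suffix, so neither file replaces the other.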
8th Jan 2024
- AllTalk - TTS Generator Added export batch file splitting to avoid the 1GB limit within browser combining of wav's. Smaller batches can also reduce memory overhead, so good for systems with less memory. Ran a test to generate TTS that was 57911 words long.
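The batch splitting idea can be sketched as a simple greedy grouping: keep adding consecutive WAV files to a batch until the next one would push it over the size limit, then start a new batch. The TTS Generator does this in the browser, so the Python below is only an illustration of the logic; the function name and the byte-size inputs are assumptions.

```python
def split_into_batches(file_sizes, max_batch_bytes):
    """Group consecutive file indices so each batch stays within max_batch_bytes.

    file_sizes is a list of sizes in bytes, in playback order. A single file
    larger than the limit still gets a batch of its own.
    """
    batches = []
    current = []
    current_size = 0
    for index, size in enumerate(file_sizes):
        # Start a new batch if adding this file would exceed the limit.
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current = []
            current_size = 0
        current.append(index)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Keeping the files in order within and across batches means the exported parts can be played or joined back together in sequence.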
7th Jan 2024
- AllTalk - TTS Generator Added page pagination for large TTS generations to reduce browser memory use. Cleared some audio browser cache issues to further reduce memory use. Added a "No Playback" option which will be good for very large generation 20,000+ word type scenario, to keep memory down (as yet untested at that character count).
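Pagination reduces memory use because only one slice of the generated items is rendered at a time. A minimal sketch of that slicing, with hypothetical helper names and a 1-based page number as an assumption:

```python
def page_count(total_items, page_size):
    """Number of pages needed to show total_items (ceiling division)."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return -(-total_items // page_size)


def get_page(items, page, page_size):
    """Return the items belonging to a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    start = (page - 1) * page_size
    return items[start:start + page_size]
```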
6th Jan 2024
- AllTalk - TTS Generator Added the TTS Generator, which is designed for creating TTS of any length from as large an amount of text as you want. You are able to individually edit/regenerate sections after all the TTS is produced, and export out to 1x wav file. You can also stream TTS if you just want to play back text, or even push audio output to wherever AllTalk is currently running from at the terminal/command prompt.
5th Jan 2024
- AllTalk - Updated filtering to allow Hungarian ő and ű characters to pass through correctly.
4th Jan 2024
- AllTalk - Updated filtering to allow Cyrillic characters to pass through correctly.
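The two filtering entries above are about not discarding legitimate non-ASCII letters. One way to get that behaviour, shown here as a sketch and not as AllTalk's actual filter, is to keep any character that Unicode classes as a letter or digit, plus a small set of punctuation. The `ALLOWED_PUNCTUATION` set and the `filter_text` name are assumptions.

```python
# Hypothetical punctuation whitelist for this sketch.
ALLOWED_PUNCTUATION = set(" .,!?'\"-")


def filter_text(text):
    """Keep Unicode letters/digits and whitelisted punctuation; drop the rest."""
    return "".join(
        ch for ch in text if ch.isalnum() or ch in ALLOWED_PUNCTUATION
    )
```

Because `str.isalnum()` is Unicode-aware, Hungarian and Cyrillic letters survive the filter without needing to be listed one by one.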
3rd Jan 2024
- AllTalk - Thanks to @nicobubulle. Added a greedy option to avoid apostrophes being removed. Added accented characters for foreign languages.
2nd Jan 2024
- AllTalk - MS Word Add-in Added a proof-of-concept MS Word add-in to stream selected text to speech from within documents. This is purely POC and not production, hence support on this will be limited.
1st Jan 2024
- AllTalk - Narrator Given another rebuild and a final upgrade. This version passed all of my tests. AI Systems will still not follow the rules all the time, see here for details, but it should give a good level of control.
- AllTalk - Updated Narrator and filtering also set on the API now.
- AllTalk - Streaming audio & Separation/tidy of built in documentation. A big thanks to @rbruels who has managed to get streaming working through the demo page within the built in documentation. This now allows for a lot of other opportunities in future. The built in documentation has also been split out of the main code base, allowing for much easier editing & management of both the code and the documentation.
30th Dec 2023
- Finetuning - Simplified Added an additional routine to give 2x possible locations to send your compacted model to. Built in the routine to compact any legacy models. Tidied up the interface a bit. Cleaned up the built in documentation & added some additional documentation on Github. Corrected the gitignore file not ignoring the finetune folder.
29th Dec 2023
- AllTalk - Additional API Endpoints and Playback 3x additional API endpoints, providing a Ready status, list of available voices and a preview voice option. The API now also supports playing the generated TTS at the terminal/command prompt where the script is running from, through that machine's local audio device. Full details in the API section.
- AllTalk - 4th Model Loader for Finetuned models As the finetuning process now moves models to /models/trainedmodel/, as long as a model is detected in this location when AllTalk starts up, a 4th model loader will become available in the Gradio interface so that you can directly load the model.
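The start-up detection described for the 4th model loader amounts to checking whether a model file exists under /models/trainedmodel/. A minimal sketch of such a check follows; the changelog does not say which files AllTalk looks for, so the `model.pth` filename and the `finetuned_model_available` helper are assumptions.

```python
from pathlib import Path


def finetuned_model_available(base_dir, model_filename="model.pth"):
    """Return True if a model file exists in <base_dir>/models/trainedmodel/.

    model_filename is a hypothetical default; the real file name(s) AllTalk
    checks for are not stated in this changelog.
    """
    model_path = Path(base_dir) / "models" / "trainedmodel" / model_filename
    return model_path.is_file()
```

An interface could then offer the extra loader only when this check returns True at start-up.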
28th Dec 2023
- AllTalk & Finetuning - Larger update on Finetuning to simplify last steps as well as compact down the model. Added a separate compaction script for people "stuck" with large models here. Improved the Narrator text splitting function, though still hunting out some outlier situations (it varies LLM model to model, so harder to track them down).
27th Dec 2023
- AllTalk - Standalone API Fix possible lost TTS segments. Same as the fix on Dec 25th, but for the standalone mode API. This will have no bearing for anyone who is just using AllTalk normally and not in standalone mode.
25th Dec 2023
- AllTalk - Fix possible lost TTS segments. Applied a small update to avoid a possible race condition on file naming with small sentences when generating narrator/character speech. This would fix small sentences sometimes being lost.
24th Dec 2023
- Finetuning - MP3 & Flac issue. Corrected finetuning not correctly picking up MP3 and Flac file names.
- Finetuning - speakers_xtts.pth file missing - Issue is with TTS 0.22.0. Have set a downgrade in the requirements_finetune.txt file to TTS 0.21.3 while I get an answer/solution from Coqui. Re-run the pip install -r requirements_finetune.txt if you get this issue.