How to install ChatTTS, the best open source TTS (text-to-speech) model, and keep the voice tone fixed

by Rockets

1. Introduction

ChatTTS, released on May 30, 2024, is a text-to-speech model designed specifically for conversational scenarios such as LLM assistant dialogue. It supports both English and Chinese. The largest model was trained on more than 100,000 hours of Chinese and English data. The open-source version available on HuggingFace was trained on 40,000 hours of data and has not undergone SFT (the specific training method is not mentioned). This article shows how to install ChatTTS with pip and how to keep the voice tone fixed, and reports on a deployment on the LattePanda single-board computer.

ChatTTS boasts impressive voice quality, almost indistinguishable from human speech, and is suitable for video voiceovers and voice responses. The deployment process is relatively simple, and the latest update even supports installation directly via pip.

2. Installation process

The installation process is as follows:

2.1 First, set up a conda environment to isolate the relevant libraries.

CODE

conda create --name chattts -y

After executing the command, the result should appear as follows:

Figure: Conda environment set up

2.2 Activate the conda environment.

CODE

conda activate chattts

After executing the command, the result should be as follows:

Figure: Conda activate Chattts

2.3 Create a directory (optional).

CODE

mkdir chattts
cd chattts

This directory is primarily used to save the generated WAV files.

2.4 Install the chattts-fork library.

CODE

pip install chattts-fork

The execution result should be as follows:

Figure: Pip install result

The installation completion message for chattts-fork will be displayed.

2.5 Run

After installation, it’s ready to use.

The simplest way to use it is the following command:

CODE

chattts hello,world

The first run downloads the required model files. The GPT.pt file is approximately 901MB, and nearly 1GB of disk space is needed in total. The execution result should be as follows:

Figure: Chattts cli generate done

Afterward, a tts.wav file will appear in the current directory, which can be played using a media player. You will then hear the corresponding audio content.

Congratulations, your first audio file has been successfully created.

3. Additional Information

How can we use it? Here’s a brief introduction.

For detailed usage instructions, you can refer to the help file of the chattts command.

CODE

chattts -h

After executing the command, the content should appear as follows:

Figure: Chattts cli help parameters

The -h option prints the help text, which lists the parameters the command line accepts.

4. Fixing the voice tone

The -s option sets the seed. Because the voice is otherwise generated randomly, passing the same seed with -s keeps the voice consistent across runs.

CODE

chattts -s 111 'this is a test sentence.'
chattts -s 111 'the voice is the same as before.'

Male

Figure: Male voice seed

Female

Figure: Female voice seed

The `-o` option specifies the filename of the output file.

Figure: Chattts voice fix by seed
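
The two options can be combined. The following is a minimal sketch, not taken from the chattts-fork documentation: it assumes that -o can be placed before the text in the same way as -s, and the seed value, the file names and the speak helper are hypothetical. By default it only prints the commands it would run; set DRY_RUN=0 to actually generate the files.

```shell
#!/bin/sh
# Sketch: render several sentences with one fixed seed so every clip
# uses the same voice. SEED, the file names and speak() are illustrative.
SEED=111

# speak OUTPUT_FILE TEXT
# Prints the chattts command by default; runs it when DRY_RUN=0.
speak() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf "chattts -s %s -o %s '%s'\n" "$SEED" "$1" "$2"
    else
        chattts -s "$SEED" -o "$1" "$2"
    fi
}

speak part1.wav 'this is the first sentence.'
speak part2.wav 'the voice is the same as before.'
```

Keeping the seed in one variable means the whole batch changes voice together when you try a different seed.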

5. Precautions

Installing chattts-fork and running it for the first time place certain demands on the network, since nearly 1GB of content has to be downloaded. Choosing a pip source that is geographically close to you can speed up the installation.
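
As an illustration, pip can be pointed at a different package index through the PIP_INDEX_URL environment variable. The mirror URL below is only an example and is an assumption, not a recommendation from the ChatTTS project; substitute whichever index is closest to you.

```shell
# Assumption: the TUNA mirror is just an example index; replace it with one near you.
PIP_INDEX_URL="https://pypi.tuna.tsinghua.edu.cn/simple"
export PIP_INDEX_URL

# pip reads PIP_INDEX_URL from the environment, so the install command itself
# stays unchanged. Uncomment the next line to run the installation:
# pip install chattts-fork
echo "pip will use index: $PIP_INDEX_URL"
```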

The project has only just been released and is changing rapidly, so the steps described here may become outdated. Keep an eye on the 2noise/ChatTTS GitHub project for updates.

The model accepts only English commas and periods as punctuation; other punctuation marks are treated as invalid. On the command line, enclose the text in single quotes. If the text contains invalid characters, a warning like the following is printed:

CODE

WARNING:ChatTTS.core:Invalid characters found! : {':', '：', '\n', '!'}
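
One way to avoid this warning is to normalize the text before passing it to chattts. The sketch below is an assumption about how one might do this, not something the project provides: the sanitize helper is hypothetical, and the mapping (exclamation and question marks to periods, colons and semicolons to commas, newlines to spaces) is just one reasonable choice.

```shell
# sanitize TEXT
# Rewrites punctuation that the model rejects into commas and periods.
sanitize() {
    printf '%s' "$1" \
        | tr '\n' ' ' \
        | sed -e 's/[!?]/./g' -e 's/[;:]/,/g' -e 's/：/,/g'
}

sanitize 'Warning: this is a test!'   # prints: Warning, this is a test.
```

The cleaned text can then be handed to the command line, for example with chattts "$(sanitize "$text")".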

6. Test on LattePanda Sigma

We have already deployed it on the LattePanda Sigma and tested its performance.

Figure: Deploy ChatTTS on LattePanda Sigma

Figure: ChatTTS generate wav result on LattePanda Sigma

On the LattePanda Sigma, a 22-second voice clip takes about 39 seconds to generate, a ratio of roughly 1:1.8 between audio length and processing time. This is comparable to the efficiency on devices equipped with an RTX 4090 graphics card.
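
The ratio can be checked with a quick calculation. This sketch uses awk for the floating-point division; the rtf helper name is hypothetical, and the two numbers are the ones reported above.

```shell
# rtf PROCESSING_SECONDS AUDIO_SECONDS
# Prints processing time divided by audio length
# (values above 1 mean generation is slower than real time).
rtf() {
    awk -v p="$1" -v a="$2" 'BEGIN { printf "%.2f\n", p / a }'
}

rtf 39 22   # prints 1.77
```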

ChatTTS has recently launched its own website at https://chattts.com/. If you prefer not to set things up yourself, you can try it through the web interface instead.

7. References

1. ChatTTS GitHub - 2noise

2. ChatTTS GitHub - yihong0618

3. ChatTTS official website