How to Install ChatTTS, One of the Best Open-Source Text-to-Speech (TTS) Models, and Keep Its Voice Tone Fixed

1. Introduction

ChatTTS, released on May 30, 2024, is a text-to-speech model designed for conversational scenarios such as LLM assistant dialogue. It supports both English and Chinese. The largest model was trained on over 100,000 hours of Chinese and English data. The open-source version on HuggingFace was trained on 40,000 hours of data and has not undergone SFT (the specific training method is not mentioned). This article shows how to install ChatTTS with pip, how to keep the generated voice tone fixed, and how it performs when deployed on a LattePanda single-board computer.


ChatTTS boasts impressive voice quality, almost indistinguishable from human speech, and is suitable for video voiceovers and voice responses. The deployment process is relatively simple, and the latest update even supports installation directly via pip.


2. Installation process

The installation process is as follows:


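2.1 First, set up a conda environment to isolate the relevant libraries. Specifying a Python version makes conda install an interpreter and pip inside the environment (3.10 is only an example here; use any version that ChatTTS supports).

conda create --name chattts python=3.10 -y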

After executing the command, the result should appear as follows:


Figure: Conda environment set up


2.2 Activate the conda environment.

conda activate chattts

After executing the command, the result should be as follows:

Figure: Conda activate chattts


2.3 Create a directory (optional).

mkdir chattts
cd chattts

This directory is primarily used to save the generated WAV files.


2.4 Install the chattts-fork library.


pip install chattts-fork

The execution result should be as follows:

Figure: Pip install result


The installation completion message for chattts-fork will be displayed.
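
Before moving on, it can help to confirm that the chattts command is actually reachable from the active environment. The snippet below is a minimal sketch, not part of chattts-fork: check_cli is a hypothetical helper name, and it only relies on the standard shell built-in command -v.

```shell
# Sketch: confirm that a command-line tool is reachable on PATH.
# check_cli is an illustrative helper, not something shipped with chattts-fork.
check_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at: $(command -v "$1")"
  else
    echo "$1 not found on PATH; is the conda environment active?"
    return 1
  fi
}

# Check the CLI installed in step 2.4 (do not abort the shell if it is missing).
check_cli chattts || true
```

If the command is reported as missing, re-run step 2.2 to activate the environment and try again.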


2.5 Run

After installation, it’s ready to use.

A simple way to try it is with the following command:

chattts hello,world

The first time you run it, it needs to download the dependency files it relies on. The download is approximately 901MB, and nearly 1GB of disk space is needed in total. The execution result should be as follows:


Figure: ChatTTS CLI generation done


Afterward, a tts.wav file will appear in the current directory, which can be played using a media player. You will then hear the corresponding audio content.


Congratulations, your first audio file has been successfully created.


3. Additional Information

How can we use it? Here’s a brief introduction.

For detailed usage instructions, you can refer to the help file of the chattts command.

chattts -h

After executing the command, the content should appear as follows:


Figure: ChatTTS CLI help parameters


The -h option prints the help text, which lists the parameters available when running chattts from the command line.


4. Fixing the voice tone

The -s option sets the seed. Because the voice is randomly generated on each run, passing the same seed with -s keeps the voice consistent across runs.

chattts -s 111 'this is a test sentence voice.'
chattts -s 111 'the voice is the same as before.'


Figure: Male voice seed



Figure: Female voice seed


The `-o` option specifies the filename of the output file.

Figure: ChatTTS voice fixed by seed
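
The two options can be combined to produce several clips in the same voice, each saved under its own name. The following is a minimal sketch under a few assumptions: it assumes -s and -o can be passed together in one call, and the seed value, the line_NN.wav naming pattern, the synthesize_all function, and the DRY_RUN switch are all illustrative choices rather than part of chattts. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: synthesize several sentences with one fixed seed, one WAV per sentence.
# Only the -s (seed) and -o (output filename) options described above are used.
# SEED, the file-name pattern, and DRY_RUN are illustrative assumptions.

SEED=111
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = actually run chattts

synthesize_all() {
  local i=1 text out
  for text in "$@"; do
    out=$(printf 'line_%02d.wav' "$i")
    if [ "$DRY_RUN" = "1" ]; then
      # Show the command instead of executing it.
      printf "chattts -s %s -o %s '%s'\n" "$SEED" "$out" "$text"
    else
      chattts -s "$SEED" -o "$out" "$text"
    fi
    i=$((i + 1))
  done
}

synthesize_all 'this is the first sentence.' 'this is the second sentence.'
```

Run it with DRY_RUN=0 once the printed commands look right. Keeping the seed in a single variable makes it easy to switch voices for the whole batch.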


5. Precautions

Running pip install chattts-fork requires a reliable network connection, since nearly 1GB of content has to be downloaded. You can choose a pip source (mirror) that is geographically closer to you.


The project has just been released and is being updated rapidly, so the operations described here may change at any time. Please keep an eye on the 2noise/ChatTTS GitHub project.


The model accepts English commas and periods as punctuation marks. Other punctuation marks are considered invalid. On the command line, you can enclose the text in apostrophes (single quotes). Text that contains other characters triggers a warning such as:

WARNING:ChatTTS.core:Invalid characters found! : {':', ':', '\n', '!'}
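
One way to avoid this warning is to clean the text before passing it to chattts. The function below is a rough sketch under stated assumptions, not ChatTTS's own preprocessing: sanitize_text is a hypothetical helper, the choice to map ! and ? to periods and : and ; to commas is ours, and the final filter keeps only ASCII letters, digits, spaces, commas, and periods. That means it also strips apostrophes and all Chinese characters, so it is suitable for English input only.

```shell
# Sketch: reduce English text to letters, digits, spaces, commas and periods.
# sanitize_text is an illustrative helper; the punctuation mapping is an assumption.
sanitize_text() {
  printf '%s' "$1" \
    | tr '\n' ' ' \
    | tr '!?' '..' \
    | tr ':;' ',,' \
    | LC_ALL=C tr -cd 'A-Za-z0-9 ,.'
}

sanitize_text 'Hello: world!'; echo   # prints: Hello, world.
```

The cleaned text can then be passed on, for example with chattts "$(sanitize_text 'Hello: world!')".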

6. Testing on the LattePanda Sigma

We have deployed ChatTTS on the LattePanda Sigma and tested its performance.

Figure: Deploying ChatTTS on the LattePanda Sigma


Figure: ChatTTS WAV generation result on the LattePanda Sigma


When running on the LattePanda Sigma, a 22-second voice clip takes about 39 seconds to generate, so the ratio of audio length to processing time is nearly 1:2. This is comparable to the efficiency on devices equipped with an RTX 4090 graphics card.
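
The ratio can be expressed as a real-time factor (processing time divided by audio duration), which makes it easier to compare runs of different lengths. The helper below is a small sketch; the function name rtf is illustrative, and the two numbers are the ones reported above.

```shell
# Real-time factor (RTF) = processing time / audio duration.
# Values above 1 mean synthesis is slower than playback.
rtf() {
  # $1 = processing time in seconds, $2 = audio duration in seconds
  awk -v p="$1" -v a="$2" 'BEGIN { printf "%.2f\n", p / a }'
}

rtf 39 22   # prints 1.77
```

An RTF of about 1.77 means each second of generated speech takes roughly 1.8 seconds to compute on this board.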


Recently, ChatTTS launched its own website (see reference 3). If you prefer not to set things up yourself, you can also try it through the web interface.


7. Reference

1. ChatTTS GitHub - 2noise

2. ChatTTS GitHub - yihong0618

3. ChatTTS official website