VoiceGPT

ABOUT

VoiceGPT is a LAM (Large Audio Model): a set of networks and libraries, made for Unity, capable of lifelike voice generation from text using AI and deep learning.


This asset only works in the editor and runs offline on local hardware.


LINKS


Documentation | Forum | Website


Please check out the forum page for the latest developments and discussion related to this asset. We are researching and adding more functionality continuously. Your support is appreciated.


FEATURES


👥 Ultra Fast Voice Cloning: Clone any voice from just a 3-6 second voice clip. Supported in both local and server-based models.


🗣 Text to Voice Converter: Simply enter the text to be voiced and click Generate. Generate speech in any voice of your choice, plus 60 more options.


👅 Language and Accent Support: The VoiceGPT_0.1.6 model supports only English.


🔊 Voice Modulation controls: The offline version can control emotional values, diffusion parameters, and how closely the output matches the given voice. By manipulating these parameters, users can customize the generated speech to better suit their needs and preferences.


〰️ Preview waveform: Play sound clips right inside the editor without entering play mode. Scrub the playhead to play any part of the clip. Timestamps and a simple graphic of the waveform are shown inside the editor for better clarity.


✂️ Trim audio: A user-friendly GUI in the Editor to trim the ends of an audio clip when part of the clip is empty or not required.


Combine clips: Multiple audio clips can be combined into one using an intuitive, user-friendly feature in the editor. Simply select clips, rearrange their order, and merge them into one.


⚙️ Equalize tracks: Mastering audio clips involves equalization, which can easily be done within the editor itself. Simply select the clip and adjust the gain, pitch and frequency band sliders. A 6-band equalizer is offered in the editor.


📄 Editor Script: The Editor Script displays all the options neatly in one panel. The editor has an in-built preview audio player. Simple design for trimming, combining and equalizing or mastering audio tracks.
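
To make the trimming feature above more concrete, here is a minimal sketch of what removing empty audio from the ends of a clip can look like. It is an illustration only, not VoiceGPT's implementation: the function name `trim_silence`, the `threshold` default, and the assumption that a clip is a flat list of float samples in the range -1.0 to 1.0 are all hypothetical.

```python
def trim_silence(samples, threshold=0.01):
    """Return samples with leading and trailing near-silent values removed.

    Assumes (hypothetically) that samples is a list of floats in [-1.0, 1.0]
    and that any sample whose magnitude is below threshold counts as empty.
    """
    start = 0
    end = len(samples)
    # Move the start marker forward past the quiet samples at the beginning.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Move the end marker backward past the quiet samples at the end.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    # Quiet samples in the middle of the clip are left untouched.
    return samples[start:end]
```

Only the ends are trimmed, matching the feature description; a pause in the middle of a sentence is preserved.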


EDITOR

Keeping it all in the editor: Keeping all assets in one workspace inside the Editor and switching between fewer services can have several benefits, such as:


- Improved Efficiency: When all assets are located in one workspace, it becomes easier to access and manage them. Users do not have to spend time switching between different services or applications, which can be time-consuming and lead to a loss of productivity.


- Streamlined Workflow: Having all assets in one workspace can help create a more streamlined workflow. This is because users can easily move between different assets, such as code files, images, and documents, without having to navigate between different services. This can help to speed up the development process and make it more efficient.


- Reduced Complexity: Using fewer services can help to reduce the complexity of the development process.

In the pack, you will find a demo scene and an editor window that give you access to the TTS models. Other useful audio tools, such as trimming, combining and mastering audio tracks, can also be accessed through the VoiceGPT Editor Window.


DEPENDENCIES

This tool requires the Editor Coroutines and Python Scripting v7.0.1+ packages from the Package Manager and an active internet connection. Please note that Python Scripting v7.0.1+ is deprecated for Unity 6, but can still be downloaded and used.


LIMITATIONS

Since this tool is still under development, there are a few limitations:


- Process up to 500 characters at a time. This limit will increase as we scale up.

- There are 60+ voices to choose from. With Voice Cloning, you can add however many you'd like.

- Audio generation time is ~5 seconds per clip. This may increase with an increased number of tokens and user base.

Offline Version:

- Generations take ~10-20 seconds depending on the length of the audio clip and the parameters provided.

- Offline version is only trained on the English language.

- Process up to 750 characters at a single time.
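
Longer scripts have to be split before generation because of the per-request character limit. The sketch below shows one possible way to do that, breaking at sentence ends where it can. It is not part of VoiceGPT: the helper name `split_for_tts` and the constant `MAX_CHARS` are hypothetical, and only the 750-character figure comes from the limit listed above.

```python
import re

MAX_CHARS = 750  # offline per-request limit quoted above; the name is hypothetical


def split_for_tts(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at a word boundary.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space found: fall back to a hard cut
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        if not sentence:
            continue
        # Pack as many whole sentences into one chunk as the limit allows.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be generated as its own clip, and the resulting clips merged with the Combine clips feature.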


VoiceGPT is now exclusively offline. Enjoy an unlimited and unbound experience of generating voices from text on your local hardware.