VoiceGPT

Asset is now available in offline mode!


In online mode there is no sign-up, no API keys, no recurring payments, no subscription fees, and no additional costs: just one-click, easy-to-use inference on our voice model.


It can also be used to extend the character count of DeepVoice up to 560,000 characters per month.


EXAMPLES


- VOICES


In the code of destiny, debug doubts and execute the program of unwavering determination.

▶️ PLAY!


Beyond the screen, discover the uncharted lands of perseverance and claim the trophy of resilience.

▶️ PLAY!


Press forward, no matter the level. The adventure of a lifetime awaits in the next frame.

▶️ PLAY!



- ACCENTS


Now, retired, I sit in my small dacha, sipping hot tea, memories of comrades and distant battles warming my heart. My babushka's borscht, a taste of home, brings comfort in the quiet days. Life was tough.

▶️ PLAY!


Ja! It's all so different now..... Used to bike through tulip fields, and now, dodging zombies! I hate this!

▶️ PLAY!


- NON WORD SOUNDS

Compilation of non-word sounds by different characters - Slow laugh, Ouch, Uh Uh Ah, Uh huh, Uff, Aha, Nuh uh, Mmm, Oh -

▶️ PLAY!


- LANGUAGES

千里之行,始于足下 ▶️ PLAY!

Die beste Zeit für einen Neuanfang ist jetzt. ▶️ PLAY!

सपने वो नहीं जो हम सोते वक्त देखते हैं, सपने वो हैं जो हमें सोने नहीं देते। ▶️ PLAY!

Liberté, égalité, fraternité. ▶️ PLAY!

삶이 있는 한 희망은 있다 ▶️ PLAY!

Onde há vontade, há um caminho. ▶️ PLAY!

La vita è breve, l'arte è lunga. ▶️ PLAY!

A los Tontos No les Dura el Dinero ▶️ PLAY!

Doe normaal, dan doe je al gek genoeg ▶️ PLAY!

Az élet szép ▶️ PLAY!

Güzel şeylere inan ▶️ PLAY!

Fortuna kołem się toczy. ▶️ PLAY!

العقل زينة. ▶️ PLAY!

Co tě nezabije, to tě posílí. ▶️ PLAY!

Береги платье снову, а честь смолоду. ▶️ PLAY!


Note: All languages are available in all the 60+ voices.

You can find more examples in the documentation.


ABOUT

VoiceGPT is an LAM (Large Audio Model): a set of networks and libraries, made for Unity, capable of life-like voice generation from text using AI and deep learning. It works in real time in both Edit Mode and Play Mode inside the Unity Editor, or on any mobile device. This asset has a one-click, beginner-friendly GUI and does not require any coding to use.


QUOTA

VoiceGPT includes 500,000 characters per month of voice-over and narration takes. 500,000 characters translates to roughly 150 pages of 12-point text in Calibri. This quota is issued on the 1st of every month. Process up to 8x more characters.
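
As a rough budgeting aid, the figures above reduce to simple arithmetic. The Python sketch below is illustrative only and is not part of the asset: the function names are hypothetical, and the 500-character request size is taken from the limit listed under LIMITATIONS.

```python
# Hypothetical helpers for budgeting the monthly quota; not part of VoiceGPT.
MONTHLY_QUOTA_CHARS = 500_000   # monthly quota stated above
PAGES_EQUIVALENT = 150          # "150 pages of 12-point text in Calibri"
MAX_CHARS_PER_REQUEST = 500     # per-request limit listed under LIMITATIONS


def chars_per_page() -> float:
    """Average characters per page implied by the quota description."""
    return MONTHLY_QUOTA_CHARS / PAGES_EQUIVALENT


def full_requests_per_month() -> int:
    """Number of maximum-length requests that fit in one month's quota."""
    return MONTHLY_QUOTA_CHARS // MAX_CHARS_PER_REQUEST


def remaining_quota(used_chars: int) -> int:
    """Characters left this month, never below zero."""
    return max(MONTHLY_QUOTA_CHARS - used_chars, 0)


if __name__ == "__main__":
    print(round(chars_per_page()))      # about 3333 characters per page
    print(full_requests_per_month())    # 1000 maximum-length requests
    print(remaining_quota(120_000))     # 380000 characters left
```

Under these assumptions, a month's quota covers about 1,000 maximum-length generations.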


LINKS


Documentation | Forum | Website


Please note: The voices you hear in this description and the videos (Trailer and Getting Started) are AI generated.


Please check out the forum page for the latest developments and discussion related to this asset. We are researching and adding more functionality continuously. Your support is appreciated.


FEATURES


👥 Ultra Fast Voice Cloning: Clone any voice from a voice clip of just 3-6 seconds. Supported in both local and server-based models.


🗣 Text to Voice Converter: Simply enter the text to be voiced and click Generate. Get game-ready voices using any voice of your choice, plus 60 more options.


👅 Language and Accent Support: The VoiceGPT_X model supports English, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Dutch, Hungarian, Turkish, Polish, Arabic, Czech, and Russian. For now, the offline version supports only English.


🔊 Voice Modulation controls: The offline version can control emotional values, diffusion parameters, and how closely the output matches the given voice. By adjusting these parameters, users can customize the generated speech to better suit their needs and preferences.


〰️ Preview waveform: Play sound clips right inside the editor without entering Play Mode. Scrub the playhead to play any part of the clip. Timestamps and a simple graphic of the waveform are shown in the editor for clarity.


✂️ Trim audio: A user-friendly GUI in the Editor for trimming the ends of an audio clip when part of the clip is not required or is empty.


Combine clips: Multiple audio clips can be combined into one using an intuitive, user-friendly feature in the editor. Simply select the clips, rearrange their order, and merge them into one.


⚙️ Equalize tracks: Mastering audio clips involves equalization, which can be done within the editor itself. Simply select the clip, then adjust the gain, pitch, and frequency band sliders. A 6-band equalizer is offered in the editor.


📄 Editor Script: The Editor Script displays all the options neatly in one panel. The editor has a built-in preview audio player and a simple design for trimming, combining, and equalizing or mastering audio tracks.
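
For readers curious about what trimming and combining amount to conceptually, here is a minimal Python sketch. It is not the asset's implementation or API: clips are modeled as plain lists of float samples, and the function names and the silence threshold are assumptions chosen for illustration.

```python
# Illustrative only: trims quiet samples from both ends of a clip and joins clips.
# Clips are plain lists of float samples in [-1.0, 1.0]; the names and the
# threshold value are assumptions, not VoiceGPT's API.
from typing import List, Sequence


def trim_silence(samples: Sequence[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples whose magnitude is below threshold."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return list(samples[start:end])


def combine_clips(clips: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate clips in the given order (all assumed to share one sample rate)."""
    merged: List[float] = []
    for clip in clips:
        merged.extend(clip)
    return merged


if __name__ == "__main__":
    intro = trim_silence([0.0, 0.0, 0.5, -0.3, 0.0])
    outro = trim_silence([0.0, 0.2, 0.0])
    print(combine_clips([intro, outro]))  # [0.5, -0.3, 0.2]
```

In the asset itself these operations are done through the editor GUI, so no code is needed; the sketch only shows the underlying idea.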


EDITOR

Keeping it all in the editor: Keeping all assets in one workspace inside the Editor, with fewer services to switch between, can have several benefits, such as:


- Improved Efficiency: When all assets are located in one workspace, it becomes easier to access and manage them. Users do not have to spend time switching between different services or applications, which can be time-consuming and lead to a loss of productivity.


- Streamlined Workflow: Having all assets in one workspace can help create a more streamlined workflow. This is because users can easily move between different assets, such as code files, images, and documents, without having to navigate between different services. This can help to speed up the development process and make it more efficient.


- Reduced Complexity: Using fewer services can help to reduce the complexity of the development process.

In the pack, you will find a demo scene and an editor window that help you access the TTS models. Other useful audio tools, such as trimming, combining, and mastering the audio track, can be accessed through the VoiceGPT Editor Window.


DEPENDENCIES

This tool requires the Editor Coroutines and Python Scripting (v7.0.1+) packages from the Package Manager, and an active internet connection.


LIMITATIONS

Since this tool is still under development, there are a few limitations:


- Process up to 500 characters at a time. This limit will increase as we scale up.

- There are 60+ voices to choose from. With Voice Cloning, you can add however many you'd like.

- Audio generation time is ~5 seconds per clip. This may increase with an increased number of tokens and user base.

Offline Version:

- Generations take ~10-20 seconds depending on the length of the audio clip and the parameters provided. 

- Offline version is only trained on the English language.  

- Process up to 500 characters at a single time.
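
Longer scripts therefore need to be split before generation. One possible approach, sketched below in Python, is to break the text at sentence boundaries and pack sentences into chunks of at most 500 characters. The function name and the splitting rule are assumptions for illustration; this is not part of the asset.

```python
# Hypothetical pre-processing helper; not part of VoiceGPT.
import re
from typing import List

MAX_CHARS = 500  # per-request limit stated in the list above


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself too long.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    script = "Press forward, no matter the level. " * 40
    for chunk in split_for_tts(script):
        print(len(chunk), chunk[:40])
```

Each chunk can then be generated separately and the resulting clips merged with the Combine clips feature.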



Goes best with:

Now We're Talking! by Chatterwave - A real-time auto multilingual mouth animation asset on the Asset Store.