DeepLips: Lip Sync AI

Does your character not have jaw bones, blendshapes, or visemes?

DeepLips can still make it talk.


DeepLips is an AI lip sync tool that works with almost any character, even ones that were never built for facial animation.


Just load your character and an audio clip, then click Generate. DeepLips creates the lip sync automatically.


No facial rigs.


No phoneme setup.


No hand animating mouths.


DeepLips is a texture-based lip sync AI: it works by playing a lip sync "video" directly on the face texture.


⏱ Skip Hours of Setup

Traditional lip sync workflows require:

  • Creating phoneme blendshapes
  • Mapping visemes
  • Animating mouth poses
  • Adjusting timing

... DeepLips skips all of that.


You simply:

  1. Load your character
  2. Select the face area
  3. Add an audio clip
  4. Click Generate

That’s it.


🎭 Works With Almost Any Character


You can use it with:

  • Stylized characters
  • Realistic characters
  • NPC crowds
  • Characters without facial rigs
  • Characters from Mixamo, Fuse, and CC3/4
  • Custom avatars

Basically, if the character has a texture, DeepLips can drive the speech animation.


FEATURES


🖼 Frame Speech Animation: DeepLips creates a frame sequence that can be played back in sync with audio. This gives you a simple, visual result that is easy to preview, easy to tweak, and easy to integrate into your scene.


🎨 LipMask Shader Support: Includes support for a LipMask shader workflow with controls for X Offset, Y Offset, X Size, and Y Size. This makes it easy to position and fit the mouth area onto your character.
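As a rough sketch of how those controls could be driven from script (the shader property names below are assumptions for illustration — check the LipMask shader included with DeepLips for the actual ones):

```csharp
using UnityEngine;

// Hypothetical helper that positions and scales the lip-sync mask on a
// face material. Property names like "_LipMaskOffsetX" are illustrative,
// not the asset's confirmed API.
public class LipMaskFitter : MonoBehaviour
{
    public Material faceMaterial;
    [Range(0f, 1f)] public float xOffset = 0.35f;
    [Range(0f, 1f)] public float yOffset = 0.10f;
    [Range(0f, 1f)] public float xSize = 0.30f;
    [Range(0f, 1f)] public float ySize = 0.20f;

    // Runs in the editor whenever a value changes in the Inspector,
    // so the mask can be fitted visually.
    void OnValidate()
    {
        if (faceMaterial == null) return;
        faceMaterial.SetFloat("_LipMaskOffsetX", xOffset);
        faceMaterial.SetFloat("_LipMaskOffsetY", yOffset);
        faceMaterial.SetFloat("_LipMaskSizeX", xSize);
        faceMaterial.SetFloat("_LipMaskSizeY", ySize);
    }
}
```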


🦴 No Facial Rig Required: Most lip sync systems require jaw bones, viseme rigs, or blendshapes. DeepLips does not. If your character has a face texture and a mesh, you can make it speak.


🧠 AI Runs Locally: Lip sync frames are generated locally inside Unity. No cloud processing required for the core workflow.


⚡ Very Easy Runtime Setup: After generating frames, just attach the LipSyncFrameSequence component and assign the audio clip and frame data generated by DeepLips.


🎯 More Natural Than Basic Viseme Systems: Many viseme systems rely on simple mouth-shape switching, which can look robotic, inaccurate, or slightly off. DeepLips generates mouth animation directly from the audio and plays it back as a lip sync video on the face.


API:

Is as easy as:

----------------------------------

Play(audioClip, "Frames Folder");

----------------------------------
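In context, a minimal call from a MonoBehaviour might look like this (the frames folder path and field names are illustrative assumptions, not the asset's confirmed API):

```csharp
using UnityEngine;

// Illustrative only: plays a pre-generated DeepLips frame sequence
// in sync with a dialogue clip when the scene starts.
public class SpeakOnStart : MonoBehaviour
{
    public AudioClip voiceLine;              // the recorded dialogue
    LipSyncFrameSequence lipSync;            // component on the character

    void Start()
    {
        lipSync = GetComponent<LipSyncFrameSequence>();
        // Point Play at the folder of frames DeepLips generated
        // for this clip; path shown here is a placeholder.
        lipSync.Play(voiceLine, "DeepLips/Frames/VoiceLine01");
    }
}
```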

Dependencies:

Requires Unity's Inference Engine (Unity 6 or later).


Limitations

- DeepLips uses image-based animation rather than mesh deformation. Because the mesh itself does not move, the face can sometimes feel slightly stiffer than a full facial rig.


- The illusion may break under harsh lighting as the mesh does not deform.


- With long sequences, storing the generated frames can consume significant disk space.


- Real-time generation and playback: On a fast GPU, generating the entire frame sequence takes a few seconds; on CPU it may take a minute or two. Playback itself is smooth and real-time. For instant playback, pre-generate the frames and load them from cache.


- May break or generate incorrectly if the face deviates too far from a human face. The AI model handles cartoon faces, but extreme stylization can produce incorrect results.


These tradeoffs allow DeepLips to work on characters that were never built for facial animation and greatly reduce setup time.