We’re excited to launch the Fabric 1.0 API, the world’s first talking video model.
Image-to-video AI has been advancing quickly, but Fabric 1.0 brings something new: faster generation, lower compute costs, and support for longer clips of up to one minute. In fact, our model is:
- 60x cheaper than the average model
- 7x faster
Instead of being limited to stock avatars, Fabric can animate any image and match it with speech, opening up far more creative possibilities. And now, it's available for developers and teams who want to generate AI videos at scale.
What is Fabric 1.0?
Fabric 1.0 is an image + audio to video model. You provide a picture and a voice recording, and the model produces a talking video. The audio drives not only lip movement but also head, body, and hand gestures, making the output natural and expressive.
- Make any static character talk: Upload a photo of your product, mascot, character or spokesperson and watch it speak. This opens up endless possibilities for branded content and unique personalities.
- Built for enterprise: Whether you’re launching a campaign, recording tutorials, or iterating on social ads, Fabric produces ready‑to‑share videos faster than human‑led production and without the need to film yourself.
- Infinite visual styles: Instead of being stuck with a few preset avatars, you can animate any illustration, clay figure or anime‑style drawing. The model interprets the artwork, matches lip movements to the audio and preserves the original look and feel.
Why Fabric is different
Many “talking head” generators only let you pick from a few built-in avatars. Fabric is designed to work with any visual input: photos, sketches, mascots, and even stylized artwork. It preserves the look and feel of the original image while bringing it to life. Beyond that, the model is:
- Affordable: Fabric is 60x cheaper than comparable tools, starting at just 8¢ per second.
- Fast: videos generate 7x faster than with typical alternatives.
- High quality: accurate lip sync and expressive movements support professional-grade storytelling.
Technical specs
Here are some finer details of the model:
- Inputs: Image (jpg/jpeg/png) + audio (mp3/wav/m4a/aac) under 10 MB
- Max length: 1 minute
- FPS: 25
- Aspect ratios: Standard formats including 16:9, 1:1, 9:16, and more
- Resolution: 480p and 720p (scaled appropriately for non-16:9 formats)
- Generation speed: ~1.5 minutes per 10s at 480p; ~5 minutes per 10s at 720p
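The limits above can be checked client-side before you upload anything. The helper below is an illustrative sketch based on the published specs; the function and constant names are ours, not part of the API:

```python
import os

# Accepted formats and limits, per the spec list above.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".aac"}
MAX_FILE_BYTES = 10 * 1024 * 1024  # 10 MB per input file
MAX_DURATION_S = 60                # clips are capped at 1 minute

# Published rates: ~1.5 min per 10 s at 480p, ~5 min per 10 s at 720p.
MINUTES_PER_10S = {"480p": 1.5, "720p": 5.0}

def validate_input(path: str, size_bytes: int, is_image: bool) -> None:
    """Raise ValueError if a file fails the format or size checks."""
    ext = os.path.splitext(path)[1].lower()
    allowed = IMAGE_EXTS if is_image else AUDIO_EXTS
    if ext not in allowed:
        raise ValueError(f"unsupported format: {ext}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 10 MB limit")

def estimate_generation_minutes(duration_s: float, resolution: str) -> float:
    """Rough wall-clock estimate from the published per-10-second rates."""
    if duration_s > MAX_DURATION_S:
        raise ValueError("clips are capped at 1 minute")
    return duration_s / 10 * MINUTES_PER_10S[resolution]
```

For example, a full one-minute clip at 720p works out to roughly 30 minutes of generation time at the rates quoted above.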
The engine is powered by a Diffusion Transformer (DiT) trained on diverse datasets of talking people, which allows Fabric to deliver accurate lip sync and expressive motion across many different types of characters.
How the model works
- Upload an image: Fabric accepts photos, illustrations, 3D renders, and more.
- Add audio or text: Upload a recording or type a script. If you type, VEED’s AI voice generator will narrate it.
- Generate your video: Fabric syncs the voice to your image and animates it with natural movements.
- Share anywhere: Export in vertical, square, or landscape formats for TikTok, Instagram, YouTube, ads, or presentations.
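For developers, the steps above translate into a single API call. The sketch below assembles a request body and shows what submitting it through fal.ai's Python client might look like; the field names and the model id are placeholders of ours, so check the model page on fal.ai for the exact schema:

```python
def build_fabric_request(image_url: str, audio_url: str,
                         resolution: str = "480p") -> dict:
    """Assemble a talking-video request body (illustrative field names)."""
    if resolution not in {"480p", "720p"}:
        raise ValueError("resolution must be '480p' or '720p'")
    return {
        "image_url": image_url,
        "audio_url": audio_url,
        "resolution": resolution,
    }

# Submitting with fal.ai's Python client could then look like this
# (the model id below is a placeholder):
#
#   import fal_client
#   result = fal_client.subscribe(
#       "veed/fabric-1.0",
#       arguments=build_fabric_request(
#           "https://example.com/mascot.png",
#           "https://example.com/voiceover.mp3",
#       ),
#   )
```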
Pricing
Fabric uses VEED’s existing credit system, with costs based on resolution and video duration.
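As a back-of-envelope example using the starting price quoted earlier (8¢ per generated second), a clip's cost can be estimated like this. The function is a sketch of ours, and actual credit rates vary by resolution:

```python
# Starting price quoted above: 8 cents per generated second.
BASE_CENTS_PER_SECOND = 8

def estimate_cost_usd(duration_s: float,
                      cents_per_second: float = BASE_CENTS_PER_SECOND) -> float:
    """Back-of-envelope cost in USD; actual credit rates vary by resolution."""
    return round(duration_s * cents_per_second / 100, 2)
```

At the starting rate, a 30-second clip comes to $2.40, and a full one-minute clip to $4.80.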
Get Started
Fabric 1.0 API is available on fal.ai.
This release is just the beginning. We are already working on longer video support, higher resolutions, and even richer animation. It's game-changing for creators, marketers, and business owners!