Kling O1 is Kuaishou's unified multimodal video model, released December 1, 2025. It's the world’s first AI model combining text prompts, reference images, and video inputs to edit existing footage. Swap, add, or remove elements without frame-by-frame VFX work. The model is built on a Multimodal Visual Language (MVL) framework that interprets all inputs together for consistent edits across frames.

How do I get Kling O1 access?

You can access Kling O1 on VEED’s [AI playground](https://www.veed.io/ai-playground). With a subscription and AI credits, you can upload videos, describe your edits, and refine the results in VEED's built-in editor. For full capabilities, including text-to-video and start/end frame control, access Kling O1 directly at klingai.com.

What can I create with Kling O1?

Kling O1 supports various video editing tasks through multimodal prompts:
- Swap outfits or characters while preserving original motion
- Add products into footage for marketing and e-commerce content
- Build scenes by combining multiple reference images with consistent elements
- Remove distracting objects, passersby, or unwanted elements from shots

What are Kling O1's technical specifications?

Kling O1 specs on VEED's AI playground:
- Input video length: 3-10 seconds
- Minimum resolution: 720p
- Output length: 10 seconds
- Reference images: Up to 4 images
- Editing modes: Swap items, Add items, Remove items
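The limits above can be checked before you upload. The snippet below is a minimal sketch, not part of VEED's or Kuaishou's tooling: `EditRequest`, `check_request`, and the mode labels are hypothetical names, and it assumes "720p" means a frame height of at least 720 pixels.

```python
from dataclasses import dataclass

# Limits as listed for VEED's AI playground in the specs above.
MIN_SECONDS = 3
MAX_SECONDS = 10
MIN_HEIGHT_PX = 720          # assumption: "720p" read as frame height >= 720 px
MAX_REFERENCE_IMAGES = 4
# Hypothetical short labels for "Swap items", "Add items", "Remove items".
EDITING_MODES = {"swap", "add", "remove"}


@dataclass
class EditRequest:
    """Hypothetical container for one edit; not a VEED or Kling type."""
    duration_seconds: float
    height_px: int
    reference_image_count: int
    mode: str


def check_request(req: EditRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the listed limits."""
    problems = []
    if not MIN_SECONDS <= req.duration_seconds <= MAX_SECONDS:
        problems.append(
            f"video must be {MIN_SECONDS}-{MAX_SECONDS} seconds, got {req.duration_seconds}"
        )
    if req.height_px < MIN_HEIGHT_PX:
        problems.append(f"resolution must be at least {MIN_HEIGHT_PX}p, got {req.height_px}p")
    if not 0 <= req.reference_image_count <= MAX_REFERENCE_IMAGES:
        problems.append(
            f"use at most {MAX_REFERENCE_IMAGES} reference images, got {req.reference_image_count}"
        )
    if req.mode not in EDITING_MODES:
        problems.append(f"unknown editing mode: {req.mode!r}")
    return problems


if __name__ == "__main__":
    # A 12-second 480p clip with 5 reference images breaks three limits.
    for problem in check_request(EditRequest(12, 480, 5, "swap")):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets you see every issue with a clip in a single pass.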

How does Kling O1 pricing work?

On VEED, Kling O1 currently costs 60 credits per second. You'll see the exact credit cost when you select the model. A VEED subscription with AI credits gives you access to Kling O1 alongside other AI video models, plus more AI tools for post-generation work. On Kuaishou's platform (klingai.com), Video O1 is available on paid plans. Check their website for current pricing and offers.
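As a rough worked example of the VEED rate, here is a small sketch. It assumes credits are charged per second of generated output and that the output is the 10-second clip listed in the specs; neither assumption is confirmed here, and `estimate_credits` is a hypothetical helper, so treat the figure shown when you select the model as the real cost.

```python
CREDITS_PER_SECOND = 60   # rate quoted for Kling O1 on VEED; may change
OUTPUT_SECONDS = 10       # output length listed for VEED's AI playground


def estimate_credits(seconds: int = OUTPUT_SECONDS, rate: int = CREDITS_PER_SECOND) -> int:
    """Estimate credits for one generation, assuming billing per output second."""
    if seconds <= 0 or rate < 0:
        raise ValueError("seconds must be positive and rate must not be negative")
    return seconds * rate


if __name__ == "__main__":
    print(estimate_credits())  # prints 600
```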

How does Kling O1 compare to other AI video models?

The key differentiator: Kling O1 is the first model to unify text, images, and video inputs for editing in a single workflow. In Kuaishou's internal benchmarks, Kling O1 outperformed Veo 3.1 on image reference tasks and Runway Aleph on video transformation tasks. These are Kuaishou's self-reported results, so we recommend testing the model yourself. Other models like Veo 3.1 and [Seedance](https://www.veed.io/ai-models/video/seedance-1.0) offer multimodal and editing capabilities, but typically through separate modes rather than one unified interface.

Kling O1

The world’s first unified multimodal video model

Kling O1: Combine text, images, and video inputs to edit your footage

Kling O1 is the first AI model that edits existing video using text, images, and video inputs. No more reshoots because the client wants a different product color. No more scrapping a great take because of a distracting background object. And no more shooting three times for three outfit variations.

With Kling O1, you simply upload your footage, describe the change, add a reference image, and get an edited video back. Swap a character's outfit. Add a product to a scene. Remove background distractions. VEED is one of the first platforms to offer Kling O1, available now in our AI playground alongside models like Veo 3.1 and Sora 2.

How to use Kling O1:

Step 1

Upload your video and select an editing mode

Select Kling O1 from the video-to-video options. Upload your source video (3-10 seconds, minimum 720p). Choose your editing mode: Swap items, Add items, or Remove items.

Step 2

Describe your edit with prompts and images

Write a text prompt describing what you want to change: "swap the blue jacket for a red hoodie" or "add a coffee cup to the table." Upload up to 4 reference images to guide the edit, using @Image tags to reference them in your prompt.
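Before generating, it can help to confirm that every image tag in your prompt points at an image you actually uploaded. The sketch below is illustrative only: it assumes tags are written as `@Image1`, `@Image2`, and so on, which is a guess at the exact format, and `referenced_images` and `check_prompt` are hypothetical helpers rather than part of any VEED or Kling API.

```python
import re

MAX_REFERENCE_IMAGES = 4                    # limit listed for VEED's AI playground
TAG_PATTERN = re.compile(r"@Image(\d+)")    # assumed tag format, e.g. "@Image1"


def referenced_images(prompt: str) -> set[int]:
    """Return the image numbers mentioned in the prompt."""
    return {int(number) for number in TAG_PATTERN.findall(prompt)}


def check_prompt(prompt: str, uploaded_count: int) -> list[str]:
    """Return a list of problems with the prompt's image tags; empty means none found."""
    problems = []
    if uploaded_count > MAX_REFERENCE_IMAGES:
        problems.append(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    used = referenced_images(prompt)
    # Tags that point at an image slot that was never filled.
    missing = sorted(n for n in used if n < 1 or n > uploaded_count)
    if missing:
        problems.append(f"prompt references images that were not uploaded: {missing}")
    # Uploaded images that the prompt never mentions.
    unused = sorted(set(range(1, uploaded_count + 1)) - used)
    if unused:
        problems.append(f"uploaded images never mentioned in the prompt: {unused}")
    return problems


if __name__ == "__main__":
    prompt = "Swap the blue jacket for the red hoodie in @Image1"
    print(check_prompt(prompt, uploaded_count=1))  # prints []
```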

Step 3

Generate and export your edited video

Kling O1 processes your inputs and generates a 10-second edited video. Download it directly or bring it into our video editor for additional refinements like trimming, adding subtitles, or combining with other clips.

Kling O1 features

Multimodal prompt inputs

Kling O1 is powered by Kuaishou's Multimodal Visual Language (MVL) framework, a unified Transformer architecture that merges text semantics with visual signals. Combine existing video, reference images, and text descriptions in a single generation. The model interprets them together, understanding how edits should behave across frames. Reference a video for its camera movement, and apply that motion to a different scene.

Fix your footage without reshooting

You have footage that's almost right, except for the wrong outfit, a distracting background object, a missing prop, or outdated packaging. Kling O1 lets you fix it. Swap in the correct product. Remove the passersby. Add the prop that should've been there. The model tracks elements across frames with what Kuaishou calls "director-like memory," retaining the identity of characters, props, and settings through dynamic camera movements.

Build scenes with multiple reference images

Go beyond fixing. Construct entire scenes by combining separate reference images. Upload a character, an environment, and props as individual inputs, then prompt Kling O1 to bring them together. The model maintains consistency across each element, letting you build compositions that would otherwise require coordinated shoots or complex compositing.

Loved by creators.

Loved by the Fortune 500

“VEED has been game-changing. It's allowed us to create gorgeous content for social promotion and ad units with ease.”

Max Alter
Director of Audience Development, NBCUniversal

“I love using VEED. The subtitles are the most accurate I've seen on the market. It's helped take my content to the next level.”

Laura Haleydt
Brand Marketing Manager, Carlsberg Importers

“I used Loom to record, Rev for captions, Google for storing and Youtube to get a share link. I can now do this all in one spot with VEED.”

Cedric Gustavo Ravache
Enterprise Account Executive, Cloud Software Group

“VEED is my one-stop video editing shop! It's cut my editing time by around 60%, freeing me to focus on my online career coaching business.”

Nadeem L
Entrepreneur and Owner, TheCareerCEO.com

Kling O1: Everything You Need to Know About the New AI Video Model

Learn how Kling O1's multi-modal reasoning delivers precise AI videos with cinematic camera control. Now available on VEED's AI Playground!

Introducing Your AI Video Playground

If you’ve spent any time on LinkedIn or your favourite creator forums lately, you’ve probably seen the buzz: Minimax, PixVerse, Google Veo, and a growing list of next-gen generative AI models that are redefining how video gets made. But with new models launching every week, it’s hard to know what’s worth your time—or how they actually fit into your workflow.

The Video Prompts Formula That Changes Everything: Stop Getting Terrible AI Videos

Writing a simple video prompt to generate a full video with AI has opened up creative possibilities for content creators. But getting the results you want can feel like solving a puzzle. As researcher Hachik Yazadzhiyan points out, creators don’t just want speed. They want AI that helps them tell better stories.

When it comes to amazing videos, all you need is VEED

Try Kling O1

No credit card required

Beyond Kling O1

Test Kling O1's multimodal video editing and try other video-to-video models for various projects. VEED is an AI video creation platform that lets you generate and fine-tune videos all in one place. Combine clips, generate captions, add voiceovers, and export for any platform. Sign up now and start creating with VEED.
