
What models does Air use for AI?

Learn about the industry-leading image and video AI models available in Air Canvas.

Written by Lauren Ford
Updated today

By default, Air automatically selects the best model for your specific request when prompting the Canvas AI agent. If you prefer more control over your generations, you can toggle off the Auto setting and select a specific model from our industry-leading options.


Image models

Air offers a variety of image generation and editing models to suit your creative needs:

  • Nano Banana 2: Google's fast image generation and editing model. Generates vibrant, high-fidelity visuals at speed with strong text rendering, character consistency for up to 5 people, and natural language understanding.

  • Nano Banana Pro: Google's advanced image model. Produces production-quality visuals with industry-leading text rendering, character consistency for up to 5 people, and support for 1K–4K resolution.

  • Flux 2 Pro: Production-optimized text-to-image and image editing model from Black Forest Labs. Delivers studio-grade images with a zero-configuration approach, optimized for consistency and speed.

  • GPT Image 1.5: OpenAI's model that generates high-fidelity images with strong prompt adherence, preserving composition, lighting, and fine-grained detail.

  • GPT Image: OpenAI's natively multimodal model that accepts both text and image inputs to produce high-quality image outputs.

  • Seedream 4.5: ByteDance's new-generation model that integrates image generation and editing into a single unified architecture, supporting stylized transformations and complex edits.

  • Seedream 4: ByteDance's previous-generation unified image generation and editing model, built on the same architecture as 4.5 and supporting stylized transformations and complex edits.

Video models

For dynamic motion and cinematic clips, you can choose from the following video generation models:

  • Seedance Pro 1.5: ByteDance's joint audio-video generation model. Creates 4–12 second videos at up to 1080p from text or image inputs, featuring synchronized dialogue, sound effects, ambient audio, and accurate lip-sync across multiple languages.

  • Seedance 2.0: ByteDance's latest video generation model with reference-to-video capability. Creates 4–15 second videos at up to 720p from text, image, or reference inputs with native audio generation and cinematic quality.

  • Kling 3 Pro: Kuaishou's top-tier video generation model. Creates 3–15 second videos from text or image inputs with cinematic visuals, fluid motion, native audio generation, and multi-shot support.

  • Sora 2 Pro: OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

  • Veo 3.1: Google DeepMind's advanced video generation model. Creates 1080p video with native audio including dialogue, sound effects, and ambient noise, with accurate lip-sync and realistic motion physics.

Next steps

Now that you know which models power Air's AI features, check out our guide on writing effective prompts to get the best possible results from your generations.
