Text to Video API: Generate AI Videos from Text Prompts

Text-to-video APIs let developers generate AI videos from written prompts using a simple REST call. Instead of hiring videographers or licensing stock footage, you can produce custom video content programmatically — perfect for marketing automation, social media tools, and creative applications.

What is a Text to Video API?

A text to video API is a cloud service that takes a natural language description and returns a generated video clip. You send an HTTP request with a prompt like “A drone shot flying over a tropical beach at sunset” and, once generation finishes, receive a link to an MP4 video.

The API handles all the heavy computation on cloud GPUs. Your application just needs to make an HTTP request and handle the response — no ML frameworks, no model weights, no GPU infrastructure required.

How Text-to-Video Generation Works

Modern text-to-video models typically use diffusion transformers. The process starts with noise and progressively refines it into coherent video frames that match your text prompt. Here's the typical API workflow:

1. Send a POST request with your prompt, model, and parameters (duration, aspect ratio).
2. The API queues the generation and returns a generation ID.
3. The model generates video frames on cloud GPUs (30 seconds to 3 minutes).
4. Your webhook receives the completed video URL, or you poll the status endpoint.
5. Download or stream the generated MP4 video.
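
The polling branch of step 4 can be sketched roughly as below. This is a minimal sketch under stated assumptions, not Renderful's documented client: the terminal status values ("completed", "failed") are guesses, and `fetch_status` is a hypothetical callable that performs one request against the status endpoint and returns the parsed JSON. The sleep and clock functions are injectable only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable

# Assumed terminal status values; check the API reference for the real ones.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_generation(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the generation reaches a terminal status or the timeout elapses.

    `fetch_status` is a hypothetical callable that makes one status request
    and returns the response body as a dict with a "status" key.
    """
    deadline = clock() + timeout_s
    while True:
        generation = fetch_status()
        if generation.get("status") in TERMINAL_STATUSES:
            return generation
        if clock() >= deadline:
            raise TimeoutError("generation did not finish before the timeout")
        # Wait before asking again so the status endpoint is not hammered.
        sleep(interval_s)
```

In practice `fetch_status` might wrap a `requests.get` call for the generation ID returned in step 2; the status endpoint's path is not given in this guide, so it is left to the caller. Given the 30 seconds to 3 minutes quoted in step 3, a 5-second interval with a 5-minute timeout is a plausible starting point rather than a recommendation.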

Available Models

Renderful provides access to the leading text-to-video models through a single API, including Sora 2, Kling v2 Master, Runway Gen-3, and WAN 2.1 Turbo.

Quick Start Code Example

Here's how to generate a video from a text prompt using Renderful's API:

POST /api/v1/generations
curl -X POST https://api.renderful.ai/api/v1/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/sora-2",
    "input": {
      "prompt": "A drone shot flying over a tropical beach at golden hour, cinematic",
      "aspect_ratio": "16:9",
      "duration": 5
    },
    "webhook": "https://your-app.com/webhook"
  }'

Python

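import requests

# Submit a generation request. The API queues the job and returns its ID,
# not the finished video.
response = requests.post(
    "https://api.renderful.ai/api/v1/generations",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "kuaishou/kling-v2-master",
        "input": {
            "prompt": "A cat sitting on a windowsill watching rain, cozy atmosphere",
            "aspect_ratio": "16:9",
            "duration": 5,
        },
        "webhook": "https://your-app.com/webhook",
    },
    timeout=30,  # avoid hanging indefinitely on a stalled connection
)
# Raise on 4xx/5xx responses instead of failing later on a missing key.
response.raise_for_status()

generation = response.json()
print(f"Generation ID: {generation['id']}")
print(f"Status: {generation['status']}")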

JavaScript

// Top-level await requires an ES module; otherwise wrap this in an async function.
const response = await fetch("https://api.renderful.ai/api/v1/generations", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "wan-ai/wan2.1-t2v-turbo",
    input: {
      prompt: "A timelapse of a flower blooming in a garden, macro shot",
      aspect_ratio: "9:16",
      duration: 5,
    },
    webhook: "https://your-app.com/webhook",
  }),
});

// fetch does not reject on HTTP errors, so check the status explicitly.
if (!response.ok) {
  throw new Error(`Generation request failed: ${response.status}`);
}

const generation = await response.json();
console.log("Generation ID:", generation.id);
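
Each request above passes a `webhook` URL, so something on your side has to receive the callback. The following is a rough sketch of such a receiver using only the Python standard library. The payload shape is an assumption: the field names `id`, `status`, and `output` are hypothetical and should be checked against the actual webhook documentation, and a production handler would also need to authenticate the sender.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook(body: bytes) -> dict:
    """Extract the fields this sketch cares about from a webhook body.

    The field names ("id", "status", "output") are assumptions, not a
    documented payload schema.
    """
    payload = json.loads(body)
    if not isinstance(payload, dict) or "id" not in payload:
        raise ValueError("webhook payload is missing an 'id' field")
    return {
        "id": payload["id"],
        "status": payload.get("status", "unknown"),
        "video_url": payload.get("output"),
    }


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = parse_webhook(self.rfile.read(length))
        except ValueError:
            # Malformed JSON also lands here: JSONDecodeError subclasses ValueError.
            self.send_response(400)
            self.end_headers()
            return
        print(f"Generation {event['id']}: {event['status']} {event['video_url']}")
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the receiver on port 8000. Keeping the parsing in `parse_webhook`, separate from the HTTP handler, makes it easy to adjust once the real payload fields are known.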

Pricing

Renderful offers pay-as-you-go pricing for all text-to-video models with no monthly minimums:

Model            Price / Generation  Avg. Speed  Max Duration
Sora 2           From $0.20          ~2 min      20s
Kling v2 Master  From $0.14          ~90s        10s
Runway Gen-3     From $0.25          ~60s        10s
WAN 2.1 Turbo    From $0.05          ~45s        5s
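
Because these are "From" prices, they give a lower bound on spend rather than an exact figure. A rough budgeting sketch under that assumption is shown below; the helper name `minimum_batch_cost` is made up for illustration, and actual charges may be higher depending on duration or resolution.

```python
# Starting ("From") prices per generation in US cents, copied from the
# pricing table. Integer cents avoid floating-point rounding surprises.
STARTING_PRICE_CENTS = {
    "Sora 2": 20,
    "Kling v2 Master": 14,
    "Runway Gen-3": 25,
    "WAN 2.1 Turbo": 5,
}


def minimum_batch_cost(model: str, generations: int) -> float:
    """Return the lowest possible cost in USD for a batch of generations.

    This is a lower bound only: it multiplies the starting price by the
    number of generations and ignores any per-parameter surcharges.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return STARTING_PRICE_CENTS[model] * generations / 100
```

For example, `minimum_batch_cost("Kling v2 Master", 100)` comes to 14.0, meaning 100 Kling v2 Master clips cost at least $14.00.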

Use Cases

Text-to-video APIs are being used across industries to automate video content creation:

Marketing & Advertising

Generate ad creatives, social media videos, and product launch teasers at scale. A/B test different visual styles without hiring a production team.

Social Media Content

Create short-form video content for TikTok, Instagram Reels, and YouTube Shorts. Automate content calendars with AI-generated videos.

Product Demos & Explainers

Produce product walkthrough videos and explainer content. Visualize concepts and features before they’re built.

E-commerce & Retail

Generate product showcase videos from text descriptions. Create lifestyle videos showing products in context without physical photography.

Frequently Asked Questions

What is a text to video API?
A text to video API is a REST endpoint that accepts a text prompt and returns a generated video. You send an HTTP request with your prompt and parameters (duration, aspect ratio, resolution), and the API returns a video file or URL. No GPU or ML infrastructure required on your end.
Which AI model is best for text to video generation?
It depends on your use case. Sora produces the most cinematic results, Kling excels at realistic motion, Runway is great for creative effects, and WAN offers fast generation at lower cost. Renderful lets you try all of them through a single API.
How long does AI video generation take?
Generation time varies by model and video length. Most models take 30 seconds to 3 minutes for a 5-second clip. Sora and Kling tend to be on the longer end due to higher quality output. Renderful uses webhooks so your app doesn’t need to wait.
How much does a text to video API cost?
Through Renderful, pricing ranges from $0.05 per generation for basic models to $0.50 for premium models like Sora. All pricing is pay-as-you-go with no monthly minimums. Free credits are included at sign-up.
Can I use generated videos commercially?
Yes. Videos generated through Renderful’s API can be used commercially. Specific licensing terms vary by model, but most models including Kling, Runway, and WAN allow commercial use of generated content.

Start Generating Videos from Text Today

Create your Renderful account, get free credits, and start generating AI videos with Sora, Kling, Runway, WAN, and more through a single API.