I Built a Full Video Pipeline with AI in 4 Minutes
From script to published video using Seedance, TTS, and automated subtitle burning — a step-by-step breakdown of the pipeline that changed how I create content.
The 4-Minute Claim
Let me be precise. The pipeline itself takes 4 minutes to set up from scratch — once. After that, producing each video takes 10-15 minutes of actual human work (writing the script) and 3-5 minutes of automated processing.
What used to require a camera, a ring light, three re-takes, and 45 minutes of editing now takes a script and one terminal window.
Here's the exact pipeline I validated, with every command.
The Stack
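Seedance (via Volcengine's ARK API) generates the video clips, a TTS service produces the voiceover, Python with Pillow burns in the subtitles, and ffmpeg handles normalization and final assembly.
Total cost for a 30-second video: approximately $0.30-0.80 depending on clip count.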
Step 1: Write the Script (the only human step)
The script drives everything. I write it in chunks — each chunk becomes one video clip.
The formula that works:
```
[Hook — 3 seconds]
One sentence that makes someone stop scrolling.
Must be counterintuitive or specific.
[Problem — 5 seconds]
The thing your audience recognizes as real.
[Solution — 15 seconds]
What you actually did. Specifics over generalities.
[Result — 4 seconds]
The outcome. Numbers or visible evidence.
[CTA — 3 seconds]
One action. Not three.
```
Example script (28 seconds, 5 clips):
```
"99% of AI content creators are talking to themselves. Here's why.
They optimize for creation. Not distribution.
I spent this week building a full video pipeline — script to published —
using Seedance, TTS, and automated subtitles.
Zero camera. Zero editing software. One API key.
Want the exact pipeline? Comment 'video' below."
```
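Each chunk becomes a clip, so it helps to sanity-check the script's timing before generating anything. Below is a minimal Python sketch. It assumes a speaking rate of about 2.5 words per second (roughly 150 words per minute), which you should calibrate against your own TTS voice. It also assumes the 5-second clips Seedance returns. The budgets are the ones from the formula above, and the helper names are hypothetical:

```python
import math
from typing import Dict

# ASSUMPTION: ~2.5 words/second. Calibrate against your own TTS voice.
WORDS_PER_SECOND = 2.5
# Seedance returns 5-second clips.
CLIP_SECONDS = 5

# Time budgets (seconds) from the script formula.
BUDGETS = {"Hook": 3, "Problem": 5, "Solution": 15, "Result": 4, "CTA": 3}


def estimate_seconds(text: str) -> float:
    """Rough spoken duration of a chunk, based only on its word count."""
    return len(text.split()) / WORDS_PER_SECOND


def over_budget(sections: Dict[str, str]) -> list:
    """Names of sections whose estimated duration exceeds their budget."""
    return [name for name, text in sections.items()
            if estimate_seconds(text) > BUDGETS[name]]


def clips_for(sections: Dict[str, str]) -> int:
    """Number of 5-second clips needed to cover the whole script."""
    words = sum(len(text.split()) for text in sections.values())
    return math.ceil(words / WORDS_PER_SECOND / CLIP_SECONDS)


if __name__ == "__main__":
    draft = {
        "Hook": "Most creators never post their second video.",
        "CTA": "Comment 'video' below.",
    }
    print(over_budget(draft), clips_for(draft))
```

Word count is a crude proxy. Numbers, abbreviations, and pauses all change real TTS timing, so treat the result as a first check and rely on the rendered voiceover for final timing.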
Step 2: Generate Video Clips with Seedance
Seedance is a video generation model available via Volcengine's ARK API. You give it an image and a motion prompt, and it returns a 5-second video clip.
```bash
# Credentials for Volcengine's ARK API
export ARK_API_KEY="your-key"
export ARK_ENDPOINT_ID="your-endpoint-id"
# Arguments: input image, motion prompt, output clip
bash seedance-i2v.sh cover-image.png "professional workspace, slow zoom in, soft natural light, cinematic" clip1.mp4
```
The script:
For a 30-second video, I generate 5-6 clips. Total wait time: ~3 minutes.
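Image-to-video generation like this usually runs as an asynchronous job. You submit a task, then poll until the clip is ready, and that polling accounts for most of the wait. Here is a minimal Python sketch of that flow. It is a hypothetical stand-in, not seedance-i2v.sh itself. The endpoint URL, JSON field names (`model`, `content`, `id`, `status`, `video_url`), and status strings are assumptions to check against Volcengine's ARK documentation:

```python
import base64
import json
import os
import time
import urllib.request
from typing import Callable, Optional

# ASSUMPTION: endpoint, request fields, and status values are not confirmed
# here; verify them against Volcengine's current ARK API reference.
ARK_TASKS_URL = "https://ark.cn-beijing.volces.com/api/v3/contents/generations/tasks"


def build_payload(endpoint_id: str, prompt: str, image: bytes,
                  mime: str = "image/png") -> dict:
    """Request body for one image-to-video task: a text prompt plus the image."""
    data_url = f"data:{mime};base64," + base64.b64encode(image).decode("ascii")
    return {
        "model": endpoint_id,
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


def ark_request(method: str, url: str, body: Optional[dict] = None) -> dict:
    """Send a JSON request authenticated with ARK_API_KEY and decode the reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8") if body is not None else None,
        method=method,
        headers={"Authorization": f"Bearer {os.environ['ARK_API_KEY']}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_task(fetch: Callable[[], dict], interval: float = 10.0,
                  timeout: float = 600.0,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the task succeeds; raise if it fails or takes too long."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch()
        status = task.get("status")
        if status == "succeeded":
            return task
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"generation task ended with status {status!r}")
        if time.monotonic() > deadline:
            raise TimeoutError("generation task did not finish in time")
        sleep(interval)


def generate_clip(image_path: str, prompt: str, out_path: str) -> None:
    """Submit one task, wait for it, and download the resulting MP4."""
    with open(image_path, "rb") as f:
        payload = build_payload(os.environ["ARK_ENDPOINT_ID"], prompt, f.read())
    task_id = ark_request("POST", ARK_TASKS_URL, payload)["id"]
    done = wait_for_task(lambda: ark_request("GET", f"{ARK_TASKS_URL}/{task_id}"))
    # ASSUMPTION: the finished task exposes the clip URL at content.video_url.
    urllib.request.urlretrieve(done["content"]["video_url"], out_path)
```

Calling `generate_clip("cover-image.png", "professional workspace, slow zoom in, soft natural light, cinematic", "clip1.mp4")` would mirror the shell command above. Passing `fetch` and `sleep` into `wait_for_task` keeps the polling logic testable without network access.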
Prompt writing rules I learned the hard way:
Step 3: Normalize and Concatenate
Seedance clips may have slightly different resolutions or framerates. Before concatenating, normalize everything:
```bash
bash normalize-clips.sh clip1.mp4 clip2.mp4 clip3.mp4 clip4.mp4 clip5.mp4
```
This outputs `clip1-norm.mp4` through `clip5-norm.mp4` and a `concat.mp4` — all clips joined, no audio, uniform 1920×1080 at 30fps.
For vertical video (TikTok/Reels/Xiaohongshu), change the scale:
```bash
SCALE=1080:1920 bash normalize-clips.sh clip1.mp4 clip2.mp4 clip3.mp4
```
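The same normalization can also be expressed directly as ffmpeg arguments. The following is a minimal Python sketch built on the goals stated above: uniform resolution, 30 fps, and no audio. It assumes letterboxing rather than cropping when a clip's aspect ratio differs. The function names are hypothetical, and normalize-clips.sh may do this differently:

```python
import subprocess
from pathlib import Path
from typing import List


def normalize_args(src: str, dst: str, scale: str = "1920:1080", fps: int = 30) -> List[str]:
    """ffmpeg arguments that fit one clip into a fixed frame at a fixed frame rate."""
    w, h = scale.split(":")
    vf = (
        f"scale={w}:{h}:force_original_aspect_ratio=decrease,"  # fit inside the frame
        f"pad={w}:{h}:(ow-iw)/2:(oh-ih)/2,"  # letterbox to the exact size
        f"setsar=1,fps={fps}"  # square pixels, constant frame rate
    )
    # -an drops audio; identical encoder settings let the clips be joined without re-encoding.
    return ["ffmpeg", "-y", "-i", src, "-vf", vf, "-an",
            "-c:v", "libx264", "-pix_fmt", "yuv420p", dst]


def concat_list(paths: List[str]) -> str:
    """Contents of a list file for ffmpeg's concat demuxer."""
    return "".join(f"file '{p}'\n" for p in paths)


def normalize_and_concat(clips: List[str], scale: str = "1920:1080") -> None:
    """Write <name>-norm.mp4 for each input, then join them into concat.mp4."""
    normed = []
    for clip in clips:
        path = Path(clip)
        out = str(path.with_name(path.stem + "-norm.mp4"))
        subprocess.run(normalize_args(clip, out, scale), check=True)
        normed.append(out)
    # Relative paths in the list are resolved against the list file's directory.
    Path("concat.txt").write_text(concat_list(normed))
    subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "concat.txt",
                    "-c", "copy", "concat.mp4"], check=True)
```

For vertical output, pass `scale="1080:1920"`, the same value the `SCALE` variable carries in the shell version. Re-encoding every clip with identical settings first is what makes the final `-c copy` join safe.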
Step 4: Generate Voiceover
I use TTS to generate the voiceover from the same script. Key parameters:
If the TTS output isn't already a 44.1 kHz stereo MP3, re-encode it:
```bash
ffmpeg -i voice-raw.mp3 -ar 44100 -ac 2 -b:a 128k voice.mp3
```
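To decide whether that re-encode is needed, you can ask ffprobe for the first audio stream's codec, sample rate, and channel count. A small Python sketch follows. The MP3 / 44.1 kHz / stereo target is taken from the command above, bitrate isn't checked, and the helper names are hypothetical:

```python
import json
import subprocess


def probe_audio(path: str) -> dict:
    """ffprobe's description of the file's first audio stream."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0",
         "-show_entries", "stream=codec_name,sample_rate,channels",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["streams"][0]


def needs_reencode(stream: dict) -> bool:
    """True unless the stream is already MP3 at 44.1 kHz with two channels."""
    # ffprobe reports sample_rate as a string and channels as an integer.
    return not (
        stream.get("codec_name") == "mp3"
        and int(stream.get("sample_rate", 0)) == 44100
        and int(stream.get("channels", 0)) == 2
    )
```

Usage: `if needs_reencode(probe_audio("voice-raw.mp3")):` run the ffmpeg command above; otherwise use the file as is.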
Step 5: Write and Burn Subtitles
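I write subtitles manually as SRT. It takes 3-4 minutes and gives me precise control over timing. For volume production, Whisper can auto-generate them from the voiceover.
The subtitles in this example are Chinese. The two cues read "99% of content creators are talking to thin air" and "They optimize for creation, not distribution."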
```srt
1
00:00:00,000 --> 00:00:03,500
99%的内容创作者在对着空气说话

2
00:00:03,500 --> 00:00:08,000
他们优化的是创作，不是分发
```
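For the Whisper route, each transcription segment (start and end times in seconds, plus text) maps directly onto one SRT cue. Below is a minimal sketch that assumes segment dicts with `start`, `end`, and `text` keys. This matches the shape that the open-source `whisper` package returns in `transcribe(...)["segments"]`. The helper names are hypothetical:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        # Whisper segment text usually starts with a space, so strip it.
        cues.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


# Example with the open-source whisper package (not run here):
#   import whisper
#   result = whisper.load_model("base").transcribe("voice.mp3")
#   with open("subs.srt", "w", encoding="utf-8") as f:
#       f.write(segments_to_srt(result["segments"]))
```

Whisper's segment boundaries follow speech pauses rather than your script chunks, so expect to merge or split a few cues by hand.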
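The tricky part: ffmpeg's own text rendering is fragile with Chinese. The `subtitles` filter needs an ffmpeg build with libass plus a CJK font that fontconfig can find, and `drawtext` needs an explicit CJK `fontfile`. Otherwise the characters come out as empty boxes, or the filter isn't available at all. My solution: skip ffmpeg's text filters and use Pillow to render subtitles frame by frame.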
```bash
python3 burn-subs.py --video concat.mp4 --srt subs.srt --output final-with-subs.mp4
```
Takes about 1× realtime (30 seconds to process a 30-second video).
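The frame-by-frame approach comes down to two pieces: finding which cue is active at each frame's timestamp, and drawing that text onto the frame with a CJK-capable font. Here is a minimal sketch of both. It is hypothetical and not burn-subs.py itself. The font path you pass is an assumption (any font with Chinese glyphs, such as Noto Sans CJK, should work), and piping raw frames in and out of ffmpeg is left out:

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One SRT timing line, e.g. "00:00:03,500 --> 00:00:08,000"
TIMING = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt: str) -> List[Cue]:
    """Parse SRT text into cues; cues are separated by blank lines."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                text = "\n".join(lines[i + 1:]).strip()
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def cue_at(cues: List[Cue], t: float) -> Optional[str]:
    """Text on screen at time t (seconds); a cue's end time is exclusive."""
    for cue in cues:
        if cue.start <= t < cue.end:
            return cue.text
    return None


def draw_subtitle(frame, text: str, font_path: str, size: int = 48):
    """Draw centered, outlined text near the bottom of a Pillow image, in place."""
    # Imported here so the parsing helpers above work without Pillow installed.
    from PIL import ImageDraw, ImageFont

    font = ImageFont.truetype(font_path, size)  # the font must contain CJK glyphs
    draw = ImageDraw.Draw(frame)
    left, top, right, bottom = draw.textbbox((0, 0), text, font=font)
    x = (frame.width - (right - left)) / 2
    y = frame.height - (bottom - top) - size  # leave one font-size of bottom margin
    draw.text((x, y), text, font=font, fill="white", stroke_width=3, stroke_fill="black")
    return frame
```

At the pipeline's 30 fps, frame `i` sits at `t = i / 30`. Only frames where `cue_at` returns text need to be decoded into a Pillow image and drawn on.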
Step 6: Merge Audio and Video
```bash
ffmpeg -i final-with-subs.mp4 -i voice.mp3 -c:v copy -c:a aac -shortest output-final.mp4
```
Done. The output file is ready to upload.
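One catch: `-shortest` ends the output with the shorter of the two streams. If the voiceover runs longer than the concatenated clips, the end of the narration gets cut. Below is a minimal Python sketch that compares the two durations with ffprobe before merging. The helper names are hypothetical, and the 5-second clip length comes from the Seedance step:

```python
import math
import subprocess

CLIP_SECONDS = 5  # Seedance clip length used in this pipeline


def duration(path: str) -> float:
    """Container duration in seconds, as reported by ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(out.strip())


def extra_clips_needed(video_seconds: float, voice_seconds: float,
                       clip_seconds: float = CLIP_SECONDS) -> int:
    """How many more clips to generate so the video covers the whole voiceover."""
    shortfall = voice_seconds - video_seconds
    return max(0, math.ceil(shortfall / clip_seconds))
```

Usage: `extra_clips_needed(duration("final-with-subs.mp4"), duration("voice.mp3"))`. A result of 0 means the merge won't truncate the narration.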
The Insight Behind the Pipeline
What surprised me wasn't that this worked — it's that it works well enough that I'd post the output without embarrassment.
The bottleneck in content creation isn't talent or ideas. It's production friction. Every step that requires a camera, software, or manual editing is a step where most creators give up.
Removing those steps doesn't make the content better. It makes the creation sustainable.
The pipeline I've described is about distribution velocity — the ability to convert a thought into a published video in under an hour, consistently, without burning out.
For newsletter writers especially, this is the missing piece. You already have the insights (your newsletters). The pipeline turns them into video without requiring you to become a video producer.
What's Next
I'm integrating this pipeline into OnePost's workflow — so newsletter content becomes not just social posts, but videos automatically. The [generate endpoint](/app) already produces 7 platform variants from a newsletter. Video is the eighth.
The full pipeline scripts are open — drop a comment if you want the repo.
Put this into practice
Paste your newsletter → get 7 platform-native posts in 30 seconds. Free to start.
Try OnePost free →