Production Skills (Phase 2)

Four skills for video production — voiceover generation, audio mixing, screen recording planning, and AI video generation with Seedance.

Phase 2 of the video pipeline. By this point you have a storyboard.md, vo-script.md, and audio-plan.md from Phase 1. These four skills turn that plan into a final video.

iopho-video-director calls these in sequence; each can also be invoked standalone.

Phase 2 sequence:

iopho-recording-checklist → iopho-voiceover-tts → iopho-audio-director assemble
└ iopho-seedance-prompts (AI route)

iopho-voiceover-tts

Multi-engine TTS voiceover production. Three modes: audition → generate → assemble.

Prerequisites:

```sh
pip install edge-tts   # Free TTS, no API key
brew install ffmpeg    # For assembly
# Optional paid engines:
# export ELEVENLABS_API_KEY=...
# export MINIMAX_API_KEY=...
```

Usage:

```sh
/iopho-voiceover-tts <mode: audition|generate|assemble> [--engine elevenlabs|minimax|edge] [--voice NAME] [--lang en|zh] [--output-dir DIR]
```

audition — Compare 2–3 voices on the same hero line before committing.

Engine routing for audition:

  • English + ElevenLabs key → Will, Adam, Antoni
  • English, no key → Edge TTS (JennyNeural, GuyNeural, SoniaNeural)
  • Chinese → MiniMax (Gentleman) + Edge TTS (XiaoxiaoNeural, YunxiNeural)

Generates 2–3 MP3 files for the user to listen to and pick from. No cost is committed until generate.
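The routing rules above can be sketched as a small lookup. This is an illustrative reconstruction of the documented logic, not the skill's actual source; the function name and signature are assumptions:

```python
# Audition engine routing, per the rules above: English prefers ElevenLabs
# when a key is present, falls back to Edge TTS; Chinese auditions MiniMax
# (if keyed) alongside Edge TTS.
EN_ROUTES = {
    True:  ("elevenlabs", ["Will", "Adam", "Antoni"]),
    False: ("edge", ["JennyNeural", "GuyNeural", "SoniaNeural"]),
}

def route_audition(lang: str, has_elevenlabs_key: bool, has_minimax_key: bool):
    """Return a list of (engine, voices) pairs to audition for a language."""
    if lang == "zh":
        picks = []
        if has_minimax_key:
            picks.append(("minimax", ["Gentleman"]))
        picks.append(("edge", ["XiaoxiaoNeural", "YunxiNeural"]))
        return picks
    return [EN_ROUTES[has_elevenlabs_key]]
```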

generate — Produce all VO segments from a source file.

Accepted input formats:

  • vo-script.md — scenes with text + timecodes
  • src/i18n/video-strings.json — JSON cue list
  • storyboard.md — VO lines extracted from scene descriptions
  • Inline text — for one-off generation

Generates one MP3 per segment with rate-limiting (0.5s between ElevenLabs calls). Reports duration, file size, and cost estimate per segment.
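The cost estimate and throttling behavior can be sketched as follows (a hedged sketch: the rates are the rough figures from the engine table below, and `synthesize` is a hypothetical callback standing in for the actual TTS call):

```python
import time

# Approximate rates per 1000 characters (USD for ElevenLabs, CNY for MiniMax).
RATE_PER_1000_CHARS = {"elevenlabs": 0.30, "minimax": 0.10, "edge": 0.0}

def estimate_cost(text: str, engine: str) -> float:
    """Rough per-segment cost estimate before generation."""
    return len(text) / 1000 * RATE_PER_1000_CHARS[engine]

def generate_segments(segments, engine, synthesize):
    """Synthesize each segment, pausing 0.5 s between ElevenLabs calls."""
    for i, text in enumerate(segments):
        if engine == "elevenlabs" and i > 0:
            time.sleep(0.5)  # rate limit between paid API calls
        synthesize(text)
```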

assemble — Concatenate segments into master-vo.mp3 with proper timing.

Reads timing from audio-plan.md VO timing table, places segments at correct timestamps, and exports the master voiceover track.
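One way to place segments at their timestamps is ffmpeg's `adelay` filter, which shifts an audio input by milliseconds. A minimal sketch of building those filter strings from the timing table (the helper name is an assumption):

```python
def adelay_filters(segments):
    """segments: list of (filename, start_seconds) pairs from the VO timing
    table. Returns one ffmpeg adelay filter string per segment."""
    filters = []
    for _name, start in segments:
        ms = int(round(start * 1000))
        filters.append(f"adelay={ms}|{ms}")  # delay both stereo channels
    return filters
```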

| Engine     | Languages   | Quality | Cost               | Key needed |
| ---------- | ----------- | ------- | ------------------ | ---------- |
| ElevenLabs | EN/ZH/multi | ★★★★★   | ~$0.30/1000 chars  | Yes        |
| MiniMax    | ZH-first    | ★★★★    | ~¥0.10/1000 chars  | Yes        |
| Edge TTS   | EN/ZH/40+   | ★★★     | Free               | No         |

iopho-audio-director

Plan and produce the complete audio layer: BGM + VO + SFX → master audio.

Prerequisites:

```sh
brew install ffmpeg
# Python 3 is also required (python3 on PATH)
# Optional: GEMINI_API_KEY for audio analysis mode
```

Usage:

```sh
/iopho-audio-director <mode: plan|assemble|analyze> [--project-dir DIR] [--bgm FILE] [--vo FILE] [--duration SEC]
```

plan — Read storyboard + context, output audio-plan.md with:

  • Architecture decision: VO coverage %, BGM strategy (continuous bed / scene-matched / ambient), SFX density
  • BGM spec: genre, BPM, mood, instrument palette + Suno AI prompt with timecodes
  • VO timing table: scene, start time (seconds + frames), VO text, estimated duration, TTS engine
  • SFX placement: timestamp, SFX type, trigger, duration
  • Duck schedule: time ranges and BGM levels (-18dB under VO, 0dB for music-only sections)
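The duck schedule falls out of the VO timing table: BGM sits at -18 dB wherever VO plays and 0 dB elsewhere. An illustrative sketch, assuming a small pre/post-roll pad around each VO segment (the pad value and function name are my assumptions, not the skill's):

```python
def duck_schedule(vo_segments, total, duck_db=-18.0, pad=0.3):
    """vo_segments: sorted-able list of (start, end) seconds.
    Returns (start, end, bgm_dB) ranges covering the full duration."""
    sched, cursor = [], 0.0
    for start, end in sorted(vo_segments):
        s, e = max(0.0, start - pad), min(total, end + pad)
        if s > cursor:
            sched.append((cursor, s, 0.0))  # music-only section
        sched.append((s, e, duck_db))       # duck BGM under VO
        cursor = e
    if cursor < total:
        sched.append((cursor, total, 0.0))
    return sched
```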

assemble — Execute the audio-plan.md to produce master-audio.mp3:

  1. Trim BGM to video length with fade-out
  2. Duck BGM under VO (sidechain compress per duck schedule)
  3. Layer SFX at timestamps
  4. Normalize to platform target (YouTube: -16 LUFS / social: -14 LUFS)
  5. Export audio/master-audio.mp3 + audio/master-audio-loud.mp3
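Step 4 maps platforms to loudness targets, which in ffmpeg terms becomes a `loudnorm` filter. A hedged sketch of building that filter string (the TP and LRA defaults are my assumptions; only the I targets come from the docs):

```python
# Platform loudness targets from the step above.
TARGETS_LUFS = {"youtube": -16.0, "social": -14.0}

def loudnorm_filter(platform: str, true_peak: float = -1.5, lra: float = 11.0) -> str:
    """Build the ffmpeg loudnorm filter string for a platform target."""
    i = TARGETS_LUFS[platform]
    return f"loudnorm=I={i}:TP={true_peak}:LRA={lra}"
```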

analyze — Inspect an existing audio file:

```sh
/iopho-audio-director analyze --bgm path/to/audio.mp3
```

Returns: BPM (global + dynamic), beats, energy envelope, structural sections. Use to match BGM tempo to scene cuts or find beat-sync points for animations.
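One use of the reported global BPM is snapping a scene cut to the nearest beat. A minimal sketch, assuming constant tempo with beat 0 at t=0 (for variable tempo you would search the reported beat list instead):

```python
def nearest_beat(cut_time: float, bpm: float) -> float:
    """Snap a cut time (seconds) to the nearest beat at a constant BPM."""
    beat = 60.0 / bpm
    return round(cut_time / beat) * beat
```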

| Situation            | Action                                            |
| -------------------- | ------------------------------------------------- |
| Have a storyboard    | Start with plan mode                              |
| No storyboard yet    | Run iopho-video-director Phase 1 first            |
| Need BGM             | Use the Suno prompt from plan output to generate  |
| Have a track already | Skip to assemble                                  |
| Need VO              | Run iopho-voiceover-tts first, then assemble      |

iopho-recording-checklist

Generate a shot-by-shot screen recording guide from a storyboard.

```sh
/iopho-recording-checklist [project-dir] [--storyboard FILE] [--format checklist|table]
```

Reads storyboard.md, context.md, and audio-plan.md from the project directory, then generates recording-checklist.md.

Filters scenes by type — only scenes requiring recorded footage are included:

| Scene type               | Needs recording? |
| ------------------------ | ---------------- |
| Screen demo              | ✅ Yes           |
| Terminal/CLI demo        | ✅ Yes           |
| MG animation (Remotion)  | ❌ No            |
| AI-generated (Seedance)  | ❌ No            |
| Stock footage            | ❌ No            |
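The filtering step amounts to keeping only scene types from the "Yes" rows. A minimal sketch, where the type labels are assumed slugs rather than the skill's actual identifiers:

```python
# Scene types that require captured footage, per the table above.
NEEDS_RECORDING = {"screen-demo", "terminal-demo"}
# Skipped: mg-animation, ai-generated, stock — produced by other routes.

def recording_scenes(scenes):
    """scenes: list of dicts with a 'type' key. Keep only recordable ones."""
    return [s for s in scenes if s["type"] in NEEDS_RECORDING]
```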

For each recording, specifies:

  • Resolution and FPS (Retina 2×, 1920×1080, 30fps vs 60fps)
  • Browser zoom level for high-DPI screens
  • Exact app state: which screen, what data, which account, feature flags, dark/light theme
  • Mobile vs desktop vs tablet setup

Output recording-checklist.md — a shot list ready for handoff to a designer or recording session.


iopho-seedance-prompts

Seedance 2.0 prompt engineering for AI video generation. Chinese-first (Chinese prompts are understood best).

```sh
/iopho-seedance-prompts <video description> [--style STYLE] [--duration SECONDS] [--capability MODE]
```

Official site: jimeng.jianying.com

| Input type      | Limit                                                  |
| --------------- | ------------------------------------------------------ |
| Images          | ≤ 9 files (jpeg/png/webp/bmp/tiff/gif, < 30 MB each)   |
| Videos          | ≤ 3 files (mp4/mov, total 2–15 s, < 50 MB each)        |
| Audio           | ≤ 3 files (mp3/wav, total ≤ 15 s, < 15 MB each)        |
| Total files     | ≤ 12 combined                                          |
| Output duration | 4–15 seconds                                           |
| Video pixels    | 409,600–927,408 px (640×640 to 834×1112)               |
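The limits in the table can be checked up front before submitting a job. An illustrative validator (the function name and error strings are hypothetical; only the numeric limits come from the table):

```python
def validate_inputs(n_images=0, n_videos=0, n_audio=0, width=640, height=640):
    """Check Seedance input-limit constraints; return a list of violations."""
    errors = []
    if n_images > 9:
        errors.append("too many images (max 9)")
    if n_videos > 3:
        errors.append("too many videos (max 3)")
    if n_audio > 3:
        errors.append("too many audio files (max 3)")
    if n_images + n_videos + n_audio > 12:
        errors.append("too many files total (max 12)")
    px = width * height
    if not (409_600 <= px <= 927_408):
        errors.append(f"pixel count {px} outside 409600-927408")
    return errors
```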

Given your scene description, the skill outputs an optimized Seedance prompt covering:

  • Multimodal inputs — how to reference images, videos, and audio with @ syntax
  • Capability mode — which of Seedance’s 10 capability modes fits the scene (motion, style transfer, extension, interpolation, etc.)
  • Camera language — shot vocabulary: lens type, movement, angle, focus
  • Visual style — cinematic references, color palette, mood
  • Timing — duration selection and scene transition guidance

Good fits for the AI route:

  • Abstract or cinematic transitions between demo scenes
  • Product concept visuals where no real UI exists yet
  • Mood/brand sequences (opening, closing)
  • Any scene where “show an AI rendering” is better than a literal screen recording

The skill is called by iopho-video-director in Phase 1–2 when the storyboard contains scenes marked as [AI-generated] or [Seedance].