Production Skills (Phase 2)
Four skills for video production — voiceover generation, audio mixing, screen recording planning, and AI video generation with Seedance.
Phase 2 of the video pipeline. By this point you have a storyboard.md, vo-script.md, and audio-plan.md from Phase 1. These four skills turn that plan into a final video.
iopho-video-director calls these in sequence. You can also invoke each standalone.
Phase 2 sequence:
iopho-recording-checklist → iopho-voiceover-tts → iopho-audio-director assemble
  └ iopho-seedance-prompts (AI route)

iopho-voiceover-tts
Multi-engine TTS voiceover production. Three modes: audition → generate → assemble.
Prerequisites:
```
pip install edge-tts   # Free TTS, no API key
brew install ffmpeg    # For assembly
# Optional paid engines:
# export ELEVENLABS_API_KEY=...
# export MINIMAX_API_KEY=...
```

Usage:

```
/iopho-voiceover-tts <mode: audition|generate|assemble> [--engine elevenlabs|minimax|edge] [--voice NAME] [--lang en|zh] [--output-dir DIR]
```

audition — Compare 2–3 voices on the same hero line before committing.
Engine routing for audition:
- English + ElevenLabs key → Will, Adam, Antoni
- English, no key → Edge TTS (JennyNeural, GuyNeural, SoniaNeural)
- Chinese → MiniMax (Gentleman) + Edge TTS (XiaoxiaoNeural, YunxiNeural)
Generates 2–3 MP3 files for the user to listen to and pick from. No cost is committed until generate.
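The routing rules above amount to a small lookup. A minimal sketch, assuming the engine is chosen by language plus an API-key check (the function name and the `minimax+edge` combined-engine label are illustrative; the voice lists come from the routing rules above):

```python
import os

def pick_audition_voices(lang: str) -> tuple[str, list[str]]:
    """Pick an engine and 2-3 audition voices per the routing rules."""
    if lang == "zh":
        # Chinese: MiniMax Gentleman plus Edge TTS voices
        return "minimax+edge", ["Gentleman", "XiaoxiaoNeural", "YunxiNeural"]
    if os.environ.get("ELEVENLABS_API_KEY"):
        # English with an ElevenLabs key
        return "elevenlabs", ["Will", "Adam", "Antoni"]
    # English, no key: free Edge TTS voices
    return "edge", ["JennyNeural", "GuyNeural", "SoniaNeural"]
```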
generate — Produce all VO segments from a source file.
Accepted input formats:
- vo-script.md — scenes with text + timecodes
- src/i18n/video-strings.json — JSON cue list
- storyboard.md — VO lines extracted from scene descriptions
- Inline text — for one-off generation
Generates one MP3 per segment with rate-limiting (0.5s between ElevenLabs calls). Reports duration, file size, and cost estimate per segment.
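The rate limiting and the cost estimate can be sketched as follows (helper names are illustrative; the $0.30 per 1000 characters figure is the ElevenLabs estimate from the engine comparison table):

```python
import time

ELEVENLABS_USD_PER_1000_CHARS = 0.30  # estimate from the engine comparison table

def estimate_cost_usd(segments: list[str]) -> float:
    """Rough ElevenLabs cost for a list of VO segment texts."""
    total_chars = sum(len(s) for s in segments)
    return round(total_chars / 1000 * ELEVENLABS_USD_PER_1000_CHARS, 4)

def generate_all(segments: list[str], synthesize) -> None:
    """Call synthesize(text) once per segment, 0.5 s apart."""
    for i, text in enumerate(segments):
        if i:
            time.sleep(0.5)  # rate limit between ElevenLabs calls
        synthesize(text)
```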
assemble — Concatenate segments into master-vo.mp3 with proper timing.
Reads timing from audio-plan.md VO timing table, places segments at correct timestamps, and exports the master voiceover track.
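One plausible implementation of the placement step uses ffmpeg's adelay and amix filters. A sketch, assuming the (start-time, file) pairs have already been parsed from the timing table (the helper name and output path are illustrative):

```python
def build_assemble_cmd(segments: list[tuple[float, str]],
                       out: str = "master-vo.mp3") -> list[str]:
    """ffmpeg command that places each VO segment at its start time."""
    cmd = ["ffmpeg", "-y"]
    filters = []
    for i, (start, path) in enumerate(segments):
        cmd += ["-i", path]
        ms = int(start * 1000)
        # adelay pads the segment with leading silence (both channels)
        filters.append(f"[{i}]adelay={ms}|{ms}[a{i}]")
    labels = "".join(f"[a{i}]" for i in range(len(segments)))
    # amix sums the tracks; normalize=0 keeps levels (needs a recent ffmpeg)
    filters.append(f"{labels}amix=inputs={len(segments)}:normalize=0[out]")
    return cmd + ["-filter_complex", ";".join(filters), "-map", "[out]", out]
```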
Engine comparison
| Engine | Languages | Quality | Cost | Key needed |
|---|---|---|---|---|
| ElevenLabs | EN/ZH/multi | ★★★★★ | ~$0.30/1000 chars | Yes |
| MiniMax | ZH-first | ★★★★ | ~¥0.10/1000 chars | Yes |
| Edge TTS | EN/ZH/40+ | ★★★ | Free | No |
iopho-audio-director
Plan and produce the complete audio layer: BGM + VO + SFX → master audio.
Prerequisites:
```
brew install ffmpeg
pip install python3
# Optional: GEMINI_API_KEY for audio analysis mode
```

Usage:

```
/iopho-audio-director <mode: plan|assemble|analyze> [--project-dir DIR] [--bgm FILE] [--vo FILE] [--duration SEC]
```

plan — Read storyboard + context, output audio-plan.md with:
- Architecture decision: VO coverage %, BGM strategy (continuous bed / scene-matched / ambient), SFX density
- BGM spec: genre, BPM, mood, instrument palette + Suno AI prompt with timecodes
- VO timing table: scene, start time (seconds + frames), VO text, estimated duration, TTS engine
- SFX placement: timestamp, SFX type, trigger, duration
- Duck schedule: time ranges and BGM levels (-18dB under VO, 0dB for music-only sections)
assemble — Execute the audio-plan.md to produce master-audio.mp3:
- Trim BGM to video length with fade-out
- Duck BGM under VO (sidechain compress per duck schedule)
- Layer SFX at timestamps
- Normalize to platform target (YouTube: -16 LUFS / social: -14 LUFS)
- Export audio/master-audio.mp3 + audio/master-audio-loud.mp3
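The normalize-and-export step could look like this with ffmpeg's loudnorm filter. A sketch: the input filename and the TP/LRA values are assumptions; only the -16/-14 LUFS targets and output paths come from the steps above.

```python
def export_cmds(mixed_in: str = "mixed.wav") -> list[list[str]]:
    """ffmpeg commands for the two loudness targets."""
    targets = [
        ("audio/master-audio.mp3", -16),       # YouTube target
        ("audio/master-audio-loud.mp3", -14),  # social target
    ]
    return [
        ["ffmpeg", "-y", "-i", mixed_in,
         "-af", f"loudnorm=I={lufs}:TP=-1.5:LRA=11", out]
        for out, lufs in targets
    ]
```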
analyze — Inspect an existing audio file:
```
/iopho-audio-director analyze --bgm path/to/audio.mp3
```

Returns: BPM (global + dynamic), beats, energy envelope, structural sections. Use it to match BGM tempo to scene cuts or to find beat-sync points for animations.
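Given the BPM that analyze reports, beat-sync points for a scene are straightforward to derive (illustrative helper, not part of the skill itself):

```python
def beat_times(bpm: float, duration: float, offset: float = 0.0) -> list[float]:
    """Timestamps (seconds) of each beat, for snapping cuts or animations."""
    period = 60.0 / bpm
    times, t = [], offset
    while t < duration:
        times.append(round(t, 3))
        t += period
    return times
```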
Decision guide
| Situation | Action |
|---|---|
| Have a storyboard | Start with plan mode |
| No storyboard yet | Run iopho-video-director Phase 1 first |
| Need BGM | Use Suno prompt from plan output to generate |
| Have a track already | Skip to assemble |
| Need VO | Run iopho-voiceover-tts first, then assemble |
iopho-recording-checklist
Generate a shot-by-shot screen recording guide from a storyboard.
```
/iopho-recording-checklist [project-dir] [--storyboard FILE] [--format checklist|table]
```

Reads storyboard.md, context.md, and audio-plan.md from the project directory, then generates recording-checklist.md.
What gets planned
Filters scenes by type — only scenes requiring recorded footage are included:
| Scene type | Needs recording? |
|---|---|
| Screen demo | ✅ Yes |
| Terminal/CLI demo | ✅ Yes |
| MG animation (Remotion) | ❌ No |
| AI-generated (Seedance) | ❌ No |
| Stock footage | ❌ No |
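The table above reduces to a small filter. A sketch (the scene dict shape and the type slugs are assumptions about how scenes come out of storyboard.md):

```python
NEEDS_RECORDING = {"screen-demo", "terminal-demo"}  # per the table above

def recordable(scenes: list[dict]) -> list[dict]:
    """Keep only scenes that need recorded footage; MG, AI, stock are skipped."""
    return [s for s in scenes if s.get("type") in NEEDS_RECORDING]
```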
For each recording, specifies:
- Resolution and FPS (Retina 2×, 1920×1080, 30fps vs 60fps)
- Browser zoom level for high-DPI screens
- Exact app state: which screen, what data, which account, feature flags, dark/light theme
- Mobile vs desktop vs tablet setup
Outputs recording-checklist.md — a shot list ready for handoff to a designer or a recording session.
iopho-seedance-prompts
Seedance 2.0 prompt engineering for AI video generation. Chinese-first (Chinese prompts yield the best results).
```
/iopho-seedance-prompts <视频需求描述> [--style 风格] [--duration 时长] [--capability 能力模式]
```

(Placeholders, in order: video brief, style, duration, capability mode.)

Official site: jimeng.jianying.com
Input limits (Seedance 2.0)
| Input type | Limit |
|---|---|
| Images | ≤ 9 files (jpeg/png/webp/bmp/tiff/gif, < 30MB each) |
| Videos | ≤ 3 files (mp4/mov, total 2–15s, < 50MB each) |
| Audio | ≤ 3 files (mp3/wav, total ≤ 15s, < 15MB each) |
| Total files | ≤ 12 combined |
| Output duration | 4–15 seconds |
| Video pixels | 409,600–927,408 px (640×640 to 834×1112) |
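A pre-flight check against these limits might look like this (a sketch covering file counts and duration only; per-file size and pixel checks would follow the same pattern):

```python
def check_seedance_inputs(images: int, videos: int, audio: int,
                          duration: float) -> list[str]:
    """Return a list of limit violations (empty list means OK)."""
    errors = []
    if images > 9:
        errors.append("too many images (max 9)")
    if videos > 3:
        errors.append("too many videos (max 3)")
    if audio > 3:
        errors.append("too many audio files (max 3)")
    if images + videos + audio > 12:
        errors.append("too many total files (max 12)")
    if not 4 <= duration <= 15:
        errors.append("output duration must be 4-15 s")
    return errors
```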
What this skill generates
Given your scene description, the skill outputs an optimized Seedance prompt covering:
- Multimodal inputs — how to reference images, videos, and audio with @ syntax
- Capability mode — which of Seedance’s 10 capability modes fits the scene (motion, style transfer, extension, interpolation, etc.)
- Camera language — shot vocabulary: lens type, movement, angle, focus
- Visual style — cinematic references, color palette, mood
- Timing — duration selection and scene transition guidance
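Composed together, these components become a single prompt string. A toy sketch (the parameter names and comma-joined format are illustrative; only the @ reference syntax comes from the skill description):

```python
def seedance_prompt(scene: str, camera: str, style: str,
                    duration_s: int, refs: list[str] = ()) -> str:
    """Compose a Seedance prompt from scene, camera, style, timing, refs."""
    ref_part = " ".join(f"@{r}" for r in refs)  # @ syntax for input files
    return f"{scene}, {camera}, {style}, {duration_s}s {ref_part}".strip()
```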
When to use
- Abstract or cinematic transitions between demo scenes
- Product concept visuals where no real UI exists yet
- Mood/brand sequences (opening, closing)
- Any scene where “show an AI rendering” is better than a literal screen recording
The skill is called by iopho-video-director in Phase 1–2 when the storyboard contains scenes marked as [AI-generated] or [Seedance].