Production Skills (Phase 2)

Four skills for video production — voiceover generation, audio mixing, screen recording planning, and AI video generation with Seedance.

Phase 2 of the video pipeline. By this point you have a storyboard.md, vo-script.md, and audio-plan.md from Phase 1. These four skills turn that plan into a final video.

iopho-video-director calls these in sequence; each can also be invoked standalone.

Phase 2 sequence:

iopho-recording-checklist → iopho-voiceover-tts → iopho-audio-director assemble
└ iopho-seedance-prompts (AI route)

iopho-voiceover-tts

Multi-engine TTS voiceover production. Three modes: audition → generate → assemble.

Prerequisites:

```sh
pip install edge-tts   # Free TTS, no API key
brew install ffmpeg    # For assembly
# Optional paid engines:
# export ELEVENLABS_API_KEY=...
# export MINIMAX_API_KEY=...
```

Usage:

```sh
/iopho-voiceover-tts <mode: audition|generate|assemble> [--engine elevenlabs|minimax|edge] [--voice NAME] [--lang en|zh] [--output-dir DIR]
```

audition — Compare 2–3 voices on the same hero line before committing.

Engine routing for audition:

  • English + ElevenLabs key → Will, Adam, Antoni
  • English, no key → Edge TTS (JennyNeural, GuyNeural, SoniaNeural)
  • Chinese → MiniMax (Gentleman) + Edge TTS (XiaoxiaoNeural, YunxiNeural)

Generates 2–3 MP3 files for the user to listen to and pick from. No cost is committed until generate.
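The routing rules above can be sketched as a small lookup. This is an illustrative reconstruction of the documented logic, not the skill's actual source; the function name and signature are assumptions:

```python
# Audition engine routing, per the rules above: English prefers ElevenLabs
# when a key is present, falls back to Edge TTS; Chinese auditions MiniMax
# (if keyed) alongside Edge TTS.
EN_ROUTES = {
    True:  ("elevenlabs", ["Will", "Adam", "Antoni"]),
    False: ("edge", ["JennyNeural", "GuyNeural", "SoniaNeural"]),
}

def route_audition(lang: str, has_elevenlabs_key: bool, has_minimax_key: bool):
    """Return a list of (engine, voices) pairs to audition for a language."""
    if lang == "zh":
        picks = []
        if has_minimax_key:
            picks.append(("minimax", ["Gentleman"]))
        picks.append(("edge", ["XiaoxiaoNeural", "YunxiNeural"]))
        return picks
    return [EN_ROUTES[has_elevenlabs_key]]
```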

generate — Produce all VO segments from a source file.

Accepted input formats:

  • vo-script.md — scenes with text + timecodes
  • src/i18n/video-strings.json — JSON cue list
  • storyboard.md — VO lines extracted from scene descriptions
  • Inline text — for one-off generation

Generates one MP3 per segment with rate-limiting (0.5s between ElevenLabs calls). Reports duration, file size, and cost estimate per segment.
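The cost estimate and throttling behavior can be sketched as follows (a hedged sketch: the rates are the rough figures from the engine table below, and `synthesize` is a hypothetical callback standing in for the actual TTS call):

```python
import time

# Approximate rates per 1000 characters (USD for ElevenLabs, CNY for MiniMax).
RATE_PER_1000_CHARS = {"elevenlabs": 0.30, "minimax": 0.10, "edge": 0.0}

def estimate_cost(text: str, engine: str) -> float:
    """Rough per-segment cost estimate before generation."""
    return len(text) / 1000 * RATE_PER_1000_CHARS[engine]

def generate_segments(segments, engine, synthesize):
    """Synthesize each segment, pausing 0.5 s between ElevenLabs calls."""
    for i, text in enumerate(segments):
        if engine == "elevenlabs" and i > 0:
            time.sleep(0.5)  # rate limit between paid API calls
        synthesize(text)
```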

assemble — Concatenate segments into master-vo.mp3 with proper timing.

Reads timing from audio-plan.md VO timing table, places segments at correct timestamps, and exports the master voiceover track.
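One way to place segments at their timestamps is ffmpeg's `adelay` filter, which shifts an audio input by milliseconds. A minimal sketch of building those filter strings from the timing table (the helper name is an assumption):

```python
def adelay_filters(segments):
    """segments: list of (filename, start_seconds) pairs from the VO timing
    table. Returns one ffmpeg adelay filter string per segment."""
    filters = []
    for _name, start in segments:
        ms = int(round(start * 1000))
        filters.append(f"adelay={ms}|{ms}")  # delay both stereo channels
    return filters
```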

| Engine     | Languages   | Quality | Cost               | Key needed |
| ---------- | ----------- | ------- | ------------------ | ---------- |
| ElevenLabs | EN/ZH/multi | ★★★★★   | ~$0.30/1000 chars  | Yes        |
| MiniMax    | ZH-first    | ★★★★    | ~¥0.10/1000 chars  | Yes        |
| Edge TTS   | EN/ZH/40+   | ★★★     | Free               | No         |

iopho-audio-director

Plan and produce the complete audio layer: BGM + VO + SFX → master audio.

Prerequisites:

```sh
brew install ffmpeg
# Python 3 is also required (python3 on PATH)
# Optional: GEMINI_API_KEY for audio analysis mode
```

Usage:

```sh
/iopho-audio-director <mode: plan|assemble|analyze> [--project-dir DIR] [--bgm FILE] [--vo FILE] [--duration SEC]
```

plan — Read storyboard + context, output audio-plan.md with:

  • Architecture decision: VO coverage %, BGM strategy (continuous bed / scene-matched / ambient), SFX density
  • BGM spec: genre, BPM, mood, instrument palette + Suno AI prompt with timecodes
  • VO timing table: scene, start time (seconds + frames), VO text, estimated duration, TTS engine
  • SFX placement: timestamp, SFX type, trigger, duration
  • Duck schedule: time ranges and BGM levels (-18dB under VO, 0dB for music-only sections)
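The duck schedule falls out of the VO timing table: BGM sits at -18 dB wherever VO plays and 0 dB elsewhere. An illustrative sketch, assuming a small pre/post-roll pad around each VO segment (the pad value and function name are my assumptions, not the skill's):

```python
def duck_schedule(vo_segments, total, duck_db=-18.0, pad=0.3):
    """vo_segments: sorted-able list of (start, end) seconds.
    Returns (start, end, bgm_dB) ranges covering the full duration."""
    sched, cursor = [], 0.0
    for start, end in sorted(vo_segments):
        s, e = max(0.0, start - pad), min(total, end + pad)
        if s > cursor:
            sched.append((cursor, s, 0.0))  # music-only section
        sched.append((s, e, duck_db))       # duck BGM under VO
        cursor = e
    if cursor < total:
        sched.append((cursor, total, 0.0))
    return sched
```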

assemble — Execute the audio-plan.md to produce master-audio.mp3:

  1. Trim BGM to video length with fade-out
  2. Duck BGM under VO (sidechain compress per duck schedule)
  3. Layer SFX at timestamps
  4. Normalize to platform target (YouTube: -16 LUFS / social: -14 LUFS)
  5. Export audio/master-audio.mp3 + audio/master-audio-loud.mp3
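Step 4 maps platforms to loudness targets, which in ffmpeg terms becomes a `loudnorm` filter. A hedged sketch of building that filter string (the TP and LRA defaults are my assumptions; only the I targets come from the docs):

```python
# Platform loudness targets from the step above.
TARGETS_LUFS = {"youtube": -16.0, "social": -14.0}

def loudnorm_filter(platform: str, true_peak: float = -1.5, lra: float = 11.0) -> str:
    """Build the ffmpeg loudnorm filter string for a platform target."""
    i = TARGETS_LUFS[platform]
    return f"loudnorm=I={i}:TP={true_peak}:LRA={lra}"
```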

analyze — Inspect an existing audio file:

```sh
/iopho-audio-director analyze --bgm path/to/audio.mp3
```

Returns: BPM (global + dynamic), beats, energy envelope, structural sections. Use to match BGM tempo to scene cuts or find beat-sync points for animations.
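One use of the reported global BPM is snapping a scene cut to the nearest beat. A minimal sketch, assuming constant tempo with beat 0 at t=0 (for variable tempo you would search the reported beat list instead):

```python
def nearest_beat(cut_time: float, bpm: float) -> float:
    """Snap a cut time (seconds) to the nearest beat at a constant BPM."""
    beat = 60.0 / bpm
    return round(cut_time / beat) * beat
```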

| Situation            | Action                                            |
| -------------------- | ------------------------------------------------- |
| Have a storyboard    | Start with plan mode                              |
| No storyboard yet    | Run iopho-video-director Phase 1 first            |
| Need BGM             | Use the Suno prompt from plan output to generate  |
| Have a track already | Skip to assemble                                  |
| Need VO              | Run iopho-voiceover-tts first, then assemble      |

iopho-recording-checklist

Generate a shot-by-shot screen recording guide from a storyboard.

```sh
/iopho-recording-checklist [project-dir] [--storyboard FILE] [--format checklist|table]
```

Reads storyboard.md, context.md, and audio-plan.md from the project directory, then generates recording-checklist.md.

Filters scenes by type — only scenes requiring recorded footage are included:

| Scene type               | Needs recording? |
| ------------------------ | ---------------- |
| Screen demo              | ✅ Yes           |
| Terminal/CLI demo        | ✅ Yes           |
| MG animation (Remotion)  | ❌ No            |
| AI-generated (Seedance)  | ❌ No            |
| Stock footage            | ❌ No            |
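The filtering step amounts to keeping only scene types from the "Yes" rows. A minimal sketch, where the type labels are assumed slugs rather than the skill's actual identifiers:

```python
# Scene types that require captured footage, per the table above.
NEEDS_RECORDING = {"screen-demo", "terminal-demo"}
# Skipped: mg-animation, ai-generated, stock — produced by other routes.

def recording_scenes(scenes):
    """scenes: list of dicts with a 'type' key. Keep only recordable ones."""
    return [s for s in scenes if s["type"] in NEEDS_RECORDING]
```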

For each recording, specifies:

  • Resolution and FPS (Retina 2×, 1920×1080, 30fps vs 60fps)
  • Browser zoom level for high-DPI screens
  • Exact app state: which screen, what data, which account, feature flags, dark/light theme
  • Mobile vs desktop vs tablet setup

Output recording-checklist.md — a shot list ready for handoff to a designer or recording session.


iopho-seedance-prompts

Seedance 2.0 prompt engineering for AI video generation. Chinese-first (Chinese prompts are understood best).

```sh
/iopho-seedance-prompts <video description> [--style STYLE] [--duration SECONDS] [--capability MODE]
```

Official site: jimeng.jianying.com

| Input type      | Limit                                                  |
| --------------- | ------------------------------------------------------ |
| Images          | ≤ 9 files (jpeg/png/webp/bmp/tiff/gif, < 30 MB each)   |
| Videos          | ≤ 3 files (mp4/mov, total 2–15 s, < 50 MB each)        |
| Audio           | ≤ 3 files (mp3/wav, total ≤ 15 s, < 15 MB each)        |
| Total files     | ≤ 12 combined                                          |
| Output duration | 4–15 seconds                                           |
| Video pixels    | 409,600–927,408 px (640×640 to 834×1112)               |
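The limits in the table can be checked up front before submitting a job. An illustrative validator (the function name and error strings are hypothetical; only the numeric limits come from the table):

```python
def validate_inputs(n_images=0, n_videos=0, n_audio=0, width=640, height=640):
    """Check Seedance input-limit constraints; return a list of violations."""
    errors = []
    if n_images > 9:
        errors.append("too many images (max 9)")
    if n_videos > 3:
        errors.append("too many videos (max 3)")
    if n_audio > 3:
        errors.append("too many audio files (max 3)")
    if n_images + n_videos + n_audio > 12:
        errors.append("too many files total (max 12)")
    px = width * height
    if not (409_600 <= px <= 927_408):
        errors.append(f"pixel count {px} outside 409600-927408")
    return errors
```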

Given your scene description, the skill outputs an optimized Seedance prompt covering:

  • Multimodal inputs — how to reference images, videos, and audio with @ syntax
  • Capability mode — which of Seedance’s 10 capability modes fits the scene (motion, style transfer, extension, interpolation, etc.)
  • Camera language — shot vocabulary: lens type, movement, angle, focus
  • Visual style — cinematic references, color palette, mood
  • Timing — duration selection and scene transition guidance

Good fits for the AI route:

  • Abstract or cinematic transitions between demo scenes
  • Product concept visuals where no real UI exists yet
  • Mood/brand sequences (opening, closing)
  • Any scene where “show an AI rendering” is better than a literal screen recording

The skill is called by iopho-video-director in Phase 1–2 when the storyboard contains scenes marked as [AI-generated] or [Seedance].