The problem
At Kittora, every long-form recording should become five things by Friday: a long-form YouTube video, vertical short clips for TikTok and Reels, a carousel for Instagram, a quote card for LinkedIn, and platform-native captions for each. Doing it manually used to cost a full day per video.
Brand colours drift between pieces. Captions get rushed because each platform has its own voice. Infographics get skipped because the editor has run out of hours. By the time the content is out, the moment has moved on, and the pack never quite lands as one coordinated drop.
The turn
Stop treating each recording as a one-off production. Build a local pipeline that does the editing thinking and the rendering work end to end, while a human stays in control of the source files and the brand.
The system
A Python-orchestrated pipeline that runs on the owner's machine. Local-first on purpose: no SaaS dependency, no per-asset cost, no editorial drift between platforms.
Google Gemini reads the source video and writes the production plan in natural language. Anthropic Claude writes the per-platform captions in an Australian English voice. Remotion (React under the hood) renders every video piece: chapter cards, lower thirds, infographic cuts, animated intros, animated outros. FFmpeg handles the lower-level segment work.
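To make the FFmpeg "segment work" concrete, here is a minimal sketch of how an orchestrating script might build the command for cutting one clip out of the source recording. The function name, file names, and timings are hypothetical and not taken from the Kittora pipeline; the flags themselves (`-ss`, `-t`, `-c copy`, `-y`) are standard FFmpeg options.

```python
from pathlib import Path


def build_cut_command(src: Path, start_s: float, duration_s: float, dst: Path) -> list[str]:
    """Return an ffmpeg argv that copies one segment out of src without re-encoding.

    Hypothetical helper for illustration; the real pipeline may cut differently.
    """
    if start_s < 0 or duration_s <= 0:
        raise ValueError("start must be >= 0 and duration must be > 0")
    return [
        "ffmpeg",
        "-y",                        # overwrite dst if it already exists
        "-ss", f"{start_s:.3f}",     # seek to the segment start (before -i: input seek)
        "-i", str(src),
        "-t", f"{duration_s:.3f}",   # keep this many seconds
        "-c", "copy",                # stream copy: fast, but cuts snap to keyframes
        str(dst),
    ]


# Placeholder file names and timings, purely for illustration.
cmd = build_cut_command(Path("talk.mp4"), 92.5, 30, Path("clip_01.mp4"))
print(" ".join(cmd))
```

Returning the argument list rather than running it keeps the step easy to inspect and test; passing it to `subprocess.run(cmd, check=True)` would execute it. Stream copy trades frame accuracy for speed, so a frame-exact cut would need a re-encode instead.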
A single brand config governs colours, typography, and dimensions across every output, so a TikTok short and a LinkedIn carousel share the same visual DNA. Per-platform best-practice files are injected into the prompts so the captions don't read like generic LLM output. Every step is composable: run one for fast iteration, chain them for the full pack.
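The three ideas in that paragraph can be sketched together: one brand config shared by every output, best-practice notes injected into the caption prompt, and steps that run alone or chained. Everything named here is an assumption for illustration (`BrandConfig`, `GUIDES`, the step functions, the colour, font, and dimensions), and the caption step only builds prompts rather than calling a model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BrandConfig:
    """Single source of truth for visuals (all values below are placeholders)."""
    primary_colour: str
    font_family: str
    sizes: dict[str, tuple[int, int]]  # output name -> (width, height)


BRAND = BrandConfig(
    primary_colour="#1F6FEB",
    font_family="Inter",
    sizes={"tiktok_short": (1080, 1920), "linkedin_carousel": (1080, 1350)},
)

# Stand-in for the per-platform best-practice files, inlined for the sketch.
GUIDES = {
    "tiktok": "Keep it short. Lead with the hook.",
    "linkedin": "Open with the insight. No hashtag walls.",
}


def render_props(brand: BrandConfig, output: str) -> dict:
    """Props a renderer could receive; colour and font never vary by output."""
    width, height = brand.sizes[output]
    return {"colour": brand.primary_colour, "font": brand.font_family,
            "width": width, "height": height}


def build_caption_prompt(platform: str, guide: str, plan: str) -> str:
    """Inject the platform's best-practice notes ahead of the writing task."""
    return (f"Platform: {platform}\n"
            f"Best-practice notes:\n{guide}\n"
            f"Write a caption in Australian English for:\n{plan}")


Step = Callable[[dict], dict]


def plan_step(state: dict) -> dict:
    # Placeholder for the model-written production plan.
    return {**state, "plan": f"plan for {state['source']}"}


def caption_step(state: dict) -> dict:
    prompts = {p: build_caption_prompt(p, g, state["plan"]) for p, g in GUIDES.items()}
    return {**state, "prompts": prompts}


def run(steps: list[Step], state: dict) -> dict:
    """Apply steps in order; each takes the state dict and returns a new one."""
    for step in steps:
        state = step(state)
    return state


quick = run([plan_step], {"source": "talk.mp4"})               # one step, fast iteration
full = run([plan_step, caption_step], {"source": "talk.mp4"})  # chained for the full pack
```

Because each step has the same dict-in, dict-out shape, any prefix of the chain can be run on its own, which is one simple way to get the "run one, or chain them" behaviour described above.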
Status
Production-ready. Used internally for Kittora content. As a meta example: the agent diagram on the Google Ads Analytics Dashboard case study was rendered by this pipeline.