You Can Now Make Real Videos by Writing HTML
One of the most useful tools we have picked up in 2026 is HyperFrames - an open-source framework from HeyGen that turns plain HTML, CSS, and animation into finished MP4 video. No timeline editor, no After Effects, no proprietary project format. You describe the video, your AI coding agent writes the HTML, and a render command produces a real video file.
Its tagline says it plainly: "Write HTML. Render video. Built for agents." That last part is why it matters. Coding agents like Claude Code and Codex are already excellent at writing HTML and CSS - so giving them a way to emit video means a non-designer can produce a polished, animated explainer just by asking. We have started using it for client product demos and social clips, and it is good enough that we wanted to write it up.
How We Made the Video Above
That clip is the whole argument in 44 seconds, so it is worth saying exactly what went into it - because none of it required an editor, a studio, or a motion designer. It is one HTML file that pulls together a handful of ordinary inputs:
- A real voiceover, recorded on a phone. Our founder recorded a few takes on his iPhone. We dropped them into the project, transcribed them, and cut the lines to the beats - so the narration is a genuine human voice, not text-to-speech.
- Royalty-free b-roll. The surfing, the gym, the workspace shots are free stock clips (Pexels) dropped straight in as
<video>elements. The same slots could hold footage you shoot yourself. - A product demo built entirely in HTML. The SaaS dashboard with the cursor that moves and clicks is not a screen recording - it is markup and CSS with an animated pointer. That is the "HTML + anything = video" idea made literal: if you can build the UI, you can film it.
- Off-the-shelf pieces from the registry. The lower-third card and the film-grain texture are prebuilt blocks we pulled in by name and restyled to our brand.
- Titles, transitions, captions, and a music bed - all defined in the same file, all on one timeline.
Then one command rendered the finished MP4. The source lives in version control next to the site, so when the brand or the copy changes, we re-render instead of re-shooting. That is the point of the rest of this post.
How It Actually Works
The core idea is "video as code," and the design is refreshingly literal: HTML is the source of truth. Every element is a clip. Timing lives in data-* attributes (data-start, data-duration, data-track-index). Motion comes from a paused animation timeline that the renderer can seek to any frame, and CSS controls how everything looks.
To render, HyperFrames loads your HTML in headless Chrome, steps through it frame by frame (frame = time x fps), and encodes the result with FFmpeg. Because nothing depends on the wall clock, it is deterministic - the same input always produces the exact same frames. That means video you can put in version control and a CI pipeline, and regenerate on demand.
The mental model: a HyperFrames composition is just a web page that happens to know what time it is. If you can build a landing page, you can build a video.
What You Can Build
It is far more than slideshows. The framework and its add-on registry cover most of what a real motion-graphics pipeline needs:
- Voiceover, built in. Local text-to-speech (Kokoro) with dozens of voices and multiple languages - no API key - so narration is generated right in the project.
- Captions that sync. Whisper transcription produces word-level timestamps for karaoke-style, per-word animated captions; you can also import existing SRT/VTT.
- Scene transitions and VFX. A large registry of transitions (3D, blur, glitch, light leaks, shader warps, whip pans) you drop in by name.
- Social-ready overlays. Prebuilt lower thirds and platform cards - YouTube, TikTok, Instagram, X, Reddit, Spotify - plus macOS-style notifications and app showcases.
- Audio-reactive motion and data viz. Drive animations off audio frequency bands (pulse on the bass), and render charts and flowcharts as animated graphics.
- Transparent overlays. Built-in background removal cuts out a presenter so you can composite them over your scene.
Animation is adapter-based, so you can author motion with GSAP, CSS keyframes, Anime.js, the Web Animations API, Lottie, or Three.js - the one rule is that it must be seekable, which is what keeps renders deterministic.
Shoot B-Roll on Your iPhone, Wrap It in Code
Here is the part that makes this practical for a small team: you do not need a stock-footage subscription or a camera crew to get real footage into a video. The clip in your pocket counts. The workflow we use looks like this:
- Record on your phone. Shoot the b-roll on your iPhone - a product in use, a storefront, a whiteboard, a quick talking-head intro. Anything you would normally pay for as stock, you can usually just film.
- Get it into the project. AirDrop or drag the
.mov/.mp4into your HyperFrames folder, then tell Claude Code what it is. (npx hyperframes initwill even copy media in and transcribe any audio for you.) Claude Code can trim it, convert it, and pull a transcript for captions without you touching an editor. - Wrap it in HTML. Your footage becomes a clip in the composition - literally a
<video>element withdata-startanddata-duration. From there HyperFrames composites everything else around and over it: animated titles, lower thirds, word-synced captions, a generated voiceover, transitions between shots, even background removal so a presenter floats over a branded scene. - Render. One command turns the whole thing - your phone clip plus all the motion graphics - into a finished, on-brand MP4.
So the honest answer to "where does the b-roll come from?" is: you. You film it, and the code does the production work that used to need an editor. The explainer at the top of this post is pure motion graphics - no footage - but the exact same composition could have your iPhone clips dropped straight into it.
Using It in Claude Code
HyperFrames ships as agent skills, so you teach your agent the whole workflow with one install. (Want just the commands in order? See our step-by-step guide to making a video with HyperFrames.) In a terminal:
- Install once:
npx skills add heygen-com/hyperframes(add--allfor the full skill set). You need Node.js 22+ and FFmpeg on your PATH. - Scaffold:
npx hyperframes init my-videocreates the project, copies in media, and transcribes any audio. - Author: describe the video in plain language - Claude Code writes and edits the HTML composition for you.
- Check:
npx hyperframes lintvalidates structure andnpx hyperframes inspectcatches text spilling off-screen. - Preview:
npx hyperframes previewopens HyperFrames Studio with frame-accurate scrubbing and hot reload. - Render:
npx hyperframes renderproduces an MP4.
This pairs naturally with everything else Claude Code does - we keep video compositions in the same repo as the product and regenerate them when the brand or copy changes. (New to Claude Code? Start with our guide to Claude Chat, Cowork, and Code and our Claude Code vs Codex breakdown.)
Using It in the Codex App
The same skills work in OpenAI's Codex - the install command and the npx hyperframes dev loop are identical. HeyGen also publishes a dedicated Codex plugin bundle for HyperFrames that includes the core authoring, CLI, registry, GSAP, and website-to-video skills, so the Codex app knows the production workflow out of the box.
In other words, it is agent-agnostic. Whether your team lives in the Claude Code terminal or the Codex desktop app - or runs both through Crest - the path to a finished video is the same: describe it, preview it, render it.
Output and Putting It on Your Site
HyperFrames renders to a standard MP4 by default (1920x1080 at 30fps), with flags for frame rate, quality, and a WebM export when you need transparency. Because the output is an ordinary video file, embedding it anywhere is trivial - host it and drop in a normal HTML <video> tag, same as any other clip. The source of truth stays in code; what you ship to the page is the rendered file.
Why This Matters for Business
Video is the most expensive content most businesses produce, and the bottleneck is usually tooling and specialists. HyperFrames changes the economics:
- Templatable at scale. Because a composition is HTML, you can parameterize it and regenerate variants - personalized demos, localized versions via the multilingual TTS, per-account social clips - the same way you template a web page.
- No license friction. It is Apache-2.0 and open source, with no per-render fees, so volume does not blow up your costs.
- The whole chain is built in. Voiceover, captions, transitions, overlays, and background removal all live in the tool - you are not stitching five subscriptions together.
- Agent-native means fast. A marketer can describe a video and have an agent produce an on-brand draft in minutes, then iterate.
Our Take
HyperFrames is one of those tools that quietly changes what a small team can ship. We use it to turn product updates into explainer clips and to spin up social-ready video without booking a motion designer for every asset. If you want help wiring it into your Claude Code or Codex workflow - or you just want a batch of branded videos built - that is exactly the kind of work we do. Book a free call and we will scope it to your team.



