HyperFrames: Make Real Videos by Writing HTML, in Claude Code or Codex
Guides | June 18, 2026 | 8 min read

HeyGen's open-source HyperFrames turns HTML, CSS, and animation into finished MP4 video - and it is built for AI agents. Here is what it does, why it is valuable, and how to use it inside Claude Code and the Codex app.

Gabe Keding, Parker Newell, Luke Keding

The OneWave Team

AI Consulting

You Can Now Make Real Videos by Writing HTML

One of the most useful tools we have picked up in 2026 is HyperFrames - an open-source framework from HeyGen that turns plain HTML, CSS, and animation into finished MP4 video. No timeline editor, no After Effects, no proprietary project format. You describe the video, your AI coding agent writes the HTML, and a render command produces a real video file.

Its tagline says it plainly: "Write HTML. Render video. Built for agents." That last part is why it matters. Coding agents like Claude Code and Codex are already excellent at writing HTML and CSS - so giving them a way to emit video means a non-designer can produce a polished, animated explainer just by asking. We have started using it for client product demos and social clips, and it is good enough that we wanted to write it up.

Made by OneWave with HyperFrames + Claude Code. Real footage + a founder voiceover, wrapped in code. No editor. No motion team.
We built this ourselves with HyperFrames, directing Claude Code to write the composition. It is a single HTML file: stock and iPhone footage dropped in as clips, our founder's real voice as the narration, plus animated titles, social cards, a lower third, transitions, film grain, and a music bed - every kind of video, from one tool. It renders from a single command, lives in version control, and re-renders on demand.

How We Made the Video Above

That clip is the whole argument in 44 seconds, so it is worth saying exactly what went into it - because none of it required an editor, a studio, or a motion designer. It is one HTML file that pulls together a handful of ordinary inputs:

  • A real voiceover, recorded on a phone. Our founder recorded a few takes on his iPhone. We dropped them into the project, transcribed them, and cut the lines to the beats - so the narration is a genuine human voice, not text-to-speech.
  • Royalty-free b-roll. The surfing, the gym, the workspace shots are free stock clips (Pexels) dropped straight in as <video> elements. The same slots could hold footage you shoot yourself.
  • A product demo built entirely in HTML. The SaaS dashboard with the cursor that moves and clicks is not a screen recording - it is markup and CSS with an animated pointer. That is the "HTML + anything = video" idea made literal: if you can build the UI, you can film it.
  • Off-the-shelf pieces from the registry. The lower-third card and the film-grain texture are prebuilt blocks we pulled in by name and restyled to our brand.
  • Titles, transitions, captions, and a music bed - all defined in the same file, all on one timeline.

Then one command rendered the finished MP4. The source lives in version control next to the site, so when the brand or the copy changes, we re-render instead of re-shooting. That is the point of the rest of this post.

How It Actually Works

The core idea is "video as code," and the design is refreshingly literal: HTML is the source of truth. Every element is a clip. Timing lives in data-* attributes (data-start, data-duration, data-track-index). Motion comes from a paused animation timeline that the renderer can seek to any frame, and CSS controls how everything looks.
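To make the "every element is a clip" idea concrete, here is a minimal sketch of how a tool could read timing out of a composition. It is not HyperFrames' own parser. The three data-* attribute names come from the framework, but the sample markup, the Clip class, and the assumption that values are in seconds are ours, for illustration only.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Clip:
    tag: str
    start: float      # assumed to be seconds
    duration: float   # assumed to be seconds
    track: int

    def is_active(self, t: float) -> bool:
        # A clip is on screen from its start up to, but not including, its end.
        return self.start <= t < self.start + self.duration


class ClipCollector(HTMLParser):
    """Collects every element that carries both timing attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.clips: list[Clip] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "data-start" in a and "data-duration" in a:
            self.clips.append(
                Clip(
                    tag=tag,
                    start=float(a["data-start"]),
                    duration=float(a["data-duration"]),
                    track=int(a.get("data-track-index", 0)),
                )
            )


def clips_in(html: str) -> list[Clip]:
    collector = ClipCollector()
    collector.feed(html)
    return collector.clips


# Hypothetical composition: a b-roll clip with a title over part of it.
COMPOSITION = """
<div id="stage">
  <video src="broll.mp4" data-start="0" data-duration="4" data-track-index="0"></video>
  <h1 data-start="1" data-duration="2.5" data-track-index="1">Hello</h1>
</div>
"""

clips = clips_in(COMPOSITION)
active_at_3 = [c.tag for c in clips if c.is_active(3.0)]
```

At three seconds both the footage and the title are active. Half a second later only the footage remains, which is all a timeline really is.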

To render, HyperFrames loads your HTML in headless Chrome, steps through it frame by frame (frame = time × fps), and encodes the result with FFmpeg. Because nothing depends on the wall clock, it is deterministic - the same input always produces exactly the same frames. That means video you can put in version control and a CI pipeline, and regenerate on demand.
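The frame arithmetic is simple enough to sketch. This is an illustration of the principle, not the renderer's actual code: every seek time is derived from the frame index, never from a clock, so two runs produce the same list. The function name is hypothetical, and the 44-second length at 30 fps assumes our clip was rendered at the default frame rate.

```python
from fractions import Fraction


def frame_times(duration_s, fps):
    """Yield (frame_index, seek_time_in_seconds) for a whole composition."""
    # frame = time x fps, so the frame count is duration x fps ...
    total_frames = int(Fraction(duration_s) * fps)
    for frame in range(total_frames):
        # ... and the time to seek to for each frame is frame / fps.
        # Fractions avoid floating-point drift over long timelines.
        yield frame, Fraction(frame, fps)


frames = list(frame_times(44, 30))
```

A 44-second video at 30 fps comes out to 1,320 frames, and frame 30 lands on exactly the one-second mark.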

The mental model: a HyperFrames composition is just a web page that happens to know what time it is. If you can build a landing page, you can build a video.

What You Can Build

It is far more than slideshows. The framework and its add-on registry cover most of what a real motion-graphics pipeline needs:

  • Voiceover, built in. Local text-to-speech (Kokoro) with dozens of voices and multiple languages - no API key - so narration is generated right in the project.
  • Captions that sync. Whisper transcription produces word-level timestamps for karaoke-style, per-word animated captions; you can also import existing SRT/VTT.
  • Scene transitions and VFX. A large registry of transitions (3D, blur, glitch, light leaks, shader warps, whip pans) you drop in by name.
  • Social-ready overlays. Prebuilt lower thirds and platform cards - YouTube, TikTok, Instagram, X, Reddit, Spotify - plus macOS-style notifications and app showcases.
  • Audio-reactive motion and data viz. Drive animations off audio frequency bands (pulse on the bass), and render charts and flowcharts as animated graphics.
  • Transparent overlays. Built-in background removal cuts out a presenter so you can composite them over your scene.

Animation is adapter-based, so you can author motion with GSAP, CSS keyframes, Anime.js, the Web Animations API, Lottie, or Three.js - the one rule is that it must be seekable, which is what keeps renders deterministic.
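"Seekable" is easier to see in code than in prose. In this minimal sketch, which uses none of the animation libraries above, the state of a title is a pure function of the timeline position. The function names, the 40-pixel rise, and the easing curve are all our own illustrative choices.

```python
def ease_out_cubic(p: float) -> float:
    return 1 - (1 - p) ** 3


def title_state(t: float, start: float = 1.0, duration: float = 0.5) -> dict:
    """State of a fade-and-rise title at timeline position t (seconds)."""
    # Clamp progress to [0, 1] so seeking before or after the tween is safe.
    p = min(max((t - start) / duration, 0.0), 1.0)
    e = ease_out_cubic(p)
    return {"opacity": e, "translate_y": (1 - e) * 40.0}


# Seeking out of order gives the same answer as playing straight through.
late = title_state(9.0)
midway = title_state(1.25)
```

Because nothing here remembers the previous frame or reads a clock, a renderer can jump to any frame in any order and get the same picture. An animation driven by timers or elapsed wall-clock time cannot promise that.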

Shoot B-Roll on Your iPhone, Wrap It in Code

Here is the part that makes this practical for a small team: you do not need a stock-footage subscription or a camera crew to get real footage into a video. The clip in your pocket counts. The workflow we use looks like this:

  • Record on your phone. Shoot the b-roll on your iPhone - a product in use, a storefront, a whiteboard, a quick talking-head intro. Anything you would normally pay for as stock, you can usually just film.
  • Get it into the project. AirDrop or drag the .mov/.mp4 into your HyperFrames folder, then tell Claude Code what it is. (npx hyperframes init will even copy media in and transcribe any audio for you.) Claude Code can trim it, convert it, and pull a transcript for captions without you touching an editor.
  • Wrap it in HTML. Your footage becomes a clip in the composition - literally a <video> element with data-start and data-duration. From there HyperFrames composites everything else around and over it: animated titles, lower thirds, word-synced captions, a generated voiceover, transitions between shots, even background removal so a presenter floats over a branded scene.
  • Render. One command turns the whole thing - your phone clip plus all the motion graphics - into a finished, on-brand MP4.
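In practice the agent writes the markup for you, but the "wrap it in HTML" step is mostly bookkeeping. As a rough sketch, assuming timing values are in seconds and using made-up filenames, this lays a few phone clips end to end by computing each clip's data-start from the durations before it:

```python
from html import escape


def video_clips(shots, track=0):
    """Return <video> markup for (filename, seconds) pairs laid end to end."""
    lines, cursor = [], 0.0
    for filename, seconds in shots:
        lines.append(
            f'<video src="{escape(filename, quote=True)}" '
            f'data-start="{cursor:g}" data-duration="{seconds:g}" '
            f'data-track-index="{track}"></video>'
        )
        # The next clip starts where this one ends.
        cursor += seconds
    return "\n".join(lines)


# Hypothetical phone footage and trimmed lengths.
markup = video_clips([("storefront.mov", 3.5), ("whiteboard.mp4", 2)])
```

Titles, captions, and overlays would then go on higher tracks, timed against the same clock.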

So the honest answer to "where does the b-roll come from?" is: you. You film it, and the code does the production work that used to need an editor. The clip at the top of this post fills those slots with stock b-roll, and the exact same composition could have your own iPhone clips dropped straight into it.

Using It in Claude Code

HyperFrames ships as agent skills, so you teach your agent the whole workflow with one install. (Want just the commands in order? See our step-by-step guide to making a video with HyperFrames.) In a terminal:

  • Install once: npx skills add heygen-com/hyperframes (add --all for the full skill set). You need Node.js 22+ and FFmpeg on your PATH.
  • Scaffold: npx hyperframes init my-video creates the project, copies in media, and transcribes any audio.
  • Author: describe the video in plain language - Claude Code writes and edits the HTML composition for you.
  • Check: npx hyperframes lint validates structure and npx hyperframes inspect catches text spilling off-screen.
  • Preview: npx hyperframes preview opens HyperFrames Studio with frame-accurate scrubbing and hot reload.
  • Render: npx hyperframes render produces an MP4.
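Since the check and render steps are plain commands, they are easy to script for CI. The sketch below is a wrapper of our own, not something HyperFrames ships. It assumes the commands run from inside the project folder and exit non-zero on failure, and it defaults to a dry run that only lists what it would execute.

```python
import shutil
import subprocess

# Commands taken from the steps above; flags are deliberately left out.
STEPS = [
    ["npx", "hyperframes", "lint"],
    ["npx", "hyperframes", "inspect"],
    ["npx", "hyperframes", "render"],
]


def missing_tools(tools=("node", "npx", "ffmpeg")):
    """Return the required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def run_pipeline(project_dir=".", dry_run=True):
    """Run lint, inspect, and render in order, stopping at the first failure."""
    executed = []
    for step in STEPS:
        executed.append(" ".join(step))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(step, cwd=project_dir, check=True)
    return executed


plan = run_pipeline(dry_run=True)
```

Calling missing_tools() first gives a clear error when Node.js or FFmpeg is absent, instead of a failure halfway through a render.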

This pairs naturally with everything else Claude Code does - we keep video compositions in the same repo as the product and regenerate them when the brand or copy changes. (New to Claude Code? Start with our guide to Claude Chat, Cowork, and Code and our Claude Code vs Codex breakdown.)

Using It in the Codex App

The same skills work in OpenAI's Codex - the install command and the npx hyperframes development loop (init, lint, preview, render) are identical. HeyGen also publishes a dedicated Codex plugin bundle for HyperFrames that includes the core authoring, CLI, registry, GSAP, and website-to-video skills, so the Codex app knows the production workflow out of the box.

In other words, it is agent-agnostic. Whether your team lives in the Claude Code terminal or the Codex desktop app - or runs both through Crest - the path to a finished video is the same: describe it, preview it, render it.

Output and Putting It on Your Site

HyperFrames renders to a standard MP4 by default (1920x1080 at 30fps), with flags for frame rate, quality, and a WebM export when you need transparency. Because the output is an ordinary video file, embedding it anywhere is trivial - host it and drop in a normal HTML <video> tag, same as any other clip. The source of truth stays in code; what you ship to the page is the rendered file.

Why This Matters for Business

Video is the most expensive content most businesses produce, and the bottleneck is usually tooling and specialists. HyperFrames changes the economics:

  • Templatable at scale. Because a composition is HTML, you can parameterize it and regenerate variants - personalized demos, localized versions via the multilingual TTS, per-account social clips - the same way you template a web page.
  • No license friction. It is Apache-2.0 and open source, with no per-render fees, so volume does not blow up your costs.
  • The whole chain is built in. Voiceover, captions, transitions, overlays, and background removal all live in the tool - you are not stitching five subscriptions together.
  • Agent-native means fast. A marketer can describe a video and have an agent produce an on-brand draft in minutes, then iterate.
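Templating is the point that surprises people most, so here is a small sketch of it. The template text, the account names, and the function name are hypothetical. The only technique shown is filling one composition fragment with per-account values and escaping them, exactly as you would for a web page.

```python
from html import escape
from string import Template

# One title clip with a placeholder for the account name.
TEMPLATE = Template(
    '<h1 data-start="0" data-duration="3" data-track-index="1">'
    "Welcome, $account</h1>"
)


def render_variants(accounts):
    """Map each account name to its own composition markup."""
    return {
        # Escape the value so names like "R&D Labs" stay valid HTML.
        name: TEMPLATE.substitute(account=escape(name))
        for name in accounts
    }


variants = render_variants(["Acme", "R&D Labs"])
```

Each resulting file would then go through the same render command, giving one MP4 per account from a single source template.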

Our Take

HyperFrames is one of those tools that quietly changes what a small team can ship. We use it to turn product updates into explainer clips and to spin up social-ready video without booking a motion designer for every asset. If you want help wiring it into your Claude Code or Codex workflow - or you just want a batch of branded videos built - that is exactly the kind of work we do. Book a free call and we will scope it to your team.

