🔧

Remotion Internal Architecture — The Pipeline from Frames to Video

Dissecting the rendering pipeline: Headless Chrome → screenshots → FFmpeg encoding

Remotion turns React code into video by combining existing web technologies in three stages: bundle, capture, and encode.

First, Webpack (with esbuild handling transpilation) builds the React components into a browser-executable bundle. This bundle includes the video metadata: fps, resolution, and total frame count.
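As a rough illustration of that metadata, the sketch below models it as a plain object and derives two values the later stages need. The field names match the ones listed in this article (fps, width, height, durationInFrames); the `VideoConfig` interface and the helper functions are assumptions made for this example, not Remotion's own types.

```typescript
// Metadata fields named in this article; the interface itself is an
// illustrative assumption, not a type exported by Remotion.
interface VideoConfig {
  fps: number;
  width: number;
  height: number;
  durationInFrames: number;
}

// Length of the finished video in seconds.
function durationInSeconds(config: VideoConfig): number {
  return config.durationInFrames / config.fps;
}

// Index of the last frame to render (frames are numbered from 0).
function lastFrameIndex(config: VideoConfig): number {
  return config.durationInFrames - 1;
}

const config: VideoConfig = { fps: 30, width: 1920, height: 1080, durationInFrames: 150 };
console.log(durationInSeconds(config)); // 5
console.log(lastFrameIndex(config)); // 149
```

A 150-frame composition at 30 fps is therefore a 5-second video whose frames run from 0 to 149.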

Next, Puppeteer launches Headless Chrome, loads the bundled page, and steps the frame number from 0 to the last frame, capturing each frame as a PNG or JPEG screenshot. The delayRender()/continueRender() API holds back a screenshot until asynchronous work (API calls, font loading, and so on) has finished.
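The gating idea behind delayRender()/continueRender() can be pictured as a set of open handles that must all be cleared before a frame is captured. The following is a minimal model of that idea only; the `RenderGate` class and its `isReady` method are hypothetical names for this sketch and do not reflect how Remotion implements the API internally.

```typescript
// Hypothetical model of the delayRender()/continueRender() gate.
// It mirrors the idea only, not Remotion's actual implementation.
class RenderGate {
  private nextHandle = 0;
  private pending: Record<number, string> = {};

  // Registers a blocker and returns a handle that identifies it.
  delayRender(label: string): number {
    const handle = this.nextHandle++;
    this.pending[handle] = label;
    return handle;
  }

  // Clears a blocker; unknown or already cleared handles are rejected.
  continueRender(handle: number): void {
    if (!(handle in this.pending)) {
      throw new Error("Unknown or already cleared handle: " + handle);
    }
    delete this.pending[handle];
  }

  // The renderer may take the screenshot only when nothing is pending.
  isReady(): boolean {
    return Object.keys(this.pending).length === 0;
  }
}

const gate = new RenderGate();
const fontHandle = gate.delayRender("load font");
const dataHandle = gate.delayRender("fetch API data");

gate.continueRender(fontHandle);
const readyAfterFont = gate.isReady(); // data is still pending

gate.continueRender(dataHandle);
const readyAfterData = gate.isReady(); // every handle has been cleared

console.log(readyAfterFont, readyAfterData); // false true
```

Using one handle per async task means a slow font and a slow API call can finish in any order; the frame is captured only after the last one clears.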

Finally, FFmpeg takes the image sequence and encodes it to H.264 (MP4) or VP8/VP9 (WebM). If there's an audio track, it's muxed together.
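To make the encoding stage concrete, here is a sketch that assembles an FFmpeg argument list for an image sequence. The flags (`-framerate`, `-i`, `-c:v`, `-pix_fmt`, `-c:a`, `-shortest`) and encoder names (`libx264`, `libvpx`, `libvpx-vp9`, `aac`, `libopus`) are standard FFmpeg options, but the `EncodeJob` shape, the `buildFfmpegArgs` helper, and the frame file naming are assumptions for illustration. This is not the exact command Remotion issues.

```typescript
type Codec = "h264" | "vp8" | "vp9";

// Hypothetical job description for this sketch.
interface EncodeJob {
  fps: number;
  framePattern: string; // printf-style pattern, e.g. "frames/frame-%05d.png" (assumed naming)
  codec: Codec;
  audioPath?: string;
  output: string;
}

// FFmpeg encoder name for each codec mentioned in the article.
const VIDEO_ENCODERS: Record<Codec, string> = {
  h264: "libx264",
  vp8: "libvpx",
  vp9: "libvpx-vp9",
};

function buildFfmpegArgs(job: EncodeJob): string[] {
  // -framerate is an input option, so it must precede the image sequence input.
  const args = ["-y", "-framerate", String(job.fps), "-i", job.framePattern];
  if (job.audioPath !== undefined) {
    args.push("-i", job.audioPath);
  }
  args.push("-c:v", VIDEO_ENCODERS[job.codec]);
  if (job.codec === "h264") {
    // yuv420p keeps H.264 output playable in most players.
    args.push("-pix_fmt", "yuv420p");
  }
  if (job.audioPath !== undefined) {
    // AAC is the usual audio codec for MP4, Opus for WebM.
    args.push("-c:a", job.codec === "h264" ? "aac" : "libopus", "-shortest");
  }
  args.push(job.output);
  return args;
}

const mp4Args = buildFfmpegArgs({
  fps: 30,
  framePattern: "frames/frame-%05d.png",
  codec: "h264",
  audioPath: "audio.wav",
  output: "out.mp4",
});
console.log("ffmpeg " + mp4Args.join(" "));
```

Returning an array rather than a single string makes the result safe to pass to a process spawner without shell quoting problems.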

Parallel rendering is also supported. The --concurrency option renders several frames at once by opening multiple browser tabs, which spreads frame processing across CPU cores.
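One simple way to picture how frames could be distributed is to split the frame range into contiguous chunks, one per worker. The sketch below does exactly that. The `splitFrames` function and the chunking strategy are assumptions chosen for clarity; Remotion's real scheduler may hand out frames differently.

```typescript
// Inclusive frame range assigned to one worker (hypothetical type).
interface FrameRange {
  start: number;
  end: number;
}

// Splits frames 0..durationInFrames-1 into near-equal contiguous ranges.
function splitFrames(durationInFrames: number, concurrency: number): FrameRange[] {
  if (durationInFrames < 1 || Math.floor(durationInFrames) !== durationInFrames) {
    throw new Error("durationInFrames must be a positive integer");
  }
  if (concurrency < 1 || Math.floor(concurrency) !== concurrency) {
    throw new Error("concurrency must be a positive integer");
  }
  // Never create more workers than there are frames.
  const workers = Math.min(concurrency, durationInFrames);
  const base = Math.floor(durationInFrames / workers);
  const extra = durationInFrames % workers;

  const ranges: FrameRange[] = [];
  let start = 0;
  for (let i = 0; i < workers; i++) {
    // The first `extra` workers take one additional frame each.
    const size = base + (i < extra ? 1 : 0);
    ranges.push({ start: start, end: start + size - 1 });
    start += size;
  }
  return ranges;
}

const ranges = splitFrames(10, 4);
console.log(JSON.stringify(ranges));
// [{"start":0,"end":2},{"start":3,"end":5},{"start":6,"end":7},{"start":8,"end":9}]
```

Whatever the distribution strategy, the frames must be reassembled in order before encoding, which is why each range records explicit start and end indices.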

How It Works

1. Webpack (with esbuild for transpilation) bundles the React components + metadata (fps, width, height, durationInFrames)
2. Puppeteer launches a Headless Chrome instance and loads the bundled HTML page
3. Iterate from frame 0 to the last frame: inject currentFrame → React re-render → screenshot (PNG/JPEG)
4. If delayRender() is active, wait for continueRender() before screenshotting (async safety)
5. FFmpeg encodes the image sequence + audio track to H.264 or VP8/VP9 for the final MP4/WebM output

Pros

  • Built on widely used, proven tools: Chrome for rendering and FFmpeg for encoding
  • Everything the browser can render (CSS/SVG/Canvas/WebGL) becomes a video frame
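  • delayRender holds each screenshot until async data is ready → prevents blank frames in data-driven videos (a handle that is never cleared fails the render after a timeout, 30 seconds by default)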

Cons

  • Chrome dependency: cannot render in environments without Headless Chrome
  • Memory usage: holding full HD frames in memory means high RAM consumption for long videos
  • Complex debugging: hard to pinpoint whether a rendering error originates in Puppeteer, Chrome, or FFmpeg
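
The memory concern can be put into rough numbers. The sketch below estimates the size of uncompressed frames, assuming 4 bytes per pixel (RGBA). Both that assumption and the helper names are illustrative; actual usage depends on image format, compression, and how many frames are held at once.

```typescript
// Assumption: uncompressed RGBA, 1 byte per channel.
const BYTES_PER_PIXEL_RGBA = 4;

// Size of one uncompressed frame in bytes.
function rawFrameBytes(width: number, height: number): number {
  return width * height * BYTES_PER_PIXEL_RGBA;
}

function toMiB(bytes: number): number {
  return bytes / (1024 * 1024);
}

// Memory needed to hold `framesInMemory` uncompressed frames at once, in MiB.
function bufferedFramesMiB(width: number, height: number, framesInMemory: number): number {
  return toMiB(rawFrameBytes(width, height) * framesInMemory);
}

const oneFrameBytes = rawFrameBytes(1920, 1080);
const thirtyFramesMiB = bufferedFramesMiB(1920, 1080, 30);

console.log(oneFrameBytes); // 8294400
console.log(toMiB(oneFrameBytes).toFixed(2)); // "7.91"
console.log(thirtyFramesMiB.toFixed(1)); // "237.3"
```

Under these assumptions, one second of Full HD video at 30 fps held uncompressed takes roughly 237 MiB, which is why buffering strategy matters for long renders.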

Use Cases

  • Diagnosing and optimizing Remotion rendering pipeline bottlenecks
  • Resource (CPU/memory) estimation for rendering automation in CI/CD
  • Understanding criteria for choosing Lambda vs local rendering