Remotion Audio & Subtitles — Synchronizing Sound and Visuals
Synchronize audio and video sources to frames with <Audio>, <OffthreadVideo>, and useAudioData()
Synchronizing sound and visuals is a core element of video.
<Audio>: Inserts MP3/WAV files into the video. Use the startFrom/endAt props to play a specific audio segment, or pass a function to the volume prop for per-frame volume control. Placing <Audio> inside a <Sequence> plays the audio only during that scene.
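A minimal sketch of these props together (the file name, frame offsets, and fade length are illustrative, not from the source):

```tsx
import React from 'react';
import {Audio, Sequence, interpolate, staticFile} from 'remotion';

// Plays a slice of narration.mp3 (an assumed asset) during one scene,
// fading the volume in over the sequence's first 30 frames.
export const NarrationScene: React.FC = () => {
  return (
    <Sequence from={60} durationInFrames={120}>
      <Audio
        src={staticFile('narration.mp3')}
        startFrom={30} // skip the first 30 frames of the audio file
        endAt={150}    // stop reading the file at frame 150
        volume={(f) =>
          interpolate(f, [0, 30], [0, 1], {extrapolateRight: 'clamp'})
        }
      />
    </Sequence>
  );
};
```

Because the component sits inside the <Sequence>, the frame `f` passed to the volume callback is relative to the sequence start, so the fade begins exactly when the scene does.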
useAudioData() + getWaveformPortion(): Extracts waveform data from an audio file so the current frame's amplitude can drive the visuals. This enables audio visualizers whose bars dance to the music.
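A sketch of a bar visualizer built from these two functions (the asset name, bar count, and sizing are arbitrary assumptions):

```tsx
import React from 'react';
import {staticFile, useCurrentFrame, useVideoConfig} from 'remotion';
import {getWaveformPortion, useAudioData} from '@remotion/media-utils';

// Renders bars whose heights follow the waveform around the current frame.
export const Visualizer: React.FC = () => {
  const frame = useCurrentFrame();
  const {fps} = useVideoConfig();
  const audioData = useAudioData(staticFile('music.mp3'));

  if (!audioData) {
    return null; // waveform not loaded yet
  }

  // Sample the waveform for the duration of exactly one frame.
  const bars = getWaveformPortion({
    audioData,
    startTimeInSeconds: frame / fps,
    durationInSeconds: 1 / fps,
    numberOfSamples: 32,
  });

  return (
    <div style={{display: 'flex', alignItems: 'flex-end', height: 200}}>
      {bars.map((bar) => (
        <div
          key={bar.index}
          style={{
            width: 8,
            marginRight: 2,
            background: 'white',
            height: bar.amplitude * 200, // amplitude drives bar height
          }}
        />
      ))}
    </div>
  );
};
```

Returning null until useAudioData() resolves matters: the hook loads asynchronously, and Remotion re-renders the frame once the data is available.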
@remotion/captions: Synchronizes timestamped subtitle data generated by TTS engines or Whisper with frame precision.
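The core of frame-precise subtitles is mapping the current frame to a timestamp range. A pure-function sketch (the Caption shape with startMs/endMs mirrors @remotion/captions; the sample data and helper name are made up):

```typescript
// Picks the caption visible at a given frame by converting frames to ms.
type Caption = {text: string; startMs: number; endMs: number};

const captionAtFrame = (
  captions: Caption[],
  frame: number,
  fps: number
): Caption | null => {
  const ms = (frame / fps) * 1000;
  return captions.find((c) => ms >= c.startMs && ms < c.endMs) ?? null;
};

const captions: Caption[] = [
  {text: 'Hello', startMs: 0, endMs: 1000},
  {text: 'world', startMs: 1000, endMs: 2000},
];

// At 30 fps, frame 45 falls at 1500 ms, inside the second caption.
console.log(captionAtFrame(captions, 45, 30)?.text); // → world
```

Inside a composition, you would call this with useCurrentFrame() and the fps from useVideoConfig() and render the returned text.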
How It Works
Insert an audio track into the video with <Audio src={audioUrl} /> (can be placed inside a <Sequence>)
Pass (f) => interpolate(f, [0, 30], [0, 1]) to the volume prop for a fade-in effect
Load waveform data with useAudioData(), then extract the current frame's amplitude with getWaveformPortion()
Feed the amplitude data into SVG/CSS styles to render an audio visualizer
Parse subtitle JSON with @remotion/captions and display the subtitle text that matches the current frame
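The per-frame volume step above extends naturally to a crossfade: one track ramps down while the next ramps up over an overlap window. A pure-function sketch of the gain curves (the linear ramp and 30-frame overlap are illustrative assumptions):

```typescript
// Linear crossfade gains over an overlap window: the outgoing track ramps
// 1 → 0 while the incoming track ramps 0 → 1. `frame` is relative to the
// start of the overlap; values outside the window are clamped.
const crossfade = (frame: number, overlapFrames: number) => {
  const t = Math.min(Math.max(frame / overlapFrames, 0), 1);
  return {outgoing: 1 - t, incoming: t};
};

// Midway through a 30-frame overlap, both tracks sit at half volume.
console.log(crossfade(15, 30)); // → { outgoing: 0.5, incoming: 0.5 }
```

In Remotion you would pass `(f) => crossfade(f, 30).outgoing` and `(f) => crossfade(f, 30).incoming` as the volume props of the two overlapping <Audio> elements.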
Pros
- ✓ Frame precision: audio/subtitles/visuals sync to exactly the same frame
- ✓ OffthreadVideo composites existing footage without blocking the main thread
- ✓ Per-frame volume control via a volume function enables natural crossfades
Cons
- ✗ Audio waveform analysis is CPU-intensive, so long audio files degrade render performance
- ✗ Timestamped subtitle data for sync must be prepared separately (e.g., via Whisper transcription)