WAV vs MP3 Stem Separation: Controlled Test Results

By CleanStems Editorial | Tested and published May 26, 2026

Important limitation: this is a synthetic stress test, not a sung performance. The lead is a vibrating tone melody used as a known reference. It tests encoding sensitivity and whether a non-human melodic lead is classified as a vocal, not real-world vocal-removal quality.

Test Method

We generated an 8-second, mono 22.05 kHz lead-melody reference and a separate synthesized backing reference.
We mixed those references into lossless WAV, then encoded the same mix as a 128 kbps MP3.
On May 26, 2026, we submitted each input to the same public separation processor currently embedded in the CleanStems vocal tool.
The processor returned vocal and instrumental WAV files at 44.1 kHz stereo. We measured output RMS level in dBFS and made all clips available to audition.
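The RMS measurement in the last step can be sketched as follows. This is a minimal illustration, not the script used for the test: it assumes levels are referenced to a full-scale amplitude of 1.0 (so a full-scale sine reads about -3.01 dBFS), that stereo channels are pooled into one figure, and that the files are 16-bit PCM. The function names `rms_dbfs` and `read_pcm16_wav` are hypothetical.

```python
import math
import struct
import wave


def rms_dbfs(samples):
    """RMS level in dB relative to a full-scale amplitude of 1.0.

    Assumption: a sample value of 1.0 is the 0 dBFS reference.
    Returns -inf for digital silence.
    """
    if not samples:
        raise ValueError("no samples to measure")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def read_pcm16_wav(path):
    """Read a 16-bit PCM WAV file and return its samples as floats.

    Channels stay interleaved, so a stereo file is measured as one
    pooled stream (an assumption of this sketch).
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wav.readframes(wav.getnframes())
    count = len(raw) // 2
    values = struct.unpack("<%dh" % count, raw)
    return [v / 32768.0 for v in values]


if __name__ == "__main__":
    # One second of a half-scale 440 Hz sine at 44.1 kHz as a sanity check.
    rate = 44100
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    print(round(rms_dbfs(tone), 2))
```

A different reference convention (for example, treating a full-scale sine as 0 dBFS) would shift every reading by the same constant, so differences between stems are unaffected.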

Reference and Inputs

Lead melody reference

Known synthetic lead proxy.

Backing reference

Known synthesized accompaniment.

WAV mixed input

Lossless test input, -23.55 dBFS RMS.

MP3 mixed input

Same mix encoded at 128 kbps.

Returned Outputs to Audition

WAV input: vocals output

Returned vocal stem from lossless input.

WAV input: instrumental output

Returned non-vocal stem from lossless input.

MP3 input: vocals output

Returned vocal stem from lossy input.

MP3 input: instrumental output

Returned non-vocal stem from lossy input.

Measured Result

Returned output	WAV input	128 kbps MP3 input
Vocals stem RMS level	-82.08 dBFS	-77.51 dBFS
Instrumental stem RMS level	-25.73 dBFS	-25.22 dBFS

What This Result Shows

The synthetic melody was not treated as a human vocal in this run. Almost all audible material remained in the instrumental output, while both vocal outputs were near-silent, more than 50 dB below their instrumental counterparts. The MP3 run left a slightly louder vocal-output residue than the WAV run, a difference of 4.57 dB in this near-silent output.
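The arithmetic behind these comparisons can be checked with a short script. The levels are copied from the Measured Result table; the helper names are hypothetical and exist only for this illustration.

```python
# RMS levels in dBFS, copied from the Measured Result table.
LEVELS = {
    "vocals": {"wav": -82.08, "mp3": -77.51},
    "instrumental": {"wav": -25.73, "mp3": -25.22},
}


def mp3_minus_wav_db(stem):
    """How much louder (in dB) the MP3-run stem is than the WAV-run stem."""
    return round(LEVELS[stem]["mp3"] - LEVELS[stem]["wav"], 2)


def vocal_below_instrumental_db(source):
    """How far the vocal stem sits below the instrumental stem for one input."""
    return round(LEVELS["instrumental"][source] - LEVELS["vocals"][source], 2)


def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


if __name__ == "__main__":
    for stem in LEVELS:
        print(stem, "MP3 minus WAV:", mp3_minus_wav_db(stem), "dB")
    for source in ("wav", "mp3"):
        print(source, "vocal below instrumental:",
              vocal_below_instrumental_db(source), "dB")
    print("4.57 dB as amplitude ratio:", round(db_to_amplitude_ratio(4.57), 2))
```

The 4.57 dB gap corresponds to an amplitude ratio of roughly 1.7, but both vocal stems remain more than 50 dB below the instrumental stems, which is why the difference concerns residue rather than audible separated content.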

This does not establish that WAV will always produce cleaner separated vocals on real songs. It establishes a narrower but useful fact: a simple non-human lead tone is not a fair proxy for judging singing separation, and lossy encoding can alter low-level residual output in this controlled case.

Next Quality Test Needed

The next publishable comparison should use a short recording with separate authorized human vocal and accompaniment reference stems. That would allow listening tests and reference-based separation measurements for a task the tool is actually meant to perform.
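One reference-based measurement that such a test could use is a signal-to-distortion ratio (SDR) between the known vocal reference and the returned vocal stem. The sketch below is a plain energy-ratio SDR, not the BSS Eval variant, and it is not a measurement CleanStems has run. It assumes both signals are already time-aligned, have the same sample rate and channel layout, and are given as equal-length lists of floats. In the test above, the references were 22.05 kHz mono and the outputs 44.1 kHz stereo, so a resampling and downmix step would be needed first. The name `sdr_db` is hypothetical.

```python
import math


def sdr_db(reference, estimate):
    """Plain SDR in dB: reference energy divided by error energy.

    Assumes the two signals are aligned and share sample rate and length.
    Higher is better; +inf means the estimate matches the reference exactly.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have equal length")
    signal_energy = sum(r * r for r in reference)
    if signal_energy == 0.0:
        raise ValueError("reference is silent, SDR is undefined")
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0.0:
        return float("inf")
    return 10.0 * math.log10(signal_energy / error_energy)


if __name__ == "__main__":
    # Toy example: the estimate is the reference scaled to 90% amplitude.
    ref = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
    est = [0.9 * r for r in ref]
    print(round(sdr_db(ref, est), 2))
```

A near-silent vocal stem like the ones returned here would score close to 0 dB on this measure, because the error is almost the whole reference signal.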

All source and returned clips on this page are published for repeatable inspection. The synthesized source material was created for CleanStems and contains no commercial recording.
