Demo page

UNIVERSR: UNIFIED AND VERSATILE AUDIO SUPER RESOLUTION VIA VOCODER-FREE FLOW MATCHING

Submitted to ICASSP 2026


Authors

Woongjib Choi, Sangmin Lee, Hyungseob Lim, Hong-Goo Kang


Abstract

In this paper, we present a vocoder-free framework for audio super-resolution that employs a flow matching generative model to capture the conditional distribution of complex-valued spectral coefficients. Unlike conventional two-stage diffusion-based approaches that predict a mel-spectrogram and then rely on a pre-trained neural vocoder to synthesize waveforms, our method directly reconstructs waveforms via the inverse Short-Time Fourier Transform (iSTFT), thereby eliminating the dependence on a separate vocoder. This design not only simplifies end-to-end optimization but also overcomes a critical bottleneck of two-stage pipelines, where the final audio quality is fundamentally constrained by vocoder performance. Experiments show that our model consistently produces high-fidelity 48 kHz audio across diverse upsampling factors, achieving state-of-the-art performance on both speech and general audio datasets.
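To make the training idea concrete, here is a minimal sketch of how a flow matching regression target could be built on complex-valued spectral coefficients. It is an illustration under stated assumptions, not the authors' implementation: the linear interpolation path, the Gaussian noise prior, and the function names `sample_complex_noise`, `flow_matching_pair`, and `flow_matching_loss` are all hypothetical choices that the abstract does not specify. A real system would apply this to STFT frames and regress the target with a neural network.

```python
import random


def sample_complex_noise(n, rng):
    """Draw n complex values with independent Gaussian real and imaginary parts."""
    return [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def flow_matching_pair(x1, t, rng):
    """Build one training pair for a target coefficient vector x1.

    Assumption: a linear path x_t = (1 - t) * x0 + t * x1 from noise x0 to data x1,
    whose velocity along the path is the constant x1 - x0.
    """
    x0 = sample_complex_noise(len(x1), rng)
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target_velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, target_velocity


def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target complex velocities."""
    return sum(abs(p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for a few high-resolution STFT coefficients.
    x1 = [1 + 2j, -0.5j, 3 + 0j]
    x_t, v = flow_matching_pair(x1, t=0.25, rng=rng)
    # A perfect estimator would predict v exactly and reach zero loss.
    print(flow_matching_loss(v, v))
```

Under this assumed path, the network only ever regresses a velocity in the complex spectral domain, so the waveform can be recovered with an iSTFT after sampling rather than with a separate vocoder.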


Pipeline of UniverSR


Overview of UniverSR: (a) training stage, (b) inference stage, (c) vector field estimator architecture, and (d) feature encoder architecture. The ODE solver consists of the feature encoder and the vector field estimator.
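The inference stage in the caption can be sketched as a fixed-step ODE integration from noise to spectral coefficients, conditioned on features of the low-resolution input. The sketch below is a toy under stated assumptions: the Euler scheme, the step count, and the names `toy_feature_encoder`, `toy_vector_field`, and `euler_sample` are hypothetical, and both components are closed-form stand-ins for what would be neural networks. The final iSTFT step that turns the coefficients into a waveform is omitted.

```python
import random


def toy_feature_encoder(low_res_coeffs):
    """Hypothetical stand-in for the feature encoder: passes the input through."""
    return list(low_res_coeffs)


def toy_vector_field(x, t, cond):
    """Hypothetical stand-in for the vector field estimator.

    Uses the closed-form field of a linear path that ends at `cond`,
    so the example runs without any trained model.
    """
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]


def euler_sample(low_res_coeffs, num_steps, rng,
                 encoder=toy_feature_encoder, field=toy_vector_field):
    """Integrate dx/dt = field(x, t, cond) from t = 0 to t = 1 with Euler steps."""
    cond = encoder(low_res_coeffs)  # conditioning is computed once
    # Start from complex Gaussian noise with the same length as the conditioning.
    x = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in cond]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the toy field is well defined
        v = field(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(1)
    coeffs = euler_sample([0.5 + 0.5j, -1 + 2j], num_steps=8, rng=rng)
    print(coeffs)
```

In this toy the field simply pulls the sample toward the conditioning vector; a trained estimator would instead produce the missing high-frequency content.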


Audio Super Resolution in Speech Domain

Ground Truth Ground Truth (Vocoded)
8 → 48 kHz 12 → 48 kHz 16 → 48 kHz 24 → 48 kHz
Input
AudioSR
FlashSR
UniverSR (Proposed)


Audio Super Resolution in Music Domain

Ground Truth Ground Truth (Vocoded)
8 → 48 kHz 12 → 48 kHz 16 → 48 kHz 24 → 48 kHz
Input
AudioSR
FlashSR
UniverSR (Proposed)


Audio Super Resolution in Sound Effect Domain

Ground Truth Ground Truth (Vocoded)
8 → 48 kHz 12 → 48 kHz 16 → 48 kHz 24 → 48 kHz
Input
AudioSR
FlashSR
UniverSR (Proposed)


Comparison with Speech Super Resolution Models

Ground Truth Ground Truth (Vocoded)
8 → 48 kHz 12 → 48 kHz 16 → 48 kHz 24 → 48 kHz
Input
Fre-Painter
FlowHigh
NU-Wave2
UDM+
UniverSR (Proposed)