
Recovering a !Send Resource on the UI's Tick

cpal::Stream can't move between threads. So I didn't move it.

BROADCAST · Rust · 8 min read

cpal::Stream is !Send. Pulling headphones on a Windows WASAPI setup silently kills the audio stream and the obvious fix — spawn a thread that rebuilds it — is forbidden at the type level. The fix that works: an AtomicBool, the existing 250ms UI tick, and no new thread.

Entropy is a Rust desktop music player. cpal is the audio backend — it owns the stream that delivers samples to the soundcard. On Windows the host API is WASAPI, and the most common way a stream dies isn’t a crash. It’s a user action: unplugging headphones, changing the default output device in Sound settings, undocking a laptop. The OS revokes the stream. cpal forwards the error to a callback you registered. Then nothing. Forever.

Audio is silent until you stop and start a new track, which re-creates the stream from scratch.

This had been in the audit backlog as “Audio device hot-swap glitch on Windows WASAPI” for three releases. Deferred each time, because the obvious fix doesn’t work — and the actual fix took a minute to see.

What I tried first

The normal shape of this kind of recovery:

error callback fires
    → set a flag, or push to a channel
watchdog thread polls
    → drops the dead stream
    → re-acquires the device
    → builds a new stream
    → resumes

It’s the same pattern as reconnecting a websocket, rebuilding a closed file handle, or recovering a poisoned mutex. You can find a hundred examples in the Rust ecosystem.

The wall: cpal’s Stream is !Send.

// paraphrased from cpal's source; the marker field is the part that matters:
pub struct Stream {
    inner: Box<dyn StreamTrait>,
    _marker: std::marker::PhantomData<*mut ()>,
}

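That PhantomData<*mut ()> field makes Stream !Send (and !Sync): raw pointers are neither, and the marker passes that on to the struct. The stream has to be dropped on the thread that built it, so a watchdog thread that owns the Stream is illegal at the type level. Moving it into a thread::spawn closure won’t compile.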
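To see the effect without pulling in cpal, here is a minimal sketch using a hypothetical stand-in type, `PinnedHandle`, that carries the same kind of marker field. The `Probe` helper is also an assumption of this sketch, not part of cpal or std; it relies on an inherent associated const taking priority over a trait const to report whether a concrete type is `Send`.

```rust
use std::marker::PhantomData;
use std::sync::atomic::AtomicBool;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a thread-pinned handle (not cpal's real type).
#[allow(dead_code)]
struct PinnedHandle {
    _not_send: PhantomData<*mut ()>, // raw pointers are neither Send nor Sync
}

/// Illustrative probe: answers "is `T: Send`?" for a concrete `T`.
#[allow(dead_code)]
struct Probe<T: ?Sized>(PhantomData<T>);

/// Fallback answer, available for every `Probe<T>`.
trait Fallback {
    const IS_SEND: bool = false;
}
impl<T: ?Sized> Fallback for Probe<T> {}

/// Inherent const that exists only when `T: Send`; it wins over the trait const.
impl<T: ?Sized + Send> Probe<T> {
    const IS_SEND: bool = true;
}

fn main() {
    // The flag type used for signalling can move between threads freely.
    println!("Arc<AtomicBool>: {}", <Probe<Arc<AtomicBool>>>::IS_SEND);
    // The marker field decides the verdict for the struct...
    println!("PinnedHandle: {}", <Probe<PinnedHandle>>::IS_SEND);
    // ...and wrapping it in a lock and a reference count does not change it.
    println!(
        "Arc<Mutex<PinnedHandle>>: {}",
        <Probe<Arc<Mutex<PinnedHandle>>>>::IS_SEND
    );

    // Uncommenting the next two lines fails to compile, because `*mut ()`
    // cannot be sent between threads safely:
    // let h = PinnedHandle { _not_send: PhantomData };
    // std::thread::spawn(move || drop(h));
}
```

The probe only works on concrete types written out at the call site; inside a generic function it would always pick the fallback.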

Workarounds that don’t survive contact

I went through the usual escape hatches before accepting the constraint.

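Arc<Mutex<Stream>> doesn’t help. Mutex<T> is only Send (and Sync) when T is Send, so the *mut () marker doesn’t disappear behind the lock: Mutex<Stream> is still !Send. Arc<T> in turn needs T to be Send + Sync before it can cross threads, so the Arc makes nothing better.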

Drop the stream from the error callback itself — cpal callbacks run on the audio thread. They’re meant to be short and non-blocking. Dropping a stream from inside its own error callback is also where dragons live (re-entrant cleanup, callback runs the destructor that detaches the callback). Even if it worked, you’ve now done all the cleanup work but can’t build the replacement from there.

A dedicated audio thread that owns the stream and processes commands via a channel — this is the structurally correct answer for a long-lived audio engine. It’s also ~150 LOC of refactor: thread spawn at startup, state migration, lifetime threading, a command enum (Play, Pause, SetVolume, Stop, Recover), reply channels for sync calls, panic handling for the audio thread crashing. The audit item was tagged “low impact, isolated thread.” Spending an afternoon on this would have been correct in the abstract; spending it for this specific symptom wasn’t.
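For comparison, the shape of that dedicated-thread design is sketched below in miniature. Everything in it is hypothetical: `PinnedStream` stands in for the !Send stream, and the `Command` enum is a cut-down version of the one listed above. The property that matters is that the stream is created, rebuilt, and dropped on the one thread that owns it, and only `Send` commands cross the channel.

```rust
use std::marker::PhantomData;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a !Send audio stream.
struct PinnedStream {
    generation: u32,
    _not_send: PhantomData<*mut ()>,
}

/// Cut-down command set; a reply channel carries results back to the caller.
enum Command {
    Recover,
    Generation(mpsc::Sender<u32>),
    Stop,
}

/// Spawns the owning thread. The stream is built inside the thread, so it
/// never has to cross a thread boundary.
fn spawn_audio_thread() -> (mpsc::Sender<Command>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Command>();
    let handle = thread::spawn(move || {
        let mut stream = PinnedStream { generation: 0, _not_send: PhantomData };
        for cmd in rx {
            match cmd {
                // Drop the old stream and build a replacement on this thread.
                Command::Recover => {
                    stream = PinnedStream {
                        generation: stream.generation + 1,
                        _not_send: PhantomData,
                    };
                }
                // Synchronous query: answer over the caller's reply channel.
                Command::Generation(reply) => {
                    let _ = reply.send(stream.generation);
                }
                Command::Stop => break,
            }
        }
    });
    (tx, handle)
}

fn main() {
    let (tx, handle) = spawn_audio_thread();
    tx.send(Command::Recover).unwrap();

    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Command::Generation(reply_tx)).unwrap();
    println!("stream generation: {}", reply_rx.recv().unwrap());

    tx.send(Command::Stop).unwrap();
    handle.join().unwrap();
}
```

Even this toy version needs a command enum, a reply channel, and a join on shutdown, which hints at where the line count of the real refactor comes from.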

What was already there

Going through the codebase looking for somewhere to put the recovery, I noticed something obvious in hindsight.

The UI already ticks. Every 250ms, hooks/src/use_player_task.rs runs a loop that:

  • Polls the player’s position and updates the progress signal
  • Reports playback to Last.fm + Discord + Jellyfin
  • Drives the auto-skip-to-next-track logic at end of file
  • Handles the crossfade ramp in the last N seconds of a track

Some of this work mutates the player. ctrl.player.write().get_position() is called once per tick. So the UI thread already takes a write lock on the Player every quarter second. And the UI thread is the thread that built the stream.

The recovery doesn’t need its own thread. It needs to be one more line in the existing tick.

The diff

The error callback gets a clone of an AtomicBool that lives on the Player:

pub struct Player {
    // ... existing fields ...

    /// Set by the cpal error callback when the device disappears. Polled
    /// from the hooks tick so we can rebuild the stream on the new default
    /// device. Primary trigger: Windows WASAPI default-output hot-swap.
    device_invalidated: Arc<AtomicBool>,
}

The callback closure captures invalidated = self.device_invalidated.clone() when the stream is built. The callback itself is two statements, a log line and an atomic store:

move |err| {
    tracing::warn!("cpal stream error: {err}");
    invalidated.store(true, Ordering::SeqCst);
},

That’s the entire signal mechanism. No channel. No condvar. An atomic write the audio thread can do without allocating, blocking, or contaminating itself with Send requirements it can’t satisfy.
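As a self-contained illustration of why this works, the sketch below builds the same kind of closure outside cpal. `make_error_callback` and the `String` error type are assumptions made for the example; cpal’s real callback receives its own error type. Because the closure captures only an `Arc<AtomicBool>`, it is `Send + 'static` and can be invoked on a thread the flag’s owner never touches.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: returns an error callback that only raises a flag.
fn make_error_callback(invalidated: Arc<AtomicBool>) -> impl FnMut(String) + Send + 'static {
    move |err: String| {
        eprintln!("stream error: {err}");
        // No allocation, no blocking: just one atomic write.
        invalidated.store(true, Ordering::SeqCst);
    }
}

fn main() {
    let device_invalidated = Arc::new(AtomicBool::new(false));
    let mut on_error = make_error_callback(Arc::clone(&device_invalidated));

    // Simulate the audio backend invoking the callback on its own thread.
    thread::spawn(move || on_error("device disconnected".to_string()))
        .join()
        .unwrap();

    // The owning thread observes the flag the next time it polls.
    assert!(device_invalidated.load(Ordering::SeqCst));
}
```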

The recovery itself is a new method on Player. It does nothing unless the flag is set, and it runs on whatever thread polls it, which here is the UI thread that built the stream:

pub fn poll_device_recovery(&mut self) {
    // Consume the flag atomically; on the common path this is the only work.
    if !self.device_invalidated.swap(false, Ordering::SeqCst) {
        return;
    }
    // No consumer means no playback to recover. `_stream` is deliberately
    // not checked here: a failed attempt below leaves it as `None` and
    // re-arms the flag, and bailing out on that would swallow the retry.
    if self.ring_buf_consumer.is_none() {
        return;
    }
    tracing::warn!("audio device invalidated; rebuilding stream");

    self._stream = None; // drop the dead stream on the thread that built it

    // Discard buffered samples so playback resumes near real-time on the
    // new device instead of replaying the moment of disconnect.
    if let Some(consumer) = &self.ring_buf_consumer {
        if let Ok(c) = consumer.lock() {
            let mut buf = [0.0f32; 4096];
            while c.read(&mut buf).unwrap_or(0) > 0 {}
        }
    }

    // Requires cpal::traits::{HostTrait, StreamTrait} in scope.
    let device = match cpal::default_host().default_output_device() {
        Some(d) => d,
        None => {
            // No device yet (mid-swap): re-arm so we try again next tick.
            self.device_invalidated.store(true, Ordering::SeqCst);
            return;
        }
    };

    match self.build_cpal_stream(&device) {
        Ok(stream) => {
            if stream.play().is_err() {
                // Built but would not start: re-arm and retry next tick.
                self.device_invalidated.store(true, Ordering::SeqCst);
                return;
            }
            self._stream = Some(stream);
            self._device = Some(device);
        }
        Err(_) => {
            // Build failed: re-arm and retry next tick.
            self.device_invalidated.store(true, Ordering::SeqCst);
        }
    }
}

The call site adds one line to the existing tick:

ctrl.player.write().poll_device_recovery();
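To make the control flow concrete, here is a minimal, self-contained model of the tick-side polling with no audio involved. `FakeStream`, `FakePlayer`, and the `device_available` toggle are hypothetical stand-ins for the cpal stream, the real Player, and the OS default device. It is a sketch of the pattern under those assumptions, not the Entropy implementation.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical !Send stream stand-in.
struct FakeStream {
    _not_send: PhantomData<*mut ()>,
}

struct FakePlayer {
    stream: Option<FakeStream>,
    device_invalidated: Arc<AtomicBool>,
    /// Simulates whether the OS currently reports a default output device.
    device_available: bool,
    rebuilds: u32,
}

impl FakePlayer {
    fn new() -> Self {
        FakePlayer {
            stream: Some(FakeStream { _not_send: PhantomData }),
            device_invalidated: Arc::new(AtomicBool::new(false)),
            device_available: true,
            rebuilds: 0,
        }
    }

    /// Called once per tick on the thread that owns the stream.
    fn poll_device_recovery(&mut self) {
        // Consume the flag; a cheap no-op on the common path.
        if !self.device_invalidated.swap(false, Ordering::SeqCst) {
            return;
        }
        self.stream = None; // drop the dead stream on its owning thread
        if !self.device_available {
            // Mid-swap: re-arm so the next tick retries.
            self.device_invalidated.store(true, Ordering::SeqCst);
            return;
        }
        self.stream = Some(FakeStream { _not_send: PhantomData });
        self.rebuilds += 1;
    }
}

fn main() {
    let mut player = FakePlayer::new();
    let flag = Arc::clone(&player.device_invalidated);

    flag.store(true, Ordering::SeqCst); // what the error callback would do
    player.device_available = false;

    player.poll_device_recovery(); // tick 1: no device yet, flag re-armed
    assert!(player.stream.is_none());

    player.device_available = true;
    player.poll_device_recovery(); // tick 2: rebuild succeeds
    assert!(player.stream.is_some());

    player.poll_device_recovery(); // tick 3: flag is clear, nothing to do
    assert_eq!(player.rebuilds, 1);
}
```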

Net change for the feature: ~120 LOC including the extracted build_cpal_stream helper that both the initial build and the recovery share. The recovery path itself is about 30 of those lines.

Side effects worth keeping

A few things fell out for free.

Self-healing across multiple swap events. If recovery itself fails — usually because there’s no default output device yet, in the brief window between unplugging USB headphones and the system switching to internal speakers — the recovery flips the flag back on before returning. The next tick tries again. No retry-with-backoff scaffolding. Tick cadence is the backoff.

Ring buffer reuse. The decoder thread doesn’t know any of this happened. It keeps writing samples into the same ring buffer at the same rate. After recovery, the new stream’s callback drains the same buffer. Decoder state isn’t disturbed. Playback position isn’t reset. The only thing the user notices is a glitch of bounded duration.

Bounded glitch. When a replacement device is already available, the audible silence is the gap between the hot-swap event and the next tick, at most 250ms, plus the time it takes to build the new stream. If the device isn’t there yet, each retry adds one more tick. The brief replay of stale buffered samples that would have happened without the drain is eliminated by the buffer flush. The transition sounds less weird than a quarter-second of “the moment you unplugged the headphones” playing back through the laptop speakers a beat too late.

No new thread to tear down. Recovery state has the same lifetime as the Player. When the Player drops, the Arc<AtomicBool> drops, and the closure that captured a clone of it drops with the dead stream. Done. No JoinHandle. No shutdown signal. No graceful-stop handshake.

What I’d write down for next time

The thread that owns the resource is doing other work. When you can’t share a !Send resource, the answer isn’t always “introduce a thread that owns it”. Look at the threads that already exist, starting with the one the resource is pinned to. If it ticks regularly, hand it one more responsibility for free. Single-thread ownership is the cheapest form of synchronisation, and a UI thread that polls position is already doing exactly the kind of “wake up every quarter second and check on things” work that recovery needs.

An atomic flag is the smallest API for “I need help”. Channels and condvars are right when you have data to deliver. When the only payload is “something is wrong, please look”, an AtomicBool::store is cheaper everywhere — no allocations, no lifetimes to worry about, no Send contamination on the producer side. The audio thread can store into an AtomicBool from inside an FFI callback that wouldn’t tolerate a channel send.

Self-arming flags eat retry logic. If recovery fails for a transient reason (device temporarily unavailable mid-swap), set the flag back to true. The polling site is already there, and the tick is already running. You get retry-on-cadence without writing a retry loop, without choosing a backoff curve, without timer scaffolding. For transient unavailability — the only kind that happens during a hot-swap — tick cadence usually catches it on the second or third attempt.

Recovery is allowed to drop in-flight audio. I considered preserving the buffered samples and resuming exactly where the stream died. It’s a worse experience. By the time the new device is available, those samples are 200-500ms stale. Draining the consumer means playback resumes near real-time on the new output instead of dragging the user backwards in time. The user lost some audio at the moment of disconnect; that’s already gone. Reaching backward to recover it just produces a stutter on top of the loss.
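As a rough illustration of that trade-off, the sketch below models the consumer side with a `VecDeque<f32>` (a stand-in; the real ring buffer type isn’t shown in this post) and assumes 48 kHz interleaved stereo, which is an illustrative figure rather than Entropy’s actual format. It computes how much audio a backlog represents, then discards it in fixed-size chunks, the same shape as the drain loop in the recovery method.

```rust
use std::collections::VecDeque;

/// Milliseconds of audio represented by `samples` interleaved samples.
fn backlog_ms(samples: usize, sample_rate: u32, channels: u16) -> f64 {
    let frames = samples as f64 / channels as f64;
    frames / sample_rate as f64 * 1000.0
}

/// Reads up to `buf.len()` samples into `buf`, returning how many were read.
fn read_chunk(queue: &mut VecDeque<f32>, buf: &mut [f32]) -> usize {
    let n = buf.len().min(queue.len());
    for (slot, sample) in buf.iter_mut().zip(queue.drain(..n)) {
        *slot = sample;
    }
    n
}

/// Discards everything currently buffered; returns the number of samples dropped.
fn drain_stale(queue: &mut VecDeque<f32>) -> usize {
    let mut scratch = [0.0f32; 4096];
    let mut dropped = 0;
    loop {
        let n = read_chunk(queue, &mut scratch);
        if n == 0 {
            break;
        }
        dropped += n;
    }
    dropped
}

fn main() {
    // 28_800 interleaved stereo samples at 48 kHz is 300 ms of audio.
    let mut queue: VecDeque<f32> = std::iter::repeat(0.25).take(28_800).collect();
    println!("backlog: {:.0} ms", backlog_ms(queue.len(), 48_000, 2));

    let dropped = drain_stale(&mut queue);
    println!("dropped {dropped} samples, {} left", queue.len());
}
```

Under these assumptions a single 4096-sample scratch read covers roughly 43 ms of audio, so a backlog in the 200-500ms range takes only a handful of iterations to discard.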

!Send is a constraint, not a problem. Spending the afternoon refactoring the audio subsystem into its own thread would have been the wrong answer for this symptom — bigger blast radius, longer review, more places to introduce regressions. The point of !Send here is that cpal can’t let you move the stream because the OS handle is bound to the calling thread on some platforms. The right move is to respect the constraint and look for the existing thread that already satisfies it, not to spawn a new one and try to negotiate.

The change shipped in v0.5.4 (876db25). Code at player/src/player.rs and hooks/src/use_player_task.rs.
