Commit Graph

2 Commits

Author SHA1 Message Date
Aiden McClelland
2bb1463f4f fix: mitigate tokio I/O driver starvation (tokio-rs/tokio#4730)
Tokio's multi-thread scheduler has an unfixed vulnerability where all
worker threads can end up parked on condvars with no worker driving the
I/O reactor.  Condvar-parked workers have no timeout and sleep
indefinitely, so once in this state the runtime never recovers.

This was observed on a box migrating from 0.3.5.1: after heavy task
churn (package reinstalls, container operations, logging) all 16 workers
ended up on futex_wait with no thread on epoll_wait.  The web server
listened on both HTTP and HTTPS but never replied.  The box was stuck
for 7+ hours with 0% CPU.

Two mitigations:

1. Watchdog OS thread (startd.rs): a plain std::thread that every 30s
   injects a no-op task via Handle::spawn.  This forces a condvar-parked
   worker to wake, cycle through park, and grab the driver TryLock —
   breaking the stall regardless of what triggered it.

2. block_in_place in the logger (logger.rs): the TeeWriter holds a
   std::sync::Mutex across blocking file + stderr writes on worker
   threads.  Wrapping in block_in_place tells tokio to hand off driver
   duties before the worker blocks, reducing the window for starvation.
   Guarded by runtime_flavor() to avoid panicking on current-thread
   runtimes used by the CLI.
2026-03-25 10:14:03 -06:00
Aiden McClelland
96ae532879 Refactor/project structure (#3085)
* refactor project structure

* environment-based default registry

* fix tests

* update build container

* use docker platform for iso build emulation

* simplify compat

* Fix docker platform spec in run-compat.sh

* handle riscv compat

* fix bug with dep error exists attr

* undo removal of sorting

* use qemu for iso stage

---------

Co-authored-by: Mariusz Kogen <k0gen@pm.me>
Co-authored-by: Matt Hill <mattnine@protonmail.com>
2025-12-22 13:39:38 -07:00