Tokio's multi-thread scheduler has an unfixed liveness bug: all worker
threads can end up parked on condvars with no worker driving the I/O
reactor. Condvar-parked workers wait with no timeout and sleep
indefinitely, so once the runtime enters this state it never recovers
on its own.
This was observed on a box migrating from 0.3.5.1: after heavy task
churn (package reinstalls, container operations, logging), all 16
workers were blocked in futex_wait and no thread sat in epoll_wait. The
web server was listening on both HTTP and HTTPS but never replied; the
box sat at 0% CPU for 7+ hours.
Two mitigations:
1. Watchdog OS thread (startd.rs): a plain std::thread that injects a
no-op task via Handle::spawn every 30s. This forces a condvar-parked
worker to wake, run through its park loop, and contend for the driver
TryLock, breaking the stall regardless of what triggered it.
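The watchdog shape can be sketched with std alone (names are illustrative, not the real startd.rs; in the real mitigation the callback is `handle.spawn(async {})` on the runtime's Handle):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Spawn a plain OS thread that fires `wake` every `period`. In the real
/// watchdog, `wake` injects a no-op task into the runtime, which wakes a
/// condvar-parked worker and sends it back through its park loop, where it
/// can re-acquire the I/O driver lock.
fn spawn_watchdog(period: Duration, wake: impl Fn() + Send + 'static) -> Arc<AtomicBool> {
    let stop = Arc::new(AtomicBool::new(false));
    let stop_flag = Arc::clone(&stop);
    thread::spawn(move || {
        while !stop_flag.load(Ordering::Relaxed) {
            thread::sleep(period);
            wake(); // no-op task injection point
        }
    });
    stop // caller flips this to true to shut the watchdog down
}
```

Keeping the watchdog off the runtime matters: an async timer task could itself be starved by the very stall it is meant to break, while a dedicated std::thread is scheduled by the OS and always fires.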
2. block_in_place in the logger (logger.rs): the TeeWriter holds a
std::sync::Mutex across blocking file and stderr writes on worker
threads. Wrapping those writes in block_in_place tells Tokio to hand
off driver duties before the worker blocks, shrinking the window for
starvation. The call is guarded by runtime_flavor() to avoid panicking
on the current-thread runtimes used by the CLI.