Commit Graph

3218 Commits

Author SHA1 Message Date
Matt Hill
392ae2d675 fix: correct false breakage detection for flavored packages and config changes
Two bugs caused the UI to incorrectly warn about dependency breakages:

1. dryUpdate (version path): Flavored package versions (e.g. #knots:27.0.0:0)
   failed exver.satisfies() against flavorless ranges (e.g. >=26.0.0) due to
   flavor mismatch. Now checks the manifest's `satisfies` declarations,
   matching the pattern already used in DepErrorService. Added `satisfies`
   field to PackageVersionInfo so it's available from registry data.

2. checkConflicts (config path): fast-json-patch's compare() treated missing
   keys as conflicts (add ops) and used positional array comparison, diverging
   from the backend's conflicts() semantics. Replaced with a conflicts()
   function that mirrors core/src/service/action.rs — missing keys are not
   conflicts, and arrays use set-based comparison.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 11:38:01 -06:00
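
Note on 392ae2d675: the config-path rule described above (missing keys are not conflicts, arrays compare as sets) can be sketched roughly as follows. This is a minimal illustration and not the code from the commit or from core/src/service/action.rs. The `Json` type, the argument names `required` and `actual`, and the exact array rule (a required element conflicts only if nothing in the actual array is compatible with it) are assumptions made for the example.

```typescript
// Hypothetical JSON value type used only for this sketch.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

function isObject(v: Json): v is { [key: string]: Json } {
  return typeof v === 'object' && v !== null && !Array.isArray(v)
}

// Returns true when `actual` contradicts `required`.
// Assumed semantics (not confirmed against the backend):
//  - objects: only keys present on BOTH sides are compared, so a key
//    missing from `actual` is never a conflict
//  - arrays: order is ignored; a required element conflicts only if
//    every element of `actual` conflicts with it
//  - everything else: strict inequality
function conflicts(required: Json, actual: Json): boolean {
  if (isObject(required) && isObject(actual)) {
    return Object.keys(required).some(
      key => key in actual && conflicts(required[key], actual[key]),
    )
  }
  if (Array.isArray(required) && Array.isArray(actual)) {
    return required.some(r => actual.every(a => conflicts(r, a)))
  }
  return required !== actual
}
```

By contrast, a positional patch diff would report `{ a: 1 }` against `{}` as an add operation and `[1, 2]` against `[2, 1]` as two replacements, which is the false-positive behavior the commit message describes.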
Matt Hill
b0b4b41c42 feat: unified restart notification with reason-specific messaging (#3147)
* feat: unified restart notification with reason-specific messaging

Replace statusInfo.updated (bool) with serverInfo.restart (nullable enum)
to unify all restart-needed scenarios under a single PatchDB field.

Backend sets the restart reason in RPC handlers for hostname change (mdns),
language change, kiosk toggle, and OS update download. Init clears it on
boot. The update flow checks this field to prevent updates when a restart
is already pending.

Frontend shows a persistent action bar with reason-specific i18n messages
instead of per-feature restart dialogs. For .local hostname changes, the
existing "open new address" dialog is preserved — the restart toast
appears after the user logs in on the new address.

Also includes migration in v0_4_0_alpha_23 to remove statusInfo.updated
and initialize serverInfo.restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix broken styling and improve settings layout

* refactor: move restart field from ServerInfo to ServerStatus

The restart reason belongs with other server state (shutting_down,
restarting, update_progress) rather than on the top-level ServerInfo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix PR comment

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Aiden McClelland <me@drbonez.dev>
2026-03-29 02:23:59 -06:00
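
Note on b0b4b41c42: replacing a boolean with a nullable enum lets one field answer both "is a restart pending?" and "why?". The sketch below shows the general shape on the frontend side. The variant names, the message strings, and the helper names `restartMessage` and `canStartUpdate` are hypothetical; the commit only states that the reasons cover hostname change (mdns), language change, kiosk toggle, and OS update download, and that the field ends up on ServerStatus.

```typescript
// Hypothetical variant names; the real enum values are not shown in the log.
type RestartReason = 'mdns' | 'language' | 'kiosk' | 'update'

// Hypothetical slice of the status object: null means no restart is pending.
interface ServerStatus {
  restart: RestartReason | null
}

// Placeholder strings standing in for reason-specific i18n messages.
const RESTART_MESSAGES: Record<RestartReason, string> = {
  mdns: 'Restart to apply the new hostname',
  language: 'Restart to apply the new language',
  kiosk: 'Restart to apply the kiosk setting',
  update: 'Restart to finish installing the update',
}

// Text for the persistent action bar, or null when nothing is pending.
function restartMessage(status: ServerStatus): string | null {
  return status.restart === null ? null : RESTART_MESSAGES[status.restart]
}

// An update should not start while any restart is already pending.
function canStartUpdate(status: ServerStatus): boolean {
  return status.restart === null
}
```

Because `RESTART_MESSAGES` is typed as `Record<RestartReason, string>`, adding a new reason to the union without a matching message fails to compile, which is one practical advantage over a set of independent booleans.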
Aiden McClelland
bbbc8f7440 fix: correct BIOS boot partition type GUID for backup target filtering
The BIOS_BOOT_TYPE_GUID constant had the wrong value, so
find_bios_boot_partition never matched the actual BIOS boot partition
created by the gpt crate. This caused it to appear as an available
backup target.
2026-03-28 20:00:59 -06:00
Aiden McClelland
c7a4dd617e fix: resolve tunnel add delay and connectivity loss in gateway watcher
Split poll_ip_info into two phases: write IP info (addresses, subnets,
gateway, DNS, NTP) to the watch immediately, then fetch WAN IP in a
second pass. Previously the echoip HTTP fetch (5s timeout per URL)
blocked the write and was repeatedly cancelled by D-Bus signals during
interface activation, preventing the gateway from ever appearing.

Replace PolicyRoutingCleanup Drop with gc_policy_routing. The old Drop
spawned async route flushes that raced with new apply_policy_routing
calls when the watcher restarted on device_added, wiping freshly-created
routing tables for existing interfaces like eth0. Now policy routing is
managed idempotently by apply_policy_routing, and stale rules are
garbage-collected at the start of each watcher iteration.
2026-03-28 20:00:36 -06:00
Aiden McClelland
d6b81f3c9b fix: assorted fixes across container-runtime, core, and sdk
- Fix parseInt callback in container-runtime to avoid extra map arguments
- Use proper error propagation in list_service_interfaces instead of unwrap_or_default
- Handle non-plain objects by reference in deepEqual
2026-03-27 15:58:52 -06:00
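
Note on d6b81f3c9b: the first bullet refers to a well-known JavaScript pitfall. `Array.prototype.map` calls its callback with (value, index, array), and `parseInt` treats its second argument as a radix, so passing `parseInt` directly parses each element with its index as the base. The snippet below is a generic illustration with made-up input, not the container-runtime call site.

```typescript
const raw = ['10', '10', '10']

// Buggy: evaluates parseInt('10', 0), parseInt('10', 1), parseInt('10', 2).
// Radix 0 falls back to base 10, radix 1 is invalid, radix 2 reads binary.
const buggy = raw.map(parseInt)

// Fixed: wrap the call so only the string and an explicit radix are passed.
const fixed = raw.map(s => parseInt(s, 10))

console.log(buggy) // [ 10, NaN, 2 ]
console.log(fixed) // [ 10, 10, 10 ]
```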
Aiden McClelland
879f953a9f feat: delete ext2_saved subvolume after btrfs-convert
Removes the ext2_saved subvolume (created by btrfs-convert to preserve
original ext4 metadata) before running defrag to reclaim space.
2026-03-26 23:38:54 -06:00
Matt Hill
782f2e83bf ensure correct locale on 035 update (#3145) 2026-03-26 21:35:25 -06:00
Matt Hill
6cefc27c5f build: use org-hosted large runners for fast CI builds
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 19:24:19 -06:00
Matt Hill
2b676808a9 feat: generate certificates signed by the root CA (#3144)
Co-authored-by: Aiden McClelland <me@drbonez.dev>
2026-03-26 18:57:11 -06:00
Aiden McClelland
7c1c15073d fix: default tor proxy for registry 2026-03-26 16:56:02 -06:00
Aiden McClelland
025d569dfa build: replace buildjet runners with github actions large runners 2026-03-26 16:12:25 -06:00
Matt Hill
976bdf3e53 disable finish unless valid form 2026-03-26 15:57:56 -06:00
Aiden McClelland
dce0f075ce feat: cascade address enable/disable to related bindings on same gateway 2026-03-26 15:16:08 -06:00
Aiden McClelland
f3d2782f18 chore: add i18n about strings for CLI commands 2026-03-26 15:15:51 -06:00
Aiden McClelland
8d9be64c19 build: skip compat/utils images for riscv64 architecture 2026-03-26 15:13:54 -06:00
Matt Hill
9bc0fbd5b1 hide 0 capacity drives 2026-03-26 08:13:19 -06:00
Aiden McClelland
b7f7202e25 chore: bump version to 0.4.0-alpha.23 2026-03-25 23:27:57 -06:00
Aiden McClelland
0719c227ee fix: deduplicate tor keys using BTreeMap in v0_3_6 migration 2026-03-25 23:24:44 -06:00
Aiden McClelland
621da47990 fix: import data drive before setup if not mounted 2026-03-25 23:24:35 -06:00
Aiden McClelland
9fa81a0c9d feat: add deploy job to startos-iso workflow 2026-03-25 23:24:25 -06:00
Matt Hill
2dac2bb2b3 restart after server name change 2026-03-25 21:15:07 -06:00
Matt Hill
58f1dc5025 mask password in ST 2026-03-25 17:29:19 -06:00
Matt Hill
cc89023bbd fix spinner alignment 2026-03-25 13:39:13 -06:00
Matt Hill
7e35ad57e7 tor http is secure 2026-03-25 13:31:53 -06:00
Aiden McClelland
010e439d1d fix: guard against null startCursor in logs component 2026-03-25 13:24:52 -06:00
Aiden McClelland
cdbb512cca fix: trim whitespace from package data version file 2026-03-25 13:24:35 -06:00
Aiden McClelland
bb2e69299e fix: only log WAN IP error when all echoip URLs fail 2026-03-25 13:24:18 -06:00
Aiden McClelland
fd0dc9a5b8 fix: silence journalctl setup error in init 2026-03-25 13:24:02 -06:00
Aiden McClelland
e2e88f774e chore: add i18n entries for new CLI args and commands 2026-03-25 13:22:47 -06:00
Aiden McClelland
4bebcafdde fix: tolerate setsid EPERM in subcontainer pre_exec
In TTY mode, pty_process already calls setsid() on the child before
our pre_exec runs. The second setsid() fails with EPERM since the
process is already a session leader. This is harmless — ignore it.
2026-03-25 10:31:29 -06:00
Aiden McClelland
2bb1463f4f fix: mitigate tokio I/O driver starvation (tokio-rs/tokio#4730)
Tokio's multi-thread scheduler has an unfixed vulnerability where all
worker threads can end up parked on condvars with no worker driving the
I/O reactor.  Condvar-parked workers have no timeout and sleep
indefinitely, so once in this state the runtime never recovers.

This was observed on a box migrating from 0.3.5.1: after heavy task
churn (package reinstalls, container operations, logging) all 16 workers
ended up on futex_wait with no thread on epoll_wait.  The web server
listened on both HTTP and HTTPS but never replied.  The box was stuck
for 7+ hours with 0% CPU.

Two mitigations:

1. Watchdog OS thread (startd.rs): a plain std::thread that every 30s
   injects a no-op task via Handle::spawn.  This forces a condvar-parked
   worker to wake, cycle through park, and grab the driver TryLock —
   breaking the stall regardless of what triggered it.

2. block_in_place in the logger (logger.rs): the TeeWriter holds a
   std::sync::Mutex across blocking file + stderr writes on worker
   threads.  Wrapping in block_in_place tells tokio to hand off driver
   duties before the worker blocks, reducing the window for starvation.
   Guarded by runtime_flavor() to avoid panicking on current-thread
   runtimes used by the CLI.
2026-03-25 10:14:03 -06:00
Aiden McClelland
f20ece44a1 chore: bump sdk version in container-runtime lockfile 2026-03-24 19:26:56 -06:00
Aiden McClelland
9fddcb957f chore: bump direct_io buffer from 256KiB to 1MiB 2026-03-24 19:26:56 -06:00
Aiden McClelland
fd502cfb99 fix: probe active block device before vg import cycle
When the target VG is already active (e.g. the running system's own
VG), probe the block device directly instead of going through the
full import/activate/open/cleanup sequence.
2026-03-24 19:26:56 -06:00
Aiden McClelland
ee95eef395 fix: mark backup progress complete unconditionally
Remove the backup_succeeded gate so the progress indicator updates
regardless of the backup outcome — the status field already captures
success/failure separately.
2026-03-24 19:26:56 -06:00
Aiden McClelland
aaa43ce6af fix: network error resilience and wifi state tracking
- Demote transient route-replace errors (vanishing interfaces) to trace
- Tolerate errors during policy routing cleanup on drop
- Use join_all instead of try_join_all for gateway watcher jobs
- Simplify wifi interface detection to always use find_wifi_iface()
- Write wifi enabled state to db instead of interface name
2026-03-24 19:26:55 -06:00
Aiden McClelland
e0f27281d1 feat: load bundled migration images and log progress during os migration
Load pre-saved container images from /usr/lib/startos/migration-images
before migrating packages, removing the need for internet access during
the v1→v2 s9pk conversion.  Add a periodic progress logger so the user
can see which package is being migrated.
2026-03-24 19:26:55 -06:00
Aiden McClelland
ecc4703ae7 build: add migration image bundling to build pipeline
Bundle start9/compat, start9/utils, and tonistiigi/binfmt container
images into the OS image so the v1→v2 s9pk migration can run without
internet access.
2026-03-24 19:26:55 -06:00
Aiden McClelland
d478911311 fix: restore chown on /proc/self/fd/* for subcontainer exec
The pipe-wrap binary guarantees FDs are always pipes (not sockets),
making the chown safe. The chown is still needed because anonymous
pipes have mode 0600 — without it, non-root users cannot re-open
/dev/stderr via /proc/self/fd/2.
2026-03-24 19:26:55 -06:00
Matt Hill
23fe6fb663 align checkbox 2026-03-24 18:57:19 -06:00
Matt Hill
186925065d sdk db backups, wifi ux, release notes, minor copy 2026-03-24 16:39:31 -06:00
Aiden McClelland
53dff95365 revert: remove websocket shutdown signal from RpcContinuations 2026-03-24 11:13:59 -06:00
Aiden McClelland
7f6abf2a80 Merge pull request #3140 from Start9Labs/fix/wifi
bugfixes for alpha.21
v0.4.0-alpha.22
2026-03-23 10:26:04 -06:00
Aiden McClelland
19fa1cb4e3 fix build 2026-03-23 10:12:15 -06:00
Matt Hill
521f61c647 bump sdk for republish 2026-03-23 09:45:16 -06:00
Matt Hill
3d45234aae fix password input for backups and add adjective noun randomizer 2026-03-23 08:58:37 -06:00
Aiden McClelland
f60a1a9ed0 fix: set backup progress complete atomically with status revert
Move BackupProgress { complete: true } into the same db.mutate() as the
DesiredStatus revert in the backup transition. Previously these were
separate mutations—the status would revert to Running before progress
showed complete, causing a visible gap in the UI.
2026-03-23 01:15:54 -06:00
Aiden McClelland
2aa910a3e8 fix: replace stdio chown with prctl(PR_SET_DUMPABLE) and pipe-wrap
After setuid, the kernel clears the dumpable flag, making /proc/self/
entries owned by root. This broke open("/dev/stderr") for non-root
users inside subcontainers. The previous fix (chowning /proc/self/fd/*)
was dangerous because it chowned whatever file the FD pointed to (could
be the journal socket).

The proper fix is prctl(PR_SET_DUMPABLE, 1) after setuid, which restores
/proc/self/ ownership to the current uid.

Additionally, adds a `pipe-wrap` subcommand that wraps a child process
with piped stdout/stderr, relaying to the original FDs. This ensures all
descendants inherit pipes (which support re-opening via /proc/self/fd/N)
even when the outermost FDs are journal sockets. container-runtime.service
now uses this wrapper.

With pipe-wrap guaranteeing pipe-based FDs, the exec and launch non-TTY
paths no longer need their own pipe+relay threads, eliminating the bug
where exec would hang when a child daemonized (e.g. pg_ctl start).
2026-03-23 01:14:49 -06:00
Aiden McClelland
8d1e11e158 fix: pg_dump/pg_restore permission errors in backup subcontainer
- Pre-create and chown dump file for postgres user before pg_dump
- Chown volume mountpoint to postgres before initdb on restore
- Add --no-privileges to pg_restore to skip GRANT/REVOKE for missing roles
2026-03-23 01:13:20 -06:00
Aiden McClelland
b7e4df44bf wip: subcontainer exec log drain via SCM_RIGHTS (reference only)
Implemented pipe FD handoff from exec to launch via Unix socket +
SCM_RIGHTS for grandchild log capture. Superseded by the simpler
PR_SET_DUMPABLE approach which eliminates the need for pipes entirely.
2026-03-22 23:58:14 -06:00