fix: give StartOS UI interface a non-empty id
The iface object in StartOsUiComponent had id: '' (empty string).
Any plugin whose action calls sdk.serviceInterface.get() with
that id triggers an RPC to the host with an empty
serviceInterfaceId, which Rust's ServiceInterfaceId type rejects
via its ID regex (^[a-z0-9]+(-[a-z0-9]+)*$).
The container runtime appends the method name to every error
message as "${msg}@${method}", so the empty-string failure
surfaces in the UI as:
    Action Failed: Deserialization Error: Invalid ID: @get-service-interface
Setting id: 'startos-ui' makes it a valid, stable identifier
that passes the regex and accurately names the interface.
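
A minimal sketch of why the empty id produced that exact message. The regex
is the one quoted above; `isValidId` and `describeFailure` are hypothetical
helpers that stand in for the Rust-side check and the runtime's
"${msg}@${method}" suffixing, which are not shown in this commit message.

```typescript
// Same pattern the commit message quotes for ServiceInterfaceId.
const ID_PATTERN = /^[a-z0-9]+(-[a-z0-9]+)*$/

function isValidId(id: string): boolean {
  return ID_PATTERN.test(id)
}

// Hypothetical: models the validation failure plus the method suffix.
// The message wording is copied from the error shown in the UI.
function describeFailure(id: string, method: string): string | null {
  if (isValidId(id)) return null
  const msg = `Deserialization Error: Invalid ID: ${id}`
  return `${msg}@${method}`
}

console.log(isValidId(''))           // false
console.log(isValidId('startos-ui')) // true
console.log(describeFailure('', 'get-service-interface'))
// Deserialization Error: Invalid ID: @get-service-interface
```

Because the rejected id is empty, nothing is printed between "Invalid ID: "
and the "@", which is why the message looks truncated.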
The POSTROUTING MASQUERADE rules in forward-port failed to handle two
hairpin scenarios:
1. Host-to-target hairpin (OUTPUT DNAT): when sip is a WAN IP (tunnel
case), the old rule matched `-s sip` but the actual source of
locally-originated packets is a local interface IP, not the WAN IP.
Fix: use `-m addrtype --src-type LOCAL -m conntrack --ctorigdst sip`
to match any local source while tying the rule to the specific sip.
2. Same-subnet self-hairpin (PREROUTING DNAT): when a WireGuard peer
connects to itself via the tunnel's public IP, traffic is DNAT'd back
to the peer. Without MASQUERADE the response takes a loopback shortcut,
bypassing the tunnel server's conntrack and breaking NAT reversal.
Fix: add `-s dip/dprefix -d dip` to masquerade same-subnet traffic,
which also subsumes the old bridge_subnet rule.
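
The two match expressions above can be sketched as argument lists. This is
illustrative only: the function names are hypothetical, nothing here invokes
iptables, and any protocol or port matches the real forward-port script may
add are omitted. The example addresses are placeholders.

```typescript
// Case 1: locally originated traffic whose original destination was sip.
function hostHairpinRule(sip: string): string[] {
  return [
    '-t', 'nat', '-A', 'POSTROUTING',
    '-m', 'addrtype', '--src-type', 'LOCAL',
    '-m', 'conntrack', '--ctorigdst', sip,
    '-j', 'MASQUERADE',
  ]
}

// Case 2: traffic from the target's own subnet that is DNAT'd to the target.
function sameSubnetHairpinRule(dip: string, dprefix: number): string[] {
  return [
    '-t', 'nat', '-A', 'POSTROUTING',
    '-s', `${dip}/${dprefix}`, '-d', dip,
    '-j', 'MASQUERADE',
  ]
}

console.log(hostHairpinRule('203.0.113.7').join(' '))
console.log(sameSubnetHairpinRule('10.59.0.2', 24).join(' '))
```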
Also bind the hairpin detection socket to the gateway interface and local
IP for consistency with the echoip client.
Using Ipv4Addr::UNSPECIFIED (0.0.0.0) as the local address with
SO_BINDTODEVICE caused bind(0.0.0.0:0) to fail with "Address in use"
on interfaces where port 443 was already in use. Binding to the
gateway's actual IPv4 address instead still forces IPv4 DNS filtering
while avoiding the kernel-level conflict.
fix: correct false breakage detection for flavored packages and config changes
Two bugs caused the UI to incorrectly warn about dependency breakages:
1. dryUpdate (version path): Flavored package versions (e.g. #knots:27.0.0:0)
failed exver.satisfies() against flavorless ranges (e.g. >=26.0.0) due to
flavor mismatch. Now checks the manifest's `satisfies` declarations,
matching the pattern already used in DepErrorService. Added `satisfies`
field to PackageVersionInfo so it's available from registry data.
2. checkConflicts (config path): fast-json-patch's compare() treated missing
keys as conflicts (add ops) and used positional array comparison, diverging
from the backend's conflicts() semantics. Replaced with a conflicts()
function that mirrors core/src/service/action.rs — missing keys are not
conflicts, and arrays use set-based comparison.
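
A rough sketch of the semantics described in item 2, not the actual
implementation. It assumes that a key missing from either side is ignored
and that "set-based" means two arrays agree when they hold the same elements
regardless of order or duplicates. Elements are compared by their JSON
text, a simplification that is sensitive to key order inside nested objects.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

function isPlainObject(v: Json): v is { [key: string]: Json } {
  return typeof v === 'object' && v !== null && !Array.isArray(v)
}

// Returns true when `actual` contradicts `expected`.
function conflicts(expected: Json, actual: Json): boolean {
  if (isPlainObject(expected) && isPlainObject(actual)) {
    // Only keys present on both sides can conflict.
    return Object.keys(expected).some(
      key => key in actual && conflicts(expected[key], actual[key]),
    )
  }
  if (Array.isArray(expected) && Array.isArray(actual)) {
    // Assumed set semantics: order and duplicates are ignored.
    const a = new Set(expected.map(v => JSON.stringify(v)))
    const b = new Set(actual.map(v => JSON.stringify(v)))
    return a.size !== b.size || Array.from(a).some(v => !b.has(v))
  }
  // Primitives, null, or mismatched shapes.
  return expected !== actual
}

console.log(conflicts({ a: 1 }, { a: 1, b: 2 }))                  // false
console.log(conflicts({ a: 1 }, { a: 2 }))                        // true
console.log(conflicts({ tags: ['x', 'y'] }, { tags: ['y', 'x'] })) // false
```

A positional diff would report the first case as an add op and the third as
two replacements, which is the kind of false warning described above.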
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: unified restart notification with reason-specific messaging
Replace statusInfo.updated (bool) with serverInfo.restart (nullable enum)
to unify all restart-needed scenarios under a single PatchDB field.
Backend sets the restart reason in RPC handlers for hostname change (mdns),
language change, kiosk toggle, and OS update download. Init clears it on
boot. The update flow checks this field to prevent updates when a restart
is already pending.
Frontend shows a persistent action bar with reason-specific i18n messages
instead of per-feature restart dialogs. For .local hostname changes, the
existing "open new address" dialog is preserved — the restart toast
appears after the user logs in on the new address.
Also includes migration in v0_4_0_alpha_23 to remove statusInfo.updated
and initialize serverInfo.restart.
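
An illustrative sketch of the data shape on the frontend side. The variant
names, the `ServerInfoSketch` type, and the message strings are all
assumptions derived from the four triggers listed above; the real UI
resolves i18n keys rather than hard-coded English text.

```typescript
// Assumed variants, one per restart trigger named in this commit.
type RestartReason = 'mdns' | 'language' | 'kiosk' | 'update'

interface ServerInfoSketch {
  restart: RestartReason | null // null means no restart is pending
}

// Placeholder strings standing in for reason-specific i18n messages.
const RESTART_MESSAGES: Record<RestartReason, string> = {
  mdns: 'Restart to apply the new hostname',
  language: 'Restart to apply the new language',
  kiosk: 'Restart to apply the kiosk mode change',
  update: 'Restart to finish installing the update',
}

// Text for the persistent action bar, or null when nothing is pending.
function restartBanner(info: ServerInfoSketch): string | null {
  return info.restart === null ? null : RESTART_MESSAGES[info.restart]
}

// The update flow refuses to start while any restart is pending.
function canStartUpdate(info: ServerInfoSketch): boolean {
  return info.restart === null
}

console.log(restartBanner({ restart: 'kiosk' }))
console.log(canStartUpdate({ restart: null })) // true
```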
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix broken styling and improve settings layout
* refactor: move restart field from ServerInfo to ServerStatus
The restart reason belongs with other server state (shutting_down,
restarting, update_progress) rather than on the top-level ServerInfo.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix PR comment
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Aiden McClelland <me@drbonez.dev>
The BIOS_BOOT_TYPE_GUID constant had the wrong value, so
find_bios_boot_partition never matched the actual BIOS boot partition
created by the gpt crate. As a result, that partition was listed as an
available backup target.
Split poll_ip_info into two phases: write IP info (addresses, subnets,
gateway, DNS, NTP) to the watch immediately, then fetch WAN IP in a
second pass. Previously the echoip HTTP fetch (5s timeout per URL)
blocked the write and was repeatedly cancelled by D-Bus signals during
interface activation, preventing the gateway from ever appearing.
Replace PolicyRoutingCleanup Drop with gc_policy_routing. The old Drop
spawned async route flushes that raced with new apply_policy_routing
calls when the watcher restarted on device_added, wiping freshly-created
routing tables for existing interfaces like eth0. Now policy routing is
managed idempotently by apply_policy_routing, and stale rules are
garbage-collected at the start of each watcher iteration.
- Fix parseInt callback in container-runtime to avoid extra map arguments
- Use proper error propagation in list_service_interfaces instead of unwrap_or_default
- Handle non-plain objects by reference in deepEqual
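
For the first item, a small standalone illustration of the general pitfall.
It is not the container-runtime code itself, and the input values are made
up.

```typescript
const fields = ['10', '10', '10']

// map passes (value, index, array); parseInt treats the index as the radix.
const wrong = fields.map(parseInt)

// Wrapping the call pins the radix and drops the extra arguments.
const right = fields.map(s => parseInt(s, 10))

console.log(wrong) // [10, NaN, 2]
console.log(right) // [10, 10, 10]
```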
In TTY mode, pty_process already calls setsid() on the child before
our pre_exec runs. The second setsid() fails with EPERM since the
process is already a session leader. This is harmless — ignore it.
Tokio's multi-thread scheduler has an unfixed failure mode in which all
worker threads can end up parked on condvars with no worker driving the
I/O reactor. Condvar-parked workers have no timeout and sleep
indefinitely, so once in this state the runtime never recovers.
This was observed on a box migrating from 0.3.5.1: after heavy task
churn (package reinstalls, container operations, logging) all 16 workers
ended up on futex_wait with no thread on epoll_wait. The web server
listened on both HTTP and HTTPS but never replied. The box was stuck
for 7+ hours with 0% CPU.
Two mitigations:
1. Watchdog OS thread (startd.rs): a plain std::thread that every 30s
injects a no-op task via Handle::spawn. This forces a condvar-parked
worker to wake, cycle through park, and grab the driver TryLock —
breaking the stall regardless of what triggered it.
2. block_in_place in the logger (logger.rs): the TeeWriter holds a
std::sync::Mutex across blocking file + stderr writes on worker
threads. Wrapping in block_in_place tells tokio to hand off driver
duties before the worker blocks, reducing the window for starvation.
Guarded by runtime_flavor() to avoid panicking on current-thread
runtimes used by the CLI.
When the target VG is already active (e.g. the running system's own
VG), probe the block device directly instead of going through the
full import/activate/open/cleanup sequence.
Remove the backup_succeeded gate so the progress indicator updates
regardless of the backup outcome — the status field already captures
success/failure separately.
- Demote transient route-replace errors (vanishing interfaces) to trace
- Tolerate errors during policy routing cleanup on drop
- Use join_all instead of try_join_all for gateway watcher jobs
- Simplify wifi interface detection to always use find_wifi_iface()
- Write wifi enabled state to db instead of interface name
Load pre-saved container images from /usr/lib/startos/migration-images
before migrating packages, removing the need for internet access during
the v1→v2 s9pk conversion. Add a periodic progress logger so the user
can see which package is being migrated.
Bundle start9/compat, start9/utils, and tonistiigi/binfmt container
images into the OS image so the v1→v2 s9pk migration can run without
internet access.
The pipe-wrap binary guarantees FDs are always pipes (not sockets),
making the chown safe. The chown is still needed because anonymous
pipes have mode 0600 — without it, non-root users cannot re-open
/dev/stderr via /proc/self/fd/2.