start-os

mirror of https://github.com/Start9Labs/start-os.git synced 2026-03-26 10:21:52 +00:00

Author	SHA1	Message	Date
Aiden McClelland	cdbb512cca	fix: trim whitespace from package data version file	2026-03-25 13:24:35 -06:00
Aiden McClelland	bb2e69299e	fix: only log WAN IP error when all echoip URLs fail	2026-03-25 13:24:18 -06:00
Aiden McClelland	fd0dc9a5b8	fix: silence journalctl setup error in init	2026-03-25 13:24:02 -06:00
Aiden McClelland	e2e88f774e	chore: add i18n entries for new CLI args and commands	2026-03-25 13:22:47 -06:00
Aiden McClelland	4bebcafdde	fix: tolerate setsid EPERM in subcontainer pre_exec In TTY mode, pty_process already calls setsid() on the child before our pre_exec runs. The second setsid() fails with EPERM since the process is already a session leader. This is harmless — ignore it.	2026-03-25 10:31:29 -06:00
Aiden McClelland	2bb1463f4f	fix: mitigate tokio I/O driver starvation (tokio-rs/tokio#4730) Tokio's multi-thread scheduler has an unfixed vulnerability where all worker threads can end up parked on condvars with no worker driving the I/O reactor. Condvar-parked workers have no timeout and sleep indefinitely, so once in this state the runtime never recovers. This was observed on a box migrating from 0.3.5.1: after heavy task churn (package reinstalls, container operations, logging) all 16 workers ended up on futex_wait with no thread on epoll_wait. The web server listened on both HTTP and HTTPS but never replied. The box was stuck for 7+ hours with 0% CPU. Two mitigations: 1. Watchdog OS thread (startd.rs): a plain std::thread that every 30s injects a no-op task via Handle::spawn. This forces a condvar-parked worker to wake, cycle through park, and grab the driver TryLock, breaking the stall regardless of what triggered it. 2. block_in_place in the logger (logger.rs): the TeeWriter holds a std::sync::Mutex across blocking file + stderr writes on worker threads. Wrapping in block_in_place tells tokio to hand off driver duties before the worker blocks, reducing the window for starvation. Guarded by runtime_flavor() to avoid panicking on current-thread runtimes used by the CLI.	2026-03-25 10:14:03 -06:00
Aiden McClelland	f20ece44a1	chore: bump sdk version in container-runtime lockfile	2026-03-24 19:26:56 -06:00
Aiden McClelland	9fddcb957f	chore: bump direct_io buffer from 256KiB to 1MiB	2026-03-24 19:26:56 -06:00
Aiden McClelland	fd502cfb99	fix: probe active block device before vg import cycle When the target VG is already active (e.g. the running system's own VG), probe the block device directly instead of going through the full import/activate/open/cleanup sequence.	2026-03-24 19:26:56 -06:00
Aiden McClelland	ee95eef395	fix: mark backup progress complete unconditionally Remove the backup_succeeded gate so the progress indicator updates regardless of the backup outcome — the status field already captures success/failure separately.	2026-03-24 19:26:56 -06:00
Aiden McClelland	aaa43ce6af	fix: network error resilience and wifi state tracking - Demote transient route-replace errors (vanishing interfaces) to trace - Tolerate errors during policy routing cleanup on drop - Use join_all instead of try_join_all for gateway watcher jobs - Simplify wifi interface detection to always use find_wifi_iface() - Write wifi enabled state to db instead of interface name	2026-03-24 19:26:55 -06:00
Aiden McClelland	e0f27281d1	feat: load bundled migration images and log progress during os migration Load pre-saved container images from /usr/lib/startos/migration-images before migrating packages, removing the need for internet access during the v1→v2 s9pk conversion. Add a periodic progress logger so the user can see which package is being migrated.	2026-03-24 19:26:55 -06:00
Aiden McClelland	ecc4703ae7	build: add migration image bundling to build pipeline Bundle start9/compat, start9/utils, and tonistiigi/binfmt container images into the OS image so the v1→v2 s9pk migration can run without internet access.	2026-03-24 19:26:55 -06:00
Aiden McClelland	d478911311	fix: restore chown on /proc/self/fd/* for subcontainer exec The pipe-wrap binary guarantees FDs are always pipes (not sockets), making the chown safe. The chown is still needed because anonymous pipes have mode 0600 — without it, non-root users cannot re-open /dev/stderr via /proc/self/fd/2.	2026-03-24 19:26:55 -06:00
Matt Hill	23fe6fb663	align checkbox	2026-03-24 18:57:19 -06:00
Matt Hill	186925065d	sdk db backups, wifi ux, release notes, minor copy	2026-03-24 16:39:31 -06:00
Aiden McClelland	53dff95365	revert: remove websocket shutdown signal from RpcContinuations	2026-03-24 11:13:59 -06:00
Aiden McClelland	7f6abf2a80	Merge pull request #3140 from Start9Labs/fix/wifi bugfixes for alpha.21 v0.4.0-alpha.22	2026-03-23 10:26:04 -06:00
Aiden McClelland	19fa1cb4e3	fix build	2026-03-23 10:12:15 -06:00
Matt Hill	521f61c647	bump sdk for republish	2026-03-23 09:45:16 -06:00
Matt Hill	3d45234aae	fix password input for backups and add adjective noun randomizer	2026-03-23 08:58:37 -06:00
Aiden McClelland	f60a1a9ed0	fix: set backup progress complete atomically with status revert Move BackupProgress { complete: true } into the same db.mutate() as the DesiredStatus revert in the backup transition. Previously these were separate mutations—the status would revert to Running before progress showed complete, causing a visible gap in the UI.	2026-03-23 01:15:54 -06:00
Aiden McClelland	2aa910a3e8	fix: replace stdio chown with prctl(PR_SET_DUMPABLE) and pipe-wrap After setuid, the kernel clears the dumpable flag, making /proc/self/ entries owned by root. This broke open("/dev/stderr") for non-root users inside subcontainers. The previous fix (chowning /proc/self/fd/*) was dangerous because it chowned whatever file the FD pointed to (could be the journal socket). The proper fix is prctl(PR_SET_DUMPABLE, 1) after setuid, which restores /proc/self/ ownership to the current uid. Additionally, adds a `pipe-wrap` subcommand that wraps a child process with piped stdout/stderr, relaying to the original FDs. This ensures all descendants inherit pipes (which support re-opening via /proc/self/fd/N) even when the outermost FDs are journal sockets. container-runtime.service now uses this wrapper. With pipe-wrap guaranteeing pipe-based FDs, the exec and launch non-TTY paths no longer need their own pipe+relay threads, eliminating the bug where exec would hang when a child daemonized (e.g. pg_ctl start).	2026-03-23 01:14:49 -06:00
Aiden McClelland	8d1e11e158	fix: pg_dump/pg_restore permission errors in backup subcontainer - Pre-create and chown dump file for postgres user before pg_dump - Chown volume mountpoint to postgres before initdb on restore - Add --no-privileges to pg_restore to skip GRANT/REVOKE for missing roles	2026-03-23 01:13:20 -06:00
Aiden McClelland	b7e4df44bf	wip: subcontainer exec log drain via SCM_RIGHTS (reference only) Implemented pipe FD handoff from exec to launch via Unix socket + SCM_RIGHTS for grandchild log capture. Superseded by the simpler PR_SET_DUMPABLE approach which eliminates the need for pipes entirely.	2026-03-22 23:58:14 -06:00
Aiden McClelland	25aa140174	fix: backup status reporting	2026-03-22 23:55:26 -06:00
Matt Hill	7ffb462355	better smtp and backups for postgres and mysql	2026-03-22 19:49:58 -06:00
Aiden McClelland	6ed0afc75f	chore: bump sdk to 0.4.0-beta.63	2026-03-22 14:13:28 -06:00
Aiden McClelland	cb7618cb34	fix: e2fsck exit codes 1-3 are non-fatal during btrfs conversion e2fsck returns 1 when errors are corrected and 2 when corrections require a reboot. These are expected during ext4→btrfs conversion. Only exit codes >= 4 indicate actual failure. Previously, .invoke() treated any non-zero exit as an error, causing the conversion to fail after successful filesystem repairs.	2026-03-21 18:20:55 -06:00
Aiden McClelland	456c5d6725	fix: graceful shutdown for subcontainer daemons Two issues fixed: 1. Process group cascade: exec-command processes inherited the container runtime's process group. When an entrypoint script did kill(0, SIGTERM) during shutdown, it signaled ALL processes in the group — including other subcontainers' launch wrappers, causing their PID namespaces to collapse. Fixed by calling setsid() in exec-command's pre_exec to isolate each service in its own process group. 2. Unordered daemon termination: removeChild("main") fired onLeaveContext callbacks for all Daemon.of() instances simultaneously, bypassing Daemons.term()'s reverse-dependency ordering. Fixed by having Daemons.build() mark individual daemons as managed (suppressing their onLeaveContext) and registering a single onLeaveContext that calls the ordered Daemons.term(). The term() method is deduplicated so system.stop() and onLeaveContext share the same shutdown.	2026-03-21 18:20:50 -06:00
Matt Hill	bdfa918a33	a bunch of UI cleanup around backups as well as other bug fixes and UI improvements	2026-03-21 16:32:46 -06:00
Aiden McClelland	8b65490d0e	feat: add progress step for btrfs conversion during setup/init	2026-03-20 19:32:41 -06:00
Aiden McClelland	c9a93f0a33	fix: rsync progress regex never matched, spamming logs during backup The regex used `$` (end-of-string anchor) instead of no anchor, so it never matched the percentage in rsync output. Every line, including empty ones, was logged instead of parsed.	2026-03-20 17:13:35 -06:00
Aiden McClelland	f5bfbe0465	Revert "fix: RunAction task re-evaluation compared against partial input, not full config" also apply alternative fix: only re-activate a task that explicitly conflicts with a run action's input This reverts commit `2999d22d2a`.	2026-03-20 16:35:09 -06:00
Aiden McClelland	8bccffcb5c	feat: add --arch flag to `start-cli registry package download` Use the new flag in the image build recipe to download the tor s9pk for the target architecture, replacing the standalone download script.	2026-03-20 15:28:45 -06:00
Aiden McClelland	9ff65497a8	fix: replace fire-and-forget restart loop in Daemon with tracked AbortController - Track the restart loop as an awaitable { abort, done } handle - Remove shouldBeRunning flag — signal.aborted serves the same purpose - Remove exiting field — term() awaits command termination inline - Guard start() on loop existence to prevent concurrent restart loops - Make backoff sleep abortable so term() returns immediately - Suppress error logging during intentional termination - Loop clears its own handle in finally block for natural exit (oneshot)	2026-03-20 14:31:46 -06:00
Aiden McClelland	7335e52ab3	fix: daemon lifecycle cleanup and error logging improvements - Refactor HealthDaemon to use a tracked session (AbortController + awaitable promise) instead of fire-and-forget health check loops, preventing health checks from running after a service is stopped - Stop health checks before terminating daemon to avoid false crash reports during intentional shutdown - Guard onExit callbacks with AbortSignal to prevent stale session callbacks - Add logErrorOnce utility to deduplicate repeated error logging - Fix SystemForEmbassy.stop() to capture clean promise before deleting ref - Treat SIGTERM (signal 15) as successful exit in subcontainer sync - Fix asError to return original Error instead of wrapping in new Error - Remove unused ExtendedVersion import from Backups.ts	2026-03-20 13:50:57 -06:00
Aiden McClelland	b54f10af55	fix: rsync backup bugs and optimize flags for encrypted CIFS targets - Fix restoreBackup using backupOptions instead of restoreOptions - Add missing await on preRestore/postRestore hooks - Remove -c (checksum) flag that forced full reads on every run - Add --partial to keep partially transferred files on interruption - Add --inplace to avoid temp-file+rename metadata churn - Add --timeout=300 to prevent hangs on stalled mounts	2026-03-20 11:56:53 -06:00
Matt Hill	0549c7c0ef	fix build	2026-03-20 08:50:54 -06:00
Matt Hill	2a8d8c7154	alpha.22	2026-03-20 08:37:36 -06:00
Andreas Schjønhaug	b9f2446cee	Fix Safari hard refresh instructions (#3141)	2026-03-20 08:35:11 -06:00
Matt Hill	03d7d5f123	Merge branch 'fix/wifi' of github.com:Start9Labs/start-os into fix/wifi	2026-03-20 08:23:14 -06:00
Matt Hill	2fd674eca8	bump tor	2026-03-20 08:23:12 -06:00
waterplea	0e9c90f2c0	chore: fix icons in marketplace	2026-03-20 11:16:45 +04:00
Matt Hill	bca2e4d630	feat: add restart button to start-tunnel settings page Adds a VPS restart button to the settings page, above logout. Shows a spinner while the RPC completes, then a dialog telling the user to wait 1-2 minutes and refresh. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 23:27:21 -06:00
Alex Inkin	f41fc75024	chore: make service icons not round and add wifi lock badge (#3139) * chore: make service icons not round and add wifi lock badge * chore: comments	2026-03-19 18:18:25 -06:00
Matt Hill	56cb3861bc	fix build	2026-03-19 18:00:22 -06:00
Matt Hill	2999d22d2a	fix: RunAction task re-evaluation compared against partial input, not full config Bug: After running an action (e.g. bitcoin's autoconfig), update_tasks was called with the submitted form input — which for task-triggered actions is filtered to only the task's fields (e.g. {zmqEnabled: true}). Other services' tasks targeting the same action were then compared against this partial via is_partial_of, so any task wanting a field NOT in the submission (e.g. {blocknotify: "curl..."}) would incorrectly become active, even though the full config still satisfied it. This caused a cycling bug: running LND's autoconfig (zmqEnabled) would activate Datum's task (blocknotify), and vice versa, despite the merge correctly preserving both values in the config. Fix: After running an action, fetch the full current config via get_action_input (same as create_task and recheck_tasks already do) and compare tasks against that. The one-liner fix would have been to add a get_action_input call in the RunAction handler. Instead, we extracted eval_action_tasks on ServiceActorSeed — a single method that both RunAction and recheck_tasks now call — because the duplication between these two sites is exactly how this bug happened: recheck_tasks fetched the full config, RunAction didn't, and they silently diverged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 14:43:08 -06:00
Matt Hill	bb745c43cc	fix: createTask with undefined input values fails to create task Bug: Setting a task input property to undefined (e.g. { prune: undefined }) to express "this key should be deleted" resulted in no task being created. JSON.stringify strips undefined values, so { prune: undefined } serialized as {}, and is_partial_of({}, any_config) always returns true — meaning input-not-matches saw a "match" and never activated the task. Fix (two parts): - SDK: coerce undefined to null in task input values before serialization, so they survive JSON.stringify and reach the Rust backend - Rust: treat null in a partial as matching a missing key in the full config, so tasks correctly deactivate when the key is already absent Assumption: null and undefined/absent are semantically equivalent for StartOS config values. Input specs produce concrete values (strings, numbers, booleans, objects, arrays) — null never appears as a meaningful distinct-from-absent value in real-world configs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 14:28:04 -06:00
Matt Hill	de9a7e4189	fix types	2026-03-19 13:38:40 -06:00
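
The undefined-versus-null behaviour described in commit bb745c43cc can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `coerceUndefinedToNull` and `isPartialOf` are hypothetical names standing in for the SDK coercion and the Rust `is_partial_of` check, and the array comparison is a simplification chosen for the example.

```typescript
// Hypothetical sketch of the fix described in commit bb745c43cc.
// Names and array handling are assumptions, not the real SDK or Rust code.

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// SDK side (assumed shape): turn undefined into null so the key
// survives JSON.stringify instead of being silently dropped.
function coerceUndefinedToNull(value: unknown): Json {
  if (value === undefined || value === null) return null;
  if (Array.isArray(value)) return value.map(coerceUndefinedToNull);
  if (typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      out[k] = coerceUndefinedToNull(v);
    }
    return out;
  }
  return value as Json;
}

// Backend side (assumed shape): a null in the partial matches a key
// that is missing (or null) in the full config.
function isPartialOf(partial: Json, full: Json | undefined): boolean {
  if (partial === null) return full === undefined || full === null;
  if (full === undefined) return false;
  if (typeof partial !== "object") return partial === full;
  if (Array.isArray(partial)) {
    // Simplification for this sketch: arrays must be structurally equal.
    return JSON.stringify(partial) === JSON.stringify(full);
  }
  if (full === null || typeof full !== "object" || Array.isArray(full)) {
    return false;
  }
  for (const [k, v] of Object.entries(partial)) {
    if (!isPartialOf(v, full[k])) return false;
  }
  return true;
}

// Before the fix: the key is stripped, the partial is {}, and {} is a
// partial of every config, so the task never activates.
const stripped: Json = JSON.parse(JSON.stringify({ prune: undefined }));

// After the fix: the key survives as null and no longer matches a
// config that still has a concrete value for it.
const coerced = coerceUndefinedToNull({ prune: undefined });

console.log(isPartialOf(stripped, { prune: 550 })); // matches: task stays inactive
console.log(isPartialOf(coerced, { prune: 550 }));  // no match: task activates
console.log(isPartialOf(coerced, {}));              // matches: key already absent
```

A task is treated here as active when its input is not a partial of the current config, which is how the commit message describes the input-not-matches condition.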

3193 Commits