Commit graph

3 commits

Author SHA1 Message Date
Farhad H. P. Shirvan
d86e87ed30
Fix: traffic writer restart freeze (#4265)
* feat(traffic_writer): enhance traffic writer with concurrency safety and state management

* Revert "feat(traffic_writer): enhance traffic writer with concurrency safety and state management"

This reverts commit e6760ae396.

* feat(traffic_writer): enhance traffic writer with concurrency safety and state management

* feat(web): implement panel-only start/stop methods for in-process restarts
2026-05-12 11:36:05 +02:00
MHSanaei
8f3202f431
fix(traffic-writer): replace sync.Once with Start/Stop cycle so SIGHUP restart works
Some checks are pending
Release 3X-UI / build (386) (push) Waiting to run
Release 3X-UI / build (amd64) (push) Waiting to run
Release 3X-UI / build (arm64) (push) Waiting to run
Release 3X-UI / build (armv5) (push) Waiting to run
Release 3X-UI / build (armv6) (push) Waiting to run
Release 3X-UI / build (armv7) (push) Waiting to run
Release 3X-UI / build (s390x) (push) Waiting to run
Release 3X-UI / Build for Windows (push) Waiting to run
After a SIGHUP-driven panel restart (which is exactly what the frontend
triggers after a successful DB import via /panel/setting/restartPanel),
the previous implementation deadlocked:

1. server.Stop() called StopTrafficWriter — cancels the context and waits
   for the consumer goroutine to exit. The goroutine dies.
2. server.Start() called StartTrafficWriter, but sync.Once had already
   fired, so it was a no-op. twQueue still pointed to the old channel
   with no consumer.
3. startTask() → RestartXray(true) → GetXrayConfig() →
   InboundService.AddTraffic(nil, nil) → submitTrafficWrite. The send
   to twQueue succeeded (buffer space) but <-req.done blocked forever
   because no goroutine was draining the channel.
4. RestartXray held the global xray lock for the entire hang, so every
   subsequent restart attempt from the panel UI also blocked on
   lock.Lock(). User-visible symptom: xray stopped silently after DB
   import and no panel action could revive it.

Replace sync.Once with a mutex-guarded Start that spawns a fresh
goroutine on each cycle, and a Stop that resets the package state so
the next Start works. runTrafficWriter now takes its channels as
parameters instead of reading package vars, so the old goroutine can't
interfere with a new one if their lifetimes briefly overlap.
2026-05-11 16:01:04 +02:00
MHSanaei
8e7d215b4a
feat(nodes): traffic-writer queue, full-mirror sync, WS event fixes
- Traffic-writer single-consumer queue (web/service/traffic_writer.go)
  serialises every DB write that touches up/down/all_time/last_online
  (AddTraffic, SetRemoteTraffic, Reset*, UpdateClientTrafficByEmail) so
  overlapping goroutines can no longer clobber each other's column-scoped
  Updates with a stale tx.Save.

- DB pool: WAL + busy_timeout=10s + synchronous=NORMAL + _txlock=
  immediate, MaxOpenConns=8 / MaxIdleConns=4. The immediate-tx PRAGMA
  fixes residual "database is locked [0ms]" cases where deferred-tx
  writer-upgrade conflicts bypass busy_timeout.

- SetRemoteTraffic full-mirrors node-authoritative state into central:
  settings JSON, remark, listen, port, total, expiry, all_time, enable,
  plus per-client total/expiry/reset/all_time. Inbounds and
  client_traffics rows present on node but missing from central are
  created; rows missing from snap are deleted (with cascading
  client_traffics removal).

- NodeTrafficSyncJob detects structural changes from the mirror and
  broadcasts invalidate(inbounds) so open central UIs re-fetch via REST
  on node-side add/del/edit without manual refresh.

- XrayTrafficJob broadcasts invalidate(inbounds) when auto-disable flips
  client_traffics.enable so the per-client toggle reflects depletion
  without manual refresh.

- Frontend: inbounds page now subscribes to the BroadcastInbounds 'inbounds'
  WS event (full-list pushes from add/del/update controllers were silently
  dropped). Fixes invalidate payload field (dataType -> type). Restart-
  panel modal switched from Promise-wrap to onOk-only so Cancel actually
  cancels.

- Node files trimmed of stale prose-comments; cron cadence dropped
  10s -> 5s to match the inbounds page UX.

- README badges and Go module path bumped v2 -> v3 to match module rename.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 16:25:23 +02:00