12 KiB
Multi-Node Shared Control Design
Context
3x-ui already supports MariaDB as a database backend, but the runtime model remains single-node in practice: the panel reads persisted data, generates a local Xray configuration, and starts a local Xray process. Xray itself does not read MariaDB directly.
The target capability is a minimal multi-node control model where multiple VPS nodes share account definitions and aggregate traffic through MariaDB, while each node still runs its own local Xray instance.
This design intentionally keeps the existing panel pattern intact. It adds synchronization and role boundaries around the current database and Xray services instead of redesigning the runtime into a distributed cluster.
Goals
- Support multiple nodes sharing account definitions through MariaDB.
- Keep
masteras the only writer for shared account definitions. - Keep
workernodes running local Xray instances built from synchronized shared data. - Aggregate traffic from all nodes without last-write-wins corruption.
- Preserve survivability when MariaDB is temporarily unavailable by using local cache files.
- Minimize changes to the existing
3x-uiarchitecture and operator workflow.
Non-Goals
- No direct MariaDB access from Xray.
- No leader election or automatic role failover.
- No active-active shared account writes across nodes.
- No push-based real-time config propagation.
- No synchronization of panel-global settings such as panel port, web domain, TLS files, Telegram bot settings, or panel users.
- No first-pass frontend-only read-only redesign for worker nodes.
Accepted Scope
The first phase only synchronizes the shared-account surface:
- inbounds and inbound client definitions
- passwords and credential-bearing settings
- quota, expiry, enabled/disabled state
- aggregate upload and download counters
The first phase does not synchronize node-local operational settings or observability data beyond minimal sync status metadata.
Recommended Approach
Use a single-writer control model:
masteris the only node allowed to mutate shared account definitions.workernodes keep the existing panel pages, but shared-account write requests are rejected in the backend.- all nodes, including
master, continue to run local Xray instances and carry traffic - workers poll MariaDB for a shared account-set version and rebuild their local Xray state when that version changes
- all nodes accumulate local traffic deltas and periodically flush them back to MariaDB using atomic increments
This is the lowest-risk option because it keeps the current controller -> service -> database -> local Xray model intact while adding explicit control-plane boundaries.
Architecture
Runtime Model
The system remains node-local at runtime:
- each node runs its own
3x-uiprocess - each node generates its own local Xray configuration
- each node starts and restarts its own local Xray process
- MariaDB acts only as the shared control database
There is no direct node-to-node RPC. Synchronization happens indirectly through MariaDB and local background loops.
Role Boundaries
master responsibilities:
- accept and apply shared-account writes
- bump the shared account-set version after successful writes
- rebuild local Xray state immediately after successful writes
- flush local traffic deltas like any other node
worker responsibilities:
- reject shared-account write requests in the backend
- load synchronized shared account snapshots
- rebuild local Xray state from the synchronized snapshot
- flush local traffic deltas
Shared vs Local Data
Shared through MariaDB:
- shared inbound definitions
- shared client definitions
- shared quota and expiry state
- aggregate traffic totals
- synchronization metadata
Kept local to each node:
- panel port and path
- web domain and TLS files
- Telegram bot settings
- panel users and login state
- local cache and pending traffic files
- local logs and node-specific operational state
Configuration Model
Store node-control settings in /etc/x-ui/x-ui.json:
nodeRole:masterorworker, defaultmasternodeId: unique node identifier, required forworkersyncInterval: shared snapshot poll interval in seconds, default30trafficFlushInterval: traffic delta flush interval in seconds, default10
Validation rules:
nodeRolemust bemasterorworkerworkerrequires non-emptynodeIdworkerrequiresdbType = mariadbsyncIntervalmust be positivetrafficFlushIntervalmust be positive
Compatibility rules:
- legacy configs without these keys must continue to load with defaults
- existing grouped JSON layout rules for other settings must remain compatible
- switching roles is a config change plus restart, not a schema migration
Data Model
The first phase keeps shared business data in the existing tables and adds minimal synchronization metadata.
Shared Metadata Tables
shared_state
keyprimary keyversionmonotonic int64updated_at
Reserved key:
shared_accounts_version
node_state
node_idprimary keynode_rolelast_sync_atlast_heartbeat_atlast_seen_versionlast_errorupdated_at
Local Persistent Files
/etc/x-ui/shared-cache.json
- stores the last valid shared account snapshot
- primarily used by workers for startup and MariaDB outage survivability
- is not the system of record
/etc/x-ui/traffic-pending.json
- stores traffic deltas that have not yet been flushed successfully
- is used by all nodes
- prevents delta loss across retries or restarts
Versioning Strategy
The design uses a single account-set version instead of per-record versions.
Rule:
- whenever
mastersuccessfully changes shared account definitions, it incrementsshared_accounts_versionin the same transaction
This version answers only one question:
- has the shared account set changed since the node last synchronized
That keeps worker polling cheap and avoids per-row merge logic in the first phase.
Synchronization and Write Flow
Shared Account Write Flow
For shared-account writes such as add, update, delete, enable, disable, quota change, expiry change, or client mutation:
- request enters the existing controller/service path
- service layer enforces
RequireMaster() workerrequests are rejected before any database or Xray mutationmasterapplies the write- the same transaction increments
shared_accounts_version - after transaction commit,
masterrebuilds local Xray configuration immediately
The master node does not wait for polling to apply its own writes.
Worker Snapshot Sync Flow
Workers synchronize on startup and on a fixed interval:
- load
shared-cache.jsonif present - perform an immediate version check against MariaDB
- if the shared version is newer, fetch the full shared account snapshot
- persist the snapshot to
shared-cache.json - rebuild local Xray configuration from that snapshot
- update
node_statewith sync status
During steady state:
- poll
shared_accounts_versioneverysyncInterval - if unchanged, only update node heartbeat/sync status
- if changed, fetch the full snapshot and rebuild local Xray state
Traffic Flush Flow
All nodes, including master, follow the same traffic accounting pattern:
- accumulate local per-account upload and download deltas
- persist pending deltas locally
- flush pending deltas every
trafficFlushInterval - apply deltas in MariaDB using atomic increments
- remove flushed deltas from local pending state after success
Required rule:
- nodes must never write absolute aggregate totals back to MariaDB
Forbidden behavior:
- reading the current total, adding locally, then overwriting the row
- allowing concurrent nodes to race with last-write-wins semantics
Trigger Timing
Synchronization is triggered by control-state changes and fixed intervals, not by direct node-to-node notifications.
Triggers:
mastershared-account write success: bump version in-transaction and rebuild local Xray immediately- worker startup: load local cache, then perform an immediate version check
- worker periodic sync: poll version every
syncInterval - all nodes periodic traffic flush: flush deltas every
trafficFlushInterval - optional best-effort shutdown flush: attempt one final delta flush on graceful shutdown
There is no direct push from master to worker in the first phase.
Failure Handling
If MariaDB is temporarily unreachable:
- workers continue using the last valid shared snapshot
- traffic deltas remain in local pending storage
- shared-account writes cannot proceed
If shared-cache.json is missing or invalid and MariaDB is unreachable at worker startup:
- do not construct a speculative snapshot
- preserve the last successfully loaded runtime state if one already exists
- record the error in node status
If snapshot fetch succeeds but local Xray rebuild fails:
- keep the last known-good local runtime configuration active
- record the failure in
node_state.last_error - retry on the next synchronization cycle
If master writes shared state successfully but its own local rebuild fails:
- MariaDB remains the source of truth with the new version
- other nodes still synchronize from that committed state
masterrecords the local rebuild error for operator action
If a traffic flush fails:
- keep the delta in pending storage
- retry later
- never drop pending deltas on flush failure
Testing Strategy
Config Tests
Cover:
- legacy config defaults
- invalid
nodeRole workerwithoutnodeIdworkerwithsqlite- non-positive interval validation
Service Tests
Cover:
RequireMaster()enforcement- version bumping on successful shared-account writes
- worker no-op behavior when version is unchanged
- worker snapshot fetch and rebuild trigger when version changes
- traffic flush success and retry behavior
Database Tests
Cover:
- migration of
shared_stateandnode_state - seeding of
shared_accounts_version - atomic increment semantics for traffic totals
Verification Expectations
Phase-one acceptance:
masterwrites are visible to workers within one poll cycle- worker-side shared-account writes are rejected
- concurrent node traffic flushes do not lose aggregate totals
- workers continue operating from the last valid cache during temporary MariaDB outages
Operational Visibility
Expose minimal operator-visible node information:
nodeRolenodeIdsyncIntervaltrafficFlushInterval
Record minimal per-node status in node_state:
- last sync time
- last heartbeat time
- last seen version
- last error
This is sufficient for the first phase to answer:
- whether a worker is lagging behind the current shared version
- whether a node is alive but failing to synchronize
- whether traffic flush retries are accumulating due to database problems
Implementation Boundary
Expected change areas:
- config loading and validation
- startup validation and CLI setters
- database model registration and metadata helpers
- service-layer master-only guards
- worker sync and cache services
- traffic pending and flush services
- startup wiring in the web server
- operator-facing shell scripts and docs
Areas intentionally unchanged in the first phase:
- Xray protocol implementation
- local Xray process ownership model
- panel-global settings model
- distributed leadership or push-based synchronization
Summary
This design adds a minimal shared control plane to 3x-ui without turning the system into a distributed cluster. master owns shared account writes, all nodes keep local Xray ownership, workers rebuild from synchronized snapshots, and traffic returns to MariaDB only as atomic deltas. The result is a bounded first phase that matches the existing architecture and keeps future extension paths open.