
Session architectural overview: DDP, async transports, io_uring pools, CI expansion

This page is the architectural narrative for the work that landed in the session preceding it: the parallel-deterministic-delete (DDP) pipeline, the async SSH transport stack, the io_uring session and per-thread ring pools, the tokio-based daemon listener, and the cross-platform CI expansion. Each section names the modules that ship the behaviour and points at the design docs that justify the shape. It is intentionally a map, not a redesign; the design docs cited at the end of every section are the source of truth.

Executive summary

The shipped work pushes oc-rsync further along three independent axes without changing the wire protocol or the user-visible CLI surface. First, the deletion path is restructured into a two-phase pipeline (parallel candidate compute fanned out across rayon, single emitter draining in upstream order) so every --delete-* mode matches upstream 3.4.1 byte-for-byte by default while retaining internal parallelism. Second, an opt-in async SSH transport (async-ssh feature) and an opt-in tokio-based daemon listener (async-daemon feature) provide the high-concurrency I/O surfaces required for fan-out workloads, layered underneath the existing sync transfer engine via spawn_blocking bridges. Third, io_uring grew two complementary pool primitives - a bounded session ring pool keyed by SessionId and a thread-local pool for pinned consumers - so daemon-burst sessions no longer pay io_uring_setup(2) per connection and rayon-resident consumers can submit without locks. CI grew a cross-OS feature matrix plus macOS and Windows interop smoke harnesses so these surfaces are exercised on every target platform before they are promoted to default.


1. DDP pipeline (parallel-deterministic delete)

DDP replaces the previous batched pre-transfer sweep (delete_extraneous_files over a HashMap<PathBuf, HashSet<OsString>>) with a two-phase model that reproduces, byte-for-byte, the wall-clock order of deletion events of upstream rsync 3.4.1 for every --delete-* mode. No new user-visible flag controls this; parity is the default. The legacy batched sweep no longer exists in the tree.

Phase split

   flist segment #N -----------+
                               v
                  +---------------------------+
                  | compute_extras  (rayon)   |   pure read_dir + filter snapshot
                  +---------------------------+
                               |
                  publish DeletePlan(D)
                               v
   flist segment #N+1 ---------+
                  ...                                       (parallel fan-in)
                               v
                  +---------------------------+
                  |  DeletePlanMap            |   keyed by relative dir path
                  +---------------------------+
                               |
                               v
                  +---------------------------+
                  |  DirTraversalCursor       |   upstream depth-first + f_name_cmp
                  +---------------------------+
                               |
                               v
                  +---------------------------+
                  |  DeleteEmitter (single)   |   unlink + itemize + DeleteStats++
                  +---------------------------+
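
The two phases in the diagram above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the shipped code: std::thread stands in for rayon, PlanMap and its methods are hypothetical stand-ins for DeletePlanMap, the traversal order is passed in as a plain slice rather than produced by DirTraversalCursor, and the event strings are simplified.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical per-directory plan: names on disk that the file list omits.
type DeletePlan = Vec<String>;

/// Hypothetical stand-in for DeletePlanMap: plans keyed by relative dir path,
/// plus a condvar so the emitter can wait for a plan that is not ready yet.
#[derive(Default)]
struct PlanMap {
    plans: Mutex<BTreeMap<String, DeletePlan>>,
    ready: Condvar,
}

impl PlanMap {
    fn publish(&self, dir: String, plan: DeletePlan) {
        self.plans.lock().unwrap().insert(dir, plan);
        self.ready.notify_all();
    }

    /// Blocks until the plan for `dir` has been published, then takes it.
    /// (Waits forever if no worker ever publishes a plan for `dir`.)
    fn take_blocking(&self, dir: &str) -> DeletePlan {
        let mut plans = self.plans.lock().unwrap();
        loop {
            if let Some(plan) = plans.remove(dir) {
                return plan;
            }
            plans = self.ready.wait(plans).unwrap();
        }
    }
}

/// Phase 1 (pure): entries present on disk but absent from the file list.
fn compute_extras<'a>(on_disk: &[&'a str], in_flist: &[&'a str]) -> DeletePlan {
    let mut extras: Vec<String> = on_disk
        .iter()
        .copied()
        .filter(|name| !in_flist.contains(name))
        .map(|name| name.to_string())
        .collect();
    extras.sort();
    extras
}

/// Runs phase 1 on one thread per directory, then drains phase 2 on the
/// calling thread in the order given by `traversal`.
fn run_pipeline(
    dirs: Vec<(&'static str, Vec<&'static str>, Vec<&'static str>)>,
    traversal: &[&str],
) -> Vec<String> {
    let map = Arc::new(PlanMap::default());
    let workers: Vec<_> = dirs
        .into_iter()
        .map(|(dir, on_disk, in_flist)| {
            let map = Arc::clone(&map);
            thread::spawn(move || {
                map.publish(dir.to_string(), compute_extras(&on_disk, &in_flist));
            })
        })
        .collect();

    // Phase 2: single emitter; output order depends only on `traversal`,
    // never on which worker happened to finish first.
    let mut events = Vec::new();
    for dir in traversal {
        for name in map.take_blocking(dir) {
            events.push(format!("*deleting {dir}/{name}"));
        }
    }
    for worker in workers {
        worker.join().unwrap();
    }
    events
}
```

Submitting directory b before a while traversing [a, b] still yields the events for a first: the emitter simply waits for the plan it needs next, which is the property that lets the compute phase run in parallel without perturbing the observable order.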

Phase-mode integration

The emitter wiring observes the existing --delete-before / --delete-during / --delete-delay / --delete-after selector without code duplication:

The opt-in --delete-strict-order gate is gone (the flag and its 94 references were removed alongside the F3 sweep removal); parity is unconditional.

Source map

Design references


2. Async SSH transport stack

The SSH client transport now has two parallel implementations sharing one argv builder. The synchronous SshConnection (the production default) and the async AsyncSshTransport (opt-in via the async-ssh feature) both render identical bytes for a given (remote, args, config) triple; only the process backing differs.
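
The shared-builder idea can be illustrated as follows. This is a sketch under assumptions, not the project's API: SshConfig, build_ssh_argv, and sync_command are hypothetical names, and the argv layout (program, optional -p port, remote, then the remote command) is simplified. Only the sync half is constructed here, because the standard library has no async process type.

```rust
use std::ffi::OsString;
use std::process::Command;

/// Hypothetical subset of the SSH settings that influence the command line.
struct SshConfig {
    program: String,
    port: Option<u16>,
}

/// Renders the full argv exactly once; every transport flavour starts from
/// this vector, so the rendered bytes cannot drift apart between them.
fn build_ssh_argv(remote: &str, rsync_args: &[&str], config: &SshConfig) -> Vec<OsString> {
    let mut argv: Vec<OsString> = vec![config.program.clone().into()];
    if let Some(port) = config.port {
        argv.push("-p".into());
        argv.push(port.to_string().into());
    }
    argv.push(remote.into());
    argv.extend(rsync_args.iter().map(|arg| OsString::from(*arg)));
    argv
}

/// Sync half: wraps the argv in a std::process::Command without spawning it.
fn sync_command(argv: &[OsString]) -> Command {
    let mut cmd = Command::new(&argv[0]);
    cmd.args(&argv[1..]);
    cmd
}
```

An async half would feed the same vector to tokio::process::Command, whose new and args methods mirror the std ones, so an argv-equivalence test reduces to comparing what each command reports against the builder's output.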

Sync half (default)

Async half (--features async-ssh)

Embedded russh half (--features embedded-ssh)

Selection matrix

| Build / feature | Default | Transport used by the client remote path |
| --- | --- | --- |
| --no-default-features | n/a | SshConnection (sync subprocess), tokio not linked |
| Workspace default | yes | SshConnection (sync subprocess) |
| --features async-ssh | no | AsyncSshTransport (tokio-process subprocess); wired through the core remote dispatch |
| --features embedded-ssh | no | russh channel wrapped in SyncAsyncBridge / into_sync_halves |
| --features async-ssh,embedded-ssh | no | Caller selects; argv-equivalence and channel-bridge tests cover both |

Default builds stay tokio-free at the CLI level; the runtime is only pulled in when one of the async or embedded gates is enabled.

Design references


3. io_uring topology: session pool vs per-thread pool

Two complementary primitives now coexist in crates/fast_io/src/io_uring/session_pool.rs. Both target the same problem, the cost of calling io_uring_setup(2) on every ring construction, and both solve it by amortising that cost across many consumers, but along orthogonal axes.

SessionRingPool - bounded fleet, MPMC

ThreadLocalRingPool - one ring per OS thread

When to use which

| Consumer profile | Pool |
| --- | --- |
| Daemon connection bursts (short-lived sessions) | SessionRingPool |
| Disk-commit thread (one per transfer, pinned) | ThreadLocalRingPool |
| Rayon workers issuing fixed-buffer reads / writes | ThreadLocalRingPool |
| Per-file readers / writers (legacy) | Existing SharedRing or migrate to ThreadLocalRingPool per the migration list in the design doc |

Existing single-owner SharedRing consumers (disk_batch, file_writer, file_reader) keep working unchanged and migrate one at a time as the bench evidence in #1410 / #4197 lands.
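
The difference between the two pool shapes can be sketched without touching io_uring at all. Everything below is an illustrative assumption: Ring is a dummy type whose constructor merely counts simulated io_uring_setup(2) calls, BoundedRingPool and with_local_ring are hypothetical names, and the SessionId keying of the real SessionRingPool is omitted.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};

/// Counts simulated setup calls so the amortisation is observable.
static SETUPS: AtomicUsize = AtomicUsize::new(0);

/// Dummy ring; constructing it models one io_uring_setup(2) call.
struct Ring {
    id: usize,
}

impl Ring {
    fn setup() -> Ring {
        Ring { id: SETUPS.fetch_add(1, Ordering::SeqCst) }
    }
}

/// Bounded fleet shared by many sessions (hypothetical SessionRingPool shape).
struct BoundedRingPool {
    idle: Mutex<Vec<Ring>>,
    available: Condvar,
}

/// A checked-out ring; it returns itself to the pool when dropped.
struct Lease {
    ring: Option<Ring>,
    pool: Arc<BoundedRingPool>,
}

impl BoundedRingPool {
    /// Pays the setup cost `capacity` times, up front.
    fn new(capacity: usize) -> Arc<Self> {
        let idle = (0..capacity).map(|_| Ring::setup()).collect();
        Arc::new(BoundedRingPool { idle: Mutex::new(idle), available: Condvar::new() })
    }

    /// Blocks while every ring is leased, so the fleet never grows.
    fn acquire(self: &Arc<Self>) -> Lease {
        let mut idle = self.idle.lock().unwrap();
        loop {
            if let Some(ring) = idle.pop() {
                return Lease { ring: Some(ring), pool: Arc::clone(self) };
            }
            idle = self.available.wait(idle).unwrap();
        }
    }
}

impl Drop for Lease {
    fn drop(&mut self) {
        if let Some(ring) = self.ring.take() {
            self.pool.idle.lock().unwrap().push(ring);
            self.pool.available.notify_one();
        }
    }
}

thread_local! {
    /// One lazily created ring per OS thread (hypothetical ThreadLocalRingPool shape).
    static LOCAL_RING: RefCell<Option<Ring>> = RefCell::new(None);
}

/// Runs `f` against this thread's ring, creating it on first use; no locks.
fn with_local_ring<R>(f: impl FnOnce(&mut Ring) -> R) -> R {
    LOCAL_RING.with(|slot| {
        let mut slot = slot.borrow_mut();
        f(slot.get_or_insert_with(Ring::setup))
    })
}
```

The bounded pool trades a lock and a possible wait for a hard cap on the number of rings, which suits bursts of short-lived sessions. The thread-local variant pays one setup per thread and nothing afterwards, which suits consumers that stay pinned to a thread.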

Other io_uring primitives in this slice

Design references


4. Daemon async listener (hybrid model)

The daemon now exposes a tokio-based accept loop behind the async-daemon Cargo feature (crates/daemon/Cargo.toml::async-daemon = ["dep:tokio"]). The model is intentionally hybrid: only the accept boundary is async; the existing synchronous transfer worker continues to own the wire protocol, filters, signature, and engine pipelines.

                              tokio runtime (rt-multi-thread, cap 8 workers)
                              +-----------------------------------------+
                              |                                         |
   incoming TCP ------------->| tokio::net::TcpListener::accept().await |
                              |                                         |
                              |   per-connection async task:            |
                              |     stream.into_std() + set_blocking    |
                              |     tokio::task::spawn_blocking(|| {    |
                              |         run_sync_worker(stream, ...)    |
                              |     }).await                            |
                              +-----------------------------------------+
                                                |
                                                v
                              +-----------------------------------------+
                              | blocking pool (size = max_conns + slack)|
                              |   existing sync transfer pipeline       |
                              |   (protocol, engine, transfer, filters) |
                              +-----------------------------------------+

Why hybrid

The blocking transfer engine is rayon-parallel and 100% sync today. Async-colouring protocol, engine, transfer, and checksums would be a long migration with no measured throughput win at the single-session level. The accept boundary is where async wins (many concurrent connections, signal handling, idle low-cost waits); the transfer body is where sync wins (blocking syscalls, rayon, no reactor starvation hazards). Splitting at exactly that boundary keeps both.
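
The shape of that split can be shown with the standard library only. This sketch is an analogy rather than the daemon's code: a plain thread running a blocking accept loop stands in for the tokio accept task, thread::spawn stands in for spawn_blocking, and run_sync_worker is a hypothetical one-line echo in place of the real transfer pipeline. The serve and start helpers are likewise invented for the example.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Stand-in for the synchronous transfer worker: it owns the connection and
/// uses only blocking I/O. Here it echoes a single line back to the peer.
fn run_sync_worker(stream: TcpStream) -> std::io::Result<()> {
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut writer = stream;
    let mut line = String::new();
    reader.read_line(&mut line)?;
    writer.write_all(line.as_bytes())
}

/// Accept boundary: accepts and hands off, nothing else. It never touches
/// the bytes on the connection; that is entirely the worker's job.
fn serve(listener: TcpListener, max_conns: usize) -> std::io::Result<()> {
    let mut workers = Vec::new();
    for stream in listener.incoming().take(max_conns) {
        let stream = stream?;
        workers.push(thread::spawn(move || run_sync_worker(stream)));
    }
    for worker in workers {
        worker.join().expect("worker panicked")?;
    }
    Ok(())
}

/// Binds an ephemeral loopback port and serves `max_conns` connections on a
/// background thread, returning the bound address and the server handle.
fn start(
    max_conns: usize,
) -> std::io::Result<(SocketAddr, thread::JoinHandle<std::io::Result<()>>)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    Ok((addr, thread::spawn(move || serve(listener, max_conns))))
}
```

Because the worker only ever sees a blocking std::net::TcpStream, swapping the accept loop for an async one changes serve and nothing below it, which is the point of cutting at the accept boundary.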

Design references


5. Cross-platform CI expansion

CI grew along two axes: a cross-OS feature-flag matrix for the rows the audit flagged as OS-agnostic yet exercised only on Linux, and dedicated macOS / Windows interop smoke harnesses that run against each platform's native upstream rsync packaging.

New matrix rows

.github/workflows/_test-features.yml::feature-flags-cross-os runs the following rows on ubuntu-latest, macos-latest, and windows-latest (3 OS x 4 rows = 12 jobs):

| Row name | Scoped crates | Feature(s) |
| --- | --- | --- |
| async | daemon, core, protocol, engine | async |
| tracing | daemon, core, engine | tracing |
| serde | logging, protocol, flist | serde |
| concurrent-sessions | daemon | concurrent-sessions |

Linux-only rows (io_uring, copy_file_range, the crypto / deflate backends) stay in the feature-flags-linux matrix; they overlap with the per-OS --all-features jobs already in ci.yml.

New interop jobs

macOS-specific additions

The macos-test matrix now also runs the metadata and apple-fs crates (-p metadata -p apple-fs) on every toolchain row, covering the Darwin acl_exacl branch, the macOS timestamp path (crates/metadata/src/apply/timestamps.rs), and the AppleDouble round-trip + resource-fork pipeline. Tests requiring root self-skip via geteuid(); xattr-dependent tests probe support and skip on filesystems that lack it.

Required vs informational

Per CLAUDE.md, the gating required checks are: fmt+clippy, nextest (stable), Windows (stable), macOS (stable), Linux musl (stable), plus the macOS interop smoke harness. The new cross-OS feature matrix and the Windows interop job are informational until they accumulate two release cycles of green runs, after which they flip to required.

Design references


6. How the pieces fit together

A pull transfer from a remote SSH endpoint, with --delete-during and async-ssh, executes the following pipeline:

  1. AsyncSshTransport::execute_remote_rsync spawns the ssh child via tokio; split() hands the receiver thread an (AsyncRead, AsyncWrite) pair that is bridged into the sync multiplex layer via the existing channel adapter (the inverse of embedded::sync_bridge::into_sync_halves).
  2. The receiver consumes INC_RECURSE flist segments. For each FLAG_CONTENT_DIR directory in a segment, receive_extra_file_lists posts a compute_extras job to rayon; the resulting DeletePlan lands in DeletePlanMap.
  3. The DeleteEmitter (single thread) walks DirTraversalCursor in upstream order, blocks until each plan is ready, issues unlink / rmdir / recursion, emits *deleting via the multiplex writer, and updates DeleteStats.
  4. The disk-commit thread owns a ThreadLocalRingPool ring for io_uring submissions; the per-file writer side reuses SharedRing until its migration to ThreadLocalRingPool lands.
  5. On a daemon endpoint compiled with --features async-daemon, the accept boundary runs on tokio and SessionRingPool amortises io_uring setup across sessions, while the transfer body runs on the blocking pool exactly as it does today.

Every step above is covered by either the design docs cited in sections 1-4 or the CI matrices in section 5. None of the pieces require coordinated rollout: DDP is on by default, async SSH and the async daemon are opt-in, the io_uring pools are additive primitives that migrate consumers one at a time, and the CI expansion is infrastructure-only.


7. Index of design and audit documents

| Topic | Document |
| --- | --- |
| DDP specification | docs/design/parallel-deterministic-delete.md |
| DDP F3 sweep removal readiness | docs/design/ddp-f3-sweep-removal-readiness.md |
| Delete architecture (consumer view) | docs/architecture/delete-during.md |
| SSH transport async I/O evaluation | docs/design/ssh-transport-async-io-eval.md |
| Async SSH pipe wrapper | docs/design/async-ssh-pipe-wrapper.md |
| Async SSH evaluation | docs/design/async-ssh-evaluation.md |
| Async runtime SSH evaluation | docs/design/async-runtime-ssh-eval.md |
| SSH async default flip (follow-up) | docs/design/ssh-async-default-linux.md |
| SSH decouple delta from socket read | docs/design/ssh-decouple-delta-from-socket-read.md |
| SSH explicit backpressure controls | docs/design/ssh-explicit-backpressure-controls.md |
| io_uring session ring pool spec | docs/design/iouring-session-ring-pool.md |
| io_uring session ring pool impl plan | docs/design/iouring-session-ring-pool-impl.md |
| io_uring per-thread rings | docs/design/iouring-per-thread-rings.md |
| io_uring rayon composition | docs/design/io-uring-rayon-composition.md |
| io_uring rayon submission | docs/design/iouring-rayon-submission.md |
| io_uring borrowed-slice consumer | docs/design/iouring-borrowed-slice-consumer.md |
| io_uring socket daemon TCP readiness | docs/design/iouring-socket-daemon-tcp-readiness.md |
| macOS kqueue fast I/O | docs/design/macos-kqueue-fast-io.md |
| Daemon async runtime choice | docs/design/daemon-async-runtime-choice.md |
| Daemon async accept + sync workers | docs/design/daemon-async-accept-sync-workers.md |
| Daemon tokio async listener impl | docs/design/daemon-tokio-async-listener-impl.md |
| Async migration plan (roadmap) | docs/design/async-migration-plan.md |
| Tokio spawn_blocking + rayon bridge | docs/design/tokio-spawn-blocking-rayon.md |
| Cross-platform CI coverage | docs/audits/cross-platform-ci-coverage.md |
| Cross-platform parity matrix | docs/audits/cross-platform-parity-matrix.md |
| Windows ACL / xattr CI matrix | docs/audits/windows-acl-xattr-ci-matrix.md |