This guide is for operators upgrading from the previous oc-rsync release to vNEXT, the version that ships the parallel-deterministic-delete (DDP) pipeline, the opt-in async SSH transport, the opt-in tokio-based daemon listener, and a small set of Cargo feature flags that gate the new performance surfaces.
It calls out the behavioural changes vs prior versions, the flags that moved or disappeared, the opt-in switches that are new, the CI matrix changes that mean macOS and Windows are now first-class targets, and the rollback procedure for pinning to the previous release if a regression surfaces.
Architectural context for everything below is in
docs/architecture/session-overview-ddp-async-iouring.md.
The DDP specification is in
docs/design/parallel-deterministic-delete.md.
The async daemon and async SSH evaluations are in
docs/design/daemon-async-runtime-choice.md
and
docs/design/ssh-transport-async-io-eval.md.
No protocol changes. vNEXT speaks protocol 32, byte-for-byte identically with the previous release and with upstream rsync 3.4.1.
- MSG_* envelopes and exit codes are unchanged from the previous release.
- crates/protocol/tests/golden/ continues to pass against upstream rsync 3.0.9, 3.1.3, and 3.4.1.

If your monitoring relies on parsing oc-rsync output, the only
observable change is the wall-clock ordering of *deleting itemize
lines under --delete-during (section 2). Everything else - message
text, error format, exit codes, role trailers, statistics summary -
is unchanged.
vNEXT replaces the previous batched pre-transfer delete sweep with a two-phase pipeline (parallel candidate compute on rayon, single emitter draining in upstream order). The final on-disk state is identical; what changes is the wall-clock event order and the interleave with the transfer loop.
| Mode | Previous behaviour | vNEXT behaviour |
|---|---|---|
| --delete-before | Single batched sweep before the transfer loop. | Single emitter drains the whole tree before the transfer loop. Same placement, deterministic per-directory order. |
| --delete-during | Single batched sweep before the transfer loop; itemize order non-deterministic above 64 entries. | Per-directory interleave with the transfer loop, matching upstream generator.c::generate_files() byte-for-byte. |
| --delete-delay | Same batched sweep, just deferred placement. | Per-segment plans buffered, replayed at finalisation in upstream order, mirroring do_delayed_deletions(). |
| --delete-after | Batched sweep after the transfer loop. | Single emitter drains after the transfer loop, deterministic per-directory order. |
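
The two-phase shape described above (parallel candidate compute, then a single emitter draining in a fixed order) can be sketched in a few lines. This is an illustration only, not the oc-rsync implementation: `compute_extras` and `plan_deletions` are hypothetical Python names, a thread pool stands in for rayon, and plain reverse byte-wise name order stands in for the real per-directory comparator.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_extras(dest_names, source_names):
    """Phase 1: names present at the destination but absent from the source.

    Reverse byte-wise order is a stand-in for the real comparator (assumption).
    """
    return sorted(set(dest_names) - set(source_names), reverse=True)


def plan_deletions(directories, dest_listing, source_listing, workers=4):
    """Return (directory, name) deletion events in a deterministic order.

    `directories` must already be in the traversal order the emitter follows.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: candidate computation fans out; completion order is arbitrary.
        pending = {
            d: pool.submit(compute_extras, dest_listing.get(d, []), source_listing.get(d, []))
            for d in directories
        }
        # Phase 2: one emitter drains results strictly in traversal order,
        # so the event sequence does not depend on which job finished first.
        events = []
        for d in directories:
            for name in pending[d].result():
                events.append((d, name))
    return events
```

The sketch returns the planned events instead of unlinking anything, which keeps it side-effect free; the point is only that ordering is decided by the serial drain, not by the parallel phase.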

- *deleting itemize order changes under --delete-during and becomes deterministic in every mode. Log scrapers that depend on the previous arbitrary ordering must be updated to handle the new upstream-identical order. The order is now: per directory, entries in reverse f_name_cmp order; directories in upstream depth-first traversal order.
- --delete-during now interleaves with transfers. Previously the entire deletion sweep ran before any transfer. Now deletions and transfers happen per-directory as upstream does. If an operator’s workflow assumed deletions complete before any data is written (rare; this was never documented), use --delete-before instead.
- .rsync-filter merge files loaded by enter_directory for a subtree are now honoured by the deletion path for that subtree, matching upstream. Previously a single chain snapshot was taken at sweep start.

The following are unchanged:

- --max-delete semantics, exit code, and ordering of the enforcement check.
- DeleteStats totals (files, dirs, symlinks, devices, specials) and the NDX_DEL_STATS wire frame in the goodbye phase (protocol >= 31).
- *deleting itemize line format itself; only the order changes.
- --ignore-errors, --force, and --protect-args interaction with deletion.

--delete-strict-order removal

The opt-in --delete-strict-order / --no-delete-strict-order flags
introduced in the prior prerelease for #1940 are removed. Upstream
per-directory ordering is now the unconditional default for every
--delete-* mode.

- If you pass --delete-strict-order, remove it. The behaviour the flag selected is now the default and only behaviour.
- If you pass --no-delete-strict-order to opt out of the strict-order path, remove it as well. The legacy batched sweep no longer exists; there is no off switch. The final on-disk state is unchanged either way, so this should be a no-op for any successful transfer.
- A feature check that runs --help | grep will no longer find delete-strict-order. Remove the check.

Background: the historical design at
docs/design/delete-during-strict-order-gate.md
is marked SUPERSEDED. The replacement is the always-on DDP model in
docs/design/parallel-deterministic-delete.md.
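
For wrappers that build the oc-rsync command line programmatically, dropping the removed flags can be a one-line filter. The helper below is a hypothetical sketch (`strip_removed_flags` is not part of oc-rsync); it assumes the flags are passed as standalone argv entries and leaves anything after a literal `--` separator untouched so that paths are never rewritten.

```python
REMOVED_FLAGS = {"--delete-strict-order", "--no-delete-strict-order"}


def strip_removed_flags(argv):
    """Return argv without the strict-order flags that vNEXT no longer accepts."""
    cleaned = []
    options_ended = False
    for arg in argv:
        if arg == "--":
            # Everything after "--" is a positional argument, not an option.
            options_ended = True
        if not options_ended and arg in REMOVED_FLAGS:
            continue
        cleaned.append(arg)
    return cleaned
```

Because the flag only selected the ordering that is now the default, removing it should not change the result of a successful transfer.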
vNEXT introduces six Cargo feature flags that gate new performance surfaces. None are enabled by default. None change wire bytes. All can be combined.

async-ssh (core, rsync_io)

- SshConnection subprocess wrapper with an AsyncSshTransport built on tokio::process::Command. The argv handed to ssh is byte-identical (covered by execute_remote_rsync_argv_matches_sync_path).
- docs/design/ssh-transport-async-io-eval.md section 2 (8 to 20% on RTT-bound or rotational-disk transfers, thread-count reduction of 4x to 8x on fan-out workloads).
- cargo build --release --features async-ssh.

async-daemon (daemon)

- spawn_blocking. Same max-connections semaphore, same shutdown semantics, same panic isolation.
- std::thread::spawn per connection.
- cargo build --release --features async-daemon.

parallel-receive-delta (transfer, experimental)

- ParallelDeltaPipeline infrastructure with a
threshold short-circuit, so above-threshold batches dispatch
multiple files in parallel through the reorder buffer.
- parallel_pipeline_wire_parity.rs, audit follow-up G2) and the drain benchmark must be green for your build; the flag is experimental until it flips default per the phased rollout in docs/design/parallel-receive-delta-application.md section 6.3.
- cargo build --release --features parallel-receive-delta.

thread-slab-pool (engine)

- BufferPool with a depth-bounded LIFO slab per thread
(default 1 MiB byte cap). Cross-thread returns still fall through
to the central overflow queue.
- N_threads * byte_cap, so do not enable on memory-constrained endpoints.
- cargo build --release --features thread-slab-pool.

ssh-socketpair-stderr (rsync_io, experimental)

- socketpair(AF_UNIX, SOCK_STREAM, 0) constructed
via UnixStream::pair. The parent end is a bidirectional socket that
can be registered with epoll/kqueue (or tokio AsyncFd) and woken
out-of-band via shutdown(2), which is the seam SSE-4 uses to drive
the drain off a tokio task instead of a dedicated thread per
connection. The child still sees a plain stream of bytes on fd 2.
Capture semantics, line forwarding to host stderr, and the bounded
64 KiB ring buffer used by stderr_output() are unchanged.
- async-ssh and want the SSH stderr drain integrated into the same
tokio reactor as the wire path, instead of consuming a per-connection
blocking thread; long-running fan-out clients that open many
concurrent SSH children where the saved drain threads matter; and
any deployment that wants the larger default kernel buffer
(~208 KiB on Linux vs 64 KiB for pipes) and shutdown(SHUT_RD)
as the wake primitive for the drain loop. macOS works too, with
the same UnixStream::pair construction.
- lsof / procfs); Windows endpoints,
where the TCP-loopback shim is still in flight under SSE-5 and
falls back to Stdio::piped() on any error; sync-only SSH
deployments that do not link tokio, since the existing sync
transport already uses the socketpair when available and gains
nothing from the flag. The default-off ships exactly what master
shipped before the SSE series.
- cargo build --release -p rsync_io --features ssh-socketpair-stderr. Combine with async-ssh to actually exercise the async drain path: cargo build --release --features "async-ssh" -p core and cargo build --release --features "ssh-socketpair-stderr" -p rsync_io.
- docs/design/socketpair-stderr-channel.md
(#2371). The companion stderr-handling audit that motivated the
series is in
docs/audits/ssh-stderr-handling.md
(#2370).

vmsplice (fast_io, transfer, Linux only)

- vmsplice(2) + splice(2). The trigger workload is kernel < 5.6 or io_uring disabled, large literal tokens, and a splice-capable filesystem.
- cargo build --release --features vmsplice (Linux only; no-op on other targets).

SpillPolicy

The concurrent-delta receiver bounds its ReorderBuffer through a
process-wide SpillPolicy introduced in STN-1 (design) and STN-2
(struct). The default policy keeps the historical behaviour - no spill,
everything in memory, byte-equivalent to the previous release - so
existing operators see no behavioural change.

- When threshold_bytes is set, the receiver writes
the oldest-eligible items in the reorder window to a tempfile once
the in-memory footprint crosses the threshold and reloads them when
they reach the head of the delivery order. The on-disk format is a
length-prefixed binary payload; compression is opt-in.
- threshold_bytes = None (spill disabled), dir = None (defers to std::env::temp_dir() via a 1 MiB
spooled tempfile), reclaim_mode = KeepInMemory,
granularity = WholeBatch, compression = None. The defaults
table and rationale are in
docs/design/spill-policy-public-api.md
section 2.
- SpillPolicy fields are reachable through OC_RSYNC_SPILL_* env vars; precedence (highest wins) is CLI flag > env var > programmatic policy > default.

| Variable | Maps to | Accepted values |
|---|---|---|
| OC_RSYNC_SPILL_THRESHOLD_BYTES | threshold_bytes | Integer with optional K/M/G suffix (case-insensitive, base 1024). Empty string clears. 0 is rejected. |
| OC_RSYNC_SPILL_DIR | dir | Absolute or relative path. Created on first spill via create_dir_all. |
| OC_RSYNC_SPILL_RECLAIM | reclaim_mode | keep (default) or re-spill. |
| OC_RSYNC_SPILL_GRANULARITY | granularity | whole-batch (default) or per-item. |
| OC_RSYNC_SPILL_COMPRESSION | compression | none (default), zstd, or zstd:LEVEL where LEVEL is in [-22, 22]. |
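
To sanity-check a value before exporting it, the OC_RSYNC_SPILL_THRESHOLD_BYTES rules in the table and the stated precedence chain can be modelled roughly as follows. This is a sketch under assumptions, not the oc-rsync parser: `parse_threshold_bytes` and `resolve_setting` are hypothetical names, the real parser may accept or reject inputs this one does not (surrounding whitespace, for instance), and a cleared env var is treated here the same as an unset one.

```python
import re

# Base-1024 multipliers for the optional suffix described in the table.
_MULTIPLIER = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_threshold_bytes(raw):
    """Return a byte count, or None when the empty string clears the setting."""
    if raw == "":
        return None
    match = re.fullmatch(r"([0-9]+)([KkMmGg]?)", raw)
    if match is None:
        raise ValueError(f"not an integer with optional K/M/G suffix: {raw!r}")
    value = int(match.group(1)) * _MULTIPLIER[match.group(2).upper()]
    if value == 0:
        raise ValueError("a threshold of 0 is rejected")
    return value


def resolve_setting(cli=None, env=None, programmatic=None, default=None):
    """Pick the highest-precedence layer that is set: CLI > env > programmatic > default."""
    for candidate in (cli, env, programmatic):
        if candidate is not None:
            return candidate
    return default
```

For example, `parse_threshold_bytes("64M")` yields 67108864 bytes, the 64 MiB figure expressed with the base-1024 suffix.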

- Set OC_RSYNC_SPILL_THRESHOLD_BYTES to a value below the cgroup limit (typical starting point: 64 MiB) and point OC_RSYNC_SPILL_DIR at a fast tmpfs or local SSD.
- Set OC_RSYNC_SPILL_RECLAIM=re-spill to keep the post-reload footprint bounded under sustained pressure.
- Set OC_RSYNC_SPILL_COMPRESSION=zstd to trade CPU for disk bandwidth; default level 3 is usually appropriate, raise to zstd:7 only when the spill device is the bottleneck.
- OC_RSYNC_SPILL_GRANULARITY=per-item smooths the memory curve at the cost of more syscalls per spill event.
- --spill-dir and --spill-threshold-bytes are planned for STN-11 and will land in a future release; they shadow the two highest-value env vars. The remaining three knobs stay env-only.
- docs/design/spill-policy-public-api.md. Spillable buffer internals: docs/design/reorderbuffer-spill-to-tempfile.md.

The feature flags are independent. A common production combination
for a high-concurrency Linux daemon endpoint is
--features async-daemon,thread-slab-pool. A common client-side
combination for high-RTT remote pulls is --features async-ssh.
Default builds remain tokio-free and ship every previous-release
behaviour unchanged.
vNEXT expands CI to include cross-OS coverage for the new feature flags plus dedicated macOS and Windows interop smoke harnesses. This is infrastructure-only; operators do not need to do anything, but it means users on macOS and Windows now see the same green-CI signal that Linux users have always seen.
The feature-flags-cross-os matrix runs four feature rows
(async, tracing, serde, concurrent-sessions) on
ubuntu-latest, macos-latest, and windows-latest (12 jobs).
Linux-only rows (io_uring, copy_file_range, crypto / deflate
backends) stay in the feature-flags-linux matrix.

- interop (macOS) runs tools/ci/run_interop_smoke.sh against Homebrew’s current upstream rsync (>= 3.4.x). Scenarios: baseline upstream local copy, push, pull, quick-check no-op, delta both directions, --list-only parity. Required check.
- interop (Windows, best-effort) validates oc-rsync.exe against MSYS2/Cygwin upstream rsync for push, pull, and delta. Marked continue-on-error until baseline parity is green; promotes to required after that.

The macos-test matrix now also runs the metadata and apple-fs
crates on every toolchain row, covering the Darwin acl_exacl
branch, the macOS timestamp path, and the AppleDouble + resource-fork
pipeline. Tests requiring root self-skip via geteuid();
xattr-dependent tests probe support and skip on filesystems that lack
it.
DDP and the new pool primitives change the shape of receiver-side performance vs the previous release. Wall-clock totals are unchanged to within noise on the common workloads (local copy, single-file push/pull); the differences appear at the tails.
Under the previous release, the entire deletion sweep ran in a single batched pre-transfer phase, with per-directory scans fanning out on rayon above 64 entries. The wall-clock cost of deletion was front-loaded.
Under vNEXT:

- compute_extras jobs run on rayon as INC_RECURSE segments arrive. The CPU cost of scanning destination directories and intersecting with the filter chain is amortised across the transfer loop instead of front-loaded.
- A single DeleteEmitter thread owns every unlink, every *deleting line, and every DeleteStats mutation. This guarantees byte-identical event order with upstream but caps the emission throughput at single-thread speed.
- For deletion-heavy workloads (--delete-during with thousands of extras per directory) the parallel compute typically completes before the emitter is bottlenecked on unlink syscalls, so the net is neutral to slightly positive. For deletion-light workloads the serial emitter is trivially fast.
- Filesystems where unlink is slow (NFS, FUSE, network filesystems) become the long pole earlier than before. If you previously relied on parallel batched unlinks to hide NFS latency, profile your workload; the deterministic single emitter cannot parallelise across that filesystem call. For such cases, consider running deletion as a pre-pass with --delete-before (still serial, but moved out of the transfer interleave) or running the transfer without --delete and separately reconciling.

Two new pool primitives ship in crates/fast_io/src/io_uring/:

- SessionRingPool - bounded MPMC fleet (min(available_parallelism(), 16) slots) for daemon-session bursts. Amortises io_uring_setup(2) across many short-lived sessions.
- ThreadLocalRingPool - one ring per OS thread for pinned consumers (disk-commit, rayon workers). No locks on the submit/reap fast path.

These are additive primitives. Existing single-owner SharedRing
consumers (disk_batch, file_writer, file_reader) keep working
unchanged. Operators see no behavioural change; the wins are paid out
as consumers migrate to the new pools in subsequent releases.
thread-slab-pool (section 4) shifts buffer-pool memory from a
single shared queue to a per-thread slab. Steady-state idle memory
grows by N_threads * byte_cap (default 1 MiB). Operators running
with more than 32 worker threads per pool will see lower contention
and slightly higher RSS.
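
As a rough worked example of the N_threads * byte_cap figure, the helper below computes the additional steady-state idle memory for a given thread count. It is a back-of-the-envelope sketch: `slab_idle_growth_bytes` is a hypothetical name, and the result assumes every thread's slab sits at its byte cap.

```python
MIB = 1024 * 1024


def slab_idle_growth_bytes(n_threads, byte_cap=MIB):
    """Additional idle memory when each thread holds a slab filled to byte_cap."""
    if n_threads < 0 or byte_cap < 0:
        raise ValueError("thread count and byte cap must be non-negative")
    return n_threads * byte_cap
```

With the default 1 MiB cap, 32 worker threads account for 32 MiB and 128 threads for 128 MiB, which is the number to compare against the memory headroom of the endpoint before enabling the feature.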
--acls/-A now works on Windows targets via
GetNamedSecurityInfoW/SetNamedSecurityInfoW, but the implementation
is a Tier 1C partial path. Operators migrating Windows workloads should
budget for the documented lossy cases before flipping -A on:

- SE_SECURITY_NAME is held.
- rwx access bits (DELETE, WRITE_DAC, WRITE_OWNER, generic bits) collapse to r/w/x plus SYNCHRONIZE on receive.

The cross-platform payload remains byte-compatible with upstream rsync
and POSIX peers. The planned --windows-acls opt-in adds a
higher-fidelity SDDL payload over the existing xattr stream for
Windows-to-Windows transfers, and --fail-on-windows-acl-loss turns the lossy
cases into a hard failure (exit code 23) for environments that need to
preserve every NTFS ACE verbatim or abort. None of these three flags
ship in this release; track docs/design/windows-ntfs-acl-support.md
section 4 for the rollout schedule.
The full mapping matrix, hardlink-safe DACL application rules, and the
SDDL wire format details are in
docs/design/windows-ntfs-acl-support.md. The user-facing
--acls entry in docs/oc-rsync.1.md enumerates the lossy cases
alongside the flag synopsis.
If a regression surfaces in vNEXT, pin to the previous release. The wire protocol is unchanged, so a partial rollback (some clients new, some old; or client on one version, daemon on another) is safe.
cargo install oc-rsync --version <PREVIOUS_VERSION> --locked
Replace <PREVIOUS_VERSION> with the last known-good tag (e.g.
0.6.2). The --locked flag pins transitive dependencies to the
release’s Cargo.lock.
Download the platform binary from
https://github.com/oferchen/rsync/releases for the previous tag.
Replace /usr/local/bin/oc-rsync (or your install path) with the
downloaded binary. The binary is statically linked on Linux musl
targets; on macOS and Windows the platform-native build is used.

- brew install oc-rsync@<PREVIOUS_VERSION> if the formula publishes pinned versions; otherwise download the bottle from the release page.
- oc-rsync = "=<PREVIOUS_VERSION>" in Cargo.toml and rerun cargo update -p oc-rsync.
- --delete-during sweep. The --delete-strict-order opt-in flag from the prior prerelease becomes available again.
- async-ssh, async-daemon, parallel-receive-delta, ssh-socketpair-stderr, thread-slab-pool, vmsplice) do not exist in earlier releases. Builds that enabled them must drop the flag from the build command when downgrading.

If you trip a regression, capture:

- oc-rsync --version on both ends.
- -vvv log from a minimal reproducer.

Open an issue at https://github.com/oferchen/rsync/issues with those five pieces of information. Wire-level regressions are highest priority; performance regressions on the workloads in section 6 are next.

| Topic | Document |
|---|---|
| Session architectural overview | docs/architecture/session-overview-ddp-async-iouring.md |
| DDP specification | docs/design/parallel-deterministic-delete.md |
| Legacy strict-order gate (SUPERSEDED) | docs/design/delete-during-strict-order-gate.md |
| Delete architecture | docs/architecture/delete-during.md |
| SSH transport async I/O evaluation | docs/design/ssh-transport-async-io-eval.md |
| Daemon async runtime choice | docs/design/daemon-async-runtime-choice.md |
| Daemon async accept + sync workers | docs/design/daemon-async-accept-sync-workers.md |
| Parallel receive-side delta application | docs/design/parallel-receive-delta-application.md |
| SSH stderr socketpair channel | docs/design/socketpair-stderr-channel.md |
| SSH stderr handling audit | docs/audits/ssh-stderr-handling.md |
| Per-thread buffer slab | docs/design/per-thread-buffer-slab.md |
| vmsplice / splice zero copy | docs/design/splice-vmsplice-zero-copy.md |
| io_uring session ring pool | docs/design/iouring-session-ring-pool.md |
| io_uring per-thread rings | docs/design/iouring-per-thread-rings.md |
| Cross-platform CI coverage | docs/audits/cross-platform-ci-coverage.md |
| Windows NTFS ACL support | docs/design/windows-ntfs-acl-support.md |