Issue #2689
closedDon't use ssh connection sharing in rsync distributor
Description
In pulp platform, server/pulp/plugins/rsync/publish.py , ssh for rsync distributor is assembled like this:
cmd = ['ssh', '-l', user]
key = self.get_config().flatten()["remote"]['ssh_identity_file']
cmd += ['-i', key,
'-o', 'StrictHostKeyChecking no',
'-o', 'UserKnownHostsFile /dev/null',
'-S', '/tmp/rsync_distributor-%r@%h:%p',
'-o', 'ControlMaster auto',
'-o', 'ControlPersist 10']
Can the unconditional use of ControlMaster please be removed?
The usage of ControlMaster=auto here causes the following problem: if an ssh command is the ControlMaster, it won't exit until all clients using that control socket has exited.
Consider the following scenario to understand why this is an issue:
- there's an idle Pulp with 8 celery workers on one host
- many repo rsync publishes are enqueued
- worker 1 starts publishing with rsync distributor; when it starts rsync, it becomes the ControlMaster
- worker 2 through 8 start publishing other repos & connect to the ControlMaster
- worker 1 upload finishes, but ssh can't exit until the control socket is idle
- worker 1's task can't complete until workers 2 through 8 are all idle at the same time which is unlikely to happen until the entire queue of rsync publishes has completed
Overall, the impact of this is that one worker's rsync publish can unpredictably block completion on the publish from other workers, leading to less throughput due to resources being locked longer than they should, and misleading repo publish times.
I note that if the options relating to ControlMaster here were removed, it would still be possible for Pulp users to enable the feature by editing the ssh_config on the system where Pulp is deployed, if they have a use-case for that.
Updated by mhrivnak about 6 years ago
Just to confirm, is this all you're hoping to see?
diff --git a/server/pulp/plugins/rsync/publish.py b/server/pulp/plugins/rsync/publish.py
index 331eb42..d0da3cd 100644
--- a/server/pulp/plugins/rsync/publish.py
+++ b/server/pulp/plugins/rsync/publish.py
@@ -103,9 +103,7 @@ class RSyncPublishStep(PublishStep):
cmd += ['-i', key,
'-o', 'StrictHostKeyChecking no',
'-o', 'UserKnownHostsFile /dev/null',
- '-S', '/tmp/rsync_distributor-%r@%h:%p',
- '-o', 'ControlMaster auto',
- '-o', 'ControlPersist 10']
+ '-S', '/tmp/rsync_distributor-%r@%h:%p']
if args:
cmd += args
return cmd
Updated by rmcgover about 6 years ago
Yeah, that would be sufficient, (I believe the -S may as well be removed too, but would be harmless to leave it.)
Updated by ttereshc about 6 years ago
- Triaged changed from No to Yes
- Tags Easy Fix added
Updated by ipanova@redhat.com about 6 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ipanova@redhat.com
Added by ipanova@redhat.com about 6 years ago
Updated by ipanova@redhat.com about 6 years ago
- Status changed from ASSIGNED to POST
Updated by jortel@redhat.com about 6 years ago
- Sprint/Milestone changed from 37 to 38
Updated by ipanova@redhat.com about 6 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulp|80ff8c8a7f9459424e07731d02dfa193dae84df0.
Updated by bizhang about 6 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE
Don't use ssh connection sharing in rsync distributor.
closes #2689 https://pulp.plan.io/issues/2689