Issue #2066

closed

rsync distributor - retry on failed rsync commands

Added by rmcgover almost 8 years ago. Updated almost 5 years ago.

Status: CLOSED - WONTFIX
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version: Master
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

Could the rsync distributor please retry failed rsync commands a few times before raising a fatal error?

This distributor may be used to copy files from within Pulp to an external hosting service. It would be useful if it could tolerate temporary problems in the network or at the external host. Because rsync is idempotent, it seems naturally well-suited to retries.

Actions #1

Updated by rmcgover almost 8 years ago

The main rsync distributor story is https://pulp.plan.io/issues/1990.

Actions #2

Updated by dkliban@redhat.com almost 8 years ago

The current rsync implementation does perform up to 10 retries, but only when the failure is caused by the max-concurrent-connections limit being reached.

Could you please list the specific errors the rsync distributor should be prepared to handle?

Actions #3

Updated by rmcgover almost 8 years ago

I had two specific errors in mind when reporting:

Race conditions in concurrent rsync

If multiple rsync operations run concurrently and some of them are expected to create the same directory trees, there will occasionally be "File exists" errors when one task creates a directory that another task also intended to create.

e.g. from a shell:

[rmcgover@picallow ~]$ rm -rf /tmp/rsync-test ; rsync -avzR dir/a/b/c/d/e localhost:/tmp/rsync-test & rsync -avzR dir/a/b/c/d/ee localhost:/tmp/rsync-test & rsync -avzR dir/a/b/c/d/eee localhost:/tmp/rsync-test
[1] 21257
[2] 21258
sending incremental file list
sending incremental file list
rsync: mkdir "/tmp/rsync-test" failed: File exists (17)created directory /tmp/rsync-test

rsync error: error in file IO (code 11) at main.c(587) [Receiver=3.0.9]
dir/
dir/a/
dir/a/b/
dir/a/b/c/
dir/a/b/c/d/
dir/a/b/c/d/e/
dir/a/b/c/d/e/f/
sending incremental file list
dir/a/b/c/d/eee/
dir/a/b/c/d/eee/fff/

sent 164 bytes  received 40 bytes  136.00 bytes/sec
total size is 0  speedup is 0.00

sent 149 bytes  received 25 bytes  116.00 bytes/sec
total size is 0  speedup is 0.00
[1]-  Done                    rsync -avzR dir/a/b/c/d/e localhost:/tmp/rsync-test
[rmcgover@picallow ~]$ rsync: connection unexpectedly closed (9 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]

[2]+  Exit 12                 rsync -avzR dir/a/b/c/d/ee localhost:/tmp/rsync-test

However, I can't say right now exactly which types of tasks would be expected to trigger this error in practice.

TCP connection stalled

I'm not sure exactly what this will look like in rsync's output and exit code. It probably manifests as exit code 30, "Timeout in data send/receive".
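
To make the request concrete, here is a minimal sketch of the kind of retry being asked for. It is not the distributor's actual code: the function names, the attempt count, and the linear backoff are all hypothetical, and the set of retryable exit codes is an assumption drawn from the two cases above (11 and 12 from the concurrent-mkdir transcript, 30 as the guessed timeout code).

```python
import subprocess
import time

# Exit codes treated as transient in this sketch (an assumption, not Pulp's behaviour):
# 11 = error in file I/O, 12 = error in rsync protocol data stream,
# 30 = timeout in data send/receive.
RETRYABLE_EXIT_CODES = frozenset([11, 12, 30])


def run_with_retries(run, max_attempts=3, delay=1.0,
                     retryable=RETRYABLE_EXIT_CODES, sleep=time.sleep):
    """Call run() until it returns 0, a non-retryable code, or attempts run out.

    `run` is any zero-argument callable returning an exit code.
    Returns a tuple (exit_code, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        code = run()
        # Stop on success, on an error we do not consider transient,
        # or when this was the last permitted attempt.
        if code == 0 or code not in retryable or attempt == max_attempts:
            return code, attempt
        sleep(delay * attempt)  # linear backoff before the next attempt


def rsync_command(args):
    """Build a zero-argument callable that runs rsync and returns its exit code."""
    return lambda: subprocess.call(["rsync"] + list(args))
```

With this, the second command from the transcript could be run as `run_with_retries(rsync_command(["-avzR", "dir/a/b/c/d/ee", "localhost:/tmp/rsync-test"]))`. Any other non-zero exit code is returned immediately, so real failures still surface as fatal errors.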

Actions #4

Updated by amacdona@redhat.com almost 8 years ago

  • Triaged changed from No to Yes
Actions #5

Updated by bmbouter almost 7 years ago

  • Tags RCM added
Actions #6

Updated by bmbouter about 5 years ago

  • Status changed from NEW to CLOSED - WONTFIX
Actions #7

Updated by bmbouter about 5 years ago

Pulp 2 is approaching maintenance mode, and this Pulp 2 ticket is not being actively worked on. As such, it is being closed as WONTFIX. Pulp 2 is still accepting contributions though, so if you want to contribute a fix for this ticket, please reopen or comment on it. If you don't have permissions to reopen this ticket, or you want to discuss an issue, please reach out via the developer mailing list.

Actions #8

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added
Actions #9

Updated by bmbouter almost 5 years ago

  • Tags deleted (RCM)
