Issue #1823: RPMs partially downloaded - RPM Support - Pulp

Actions

Send by e-mail Copy link

Issue #1823

closed

RPMs partially downloaded

Added by mhrivnak about 8 years ago. Updated about 5 years ago.

Status:

CLOSED - CURRENTRELEASE

Priority:

High

Assignee:

Sprint/Milestone:

-

Start date:

Due date:

Estimated time:

Severity:

2. Medium

Version:

2.6.5

Platform Release:

2.12.0

OS:

Triaged:

Yes

Groomed:

No

Sprint Candidate:

No

Tags:

Pulp 2

Sprint:

Sprint 7

Quarter:

Description

There have been multiple reports of RPMs ending up in /var/lib/pulp/content/ with either 0 bytes, or partially downloaded. Looking at the 6.1 code, it is difficult to identify how that is possible.

You can see here that in pulp 2.6, an rpm does not get saved into the DB until after validation has happened, and the file has been moved into place without errors. Katello has assured us that they supply "validate: true" with each sync request, so validation should be happening.

https://github.com/pulp/pulp_rpm/blob/2.6-release/plugins/pulp_rpm/plugins/importers/yum/listener.py#L85

And yet, users are seeing this happen, so we need to investigate further.

Related issues

Actions

#1

Updated by mhrivnak about 8 years ago

Triaged changed from No to Yes

Actions

#3

Updated by Anonymous almost 8 years ago

Groomed changed from No to Yes
Sprint Candidate changed from No to Yes

Actions

#4

Updated by Anonymous almost 8 years ago

Sprint/Milestone set to 22
Groomed changed from Yes to No
Sprint Candidate changed from Yes to No

Actions

#5

Updated by mhrivnak almost 8 years ago

Sprint/Milestone changed from 22 to 23

Actions

#6

Updated by ttereshc almost 8 years ago

Status changed from NEW to ASSIGNED
Assignee set to ttereshc

Actions

#7

Updated by mhrivnak over 7 years ago

Sprint/Milestone changed from 23 to 24

Actions

#8

Updated by ttereshc over 7 years ago

An attempt to do an audit of the code path for downloading RPMs (pulp 2.6-release, pulp_rpm 2.6-release, nectar 1.3.3-1) results in the following.
Principal suspicions so far:
- non-atomic `shutil.move` could silently collide with each other
- pre-move content validation may not notice corrupted content
- possible race condition on concurrent downloads

`shutil.move` may make the process of putting temp file into its final location non-atomic (between different FS) and thus unsafe unless proper locking is used.
Copy to the final location + concurrent downloads of the same RPM could be an issue.
Assuming that `shutil.move` can't silently fail there are few potential scenarios in which RPM file could be corrupted:

Scenario 1
- Upstream repo A and repo B which both contain the same RPM.
- RPM from both repos is synced nearly at the same time.
- Download of RPM from repo B started before download from repo A was finished (second download started because there is no reference for RPM in DB yet).
- Download from repo A succeeded, validation succeeded, file was moved to the final location and is not corrupted anyhow, reference to this RPM is saved into the DB.
- Download from repo B succeeded, validation succeeded, there was some problem during copy to the final location.
- Because the final location will be the same in both cases, the file could be corrupted at the end.

Scenario 2
- One repo, two nearly concurrent downloads of the same RPM from the same repo
- First download succeeded, validation succeeded but before it was copied to the final location second download started to write into the same file in the working directory.
- Possibly corrupted file is moved to the final location and saved into the DB.

Uncertainties:
- Both scenarios seem to me to be very unlikely but in most cases RPM were partially downloaded in case of some network problems, so the probability of the race conditions is getting higher.
- How to end up with concurrent downloads of the same RPM from the same repo?
-- I do not think it is possible if there is only one feed specified for the repo. I followed the code path and all the locks look good to me.
-- So alternate content sources are under more suspicion, several queues, one for each source, are used in this case. More investigation needed.

Suggestion/workaround for both scenarios so far:
Validate file at its final location (copy file from working directory into the temp name into the final directory, then validate and atomically rename it to the final name)

Any thoughts? or issues in my logic?

Actions

#9

Updated by mhrivnak over 7 years ago

Related to Issue #2142: Units created with 0-byte files when sync runs out of disk space added

Actions

#10

Updated by mhrivnak over 7 years ago

That all seems reasonable. I like the idea of validating it in the final location but with a different filename, and doing the atomic rename.

I added an association to issue #2142, which may also reveal a possible cause. That one seems to be quite reproducible with docker content. I could not reproduce that behavior with an rpm sync, but it's possible that something has changed in the yum importer since 2.6 to explain it. Maybe whatever problem is present in the docker importer used to also affect the yum importer.

Actions

#11

Updated by mhrivnak over 7 years ago

Sprint/Milestone changed from 24 to 25

Actions

#12

Updated by ttereshc over 7 years ago

Blocked by Issue #2190: Unit is associated with the repo before it is copied to the final location added

Actions

#13

Updated by ttereshc over 7 years ago

Status changed from ASSIGNED to POST

https://github.com/pulp/pulp/pull/2730
https://github.com/pulp/pulp_rpm/pull/974

Added by ttereshc over 7 years ago

Revision 96fd275c | View on GitHub

Verify RPM/SRPM/DRPM unit at its final location

closes #1823 https://pulp.plan.io/issues/1823

Added by ttereshc over 7 years ago

Revision a583b757 | View on GitHub

Verify size of a file at its final location

re #1823 https://pulp.plan.io/issues/1823

Added by ttereshc over 7 years ago

Revision a583b757 | View on GitHub

Verify size of a file at its final location

re #1823 https://pulp.plan.io/issues/1823

Actions

#14

Updated by ttereshc over 7 years ago

Status changed from POST to MODIFIED

Applied in changeset 96fd275c2b8b3e6f1ce866a21af2ce47736c8f39.

Actions

#15

Updated by ttereshc over 7 years ago

Now we are doing an additional validation during copy to the final location of the file, so there will be errors during sync when files are corrupted. However, since root cause of the corrupted files is still unclear, this approach does not fix the root issue.

Actions

#16

Updated by semyers over 7 years ago

Platform Release set to 2.12.0

Actions

#17

Updated by semyers over 7 years ago

Status changed from MODIFIED to 5

Actions

#18

Updated by semyers about 7 years ago

Status changed from 5 to CLOSED - CURRENTRELEASE

Actions

#19

Updated by bmbouter about 6 years ago

Sprint set to Sprint 7

Actions

#20

Updated by bmbouter about 6 years ago

Sprint/Milestone deleted (25)

Actions

#21

Updated by bmbouter about 5 years ago

Tags Pulp 2 added

Actions

Send by e-mail Copy link

Also available in: Atom PDF