An attempt to do an audit of the code path for downloading RPMs (pulp 2.6-release, pulp_rpm 2.6-release, nectar 1.3.3-1) results in the following.
Principal suspicions so far:
- non-atomic `shutil.move` could silently collide with each other
- pre-move content validation may not notice corrupted content
- possible race condition on concurrent downloads
`shutil.move` may make the process of putting temp file into its final location non-atomic (between different FS) and thus unsafe unless proper locking is used.
Copy to the final location + concurrent downloads of the same RPM could be an issue.
Assuming that `shutil.move` can't silently fail there are few potential scenarios in which RPM file could be corrupted:
Scenario 1
- Upstream repo A and repo B which both contain the same RPM.
- RPM from both repos is synced nearly at the same time.
- Download of RPM from repo B started before download from repo A was finished (second download started because there is no reference for RPM in DB yet).
- Download from repo A succeeded, validation succeeded, file was moved to the final location and is not corrupted anyhow, reference to this RPM is saved into the DB.
- Download from repo B succeeded, validation succeeded, there was some problem during copy to the final location.
- Because the final location will be the same in both cases, the file could be corrupted at the end.
Scenario 2
- One repo, two nearly concurrent downloads of the same RPM from the same repo
- First download succeeded, validation succeeded but before it was copied to the final location second download started to write into the same file in the working directory.
- Possibly corrupted file is moved to the final location and saved into the DB.
Uncertainties:
- Both scenarios seem to me to be very unlikely but in most cases RPM were partially downloaded in case of some network problems, so the probability of the race conditions is getting higher.
- How to end up with concurrent downloads of the same RPM from the same repo?
-- I do not think it is possible if there is only one feed specified for the repo. I followed the code path and all the locks look good to me.
-- So alternate content sources are under more suspicion, several queues, one for each source, are used in this case. More investigation needed.
Suggestion/workaround for both scenarios so far:
Validate file at its final location (copy file from working directory into the temp name into the final directory, then validate and atomically rename it to the final name)
Any thoughts? or issues in my logic?
Verify RPM/SRPM/DRPM unit at its final location
closes #1823 https://pulp.plan.io/issues/1823