Issue #4731
Closed
Order of data in PULP_MANIFEST returned by Pulp is different from feed url
Description
Ticket moved to GitHub: "pulp/pulp_file/612":https://github.com/pulp/pulp_file/issues/612
Order of data in PULP_MANIFEST returned by Pulp is different than what is provided by the synced repo.
1. Create a repository
2. Create a file remote - https://repos.fedorapeople.org/pulp/pulp/fixtures/file/
3. Sync the file remote
4. Create a file publisher
5. Create a publication
6. Create a distribution from the publication
7. Fetch PULP_MANIFEST
PULP_MANIFEST provided by feed url
'1.iso,cbd1d07a63f8ac122b7adf75658fc22f9754796f8bbcd9395f1bcc00bbc6e2d8,1024\n2.iso,7ab0ad049b044879b03d3bc5acbe4e43c98c359fe52a60475e6611ee55033646,1024\n3.iso,ddc5a9ac99a0cb546cce44be3da447c6e591df8d4860c592f3f1be6e33b66e62,1024\n'
PULP_MANIFEST downloaded from Pulp
'3.iso, ddc5a9ac99a0cb546cce44be3da447c6e591df8d4860c592f3f1be6e33b66e62, 1024\n2.iso, 7ab0ad049b044879b03d3bc5acbe4e43c98c359fe52a60475e6611ee55033646, 1024\n1.iso, cbd1d07a63f8ac122b7adf75658fc22f9754796f8bbcd9395f1bcc00bbc6e2d8, 1024\n'
See: https://github.com/pulp/pulp_file
This makes verifying the integrity of PULP_MANIFEST downloaded from Pulp a bit more complex.
Updated by kersom over 5 years ago
- Related to Test #4519: Test 500 error while getting published metadata added
Updated by ttereshc over 5 years ago
I don't think it's a bug.
In general, it's not safe to rely on the order of metadata.
For example, RPM packages in primary.xml can appear in a different order every time; that depends on how createrepo_c handles it and is not under Pulp's control.
For PULP_MANIFEST: each row has a specific format - data in a certain order separated by commas: relative_path, checksum, size.
The order of rows is not guaranteed to be preserved.
If we decide to publish incrementally at some point in the future (adding metadata to the existing file), there will be no good way to preserve the order, even if we want to.
If this is needed for test purposes: split by newline and sort.
The one inconsistency I can see that is potentially inconvenient is that Pulp adds extra spaces between the comma-separated values.
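A minimal Python sketch of the "split by newline and sort" approach, using the two manifests from the description. The helper name `parse_manifest` is made up for illustration and is not part of pulp_file. It assumes each row is `relative_path, checksum, size`, and it splits from the right so that a comma inside a relative path would not break parsing.

```python
def parse_manifest(text):
    """Return PULP_MANIFEST rows as a sorted list of (relative_path, checksum, size).

    Hypothetical helper for tests; not part of pulp_file.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. after a trailing newline
        # Split from the right: the last two fields are checksum and size.
        relative_path, checksum, size = (f.strip() for f in line.rsplit(",", 2))
        entries.append((relative_path, checksum, int(size)))
    return sorted(entries)


# Manifest as served by the feed URL (no spaces, ascending order).
feed_manifest = (
    "1.iso,cbd1d07a63f8ac122b7adf75658fc22f9754796f8bbcd9395f1bcc00bbc6e2d8,1024\n"
    "2.iso,7ab0ad049b044879b03d3bc5acbe4e43c98c359fe52a60475e6611ee55033646,1024\n"
    "3.iso,ddc5a9ac99a0cb546cce44be3da447c6e591df8d4860c592f3f1be6e33b66e62,1024\n"
)

# Manifest as downloaded from Pulp (extra spaces, different row order).
pulp_manifest = (
    "3.iso, ddc5a9ac99a0cb546cce44be3da447c6e591df8d4860c592f3f1be6e33b66e62, 1024\n"
    "2.iso, 7ab0ad049b044879b03d3bc5acbe4e43c98c359fe52a60475e6611ee55033646, 1024\n"
    "1.iso, cbd1d07a63f8ac122b7adf75658fc22f9754796f8bbcd9395f1bcc00bbc6e2d8, 1024\n"
)

# Comparing parsed, sorted rows ignores both row order and spacing.
assert parse_manifest(feed_manifest) == parse_manifest(pulp_manifest)
```

Comparing parsed tuples rather than raw text means the check keeps passing if the spacing after commas is fixed later.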
Updated by daviddavis over 5 years ago
I don't think the ordering is a bug either, but I find it strange that we sort the files deterministically by when they were created [0]. If we sort at all, it should be by filename, but we sort by creation time as a way to eliminate duplicates in the manifest (see [1]).
I agree that the space after commas should be fixed. Also, users shouldn't rely on the ordering of the manifest file--we should probably focus our efforts on #4028 instead.
[0] https://github.com/pulp/pulp_file/blob/dd366601de3ae8741a7f0c2ee8f288f90f74d142/pulp_file/app/tasks/publishing.py#L73
[1] https://pulp.plan.io/issues/4028
Updated by ttereshc over 5 years ago
@daviddavis, the ordering by created date is a way to keep the most recently added file if there are duplicates.
+1 to focus on #4028
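To illustrate the idea of keeping the most recently added file among duplicates, here is a rough Python sketch. It is not the code in publishing.py: the `Entry` type, its `created` field, and `latest_per_path` are all hypothetical names chosen for this example. It also emits the rows sorted by filename, as suggested above.

```python
from datetime import datetime
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for a manifest row plus its creation time."""
    relative_path: str
    checksum: str
    size: int
    created: datetime


def latest_per_path(entries):
    """Keep one entry per relative_path, preferring the most recently created."""
    latest = {}
    # Walk from oldest to newest so newer entries overwrite older duplicates.
    for entry in sorted(entries, key=lambda e: e.created):
        latest[entry.relative_path] = entry
    # Emit rows ordered by filename rather than by creation time.
    return sorted(latest.values(), key=lambda e: e.relative_path)


entries = [
    Entry("2.iso", "bbb", 1024, datetime(2019, 4, 2)),
    Entry("1.iso", "old", 1024, datetime(2019, 4, 1)),
    Entry("1.iso", "new", 1024, datetime(2019, 4, 3)),
]

for entry in latest_per_path(entries):
    print(f"{entry.relative_path},{entry.checksum},{entry.size}")
```

With this shape, deduplication still relies on creation time, but the published row order no longer does.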
Updated by kersom over 5 years ago
My goal in filing this one was to share it and make people aware of it; I was not sure whether it was a bug either.
Updated by amacdona@redhat.com over 5 years ago
- Triaged changed from No to Yes
I think this is a problem isolated to the pulp_file plugin. Most plugins implement the API of some existing ecosystem, but pulp_file creates its own (the PULP_MANIFEST).
Since this API does not exist elsewhere, a section needs to be added to the pulp_file docs to explain what this manifest is, how it is structured (and that it is not ordered).
Updated by pulpbot about 3 years ago
- Description updated (diff)
- Status changed from NEW to CLOSED - DUPLICATE