Task #1935

Redesign the yum_repo_metadata_file model

Added by over 5 years ago. Updated over 2 years ago.

Tags: Pulp 2


While investigating the productid problem Katello is experiencing, I took a look at the ``yum_repo_metadata_file`` model.

The unit key for this content unit is ('data_type', 'repo_id'), and that key does not uniquely identify a file. For a productid file, ``data_type = 'productid'``. If you happen to copy a second file of the same type into the repository, or if the upstream repository being synced has two files of the same type, Pulp will treat them as the same unit even though their contents differ. This makes no sense.
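A minimal sketch of the collision, assuming units are stored in a mapping keyed by the unit key described above. `unit_key` and `import_units` are hypothetical helpers (not Pulp API), and the checksums are placeholders:

```python
def unit_key(unit):
    """Return the unit key described in this ticket: (data_type, repo_id)."""
    return (unit["data_type"], unit["repo_id"])


def import_units(units):
    """Store units by unit key; a later unit with the same key replaces an earlier one."""
    store = {}
    for unit in units:
        store[unit_key(unit)] = unit
    return store


# Two different productid files (hypothetical checksums) land in the same repo.
first = {"data_type": "productid", "repo_id": "temp", "checksum": "aaa111"}
second = {"data_type": "productid", "repo_id": "temp", "checksum": "bbb222"}

store = import_units([first, second])
print(len(store))                                # 1
print(store[("productid", "temp")]["checksum"])  # bbb222
```

Because the key carries no information about the file itself (no checksum or filename), the two files are indistinguishable to anything that looks units up by key.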

When copying, this silently overwrites files in /var/lib/pulp/content/ (I suspect because there is "special" behaviour for this unit type). The underlying files change without Pulp knowing, and Pulp has no way to recover.

To see this in action:

1. Sync two repositories with productids, or upload two yum_repo_metadata_file content units.
2. Make a third repository: ``pulp-admin rpm repo create --repo-id temp``
3. Copy from the first repo: ``pulp-admin rpm repo copy metafile --from-repo-id el6-updates --to-repo-id temp``
4. Checksum the file associated with the unit in /var/lib/pulp/content
5. Copy from the second repo: ``pulp-admin rpm repo copy metafile --from-repo-id el7-updates --to-repo-id temp``
6. Checksum the file again (it's going to be the same path, probably ``/var/lib/pulp/content/units/yum_repo_metadata_file/62/7145ed1fac7340e78687188d35142824b140825a2e70ba3135317b8cf70961/productid.gz``, because the path is derived from the unit key)
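Step 6 produces the same path because the path depends only on the unit key. The sketch below illustrates that dependency. The hashing scheme (SHA-256 over the sorted key/value pairs) is an assumption and will not reproduce the exact paths in this ticket; only the layout of a 2-character directory followed by a 62-character directory (together, one 64-character hex digest) is taken from the paths shown here. `storage_path` is a hypothetical helper:

```python
import hashlib

CONTENT_ROOT = "/var/lib/pulp/content/units"


def storage_path(type_id, unit_key, filename):
    """Hypothetical: derive a storage path from a digest of the unit key.

    ASSUMPTION: SHA-256 over the sorted key/value pairs. Pulp 2's real
    scheme may differ; this only shows that equal keys give equal paths.
    """
    hasher = hashlib.sha256()
    for key, value in sorted(unit_key.items()):
        hasher.update(key.encode("utf-8"))
        hasher.update(value.encode("utf-8"))
    digest = hasher.hexdigest()  # 64 hex characters
    return f"{CONTENT_ROOT}/{type_id}/{digest[:2]}/{digest[2:]}/{filename}"


temp_key = {"data_type": "productid", "repo_id": "temp"}
el6_key = {"data_type": "productid", "repo_id": "el6-updates"}

# Steps 3 and 5 both produce a unit with temp_key, so both copies resolve
# to one path on disk and the second copy overwrites the first file.
first_copy = storage_path("yum_repo_metadata_file", temp_key, "productid.gz")
second_copy = storage_path("yum_repo_metadata_file", temp_key, "productid.gz")
print(first_copy == second_copy)  # True
print(first_copy == storage_path("yum_repo_metadata_file", el6_key, "productid.gz"))  # False
```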

For bonus points, inspect all the units in the database:

1. ``mongo pulp_database``
2. ``db.units_yum_repo_metadata_file.find().pretty()``

Note that none of the units reference a new file; the unit in ``temp`` references the storage path of the unit it was copied from (``el6-updates``):

    {
        "_id" : "4f55aa36-8b7b-48a3-a0dc-fc12aa3e414c",
        "pulp_user_metadata" : { },
        "_last_updated" : 1463838950,
        "_storage_path" : "/var/lib/pulp/content/units/yum_repo_metadata_file/b1/5f7988ebaa45d1306b7b804d1229107c83f631305f1a1cc583347a220600c7/productid.gz",
        "downloaded" : true,
        "data_type" : "productid",
        "repo_id" : "el7-updates",
        "checksum" : "59e34cd37839bae40bc66079dccec6afdb781d27",
        "checksum_type" : "sha1",
        "_ns" : "units_yum_repo_metadata_file",
        "_content_type_id" : "yum_repo_metadata_file"
    }
    {
        "_id" : "68dd3a99-2d5a-4d63-bd20-84396e808898",
        "pulp_user_metadata" : { },
        "_last_updated" : 1463839004,
        "_storage_path" : "/var/lib/pulp/content/units/yum_repo_metadata_file/8f/da3a9f092b0b717c48f8eda7f6f5deb4cefe0786216449d693023d5a9d8bae/productid.gz",
        "downloaded" : true,
        "data_type" : "productid",
        "repo_id" : "el6-updates",
        "checksum" : "276976ed33fe7502357c4502147656f6cd2d9a1e",
        "checksum_type" : "sha1",
        "_ns" : "units_yum_repo_metadata_file",
        "_content_type_id" : "yum_repo_metadata_file"
    }
    {
        "_id" : "fd1f4296-6ec9-4954-a289-03d7453d01a9",
        "pulp_user_metadata" : { },
        "_last_updated" : 1463840904,
        "_storage_path" : "/var/lib/pulp/content/units/yum_repo_metadata_file/8f/da3a9f092b0b717c48f8eda7f6f5deb4cefe0786216449d693023d5a9d8bae/productid.gz",
        "downloaded" : true,
        "data_type" : "productid",
        "repo_id" : "temp",
        "checksum" : "276976ed33fe7502357c4502147656f6cd2d9a1e",
        "checksum_type" : "sha1",
        "_ns" : "units_yum_repo_metadata_file",
        "_content_type_id" : "yum_repo_metadata_file"
    }

If either of these units is orphaned, orphan cleanup will delete the shared file and blow away the storage that the other unit still references.
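The risk can be checked against the three documents above: any `_storage_path` referenced by more than one unit is a file that orphan cleanup could delete out from under a surviving unit. A minimal sketch in plain Python over those documents, reduced to the fields needed. In practice the documents would come from the `units_yum_repo_metadata_file` collection; `shared_storage_paths` is a hypothetical helper:

```python
from collections import defaultdict

PREFIX = "/var/lib/pulp/content/units/yum_repo_metadata_file/"

# The three unit documents from the mongo output in this ticket.
units = [
    {"_id": "4f55aa36-8b7b-48a3-a0dc-fc12aa3e414c", "repo_id": "el7-updates",
     "_storage_path": PREFIX + "b1/5f7988ebaa45d1306b7b804d1229107c83f631305f1a1cc583347a220600c7/productid.gz"},
    {"_id": "68dd3a99-2d5a-4d63-bd20-84396e808898", "repo_id": "el6-updates",
     "_storage_path": PREFIX + "8f/da3a9f092b0b717c48f8eda7f6f5deb4cefe0786216449d693023d5a9d8bae/productid.gz"},
    {"_id": "fd1f4296-6ec9-4954-a289-03d7453d01a9", "repo_id": "temp",
     "_storage_path": PREFIX + "8f/da3a9f092b0b717c48f8eda7f6f5deb4cefe0786216449d693023d5a9d8bae/productid.gz"},
]


def shared_storage_paths(units):
    """Map each storage path referenced by more than one unit to those units' repo_ids."""
    by_path = defaultdict(list)
    for unit in units:
        by_path[unit["_storage_path"]].append(unit["repo_id"])
    return {path: repos for path, repos in by_path.items() if len(repos) > 1}


for path, repos in shared_storage_paths(units).items():
    # Deleting this file while cleaning up one of these units would leave
    # the other unit pointing at a missing file.
    print(repos)  # ['el6-updates', 'temp']
```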

Related issues

Related to RPM Support - Issue #1944: YumMetadataFile copy does not save its new storage_path (CLOSED - CURRENTRELEASE)


#2 Updated by mhrivnak over 5 years ago

To answer one question that came up in discussion, modifyrepo_c (the standard tool for adding files to repomd.xml) will not let you have multiple entries with the same data type. If you try to add a second one, it replaces the original and gives you a warning.

    $ touch foo
    $ touch bar
    $ modifyrepo_c foo repodata/
    $ modifyrepo_c --mdtype=foo bar repodata/
    C_CREATEREPOLIB: Warning: Record with type "foo" already exists in repomd.xml
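In other words, repomd.xml behaves like a mapping from data type to a single record. The sketch below models only that observed behaviour; `add_record` is a hypothetical name and this is not how modifyrepo_c is implemented:

```python
import warnings


def add_record(repomd, mdtype, location):
    """Model of the observed modifyrepo_c behaviour (not its implementation).

    repomd maps a data type to a single record, so adding a second record
    of the same type replaces the first and emits a warning.
    """
    if mdtype in repomd:
        warnings.warn(f'Record with type "{mdtype}" already exists in repomd.xml')
    repomd[mdtype] = location
    return repomd


repomd = {}
add_record(repomd, "foo", "foo")  # like: modifyrepo_c foo repodata/
add_record(repomd, "foo", "bar")  # like: modifyrepo_c --mdtype=foo bar repodata/ (warns)
print(repomd)  # {'foo': 'bar'}
```

Under this model, a single repository can hold at most one file per data type, which is why ('data_type', 'repo_id') is unique within one repo's metadata.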

#3 Updated by semyers over 5 years ago

  • Platform Release set to 2.8.4

#4 Updated by mhrivnak over 5 years ago

  • Related to Issue #1944: YumMetadataFile copy does not save its new storage_path added

#5 Updated by mhrivnak over 5 years ago

I tested the current design and found bug #1944. Other than that one bug, I have not been able to find any other incorrect behavior. Here are the things I tested:

Given a repo A with a feed to a remote yum repo that contains a YumMetadataFile, and a repo B that is just a local Pulp repo:

Test 1

  • sync A
  • copy from A to B
  • verify that two units exist in the DB with unique storage paths

Test 2

  • sync A
  • copy from A to B
  • delete B without an orphan remove
  • create a new repo B
  • copy from A to B
  • verify that only two units exist in the DB and have unique storage paths (the orphan was handled correctly)

Test 3

  • sync A
  • copy from A to B
  • change the file in the remote repo
  • sync A again
  • verify that A's unit references the new file
  • verify that B's unit references the old file
  • copy from A to B
  • verify that both units reference the new file but have unique copies

I did some other minor variations, but those are the three general workflows that exercise the required functionality. I wanted to verify that:

  • copy looks for an orphan and deletes it before creating a new unit
  • copy overwrites the file in the target repo if a unit already exists there
  • sync overwrites the file if it already exists
  • each unit has a unique copy of its file and references the correct storage path in the DB
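The four properties above can be sketched as a small in-memory model. This is a hypothetical illustration, not Pulp's importer code: the class, its methods, and the path scheme are assumptions, and it ignores the database, storage layout details, and concurrency entirely:

```python
import hashlib


class ToyMetadataStore:
    """Hypothetical in-memory model of the behaviour verified above.

    Units are keyed by (data_type, repo_id) and each unit owns its own file
    at a path derived from that key. Repo membership is tracked separately,
    so a unit whose repo was deleted without orphan removal is an orphan.
    """

    def __init__(self):
        self.units = {}          # unit key -> storage path
        self.files = {}          # storage path -> file contents
        self.associated = set()  # unit keys that belong to an existing repo

    @staticmethod
    def _path(key):
        # ASSUMPTION: the path is a digest of the unit key.
        digest = hashlib.sha256("/".join(key).encode("utf-8")).hexdigest()
        return f"{digest[:2]}/{digest[2:]}/{key[0]}"

    def sync(self, repo_id, data_type, contents):
        # Sync writes the unit's file, overwriting it if the unit already exists.
        key = (data_type, repo_id)
        self.units[key] = self._path(key)
        self.files[self.units[key]] = contents
        self.associated.add(key)

    def delete_repo(self, repo_id):
        # Deleting a repo without orphan removal leaves its units behind as orphans.
        self.associated = {k for k in self.associated if k[1] != repo_id}

    def copy(self, src_repo, dst_repo, data_type):
        src_key, dst_key = (data_type, src_repo), (data_type, dst_repo)
        if dst_key in self.units and dst_key not in self.associated:
            # An orphan with the target's unit key exists: delete it first.
            del self.files[self.units.pop(dst_key)]
        # Create the target unit, or overwrite its file if it already exists,
        # always writing a separate copy at the target unit's own path.
        path = self._path(dst_key)
        self.files[path] = self.files[self.units[src_key]]
        self.units[dst_key] = path
        self.associated.add(dst_key)


# Test 3 from the list above, run against the model.
store = ToyMetadataStore()
store.sync("A", "productid", b"v1")
store.copy("A", "B", "productid")
store.sync("A", "productid", b"v2")
print(store.files[store.units[("productid", "B")]])  # b'v1'
store.copy("A", "B", "productid")
print(store.files[store.units[("productid", "B")]])  # b'v2'
```

In this model, keeping the repo ID in the unit key is what gives each copy its own path, and the orphan check in `copy` is what Test 2 exercises.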

#6 Updated by over 5 years ago

  • Priority changed from Urgent to High
  • Triaged changed from No to Yes

#7 Updated by semyers over 5 years ago

  • Platform Release changed from 2.8.4 to 2.8.5

#8 Updated by mhrivnak over 5 years ago

  • Platform Release deleted (2.8.5)

#9 Updated by Anonymous over 5 years ago

  • Tracker changed from Issue to Task
  • Subject changed from yum_repo_metadata_file model is fundamentally flawed and leads to data corruption/loss to Redesign the yum_repo_metadata_file model
  • Priority changed from High to Normal

#10 Updated by bmbouter over 2 years ago

  • Status changed from NEW to CLOSED - WONTFIX

Pulp 2 is approaching maintenance mode, and this Pulp 2 ticket is not being actively worked on. As such, it is being closed as WONTFIX. Pulp 2 is still accepting contributions though, so if you want to contribute a fix for this ticket, please reopen or comment on it. If you don't have permissions to reopen this ticket, or you want to discuss an issue, please reach out via the developer mailing list.

#11 Updated by bmbouter over 2 years ago

  • Tags Pulp 2 added
