Issue #2328

Repository syncs show all units updated even when there are no changes

Added by jsherril@redhat.com about 5 years ago. Updated over 1 year ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version: 2.8.7
Platform Release: 2.10.1
OS: CentOS 7
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 9
Quarter:

Description

This seems to have been introduced somewhere between 2.8.3 and 2.8.7; I'm not sure exactly which version introduced it.

When I sync a repository and then sync it again, the 'result' shows an updated_count of 36 (there are 36 units in the repo). Previously the second sync showed an updated_count of 0.

This breaks the optimizations Katello has added, which skip some steps when there is no new content.

Steps to reproduce:
1. Create and sync a repo
2. Sync the repo again
3. Look at the sync results

added_count: 0
removed_count: 0
updated_count: 36

Associated revisions

Revision f230a4a2
Added by ipanova@redhat.com about 5 years ago

Repository syncs show all units updated even when there are no changes.

closes #2328 https://pulp.plan.io/issues/2328

Revision 8902a331
Added by ipanova@redhat.com about 5 years ago

Repository syncs show all units updated even when there are no changes.

closes #2328 https://pulp.plan.io/issues/2328

History

#1 Updated by jsherril@redhat.com about 5 years ago

  • Subject changed from Repository syncs show all rpms updated even when there are no changes to Repository syncs show all units updated even when there are no changes

#3 Updated by mhrivnak about 5 years ago

I just tried this with 2.8.7 upstream and was not able to reproduce. I used all default importer settings.

Can you provide the full set of importer config options used when you were able to reproduce?

#4 Updated by jsherril@redhat.com about 5 years ago

Here's the importer configuration:

[{"scratchpad"=>{"repomd_revision"=>1454123382, "previous_skip_list"=>[]}, "_href"=>"/pulp/api/v2/repositories/Default_Organization-syncfixtest-tester/importers/yum_importer/", "_ns"=>"repo_importers", "importer_type_id"=>"yum_importer", "last_sync"=>"2016-10-10T14:38:59Z", "repo_id"=>"Default_Organization-syncfixtest-tester", "_id"=>{"$oid"=>"57f79ecab0eaf22dc4394b31"}, "config"=>{"feed"=>"https://jlsherrill.fedorapeople.org/fake-repos/needed-errata/", "ssl_validation"=>true, "remove_missing"=>true, "download_policy"=>"immediate"}, "id"=>"yum_importer"}]

Since my initial report, I've tested on a different repo and was not able to reproduce it there (but I am still able to reproduce it on the repo specified in that importer). I will try some more.

#5 Updated by mhrivnak about 5 years ago

I tried this:

pulp-admin rpm repo create --repo-id=j --remove-missing=true --feed=https://jlsherrill.fedorapeople.org/fake-repos/needed-errata/

and then synced several times. The updated_count is always 0.

#6 Updated by mhrivnak about 5 years ago

I added "validate=true" since I remembered that katello sets that with each sync, and I set the download policy explicitly. It still shows an updated_count of 0 every time.

pulp-admin rpm repo create --repo-id=j --remove-missing=true --validate=true --download-policy=immediate --feed=https://jlsherrill.fedorapeople.org/fake-repos/needed-errata/

#7 Updated by jsherril@redhat.com about 5 years ago

Odd, I reproduced it with just pulp-admin:

pulp-admin rpm repo create --repo-id=j --remove-missing=true --validate=true --download-policy=immediate --feed=https://jlsherrill.fedorapeople.org/fake-repos/needed-errata/

pulp-admin rpm repo sync run --repo-id=j

pulp-admin -vvv rpm repo sync run --repo-id=j

and this was the sync task JSON:

 {
  "exception": null, 
  "task_type": "pulp.server.managers.repo.sync.sync", 
  "_href": "/pulp/api/v2/tasks/f0fdc95b-1f94-4b2b-b0bd-189a244c0581/", 
  "task_id": "f0fdc95b-1f94-4b2b-b0bd-189a244c0581", 
  "tags": [
    "pulp:repository:j", 
    "pulp:action:sync"
  ], 
  "finish_time": "2016-10-10T15:22:44Z", 
  "_ns": "task_status", 
  "start_time": "2016-10-10T15:22:38Z", 
  "traceback": null, 
  "spawned_tasks": [
    {
      "_href": "/pulp/api/v2/tasks/cdb835eb-e5e6-46d7-8ae2-8be3eca84d09/", 
      "task_id": "cdb835eb-e5e6-46d7-8ae2-8be3eca84d09"
    }
  ], 
  "progress_report": {
    "yum_importer": {
      "content": {
        "size_total": 0, 
        "items_left": 0, 
        "items_total": 0, 
        "state": "FINISHED", 
        "size_left": 0, 
        "details": {
          "rpm_total": 0, 
          "rpm_done": 0, 
          "drpm_total": 0, 
          "drpm_done": 0
        }, 
        "error_details": []
      }, 
      "comps": {
        "state": "FINISHED"
      }, 
      "purge_duplicates": {
        "state": "FINISHED"
      }, 
      "distribution": {
        "items_total": 0, 
        "state": "FINISHED", 
        "error_details": [], 
        "items_left": 0
      }, 
      "errata": {
        "state": "FINISHED"
      }, 
      "metadata": {
        "state": "FINISHED"
      }
    }
  }, 
  "queue": "reserved_resource_worker-1@robot.example.com.dq", 
  "state": "finished", 
  "worker_name": "reserved_resource_worker-1@robot.example.com", 
  "result": {
    "importer_type_id": "yum_importer", 
    "importer_id": "yum_importer", 
    "exception": null, 
    "repo_id": "j", 
    "started": "2016-10-10T15:22:38Z", 
    "_ns": "repo_sync_results", 
    "completed": "2016-10-10T15:22:44Z", 
    "traceback": null, 
    "error_message": null, 
    "summary": {
      "content": {
        "state": "FINISHED"
      }, 
      "comps": {
        "state": "FINISHED"
      }, 
      "purge_duplicates": {
        "state": "FINISHED"
      }, 
      "distribution": {
        "state": "FINISHED"
      }, 
      "errata": {
        "state": "FINISHED"
      }, 
      "metadata": {
        "state": "FINISHED"
      }
    }, 
    "added_count": 0, 
    "result": "success", 
    "updated_count": 36, 
    "details": {
      "content": {
        "size_total": 0, 
        "items_left": 0, 
        "items_total": 0, 
        "state": "FINISHED", 
        "size_left": 0, 
        "details": {
          "rpm_total": 0, 
          "rpm_done": 0, 
          "drpm_total": 0, 
          "drpm_done": 0
        }, 
        "error_details": []
      }, 
      "comps": {
        "state": "FINISHED"
      }, 
      "purge_duplicates": {
        "state": "FINISHED"
      }, 
      "distribution": {
        "items_total": 0, 
        "state": "FINISHED", 
        "error_details": [], 
        "items_left": 0
      }, 
      "errata": {
        "state": "FINISHED"
      }, 
      "metadata": {
        "state": "FINISHED"
      }
    }, 
    "id": "57fbb244b0eaf209bccd6af8", 
    "removed_count": 0
  }, 
  "error": null, 
  "_id": {
    "$oid": "57fbb23ec14a969650adc0db"
  }, 
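The fields that matter for this report are `added_count`, `removed_count` and `updated_count` under `result`. A client that skips follow-up work when a sync changed nothing (the Katello optimization mentioned in the description) effectively branches on those three numbers. The sketch below shows such a check; `sync_counts` and `sync_changed_content` are hypothetical helpers for illustration, not Katello or Pulp code, and the sample input is a trimmed copy of the result above.

```python
import json
from typing import Any, Dict, Mapping


def sync_counts(task: Mapping[str, Any]) -> Dict[str, int]:
    """Pull the unit counters out of a Pulp 2 sync task status document.

    Uses the 'result' keys shown in the task JSON: added_count,
    removed_count and updated_count. A missing or null result counts as 0.
    """
    result = task.get("result") or {}
    return {
        key: int(result.get(key, 0))
        for key in ("added_count", "removed_count", "updated_count")
    }


def sync_changed_content(task: Mapping[str, Any]) -> bool:
    """Return True if the sync reports any added, removed or updated units.

    With the bug in this issue, every resync of the affected repo returns
    True because updated_count never drops to zero.
    """
    return any(sync_counts(task).values())


if __name__ == "__main__":
    # Trimmed-down copy of the 'result' section from the second sync above.
    task = json.loads(
        '{"result": {"added_count": 0, "removed_count": 0, "updated_count": 36}}'
    )
    print(sync_counts(task))
    print(sync_changed_content(task))
```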

#8 Updated by jsherril@redhat.com about 5 years ago

I can also reproduce this on 2.10.0-1 on a second installation with pulp-admin.

#9 Updated by amacdona@redhat.com about 5 years ago

  • Priority changed from Normal to High
  • Triaged changed from No to Yes

#10 Updated by jsherril@redhat.com about 5 years ago

Possible reproducer steps:

   pulp-admin rpm repo create --repo-id=j --remove-missing=true --validate=true --download-policy=immediate --feed=https://jlsherrill.fedorapeople.org/fake-repos/needed-errata/
   pulp-admin -vvv rpm repo sync run --repo-id=j

Now clear your Pulp database, but leave the content in /var/lib/pulp/content alone.

   pulp-admin rpm repo create --repo-id=j --remove-missing=true --validate=true --download-policy=background --feed=https://jlsherrill.fedorapeople.org/fake-repos/needed-errata/
   pulp-admin -vvv rpm repo sync run --repo-id=j

   pulp-admin rpm repo create --repo-id=i --remove-missing=true --validate=true --download-policy=immediate --feed=https://jlsherrill.fedorapeople.org/fake-repos/needed-errata/ --relative-url=foo
   pulp-admin -vvv rpm repo sync run --repo-id=i
   pulp-admin -vvv rpm repo sync run --repo-id=i
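For repeated runs, the reproducer commands above can be generated from one place. This is a minimal sketch that only builds and prints the pulp-admin argument lists, using exactly the flags shown above; the helper names and the phase grouping are illustrative, and clearing the database between the first two phases stays a manual step.

```python
import shlex
from typing import List

FEED = "https://jlsherrill.fedorapeople.org/fake-repos/needed-errata/"


def create_cmd(repo_id: str, policy: str, relative_url: str = "") -> List[str]:
    """Build the pulp-admin command that creates the yum repo used above."""
    cmd = [
        "pulp-admin", "rpm", "repo", "create",
        f"--repo-id={repo_id}",
        "--remove-missing=true",
        "--validate=true",
        f"--download-policy={policy}",
        f"--feed={FEED}",
    ]
    if relative_url:
        cmd.append(f"--relative-url={relative_url}")
    return cmd


def sync_cmd(repo_id: str) -> List[str]:
    """Build the pulp-admin command that runs a verbose sync of the repo."""
    return ["pulp-admin", "-vvv", "rpm", "repo", "sync", "run", f"--repo-id={repo_id}"]


# Phases of the reproducer. Clearing the database (but not
# /var/lib/pulp/content) between phase 1 and phase 2 is done by hand.
PHASES = [
    [create_cmd("j", "immediate"), sync_cmd("j")],
    [create_cmd("j", "background"), sync_cmd("j")],
    [create_cmd("i", "immediate", relative_url="foo"), sync_cmd("i"), sync_cmd("i")],
]

if __name__ == "__main__":
    for number, phase in enumerate(PHASES, start=1):
        print(f"# phase {number}")
        for cmd in phase:
            print(shlex.join(cmd))
```

Printing instead of executing keeps the sketch safe to run anywhere; the printed lines can be pasted into a shell on the Pulp host.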

#11 Updated by ipanova@redhat.com about 5 years ago

The reproducer Justin suggested works.
I figured out the root cause.

It is enough to:

1. Create a repo with the immediate download policy, sync it, and then drop the DB.
2. Re-create the same repo with a lazy download policy (on_demand/background) and sync it.
3. Update the repo's download policy to immediate and sync it. Every subsequent sync reports a non-zero updated_count even though no new content was added.

When a repo with a lazy policy is created and synced, downloaded=false is set on each unit and a storage path is calculated. Because we re-created the same repo with the same content, each unit with downloaded=false points to a previously calculated storage path that already exists and is not empty: we dropped the DB, but /var/lib/pulp/content was left untouched.
When we sync in step 3, we first determine which units are missing by looking at the `downloaded` flag. Then, before actually downloading content, we check whether it already exists by looking at the storage path (and for these units it does), so the units are associated again while the flag stays false. There is a gap in the logic between the downloaded flag and the storage path: we need to set the flag to true when the file is found if we want to avoid these phantom updates on every sync.

This should be the fix (though I have not done any thorough testing to confirm that it doesn't break anything else):

$ git diff
diff --git a/plugins/pulp_rpm/plugins/importers/yum/existing.py b/plugins/pulp_rpm/plugins/importers/yum/existing.py
index ea995bb..50257c6 100644
--- a/plugins/pulp_rpm/plugins/importers/yum/existing.py
+++ b/plugins/pulp_rpm/plugins/importers/yum/existing.py
@@ -124,6 +124,9 @@ def check_all_and_associate(wanted, conduit, download_deferred, catalog):
                     ids.TYPE_ID_RPM, ids.TYPE_ID_SRPM, ids.TYPE_ID_DRPM):
                 if unit._storage_path is None or not os.path.isfile(unit._storage_path):
                     continue
+                if not unit.downloaded:
+                    unit.downloaded = True
+                    unit.save()
             catalog.add(unit)
             repo_controller.associate_single_unit(conduit.repo, unit)
             values.discard(unit.unit_key_as_named_tuple)
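The gap described above can be modelled outside Pulp. This is a simplified, hypothetical simulation, not the real importer: `Unit` and `sync_existing` are stand-ins, and counting every reused unit as "updated" is an assumption made for illustration. It shows that, without the change in the diff, the same units are treated as missing on every sync, while setting `downloaded = True` once the file is found lets later syncs report nothing.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    """Simplified stand-in for a Pulp 2 content unit (not the real model)."""
    name: str
    storage_path: Optional[str]
    downloaded: bool


def sync_existing(units: List[Unit], apply_fix: bool) -> int:
    """Return how many units this sync re-associates from files already on disk.

    Step 1: units with downloaded=False are treated as missing.
    Step 2: if the file already exists at storage_path, it is reused instead
    of being downloaded. In this model every reuse counts as 'updated'.
    """
    reused = 0
    for unit in units:
        if unit.downloaded:
            continue  # complete already, nothing to do
        if unit.storage_path is None or not os.path.isfile(unit.storage_path):
            continue  # genuinely missing: a real sync would download it
        reused += 1
        if apply_fix:
            unit.downloaded = True  # the change from the diff: close the gap
    return reused


def run_three_syncs(apply_fix: bool) -> List[int]:
    """Simulate three syncs after the DB was dropped and re-created lazily."""
    with tempfile.TemporaryDirectory() as content_dir:
        units = []
        for i in range(3):
            # Files left behind in the content directory after the DB drop.
            path = os.path.join(content_dir, f"pkg-{i}.rpm")
            with open(path, "wb") as fh:
                fh.write(b"rpm payload")
            # The lazy-policy sync recorded these units as not downloaded.
            units.append(Unit(f"pkg-{i}", path, downloaded=False))
        return [sync_existing(units, apply_fix) for _ in range(3)]


if __name__ == "__main__":
    print("without fix:", run_three_syncs(apply_fix=False))  # [3, 3, 3]
    print("with fix:   ", run_three_syncs(apply_fix=True))   # [3, 0, 0]
```

With the flag left as false the count never drops to zero; with the fix, only the first sync after the policy switch reports the units, and later syncs report none.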

#12 Updated by ipanova@redhat.com about 5 years ago

It is also reproducible on master, and I bet on 2.10 too.

#13 Updated by mhrivnak about 5 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ipanova@redhat.com
  • Sprint/Milestone set to 27

#15 Updated by ipanova@redhat.com about 5 years ago

To test:

1. Create a repo with the immediate policy and sync it.
2. Drop the DB.
3. Re-create the repo with the on_demand policy and sync it (added_count should be non-zero).
4. Check in the DB that the downloaded flag on the units is set to false.
5. Trigger the 'download repo' task (updated_count should be non-zero and added_count should be zero).
6. Check in the DB that the downloaded flag on the units is set to true.
7. On the next sync, added_count and updated_count should both be zero.

1. Create a repo with the immediate policy and sync it.
2. Drop the DB.
3. Re-create the repo with the background policy and sync it.
4. Check in the DB that the downloaded flag on the units is set to true.
5. Sync the repo again.
6. Check that added_count and updated_count are both 0.

1. Create a repo with the immediate policy and sync it.
2. Drop the DB.
3. Re-create the repo with the on_demand policy and sync it (added_count should be non-zero).
4. Check in the DB that the downloaded flag on the units is set to false.
5. Update the repo's download policy to immediate and sync it.
6. Check in the DB that the downloaded flag on the units is set to true.
7. Sync the repo again.
8. Check that added_count and updated_count are both 0.

1. Create a repo with the immediate policy and sync it.
2. Drop the DB.
3. Re-create the repo with the on_demand policy and sync it.
4. Check in the DB that the downloaded flag on the units is set to false.
5. Remove one unit from the repo, clean up orphans, and sync the repo.
6. Check in the DB that the downloaded flag is set to true on all units except the one unit that was synced in step 5.
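Several of the steps above ask to check the `downloaded` flag in the DB. A small helper for summarising the unit documents is sketched below; it assumes each document has a boolean `downloaded` field and uses `name` (falling back to `_id`) only as a label. The helper name and the sample documents are hypothetical, and how the documents are fetched is left open.

```python
from typing import Any, Dict, Iterable, List, Mapping


def downloaded_summary(units: Iterable[Mapping[str, Any]]) -> Dict[str, List[str]]:
    """Group unit documents by their 'downloaded' flag.

    Assumes each document carries a boolean 'downloaded' field; a missing
    flag is treated as not downloaded. 'name' (or '_id' when there is no
    name) is used only as a readable label.
    """
    summary: Dict[str, List[str]] = {"downloaded": [], "not_downloaded": []}
    for doc in units:
        bucket = "downloaded" if doc.get("downloaded") else "not_downloaded"
        summary[bucket].append(str(doc.get("name", doc.get("_id"))))
    return summary


if __name__ == "__main__":
    # Made-up sample documents, shaped like the state after the last step above.
    sample = [
        {"name": "pkg-a", "downloaded": True},
        {"name": "pkg-b", "downloaded": True},
        {"name": "pkg-c", "downloaded": False},
    ]
    summary = downloaded_summary(sample)
    print(summary)
    # The last scenario expects exactly one unit that is still not downloaded.
    print(len(summary["not_downloaded"]) == 1)
```

With pymongo, the documents could come from something like `MongoClient()["pulp_database"]["units_rpm"].find({}, {"name": 1, "downloaded": 1})`; the database and collection names there are assumptions to verify against your installation.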

#16 Updated by ipanova@redhat.com about 5 years ago

  • Status changed from POST to MODIFIED

#18 Updated by ipanova@redhat.com about 5 years ago

  • Platform Release set to 2.10.1

I am not sure whether I've set the correct target release, @smyers.

#19 Updated by semyers about 5 years ago

  • Status changed from MODIFIED to 5

#20 Updated by semyers about 5 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE

#22 Updated by bmbouter over 3 years ago

  • Sprint set to Sprint 9

#23 Updated by bmbouter over 3 years ago

  • Sprint/Milestone deleted (27)

#24 Updated by bmbouter over 2 years ago

  • Tags Pulp 2 added

#25 Updated by bmbouter over 1 year ago

  • Category deleted (14)

We are removing the 'API' category per open floor discussion June 16, 2020.
