
Issue #2532

closed

rsync distributor without force_full incorrectly skips publishing some units

Added by rmcgover over 7 years ago. Updated almost 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee:
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version: Master
Platform Release: 2.13.3
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 21
Quarter:

Description

The pulp.plugins.rsync.publish.Publisher#__init__ method contains the following logic:

if self.is_fastforward():
    # Window start: this rsync distributor's own last publish.
    start_date = self.last_published
    end_date = None
    if self.predistributor:
        # Window end: the predistributor's last publish.
        end_date = self.predistributor["last_publish"]
    date_filter = self.create_date_range_filter(start_date=start_date, end_date=end_date)

This code calculates a date range for the distributor to process.
In summary, it will only process units associated with the repo between the last publish of the rsync distributor and the last publish of the predistributor.
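The following minimal Python sketch shows the window on its own. It is not Pulp code. `date_range_predicate` and `fastforward_predicate` are hypothetical stand-ins for the excerpt's `create_date_range_filter` call. They assume both bounds are inclusive and that a missing bound means "unbounded".

```python
from datetime import datetime
from typing import Callable, Optional


def date_range_predicate(start_date: Optional[datetime] = None,
                         end_date: Optional[datetime] = None
                         ) -> Callable[[datetime], bool]:
    """Return a test for association timestamps (assumed inclusive bounds)."""
    def matches(associated_at: datetime) -> bool:
        if start_date is not None and associated_at < start_date:
            return False
        if end_date is not None and associated_at > end_date:
            return False
        return True
    return matches


def fastforward_predicate(last_published: Optional[datetime],
                          predistributor: Optional[dict]
                          ) -> Callable[[datetime], bool]:
    # Same shape as the excerpt: start at the rsync distributor's last publish...
    start_date = last_published
    end_date = None
    if predistributor:
        # ...and end at the predistributor's last publish, if there is one.
        end_date = predistributor["last_publish"]
    return date_range_predicate(start_date=start_date, end_date=end_date)
```

Without a predistributor the window is open-ended, so every unit associated since the last rsync publish qualifies.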

That seems to be incorrect. If association and publishes happen in a certain order, units can be left out of every subsequent publish, until a publish is explicitly done with "force_full".

Using a yum repo as an example, here's a sequence of events which demonstrates the problem:

  • Trigger yum publish.
  • Time A: yum publish completes
  • Time B: associate x.rpm into yum repo
  • Trigger rsync publish
  • Time C: rsync publish completes
    (Note: this publish will not include x.rpm since yum publish hasn't happened for that unit yet)
  • Trigger yum publish
  • Time D: yum publish completes
  • Trigger rsync publish, wait for it to complete

Expected result: after last step, repository is fully published, including x.rpm

Actual result: x.rpm is still not published, because the rsync distributor only processed units associated between times C and D. Republishing won't fix it. Explicitly publishing with force_full: True will fix it.
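The sequence can be replayed against the window from the `__init__` excerpt. The sketch uses plain integers for times A to D. It is a hypothetical model, not a Pulp test, and it assumes inclusive bounds with `None` meaning unbounded.

```python
from typing import Dict, List, Optional

# Hypothetical integer timestamps for the events above (A < B < C < D).
A, B, C, D = 1, 2, 3, 4


def units_processed(associations: Dict[str, int],
                    rsync_last_published: Optional[int],
                    predistributor_last_publish: Optional[int]) -> List[str]:
    """Units whose association time falls inside the fast-forward window."""
    return sorted(
        name for name, t in associations.items()
        if (rsync_last_published is None or t >= rsync_last_published)
        and (predistributor_last_publish is None or t <= predistributor_last_publish)
    )


associations = {"x.rpm": B}  # x.rpm associated at time B

# Last step: previous rsync publish finished at C, yum publish finished at D.
print(units_processed(associations, C, D))  # []

# Republishing without a new yum publish: the window starts at E (> D) and is empty.
E = 5
print(units_processed(associations, E, D))  # []
```

The window always starts at the rsync distributor's own previous publish, so it only moves forward. An association at B can therefore never fall inside it again, and only force_full bypasses the window.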

Note: I haven't attempted to reproduce this. The bug report is based on code review of the latest master (6fc2861fd14793f8461d232cb641b5112d271519).


Files

IMG_20170504_142218506.jpg (3.07 MB) dalley, 05/04/2017 08:29 PM
#1

Updated by bizhang about 7 years ago

  • Priority changed from Normal to High
  • Sprint/Milestone set to 32
  • Severity changed from 2. Medium to 3. High
  • Triaged changed from No to Yes
#2

Updated by daviddavis about 7 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to daviddavis
#3

Updated by daviddavis about 7 years ago

  • Blocked by Issue #2550: Publishing via rsync does not correctly look at publish records added
Actions #4

Updated by daviddavis about 7 years ago

  • Blocked by deleted (Issue #2550: Publishing via rsync does not correctly look at publish records)
#6

Updated by daviddavis about 7 years ago

  • Status changed from ASSIGNED to POST
#7

Updated by mhrivnak about 7 years ago

  • Sprint/Milestone changed from 32 to 33
#8

Updated by mhrivnak about 7 years ago

  • Sprint/Milestone changed from 33 to 34
#9

Updated by mhrivnak about 7 years ago

  • Sprint/Milestone changed from 34 to 36
#10

Updated by dalley about 7 years ago

  • Status changed from POST to ASSIGNED
  • Assignee changed from daviddavis to dalley

I am taking ownership of this issue after discussing with @daviddavis

#11

Updated by mhrivnak about 7 years ago

  • Sprint/Milestone changed from 36 to 37
Actions #12

Updated by dalley almost 7 years ago

rmcgover, I was not able to reproduce this issue using this sequence of events on the commit provided (6fc2861fd). Here are the steps I used to test within our Vagrant development environment:

dnf download cowsay

pulp-admin rpm repo sync run --repo-id zoo
phttp POST https://localhost/pulp/api/v2/repositories/zoo/distributors/ < rsync_distributor.json
phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=yum_distributor
pulp-admin rpm repo uploads rpm --repo-id zoo --file cowsay-3.04-4.fc25.noarch.rpm
phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=my_rpm_rsync_distributor
phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=yum_distributor
phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=my_rpm_rsync_distributor

where:

rsync_distributor.json
===================


{
    "distributor_id": "my_rpm_rsync_distributor",
    "distributor_type_id": "rpm_rsync_distributor",
    "distributor_config": {
        "remote": {
            "auth_type": "publickey",
            "ssh_user": "vagrant",
            "ssh_identity_file": "/home/vagrant/.ssh/id_rsa",
            "host": "dev.example.com",
            "root": "/home/vagrant/pulp_root"
        },
        "predistributor_id": "yum_distributor"
    }
}
  • There are two vagrant boxes dev.example.com and dev2.example.com
  • Commands are being run on dev2.example.com, dev.example.com is used as the target for distributing files
  • "phttp" is aliased to "http --verify no --cert ~/.pulp/user-cert.pem"
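To script the same calls without HTTPie, the `phttp` alias can be approximated with the Python standard library. This is a sketch with some assumptions. It assumes `~/.pulp/user-cert.pem` holds both the client certificate and its private key. It also assumes HTTPie's `id=value` argument is sent as the JSON body `{"id": "value"}`. The helper names are hypothetical, and the block only builds the request so it can be inspected.

```python
import json
import ssl
import urllib.request
from pathlib import Path

PULP_BASE = "https://localhost/pulp/api/v2"        # as used in the commands above
CERT = Path.home() / ".pulp" / "user-cert.pem"     # assumed to contain cert + key


def insecure_client_cert_context(cert_file: Path) -> ssl.SSLContext:
    """Mimic `--verify no --cert FILE`: skip server checks, present a client cert."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    ctx.load_cert_chain(certfile=str(cert_file))
    return ctx


def publish_request(repo_id: str, distributor_id: str) -> urllib.request.Request:
    """Build the POST sent by `phttp POST .../actions/publish/ id=<distributor>`."""
    url = f"{PULP_BASE}/repositories/{repo_id}/actions/publish/"
    body = json.dumps({"id": distributor_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
```

To actually send a publish, pass the request and the context to `urllib.request.urlopen(publish_request("zoo", "yum_distributor"), context=insecure_client_cert_context(CERT))`.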

When taking the above steps, the cowsay package gets distributed in the first rsync publish after associating the package with the repository.

I will look into it a bit further to make sure that something is indeed covering this scenario.

#13

Updated by jortel@redhat.com almost 7 years ago

  • Sprint/Milestone changed from 37 to 38
#14

Updated by rmcgover almost 7 years ago

I found two reasons why that sequence doesn't reproduce the issue:

1) It's a special case: the first use of the rsync distributor on a repo after the repo was created is always treated as a non-fast-forward publish (and rightly so).

2) There is an additional bug in the rsync distributor that interferes with the test: if a unit has never been removed from the repo, fast-forward publish is not used.
This is https://pulp.plan.io/issues/2666.
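The two reasons describe conditions under which the rsync distributor skips the fast-forward path entirely. Below is a hypothetical model of that decision. The field names are illustrative, not Pulp's. Reason 2 models the behaviour reported in issue 2666, not intended behaviour.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RsyncPublishState:
    # Hypothetical fields; Pulp stores this information differently.
    force_full: bool
    last_published: Optional[datetime]     # None: rsync never published this repo
    last_unit_removed: Optional[datetime]  # None: no unit was ever removed


def uses_fastforward(state: RsyncPublishState) -> bool:
    """Return True if the date-range (fast-forward) publish would be used."""
    if state.force_full:
        return False  # explicit full publish requested
    if state.last_published is None:
        return False  # reason 1: first rsync publish after repo creation is full
    if state.last_unit_removed is None:
        return False  # reason 2 (issue 2666): never-removed repos skip fast-forward
    return True
```

Under this model, the modified sequence satisfies both conditions before cowsay is uploaded. It removes kangaroo-0.2-1.noarch.rpm and fully publishes both distributors, so the date-range path is actually exercised.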

Taking these into account, I could reproduce the issue with the following modified sequence:

# started with clean pulp
pclean ; ppopulate

dnf download cowsay

pulp-admin rpm repo sync run --repo-id zoo
phttp POST https://localhost/pulp/api/v2/repositories/zoo/distributors/ < rsync_distributor.json

# ensure an RPM has been removed from the repo (required to avoid https://pulp.plan.io/issues/2743 )
pulp-admin rpm repo remove rpm --repo-id zoo --str-eq 'filename=kangaroo-0.2-1.noarch.rpm'

# ensure both distributors are fully published

phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=yum_distributor
# and wait until the publish task completes

phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=my_rpm_rsync_distributor
# and wait until the publish task completes

pulp-admin rpm repo uploads rpm --repo-id zoo --file cowsay-3.04-4.fc25.noarch.rpm

phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=my_rpm_rsync_distributor
# and wait until publish completes
# expected behavior: it is acceptable for cowsay not to be published
# actual behavior: cowsay is not published

phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=yum_distributor
# and wait until publish completes

phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=my_rpm_rsync_distributor
# expected behavior: cowsay must be published
# actual behavior: cowsay is not published
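The "wait until the publish task completes" steps can be automated. The sketch makes several assumptions. It assumes the Pulp 2 convention that the publish POST returns a call report with a `spawned_tasks` list of `{"_href": ..., "task_id": ...}` entries. It assumes each task resource has a `state` field. The set of terminal states is also an assumption, to be checked against your server. The HTTP fetch is passed in as a function, so the polling logic can be tried without a server.

```python
import time
from typing import Callable, Dict, List

# Assumed terminal task states; verify against your Pulp 2 server.
TERMINAL_STATES = {"finished", "error", "canceled", "skipped"}


def wait_for_tasks(call_report: Dict,
                   get_json: Callable[[str], Dict],
                   interval: float = 1.0,
                   timeout: float = 600.0) -> List[Dict]:
    """Poll every spawned task until it reaches a terminal state.

    `call_report` is the JSON body returned by the publish POST;
    `get_json` fetches a task resource given its `_href`.
    """
    deadline = time.monotonic() + timeout
    results = []
    for task_ref in call_report.get("spawned_tasks", []):
        while True:
            task = get_json(task_ref["_href"])
            if task.get("state") in TERMINAL_STATES:
                results.append(task)
                break
            if time.monotonic() > deadline:
                raise TimeoutError(f"task {task_ref['_href']} did not finish in time")
            time.sleep(interval)
    return results
```

Checking the returned tasks for `"error"` before the next publish keeps a failed yum publish from being mistaken for the rsync bug.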
#15

Updated by dalley almost 7 years ago

I made a test matrix for the patch that comes out of this. A check mark means that the cowsay package has been, or should be, published. The left column shows expected results, the right column shows actual results, and circled results are inconsistent with expectations.

The fix for #2666 has been merged so that should no longer be an issue.

I will make sure this tests well in all of the scenarios listed, and I will also try to automate some of them so that this doesn't break in the future. Unfortunately, the code is deceptively complicated due to the interaction of the various special cases.
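A matrix like this can also be generated mechanically. The following is a hypothetical model, not a reconstruction of the attached image. In it, U is the association, Y a yum (predistributor) publish and R an rsync publish. Both distributors start fully published, and every R is assumed to be fast-forward, which excludes the first-publish and issue 2666 cases. "Expected" means a yum publish happened after the association. Rows where the original date-range rule disagrees are flagged.

```python
from itertools import product
from typing import List, Tuple


def simulate(events: str) -> Tuple[bool, bool]:
    """Return (expected, actual) for an event string such as "URYR".

    Baseline: yum published at t=-1, rsync at t=0; event i happens at t=i+1.
    """
    yum_last, rsync_last = -1, 0
    t_assoc = None
    yum_has_unit = False
    rsync_has_unit = False
    for t, ev in enumerate(events, start=1):
        if ev == "U":
            t_assoc = t
        elif ev == "Y":
            yum_last = t
            if t_assoc is not None:
                yum_has_unit = True
        elif ev == "R":
            # Original window: [rsync's own last publish, yum's last publish].
            if t_assoc is not None and rsync_last <= t_assoc <= yum_last:
                rsync_has_unit = True
            rsync_last = t
    return yum_has_unit, rsync_has_unit


def matrix(max_len: int = 3) -> List[Tuple[str, bool, bool]]:
    """All orderings of up to max_len Y/R events with one U, ending in R."""
    rows = []
    for n in range(1, max_len + 1):
        for ops in product("YR", repeat=n):
            for i in range(n + 1):
                events = "".join(ops[:i]) + "U" + "".join(ops[i:])
                if events.endswith("R"):
                    rows.append((events, *simulate(events)))
    return rows


for events, expected, actual in matrix():
    flag = "" if expected == actual else "  <-- mismatch"
    print(f"{events:6} expected={expected!s:5} actual={actual!s:5}{flag}")
```

The reported sequence appears as `URYR`, one of the flagged rows.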

#16

Updated by dalley almost 7 years ago

  • Sprint/Milestone changed from 37 to 38
#17

Updated by bmbouter almost 7 years ago

  • Tags RCM added
#18

Updated by mhrivnak almost 7 years ago

  • Sprint/Milestone changed from 38 to 39
#20

Updated by dalley almost 7 years ago

  • Status changed from ASSIGNED to POST

Added by dalley almost 7 years ago

Revision f891bf1d

Fix rsync distributor skipping units

Corrects the rsync distributor fast-forward logic so that it uses the proper date range to determine which units to include. This fixes an issue where units were not being published if the association, predistributor publish and rsync publish occurred in a certain order.

closes #2532 https://pulp.plan.io/issues/2532
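The commit message above does not spell out the corrected range. One window that closes the gap in the original report is sketched below. It is offered as an illustrative assumption, not as a description of what f891bf1d actually changed. It starts from the predistributor publish that the previous rsync publish was based on, instead of from the rsync distributor's own publish time. `rsync_run` and its parameters are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical integer timestamps for the reported sequence (A < B < C < D).
A, B, C, D = 1, 2, 3, 4  # C (first rsync publish) is not used by this window


def rsync_run(associations: Dict[str, int],
              prev_seen_predistributor_publish: Optional[int],
              predistributor_last_publish: Optional[int]
              ) -> Tuple[List[str], Optional[int]]:
    """Return (units to process, predistributor publish to remember).

    Bounds are inclusive; re-processing a unit on the boundary is harmless.
    """
    units = sorted(
        name for name, t in associations.items()
        if (prev_seen_predistributor_publish is None
            or t >= prev_seen_predistributor_publish)
        and (predistributor_last_publish is None or t <= predistributor_last_publish)
    )
    return units, predistributor_last_publish


associations = {"x.rpm": B}

# First rsync publish (at C) is based on the yum publish at A: x.rpm not yet included.
units, seen = rsync_run(associations, 0, A)
print(units)  # []

# After the yum publish at D, the window [A, D] now covers the association at B.
units, seen = rsync_run(associations, seen, D)
print(units)  # ['x.rpm']
```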

Added by dalley almost 7 years ago

Revision 3bbf6f64

Fix rsync distributor skipping units

Corrects the rsync distributor fast-forward logic so that it uses the proper date range to determine which units to include. This fixes an issue where units were not being published if the association, predistributor publish and rsync publish occurred in a certain order.

closes #2532 https://pulp.plan.io/issues/2532

#21

Updated by mhrivnak almost 7 years ago

  • Sprint/Milestone changed from 39 to 40
#22

Updated by dalley almost 7 years ago

  • Status changed from POST to MODIFIED

Added by dalley almost 7 years ago

Revision 60133628

Fix rsync distributor skipping units

Corrects the rsync distributor fast-forward logic so that it uses the proper date range to determine which units to include. This fixes an issue where units were not being published if the association, predistributor publish and rsync publish occurred in a certain order.

closes #2532 https://pulp.plan.io/issues/2532

(cherry picked from commit 3bbf6f64fafa78ad7010ff26b65ed2f4817e224e)

Added by dalley almost 7 years ago

Revision f524040d

Fix rsync distributor skipping units

Corrects the rsync distributor fast-forward logic so that it uses the proper date range to determine which units to include. This fixes an issue where units were not being published if the association, predistributor publish and rsync publish occurred in a certain order.

closes #2532 https://pulp.plan.io/issues/2532

(cherry picked from commit f891bf1dc6330bbe95ad60cabf8912e9ea4c1609)

#26

Updated by pcreech almost 7 years ago

  • Platform Release set to 2.13.3
#27

Updated by Ichimonji10 almost 7 years ago

The automated test for this issue passes.

#28

Updated by pcreech over 6 years ago

  • Status changed from MODIFIED to 5
#29

Updated by pcreech over 6 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
#30

Updated by bmbouter about 6 years ago

  • Sprint set to Sprint 21
#31

Updated by bmbouter about 6 years ago

  • Sprint/Milestone deleted (40)
#32

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added
#33

Updated by bmbouter almost 5 years ago

  • Tags deleted (RCM)
