Issue #2532
closed
rsync distributor without force_full incorrectly skips publishing some units
Description
In the pulp.plugins.rsync.publish.Publisher#__init__ method there is the following logic:
    if self.is_fastforward():
        start_date = self.last_published
        end_date = None
        if self.predistributor:
            end_date = self.predistributor["last_publish"]
        date_filter = self.create_date_range_filter(start_date=start_date, end_date=end_date)
This code calculates a date range for the distributor to process.
In summary, it will only process units associated with the repo between the last publish of the rsync distributor and the last publish of the predistributor.
That seems to be incorrect. If association and publish are done in a certain order, this can cause units to be permanently missing from the publish (until a publish is explicitly done with "force_full").
Using a yum repo as an example, here's a sequence of events which demonstrates the problem:
- Trigger yum publish.
- Time A: yum publish completes
- Time B: associate x.rpm into yum repo
- Trigger rsync publish
- Time C: rsync publish completes
(Note: this publish will not include x.rpm since yum publish hasn't happened for that unit yet)
- Trigger yum publish
- Time D: yum publish completes
- Trigger rsync publish, wait for it to complete
Expected result: after last step, repository is fully published, including x.rpm
Actual result: x.rpm is still not published, because rsync distributor only processed units associated between time C and D. Republishing won't fix it. Explicitly publishing with force_full: True will fix it.
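To make the timing argument concrete, here is a minimal sketch that replays the sequence above against the date window described in this report. It is a toy model, not Pulp code: the integer timestamps, the helper names (in_window, rsync_publish), the half-open (start, end] bounds, and the earlier rsync publish at time 0 (added so that the publish finishing at time C counts as fast-forward) are all assumptions made for illustration.

```python
# Toy model of the fast-forward date window described in this report.
# Timestamps are plain integers; the (start, end] bounds are an assumption.

def in_window(ts, start, end):
    """Return True if ts falls in (start, end]; end=None means no upper bound."""
    return ts > start and (end is None or ts <= end)

# Hypothetical timeline: an earlier rsync publish at time 0, then times A..D
# from the sequence of events in the report.
RSYNC_0, A, B, C, D = 0, 1, 2, 3, 4
associated = {"x.rpm": B}  # x.rpm is associated at time B

def rsync_publish(rsync_last, predistributor_last):
    """Names of units picked up by one fast-forward rsync publish."""
    return sorted(
        name
        for name, ts in associated.items()
        if in_window(ts, rsync_last, predistributor_last)
    )

# Publish finishing at C: window is (previous rsync publish, yum publish at A].
first = rsync_publish(rsync_last=RSYNC_0, predistributor_last=A)
# Publish after D: window is (rsync publish at C, yum publish at D].
second = rsync_publish(rsync_last=C, predistributor_last=D)

print(first, second)
```

Under these assumptions both lists come out empty: B is later than A, the upper bound of the first window, and earlier than C, the lower bound of the second, so no fast-forward publish ever picks up x.rpm.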
Note: I haven't attempted to reproduce this; the bug report is based on code review of the latest master (6fc2861fd14793f8461d232cb641b5112d271519).
Updated by bizhang almost 8 years ago
- Priority changed from Normal to High
- Sprint/Milestone set to 32
- Severity changed from 2. Medium to 3. High
- Triaged changed from No to Yes
Updated by daviddavis almost 8 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to daviddavis
Updated by daviddavis almost 8 years ago
- Blocked by Issue #2550: Publishing via rsync does not correctly look at publish records added
Updated by daviddavis almost 8 years ago
- Blocked by deleted (Issue #2550: Publishing via rsync does not correctly look at publish records)
Updated by daviddavis almost 8 years ago
- Status changed from ASSIGNED to POST
Updated by dalley over 7 years ago
- Status changed from POST to ASSIGNED
- Assignee changed from daviddavis to dalley
I am taking ownership of this issue after discussing with @daviddavis.
Updated by dalley over 7 years ago
rmcgover, I was not able to reproduce this issue using this sequence of events on the commit provided (6fc2861fd). Here are the steps I used to test within our Vagrant development environment:
dnf download cowsay
pulp-admin rpm repo sync run --repo-id zoo
phttp POST https://localhost/pulp/api/v2/repositories/zoo/distributors/ < rsync_distributor.json
phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=yum_distributor
pulp-admin rpm repo uploads rpm --repo-id zoo --file cowsay-3.04-4.fc25.noarch.rpm
phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=my_rpm_rsync_distributor
phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=yum_distributor
phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=my_rpm_rsync_distributor
where:
rsync_distributor.json
===================
{
    "distributor_id": "my_rpm_rsync_distributor",
    "distributor_type_id": "rpm_rsync_distributor",
    "distributor_config": {
        "remote": {
            "auth_type": "publickey",
            "ssh_user": "vagrant",
            "ssh_identity_file": "/home/vagrant/.ssh/id_rsa",
            "host": "dev.example.com",
            "root": "/home/vagrant/pulp_root"
        },
        "predistributor_id": "yum_distributor"
    }
}
- There are two vagrant boxes dev.example.com and dev2.example.com
- Commands are being run on dev2.example.com, dev.example.com is used as the target for distributing files
- "phttp" is aliased to "http --verify no --cert ~/.pulp/user-cert.pem"
When taking the above steps, the cowsay package gets distributed in the first rsync publish after associating the package with the repository.
I will look into it a bit further to confirm that something is indeed covering this scenario.
Updated by jortel@redhat.com over 7 years ago
- Sprint/Milestone changed from 37 to 38
Updated by rmcgover over 7 years ago
I found two reasons why that sequence won't work to reproduce:
1) It's a special case of the first usage of rsync distributor on the repo after it was created,
and that will always be treated as a non-fastforward publish (and rightly so).
2) There is an additional bug in rsync distributor which interferes with the test.
If a unit has never been deleted from the repo, fast-forward publish is not used.
This is https://pulp.plan.io/issues/2666 .
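The two conditions above can be summarized in a small decision function. This is only a sketch of the behaviour as described in this comment, not Pulp's actual is_fastforward() implementation; the function name and its parameters (force_full, last_published, removed_unit_count) are hypothetical.

```python
def uses_fast_forward(force_full, last_published, removed_unit_count):
    """Hypothetical model of when the rsync distributor fast-forwards.

    Mirrors the behaviour described in this comment, including the extra bug;
    it is not taken from the Pulp source.
    """
    if force_full:
        # An explicit force_full publish always does a full publish.
        return False
    if last_published is None:
        # Reason 1: the first rsync publish on a repo is never fast-forward.
        return False
    if removed_unit_count == 0:
        # Reason 2 (the additional bug): with no unit ever removed from the
        # repo, fast-forward is not used.
        return False
    return True

# First-ever rsync publish: full publish, so the original steps cannot hit the bug.
print(uses_fast_forward(False, None, 0))
# Later publish after at least one unit was removed: fast-forward is used.
print(uses_fast_forward(False, 5, 1))
```

This is why the modified sequence below first publishes with the rsync distributor once and removes an RPM before uploading cowsay.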
Taking these into account, I could reproduce the issue with the following modified sequence:
# started with clean pulp
pclean ; ppopulate
dnf download cowsay
pulp-admin rpm repo sync run --repo-id zoo
phttp POST https://localhost/pulp/api/v2/repositories/zoo/distributors/ < rsync_distributor.json
# ensure an RPM has been removed from the repo (required to avoid https://pulp.plan.io/issues/2743 )
pulp-admin rpm repo remove rpm --repo-id zoo --str-eq 'filename=kangaroo-0.2-1.noarch.rpm'
# ensure both distributors are fully published
phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=yum_distributor
# and wait until the publish task completes
phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=my_rpm_rsync_distributor
# and wait until the publish task completes
pulp-admin rpm repo uploads rpm --repo-id zoo --file cowsay-3.04-4.fc25.noarch.rpm
phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=my_rpm_rsync_distributor
# and wait until publish completes
# expected behavior: it is acceptable for cowsay not to be published
# actual behavior: cowsay is not published
phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=yum_distributor
# and wait until publish completes
phttp POST https://localhost/pulp/api/v2/repositories/zoo/actions/publish/ id=my_rpm_rsync_distributor
# expected behavior: cowsay must be published
# actual behavior: cowsay is not published
Updated by dalley over 7 years ago
- File IMG_20170504_142218506.jpg added
- Sprint/Milestone changed from 38 to 37
I made a test matrix for the patch that comes out of this. A check means that the cowsay package has been or should be published. The column on the left is expected results, the column on the right is actual results, circled results are inconsistent with expectations.
The fix for #2666 has been merged so that should no longer be an issue.
I will make sure this tests well in all of the scenarios listed and also try to get some of them automated so that it doesn't get broken in the future. Unfortunately the code is a bit deceptively complicated due to the interaction of the various special cases.
Updated by dalley over 7 years ago
Added by dalley over 7 years ago
Added by dalley over 7 years ago
Revision 3bbf6f64 | View on GitHub
Fix rsync distributor skipping units
Corrects the rsync distributor fast-forward logic so that it uses the proper date range to determine which units to include. This fixes an issue where units were not being published if the association, predistributor publish and rsync publish occurred in a certain order.
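The commit message does not spell out the new date range, so the following is only a sketch of one plausible reading, not the merged patch. It assumes the window starts at the predistributor publish that the previous rsync publish consumed, rather than at the previous rsync publish itself. The integer timestamps and the helper name in_window are hypothetical, and the (start, end] bounds are an assumption.

```python
# Sketch comparing the original window with one possible corrected window.
# This is an illustration under stated assumptions, not the actual fix.

def in_window(ts, start, end):
    """Return True if ts falls in the half-open range (start, end]."""
    return start < ts <= end

# Timeline from the report: yum publish at A, x.rpm associated at B,
# rsync publish at C, yum publish at D, then a final rsync publish.
A, B, C, D = 1, 2, 3, 4

# Original logic for the final rsync publish: (last rsync publish, last yum publish].
original = in_window(B, start=C, end=D)

# Assumed alternative: start from the yum publish that the rsync publish at C
# consumed (time A), so units associated between A and C are not skipped.
alternative = in_window(B, start=A, end=D)

print(original, alternative)
```

With the original window x.rpm is skipped; with the assumed alternative it falls inside the range and would be published by the final rsync publish.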
Updated by dalley over 7 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulp|f891bf1dc6330bbe95ad60cabf8912e9ea4c1609.
Updated by dalley over 7 years ago
Applied in changeset pulp_rpm:3bbf6f64fafa78ad7010ff26b65ed2f4817e224e.
Added by dalley over 7 years ago
Revision 60133628 | View on GitHub
Fix rsync distributor skipping units
Corrects the rsync distributor fast-forward logic so that it uses the proper date range to determine which units to include. This fixes an issue where units were not being published if the association, predistributor publish and rsync publish occurred in a certain order.
closes #2532 https://pulp.plan.io/issues/2532
(cherry picked from commit 3bbf6f64fafa78ad7010ff26b65ed2f4817e224e)
Updated by dalley over 7 years ago
Applied in changeset pulp_rpm:60133628872f1fd925c7d2a85fd12b47a6630c50.
Added by dalley over 7 years ago
Revision f524040d | View on GitHub
Fix rsync distributor skipping units
Corrects the rsync distributor fast-forward logic so that it uses the proper date range to determine which units to include. This fixes an issue where units were not being published if the association, predistributor publish and rsync publish occurred in a certain order.
closes #2532 https://pulp.plan.io/issues/2532
(cherry picked from commit f891bf1dc6330bbe95ad60cabf8912e9ea4c1609)
Updated by dalley over 7 years ago
Applied in changeset pulp|f524040d2d0b0e25bc384de0187b8dc157e05678.
Updated by pcreech over 7 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE