Issue #2844
closedRsync publish for RPM repo fails to run in fast forward mode
Description
To reproduce, do the following:
1. Create a yum repository with a yum and rsync distributor.
2. Add some content to the repository.
3. Publish with the yum and rsync distributors.
4. Publish with the yum and rsync distributors again.
The second publish with the rsync distributor should be a fast-forward publish, as no new units were added to the repository between the first and second publishes. Unfortunately, the second publish appears to be a full publish. This issue cropped up three months ago and was fixed in Pulp 2.13.1. See issue #2666. It has re-appeared in the Pulp 2.14 nightlies.
To figure out whether an issue is a full or fast-forward publish, one can examine the asynchronous task reports generated by Pulp. Specifically, one can do the following:
1. Publish an RPM repository with its rsync distributor. Capture the returned call report.
2. Recursively walk through all of the tasks referenced from this call report, and pick out the spawned task with a task_type
of pulp.server.managers.repo.publish.publish
.
3. Inspect the spawned task's steps, as listed out in ['result']['details']
. The Unit query step (rpm, drpm, srpm)
step should list a value of zero in its num_processed
field.
Here's an abbreviated example of a pulp.server.managers.repo.publish.publish
task:
{'snip': 'snippety snip!',
'result': {'_ns': 'repo_publish_results',
'completed': '2017-06-26T18:03:38Z',
'details': [{'description': '',
'details': '',
'error_details': [],
'items_total': 2,
'num_failures': 0,
'num_processed': 2,
'num_success': 2,
'state': 'FINISHED',
'step_id': 'a5b6e8e1-5d6e-4991-8930-e729e1709cc8',
'step_type': 'Unit query step (rpm, drpm, srpm)'},
{'description': 'Rsync files to remote destination',
'details': '',
'error_details': [],
'items_total': 1,
'num_failures': 0,
'num_processed': 1,
'num_success': 1,
'state': 'FINISHED',
'step_id': '78e1a4ae-0340-4786-b7a6-35c5bc93a506',
'step_type': 'Rsync step (origin)'},
{'description': 'Rsync files to remote destination',
'details': '',
'error_details': [],
'items_total': 1,
'num_failures': 0,
'num_processed': 1,
'num_success': 1,
'state': 'FINISHED',
'step_id': 'd2a0212a-57eb-4d33-9c1c-75449c734251',
'step_type': 'Rsync step (content)'},
{'description': 'Rsync files to remote destination',
'details': '',
'error_details': [],
'items_total': 1,
'num_failures': 0,
'num_processed': 1,
'num_success': 1,
'state': 'FINISHED',
'step_id': 'b0f8dc3c-1dfc-46f4-9def-54cccbd4b22a',
'step_type': 'Rsync step (repodata)'},
{'description': '',
'details': '',
'error_details': [],
'items_total': 1,
'num_failures': 0,
'num_processed': 1,
'num_success': 1,
'state': 'FINISHED',
'step_id': '0785cbfa-9e7a-4d05-8037-a69738eedc17',
'step_type': 'UpdateLastPredistDate'}],
'distributor_id': '9c0e0d42-93dc-4335-9168-06a0d292a77c',
'distributor_type_id': 'rpm_rsync_distributor',
'error_message': None,
'exception': None,
'id': '59514c7a14059a093629fb86',
'repo_id': '54b37808-1e7c-43d5-a06d-4e6415dfa3ac',
'result': 'success',
'started': '2017-06-26T18:03:36Z',
'summary': {'Rsync step (content)': 'FINISHED',
'Rsync step (origin)': 'FINISHED',
'Rsync step (repodata)': 'FINISHED',
'Unit query step (rpm, drpm, srpm)': 'FINISHED',
'UpdateLastPredistDate': 'FINISHED'},
'traceback': None},
'spawned_tasks': [],
'start_time': '2017-06-26T18:03:36Z',
'state': 'finished',
'tags': ['pulp:repository:54b37808-1e7c-43d5-a06d-4e6415dfa3ac',
'pulp:action:publish'],
'task_id': '05cef649-ae80-4a78-b1af-18926b4dcc8c',
'task_type': 'pulp.server.managers.repo.publish.publish',
'traceback': None,
'worker_name': 'reserved_resource_worker-0@rhel-7-pulp-2-14'}
Notice the 'num_processed': 2
line? That's correct for an initial rsync publish. It's incorrect for a fast-forward publish. (I think it indicates that two units were copied to the remote host.) For a fast-forward publish, that value should be zero. For more on this, see: Pulp #2666 Note #15 and subsequent comments.
This issue can also be reproduced with Pulp Smash:
python -m unittest pulp_smash.tests.rpm.api_v2.test_rsync_distributor.PublishTwiceTestCase
Related issues
Updated by Ichimonji10 over 7 years ago
- Copied from Issue #2666: Rsync publish for RPM repo fails to run in fast forward mode added
Updated by Ichimonji10 over 7 years ago
This is a regression. That's why I've marked it as a blocker for 2.14.z.
Updated by ttereshc over 7 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to dalley
- Priority changed from Normal to High
- Sprint/Milestone set to 40
- Triaged changed from No to Yes
Added by dalley over 7 years ago
Updated by dalley over 7 years ago
- Status changed from ASSIGNED to MODIFIED
Applied in changeset pulp:pulp|a212fdcf28b296fe6f40ad5a68a96695c4a49341.
Added by dalley over 7 years ago
Revision 25ac2211 | View on GitHub
Fix fastforward rsync publishes
Fixes an issue where fastforward publishes were not occurring when they should have been.
closes #2844 https://pulp.plan.io/issues/2844
(cherry picked from commit a212fdcf28b296fe6f40ad5a68a96695c4a49341)
Updated by dalley over 7 years ago
Applied in changeset pulp:pulp|25ac2211964e72132f49cf3c5dd1b82555b58914.
Updated by Ichimonji10 over 7 years ago
- Status changed from MODIFIED to ASSIGNED
The test for this issue still fails:
- non-deterministically, at least 3/4 of the time, and
- on Pulp 2.13 and 2.14 nightlies, both on Jenkins and on my personal VMs.
The test now fails in a slightly different way. As originally reported, num_processed
would be "2". Now, num_processed
is "1". This is significant. The test for this issue does the following:
1. Create a yum repository with yum and rsync distributors.
2. Add two RPMs to the repository.
3. Publish the yum and rsync distributors in turn.
4. Publish the yum and rsync distributors in turn, again.
If the second publish was a full publish instead of a fast-forward publish, one would expect num_processed
to be "2".
journalctl
shows no errors.
In an attempt to learn a little more, I inserted import time; time.sleep(10)
into the automated test, between step 3 and 4. I then ran stat
on some of the data in the target repository. Here's example output from a test that passed:
# stat */Packages/*/*
File: 'beb1bbde-a411-413b-8bd1-19f7e6a226a7/Packages/b/bear-4.1-1.noarch.rpm' -> '../../../content/units/rpm/30/938ff308e93a57caac883fa9ee3a8ffd61f457f413eb399387702a53f8ab4d/4ebbd279-6a27-4302-bb80-b6b8a788fe72'
Size: 129 Blocks: 0 IO Block: 4096 symbolic link
Device: fd00h/64768d Inode: 9777240 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 1000/ my-user) Gid: ( 1000/ my-user)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2017-07-06 11:43:35.888092613 -0400
Modify: 2017-07-06 11:41:41.472762625 -0400
Change: 2017-07-06 11:41:42.896741841 -0400
Birth: -
File: 'beb1bbde-a411-413b-8bd1-19f7e6a226a7/Packages/c/camel-0.1-1.noarch.rpm' -> '../../../content/units/rpm/40/fca1fbddb501b7f598f3e8b40db5e9cd9445ee720482187cac33f3e3054baf/6ef57cd7-5ff9-4d9e-a391-b6c8b443b3de'
Size: 129 Blocks: 0 IO Block: 4096 symbolic link
Device: fd00h/64768d Inode: 17836660 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 1000/ my-user) Gid: ( 1000/ my-user)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2017-07-06 11:43:35.888092613 -0400
Modify: 2017-07-06 11:41:41.466762713 -0400
Change: 2017-07-06 11:41:42.896741841 -0400
Birth: -
And here's sample output from a test that failed:
# stat */Packages/*/*
File: '2bfdc6fd-f96d-4b16-b267-c7c7afa73dd3/Packages/b/bear-4.1-1.noarch.rpm' -> '../../../content/units/rpm/30/938ff308e93a57caac883fa9ee3a8ffd61f457f413eb399387702a53f8ab4d/f17f95ca-decf-41ba-9837-e560eb5c1e31'
Size: 129 Blocks: 0 IO Block: 4096 symbolic link
Device: fd00h/64768d Inode: 8737428 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 1000/ my-user) Gid: ( 1000/ my-user)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2017-07-06 11:43:35.887402989 -0400
Modify: 2017-07-06 11:41:42.045097190 -0400
Change: 2017-07-06 11:41:43.425076955 -0400
Birth: -
File: '2bfdc6fd-f96d-4b16-b267-c7c7afa73dd3/Packages/c/camel-0.1-1.noarch.rpm' -> '../../../content/units/rpm/40/fca1fbddb501b7f598f3e8b40db5e9cd9445ee720482187cac33f3e3054baf/6a211fb0-67a6-465b-96f8-b8145bb3fa44'
Size: 129 Blocks: 0 IO Block: 4096 symbolic link
Device: fd00h/64768d Inode: 17898351 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 1000/ my-user) Gid: ( 1000/ my-user)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2017-07-06 11:43:35.888402973 -0400
Modify: 2017-07-06 11:42:02.587795960 -0400
Change: 2017-07-06 11:42:03.887776898 -0400
Birth: -
Check out the modify timestamps. For the test that passed, the two links have approximately equal timestamps. For the test that failed, the two links have different timestamps. This indicates that, when the test fails, files really are being copied to the target repository with rsync. This same pattern holds true in every case I've seen. Here's some more outputs from tests that passed:
[root@fedora-24-pulp-2-14 my-user]# stat */Packages/*/* | grep Modify:
Modify: 2017-07-06 12:05:53.223879499 -0400
Modify: 2017-07-06 12:05:53.230879388 -0400
[root@rhel-7-pulp-2-14 my-user]# stat */Packages/*/* | grep Modify:
Modify: 2017-07-06 12:05:53.000000000 -0400
Modify: 2017-07-06 12:05:53.000000000 -0400
And here's some outputs from tests that failed:
[root@fedora-24-pulp-2-13 my-user]# stat */Packages/*/* | grep Modify:
Modify: 2017-07-06 12:05:52.967185628 -0400
Modify: 2017-07-06 12:06:13.491846684 -0400
[root@fedora-25-pulp-2-13 my-user]# stat */Packages/*/* | grep Modify:
Modify: 2017-07-06 12:05:52.863403990 -0400
Modify: 2017-07-06 12:06:13.474070807 -0400
[root@fedora-25-pulp-2-14 my-user]# stat */Packages/*/* | grep Modify:
Modify: 2017-07-06 12:05:53.723338133 -0400
Modify: 2017-07-06 12:06:14.327999193 -0400
[root@rhel-7-pulp-2-13 my-user]# stat */Packages/*/* | grep Modify:
Modify: 2017-07-06 12:05:53.000000000 -0400
Modify: 2017-07-06 12:06:14.000000000 -0400
Updated by dalley over 7 years ago
- Status changed from ASSIGNED to MODIFIED
This new behavior is actually a separate problem with fairly low impact. The cause of the issue described in https://pulp.plan.io/issues/2844#note-9 is an emergent flaw in the underlying RepositoryContentUnit data model.
RepositoryContentUnit stores the "created" timestamp as an ISO 8601 string which only resolves to the nearest second (truncating the fraction), while the timestamps of repository publishes are stored to the nearest millisecond. Unfortunately, this means that if a repo association happens within a fraction of a second of a yum predistributor publish, we lose information about which occurred first.
This behavior is emergent now because previously the window for determining which content should be included in a fastforward publish started with the time of the last rsync distributor publish, which was incorrect. It now uses the time of the previous predistributor publish, which is the correct behavior. With this test, the yum predistributor publish manages to complete within the fraction-of-a-second time window some percentage of the time, whereas previously both the yum and rsync publishes would have needed to occur within that time window, which apparently doesn't happen.
The publishes taking place are still fastforward publishes, they are merely including a unit associated within this small time window as though they had taken place after the publish. Because of this, the impact of this new bug is somewhat low, especially given that a human could never act quickly enough to trigger this behavior.
Fixing this properly would require changing the underlying models, which is probably not a good idea at this point in the lifecycle of Pulp 2. This should absolutely be fixed in Pulp 3, however. We just want to make sure we aren't throwing away that millisecond information in the database like we are currently.
Updated by dalley over 7 years ago
Taking the above into account, I would recommend adding a time.sleep() of 2 seconds in between the content association and the publish portions of the smash test, as a workaround for this defect.
Updated by Ichimonji10 over 7 years ago
Taking the above into account, I would recommend adding a time.sleep() of 2 seconds in between the content association and the publish portions of the smash test, as a workaround for this defect.
Thank you! https://github.com/PulpQE/pulp-smash/pull/711
Fixing this properly would require changing the underlying models, which is probably not a good idea at this point in the lifecycle of Pulp 2. This should absolutely be fixed in Pulp 3, however.
Sounds like a reasonable judgment call to me.
Because of this, the impact of this new bug is somewhat low, especially given that a human could never act quickly enough to trigger this behavior.
IMO, whether a human can trigger this behaviour is beside the point, as Pulp is scriptable.
Updated by pcreech over 7 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE
Fix fastforward rsync publishes
Fixes an issue where fastforward publishes were not occurring when they should have been.
closes #2844 https://pulp.plan.io/issues/2844