Issue #2844

Rsync publish for RPM repo fails to run in fast forward mode

Added by Ichimonji10 over 2 years ago. Updated 10 months ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
High
Assignee:
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Severity:
2. Medium
Version:
Master
Platform Release:
2.13.3
Blocks Release:
2.13.z, 2.14.z
OS:
Backwards Incompatible:
No
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
QA Contact:
Complexity:
Smash Test:
Verified:
Yes
Verification Required:
No
Sprint:
Sprint 21

Description

To reproduce, do the following:

1. Create a yum repository with a yum and rsync distributor.
2. Add some content to the repository.
3. Publish with the yum and rsync distributors.
4. Publish with the yum and rsync distributors again.

The second publish with the rsync distributor should be a fast-forward publish, as no new units were added to the repository between the first and second publishes. Unfortunately, the second publish appears to be a full publish. This issue cropped up three months ago and was fixed in Pulp 2.13.1. See issue #2666. It has re-appeared in the Pulp 2.14 nightlies.

To figure out whether a publish is a full or a fast-forward publish, one can examine the asynchronous task reports generated by Pulp. Specifically, one can do the following:

1. Publish an RPM repository with its rsync distributor. Capture the returned call report.
2. Recursively walk through all of the tasks referenced from this call report, and pick out the spawned task with a task_type of pulp.server.managers.repo.publish.publish.
3. Inspect the spawned task's steps, as listed in ['result']['details']. For a fast-forward publish, the 'Unit query step (rpm, drpm, srpm)' step should list a value of zero in its num_processed field.
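The steps above can be sketched in Python. The helper names here are illustrative (not a Pulp client API); the report structure they assume matches the abbreviated task report shown below:

```python
PUBLISH_TASK_TYPE = 'pulp.server.managers.repo.publish.publish'
UNIT_QUERY_STEP = 'Unit query step (rpm, drpm, srpm)'


def find_publish_tasks(tasks):
    """Pick out tasks whose task_type is the repo publish task."""
    return [t for t in tasks if t.get('task_type') == PUBLISH_TASK_TYPE]


def is_fast_forward(task):
    """A fast-forward rsync publish processes zero units in its unit query step."""
    for step in task['result']['details']:
        if step['step_type'] == UNIT_QUERY_STEP:
            return step['num_processed'] == 0
    raise ValueError('No unit query step found in task report')
```

Running is_fast_forward() against the example report below would return False, since its unit query step reports num_processed of 2.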

Here's an abbreviated example of a pulp.server.managers.repo.publish.publish task:

{'snip': 'snippety snip!',
 'result': {'_ns': 'repo_publish_results',
            'completed': '2017-06-26T18:03:38Z',
            'details': [{'description': '',
                         'details': '',
                         'error_details': [],
                         'items_total': 2,
                         'num_failures': 0,
                         'num_processed': 2,
                         'num_success': 2,
                         'state': 'FINISHED',
                         'step_id': 'a5b6e8e1-5d6e-4991-8930-e729e1709cc8',
                         'step_type': 'Unit query step (rpm, drpm, srpm)'},
                        {'description': 'Rsync files to remote destination',
                         'details': '',
                         'error_details': [],
                         'items_total': 1,
                         'num_failures': 0,
                         'num_processed': 1,
                         'num_success': 1,
                         'state': 'FINISHED',
                         'step_id': '78e1a4ae-0340-4786-b7a6-35c5bc93a506',
                         'step_type': 'Rsync step (origin)'},
                        {'description': 'Rsync files to remote destination',
                         'details': '',
                         'error_details': [],
                         'items_total': 1,
                         'num_failures': 0,
                         'num_processed': 1,
                         'num_success': 1,
                         'state': 'FINISHED',
                         'step_id': 'd2a0212a-57eb-4d33-9c1c-75449c734251',
                         'step_type': 'Rsync step (content)'},
                        {'description': 'Rsync files to remote destination',
                         'details': '',
                         'error_details': [],
                         'items_total': 1,
                         'num_failures': 0,
                         'num_processed': 1,
                         'num_success': 1,
                         'state': 'FINISHED',
                         'step_id': 'b0f8dc3c-1dfc-46f4-9def-54cccbd4b22a',
                         'step_type': 'Rsync step (repodata)'},
                        {'description': '',
                         'details': '',
                         'error_details': [],
                         'items_total': 1,
                         'num_failures': 0,
                         'num_processed': 1,
                         'num_success': 1,
                         'state': 'FINISHED',
                         'step_id': '0785cbfa-9e7a-4d05-8037-a69738eedc17',
                         'step_type': 'UpdateLastPredistDate'}],
            'distributor_id': '9c0e0d42-93dc-4335-9168-06a0d292a77c',
            'distributor_type_id': 'rpm_rsync_distributor',
            'error_message': None,
            'exception': None,
            'id': '59514c7a14059a093629fb86',
            'repo_id': '54b37808-1e7c-43d5-a06d-4e6415dfa3ac',
            'result': 'success',
            'started': '2017-06-26T18:03:36Z',
            'summary': {'Rsync step (content)': 'FINISHED',
                        'Rsync step (origin)': 'FINISHED',
                        'Rsync step (repodata)': 'FINISHED',
                        'Unit query step (rpm, drpm, srpm)': 'FINISHED',
                        'UpdateLastPredistDate': 'FINISHED'},
            'traceback': None},
 'spawned_tasks': [],
 'start_time': '2017-06-26T18:03:36Z',
 'state': 'finished',
 'tags': ['pulp:repository:54b37808-1e7c-43d5-a06d-4e6415dfa3ac',
          'pulp:action:publish'],
 'task_id': '05cef649-ae80-4a78-b1af-18926b4dcc8c',
 'task_type': 'pulp.server.managers.repo.publish.publish',
 'traceback': None,
 'worker_name': 'reserved_resource_worker-0@rhel-7-pulp-2-14'}

Notice the 'num_processed': 2 line? That's correct for an initial rsync publish. It's incorrect for a fast-forward publish. (I think it indicates that two units were copied to the remote host.) For a fast-forward publish, that value should be zero. For more on this, see: Pulp #2666 Note #15 and subsequent comments.

This issue can also be reproduced with Pulp Smash:

python -m unittest pulp_smash.tests.rpm.api_v2.test_rsync_distributor.PublishTwiceTestCase

Related issues

Copied from RPM Support - Issue #2666: Rsync publish for RPM repo fails to run in fast forward mode CLOSED - CURRENTRELEASE

Associated revisions

Revision a212fdcf View on GitHub
Added by dalley over 2 years ago

Fix fastforward rsync publishes

Fixes an issue where fastforward publishes were not occurring when they should have been.

closes #2844 https://pulp.plan.io/issues/2844

Revision 25ac2211 View on GitHub
Added by dalley over 2 years ago

Fix fastforward rsync publishes

Fixes an issue where fastforward publishes were not occurring when they should have been.

closes #2844 https://pulp.plan.io/issues/2844

(cherry picked from commit a212fdcf28b296fe6f40ad5a68a96695c4a49341)

History

#1 Updated by Ichimonji10 over 2 years ago

  • Copied from Issue #2666: Rsync publish for RPM repo fails to run in fast forward mode added

#2 Updated by Ichimonji10 over 2 years ago

This is a regression. That's why I've marked it as a blocker for 2.14.z.

#3 Updated by ttereshc over 2 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley
  • Priority changed from Normal to High
  • Sprint/Milestone set to 40
  • Triaged changed from No to Yes

#4 Updated by Ichimonji10 over 2 years ago

  • Description updated (diff)

#5 Updated by dalley over 2 years ago

  • Status changed from ASSIGNED to MODIFIED

#8 Updated by pcreech over 2 years ago

  • Platform Release set to 2.13.3

#9 Updated by Ichimonji10 over 2 years ago

  • Status changed from MODIFIED to ASSIGNED

The test for this issue still fails:

  • non-deterministically, at least 3/4 of the time, and
  • on Pulp 2.13 and 2.14 nightlies, both on Jenkins and on my personal VMs.

The test now fails in a slightly different way. As originally reported, num_processed would be "2". Now, num_processed is "1". This is significant. The test for this issue does the following:

1. Create a yum repository with yum and rsync distributors.
2. Add two RPMs to the repository.
3. Publish the yum and rsync distributors in turn.
4. Publish the yum and rsync distributors in turn, again.

If the second publish was a full publish instead of a fast-forward publish, one would expect num_processed to be "2".

journalctl shows no errors.

In an attempt to learn a little more, I inserted import time; time.sleep(10) into the automated test, between step 3 and 4. I then ran stat on some of the data in the target repository. Here's example output from a test that passed:

# stat */Packages/*/*
  File: 'beb1bbde-a411-413b-8bd1-19f7e6a226a7/Packages/b/bear-4.1-1.noarch.rpm' -> '../../../content/units/rpm/30/938ff308e93a57caac883fa9ee3a8ffd61f457f413eb399387702a53f8ab4d/4ebbd279-6a27-4302-bb80-b6b8a788fe72'
  Size: 129             Blocks: 0          IO Block: 4096   symbolic link
Device: fd00h/64768d    Inode: 9777240     Links: 1
Access: (0777/lrwxrwxrwx)  Uid: ( 1000/ my-user)   Gid: ( 1000/ my-user)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2017-07-06 11:43:35.888092613 -0400
Modify: 2017-07-06 11:41:41.472762625 -0400
Change: 2017-07-06 11:41:42.896741841 -0400
 Birth: -
  File: 'beb1bbde-a411-413b-8bd1-19f7e6a226a7/Packages/c/camel-0.1-1.noarch.rpm' -> '../../../content/units/rpm/40/fca1fbddb501b7f598f3e8b40db5e9cd9445ee720482187cac33f3e3054baf/6ef57cd7-5ff9-4d9e-a391-b6c8b443b3de'
  Size: 129             Blocks: 0          IO Block: 4096   symbolic link
Device: fd00h/64768d    Inode: 17836660    Links: 1
Access: (0777/lrwxrwxrwx)  Uid: ( 1000/ my-user)   Gid: ( 1000/ my-user)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2017-07-06 11:43:35.888092613 -0400
Modify: 2017-07-06 11:41:41.466762713 -0400
Change: 2017-07-06 11:41:42.896741841 -0400
 Birth: -

And here's sample output from a test that failed:

# stat */Packages/*/*
  File: '2bfdc6fd-f96d-4b16-b267-c7c7afa73dd3/Packages/b/bear-4.1-1.noarch.rpm' -> '../../../content/units/rpm/30/938ff308e93a57caac883fa9ee3a8ffd61f457f413eb399387702a53f8ab4d/f17f95ca-decf-41ba-9837-e560eb5c1e31'
  Size: 129             Blocks: 0          IO Block: 4096   symbolic link
Device: fd00h/64768d    Inode: 8737428     Links: 1
Access: (0777/lrwxrwxrwx)  Uid: ( 1000/ my-user)   Gid: ( 1000/ my-user)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2017-07-06 11:43:35.887402989 -0400
Modify: 2017-07-06 11:41:42.045097190 -0400
Change: 2017-07-06 11:41:43.425076955 -0400
 Birth: -
  File: '2bfdc6fd-f96d-4b16-b267-c7c7afa73dd3/Packages/c/camel-0.1-1.noarch.rpm' -> '../../../content/units/rpm/40/fca1fbddb501b7f598f3e8b40db5e9cd9445ee720482187cac33f3e3054baf/6a211fb0-67a6-465b-96f8-b8145bb3fa44'
  Size: 129             Blocks: 0          IO Block: 4096   symbolic link
Device: fd00h/64768d    Inode: 17898351    Links: 1
Access: (0777/lrwxrwxrwx)  Uid: ( 1000/ my-user)   Gid: ( 1000/ my-user)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2017-07-06 11:43:35.888402973 -0400
Modify: 2017-07-06 11:42:02.587795960 -0400
Change: 2017-07-06 11:42:03.887776898 -0400
 Birth: -

Check out the modify timestamps. For the test that passed, the two links have approximately equal timestamps. For the test that failed, the two links have different timestamps. This indicates that, when the test fails, files really are being copied to the target repository with rsync. This same pattern holds true in every case I've seen. Here are more outputs from tests that passed:

[root@fedora-24-pulp-2-14 my-user]# stat */Packages/*/* | grep Modify:
Modify: 2017-07-06 12:05:53.223879499 -0400
Modify: 2017-07-06 12:05:53.230879388 -0400
[root@rhel-7-pulp-2-14 my-user]# stat */Packages/*/* | grep Modify:
Modify: 2017-07-06 12:05:53.000000000 -0400
Modify: 2017-07-06 12:05:53.000000000 -0400

And here are some outputs from tests that failed:

[root@fedora-24-pulp-2-13 my-user]# stat */Packages/*/* | grep Modify:
Modify: 2017-07-06 12:05:52.967185628 -0400
Modify: 2017-07-06 12:06:13.491846684 -0400
[root@fedora-25-pulp-2-13 my-user]# stat */Packages/*/* | grep Modify:
Modify: 2017-07-06 12:05:52.863403990 -0400
Modify: 2017-07-06 12:06:13.474070807 -0400
[root@fedora-25-pulp-2-14 my-user]# stat */Packages/*/* | grep Modify:
Modify: 2017-07-06 12:05:53.723338133 -0400
Modify: 2017-07-06 12:06:14.327999193 -0400
[root@rhel-7-pulp-2-13 my-user]# stat */Packages/*/* | grep Modify:
Modify: 2017-07-06 12:05:53.000000000 -0400
Modify: 2017-07-06 12:06:14.000000000 -0400
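The mtime comparison used above can be sketched as a small check. The paths and the two-second tolerance are illustrative; os.lstat is used so the symlinks themselves, not their targets, are inspected:

```python
import os


def links_modified_together(paths, tolerance_seconds=2.0):
    """Return True if all symlinks' mtimes fall within the tolerance.

    For a fast-forward publish, previously synced links keep their original
    mtimes; freshly re-copied files show a noticeably later mtime, as in the
    failing outputs above.
    """
    mtimes = [os.lstat(p).st_mtime for p in paths]
    return max(mtimes) - min(mtimes) <= tolerance_seconds
```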

#10 Updated by pcreech over 2 years ago

  • Blocks Release 2.13.z added

#11 Updated by dalley over 2 years ago

  • Status changed from ASSIGNED to MODIFIED

This new behavior is actually a separate problem with fairly low impact. The cause of the issue described in https://pulp.plan.io/issues/2844#note-9 is an emergent flaw in the underlying RepositoryContentUnit data model.

RepositoryContentUnit stores the "created" timestamp as an ISO 8601 string which only resolves to the nearest second (truncating the fraction), while the timestamps of repository publishes are stored to the nearest millisecond. Unfortunately, this means that if a repo association happens within a fraction of a second of a yum predistributor publish, we lose information about which occurred first.
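The precision mismatch can be demonstrated directly. This is a minimal illustration of the truncation, not Pulp code; the timestamps are invented:

```python
from datetime import datetime

# Suppose an association is recorded 300 ms *after* a publish...
publish_time = datetime(2017, 6, 26, 18, 3, 38, 200000)
association_time = datetime(2017, 6, 26, 18, 3, 38, 500000)

# ...but the association timestamp is stored as an ISO 8601 string
# truncated to whole seconds, while the publish keeps its fraction.
stored_association = association_time.replace(microsecond=0).isoformat()

# The truncated association now sorts *before* the publish, so the
# ordering of the two events can no longer be recovered.
assert association_time > publish_time
assert datetime.fromisoformat(stored_association) < publish_time
```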

This behavior is emergent now because of a recent fix. Previously, the window for determining which content should be included in a fast-forward publish started at the time of the last rsync distributor publish, which was incorrect; it now starts at the time of the previous predistributor publish, which is correct. With this test, the yum predistributor publish manages to complete within the fraction-of-a-second window some percentage of the time, whereas previously both the yum and rsync publishes would have needed to occur within that window, which apparently never happens.

The publishes taking place are still fast-forward publishes; they merely include a unit associated within this small time window as though it had been associated after the publish. Because of this, the impact of this new bug is somewhat low, especially given that a human could never act quickly enough to trigger this behavior.

Fixing this properly would require changing the underlying models, which is probably not a good idea at this point in the lifecycle of Pulp 2. It should absolutely be fixed in Pulp 3, however: we just need to make sure we aren't throwing away that millisecond information in the database, as we currently are.

#12 Updated by dalley over 2 years ago

Taking the above into account, I would recommend adding a time.sleep() of 2 seconds in between the content association and the publish portions of the smash test, as a workaround for this defect.
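In the test, the workaround amounts to something like the following helper between the association and publish steps. Only the two-second sleep is the actual recommendation; the function name and docstring are illustrative:

```python
import time


def wait_for_timestamp_granularity(seconds=2.0):
    """Sleep past the one-second resolution of RepositoryContentUnit's
    'created' timestamp, so that the publish window unambiguously falls
    after the content association."""
    time.sleep(seconds)
```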

#13 Updated by Ichimonji10 over 2 years ago

Taking the above into account, I would recommend adding a time.sleep() of 2 seconds in between the content association and the publish portions of the smash test, as a workaround for this defect.

Thank you! https://github.com/PulpQE/pulp-smash/pull/711

Fixing this properly would require changing the underlying models, which is probably not a good idea at this point in the lifecycle of Pulp 2. This should absolutely be fixed in Pulp 3, however.

Sounds like a reasonable judgment call to me.

Because of this, the impact of this new bug is somewhat low, especially given that a human could never act quickly enough to trigger this behavior.

IMO, whether a human can trigger this behaviour is beside the point, as Pulp is scriptable.

#14 Updated by pthomas@redhat.com over 2 years ago

  • Verified changed from No to Yes

Automated tests pass for this.

#15 Updated by pcreech over 2 years ago

  • Status changed from MODIFIED to ON_QA

#16 Updated by pcreech over 2 years ago

  • Status changed from ON_QA to CLOSED - CURRENTRELEASE

#17 Updated by bmbouter almost 2 years ago

  • Sprint set to Sprint 21

#18 Updated by bmbouter almost 2 years ago

  • Sprint/Milestone deleted (40)

#19 Updated by bmbouter 11 months ago

  • Tags Pulp 2 added

#20 Updated by bmbouter 10 months ago

  • Tags deleted (RCM)
