Issue #1043

closed

Pulp's repo sync and publish history is sorted by start_time, which is not indexed

Added by rbarlow almost 9 years ago. Updated about 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version: 2.6.0
Platform Release: 2.6.5
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Easy Fix, Pulp 2
Sprint:
Quarter:

Description

Our pulp.server.managers.repo.sync.RepoSyncManager.sync_history() method sorts the repo sync history by started (start time), which is not an indexed field. This can cause MongoDB to fail to retrieve the sync history if there are too many documents in the collection with this error:

database error: too much data for sort() with no index.  add an index or specify a smaller limit

We can fix this by indexing the started field, but that will increase the amount of memory consumed by mongod which I think might be undesirable. A clever alternative is to sort by the MongoDB ObjectID, which isn't going to be an identical sort, but should be similar enough. This will cause us to sort by the creation time of the sync history document, rather than by the start time of the sync task.
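
A minimal sketch of the ObjectID-based alternative, assuming a pymongo-style collection object. The function name recent_sync_history and the repo_id filter field are illustrative assumptions, not Pulp's actual code. Because _id always has an index, the sort does not hit the in-memory sort limit:

```python
DESCENDING = -1  # same value as pymongo.DESCENDING


def recent_sync_history(collection, repo_id, limit=None):
    """Return sync history documents for a repo, newest first.

    Hypothetical helper: sorts on the always-indexed _id field rather
    than on the unindexed started field.
    """
    # find(), sort() and limit() follow the pymongo Cursor interface.
    cursor = collection.find({'repo_id': repo_id}).sort('_id', DESCENDING)
    if limit is not None:
        cursor = cursor.limit(limit)
    return list(cursor)
```

The results come back in approximate document-creation order, which can differ slightly from the order of the started timestamps.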

If we have reservations about semantic versioning due to this proposal, we may be able to add the index for Pulp 2.Y, and drop it with Pulp 3.0 where we switch to sorting by ObjectID. I am on the fence about whether this is a semantic versioning concern.

Actions #1

Updated by rbarlow almost 9 years ago

  • Description updated (diff)
Actions #2

Updated by bmbouter almost 9 years ago

I think this is similar enough of a sort that it does not violate semantic versioning.

Actions #3

Updated by mhrivnak almost 9 years ago

  • Priority changed from Normal to High
  • Severity changed from 2. Medium to 3. High
  • Version set to 2.6.0
  • Platform Release set to 2.6.3
Actions #4

Updated by mhrivnak almost 9 years ago

  • Triaged changed from No to Yes
Actions #5

Updated by amacdona@redhat.com almost 9 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to amacdona@redhat.com
Actions #6

Updated by amacdona@redhat.com almost 9 years ago

  • Subject changed from Pulp's repo sync history is sorted by start_time, which is not indexed to Pulp's repo sync and publish history is sorted by start_time, which is not indexed

This is also the case for repo publish history.

Actions #7

Updated by amacdona@redhat.com almost 9 years ago

  • Status changed from ASSIGNED to POST

https://github.com/pulp/pulp/pull/1965

From the mongo docs http://docs.mongodb.org/v2.4/reference/object-id/#objectid:
"sorting on an _id field that stores ObjectId values is roughly equivalent to sorting by creation time."

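To illustrate why the quoted statement holds: the first four bytes of an ObjectId are a big-endian Unix timestamp in seconds, so comparing ObjectIds compares creation times first. A small sketch in plain Python (the helper name objectid_creation_time is hypothetical and this is not Pulp code):

```python
from datetime import datetime, timezone


def objectid_creation_time(oid_hex):
    """Decode the creation time embedded in a 24-character hex ObjectId."""
    if len(oid_hex) != 24:
        raise ValueError('expected a 24-character hex ObjectId string')
    # The leading 8 hex digits (4 bytes) hold seconds since the Unix epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

The timestamp has one-second resolution, so the relative order of documents created within the same second depends on the remaining bytes and is only "roughly" chronological.
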
Added by Austin Macdonald almost 9 years ago

Revision cb50793f | View on GitHub

sort sync and publish history by an indexed key

fixes #1043

Actions #8

Updated by mhrivnak over 8 years ago

  • Platform Release changed from 2.6.3 to 2.6.4
Actions #9

Updated by Anonymous over 8 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100
Actions #10

Updated by dkliban@redhat.com over 8 years ago

  • Platform Release changed from 2.6.4 to 2.6.5
Actions #11

Updated by dkliban@redhat.com over 8 years ago

  • Status changed from MODIFIED to 5
Actions #12

Updated by dkliban@redhat.com over 8 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #14

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added
