Task #3848

Consider using integer IDs in Pulp instead of UUIDs

Added by daviddavis over 1 year ago. Updated 6 months ago.

Status: MODIFIED
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Platform Release:
Blocks Release:
Backwards Incompatible: No
Groomed: No
Sprint Candidate: No
Tags:
QA Contact:
Complexity:
Smash Test:
Verified: No
Verification Required: No
Sprint:

Description

Motivation

- Better performance [0]
- Less storage required (4 bytes for an int vs. 16 bytes for a UUID)
- Hrefs would be shorter (e.g. /pulp/api/v3/repositories/1/)
- In line with other apps like Katello
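The size difference in the list above can be checked with a few lines of Python. This is only a sketch: it assumes a 32-bit integer key and the raw 16-byte form of a UUID, and the actual on-disk cost depends on the database and its indexes. The UUID href is a hypothetical one built from the same URL pattern.

```python
import struct
import uuid

# A 32-bit integer primary key packs into 4 bytes.
int_pk_size = len(struct.pack("<i", 1))

# A UUID is a 128-bit value, i.e. 16 raw bytes.
uuid_pk = uuid.uuid4()
uuid_pk_size = len(uuid_pk.bytes)

# Hypothetical hrefs following the pattern from the list above.
int_href = "/pulp/api/v3/repositories/1/"
uuid_href = f"/pulp/api/v3/repositories/{uuid_pk}/"

print(int_pk_size, uuid_pk_size)      # 4 16
print(len(int_href), len(uuid_href))  # 28 63
```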

Drawbacks

- Integer ids expose info like how many records there are
- Can’t support sharding or multiple dbs (are we ever going to need this?)

Solution

Switching to integer IDs is pretty easy. We just need to remove a few lines that specify id as a UUID. The default in Django is int ids.

There is one exception or problem though. Jobs in rq/redis are created using the task id [1], and this job id needs to be a UUID. I see two possible solutions:

1. We leave task id as a UUID but every other id is an integer
2. We add a job uuid field on task
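The two options can be compared with a small sketch in plain Python. The class and attribute names (`TaskOption1`, `TaskOption2`, `job_uuid`, `job_id`) are hypothetical and only illustrate the shape of each choice; they are not Pulp's actual models, and the counter stands in for a database sequence.

```python
import itertools
import uuid
from dataclasses import dataclass, field

# Stand-in for an auto-incrementing database sequence.
_next_int_id = itertools.count(1)


@dataclass
class TaskOption1:
    """Option 1: the task keeps a UUID primary key, reused as the job id."""
    id: uuid.UUID = field(default_factory=uuid.uuid4)

    @property
    def job_id(self) -> str:
        return str(self.id)


@dataclass
class TaskOption2:
    """Option 2: integer primary key plus a separate job UUID field."""
    id: int = field(default_factory=lambda: next(_next_int_id))
    job_uuid: uuid.UUID = field(default_factory=uuid.uuid4)

    @property
    def job_id(self) -> str:
        return str(self.job_uuid)


t1 = TaskOption1()
t2 = TaskOption2()
print(t1.job_id == str(t1.id))  # True: one identifier serves both purposes
print(t2.id, t2.job_id)         # two identifiers to keep track of
```

With option 1 a task record and its job share a single identifier; option 2 keeps all primary keys uniform at the cost of a second identifier per task.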

[0] When creating 400,000 units, the non-UUID PK is about 30% faster, at 42.22 seconds vs. 55.98 seconds. Searching through the same 400,000 units is also about 30% faster: a filter for file content units that have a relative_path__startswith={some random letter} (I put UUIDs in all the fields) takes about 0.44 seconds if the model has a UUID PK and about 0.33 seconds if the model has a default Django auto-incrementing PK.
[1] https://github.com/pulp/pulp/blob/9bfc50d90a24c9d0ac4a93f5718187515b947058/pulpcore/pulpcore/tasking/tasks.py#L187
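A comparison of this kind can be sketched with the standard library alone. The version below is an assumption-heavy stand-in, not the benchmark that produced the numbers in [0]: it uses an in-memory SQLite database instead of Pulp's database and Django ORM, a hypothetical `unit` table, and a much smaller row count, so its timings will differ and may not even show the same ratio.

```python
import sqlite3
import time
import uuid


def benchmark(n, use_uuid_pk):
    """Time n inserts and one prefix filter; returns (insert_s, filter_s, rows)."""
    conn = sqlite3.connect(":memory:")
    if use_uuid_pk:
        conn.execute("CREATE TABLE unit (id TEXT PRIMARY KEY, relative_path TEXT)")
        sql = "INSERT INTO unit (id, relative_path) VALUES (?, ?)"
        rows = [(str(uuid.uuid4()), str(uuid.uuid4())) for _ in range(n)]
    else:
        conn.execute(
            "CREATE TABLE unit (id INTEGER PRIMARY KEY AUTOINCREMENT, relative_path TEXT)"
        )
        sql = "INSERT INTO unit (relative_path) VALUES (?)"
        rows = [(str(uuid.uuid4()),) for _ in range(n)]

    start = time.perf_counter()
    with conn:  # one transaction for all inserts
        conn.executemany(sql, rows)
    insert_s = time.perf_counter() - start

    start = time.perf_counter()
    # Rough analogue of relative_path__startswith="a"
    conn.execute("SELECT id FROM unit WHERE relative_path LIKE 'a%'").fetchall()
    filter_s = time.perf_counter() - start

    total = conn.execute("SELECT COUNT(*) FROM unit").fetchone()[0]
    conn.close()
    return insert_s, filter_s, total


for label, flag in (("integer pk", False), ("uuid pk", True)):
    insert_s, filter_s, total = benchmark(20_000, flag)
    print(f"{label}: {total} rows, insert {insert_s:.3f}s, filter {filter_s:.4f}s")
```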


Related issues

Blocks Python Support - Task #3860: Update code and docs to use int ids (MODIFIED)

Associated revisions

Revision 12919c20
Added by daviddavis over 1 year ago

Switch to using integer IDs instead of UUIDs

fixes #3848
https://pulp.plan.io/issues/3848

History

#1 Updated by daviddavis over 1 year ago

  • Description updated (diff)

#2 Updated by daviddavis over 1 year ago

  • Description updated (diff)

#3 Updated by daviddavis over 1 year ago

  • Description updated (diff)

#4 Updated by bmbouter over 1 year ago

I agree RQ will need those task ids. I think I prefer option 1. It's helpful to reference our task records by the same id that RQ uses for the job, and it avoids introducing another identifier.

#6 Updated by daviddavis over 1 year ago

  • Blocks Task #3860: Update code and docs to use int ids added

#7 Updated by daviddavis over 1 year ago

  • Status changed from NEW to POST
  • Assignee set to daviddavis

#8 Updated by daviddavis over 1 year ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#9 Updated by gmbnomis about 1 year ago

I noticed a problem with the change when trying to adapt
pulp_cookbook to the newest version of Pulp core:

The storage path of a PublishedMetadata file contains the pk (see
[0]). With integer IDs, the pk is only known after a .save(). Thus
code like [1] in pulp_cookbook (or [2] in pulp_file) puts
files at ".../None/<relative_path>" regardless of the pk actually
used.

The best I came up with is the following code:

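            # Save once without the file so the database assigns an integer pk.
            metadata = PublishedMetadata(
                relative_path=os.path.basename(universe.relative_path),
                publication=publication,
                file=None)
            metadata.save()
            # Attach the file only now: the storage path is built from the pk.
            with open(universe.relative_path, 'rb') as f:
                metadata.file = File(f)
                metadata.save()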

This will generate proper paths, but is a workaround.
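The ordering problem can be illustrated without Django. The sketch below is a simplified simulation under stated assumptions: `storage_path` is a hypothetical stand-in for the path helper referenced in [0], `IntPkRecord` and `UuidPkRecord` are made-up classes, and the counter plays the role of the database sequence. It only mimics the order of events (path computed before a new row receives its integer pk).

```python
import itertools
import uuid

_sequence = itertools.count(1)  # stands in for the database sequence


def storage_path(instance, relative_path):
    # Hypothetical path helper: embeds the pk in the storage path.
    return f"published_metadata/{instance.pk}/{relative_path}"


class IntPkRecord:
    """Record whose integer pk is assigned by the 'database' on first save."""

    def __init__(self, relative_path, has_file=True):
        self.pk = None
        self.relative_path = relative_path
        self.has_file = has_file
        self.stored_at = None

    def save(self):
        # The file path is computed before a new row has received its pk.
        if self.has_file and self.stored_at is None:
            self.stored_at = storage_path(self, self.relative_path)
        if self.pk is None:
            self.pk = next(_sequence)


class UuidPkRecord(IntPkRecord):
    """Record whose pk is generated client-side, before any save."""

    def __init__(self, relative_path, has_file=True):
        super().__init__(relative_path, has_file)
        self.pk = uuid.uuid4()


broken = IntPkRecord("universe")
broken.save()
print(broken.stored_at)  # published_metadata/None/universe

fixed = IntPkRecord("universe", has_file=False)
fixed.save()             # first save only assigns the pk
fixed.has_file = True
fixed.save()             # second save stores the file under the real pk
print(fixed.stored_at)   # published_metadata/2/universe

old_style = UuidPkRecord("universe")
old_style.save()
print(old_style.stored_at.split("/")[1] == str(old_style.pk))  # True
```

With a UUID pk the identifier exists as soon as the object is constructed, which is why the single-save code worked before the switch.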

[0] https://github.com/pulp/pulp/blob/master/pulpcore/pulpcore/app/models/storage.py#L144
[1] https://github.com/gmbnomis/pulp_cookbook/blob/master/pulp_cookbook/app/tasks/publishing.py#L49
[2] https://github.com/pulp/pulp_file/blob/master/pulp_file/app/tasks/publishing.py#L46

#10 Updated by daviddavis about 1 year ago

Ah, great catch. I was wondering how this slipped by, but I suppose it does save the published metadata file to a directory called None, and I didn't try more than one publish. I've opened https://pulp.plan.io/issues/3878. Thanks for reporting this.

#11 Updated by daviddavis 6 months ago

  • Sprint/Milestone set to 3.0

#12 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3)
