Task #3848

closed

Consider using integer IDs in Pulp instead of UUIDs

Added by daviddavis over 6 years ago. Updated almost 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release:
Groomed: No
Sprint Candidate: No
Tags:
Sprint:
Quarter:

Description

Motivation

- Better performance[0]
- Less storage required (4 bytes for an int vs 16 bytes for a UUID)
- Hrefs would be shorter (e.g. /pulp/api/v3/repositories/1/)
- In line with other apps like Katello
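
The storage and href points above can be checked with a small standard-library sketch. It assumes a 32-bit integer primary key, and the UUID value is an arbitrary example; only the /pulp/api/v3/repositories/ prefix comes from the list above.

```python
import struct
import uuid

# Assumption: the integer primary key is stored as a 32-bit signed integer.
int_pk = 1
# Arbitrary example UUID, used for illustration only.
uuid_pk = uuid.UUID("12345678-1234-5678-1234-567812345678")

# Raw size of each key in bytes.
int_size = len(struct.pack(">i", int_pk))
uuid_size = len(uuid_pk.bytes)

# Hrefs built from the same prefix with each kind of key.
prefix = "/pulp/api/v3/repositories/"
int_href = f"{prefix}{int_pk}/"
uuid_href = f"{prefix}{uuid_pk}/"

print(int_size, uuid_size)
print(int_href)
print(uuid_href)
```

This only compares the raw key sizes; actual on-disk usage also depends on the database and its indexes.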

Drawbacks

- Integer ids expose info like how many records there are
- Can’t support sharding or multiple dbs (are we ever going to need this?)

Solution

Switching to integer IDs is pretty easy. We just need to remove a few lines that specify id as a UUID. The default in Django is int ids.

There is one exception, though. Jobs in rq/redis are created using the task id[1], and this job id needs to be a UUID. I see two possible solutions:

1. We leave task id as a UUID but every other id is an integer
2. We add a job uuid field on task
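
As a rough illustration of option 2, here is a minimal standard-library sketch, not the actual Pulp model. The `Task` class, its `job_id` field, and the `enqueue` helper are hypothetical names, and the counter merely stands in for an auto-incrementing primary key.

```python
import itertools
import uuid
from dataclasses import dataclass, field

# Stand-in for a database auto-incrementing integer primary key.
_next_id = itertools.count(1)


@dataclass
class Task:
    """Hypothetical task record: integer id plus a separate UUID for the queue job."""

    id: int = field(default_factory=lambda: next(_next_id))
    job_id: uuid.UUID = field(default_factory=uuid.uuid4)


def enqueue(task: Task) -> dict:
    """Hypothetical enqueue step: the queue is keyed by the UUID, not the integer id."""
    return {"job_id": str(task.job_id), "task_pk": task.id}


task = Task()
job = enqueue(task)
print(job["task_pk"], job["job_id"])
```

With this shape, every model (including tasks) keeps an integer id, at the cost of one extra column on the task table. Option 1 avoids the extra column but leaves tasks as the only model with a UUID id.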

[0] When creating 400,000 units, the non-UUID PK is about 30% faster: 42.22 seconds vs. 55.98 seconds. Searching through the same 400,000 units is also about 30% faster. A filter for file content units that have a relative_path__startswith={some random letter} (I put UUIDs in all the fields) takes about 0.44 seconds if the model has a UUID PK and about 0.33 seconds if the model has a default Django auto-incrementing PK.
[1] https://github.com/pulp/pulp/blob/9bfc50d90a24c9d0ac4a93f5718187515b947058/pulpcore/pulpcore/tasking/tasks.py#L187


Related issues

Blocks Python Support - Task #3860: Update code and docs to use int ids (MODIFIED, vdusek)
