Story #7824
closedAs a developer, I can bulk_create subclasses of Content
0%
Description
Ticket moved to GitHub: "pulp/pulpcore/1946":https://github.com/pulp/pulpcore/issues/1946
It is currently disallowed to bulk_create() multi-table inherited Django models, due to the technical challenges with doing so, however it is not necessarily impossible in theory.
https://code.djangoproject.com/ticket/28821
As a Pulp developer, if we are able to make assumptions about the way it will be used, then we can avoid most of the problems that prevent a generic implementation. For instance, only one level of inheritance.
I developed a proof of concept strategy that is unfortunately made more difficult by Django's proxy model behavior and the fact that there's no class which represents just the subclass table. So, this code emulates how multi-table inherited models set up their internal relationships, but doesn't actually use model inheritance.
Strategy:
- Transactions don't check the integrity of foreign keys until they are committed
- So save our child table first with a random uuid as the content_ptr, in bulk
- Then go back and save the parent model, in bulk
class PulpBase(models.Model):
pulp_id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
pulp_created = models.DateTimeField(auto_now_add=True)
pulp_type = models.TextField(null=False, default=None)
class Meta:
abstract = True
class NewContent(PulpBase):
pulp_type = models.TextField(null=False, default=None)
class NewPulpManager(models.Manager):
# Ignore the hacky workarounds for stuff like pulp_type, I was trying to keep things simple
def bulk_get_or_create(self, objs, batch_size=None):
with transaction.atomic():
pulp_type = self.model.get_pulp_type()
q = models.Q(pk__in=[])
unsaved_idxs_by_nat_key = defaultdict(list)
for idx, obj in enumerate(objs):
content_already_saved = not obj._state.adding
if not content_already_saved:
unsaved_idxs_by_nat_key[obj.natural_key()].append(idx)
q |= models.Q(**obj.natural_key_dict())
existing_objs = self.model.objects.filter(q)
for existing_obj in existing_objs.iterator(chunk_size=batch_size or 2000):
for idx in unsaved_idxs_by_nat_key[existing_objs.natural_key()]:
objs[idx] = existing_obj
new_base_content = []
for obj in objs:
content_already_saved = not obj._state.adding
if not content_already_saved:
new_base_content.append(NewContent(pulp_type=pulp_type, pulp_id=obj.pk))
self.bulk_create(objs, batch_size=batch_size)
NewContent.objects.bulk_create(new_base_content, batch_size=batch_size)
return objs
class ContentBase(models.Model):
content_ptr = models.OneToOneField(NewContent, primary_key=True, default=uuid.uuid4, on_delete=models.CASCADE)
objects = NewPulpManager()
@classmethod
def get_pulp_type(cls):
return cls.TYPE
@classmethod
def natural_key_fields(cls):
"""
Returns a tuple of the natural key fields which usually equates to unique_together fields
"""
return tuple(chain.from_iterable(cls._meta.unique_together))
def natural_key(self):
"""
Get the model's natural key based on natural_key_fields.
Returns:
tuple: The natural key.
"""
return tuple(getattr(self, f) for f in self.natural_key_fields())
def natural_key_dict(self):
"""
Get the model's natural key as a dictionary of keys and values.
"""
to_return = {}
for key in self.natural_key_fields():
to_return[key] = getattr(self, key)
return to_return
class Meta:
abstract=True
class NewFileContent(ContentBase):
TYPE = "file.file"
relative_path = models.TextField(null=False)
digest = models.CharField(max_length=64, null=False)
class Meta:
unique_together = ("relative_path", "digest")
The speedup is about 3x vs what we currently do.
Old
In [5]: %time PulpFileContent.objects.bulk_get_or_create(old_content)
CPU times: user 555 ms, sys: 88.6 ms, total: 643 ms
Wall time: 1.77 s
New
In [7]: %time NewFileContent.objects.bulk_get_or_create(new_content)
CPU times: user 648 ms, sys: 1.39 ms, total: 649 ms
Wall time: 694 ms
Updated by dkliban@redhat.com over 2 years ago
- Status changed from NEW to CLOSED - WONTFIX
Updated by dalley over 2 years ago
- Status changed from CLOSED - WONTFIX to NEW
- Tags Wishlist added
Updated by pulpbot over 1 year ago
- Description updated (diff)
- Status changed from NEW to CLOSED - DUPLICATE