Issue #9399

sync error: invalid memory alloc request size

Added by keilr 2 months ago. Updated 10 days ago.

Status:
ASSIGNED
Priority:
High
Assignee:
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
3. High
Version:
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Sprint:
Sprint 110
Quarter:

Description

pulp version: container image docker.io/pulp/pulp:3.15

remote repository definition:

$ pulp --base-url http://localhost:8080 --username admin --password secret rpm remote show --name "packages-microsoft-com-prod-rhel8"
{
  "pulp_href": "/pulp/api/v3/remotes/rpm/rpm/c455d98d-4c8c-446d-aa4d-dffe568675d6/",
  "pulp_created": "2021-09-14T14:51:03.146140Z",
  "name": "packages-microsoft-com-prod-rhel8",
  "url": "https://packages.microsoft.com/rhel/8/prod/",
  "ca_cert": null,
  "client_cert": null,
  "tls_validation": true,
  "proxy_url": "http://proxy.example.com:8080",
  "pulp_labels": {},
  "pulp_last_updated": "2021-09-14T14:51:03.146171Z",
  "download_concurrency": null,
  "max_retries": null,
  "policy": "immediate",
  "total_timeout": null,
  "connect_timeout": null,
  "sock_connect_timeout": null,
  "sock_read_timeout": null,
  "headers": null,
  "rate_limit": null,
  "sles_auth_token": null
}

error:

$ podman logs --follow pulp
pulp [505c8f64043741a7b8f09eac46fa8331]: pulpcore.tasking.pulpcore_worker:INFO: Task a7a26b28-9251-4da1-82ef-19272b6779f0 failed (invalid memory alloc request size 1073741824)
pulp [505c8f64043741a7b8f09eac46fa8331]: pulpcore.tasking.pulpcore_worker:INFO:   File "/usr/local/lib/python3.8/site-packages/pulpcore/tasking/pulpcore_worker.py", line 323, in _perform_task
    result = func(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/pulp_rpm/app/tasks/synchronizing.py", line 471, in synchronize
    version = dv.create()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/declarative_version.py", line 151, in create
    loop.run_until_complete(pipeline)

  File "/usr/lib64/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/api.py", line 225, in create_pipeline
    await asyncio.gather(*futures)

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/api.py", line 43, in __call__
    await self.run()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/content_stages.py", line 174, in run
    await sync_to_async(process_batch)()

  File "/usr/local/lib/python3.8/site-packages/asgiref/sync.py", line 444, in __call__
    ret = await asyncio.wait_for(future, timeout=None)

  File "/usr/lib64/python3.8/asyncio/tasks.py", line 455, in wait_for
    return await fut

  File "/usr/lib64/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)

  File "/usr/local/lib/python3.8/site-packages/asgiref/sync.py", line 486, in thread_handler
    return func(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/content_stages.py", line 122, in process_batch
    d_content.content.save()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/app/models/base.py", line 149, in save
    return super().save(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/django_lifecycle/mixins.py", line 134, in save
    save(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 726, in save
    self.save_base(using=using, force_insert=force_insert,

  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 763, in save_base
    updated = self._save_table(

  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 868, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)

  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 906, in _do_insert
    return manager._insert(

  File "/usr/local/lib/python3.8/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py", line 1270, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)

  File "/usr/local/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
    cursor.execute(sql, params)

  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)

  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)

  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)

  File "/usr/local/lib/python3.8/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value

  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)

All other repo sync jobs work without problems (e.g. the official RHEL 7/8 repos, EPEL, ...).

Is anyone able to reproduce this MS repo issue?


Related issues

Related to RPM Support - Issue #9406: Trivial OOM on sync for a particular Microsoft repo (ASSIGNED)

History

#1 Updated by keilr 2 months ago

Free memory shouldn't be an issue. The server has 12GB memory, so around 10GB available for sync jobs.

#2 Updated by dalley 2 months ago

I've never seen this error before. It appears to be a Postgresql error thrown when a single allocation request exceeds Postgresql's 1gb limit; the requested size, 1073741824 bytes, is exactly 1gb (2^30).

It looks like the "filelists" metadata has an absolutely insane expansion factor, presumably because the repo has many copies of a few very large packages.

<data type="filelists">
.... snip ....
<size>7943458</size>
<open-size>1199799539</open-size>
</data>

That is, roughly 7.9 MB compressed expands to 1.2 GB decompressed, an expansion factor of about 150x.
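Those figures can be checked with quick arithmetic (sizes taken from the repomd "filelists" entry above; the reading of the allocation limit is my interpretation of the error, not from the source):

```python
# Sizes from the repomd.xml <data type="filelists"> entry quoted above.
compressed = 7_943_458        # <size>, bytes
uncompressed = 1_199_799_539  # <open-size>, bytes

# Expansion factor: ~151x.
print(round(uncompressed / compressed))

# The failing allocation request is exactly 1 GiB (2**30 bytes), i.e. just
# past Postgresql's per-allocation cap of 2**30 - 1 bytes.
print(1073741824 == 2**30)
```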

I'm guessing the problem is that we try to batch inserts for efficiency, with a batch size of around 500 IIRC, so all of the packages would be included in this batch - and the transaction is growing too large for Postgresql to handle.

Normally this would be perfectly fine, but this particular repo is so "dense" in terms of the amount of metadata per package that it is not.
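One way to guard against this (a minimal sketch, not Pulp's actual pipeline code; the `sizeof` callback and the 900 MB budget are assumptions for illustration) is to cap batches by estimated payload size in addition to item count:

```python
def batches(items, sizeof, max_count=500, max_bytes=900 * 1024**2):
    """Yield batches capped by item count AND by estimated total size,
    so a single INSERT never approaches Postgresql's 1 GB limit."""
    batch, total = [], 0
    for item in items:
        size = sizeof(item)
        if batch and (len(batch) >= max_count or total + size > max_bytes):
            yield batch
            batch, total = [], 0
        batch.append(item)
        total += size
    if batch:
        yield batch

# With ~2.4 MB of filelists metadata per package, as in this repo, the
# size budget binds long before the 500-item count cap does.
chunks = list(batches([2_400_000] * 1000, sizeof=lambda s: s))
print([len(c) for c in chunks])
```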

#3 Updated by dalley 2 months ago

250

Additionally, this repo is a little strange for other reasons. A couple packages are listed twice with the same NEVRA (name-epoch-version-release-architecture) but different checksums. You're really not supposed to do that. It's difficult to know which package clients would actually end up installing. It looks like the only difference between the two is the filename itself, "blobfuse-1.4.1-RHEL-8.1-x86_64.rpm" vs "blobfuse-1.4.1-RHEL-8.2-x86_64.rpm" and I'm pretty sure that clients like DNF and YUM can't make that distinction.
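The duplicate-NEVRA situation can be detected mechanically. A minimal sketch (the package list is a hand-built stand-in for entries parsed from primary.xml; the checksums and epoch/release values are invented, only the two blobfuse filenames are from the repo):

```python
from collections import defaultdict

# Stand-in for entries parsed from primary.xml: (NEVRA tuple, checksum, filename).
packages = [
    (("blobfuse", "0", "1.4.1", "1", "x86_64"), "sha256:aaa...", "blobfuse-1.4.1-RHEL-8.1-x86_64.rpm"),
    (("blobfuse", "0", "1.4.1", "1", "x86_64"), "sha256:bbb...", "blobfuse-1.4.1-RHEL-8.2-x86_64.rpm"),
    (("moby-cli", "0", "20.10.8", "3", "x86_64"), "sha256:ccc...", "moby-cli-20.10.8-3.el8.x86_64.rpm"),
]

checksums_by_nevra = defaultdict(set)
for nevra, checksum, _filename in packages:
    checksums_by_nevra[nevra].add(checksum)

# A NEVRA mapping to more than one checksum is the problematic case:
# DNF/YUM resolve packages by NEVRA, so which file a client gets is undefined.
dupes = [nevra for nevra, sums in checksums_by_nevra.items() if len(sums) > 1]
print(dupes)
```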

#4 Updated by dalley 2 months ago

  • Related to Issue #9406: Trivial OOM on sync for a particular Microsoft repo added

#5 Updated by dalley 2 months ago

I actually can't reproduce because syncing this repository causes my VM to run out of memory (it has 9.6gb available).

That is a problem in and of itself: the memory consumption shouldn't be that high, and it isn't for most repos. I just confirmed that it does not happen for RHEL7, which is overall a much larger repo. So something weird is going on there as well; I filed #9406 for that.

#6 Updated by dalley 2 months ago

  • Priority changed from Normal to High
  • Severity changed from 2. Medium to 3. High
  • Triaged changed from No to Yes

#7 Updated by dalley 2 months ago

I reported the issues from note 3, and they have been fixed, but it did not help. Oh well. Investigation continues.

#8 Updated by dalley 2 months ago

  • Sprint set to Sprint 105

#9 Updated by dalley 2 months ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley

#10 Updated by rchan 2 months ago

  • Sprint changed from Sprint 105 to Sprint 106

#12 Updated by TiagodCC about 2 months ago

dalley wrote:

Additionally, this repo is a little strange for other reasons. A couple packages are listed twice with the same NEVRA (name-epoch-version-release-architecture) but different checksums. You're really not supposed to do that. It's difficult to know which package clients would actually end up installing. It looks like the only difference between the two is the filename itself, "blobfuse-1.4.1-RHEL-8.1-x86_64.rpm" vs "blobfuse-1.4.1-RHEL-8.2-x86_64.rpm" and I'm pretty sure that clients like DNF and YUM can't make that distinction.

Hi dalley, we have contacted Microsoft about this issue, but neither Microsoft nor we are able to reproduce the problem. Could you describe the issue in a bit more detail? It would be helpful if you could explain which repo you used and which steps you performed to trigger the errors. As I understand it, the problem is that identical packages appear in multiple repositories with different package IDs and names. In that case, the correct fix would be to publish the identical packages without a specific RHEL minor version such as "RHEL-8.X" in the filename, and instead use the RHEL major version, such as "el8". Thank you in advance for your efforts and your answer.

#13 Updated by dalley about 2 months ago

Hey @TiagodCC,

It was already dealt with, see note 7: https://pulp.plan.io/issues/9399?pn=1#note-7

https://github.com/dotnet/core/issues/6706

It didn't help with this issue. Perhaps there are other fixes that would, but I haven't had that confirmed one way or the other yet.

#14 Updated by rchan about 2 months ago

  • Sprint changed from Sprint 106 to Sprint 107

#15 Updated by rchan about 1 month ago

  • Sprint changed from Sprint 107 to Sprint 108

#16 Updated by rchan 24 days ago

  • Sprint changed from Sprint 108 to Sprint 109

#17 Updated by rchan 10 days ago

  • Sprint changed from Sprint 109 to Sprint 110
