Issue #9399

closed

sync error: invalid memory alloc request size

Added by keilr about 3 years ago. Updated almost 3 years ago.

Status:
CLOSED - NOTABUG
Priority:
High
Assignee:
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
3. High
Version:
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Sprint:
Sprint 111
Quarter:

Description

pulp version: container image docker.io/pulp/pulp:3.15

remote repository definition:

$ pulp --base-url http://localhost:8080 --username admin --password secret rpm remote show --name "packages-microsoft-com-prod-rhel8"
{
  "pulp_href": "/pulp/api/v3/remotes/rpm/rpm/c455d98d-4c8c-446d-aa4d-dffe568675d6/",
  "pulp_created": "2021-09-14T14:51:03.146140Z",
  "name": "packages-microsoft-com-prod-rhel8",
  "url": "https://packages.microsoft.com/rhel/8/prod/",
  "ca_cert": null,
  "client_cert": null,
  "tls_validation": true,
  "proxy_url": "http://proxy.example.com:8080",
  "pulp_labels": {},
  "pulp_last_updated": "2021-09-14T14:51:03.146171Z",
  "download_concurrency": null,
  "max_retries": null,
  "policy": "immediate",
  "total_timeout": null,
  "connect_timeout": null,
  "sock_connect_timeout": null,
  "sock_read_timeout": null,
  "headers": null,
  "rate_limit": null,
  "sles_auth_token": null
}

error:

$ podman logs --follow pulp
pulp [505c8f64043741a7b8f09eac46fa8331]: pulpcore.tasking.pulpcore_worker:INFO: Task a7a26b28-9251-4da1-82ef-19272b6779f0 failed (invalid memory alloc request size 1073741824
)
pulp [505c8f64043741a7b8f09eac46fa8331]: pulpcore.tasking.pulpcore_worker:INFO:   File "/usr/local/lib/python3.8/site-packages/pulpcore/tasking/pulpcore_worker.py", line 323, in _perform_task
    result = func(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/pulp_rpm/app/tasks/synchronizing.py", line 471, in synchronize
    version = dv.create()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/declarative_version.py", line 151, in create
    loop.run_until_complete(pipeline)

  File "/usr/lib64/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/api.py", line 225, in create_pipeline
    await asyncio.gather(*futures)

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/api.py", line 43, in __call__
    await self.run()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/content_stages.py", line 174, in run
    await sync_to_async(process_batch)()

  File "/usr/local/lib/python3.8/site-packages/asgiref/sync.py", line 444, in __call__
    ret = await asyncio.wait_for(future, timeout=None)

  File "/usr/lib64/python3.8/asyncio/tasks.py", line 455, in wait_for
    return await fut

  File "/usr/lib64/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)

  File "/usr/local/lib/python3.8/site-packages/asgiref/sync.py", line 486, in thread_handler
    return func(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/content_stages.py", line 122, in process_batch
    d_content.content.save()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/app/models/base.py", line 149, in save
    return super().save(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/django_lifecycle/mixins.py", line 134, in save
    save(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 726, in save
    self.save_base(using=using, force_insert=force_insert,

  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 763, in save_base
    updated = self._save_table(

  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 868, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)

  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 906, in _do_insert
    return manager._insert(

  File "/usr/local/lib/python3.8/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py", line 1270, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)

  File "/usr/local/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
    cursor.execute(sql, params)

  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)

  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)

  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)

  File "/usr/local/lib/python3.8/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value

  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)

All other repo sync jobs work without problems (e.g. the official RHEL 7/8 repos, EPEL, ...).

Is anyone able to reproduce this MS repo issue?


Related issues

Related to RPM Support - Issue #9406: Trivial OOM on sync for a particular Microsoft repo (CLOSED - NOTABUG, dalley)

#1

Updated by keilr about 3 years ago

Free memory shouldn't be an issue: the server has 12 GB of memory, so around 10 GB is available for sync jobs.

#2

Updated by dalley about 3 years ago

I've never seen this error before. It appears to be a PostgreSQL error that is raised when a single memory allocation request exceeds PostgreSQL's 1 GB limit (MaxAllocSize, 1073741823 bytes); "1073741824" is exactly 2^30 bytes, one byte more than that limit.
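As a quick check of the numbers: PostgreSQL defines `MaxAllocSize` in `src/include/utils/memutils.h` as `0x3fffffff` (1 GiB minus one byte), and allocation requests above it fail with "invalid memory alloc request size". The constant mirrors that definition; the `exceeds_alloc_limit` helper is hypothetical and only illustrates the comparison.

```python
# PostgreSQL rejects any single allocation request larger than MaxAllocSize
# (0x3fffffff bytes, i.e. 1 GiB - 1) with "invalid memory alloc request size".
MAX_ALLOC_SIZE = 0x3FFFFFFF


def exceeds_alloc_limit(request_size: int) -> bool:
    """Hypothetical helper: True if PostgreSQL would refuse this allocation."""
    return request_size > MAX_ALLOC_SIZE


requested = 1073741824  # size reported in the failed task's log line
print(requested == 2**30)              # exactly 1 GiB
print(requested - MAX_ALLOC_SIZE)      # bytes over the limit
print(exceeds_alloc_limit(requested))
```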

It looks like the "filelists" metadata has an absolutely insane expansion factor, presumably because the repo has many copies of a few very large packages.

<data type="filelists">
.... snip ....
<size>7943458</size>
<open-size>1199799539</open-size>
</data>

That is, roughly 7.6 MiB compressed expands to roughly 1.1 GiB (1.2 GB) decompressed.
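The expansion factor can be read straight off the two fields in that `<data>` entry. This is a minimal sketch using the standard library XML parser on the quoted snippet; the `expansion_factor` helper is illustrative, and a real `repomd.xml` would additionally put these elements in its XML namespace.

```python
import xml.etree.ElementTree as ET

# The <data type="filelists"> entry quoted above, with the snipped children
# left out. In a real repomd.xml this element is namespaced
# (http://linux.duke.edu/metadata/repo); the quoted snippet is not.
SNIPPET = """
<data type="filelists">
  <size>7943458</size>
  <open-size>1199799539</open-size>
</data>
"""


def expansion_factor(data_xml: str) -> float:
    """Return open-size (decompressed bytes) divided by size (compressed bytes)."""
    data = ET.fromstring(data_xml.strip())
    size = int(data.findtext("size"))
    open_size = int(data.findtext("open-size"))
    return open_size / size


ratio = expansion_factor(SNIPPET)
print(f"filelists expansion factor: ~{ratio:.0f}x")
```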

I'm guessing the problem is that we batch inserts for efficiency, with a batch size of around 500 IIRC, so all of the packages would be included in a single batch, and the transaction is growing too large for PostgreSQL to handle.

Normally this would be perfectly fine, but this particular repo is so "dense" in terms of metadata per package that it is not.
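To illustrate the hypothesis above, here is a minimal sketch of a purely count-based batcher and how a few "dense" packages can push one batch past PostgreSQL's 1 GB allocation ceiling. It is not pulpcore's batching code: the batch size of 500 is the "around 500 IIRC" estimate, and the per-package metadata sizes are made up for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List

BATCH_SIZE = 500             # assumption: the "around 500 IIRC" estimate, not a verified setting
MAX_ALLOC_SIZE = 0x3FFFFFFF  # PostgreSQL's single-allocation ceiling (1 GiB - 1 byte)


def batches(items: Iterable[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive lists of at most `size` items (count-based batching)."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical per-package metadata sizes in bytes: 490 ordinary packages
# (50 KiB each) plus 10 very "dense" ones (120 MiB each).
sizes = [50 * 1024] * 490 + [120 * 2**20] * 10

for n, batch in enumerate(batches(sizes)):
    total = sum(batch)
    verdict = "exceeds limit" if total > MAX_ALLOC_SIZE else "ok"
    print(f"batch {n}: {len(batch)} packages, {total:,} bytes -> {verdict}")
```

Note that the traceback in the description fails inside a single `d_content.content.save()` call, so capping the byte total per batch would not help if one package's metadata alone is over the limit.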

#3

Updated by dalley about 3 years ago

Additionally, this repo is a little strange for other reasons. A couple packages are listed twice with the same NEVRA (name-epoch-version-release-architecture) but different checksums. You're really not supposed to do that. It's difficult to know which package clients would actually end up installing. It looks like the only difference between the two is the filename itself, "blobfuse-1.4.1-RHEL-8.1-x86_64.rpm" vs "blobfuse-1.4.1-RHEL-8.2-x86_64.rpm" and I'm pretty sure that clients like DNF and YUM can't make that distinction.
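One way to spot this kind of conflict in parsed package metadata is to group packages by NEVRA and flag any group with more than one checksum. This is a minimal sketch: the `Package` record and `conflicting_nevras` helper are hypothetical (not Pulp or createrepo APIs), and the epoch, release, and checksum values are placeholders, not the repository's real metadata.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

Nevra = Tuple[str, str, str, str, str]


class Package(NamedTuple):
    name: str
    epoch: str
    version: str
    release: str
    arch: str
    checksum: str
    location: str


def conflicting_nevras(packages: List[Package]) -> Dict[Nevra, List[Package]]:
    """Group packages by NEVRA and keep only groups whose checksums differ."""
    groups: Dict[Nevra, List[Package]] = defaultdict(list)
    for pkg in packages:
        nevra = (pkg.name, pkg.epoch, pkg.version, pkg.release, pkg.arch)
        groups[nevra].append(pkg)
    return {
        nevra: pkgs
        for nevra, pkgs in groups.items()
        if len({p.checksum for p in pkgs}) > 1
    }


# Hypothetical records modeled on the blobfuse case (placeholder values).
pkgs = [
    Package("blobfuse", "0", "1.4.1", "1", "x86_64", "checksum-a",
            "blobfuse-1.4.1-RHEL-8.1-x86_64.rpm"),
    Package("blobfuse", "0", "1.4.1", "1", "x86_64", "checksum-b",
            "blobfuse-1.4.1-RHEL-8.2-x86_64.rpm"),
]
for nevra, dupes in conflicting_nevras(pkgs).items():
    print("-".join(nevra), [p.location for p in dupes])
```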

#4

Updated by dalley about 3 years ago

  • Related to Issue #9406: Trivial OOM on sync for a particular Microsoft repo added

#5

Updated by dalley about 3 years ago

I actually can't reproduce it, because syncing this repository causes my VM (9.6 GB of memory available) to run out of memory.

That is a problem in and of itself: memory consumption shouldn't be that high, and it isn't for most repos. I just confirmed that it doesn't happen with RHEL 7, which is a much larger repo overall. So something weird is going on there as well; I filed #9406 for that.

#6

Updated by dalley about 3 years ago

  • Priority changed from Normal to High
  • Severity changed from 2. Medium to 3. High
  • Triaged changed from No to Yes

#7

Updated by dalley about 3 years ago

I reported the issues from note #3 upstream, and they have been fixed, but it did not help. Oh well. Investigation continues.

#8

Updated by dalley about 3 years ago

  • Sprint set to Sprint 105

#9

Updated by dalley about 3 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dalley

#10

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 105 to Sprint 106

#12

Updated by TiagodCC about 3 years ago

dalley wrote:

Additionally, this repo is a little strange for other reasons. A couple packages are listed twice with the same NEVRA (name-epoch-version-release-architecture) but different checksums. You're really not supposed to do that. It's difficult to know which package clients would actually end up installing. It looks like the only difference between the two is the filename itself, "blobfuse-1.4.1-RHEL-8.1-x86_64.rpm" vs "blobfuse-1.4.1-RHEL-8.2-x86_64.rpm" and I'm pretty sure that clients like DNF and YUM can't make that distinction.

Hi dalley, we have contacted Microsoft to resolve this issue. Neither Microsoft nor we are able to reproduce the problem, so I wanted to ask you to describe the issue in a bit more detail. It would be helpful if you could explain which repo you used and which steps you performed to get the errors. As I understand it, the problem is that there are identical packages in multiple repositories with different package IDs and names. Therefore, it would be correct to publish the identical packages without a RHEL minor version like "RHEL-8.X" in their IDs, and to use the RHEL major version, such as "el8", instead. Thank you in advance for your efforts and answer.

#13

Updated by dalley about 3 years ago

Hey @tiagodOC,

It was already dealt with, see note 7: https://pulp.plan.io/issues/9399?pn=1#note-7

https://github.com/dotnet/core/issues/6706

It didn't help with this issue. Perhaps other changes would, but I haven't confirmed that one way or the other yet.

#14

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 106 to Sprint 107

#15

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 107 to Sprint 108

#16

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 108 to Sprint 109

#17

Updated by rchan about 3 years ago

  • Sprint changed from Sprint 109 to Sprint 110

#18

Updated by rchan almost 3 years ago

  • Sprint changed from Sprint 110 to Sprint 111

#19

Updated by dalley almost 3 years ago

It appears that this particular Microsoft repo contains a package with 13 million files associated with it, or rather the same set of files repeated so that nearly 13 million duplicate file entries are listed. I've filed it upstream:

https://github.com/dotnet/core/issues/6706#issuecomment-986330681
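To check a filelists.xml for this pattern, one can stream-parse it and compare the total number of `<file>` entries with the number of unique paths per package. This sketch assumes the standard createrepo `filelists.xml` layout (`<package name=...>` elements with `<file>` children in the `http://linux.duke.edu/metadata/filelists` namespace); `file_counts` and the sample document are illustrative, not taken from the Microsoft repo.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple

NS = "{http://linux.duke.edu/metadata/filelists}"


def file_counts(stream: BinaryIO) -> Iterator[Tuple[str, int, int]]:
    """Yield (package name, total <file> entries, unique paths) per package.

    Uses iterparse and clears each <package> subtree once it has been
    counted, so the whole document is never held in memory at once
    (a single huge package is still loaded in full).
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == NS + "package":
            paths = [f.text for f in elem.iter(NS + "file")]
            yield elem.get("name"), len(paths), len(set(paths))
            elem.clear()


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="1">
  <package pkgid="0000" name="example" arch="x86_64">
    <version epoch="0" ver="1.0" rel="1"/>
    <file>/usr/bin/example</file>
    <file>/usr/bin/example</file>
    <file>/usr/share/doc/example/README</file>
  </package>
</filelists>
"""

for name, total, unique in file_counts(io.BytesIO(SAMPLE)):
    print(f"{name}: {total} file entries, {unique} unique, {total - unique} duplicates")
```

For a gzip-compressed `filelists.xml.gz`, `gzip.open(path, "rb")` can be passed in place of the `BytesIO` stream.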

#20

Updated by dalley almost 3 years ago

  • Status changed from ASSIGNED to CLOSED - NOTABUG

Since the problem isn't in Pulp, and there's really nothing we can do about PostgreSQL's allocation size limit, I'm going to close this issue.
