Issue #9399: sync error: invalid memory alloc request size - RPM Support - Pulp

Issue #9399

closed

sync error: invalid memory alloc request size

Added by keilr over 3 years ago. Updated about 3 years ago.

Status: CLOSED - NOTABUG
Priority: High
Assignee: dalley
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 3. High
Version:
Platform Release:
OS:
Triaged: Yes
Groomed:
Sprint Candidate:
Tags:
Sprint: Sprint 111
Quarter:

Description

pulp version: container image docker.io/pulp/pulp:3.15

remote repository definition:

$ pulp --base-url http://localhost:8080 --username admin --password secret rpm remote show --name "packages-microsoft-com-prod-rhel8"
{
  "pulp_href": "/pulp/api/v3/remotes/rpm/rpm/c455d98d-4c8c-446d-aa4d-dffe568675d6/",
  "pulp_created": "2021-09-14T14:51:03.146140Z",
  "name": "packages-microsoft-com-prod-rhel8",
  "url": "https://packages.microsoft.com/rhel/8/prod/",
  "ca_cert": null,
  "client_cert": null,
  "tls_validation": true,
  "proxy_url": "http://proxy.example.com:8080",
  "pulp_labels": {},
  "pulp_last_updated": "2021-09-14T14:51:03.146171Z",
  "download_concurrency": null,
  "max_retries": null,
  "policy": "immediate",
  "total_timeout": null,
  "connect_timeout": null,
  "sock_connect_timeout": null,
  "sock_read_timeout": null,
  "headers": null,
  "rate_limit": null,
  "sles_auth_token": null
}

error:

$ podman logs --follow pulp
pulp [505c8f64043741a7b8f09eac46fa8331]: pulpcore.tasking.pulpcore_worker:INFO: Task a7a26b28-9251-4da1-82ef-19272b6779f0 failed (invalid memory alloc request size 1073741824)
pulp [505c8f64043741a7b8f09eac46fa8331]: pulpcore.tasking.pulpcore_worker:INFO:   File "/usr/local/lib/python3.8/site-packages/pulpcore/tasking/pulpcore_worker.py", line 323, in _perform_task
    result = func(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/pulp_rpm/app/tasks/synchronizing.py", line 471, in synchronize
    version = dv.create()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/declarative_version.py", line 151, in create
    loop.run_until_complete(pipeline)

  File "/usr/lib64/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/api.py", line 225, in create_pipeline
    await asyncio.gather(*futures)

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/api.py", line 43, in __call__
    await self.run()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/content_stages.py", line 174, in run
    await sync_to_async(process_batch)()

  File "/usr/local/lib/python3.8/site-packages/asgiref/sync.py", line 444, in __call__
    ret = await asyncio.wait_for(future, timeout=None)

  File "/usr/lib64/python3.8/asyncio/tasks.py", line 455, in wait_for
    return await fut

  File "/usr/lib64/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)

  File "/usr/local/lib/python3.8/site-packages/asgiref/sync.py", line 486, in thread_handler
    return func(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/content_stages.py", line 122, in process_batch
    d_content.content.save()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/app/models/base.py", line 149, in save
    return super().save(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/django_lifecycle/mixins.py", line 134, in save
    save(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 726, in save
    self.save_base(using=using, force_insert=force_insert,

  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 763, in save_base
    updated = self._save_table(

  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 868, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)

  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 906, in _do_insert
    return manager._insert(

  File "/usr/local/lib/python3.8/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py", line 1270, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)

  File "/usr/local/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
    cursor.execute(sql, params)

  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)

  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)

  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)

  File "/usr/local/lib/python3.8/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value

  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)

All other repo sync jobs (e.g. the official RHEL 7 and RHEL 8 repos, EPEL, ...) work without problems.

Is anyone able to reproduce this MS repo issue?

Files

Screenshot from 2021-09-15 22-19-03.png (62.7 KB), dalley, 09/16/2021 04:20 AM

Updated by keilr over 3 years ago

Free memory shouldn't be an issue. The server has 12GB memory, so around 10GB available for sync jobs.

Updated by dalley over 3 years ago

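I've never seen this error before. It seems to be a PostgreSQL error that is thrown when a query needs a single memory allocation larger than PostgreSQL's roughly 1 GB limit. The requested size, 1073741824 bytes, is exactly 1 GiB, which is one byte more than the largest allocation PostgreSQL allows (1 GiB minus one byte).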
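As a quick check on the number in the error message, the arithmetic can be written out as below. This is only a sketch: `MAX_ALLOC_SIZE` mirrors PostgreSQL's `MaxAllocSize` constant (0x3fffffff), and the variable names are made up for illustration.

```python
# Size reported in the failed task's error message, in bytes.
requested = 1073741824

# PostgreSQL's MaxAllocSize: the largest ordinary single allocation,
# 0x3fffffff bytes, i.e. 1 GiB minus one byte.
MAX_ALLOC_SIZE = 0x3FFFFFFF

GIB = 2 ** 30

print(requested == GIB)            # True: the request is exactly 1 GiB
print(requested - MAX_ALLOC_SIZE)  # 1: one byte over the limit
```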

It looks like the "filelists" metadata has an absolutely insane expansion factor, presumably because the repo has many copies of a few very large packages.

<data type="filelists">
.... snip ....
<size>7943458</size>
<open-size>1199799539</open-size>
</data>

That is, about 7.6 MiB compressed expands to about 1.2 GB decompressed, an expansion factor of roughly 150.
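To check other repos for the same kind of expansion, the ratio can be read straight out of repomd.xml. The following is a minimal sketch, not something Pulp ships: `expansion_factors` is a hypothetical helper, and `SAMPLE` is a trimmed stand-in that contains only the two sizes quoted above.

```python
import xml.etree.ElementTree as ET

# Namespace used by repomd.xml elements.
REPO_NS = "{http://linux.duke.edu/metadata/repo}"


def expansion_factors(repomd_xml: str) -> dict:
    """Map each metadata type to open-size / size (decompressed / compressed)."""
    root = ET.fromstring(repomd_xml)
    factors = {}
    for data in root.iter(f"{REPO_NS}data"):
        size = data.findtext(f"{REPO_NS}size")
        open_size = data.findtext(f"{REPO_NS}open-size")
        if size and open_size:
            factors[data.get("type")] = int(open_size) / int(size)
    return factors


# Trimmed sample: only the fields needed for the calculation.
SAMPLE = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="filelists">
    <size>7943458</size>
    <open-size>1199799539</open-size>
  </data>
</repomd>"""

factors = expansion_factors(SAMPLE)
print(round(factors["filelists"]))  # 151
```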

I'm guessing the problem is that we try to batch inserts for efficiency, with a batch size of around 500 IIRC, so all of the packages would be included in this batch, and the transaction is growing too large for PostgreSQL to handle.

Normally this would be perfectly fine, but this particular repo is so "dense" in terms of the amount of metadata per package that it is not.
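For illustration only, one way to keep a count-based batch from growing too large is to also cap it by an estimated byte budget. This is a generic sketch and not Pulp's actual batching code: `batch_by_budget`, `size_of`, and both limits are hypothetical. Note that a single oversized item still ends up in a batch of its own, so this would not help if one package alone is too big.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_by_budget(
    items: Iterable[T],
    size_of: Callable[[T], int],
    max_items: int = 500,
    max_bytes: int = 64 * 1024 * 1024,
) -> Iterator[List[T]]:
    """Yield batches limited by item count and by estimated total size."""
    batch: List[T] = []
    batch_bytes = 0
    for item in items:
        item_bytes = size_of(item)
        # Close the current batch if adding this item would break a limit.
        if batch and (len(batch) >= max_items or batch_bytes + item_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(item)
        batch_bytes += item_bytes
    if batch:
        yield batch


# Toy sizes standing in for per-package metadata sizes.
batches = list(batch_by_budget([10, 10, 100, 10], size_of=lambda n: n,
                               max_items=3, max_bytes=50))
print(batches)  # [[10, 10], [100], [10]]
```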

Updated by dalley over 3 years ago

File Screenshot from 2021-09-15 22-19-03.png added

Additionally, this repo is a little strange for other reasons. A couple packages are listed twice with the same NEVRA (name-epoch-version-release-architecture) but different checksums. You're really not supposed to do that. It's difficult to know which package clients would actually end up installing. It looks like the only difference between the two is the filename itself, "blobfuse-1.4.1-RHEL-8.1-x86_64.rpm" vs "blobfuse-1.4.1-RHEL-8.2-x86_64.rpm" and I'm pretty sure that clients like DNF and YUM can't make that distinction.
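A check for this kind of duplicate can be sketched against a repo's primary.xml. The helper name `conflicting_nevras` is made up, and the sample below is invented for illustration: only the two blobfuse filenames come from this issue, while the release value and the checksums are placeholders.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"c": "http://linux.duke.edu/metadata/common"}


def conflicting_nevras(primary_xml: str) -> dict:
    """Return NEVRAs that appear with more than one distinct checksum."""
    root = ET.fromstring(primary_xml)
    seen = defaultdict(set)
    for pkg in root.findall("c:package", NS):
        ver = pkg.find("c:version", NS)
        nevra = (
            pkg.findtext("c:name", namespaces=NS),
            ver.get("epoch"),
            ver.get("ver"),
            ver.get("rel"),
            pkg.findtext("c:arch", namespaces=NS),
        )
        checksum = pkg.findtext("c:checksum", namespaces=NS)
        href = pkg.find("c:location", NS).get("href")
        seen[nevra].add((checksum, href))
    return {
        nevra: sorted(entries)
        for nevra, entries in seen.items()
        if len({checksum for checksum, _ in entries}) > 1
    }


# Invented sample; checksums and the release value are placeholders.
SAMPLE = """<metadata xmlns="http://linux.duke.edu/metadata/common" packages="3">
  <package type="rpm">
    <name>blobfuse</name><arch>x86_64</arch>
    <version epoch="0" ver="1.4.1" rel="1"/>
    <checksum type="sha256" pkgid="YES">placeholder-a</checksum>
    <location href="blobfuse-1.4.1-RHEL-8.1-x86_64.rpm"/>
  </package>
  <package type="rpm">
    <name>blobfuse</name><arch>x86_64</arch>
    <version epoch="0" ver="1.4.1" rel="1"/>
    <checksum type="sha256" pkgid="YES">placeholder-b</checksum>
    <location href="blobfuse-1.4.1-RHEL-8.2-x86_64.rpm"/>
  </package>
  <package type="rpm">
    <name>other</name><arch>x86_64</arch>
    <version epoch="0" ver="2.0" rel="1"/>
    <checksum type="sha256" pkgid="YES">placeholder-c</checksum>
    <location href="other-2.0-1.x86_64.rpm"/>
  </package>
</metadata>"""

conflicts = conflicting_nevras(SAMPLE)
for nevra, entries in conflicts.items():
    print(nevra, [href for _, href in entries])
```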

Updated by dalley over 3 years ago

Related to Issue #9406: Trivial OOM on sync for a particular Microsoft repo added

Updated by dalley over 3 years ago

I actually can't reproduce this, because syncing this repository causes my VM to run out of memory (it has 9.6 GB available).

That is a problem in and of itself. The memory consumption shouldn't be that high, and it isn't for most repos: I just confirmed that it does not happen for RHEL7, which is overall a much larger repo. So something weird is going on there as well. I filed #9406 for that.

Updated by dalley over 3 years ago

Priority changed from Normal to High
Severity changed from 2. Medium to 3. High
Triaged changed from No to Yes

Updated by dalley over 3 years ago

I reported the issues from note #3, and they have been fixed, but it did not help. Oh well. Investigation continues.

Updated by dalley over 3 years ago

Sprint set to Sprint 105

Updated by dalley over 3 years ago

Status changed from NEW to ASSIGNED
Assignee set to dalley

#10

Updated by rchan over 3 years ago

Sprint changed from Sprint 105 to Sprint 106

#12

Updated by TiagodCC over 3 years ago

dalley wrote:

Additionally, this repo is a little strange for other reasons. A couple packages are listed twice with the same NEVRA (name-epoch-version-release-architecture) but different checksums. You're really not supposed to do that. It's difficult to know which package clients would actually end up installing. It looks like the only difference between the two is the filename itself, "blobfuse-1.4.1-RHEL-8.1-x86_64.rpm" vs "blobfuse-1.4.1-RHEL-8.2-x86_64.rpm" and I'm pretty sure that clients like DNF and YUM can't make that distinction.

Hi dalley,

We have contacted Microsoft to resolve this issue. Neither Microsoft nor we are able to reproduce the problem, so I wanted to ask you to describe the issue in a bit more detail. It would be helpful if you could explain which repo you used and which steps you performed to get the errors.

As I understand it, the problem is that identical packages exist in multiple repositories under different package IDs and names. It would therefore be correct to publish the identical packages without mentioning a RHEL minor version like "RHEL-8.X" and to use the RHEL major version, such as "el8", instead.

Thank you in advance for your efforts and answer.

#13

Updated by dalley over 3 years ago

Hey @TiagodCC,

It was already dealt with, see note 7: https://pulp.plan.io/issues/9399?pn=1#note-7

https://github.com/dotnet/core/issues/6706

It didn't help with this issue. Perhaps there are other problems that do matter here; I haven't had that confirmed one way or the other yet.

#14

Updated by rchan over 3 years ago

Sprint changed from Sprint 106 to Sprint 107

#15

Updated by rchan over 3 years ago

Sprint changed from Sprint 107 to Sprint 108

#16

Updated by rchan about 3 years ago

Sprint changed from Sprint 108 to Sprint 109

#17

Updated by rchan about 3 years ago

Sprint changed from Sprint 109 to Sprint 110

#18

Updated by rchan about 3 years ago

Sprint changed from Sprint 110 to Sprint 111

#19

Updated by dalley about 3 years ago

It appears that this particular Microsoft repo contains a package with 13 million files associated with it. Or rather, the same set of files is repeated, so that there are nearly 13 million duplicate file entries listed. I've filed it upstream:

https://github.com/dotnet/core/issues/6706#issuecomment-986330681
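Duplicate entries like these can be spotted by streaming a decompressed filelists.xml and comparing the total number of file entries per package with the number of distinct paths. This is a rough sketch under assumptions: `file_entry_stats` is a hypothetical helper and the sample package is invented.

```python
import io
import xml.etree.ElementTree as ET

FL_NS = "{http://linux.duke.edu/metadata/filelists}"
FILE_TAG = f"{FL_NS}file"
PACKAGE_TAG = f"{FL_NS}package"


def file_entry_stats(stream):
    """Yield (package name, total file entries, distinct paths) per package."""
    total = 0
    distinct = set()
    # iterparse reads incrementally, so the whole document is never held at once.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == FILE_TAG:
            total += 1
            distinct.add(elem.text)
            elem.clear()  # drop the text once it has been counted
        elif elem.tag == PACKAGE_TAG:
            yield elem.get("name"), total, len(distinct)
            total, distinct = 0, set()
            elem.clear()  # release the finished package's children


# Invented sample package with one repeated path.
SAMPLE = b"""<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="1">
  <package pkgid="placeholder" name="demo" arch="x86_64">
    <version epoch="0" ver="1.0" rel="1"/>
    <file>/usr/bin/demo</file>
    <file>/usr/bin/demo</file>
    <file>/usr/share/doc/demo/README</file>
  </package>
</filelists>"""

stats = list(file_entry_stats(io.BytesIO(SAMPLE)))
print(stats)  # [('demo', 3, 2)]
```

A package whose total is far larger than its distinct count is a candidate for the kind of repetition described above.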

#20

Updated by dalley about 3 years ago

Status changed from ASSIGNED to CLOSED - NOTABUG

Since the problem isn't in Pulp, and there's really nothing we can do about this PostgreSQL insert size limit, I'm going to close this issue.
