Issue #9399
closed
sync error: invalid memory alloc request size
Description
pulp version: container image docker.io/pulp/pulp:3.15
remote repository definition:
$ pulp --base-url http://localhost:8080 --username admin --password secret rpm remote show --name "packages-microsoft-com-prod-rhel8"
{
"pulp_href": "/pulp/api/v3/remotes/rpm/rpm/c455d98d-4c8c-446d-aa4d-dffe568675d6/",
"pulp_created": "2021-09-14T14:51:03.146140Z",
"name": "packages-microsoft-com-prod-rhel8",
"url": "https://packages.microsoft.com/rhel/8/prod/",
"ca_cert": null,
"client_cert": null,
"tls_validation": true,
"proxy_url": "http://proxy.example.com:8080",
"pulp_labels": {},
"pulp_last_updated": "2021-09-14T14:51:03.146171Z",
"download_concurrency": null,
"max_retries": null,
"policy": "immediate",
"total_timeout": null,
"connect_timeout": null,
"sock_connect_timeout": null,
"sock_read_timeout": null,
"headers": null,
"rate_limit": null,
"sles_auth_token": null
}
error:
$ podman logs --follow pulp
pulp [505c8f64043741a7b8f09eac46fa8331]: pulpcore.tasking.pulpcore_worker:INFO: Task a7a26b28-9251-4da1-82ef-19272b6779f0 failed (invalid memory alloc request size 1073741824
)
pulp [505c8f64043741a7b8f09eac46fa8331]: pulpcore.tasking.pulpcore_worker:INFO: File "/usr/local/lib/python3.8/site-packages/pulpcore/tasking/pulpcore_worker.py", line 323, in _perform_task
result = func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/pulp_rpm/app/tasks/synchronizing.py", line 471, in synchronize
version = dv.create()
File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/declarative_version.py", line 151, in create
loop.run_until_complete(pipeline)
File "/usr/lib64/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/api.py", line 225, in create_pipeline
await asyncio.gather(*futures)
File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/api.py", line 43, in __call__
await self.run()
File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/content_stages.py", line 174, in run
await sync_to_async(process_batch)()
File "/usr/local/lib/python3.8/site-packages/asgiref/sync.py", line 444, in __call__
ret = await asyncio.wait_for(future, timeout=None)
File "/usr/lib64/python3.8/asyncio/tasks.py", line 455, in wait_for
return await fut
File "/usr/lib64/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/site-packages/asgiref/sync.py", line 486, in thread_handler
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/content_stages.py", line 122, in process_batch
d_content.content.save()
File "/usr/local/lib/python3.8/site-packages/pulpcore/app/models/base.py", line 149, in save
return super().save(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/django_lifecycle/mixins.py", line 134, in save
save(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 726, in save
self.save_base(using=using, force_insert=force_insert,
File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 763, in save_base
updated = self._save_table(
File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 868, in _save_table
results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 906, in _do_insert
return manager._insert(
File "/usr/local/lib/python3.8/site-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py", line 1270, in _insert
return query.get_compiler(using=using).execute_sql(returning_fields)
File "/usr/local/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
cursor.execute(sql, params)
File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.8/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
All other repo sync jobs work without problems. (e.g. official rhel(7|8) repos, epel, ...)
Is anyone able to reproduce this MS repo issue?
Updated by keilr over 3 years ago
Free memory shouldn't be an issue. The server has 12GB memory, so around 10GB available for sync jobs.
Updated by dalley over 3 years ago
I've never seen this error before. It seems to be a PostgreSQL error that is thrown when a query needs more memory than PostgreSQL's roughly 1 GB allocation limit allows; "1073741824" bytes is exactly 1 GiB (2^30).
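The arithmetic can be checked directly. This is a minimal sketch that assumes the limit being hit is PostgreSQL's MaxAllocSize constant (0x3fffffff bytes, one byte short of 1 GiB); the variable names are made up for illustration.

```python
# Assumption: the limit in question is PostgreSQL's MaxAllocSize (0x3fffffff bytes).
MAX_ALLOC_SIZE = 0x3FFFFFFF

# Size reported in the task failure message above.
requested = 1073741824

ONE_GIB = 2 ** 30

# How far the request is over the assumed limit, in bytes.
overshoot = requested - MAX_ALLOC_SIZE

print("request is exactly 1 GiB:", requested == ONE_GIB)
print("bytes over the assumed limit:", overshoot)
```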
It looks like the "filelists" metadata has an absolutely insane expansion factor, presumably because the repo has many copies of a few very large packages.
<data type="filelists">
.... snip ....
<size>7943458</size>
<open-size>1199799539</open-size>
</data>
That is, 7.6 megabytes compressed expands to 1.2gb decompressed.
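For reference, the expansion factor can be computed from the two size fields. The sketch below uses a trimmed-down copy of the snippet above (only the size fields, no XML namespace, which a real repomd.xml would have); the helper name `expansion_factor` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <data> element quoted above; the snipped fields are omitted.
SAMPLE = """
<data type="filelists">
  <size>7943458</size>
  <open-size>1199799539</open-size>
</data>
""".strip()


def expansion_factor(data_xml: str) -> float:
    """Return decompressed size divided by compressed size."""
    elem = ET.fromstring(data_xml)
    size = int(elem.findtext("size"))
    open_size = int(elem.findtext("open-size"))
    return open_size / size


ratio = expansion_factor(SAMPLE)
print(f"filelists expands roughly {ratio:.0f}x when decompressed")
```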
I'm guessing the problem is that we try to batch inserts for efficiency, with a batch size of around 500 IIRC, so all of the packages would be included in one batch, and the transaction grows too large for PostgreSQL to handle.
Normally this would be perfectly fine, but this particular repo is so "dense" in terms of the amount of metadata per package that it is not.
Updated by dalley over 3 years ago
Additionally, this repo is a little strange for other reasons. A couple packages are listed twice with the same NEVRA (name-epoch-version-release-architecture) but different checksums. You're really not supposed to do that. It's difficult to know which package clients would actually end up installing. It looks like the only difference between the two is the filename itself, "blobfuse-1.4.1-RHEL-8.1-x86_64.rpm" vs "blobfuse-1.4.1-RHEL-8.2-x86_64.rpm" and I'm pretty sure that clients like DNF and YUM can't make that distinction.
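A check for this kind of duplication might look like the sketch below. It is not Pulp's code: the `Pkg` type and `duplicate_nevras` helper are hypothetical, and the epoch, release, and checksum values in the sample data are placeholders, since the real values are not given here.

```python
from collections import defaultdict
from typing import NamedTuple


class Pkg(NamedTuple):
    name: str
    epoch: str
    version: str
    release: str
    arch: str
    checksum: str
    location: str


def duplicate_nevras(pkgs):
    """Map each NEVRA that appears with more than one checksum to those checksums."""
    by_nevra = defaultdict(set)
    for p in pkgs:
        by_nevra[p[:5]].add(p.checksum)  # first five fields form the NEVRA
    return {nevra: sums for nevra, sums in by_nevra.items() if len(sums) > 1}


# Placeholder epoch/release/checksum values; only the filenames come from the repo.
pkgs = [
    Pkg("blobfuse", "0", "1.4.1", "1", "x86_64", "aaa", "blobfuse-1.4.1-RHEL-8.1-x86_64.rpm"),
    Pkg("blobfuse", "0", "1.4.1", "1", "x86_64", "bbb", "blobfuse-1.4.1-RHEL-8.2-x86_64.rpm"),
]

dups = duplicate_nevras(pkgs)
for nevra, sums in dups.items():
    print(nevra, "has", len(sums), "different checksums")
```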
Updated by dalley over 3 years ago
- Related to Issue #9406: Trivial OOM on sync for a particular Microsoft repo added
Updated by dalley over 3 years ago
I actually can't reproduce because syncing this repository causes my VM to run out of memory (it has 9.6gb available).
That is a problem in and of itself: the memory consumption shouldn't be that high, and it isn't for most repos. I just confirmed that it does not happen for RHEL7, which is overall a much larger repo. So something weird is going on there as well; I filed #9406 for that.
Updated by dalley over 3 years ago
- Priority changed from Normal to High
- Severity changed from 2. Medium to 3. High
- Triaged changed from No to Yes
Updated by dalley over 3 years ago
I reported the issues from note #3; they have been fixed, but that did not help. Oh well. Investigation continues.
Updated by dalley over 3 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to dalley
Updated by TiagodCC over 3 years ago
dalley wrote:
Additionally, this repo is a little strange for other reasons. A couple packages are listed twice with the same NEVRA (name-epoch-version-release-architecture) but different checksums. You're really not supposed to do that. It's difficult to know which package clients would actually end up installing. It looks like the only difference between the two is the filename itself, "blobfuse-1.4.1-RHEL-8.1-x86_64.rpm" vs "blobfuse-1.4.1-RHEL-8.2-x86_64.rpm" and I'm pretty sure that clients like DNF and YUM can't make that distinction.
Hi dalley, we have contacted Microsoft to resolve this issue. Neither Microsoft nor we are able to reproduce the problem, so I wanted to ask you to describe the issue in a bit more detail. It would be helpful if you could explain which repo you used and which steps you performed to get the errors. As I understand it, the problem is that there are identical packages in multiple repositories with different package IDs and names. Therefore, it would be correct if the identical packages with their IDs were published without mentioning a RHEL version like "RHEL-8.X"; instead they should use the RHEL major version, such as "el8". Thank you in advance for your efforts and your answer.
Updated by dalley over 3 years ago
Hey @TiagodCC,
It was already dealt with, see note 7: https://pulp.plan.io/issues/9399?pn=1#note-7
https://github.com/dotnet/core/issues/6706
Fixing it didn't help with this issue. Perhaps other problems in the repo do contribute, but I haven't had that confirmed one way or the other yet.
Updated by rchan about 3 years ago
- Sprint changed from Sprint 108 to Sprint 109
Updated by rchan about 3 years ago
- Sprint changed from Sprint 109 to Sprint 110
Updated by rchan about 3 years ago
- Sprint changed from Sprint 110 to Sprint 111
Updated by dalley about 3 years ago
It appears that this particular Microsoft repo contains a package with 13 million files associated with it, or rather the same set of files repeated such that nearly 13 million duplicate files are listed. I've filed it upstream:
https://github.com/dotnet/core/issues/6706#issuecomment-986330681
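Repetition like this can be measured by counting how often each path occurs in a package's file list. The following is only a sketch with made-up sample paths; the helper name `duplicate_file_stats` is hypothetical, and in practice the paths would come from the repo's filelists metadata.

```python
from collections import Counter


def duplicate_file_stats(paths):
    """Return (total entries, unique paths, redundant entries) for one file list."""
    counts = Counter(paths)
    total = sum(counts.values())
    unique = len(counts)
    return total, unique, total - unique


# Made-up sample: the same two paths listed three times over.
sample_paths = ["/usr/bin/a", "/usr/bin/b"] * 3

total, unique, redundant = duplicate_file_stats(sample_paths)
print(f"{total} entries, {unique} unique, {redundant} redundant")
```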
Updated by dalley about 3 years ago
- Status changed from ASSIGNED to CLOSED - NOTABUG
Since the problem isn't in Pulp, and there's really nothing we can do about this PostgreSQL insert size limit, I'm going to close this issue.