Issue #8990

closed

Memory consumption issues during large copy operations

Added by dalley almost 3 years ago. Updated almost 3 years ago.

Status:
CLOSED - WONTFIX
Priority:
Normal
Assignee:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
Platform Release:
OS:
Triaged:
No
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Quarter:

Description

Copied from Github Discussions: https://github.com/pulp/community/discussions/37


Hello,

First of all, I want to apologize in advance if this isn't the right place to open this discussion. Please let me know if there's somewhere more appropriate for me to request help.

We have been having issues using Oracle’s Linux yum repos for Oracle Linux 7 with Pulp. We had been running an older version of Pulp, 2.17.1, but after we started experiencing issues we upgraded to Pulp 2.21.5.

We are taking snapshots of these two Oracle Linux repositories: https://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64/index.html https://yum.oracle.com/repo/OracleLinux/OL7/optional/latest/x86_64/index.html

We “synchronize” these repos into Pulp using the commands “pulp-admin rpm repo sync” and “pulp-admin rpm repo publish”. Both of these Pulp tasks run successfully. However, once this step is done, if I try to look at the contents of the repository with “pulp-admin rpm repo content rpm --repo-id oraclelinux7-base”, I receive an internal error from Pulp.

$ sudo pulp-admin rpm repo content rpm --repo-id oraclelinux7-base
There was an internal server error while trying to access the Pulp application.
One possible cause is that the database needs to be migrated to the latest
version. If this is the case, run pulp-manage-db and restart the services. More
information may be found in Apache's log.

Both of the repos, “latest” and “optional”, show this same symptom.

Next, we normally copy the contents of our Pulp repos into a “snapshot” so that we retain the state of each repo at the point when we updated it. This lets us pin a particular set of updates to an environment before we update, and keep track of the changes. To make a snapshot, we have a script that runs a pulp-admin command to create a new repo for the snapshot, then uses “pulp-admin rpm repo copy all” to copy the synchronized base repo into the snapshot repo, and finally runs the publish task on the snapshot. This is where the problem happens.
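The snapshot procedure boils down to something like the following sketch (paraphrased, not our exact script; the "fail" helper and the repo-ID arguments here are illustrative):

```shell
#!/usr/bin/env bash
# Rough sketch of the snapshot procedure: create a snapshot repo,
# copy everything from the synced base repo into it, then publish it.
# The fail() helper and argument handling are illustrative placeholders.

fail() { echo "ERROR: $1" >&2; return 1; }

make_snapshot() {
    local SRC="$1" DEST="$2"
    # 1. Create a new repo to hold the snapshot
    pulp-admin rpm repo create --repo-id="${DEST}" \
        || { fail "create failed"; return 1; }
    # 2. Copy the synchronized base repo into the snapshot repo
    pulp-admin rpm repo copy all --from-repo-id="${SRC}" --to-repo-id="${DEST}" \
        || { fail "copy failed"; return 1; }
    # 3. Publish the snapshot
    pulp-admin rpm repo publish run --repo-id="${DEST}" \
        || { fail "publish failed"; return 1; }
}
```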

“Optional” can create snapshots just fine, in spite of the error above. But when I create a snapshot of “latest”, an error occurs that kills the Pulp worker and leaves the ‘associate’ task in a ‘running’ state indefinitely.

Here is a sample log from journalctl:

Jun 25 13:49:15 amerpulp-01-iae pulp[16103]: celery.app.trace:INFO: [ad8688b3] Task pulp.server.controllers.repository.download_deferred[ad8688b3-c715-4971-9c7e-0f15738011ab] succeeded in 0.0740167059994s: None
Jun 25 13:49:15 amerpulp-01-iae pulp[18122]: pulp.server.db.connection:INFO: Attempting to connect to localhost
Jun 25 13:49:15 amerpulp-01-iae pulp[18122]: pulp.server.db.connection:INFO: Attempting to connect to localhost
Jun 25 13:49:15 amerpulp-01-iae pulp[18122]: pulp.server.db.connection:INFO: Write concern for Mongo connection: {}
Jun 25 13:52:22 amerpulp-01-iae pulp[17067]: kombu.transport.qpid:INFO: Connected to qpid with SASL mechanism PLAIN
Jun 25 13:52:22 amerpulp-01-iae pulp[1188]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[b846c919-0277-4a2f-afdd-c3b437bc278e]
Jun 25 13:52:22 amerpulp-01-iae pulp[1239]: celery.worker.strategy:INFO: Received task: pulp.server.managers.repo.unit_association.associate_from_repo[d1485c84-0e53-40ec-9c23-9507900b2fe0]
Jun 25 13:52:22 amerpulp-01-iae pulp[1239]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._release_resource[72b440fc-04bd-49b4-a594-7a4cf7e77780]
Jun 25 13:52:22 amerpulp-01-iae pulp[1458]: celery.app.trace:INFO: [b846c919] Task pulp.server.async.tasks._queue_reserved_task[b846c919-0277-4a2f-afdd-c3b437bc278e] succeeded in 0.0275188770029s: None
Jun 25 14:00:47 amerpulp-01-iae pulp[1239]: pulp.server.async.worker_watcher:WARNING: Worker reserved_resource_worker-1@amerpulp-01-iae heartbeat time 74.552641s exceeds heartbeat interval. Consider adjusting the worker_timeout setting.
Jun 25 14:00:47 amerpulp-01-iae pulp[1188]: pulp.server.async.worker_watcher:WARNING: Worker resource_manager@amerpulp-01-iae heartbeat time 32.787201s exceeds heartbeat interval. Consider adjusting the worker_timeout setting.
Jun 25 14:00:47 amerpulp-01-iae pulp[1236]: pulp.server.async.worker_watcher:WARNING: Worker reserved_resource_worker-0@amerpulp-01-iae heartbeat time 74.560998s exceeds heartbeat interval. Consider adjusting the worker_timeout setting.
Jun 25 14:00:48 amerpulp-01-iae pulp[1239]: celery.worker.request:ERROR: (1239-47616) Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Jun 25 14:00:48 amerpulp-01-iae pulp[1239]: celery.worker.request:ERROR: (1239-47616) Traceback (most recent call last):
Jun 25 14:00:48 amerpulp-01-iae pulp[1239]: celery.worker.request:ERROR: (1239-47616)   File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1223, in mark_as_worker_lost
Jun 25 14:00:48 amerpulp-01-iae pulp[1239]: celery.worker.request:ERROR: (1239-47616)     human_status(exitcode)),
Jun 25 14:00:48 amerpulp-01-iae pulp[1239]: celery.worker.request:ERROR: (1239-47616) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
Jun 25 14:00:48 amerpulp-01-iae pulp[18964]: pulp.server.db.connection:INFO: Attempting to connect to localhost
Jun 25 14:00:48 amerpulp-01-iae pulp[18964]: pulp.server.db.connection:INFO: Attempting to connect to localhost
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: pulp.server.db.connection:INFO: Write concern for Mongo connection: {}
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: py.warnings:WARNING: [72b440fc] (18964-47616) /usr/lib64/python2.7/site-packages/pymongo/topology.py:74: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for detail
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: py.warnings:WARNING: [72b440fc] (18964-47616)   "MongoClient opened before fork. Create MongoClient "
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: py.warnings:WARNING: [72b440fc] (18964-47616)
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: pulp.server.async.tasks:INFO: [72b440fc] Task failed : [d1485c84-0e53-40ec-9c23-9507900b2fe0] : Worker terminated abnormally while processing task d1485c84-0e53-40ec-9c23-9507900b2fe0.  Check the logs for details
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: celery.app.trace:ERROR: [72b440fc] (18964-47616) Task pulp.server.async.tasks._release_resource[72b440fc-04bd-49b4-a594-7a4cf7e77780] raised unexpected: UnboundLocalError("local variable 'original_formatted_traceback' referenced before assignment",)
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: celery.app.trace:ERROR: [72b440fc] (18964-47616) Traceback (most recent call last):
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: celery.app.trace:ERROR: [72b440fc] (18964-47616)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: celery.app.trace:ERROR: [72b440fc] (18964-47616)     R = retval = fun(*args, **kwargs)
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: celery.app.trace:ERROR: [72b440fc] (18964-47616)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 110, in __call__
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: celery.app.trace:ERROR: [72b440fc] (18964-47616)     return super(PulpTask, self).__call__(*args, **kwargs)
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: celery.app.trace:ERROR: [72b440fc] (18964-47616)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 622, in __protected_call__
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: celery.app.trace:ERROR: [72b440fc] (18964-47616)     return self.run(*args, **kwargs)
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: celery.app.trace:ERROR: [72b440fc] (18964-47616)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 378, in _release_resource
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: celery.app.trace:ERROR: [72b440fc] (18964-47616)     new_task.on_failure(exception, task_id, (), {}, MyEinfo)
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: celery.app.trace:ERROR: [72b440fc] (18964-47616)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 781, in on_failure
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: celery.app.trace:ERROR: [72b440fc] (18964-47616)     _logger.debug(original_formatted_traceback)
Jun 25 14:00:49 amerpulp-01-iae pulp[18964]: celery.app.trace:ERROR: [72b440fc] (18964-47616) UnboundLocalError: local variable 'original_formatted_traceback' referenced before assignment
Jun 25 14:00:53 amerpulp-01-iae pulp[1178]: pulp.server.async.scheduler:ERROR: Worker 'scheduler@amerpulp-01-iae' has gone missing, removing from list of workers
Jun 25 14:00:53 amerpulp-01-iae pulp[1178]: pulp.server.async.scheduler:ERROR: There are 0 pulp_celerybeat processes running. Pulp will not operate correctly without at least one pulp_celerybeat process running.
Jun 25 14:00:53 amerpulp-01-iae pulp[1178]: pulp.server.async.worker_watcher:INFO: New worker 'scheduler@amerpulp-01-iae' discovered

At 14:00:48, the Pulp worker under process 1239 was killed prematurely while the task was running.

Is there anything I can do to remedy this? Why does it only happen with https://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64/index.html? (As a test, I tried the same thing with the “archive” repo Oracle provides, which seems to have the same packages as 'latest', and saw the same failure. I also tried rebuilding our Pulp server from scratch, with no snapshots, and reproduced the same symptoms.)

Thank you so much for reading this!


Here's the commands we run:

pulp-admin rpm repo copy all --from-repo-id="${SRC}" --to-repo-id="${DEST}" || fail "copy failed"
pulp-admin rpm repo publish run --repo-id="${DEST}" || fail "publish failed"

It doesn't look like any additional options are given to the copy. Should I be? I can amend the script.

I should mention that I think the 'internal server error' I get when I look at the RPM contents of these repos might be a red herring. The only repo that fails a copy is the oraclelinux7-base one. Or perhaps these are two issues, related or unrelated.

One more thing - I increased the memory on the server from 16 to 32 GB, and now the snapshot copy succeeds.

Actions #1

Updated by ttereshc almost 3 years ago

  • Tags Pulp 2 added
Actions #2

Updated by dalley almost 3 years ago

  • Status changed from NEW to CLOSED - WONTFIX

This is actually a longstanding (and difficult to fix) bug - given that Pulp 2 is in maintenance mode, it's unlikely to be fixed.

The workaround is to avoid "copy all" and instead copy per content type, or use a content filter to limit what is copied.
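A sketch of what a per-type copy could look like (the list of content types below is illustrative; check "pulp-admin rpm repo copy --help" on your installation for the subcommands it actually supports):

```shell
#!/usr/bin/env bash
# Sketch of the per-type copy workaround: instead of one "copy all" task,
# issue one copy task per content type so each task handles a smaller set.
# The type list is an assumption; verify it against
# "pulp-admin rpm repo copy --help" for your Pulp 2 install.

copy_per_type() {
    local SRC="$1" DEST="$2" type
    for type in rpm srpm errata group category distribution; do
        pulp-admin rpm repo copy "${type}" \
            --from-repo-id="${SRC}" --to-repo-id="${DEST}" || return 1
    done
}
```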
