Issue #1669
closed
Memory leak in Pulp celery processes
Description
Via IRC I received a report that, on at least two occasions, the Pulp workers of a large Pulp installation showed signs of a memory leak. The leak consumed so much memory that, past a certain point, all subsequent tasks failed with "Cannot allocate memory".
When the installation is in this state, no tasks are waiting or running in Pulp, as shown by the output the reporter provided:
No running or waiting tasks in pulp
pulp:PRIMARY> db.task_status.find({"state":{"$in":["running", "waiting"]}})
pulp:PRIMARY>
The offending processes are clearly Pulp celery processes, which are idle but consuming large amounts of memory.
bash-4.1# ps u 16012
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
apache 16012 0.2 15.1 3016808 2475812 ? Sl Feb10 4:17 /usr/bin/python -m celery.__main__ worker -c 1 -n reserved_resource_worker-1@pulp04.example.com --events --a
bash-4.1# ps u 16121
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
apache 16121 0.3 13.6 2767072 2224064 ? Sl Feb10 5:24 /usr/bin/python -m celery.__main__ worker -c 1 -n reserved_resource_worker-5@pulp04.example.com --events --a
bash-4.1# ps u 16206
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
apache 16206 0.2 12.8 3092120 2105596 ? Sl Feb10 3:34 /usr/bin/python -m celery.__main__ worker -c 1 -n reserved_resource_worker-7@pulp04.example.com --events --a
bash-4.1# ps u 16141
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
apache 16141 0.3 12.5 2621912 2056276 ? Sl Feb10 4:31 /usr/bin/python -m celery.__main__ worker -c 1 -n reserved_resource_worker-6@pulp04.example.com --events --a
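To put the RSS column in perspective, here is a small sketch that converts the figures from the ps output above into GiB and totals them. It assumes ps reports RSS in KiB (the Linux default); the variable and function names are illustrative only.

```python
# RSS values in KiB, copied from the ps output above and keyed by PID.
rss_kib = {16012: 2475812, 16121: 2224064, 16206: 2105596, 16141: 2056276}


def kib_to_gib(kib):
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024.0 * 1024.0)


total_kib = sum(rss_kib.values())

for pid, kib in sorted(rss_kib.items()):
    print("PID %d: %.2f GiB resident" % (pid, kib_to_gib(kib)))
print("Total: %.2f GiB resident" % kib_to_gib(total_kib))
```

The four idle workers shown together hold roughly 8.45 GiB resident, which matches the %MEM column summing to 54.0% of the host's memory.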
It's not yet clear whether the memory grows over time or is allocated all at once, or whether it is related to a specific task type. It has been suggested that an article on diagnosing memory leaks in Python may be relevant[0]. We should also consider that this could be an upstream Celery issue, since Celery has several open bug reports related to memory leaks[1].
[0]: https://dzone.com/articles/diagnosing-memory-leaks-python
[1]: https://github.com/celery/celery/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+leak
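One way to tell gradual growth from a one-time allocation is to compare counts of live objects by type at two points in time. The sketch below uses only the standard library `gc` module. It is not taken from Pulp, Celery, or the article in [0], and the helper names `type_counts` and `growth` are hypothetical. It assumes you can run code inside the suspect worker process, and it only sees objects tracked by the garbage collector, so memory held by C extensions will not show up.

```python
import gc
from collections import Counter


def type_counts():
    """Count live, gc-tracked objects by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=10):
    """Return up to `top` (type name, increase) pairs, largest increase first.

    `before` and `after` are Counters produced by type_counts().
    Types whose count did not increase are dropped.
    """
    delta = Counter(after)
    delta.subtract(before)
    return [(name, n) for name, n in delta.most_common(top) if n > 0]
```

Take one snapshot with `type_counts()` before a task runs and another after it finishes, then pass both to `growth()`. A type whose count keeps rising across repeated runs of the same task is a candidate for the leak.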
Updated by bmbouter about 7 years ago
- Platform Release set to 2.8.1
- Triaged changed from No to Yes
Updated by bmbouter about 7 years ago
- Status changed from NEW to CLOSED - WORKSFORME
The original reporter contacted me and determined that the root cause is not in Pulp code. They had made some modifications to the code which introduced a memory issue. They are not sure of the exact root cause, but they confirmed that changing their implementation resolved it.
If other users observe this, please re-open.