Issue #2492

Small memory leak in httpd when dispatching Pulp tasks

Added by bmbouter over 7 years ago. Updated over 3 years ago.

Status:
CLOSED - WONTFIX
Priority:
Normal
Assignee:
-
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
1. Low
Version:
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Quarter:

Description

standalone pulp reproducer:

name="some-very-long-repo-name-some-very-long-repo-name-some-very-long-repo-name-test"

i=1;
while true; do
  pulp-admin rpm repo create --repo-id=${name}-${i} --display-name=${name}-${i}
  pulp-admin rpm repo delete --repo-id=${name}-${i} 
  sleep 2
done

(plus some ps monitoring)

Memory usage (RSS) grows by approximately 110 bytes per task, or about 220 bytes per iteration of the above loop (summing the RSS increases across the individual processes).
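A minimal sketch of what the ps monitoring could look like, reusing the same ps/pgrep pattern that appears later in this issue; the output file name is hypothetical, and the line would sit inside the while loop above alongside the create/delete calls:

# Record the RSS column (in KB, per the ps man page) of the Pulp WSGI
# process(es) once per iteration so the growth can be graphed afterwards.
sudo ps --no-header -up $(pgrep -f 'wsgi:pulp\)') | awk '{ print $6 }' >> wsgi_rss.txt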


Related issues

Related to Pulp - Story #2632: As a developer I want to reevaluate worker issues to see if they have been resolved by moving from Celery3 to Celery4 (CLOSED - WONTFIX)

Actions #1

Updated by mhrivnak over 7 years ago

We at least need to identify a root cause soon. If fixing is complex, we could consider a different priority for that.

Actions #2

Updated by bizhang over 7 years ago

  • Sprint/Milestone set to 31
  • Triaged changed from No to Yes
Actions #3

Updated by bmbouter over 7 years ago

  • Description updated (diff)
  • Sprint/Milestone deleted (31)
Actions #4

Updated by bmbouter over 7 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to bmbouter
Actions #5

Updated by bmbouter over 7 years ago

  • File wsgi_memory_growth.png added

On a 2.10-dev installation of Pulp, I modified the httpd conf file to use exactly 1 WSGI process for Pulp, which forces that single process to handle all requests.
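For reference, the single-process setup might look roughly like this WSGIDaemonProcess line (a sketch only; the thread count and display-name are illustrative assumptions, not Pulp's shipped defaults):

# Sketch only: force all Pulp API requests through a single WSGI daemon process.
WSGIDaemonProcess pulp processes=1 threads=15 display-name=%{GROUP}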

I then ran this reproducer:

name="some-very-long-repo-name-some-very-long-repo-name-some-very-long-repo-name-test"

for i in {0..1000}; do
  sudo ps -up $(pgrep -f 'wsgi:pulp\)') >> output.txt  # capture per-process memory for later filtering

  pulp-admin rpm repo create --repo-id=${name}-${i} --display-name=${name}-${i}
  pulp-admin rpm repo delete --repo-id=${name}-${i}
done

I then grab all of the process memory footprints into one text file with:

cat output.txt | grep FOREGROUND > filtered.txt
awk '{print $6}' filtered.txt > filtered_mem_only.txt

I then graph that using Google Docs and I get the attached figure.
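(For anyone reproducing this without Google Docs: a gnuplot one-liner along these lines could plot the same filtered_mem_only.txt file; the output file name and plot dimensions are arbitrary.)

# Hypothetical alternative plot of the per-iteration RSS values (KB) with gnuplot.
gnuplot -e "set terminal png size 800,400; set output 'mem_growth.png'; \
  set xlabel 'iteration'; set ylabel 'RSS (KB)'; \
  plot 'filtered_mem_only.txt' with lines title 'httpd WSGI RSS'"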

Actions #6

Updated by bmbouter over 7 years ago

  • File deleted (wsgi_memory_growth.png)
Actions #7

Updated by bmbouter over 7 years ago

  • File wsgi_memory_growth.png added

I ran 3 additional test cases.

  • I reran the original reproducer, but using RabbitMQ instead of Qpid
  • using RabbitMQ, I reran the reproducer but only had it create the repos. This codepath does not involve Celery or the message bus.
  • using RabbitMQ, I reran the reproducer but only had it delete the repos. This codepath does involve Celery and the message bus.

All 4 test cases are shown in one figure for comparison. It also includes the previous data, so I'm deleting the earlier figure.

I think the data supports several conclusions:

1. The memory leak is not unique to Qpid
2. Some of the memory growth is not related to Celery, but that growth levels off after ~500 requests
3. The continuing memory growth is related to Celery

I don't think we know whether memory usage will continue to grow monotonically with continued tasking system use, or whether it will eventually stabilize.

Actions #8

Updated by bmbouter over 7 years ago

  • File wsgi_memory_growth_10000.png added

I ran the following 3 tests for 10K requests and attached the image.

  • create+delete using RabbitMQ
  • create only using RabbitMQ
  • delete only using RabbitMQ

btw, all of this data is recorded here[0].

[0]: https://docs.google.com/a/redhat.com/spreadsheets/d/1N4Tm5uyblnHoBdy27n5_YohkfIQxkWzIr_kfZKF5n3Y/edit?usp=sharing

Actions #9

Updated by bmbouter over 7 years ago

  • File deleted (wsgi_memory_growth.png)
Actions #10

Updated by bmbouter over 7 years ago

  • File deleted (wsgi_memory_growth_10000.png)
Actions #11

Updated by bmbouter over 7 years ago

The man page states that the RSS column is in kilobytes, not bytes. I'm uploading a new figure with the correct axis label.

Actions #12

Updated by bmbouter over 7 years ago

From the long-running tests, I think the data shows a few things:

1. Celery memory usage for task dispatch contributes the most to the memory problems
2. Non-Celery operations also increase memory usage, but more slowly. httpd does seem to release some of this memory, with a slight dip around 9K requests.
3. Memory usage seems to increase monotonically
4. The rate of worst-case growth (the red line) is approximately 0.2928 KB per create+delete iteration, which covers both the create and the delete request.

I don't think this is severe enough to continue investigating. We should consider one of two things: change the default WSGI config to enable process recycling, or close this as WONTFIX.

What do others think?

Actions #14

Updated by mhrivnak over 7 years ago

I think process recycling is an appropriate change upstream.

The number of requests between restarts should be large so this doesn't have a big resource impact, especially since the memory leak is very small. Just making stuff up, I'd go with at least 10,000 requests between restarts. That would still keep the memory leak's impact very small.

I also think the graceful-timeout should be large so there is ample opportunity for the process to do a graceful shutdown. There's no harm in letting that be at least several minutes.

http://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIDaemonProcess.html
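For illustration, a recycling configuration along those lines might look roughly like the sketch below. The daemon-group name and the process/thread counts are assumptions rather than Pulp's shipped defaults; maximum-requests and graceful-timeout are the WSGIDaemonProcess options documented at the link above.

# Sketch only: restart each daemon process after 10,000 requests and allow up
# to 5 minutes for in-flight requests to finish before the restart happens.
WSGIDaemonProcess pulp processes=3 threads=15 display-name=%{GROUP} \
    maximum-requests=10000 graceful-timeout=300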

Actions #15

Updated by dkliban@redhat.com over 7 years ago

Is the memory leak present when httpd is handling requests for content? If so, then we should open up a bug against httpd.

If the memory leak is not present when httpd is handling requests for content, then we should consider how many Pulp API requests would need to be made in order to cause a significant increase in memory use. Based on the data provided in comment 12, it would take over 6800 API requests to cause httpd to use an extra 1 megabyte of memory (0.2928 KB per create+delete iteration covers two API requests, i.e. about 0.146 KB per request, so roughly 1024 / 0.146 ≈ 7000 requests per megabyte). This is not a serious problem.

Introducing process recycling will only address the symptoms of the problem. It will also prevent us from noticing any other memory leaks that could be introduced in the future.

We should focus on fixing the problem. If we don't know how to fix it, we should at least leave it as is so someone else can attempt to fix it.

Actions #16

Updated by bmbouter over 7 years ago

After reading comments from others, and thinking more about this, I think we should leave this at NEW and remove it from the sprint. The main things that stand out in my mind are:

  • Leaving process recycling off allows for the potential early discovery of a more serious leak (memory, fds, AIO contexts, whatever)
  • This leak is not severe, so the benefit of making the recycling change is very small.
  • Leaving the issue open lets someone really fix it later with a tool like guppy
Actions #17

Updated by mhrivnak over 7 years ago

  • Sprint/Milestone set to 31
Actions #18

Updated by bmbouter over 7 years ago

  • File uwsgi_2K_create_and_delete.png added
  • Sprint/Milestone deleted (31)

I wanted to compare the memory behavior of httpd against other webservers, so I ran the same reproducer with Pulp served by uWSGI and nginx. nginx used a fixed amount of memory, 5292 KB (literally 0 KB of change during the test). uWSGI, however, leaked similarly to httpd. See the attached graph of the create+delete test using RabbitMQ with 2K requests.

Actions #19

Updated by bmbouter over 7 years ago

I extended my reproducer to support additional webservers and to track file descriptors. In terms of next steps, I plan to heavily comment out the codepaths these requests exercise and observe which parts of the request handling contribute the additional memory growth. This was originally an idea from mhrivnak.

name="some-very-long-repo-name-some-very-long-repo-name-some-very-long-repo-name-test"

rm httpd_mem_data.txt
rm httpd_fd_data.txt

rm uwsgi_mem_data.txt
rm uwsgi_fd_data.txt

rm nginx_mem_data.txt
rm nginx_fd_data.txt

rm -rf ~/devel/mem_output/*
sudo rm /var/lib/pulp/count_output.txt

for i in {0..2000}; do
  #sudo ps --no-header -up $(pgrep -f 'wsgi:pulp\)') | awk '{ print $6 }' >> httpd_mem_data.txt
  #sudo lsof -p $(pgrep -f 'wsgi:pulp\)') | wc -l >> httpd_fd_data.txt

  sudo ps --no-header -up $(pgrep -f 'uwsgi pulp_uwsgi.ini' | tail -n1) | awk '{ print $6 }' >> uwsgi_mem_data.txt
  sudo lsof -p $(pgrep -f 'uwsgi pulp_uwsgi.ini' | tail -n1) | wc -l >> uwsgi_fd_data.txt

  sudo ps --no-header -up $(pgrep -f 'nginx: worker process') | awk '{ print $6 }' >> nginx_mem_data.txt
  sudo lsof -p $(pgrep -f 'nginx: worker process') | wc -l >> nginx_fd_data.txt

  pulp-admin rpm repo create --repo-id=${name}-${i} --display-name=${name}-${i}
  pulp-admin rpm repo delete --repo-id=${name}-${i}
done
Actions #20

Updated by bmbouter over 7 years ago

  • File deleted (uwsgi_2K_create_and_delete.png)
Actions #21

Updated by bmbouter over 7 years ago

Adding a graph containing uwsgi memory usage for 10K create+delete operations.

Actions #22

Updated by bmbouter about 7 years ago

After commenting out several of the codepaths exercised by the reproducer, the following diff keeps memory usage constant even while Pulp does a bunch of meaningful work. Specifically, the patch disables authorization, response forming, and the response middleware. With this patch applied, meaningful work still occurs, including the repo being created and deleted, full logging, Celery task dispatch for the repo deletion, and mongodb usage.

diff --git a/server/pulp/server/webservices/views/repositories.py b/server/pulp/server/webservices/views/repositories.py
index e0389c8..c9e500d 100644
--- a/server/pulp/server/webservices/views/repositories.py
+++ b/server/pulp/server/webservices/views/repositories.py
@@ -103,7 +103,7 @@ class ReposView(View):
                                          include_distributors)
         return generate_json_response_with_pulp_encoder(processed_repos)

-    @auth_required(authorization.CREATE)
+    #@auth_required(authorization.CREATE)
     @parse_json_body(json_type=dict)
     def post(self, request):
         """
@@ -130,6 +130,7 @@ class ReposView(View):

         repo = serializers.Repository(repo_obj).data
         response = generate_json_response_with_pulp_encoder(repo)
+        return
         return generate_redirect_response(response, repo['_href'])


@@ -170,7 +171,7 @@ class RepoResourceView(View):

         return generate_json_response_with_pulp_encoder(repo)

-    @auth_required(authorization.DELETE)
+    #@auth_required(authorization.DELETE)
     def delete(self, request, repo_id):
         """
         Dispatch a task to delete a repository.
@@ -186,6 +187,7 @@ class RepoResourceView(View):
         """
         model.Repository.objects.get_repo_or_missing_resource(repo_id)
         async_result = repo_controller.queue_delete(repo_id)
+        return
         raise exceptions.OperationPostponed(async_result)

     @auth_required(authorization.UPDATE)
Actions #23

Updated by bizhang about 7 years ago

  • Related to Story #2632: As a developer I want to reevaluate worker issues to see if they have been resolved by moving from Celery3 to Celery4 added
Actions #24

Updated by bmbouter about 6 years ago

  • Status changed from ASSIGNED to NEW
  • Assignee deleted (bmbouter)

I'm not actively working on this, so I'm moving it back to NEW and unassigning it.

This bug won't affect Pulp3 because the code containing the issue is not included in Pulp3 (and never will be). We know which code contained the issue based on the investigation done in this bug.

Actions #25

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added
Actions #26

Updated by dalley over 3 years ago

  • Status changed from NEW to CLOSED - WONTFIX

Shouldn't be a problem in Pulp 3
