Pulp content app loses database connection
After pulp-content-app has been running for a long time, all requests begin failing with the following error message:
Traceback (most recent call last):
  File "/venv/lib64/python3.6/site-packages/aiohttp/web_protocol.py", line 418, in start
    resp = await task
  File "/venv/lib64/python3.6/site-packages/aiohttp/web_app.py", line 458, in _handle
    resp = await handler(request)
  File "/venv/lib64/python3.6/site-packages/pulpcore/content/handler.py", line 117, in stream_content
    return await self._match_and_stream(path, request)
  File "/venv/lib64/python3.6/site-packages/pulpcore/content/handler.py", line 309, in _match_and_stream
    distro = self._match_distribution(path)
  File "/venv/lib64/python3.6/site-packages/pulpcore/content/handler.py", line 158, in _match_distribution
    return BaseDistribution.objects.get(base_path__in=base_paths).cast()
  File "/venv/lib64/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/venv/lib64/python3.6/site-packages/django/db/models/query.py", line 402, in get
    num = len(clone)
  File "/venv/lib64/python3.6/site-packages/django/db/models/query.py", line 256, in __len__
    self._fetch_all()
  File "/venv/lib64/python3.6/site-packages/django/db/models/query.py", line 1242, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/venv/lib64/python3.6/site-packages/django/db/models/query.py", line 55, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/venv/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 1131, in execute_sql
    cursor = self.connection.cursor()
  File "/venv/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 256, in cursor
    return self._cursor()
  File "/venv/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 235, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/venv/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/venv/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 235, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/venv/lib64/python3.6/site-packages/django/db/backends/postgresql/base.py", line 223, in create_cursor
    cursor = self.connection.cursor()
django.db.utils.InterfaceError: connection already closed
A possible reason is that pulp-content-app doesn't re-establish the database connection after it has been closed.
#2 Updated by osapryki over 1 year ago
I think this discussion might be related:
#3 Updated by bmbouter over 1 year ago
This error suggests the connection is being closed from the PostgreSQL side. I was chatting about the issue in their channel and they indicated PostgreSQL is not closing the connection. They suggested that a firewall in the middle (OpenShift perhaps?) is dropping the idle connection.
Was there anything in the PostgreSQL logs that indicates it is closing the connection, or that the client side (Django, or perhaps the firewall) closed it?
#4 Updated by bmbouter over 1 year ago
I had two ideas.
One: we could check the postgresql connection during the pre_request event http://docs.gunicorn.org/en/latest/settings.html#pre-request but it would be costly in terms of the request-response runtime increase.
Two: I looked for some sort of process recycling in gunicorn (which runs the content app) but I didn't see it.
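The first idea could look roughly like the sketch below. It is only an illustration: `ensure_usable` is a hypothetical helper, and whether gunicorn actually invokes `pre_request` depends on the worker class in use, which is not verified here. On the PostgreSQL backend `is_usable()` issues a trivial query, so this adds one database round trip per request, which is the runtime cost mentioned in idea one.

```python
# gunicorn.conf.py (sketch, not a tested configuration)

def ensure_usable(conn):
    """Close `conn` if it is open but no longer responds; return True if closed.

    `conn` is expected to look like a Django database connection wrapper:
    a `.connection` attribute (None when not connected), `.is_usable()`
    and `.close()`. After close(), Django reconnects on the next query.
    """
    if conn.connection is None:
        return False  # nothing is open; Django will connect on demand
    if conn.is_usable():
        return False  # the connection still answers, keep it
    conn.close()
    return True


def pre_request(worker, req):
    # gunicorn server hook, called just before a worker processes a request.
    # Django is imported lazily so this file can be loaded before setup.
    from django.db import connection
    ensure_usable(connection)
```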
#5 Updated by ironfroggy over 1 year ago
bmbouter, maybe you want gunicorn's max_requests setting, which will restart workers after N requests have been handled?
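For reference, a minimal sketch of that setting in a gunicorn configuration file. The numbers are placeholders, not recommendations. `max_requests_jitter` is a related gunicorn setting that randomizes the restart point so workers are not all recycled at once.

```python
# gunicorn.conf.py (sketch; values are illustrative placeholders)

# Restart a worker after it has handled this many requests (0 disables it).
max_requests = 1000

# Add a random offset between 0 and this value to each worker's limit,
# so that workers are not all restarted at the same moment.
max_requests_jitter = 50
```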
#6 Updated by osapryki over 1 year ago
- Description updated (diff)
@ironfroggy Limiting the number of requests won't help because the connection can be terminated at any time within that limit.
RCA: Django manages connections implicitly. It sets up signal handlers that deal with dead or expired connections before and after each request. The handler calls the close_if_unusable_or_obsolete method, which closes the connection if CONN_MAX_AGE has been exceeded or if the is_usable() check fails. The content app serves requests through aiohttp rather than through Django's request cycle, so these handlers presumably never run there.
Since the content app is an asyncio application that uses the Django connection, it shares a single connection between coroutines. A possible solution is to close the connection when the request handler returns a response, or after a significant batch of database queries.
from django.db import connection

# Put this either after significant database communication logic,
# or at the beginning or at the end of the request handler.
connection.close()
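One way to apply this around every handler, sketched under assumptions: `with_connection_cleanup` is a hypothetical helper, not part of pulpcore or Django. In the content app the `cleanup` argument could be `connection.close` or `django.db.close_old_connections` (the function Django connects to its own request signals). Here it is passed in as a parameter and replaced by a stand-in so the sketch runs without Django.

```python
import asyncio
import functools


def with_connection_cleanup(cleanup):
    """Return a decorator that calls `cleanup()` before and after a coroutine handler."""
    def decorator(handler):
        @functools.wraps(handler)
        async def wrapper(*args, **kwargs):
            cleanup()  # before the handler touches the database
            try:
                return await handler(*args, **kwargs)
            finally:
                cleanup()  # after the handler, even if it raised
        return wrapper
    return decorator


# Stand-in for the real cleanup callable, recording the call order.
calls = []


def fake_cleanup():
    calls.append("cleanup")


@with_connection_cleanup(fake_cleanup)
async def stream_content(path):
    # Hypothetical handler body; the real one would query the database here.
    calls.append("handle " + path)
    return "response for " + path


result = asyncio.run(stream_content("some/base/path"))
```

Using `try`/`finally` means the cleanup also runs when the handler fails, which is the case reported in the traceback.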
Also, since the connection is shared between coroutines, you should be extremely careful with this to avoid the connection state becoming inconsistent across coroutine context switches. For example, you should make sure the context is never switched within a transaction.
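The hazard can be shown with a toy model that involves no database at all. `SharedConnection` is a hypothetical stand-in for the single connection shared by all coroutines. The `await` inside the "transaction" is the point where the event loop may switch to another coroutine.

```python
import asyncio


class SharedConnection:
    """Toy stand-in for one connection object shared by every coroutine."""

    def __init__(self):
        self.in_transaction = False


conn = SharedConnection()
observed = []


async def writer():
    conn.in_transaction = True   # "BEGIN"
    await asyncio.sleep(0)       # context switch inside the transaction
    conn.in_transaction = False  # "COMMIT"


async def reader():
    # Runs while writer() is suspended and sees its half-finished state.
    observed.append(conn.in_transaction)


async def main():
    await asyncio.gather(writer(), reader())


asyncio.run(main())
```

With a real shared connection, the second coroutine's queries would presumably run inside the first one's open transaction, and closing the connection at that point would discard it.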