Issue #6045

Pulp content app loses database connection

Added by osapryki 10 months ago. Updated 9 months ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags:
Sprint: Sprint 66
Quarter:

Description

After pulp-content-app has been running for a long time, all requests begin failing with the following error message:

Traceback (most recent call last):
  File "/venv/lib64/python3.6/site-packages/aiohttp/web_protocol.py", line 418, in start
    resp = await task
  File "/venv/lib64/python3.6/site-packages/aiohttp/web_app.py", line 458, in _handle
    resp = await handler(request)
  File "/venv/lib64/python3.6/site-packages/pulpcore/content/handler.py", line 117, in stream_content
    return await self._match_and_stream(path, request)
  File "/venv/lib64/python3.6/site-packages/pulpcore/content/handler.py", line 309, in _match_and_stream
    distro = self._match_distribution(path)
  File "/venv/lib64/python3.6/site-packages/pulpcore/content/handler.py", line 158, in _match_distribution
    return BaseDistribution.objects.get(base_path__in=base_paths).cast()
  File "/venv/lib64/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/venv/lib64/python3.6/site-packages/django/db/models/query.py", line 402, in get
    num = len(clone)
  File "/venv/lib64/python3.6/site-packages/django/db/models/query.py", line 256, in __len__
    self._fetch_all()
  File "/venv/lib64/python3.6/site-packages/django/db/models/query.py", line 1242, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/venv/lib64/python3.6/site-packages/django/db/models/query.py", line 55, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/venv/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 1131, in execute_sql
    cursor = self.connection.cursor()
  File "/venv/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 256, in cursor
    return self._cursor()
  File "/venv/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 235, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/venv/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/venv/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 235, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/venv/lib64/python3.6/site-packages/django/db/backends/postgresql/base.py", line 223, in create_cursor
    cursor = self.connection.cursor()
django.db.utils.InterfaceError: connection already closed

A possible reason is that pulp-content-app doesn't re-establish the database connection after it has been closed.

Associated revisions

Revision 5aabb202
Added by daviddavis 10 months ago

Fix "connection already closed" error in content app

fixes #6045 https://pulp.plan.io/issues/6045

Revision 6fe74ee2
Added by bmbouter 9 months ago

Add bugfix changelog entry for 6045

https://pulp.plan.io/issues/6045 re #6045

Revision 3da5c467
Added by daviddavis 9 months ago

Fix "connection already closed" error in content app

fixes #6045 https://pulp.plan.io/issues/6045

(cherry picked from commit 5aabb202dd59cbe2d30ef5ad91f01932c9ca041b)

Revision 3293134e
Added by bmbouter 9 months ago

Add bugfix changelog entry for 6045

https://pulp.plan.io/issues/6045 re #6045

(cherry picked from commit 6fe74ee2978732c72dcfdfa8a6eeed5760a4dee7)

History

#1 Updated by bmbouter 10 months ago

Thank you for reporting this. I would expect Django to re-establish the connection; that is how I read its connection management docs.

Also, how can I reproduce this issue? Any pointers for me?

#3 Updated by bmbouter 10 months ago

This error suggests the connection is being closed on the PostgreSQL side. I was chatting about the issue in their channel and they indicated that PostgreSQL is not closing the connection. They suggested that a firewall in the middle (OpenShift, perhaps?) is dropping the idle connection.

Was there anything in the PostgreSQL logs indicating that it closed the connection, or that the client side (Django, or possibly the firewall) closed it?

#4 Updated by bmbouter 10 months ago

I had two ideas.

One: we could check the PostgreSQL connection in gunicorn's pre_request hook http://docs.gunicorn.org/en/latest/settings.html#pre-request but that would add latency to every request-response cycle.

Two: I looked for some sort of process recycling in gunicorn (which runs the content app) but I didn't see it.

#5 Updated by ironfroggy 10 months ago

bmbouter, maybe you want gunicorn's max_requests setting, which restarts a worker after it has handled N requests?

https://docs.gunicorn.org/en/stable/settings.html#max-requests
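
As a rough illustration of that suggestion, a gunicorn configuration file could recycle workers as shown below. max_requests and max_requests_jitter are gunicorn settings; the file name and the values are placeholder assumptions, not recommendations.

```python
# gunicorn.conf.py (sketch; the numbers are placeholder assumptions)

# Restart a worker after it has handled this many requests.
max_requests = 1000

# Add a random per-worker offset (0..max_requests_jitter) so that
# workers do not all restart at the same moment.
max_requests_jitter = 50
```

This only bounds how long a worker process lives; it does not check whether the database connection is still alive.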

#6 Updated by osapryki 10 months ago

  • Description updated (diff)

@ironfroggy Limiting the number of requests won't help, because the connection can terminate at any time within that limit.

RCA: Django manages connections implicitly. It sets up signal handlers that deal with dead or expired connections before and after each request [1]. These handlers call the close_if_unusable_or_obsolete method [2], which closes the connection if CONN_MAX_AGE has been exceeded or if the is_usable() check [3] fails.

[1] https://github.com/django/django/blob/stable/2.2.x/django/db/__init__.py#L60

[2] https://github.com/django/django/blob/stable/2.2.x/django/db/backends/base/base.py#L492

[3] https://github.com/django/django/blob/stable/2.2.x/django/db/backends/postgresql/base.py#L249
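
To make the RCA concrete, here is a simplified model of that cleanup logic. It is a sketch, not Django's code: SketchConnection and its attributes are hypothetical stand-ins, max_age plays the role of CONN_MAX_AGE, and the real method performs additional checks. The point is that this cleanup only runs when something triggers it around each request, which presumably never happens in an application that does not go through Django's own request handling.

```python
import time


class SketchConnection:
    """Hypothetical stand-in for a Django database wrapper (not Django code)."""

    def __init__(self, max_age=None, usable=True):
        self.is_open = True
        self.usable = usable          # what a real is_usable() probe would report
        self.errors_occurred = False  # set once a query has raised a database error
        # max_age plays the role of CONN_MAX_AGE; None means "never expire".
        self.close_at = None if max_age is None else time.monotonic() + max_age

    def is_usable(self):
        return self.usable

    def close(self):
        self.is_open = False

    def close_if_unusable_or_obsolete(self):
        if not self.is_open:
            return
        # A connection that saw errors is kept only if it still responds.
        if self.errors_occurred:
            if self.is_usable():
                self.errors_occurred = False
            else:
                self.close()
                return
        # A connection older than its maximum age is closed.
        if self.close_at is not None and time.monotonic() >= self.close_at:
            self.close()


def close_old_connections(connections):
    """Roughly what the per-request signal handlers do: check every connection."""
    for conn in connections:
        conn.close_if_unusable_or_obsolete()
```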

Since the content app is an asyncio application and uses the Django connection, it shares a single connection between coroutines. A possible solution is to close the connection when the request handler returns a response, or after a significant block of database queries.

from django.db import connection

# Put this either after significant database communication logic or at the beginning or at the end of request handler.
connection.close()
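
One way to apply this at the end of every request handler is a small decorator. This is a minimal sketch under assumptions: close_db_connection_after and demo_handler are hypothetical names, and the demo passes in a counter instead of a real database connection so that it runs without Django.

```python
import asyncio
import functools


def close_db_connection_after(close_connection):
    """Build a decorator that calls close_connection() once the handler is done."""

    def decorator(handler):
        @functools.wraps(handler)
        async def wrapper(*args, **kwargs):
            try:
                return await handler(*args, **kwargs)
            finally:
                # Runs whether the handler returned or raised.
                close_connection()

        return wrapper

    return decorator


# Demo with a list as a call recorder instead of a real connection.
calls = []


@close_db_connection_after(lambda: calls.append("closed"))
async def demo_handler(path):
    return f"served {path}"


result = asyncio.run(demo_handler("/pulp/content/"))
```

With Django available, the callable passed in would be something like lambda: connection.close(), using the connection imported above. Closing after every request trades one reconnect per request for never reusing a stale connection.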

Also, since the connection is shared between coroutines, you should be extremely careful to keep the connection state consistent across coroutine context switches. For example, you should make sure that the context is never switched within a transaction.
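
The transaction hazard can be sketched with plain asyncio. SharedConnection below is a hypothetical stand-in that only records statements; it is not a Django or psycopg2 object. The writer awaits in the middle of its transaction, so the reader's query lands inside a transaction it never opened.

```python
import asyncio


class SharedConnection:
    """Hypothetical connection shared by all coroutines; records what it sees."""

    def __init__(self):
        self.in_transaction = False
        self.log = []

    def begin(self, who):
        self.in_transaction = True
        self.log.append(f"{who}: BEGIN")

    def execute(self, who, sql):
        self.log.append(f"{who}: {sql} [in_transaction={self.in_transaction}]")

    def commit(self, who):
        self.in_transaction = False
        self.log.append(f"{who}: COMMIT")


async def unsafe_writer(conn):
    conn.begin("writer")
    conn.execute("writer", "UPDATE 1")
    await asyncio.sleep(0)  # context switch inside the transaction
    conn.execute("writer", "UPDATE 2")
    conn.commit("writer")


async def reader(conn):
    conn.execute("reader", "SELECT")


async def main():
    conn = SharedConnection()
    await asyncio.gather(unsafe_writer(conn), reader(conn))
    return conn.log


log = asyncio.run(main())
```

Removing the await between BEGIN and COMMIT keeps the writer's statements contiguous, which is the property the advice above asks for.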

#7 Updated by daviddavis 10 months ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 66

#8 Updated by daviddavis 10 months ago

  • Status changed from NEW to POST

#9 Updated by daviddavis 10 months ago

  • Status changed from POST to MODIFIED

#10 Updated by daviddavis 9 months ago

  • Assignee set to daviddavis

#12 Updated by bmbouter 9 months ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

#13 Updated by bmbouter 9 months ago

  • Sprint/Milestone set to 3.1.1
