Issue #6045

closed

Pulp content app loses database connection

Added by osapryki almost 5 years ago. Updated almost 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags:
Sprint: Sprint 66
Quarter:

Description

After pulp-content-app has been running for a long time, all requests begin failing with the following error:

Traceback (most recent call last):
  File "/venv/lib64/python3.6/site-packages/aiohttp/web_protocol.py", line 418, in start
    resp = await task
  File "/venv/lib64/python3.6/site-packages/aiohttp/web_app.py", line 458, in _handle
    resp = await handler(request)
  File "/venv/lib64/python3.6/site-packages/pulpcore/content/handler.py", line 117, in stream_content
    return await self._match_and_stream(path, request)
  File "/venv/lib64/python3.6/site-packages/pulpcore/content/handler.py", line 309, in _match_and_stream
    distro = self._match_distribution(path)
  File "/venv/lib64/python3.6/site-packages/pulpcore/content/handler.py", line 158, in _match_distribution
    return BaseDistribution.objects.get(base_path__in=base_paths).cast()
  File "/venv/lib64/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/venv/lib64/python3.6/site-packages/django/db/models/query.py", line 402, in get
    num = len(clone)
  File "/venv/lib64/python3.6/site-packages/django/db/models/query.py", line 256, in __len__
    self._fetch_all()
  File "/venv/lib64/python3.6/site-packages/django/db/models/query.py", line 1242, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/venv/lib64/python3.6/site-packages/django/db/models/query.py", line 55, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/venv/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 1131, in execute_sql
    cursor = self.connection.cursor()
  File "/venv/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 256, in cursor
    return self._cursor()
  File "/venv/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 235, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/venv/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/venv/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 235, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "/venv/lib64/python3.6/site-packages/django/db/backends/postgresql/base.py", line 223, in create_cursor
    cursor = self.connection.cursor()
django.db.utils.InterfaceError: connection already closed

A possible reason is that pulp-content-app does not re-establish the database connection after it closes.


Related issues

Related to Pulp - Issue #9276: Content app can have unusable/closed db connections in pulpcore 3.15/3.16 (CLOSED - CURRENTRELEASE, dkliban@redhat.com)
Actions #1

Updated by bmbouter almost 5 years ago

Thank you for reporting this. I expect Django to re-establish the connection; that is my reading of its management docs.

Also, how can I reproduce this issue? Any pointers?

Actions #3

Updated by bmbouter almost 5 years ago

This error suggests the connection is being closed from the PostgreSQL side. I asked about the issue in their channel, and they indicated that PostgreSQL is not closing the connection. They suggested that a firewall in the middle (OpenShift, perhaps?) is dropping the idle connection.

Was there anything in the PostgreSQL logs indicating that it closed the connection, or that Django (or perhaps the firewall) did?

Actions #4

Updated by bmbouter almost 5 years ago

I had two ideas.

One: we could check the PostgreSQL connection in gunicorn's pre_request hook (http://docs.gunicorn.org/en/latest/settings.html#pre-request), but that would add latency to every request.

Two: I looked for some sort of process recycling in gunicorn (which runs the content app), but I didn't find any.

Actions #5

Updated by ironfroggy almost 5 years ago

bmbouter, maybe you want gunicorn's max_requests setting, which restarts a worker after it has handled N requests?

https://docs.gunicorn.org/en/stable/settings.html#max-requests
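For illustration, a gunicorn configuration file using this setting might look like the sketch below. The file name and the numbers are assumptions chosen for the example, not values from this issue; `max_requests` and `max_requests_jitter` are the documented gunicorn setting names.

```python
# gunicorn.conf.py (hypothetical file name; values are illustrative only)

# Restart a worker after it has handled this many requests.
max_requests = 1000

# Add a random offset (0..50) per worker so that workers do not all
# restart at the same moment.
max_requests_jitter = 50
```

Recycling workers this way only bounds how long a broken connection can linger; it does not detect a connection that dies before the request limit is reached.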

Actions #6

Updated by osapryki almost 5 years ago

  • Description updated (diff)

@ironfroggy Limiting the number of requests per worker won't help, because the connection can terminate at any time within that limit.

RCA: Django manages connections implicitly. It registers signal handlers that deal with dead or expired connections before and after each request [1]. The handler calls the close_if_unusable_or_obsolete method [2], which closes the connection if CONN_MAX_AGE has been exceeded or if the is_usable() [3] check fails. The content app serves requests through aiohttp rather than through Django's request cycle, so these signals never fire and the check never runs.

[1] https://github.com/django/django/blob/stable/2.2.x/django/db/__init__.py#L60

[2] https://github.com/django/django/blob/stable/2.2.x/django/db/backends/base/base.py#L492

[3] https://github.com/django/django/blob/stable/2.2.x/django/db/backends/postgresql/base.py#L249
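The check described above can be modelled roughly as follows. This is a simplified stand-in written for this thread, not Django's actual code: the class name `FakeConnection`, the `server_alive` flag, and the injectable `clock` are all assumptions made so the sketch runs without a database.

```python
import time


class FakeConnection:
    """Simplified model of a database wrapper; NOT Django's real class."""

    def __init__(self, max_age=None, clock=time.monotonic):
        self.clock = clock
        self.opened = True
        self.errors_occurred = False
        self.server_alive = True  # hypothetical flag standing in for the server
        # max_age=None means "never expire", like CONN_MAX_AGE = None.
        self.close_at = None if max_age is None else clock() + max_age

    def is_usable(self):
        # A real backend would run a trivial query (e.g. SELECT 1) here.
        return self.server_alive

    def close(self):
        self.opened = False

    def close_if_unusable_or_obsolete(self):
        if not self.opened:
            return
        # After an error, probe the connection and drop it if it is dead.
        if self.errors_occurred:
            if self.is_usable():
                self.errors_occurred = False
            else:
                self.close()
                return
        # Drop the connection once it has outlived its maximum age.
        if self.close_at is not None and self.clock() >= self.close_at:
            self.close()


# Case 1: the connection expires after max_age seconds.
now = [0.0]
aged = FakeConnection(max_age=60, clock=lambda: now[0])
aged.close_if_unusable_or_obsolete()
open_while_fresh = aged.opened
now[0] = 61.0
aged.close_if_unusable_or_obsolete()
closed_after_max_age = not aged.opened

# Case 2: the connection died and an error was recorded.
dead = FakeConnection(max_age=None, clock=lambda: now[0])
dead.server_alive = False
dead.errors_occurred = True
dead.close_if_unusable_or_obsolete()
closed_when_unusable = not dead.opened
```

The point of the model is that nothing happens unless something calls `close_if_unusable_or_obsolete()`. A stale connection stays in place until the next query fails on it.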

Since the content app is an asyncio application that uses the Django connection, all coroutines share a single connection. A possible solution is to close the connection when the request handler returns its response, or after a significant batch of database queries; Django then opens a fresh connection on the next query.

from django.db import connection

# Put this after any significant block of database work, or at the beginning or end of the request handler.
connection.close()
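One way to apply this at the end of every handler is a small decorator. The sketch below is only an illustration: `closes_db_connection` and the `close` parameter are hypothetical names, and a list-appending lambda stands in for `django.db.connection.close` so the example runs without Django.

```python
import asyncio
import functools


def closes_db_connection(close):
    """Decorator factory: call close() once the wrapped async handler finishes."""
    def decorator(handler):
        @functools.wraps(handler)
        async def wrapper(*args, **kwargs):
            try:
                return await handler(*args, **kwargs)
            finally:
                # Runs on success and on error alike.
                close()
        return wrapper
    return decorator


# In the content app, `close` would be django.db.connection.close (assumption).
calls = []


@closes_db_connection(close=lambda: calls.append("closed"))
async def stream_content(request):
    return f"response for {request}"


result = asyncio.run(stream_content("/pulp/content/foo"))
```

The `finally` clause ensures the connection is released even when the handler raises.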

Also, because the connection is shared between coroutines, be extremely careful to keep its state consistent across coroutine context switches. For example, make sure the context never switches within a transaction.
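The hazard can be reproduced with a toy model. In the sketch below, `SharedConnection` is a hypothetical stand-in for the shared database connection (it only records calls), and `asyncio.sleep(0)` forces the context switch that a real `await` inside a transaction would allow.

```python
import asyncio


class SharedConnection:
    """Hypothetical stand-in for one DB connection shared by all coroutines."""

    def __init__(self):
        self.in_transaction = False
        self.log = []

    def begin(self, who):
        self.in_transaction = True
        self.log.append(f"{who}: BEGIN")

    def execute(self, who, sql):
        scope = "inside" if self.in_transaction else "outside"
        self.log.append(f"{who}: {sql} ({scope} a transaction)")

    def commit(self, who):
        self.in_transaction = False
        self.log.append(f"{who}: COMMIT")


async def unsafe_writer(conn):
    conn.begin("writer")
    await asyncio.sleep(0)  # context switch while the transaction is open
    conn.execute("writer", "UPDATE")
    conn.commit("writer")


async def reader(conn):
    # Expects to run outside any transaction, but shares the connection.
    conn.execute("reader", "SELECT")


async def main():
    conn = SharedConnection()
    await asyncio.gather(unsafe_writer(conn), reader(conn))
    return conn.log


log = asyncio.run(main())
for entry in log:
    print(entry)
```

The reader's query lands inside the writer's open transaction because the writer yielded control between BEGIN and COMMIT. Keeping all database work for one transaction in a single stretch of code with no `await` in it avoids the interleaving.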

Actions #7

Updated by daviddavis almost 5 years ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 66
Actions #8

Updated by daviddavis almost 5 years ago

  • Status changed from NEW to POST

Added by daviddavis almost 5 years ago

Revision 5aabb202 | View on GitHub

Fix "connection already closed" error in content app

fixes #6045 https://pulp.plan.io/issues/6045

Actions #9

Updated by daviddavis almost 5 years ago

  • Status changed from POST to MODIFIED
Actions #10

Updated by daviddavis almost 5 years ago

  • Assignee set to daviddavis

Added by bmbouter almost 5 years ago

Revision 6fe74ee2 | View on GitHub

Add bugfix changelog entry for 6045

https://pulp.plan.io/issues/6045 re #6045

Added by daviddavis almost 5 years ago

Revision 3da5c467 | View on GitHub

Fix "connection already closed" error in content app

fixes #6045 https://pulp.plan.io/issues/6045

(cherry picked from commit 5aabb202dd59cbe2d30ef5ad91f01932c9ca041b)

Added by bmbouter almost 5 years ago

Revision 3293134e | View on GitHub

Add bugfix changelog entry for 6045

https://pulp.plan.io/issues/6045 re #6045

(cherry picked from commit 6fe74ee2978732c72dcfdfa8a6eeed5760a4dee7)

Actions #12

Updated by bmbouter almost 5 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Actions #13

Updated by bmbouter almost 5 years ago

  • Sprint/Milestone set to 3.1.1
Actions #14

Updated by ttereshc about 3 years ago

  • Related to Issue #9276: Content app can have unusable/closed db connections in pulpcore 3.15/3.16 added
