Project

Profile

Help

Refactor #6529

closed

Pulp 3 may be using Django in an unsafe way (with async)

Added by dalley almost 4 years ago. Updated about 2 years ago.

Status:
CLOSED - DUPLICATE
Priority:
Normal
Assignee:
-
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
% Done:

0%

Estimated time:
Platform Release:
Groomed:
No
Sprint Candidate:
No
Tags:
Sprint:
Quarter:

Description

Ticket moved to GitHub: "pulp/pulpcore/1883":https://github.com/pulp/pulpcore/issues/1883


After reading this page [0] I realized that what we are doing with using the Django ORM inside of async code is potentially very unsafe.

[0] https://docs.djangoproject.com/en/3.0/topics/async/

I subsequently had this discussion with one of the developers in #django

<dalley>    hi - so this is kind of a low-level question about Django internals
<dalley>    We're using the Django ORM in asynchronous code with Django 2.2.  The async feature page of Django 3.0 talks at length about how this is unsafe due to shared global state that isn't coroutine aware
<ubernostrum>   It is true that the ORM is not async-safe, yes.
<dalley>    but we've never had any problems with this with semi-extensive use. under what circumstances would that cause problems?
<ubernostrum>   There's a debate to be had about whether an async ORM is even a good thing to have :):
<ubernostrum>   Anyway, the concern is that asynchronous code has, effectively, all the problems of threaded code -- it may suspend and re-enter at completely unpredictable times.
<ubernostrum>   And since the ORM, currently, depends on global shared state in a non-async-safe way, that means it's possible (though not guaranteed) that the global shared state could be modified when your code isn't expecting it to be.
<ubernostrum>   It's not a promise that you'll have bugs, just a promise that you might have bugs and if you do they'll be the very worst sort to try to reproduce and diagnose.
<dalley>    ubernostrum, so, how pervasive are those issues?  is it possible that we're not doing anything that would trigger them, or is that exceedingly unlikely and we just haven't noticed yet?
<ubernostrum>   dalley: it's not a matter of "pervasive" or not; that's a thing you'd have to try to figure out for a given individual code base.
<ubernostrum>   There's no way to say it is universally X% safe or unsafe.
<ubernostrum>   That's one of the pernicious things about bugs in asynchronous code.
<ubernostrum>   Maybe everything you're doing is perfectly fine and won't cause problems. Maybe it's only mostly fine and will lead to an impossible-to-diagnose bug a month from now. We can't tell you in advance which it is.
<dalley>    are there a couple of known problematic patterns that we can look for?
<ubernostrum>   Well, for example, what are you actually using async to do?
<dalley>    we have kind of a heavy application with a REST API, not a typical web app.  The purpose is to mirror, manipulate, and host software repositories for stuff like RPM, Debian, Python packages, Docker registry, etc.
<dalley>    We're aware that the DB I/O itself isn't async, but we're using async code because the pipeline behavior was desirable for ingesting "content"
<dalley>    and we do use asyncio for saving the package files
<ubernostrum>   And when you started doing this, did you get a SynchronousOnlyOperation exception and follow the documentation for how to prevent that?
<dalley>    No, but we're on 2.2 - is that error present in 2.2?
<dalley>    we've never hit that exception to my knowledge
<ubernostrum>   Ah.
<ubernostrum>   No, you wouldn't get that on 2.2, because the helpers for identifying unsafe situations and trying to work around them weren't present in 2.2.
<ubernostrum>   You're in completely uncharted territory.
<ubernostrum>   In 3.0, Django can detect some cases of unsafe use of async and will automatically crash to protect you unless you've taken steps to verify and tell Django that what you're doing is safe (or that you accept the risks).
<ubernostrum>   The only analysis the Django developers have done for safe/unsafe async use is in 3.0.
<ubernostrum>   dalley: the Django 3.0 documentation covers this; I suggest reading up on it.
<ubernostrum>   https://docs.djangoproject.com/en/3.0/topics/async/
<ubernostrum>   Officially, Django 2.2 does not support running asynchronously at all; 3.0 supports it in limited ways.
<ubernostrum>   dalley: to resort to an analogy, though, basically you're asking questions like "I heard it might be unsafe to cross the street, is that true?"
<ubernostrum>   And people are going to reply that it's heavily context-dependent. If there's a lot of traffic in the street and you run out without looking, it's very unsafe.
<ubernostrum>   If there's no traffic and you look carefully and use a crossing signal, it's safer.
<ubernostrum>   etc.
<ubernostrum>   Except in this case the street is known for sometimes also having lightning strikes out of a clear blue sky.

To address this, we should switch to Django 3.0 and use the newly-provided async helpers such as sync_to_async to make this safe again. There is the potential that if we do this we could get a performance improvement, as it emulates async by spawning a new thread for the I/O. All of the normal caveats about introducing threading apply here, though. If possible we should prefer wrapping the individual database calls as seen here [1] to sidestep that problem.

[1] https://django.readthedocs.io/en/latest/topics/async.html

Actions #1

Updated by dalley almost 4 years ago

  • Description updated (diff)
Actions #2

Updated by dalley almost 4 years ago

  • Subject changed from Pulp 3 is using Django in an unsafe way with async to Pulp 3 may be using Django in an unsafe way (with async)
Actions #3

Updated by dalley almost 3 years ago

  • Related to Task #8488: [EPIC] Upgrade to django 3.2 added
Actions #4

Updated by bmbouter almost 3 years ago

  • Related to deleted (Task #8488: [EPIC] Upgrade to django 3.2)
Actions #5

Updated by bmbouter almost 3 years ago

  • Parent issue set to #8488
Actions #6

Updated by dalley almost 3 years ago

  • Parent issue deleted (#8488)
Actions #7

Updated by pulpbot about 2 years ago

  • Description updated (diff)
  • Status changed from NEW to CLOSED - DUPLICATE

Also available in: Atom PDF