Issue #8708
closedIssue #8912: [EPIC] Issues with the traditional tasking system
Tasking: Uploading many advisories in a row can have intermittent failures
Description
Uploading ~350 advisories at the same time, and 'a few' of them fail with tracebacks like the following:
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[59340]: pulp [None]: rq.worker:INFO: resource-manager: 69556e0b-19a6-4de8-98c2-e3731333b2bc
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com gunicorn[59342]: pulp [d620ef055ae940a894615c1c7989da44]: 127.0.0.1 - admin [05/May/2021:20:41:00 +0000] "POST /pulp/api/v3/content/rpm/adviso
ries/ HTTP/1.0" 202 67 "-" "HTTPie/2.4.0"
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: pulp [d620ef055ae940a894615c1c7989da44]: rq.worker:ERROR: Traceback (most recent call last):
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/rq/worker.py", line 1008, in perform_job
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: rv = job.perform()
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/rq/job.py", line 706, in perform
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: self._result = self._execute()
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/rq/job.py", line 729, in _execute
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: result = self.func(*self.args, **self.kwargs)
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: File "/home/vagrant/devel/pulpcore/pulpcore/tasking/tasks.py", line 144, in _queue_reserved_task
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: reservation = worker.reservations.get(resource=resource)
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/models/manager.py", line 82, in manager_method
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: return getattr(self.get_queryset(), name)(*args, **kwargs)
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/models/query.py", line 406, in get
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: raise self.model.DoesNotExist(
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: pulpcore.app.models.task.ReservedResource.DoesNotExist: ReservedResource matching query does not exist.
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: Traceback (most recent call last):
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/rq/worker.py", line 1008, in perform_job
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: rv = job.perform()
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/rq/job.py", line 706, in perform
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: self._result = self._execute()
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/rq/job.py", line 729, in _execute
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: result = self.func(*self.args, **self.kwargs)
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: File "/home/vagrant/devel/pulpcore/pulpcore/tasking/tasks.py", line 144, in _queue_reserved_task
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: reservation = worker.reservations.get(resource=resource)
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/models/manager.py", line 82, in manager_method
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: return getattr(self.get_queryset(), name)(*args, **kwargs)
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: File "/usr/local/lib/pulp/lib64/python3.9/site-packages/django/db/models/query.py", line 406, in get
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: raise self.model.DoesNotExist(
May 05 20:41:00 pulp3-source-fedora33.padre-fedora.example.com rq[62301]: pulpcore.app.models.task.ReservedResource.DoesNotExist: ReservedResource matching query does not exist.
Using the advisories in this repo: https://github.com/ggainey/pulp_startup/tree/main/lero_tests/tmp/bla
And this to do the uploads:
cd lero_tests/tmp/bla
REPO_HREF=`pulp rpm repository create --name bar | jq -r '.pulp_href'`
for x in *; do http --form POST :/pulp/api/v3/content/rpm/advisories/ file@./$x repository=$REPO_HREF; done
Each advisory-upload is its own task. Occasionally, in the logs, you'll see the traceback above. Also, check the count of advisories in the repository - it should be 356, and will occasionally be less:
pulp rpm repository version show --repository bar | jq '.content_summary.present."rpm.advisory".count'
352
NOTE: uploading an advisory, that already exists, will not only fail - it will make it impossible to upload further advisories, until https://pulp.plan.io/issues/8683 gets fixed. Addressing this issue probably can/needs-to wait until after 8683 is fixed/merged.
Related issues
Close a race condition with ReservedResource
During a small window between checking it's existance and looking up the ReservedResource, this object may be deleted in another thread. This condition will lead to task remaining in waiting state forever.
fixes #8708 https://pulp.plan.io/issues/8708