Issue #3021
Database writes are not all recorded
Status: closed
Description
Against the most recent 3.0-dev, along with the latest checkout of pulp_example (from dkliban's PR), I ran the following commands to perform a basic sync using the asyncio downloaders:
http POST http://127.0.0.1:8000/api/v3/repositories/ name=example-repo notes:={} scratchpad:={}
http POST http://127.0.0.1:8000/api/v3/repositories/example-repo/importers/example-asyncio/ name=example-importer download_policy='immediate' repository='http://127.0.0.1:8000/api/v3/repositories/example-repo/' feed_url='https://repos.fedorapeople.org/pulp/pulp/fixtures/file-mixed/PULP_MANIFEST' sync_mode='mirror'
http POST http://127.0.0.1:8000/api/v3/repositories/example-repo/importers/example-asyncio/example-importer/sync/
The repo above is expected to have 2 units that 404. If you run this sync over and over, most of the time you'll get the following snippet in your task status output:
<snip>
"non_fatal_errors": [
{
"traceback": null,
"description": "404, message='Not Found' for url https://repos.fedorapeople.org/pulp/pulp/fixtures/file-mixed/missing-2.iso",
"code": null
},
{
"traceback": null,
"description": "404, message='Not Found' for url https://repos.fedorapeople.org/pulp/pulp/fixtures/file-mixed/missing-1.iso",
"code": null
}
],
</snip>
However, occasionally only one 404 is recorded, not two, which looks like this:
<snip>
"non_fatal_errors": [
{
"traceback": null,
"description": "404, message='Not Found' for url https://repos.fedorapeople.org/pulp/pulp/fixtures/file-mixed/missing-2.iso",
"code": null
}
],
</snip>
Note that the codepath taken by pulp_example proves that the downloaders are correctly emitting the 404 exception back to the plugin code. What seems to go wrong is the recording of the non-fatal exceptions on the Task in the database.
After some further experimentation, this occurs when the task runtime is very short, e.g.:
"started_at": "2017-09-18T21:48:51.402544Z",
"finished_at": "2017-09-18T21:48:51.886003Z",
The theory is that the task is ending and not flushing all writes to the database before the task is finished.
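The flushing theory can be illustrated with a minimal, purely hypothetical sketch (none of these names are pulpcore code, and later comments show the actual root cause was elsewhere): if a task buffers its non-fatal errors and reports itself finished before flushing, a short-lived run can lose writes, while flushing in the completion handler cannot.

```python
# Hypothetical sketch of the unflushed-writes theory; TaskStatus and
# its methods are illustrative names, not pulpcore's actual model.
class TaskStatus:
    def __init__(self):
        self.non_fatal_errors = []   # the "database" copy
        self._buffer = []            # pending, unflushed writes

    def append_non_fatal_error(self, err):
        # Writes land in a buffer first, like an uncommitted transaction.
        self._buffer.append(err)

    def flush(self):
        self.non_fatal_errors.extend(self._buffer)
        self._buffer.clear()

    def on_success(self):
        # Always flush before the task is reported finished, so no
        # buffered error can be dropped, however short the runtime.
        self.flush()


status = TaskStatus()
status.append_non_fatal_error("404 missing-1.iso")
status.append_non_fatal_error("404 missing-2.iso")
status.on_success()
print(status.non_fatal_errors)  # both errors survive
```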
Updated by ttereshc over 7 years ago
- Triaged changed from No to Yes
- Sprint Candidate changed from No to Yes
Updated by bmbouter over 7 years ago
I think the on_failures() and on_success() handlers should flush the database connection. These should always run after the task code has completed unless there is a SIGKILL. https://github.com/pulp/pulp/blob/3.0-dev/platform/pulpcore/tasking/tasks.py#L274-L315
Updated by dkliban@redhat.com about 7 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to dkliban@redhat.com
Updated by dkliban@redhat.com about 7 years ago
This seems like a bug in the asyncio downloaders. The following traceback is emitted when an exception is not recorded:
[2017-10-02 18:04:33,899: ERROR/ForkPoolWorker-1] Task exception was never retrieved
future: <Task finished coro=<attach_url_to_exception.<locals>.wrapper() done, defined at /home/vagrant/devel/pulp/plugin/pulpcore/plugin/download/asyncio/base.py:45> exception=ClientResponseError("404, message='Not Found'",)>
Traceback (most recent call last):
File "/usr/lib64/python3.5/asyncio/tasks.py", line 240, in _step
result = coro.send(None)
File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/download/asyncio/base.py", line 50, in wrapper
raise error
File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/download/asyncio/base.py", line 47, in wrapper
return await func(downloader)
File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/download/asyncio/http.py", line 129, in run
response.raise_for_status()
File "/home/vagrant/.virtualenvs/pulp/lib64/python3.5/site-packages/aiohttp/client_reqrep.py", line 653, in raise_for_status
headers=self.headers)
aiohttp.client_exceptions.ClientResponseError: 404, message='Not Found'
I am investigating further and will post back when I know more about the nature of this problem.
Updated by dkliban@redhat.com about 7 years ago
This problem is described here [0], though it looks like we are already chaining the coroutines.
[0] https://docs.python.org/3.5/library/asyncio-dev.html#detect-exceptions-never-consumed
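The behavior the linked docs describe can be reproduced in a few lines (`fetch` here is a stand-in, not pulpcore code; the era's Python 3.5 would drive the loop with `get_event_loop().run_until_complete`): a scheduled task whose result is never retrieved triggers the "Task exception was never retrieved" log, and calling `result()` or `exception()` on the finished task is what consumes it.

```python
import asyncio


async def fetch():
    # Stand-in for a downloader coroutine that hits a 404.
    raise ValueError("404, message='Not Found'")


async def main():
    # Scheduling a task and never awaiting it leaves its exception
    # unconsumed; the loop logs "Task exception was never retrieved"
    # when the task is garbage-collected.
    task = asyncio.ensure_future(fetch())
    await asyncio.sleep(0)  # let the task run (and fail)
    # Retrieving the result re-raises and thereby consumes the exception.
    try:
        task.result()
    except ValueError as exc:
        return str(exc)


msg = asyncio.run(main())
print(msg)
```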
Added by dkliban@redhat.com about 7 years ago
Revision ca975fba | View on GitHub
Problem: GroupDownloader skips some downloads
Solution: process all finished downloads before checking if any Group is done
Updated by dkliban@redhat.com about 7 years ago
- Status changed from ASSIGNED to POST
The GroupDownloader was returning too soon when more than one download completed at the same time.
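The shape of the fix can be sketched in miniature (illustrative names, not pulpcore's actual GroupDownloader): `asyncio.wait()` can hand back several finished tasks in one iteration, and every one of them must be processed before checking whether the group is done, both so no completed download is skipped and so each exception is retrieved.

```python
import asyncio


async def download(url, ok):
    # Stand-in for a single file download; a 404 raises an exception.
    await asyncio.sleep(0)
    if not ok:
        raise RuntimeError("404 for " + url)
    return url


async def run_group(coros):
    pending = {asyncio.ensure_future(c) for c in coros}
    errors = []
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        # asyncio.wait() may return MORE than one finished task per
        # iteration; iterate over all of them instead of returning
        # after the first, so no 404 is lost.
        for task in done:
            exc = task.exception()  # also consumes the exception
            if exc is not None:
                errors.append(str(exc))
    return errors


errors = asyncio.run(run_group([
    download("1.iso", True),
    download("missing-1.iso", False),
    download("missing-2.iso", False),
]))
print(sorted(errors))  # both 404s recorded
```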
Updated by dkliban@redhat.com about 7 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulp|ca975fba83b63574773a43b33d488c560ecd9d82.
Updated by bmbouter about 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE