Project

Profile

Help

Issue #3021

closed

Database writes are not all recorded

Added by bmbouter over 7 years ago. Updated about 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Category:
-
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity:
3. High
Version:
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
Yes
Tags:
Sprint:
Sprint 26
Quarter:

Description

Against the most recent 3.0-dev along with a latest checkout of pulp_example (from dkliban's PR) I run the following commands to perform a basic sync using the asyncio downloaders:

http POST http://127.0.0.1:8000/api/v3/repositories/ name=example-repo notes:={} scratchpad:={}
http POST http://127.0.0.1:8000/api/v3/repositories/example-repo/importers/example-asyncio/ name=example-importer download_policy='immediate' repository='http://127.0.0.1:800/api/v3/repositories/example-repo/' feed_url='https://repos.fedorapeople.org/pulp/pulp/fixtures/file-mixed/PULP_MANIFEST' sync_mode='mirror'
http POST http://127.0.0.1:8000/api/v3/repositories/example-repo/importers/example-asyncio/example-importer/sync/

The repo above is expected to have 2 units which 404. If you run this repo over and over, most of the time you'll get the following snippet in your task status output:

<snip>
    "non_fatal_errors": [
        {
            "traceback": null,
            "description": "404, message='Not Found' for url https://repos.fedorapeople.org/pulp/pulp/fixtures/file-mixed/missing-2.iso",
            "code": null
        },
        {
            "traceback": null,
            "description": "404, message='Not Found' for url https://repos.fedorapeople.org/pulp/pulp/fixtures/file-mixed/missing-1.iso",
            "code": null
        }
    ],
</snip>

However occasionally you'll get only one 404 recorded, not two, which looks like this:

<snip>
    "non_fatal_errors": [
        {
            "traceback": null,
            "description": "404, message='Not Found' for url https://repos.fedorapeople.org/pulp/pulp/fixtures/file-mixed/missing-2.iso",
            "code": null
        }
    ],
</snip>

Note that the codepath taken by pulp_example proves that the downloaders are correctly emitting the 404 exception back to the plugin code. The thing that seems to not be happening correctly is the recording of the non-fatal exceptions on the Task in the database.

After some further experimentation this occurs when the task runtime is very short with runtimes like:

    "started_at": "2017-09-18T21:48:51.402544Z",
    "finished_at": "2017-09-18T21:48:51.886003Z",

The theory is that the task is ending and not flushing all writes to the database before the task is finished.

Actions #1

Updated by ttereshc over 7 years ago

  • Triaged changed from No to Yes
  • Sprint Candidate changed from No to Yes
Actions #2

Updated by mhrivnak over 7 years ago

  • Sprint/Milestone set to 45
Actions #3

Updated by bmbouter over 7 years ago

I think the on_failures() and on_success() handlers should flush the database connection. These should always run after the task code has completed unless there is a SIGKILL. https://github.com/pulp/pulp/blob/3.0-dev/platform/pulpcore/tasking/tasks.py#L274-L315

Actions #4

Updated by dkliban@redhat.com about 7 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dkliban@redhat.com
Actions #5

Updated by dkliban@redhat.com about 7 years ago

This seems like a bug with asyncio downloaders. The following traceback is emitted when an exception is not recorded:

[2017-10-02 18:04:33,899: ERROR/ForkPoolWorker-1] Task exception was never retrieved
future: <Task finished coro=<attach_url_to_exception.<locals>.wrapper() done, defined at /home/vagrant/devel/pulp/plugin/pulpcore/plugin/download/asyncio/base.py:45> exception=ClientResponseError("404, message='Not Found'",)>
Traceback (most recent call last):
  File "/usr/lib64/python3.5/asyncio/tasks.py", line 240, in _step
    result = coro.send(None)
  File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/download/asyncio/base.py", line 50, in wrapper
    raise error 
  File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/download/asyncio/base.py", line 47, in wrapper
    return await func(downloader)
  File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/download/asyncio/http.py", line 129, in run
    response.raise_for_status()
  File "/home/vagrant/.virtualenvs/pulp/lib64/python3.5/site-packages/aiohttp/client_reqrep.py", line 653, in raise_for_status
    headers=self.headers) 
aiohttp.client_exceptions.ClientResponseError: 404, message='Not Found'

I am investigating further and will post back when I know more about the nature of this problem.

Actions #6

Updated by dkliban@redhat.com about 7 years ago

This problem is described here[0]. Though it looks like we are already chaining the coroutines.

[0] https://docs.python.org/3.5/library/asyncio-dev.html#detect-exceptions-never-consumed

Added by dkliban@redhat.com about 7 years ago

Revision ca975fba | View on GitHub

Problem: GroupDownloader skips some downloads

Solution: process all finished downloads before checking if any Group is done

closes #3021 https://pulp.plan.io/issues/3021

Added by dkliban@redhat.com about 7 years ago

Revision ca975fba | View on GitHub

Problem: GroupDownloader skips some downloads

Solution: process all finished downloads before checking if any Group is done

closes #3021 https://pulp.plan.io/issues/3021

Actions #7

Updated by dkliban@redhat.com about 7 years ago

  • Status changed from ASSIGNED to POST

The GroupDownloader was returning too soon when more than 1 download was complete.

https://github.com/pulp/pulp/pull/3181

Actions #8

Updated by dkliban@redhat.com about 7 years ago

  • Status changed from POST to MODIFIED
Actions #9

Updated by bmbouter almost 7 years ago

  • Sprint set to Sprint 26
Actions #10

Updated by bmbouter almost 7 years ago

  • Sprint/Milestone deleted (45)
Actions #11

Updated by daviddavis over 5 years ago

  • Sprint/Milestone set to 3.0.0
Actions #12

Updated by bmbouter over 5 years ago

  • Tags deleted (Pulp 3)
Actions #13

Updated by bmbouter about 5 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

Also available in: Atom PDF