Project

Profile

Help

Issue #3021

Database writes are not all recorded

Added by bmbouter about 2 years ago. Updated 6 months ago.

Status:
MODIFIED
Priority:
Normal
Category:
-
Sprint/Milestone:
Start date:
Due date:
Severity:
3. High
Version:
Platform Release:
Blocks Release:
OS:
Backwards Incompatible:
No
Triaged:
Yes
Groomed:
No
Sprint Candidate:
Yes
Tags:
QA Contact:
Complexity:
Smash Test:
Verified:
No
Verification Required:
No
Sprint:
Sprint 26

Description

Against the most recent 3.0-dev along with a latest checkout of pulp_example (from dkliban's PR) I run the following commands to perform a basic sync using the asyncio downloaders:

http POST http://127.0.0.1:8000/api/v3/repositories/ name=example-repo notes:={} scratchpad:={}
http POST http://127.0.0.1:8000/api/v3/repositories/example-repo/importers/example-asyncio/ name=example-importer download_policy='immediate' repository='http://127.0.0.1:800/api/v3/repositories/example-repo/' feed_url='https://repos.fedorapeople.org/pulp/pulp/fixtures/file-mixed/PULP_MANIFEST' sync_mode='mirror'
http POST http://127.0.0.1:8000/api/v3/repositories/example-repo/importers/example-asyncio/example-importer/sync/

The repo above is expected to have 2 units which 404. If you run this repo over and over, most of the time you'll get the following snippet in your task status output:

<snip>
    "non_fatal_errors": [
        {
            "traceback": null,
            "description": "404, message='Not Found' for url https://repos.fedorapeople.org/pulp/pulp/fixtures/file-mixed/missing-2.iso",
            "code": null
        },
        {
            "traceback": null,
            "description": "404, message='Not Found' for url https://repos.fedorapeople.org/pulp/pulp/fixtures/file-mixed/missing-1.iso",
            "code": null
        }
    ],
</snip>

However occasionally you'll get only one 404 recorded, not two, which looks like this:

<snip>
    "non_fatal_errors": [
        {
            "traceback": null,
            "description": "404, message='Not Found' for url https://repos.fedorapeople.org/pulp/pulp/fixtures/file-mixed/missing-2.iso",
            "code": null
        }
    ],
</snip>

Note that the codepath taken by pulp_example proves that the downloaders are correctly emitting the 404 exception back to the plugin code. The thing that seems to not be happening correctly is the recording of the non-fatal exceptions on the Task in the database.

After some further experimentation this occurs when the task runtime is very short with runtimes like:

    "started_at": "2017-09-18T21:48:51.402544Z",
    "finished_at": "2017-09-18T21:48:51.886003Z",

The theory is that the task is ending and not flushing all writes to the database before the task is finished.

Associated revisions

Revision ca975fba View on GitHub
Added by dkliban@redhat.com about 2 years ago

Problem: GroupDownloader skips some downloads

Solution: process all finished downloads before checking if any Group is done

closes #3021
https://pulp.plan.io/issues/3021

Revision ca975fba View on GitHub
Added by dkliban@redhat.com about 2 years ago

Problem: GroupDownloader skips some downloads

Solution: process all finished downloads before checking if any Group is done

closes #3021
https://pulp.plan.io/issues/3021

Revision ca975fba View on GitHub
Added by dkliban@redhat.com about 2 years ago

Problem: GroupDownloader skips some downloads

Solution: process all finished downloads before checking if any Group is done

closes #3021
https://pulp.plan.io/issues/3021

History

#1 Updated by ttereshc about 2 years ago

  • Triaged changed from No to Yes
  • Sprint Candidate changed from No to Yes

#2 Updated by mhrivnak about 2 years ago

  • Sprint/Milestone set to 45

#3 Updated by bmbouter about 2 years ago

I think the on_failures() and on_success() handlers should flush the database connection. These should always run after the task code has completed unless there is a SIGKILL. https://github.com/pulp/pulp/blob/3.0-dev/platform/pulpcore/tasking/tasks.py#L274-L315

#4 Updated by dkliban@redhat.com about 2 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to dkliban@redhat.com

#5 Updated by dkliban@redhat.com about 2 years ago

This seems like a bug with asyncio downloaders. The following traceback is emitted when an exception is not recorded:

[2017-10-02 18:04:33,899: ERROR/ForkPoolWorker-1] Task exception was never retrieved
future: <Task finished coro=<attach_url_to_exception.<locals>.wrapper() done, defined at /home/vagrant/devel/pulp/plugin/pulpcore/plugin/download/asyncio/base.py:45> exception=ClientResponseError("404, message='Not Found'",)>
Traceback (most recent call last):
  File "/usr/lib64/python3.5/asyncio/tasks.py", line 240, in _step
    result = coro.send(None)
  File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/download/asyncio/base.py", line 50, in wrapper
    raise error 
  File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/download/asyncio/base.py", line 47, in wrapper
    return await func(downloader)
  File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/download/asyncio/http.py", line 129, in run
    response.raise_for_status()
  File "/home/vagrant/.virtualenvs/pulp/lib64/python3.5/site-packages/aiohttp/client_reqrep.py", line 653, in raise_for_status
    headers=self.headers) 
aiohttp.client_exceptions.ClientResponseError: 404, message='Not Found'

I am investigating further and will post back when I know more about the nature of this problem.

#6 Updated by dkliban@redhat.com about 2 years ago

This problem is described here0. Though it looks like we are already chaining the coroutines.

[0] https://docs.python.org/3.5/library/asyncio-dev.html#detect-exceptions-never-consumed

#7 Updated by dkliban@redhat.com about 2 years ago

  • Status changed from ASSIGNED to POST

The GroupDownloader was returning too soon when more than 1 download was complete.

https://github.com/pulp/pulp/pull/3181

#8 Updated by dkliban@redhat.com about 2 years ago

  • Status changed from POST to MODIFIED

#9 Updated by bmbouter over 1 year ago

  • Sprint set to Sprint 26

#10 Updated by bmbouter over 1 year ago

  • Sprint/Milestone deleted (45)

#11 Updated by daviddavis 6 months ago

  • Sprint/Milestone set to 3.0

#12 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3)

Please register to edit this issue

Also available in: Atom PDF