Issue #2551
closed
Pulp task error messages should be more informative
Status:
CLOSED - CURRENTRELEASE
Description
While syncing a repo from Katello, an error was reported:
PLP0000: The task status 23a81cbf-547b-4c16-9eba-0bf478399da1 exited immediately for some reason. Marking as errored. Check the logs for more details
/var/log/messages had the detailed error:
celery.worker.job:ERROR: (15661-67328) WorkerLostError: Worker exited prematurely: signal 6 (SIGIOT)
pulp-2.11.0-1.el7
This usually means that while Pulp was running, a library it was calling into hit a fatal error, for instance a segfault, an OOM condition, or a memory allocation failure. Some searching of the web indicates that the cause could be a failure to allocate necessary memory, which would be an environmental issue. Is this problem reproducible? I suspect not, but I want to confirm.
It's highly unlikely to be something a Pulp code change would fix, but we can help investigate since you experienced it using Pulp. Since Python is interpreted and a bug in CPython is highly unlikely, it's going to be either an environmental problem or a bug in a third-party library.
As the bug is written now, it couldn't be accepted. Can some reproducer steps be added? Either API calls or pulp-admin commands would do.
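To illustrate the point above about a library aborting underneath Python, here is a minimal sketch (not Pulp code) of how a parent process sees a child that died from signal 6. It assumes a POSIX system, where SIGIOT is an alias for SIGABRT; `describe_exit` is a hypothetical helper name used only for this example.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Return a human-readable description of a child's exit status."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative returncode.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal %d (%s)" % (signum, name)
    return "exited with status %d" % returncode


if __name__ == "__main__":
    # The child calls abort(), much as a C library might on a fatal internal error.
    child = subprocess.run([sys.executable, "-c", "import os; os.abort()"])
    print(describe_exit(child.returncode))
```

Nothing in the child's Python code raises an exception here, which is why the parent only learns about the failure from the exit status.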
Sorry, I forgot to include the reason why I reported this issue: The generic PLP0000 message is less useful than passing the actual error shown in /var/log/messages up to the user. I was told that Pulp devs wanted to know about the cases of PLP0000 encountered in Katello tooling so that better info could be passed up.
The source of the underlying issue itself is unknown to me at this time. I will file a new issue with details for reproducing that.
Ohhhhh. Yes I see exactly what you mean. That makes sense. My comments above are more pertinent to the issue tracking the root cause. Thanks for clarifying.
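As a rough sketch of what "passing the actual error up" could look like, the snippet below turns the worker error text from the log into a task-level message. This is not Pulp's actual fix; `informative_task_error` is a hypothetical helper, and the regular expression assumes the exact wording Celery logged in this report.

```python
import re

# Matches the text logged in this report, e.g.
# "Worker exited prematurely: signal 6 (SIGIOT)".
_WORKER_LOST = re.compile(r"Worker exited prematurely: signal (\d+) \((\w+)\)")


def informative_task_error(task_id, worker_error_text):
    """Build a user-facing message (hypothetical helper, not a Pulp API)."""
    match = _WORKER_LOST.search(worker_error_text)
    if match is None:
        # Unknown wording: pass the raw worker error through unchanged.
        return "Task %s failed: %s" % (task_id, worker_error_text)
    signum, signame = int(match.group(1)), match.group(2)
    return "Task %s failed: worker process was killed by signal %d (%s)" % (
        task_id, signum, signame)


if __name__ == "__main__":
    print(informative_task_error(
        "23a81cbf-547b-4c16-9eba-0bf478399da1",
        "WorkerLostError: Worker exited prematurely: signal 6 (SIGIOT)"))
```

Falling back to the raw error text keeps the message useful even when the worker's wording differs from the case seen here.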
- Subject changed from PLP0000 error reported for "celery.worker.job:ERROR: (15661-67328) WorkerLostError: Worker exited prematurely: signal 6 (SIGIOT)" to Pulp task error messages should be more informative
- Sprint/Milestone set to 32
- Triaged changed from No to Yes
- Sprint/Milestone changed from 32 to 33
- Status changed from NEW to ASSIGNED
- Assignee set to jortel@redhat.com
- Status changed from ASSIGNED to POST
- Status changed from POST to MODIFIED
- Platform Release set to 2.12.2
- Status changed from MODIFIED to 5
- Status changed from 5 to CLOSED - CURRENTRELEASE
- Sprint/Milestone deleted (33)
Improved logging of worker abnormal termination. closes #2551