Project

Profile

Help

Issue #3656

"TaskLock matching query does not exist" in logs every few seconds

Added by jsherril@redhat.com over 2 years ago. Updated over 1 year ago.

Status:
CLOSED - WONTFIX
Priority:
Normal
Assignee:
-
Category:
-
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Quarter:

Description

May  2 16:41:38 balmora celery[6393]: [2018-05-02 20:41:38,116: ERROR/MainProcess] Error in timer: DoesNotExist('TaskLock matching query does not exist.',)
May  2 16:41:38 balmora celery[6393]: Traceback (most recent call last):
May  2 16:41:38 balmora celery[6393]:  File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/kombu/async/hub.py", line 136, in fire_timers
May  2 16:41:38 balmora celery[6393]:    entry()
May  2 16:41:38 balmora celery[6393]:  File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/kombu/async/timer.py", line 68, in __call__
May  2 16:41:38 balmora celery[6393]:    return self.fun(*self.args, **self.kwargs)
May  2 16:41:38 balmora celery[6393]:  File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/kombu/async/timer.py", line 127, in _reschedules
May  2 16:41:38 balmora celery[6393]:    return fun(*args, **kwargs)
May  2 16:41:38 balmora celery[6393]:  File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/pulpcore/tasking/celery_app.py", line 120, in _on_tick
May  2 16:41:38 balmora celery[6393]:    handle_worker_heartbeat(consumer.hostname)
May  2 16:41:38 balmora celery[6393]:  File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/pulpcore/tasking/services/worker_watcher.py", line 67, in handle_worker_heartbeat
May  2 16:41:38 balmora celery[6393]:    lock = TaskLock.objects.get(name=worker.name)
May  2 16:41:38 balmora celery[6393]:  File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
May  2 16:41:38 balmora celery[6393]:    return getattr(self.get_queryset(), name)(*args, **kwargs)
May  2 16:41:38 balmora celery[6393]:  File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/django/db/models/query.py", line 403, in get
May  2 16:41:38 balmora celery[6393]:    self.model._meta.object_name
May  2 16:41:38 balmora celery[6393]: pulpcore.app.models.task.DoesNotExist: TaskLock matching query does not exist.
May  2 16:41:43 balmora celery[6393]: [2018-05-02 20:41:43,127: ERROR/MainProcess] Error in timer: DoesNotExist('TaskLock matching query does not exist.',)
May  2 16:41:43 balmora celery[6393]: Traceback (most recent call last):
May  2 16:41:43 balmora celery[6393]:  File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/kombu/async/hub.py", line 136, in fire_timers
May  2 16:41:43 balmora celery[6393]:    entry()
May  2 16:41:43 balmora celery[6393]:  File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/kombu/async/timer.py", line 68, in __call__
May  2 16:41:43 balmora celery[6393]:    return self.fun(*self.args, **self.kwargs)
May  2 16:41:43 balmora celery[6393]:  File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/kombu/async/timer.py", line 127, in _reschedules
May  2 16:41:43 balmora celery[6393]:    return fun(*args, **kwargs)
May  2 16:41:43 balmora celery[6393]:  File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/pulpcore/tasking/celery_app.py", line 120, in _on_tick
May  2 16:41:43 balmora celery[6393]:    handle_worker_heartbeat(consumer.hostname)
May  2 16:41:43 balmora celery[6393]:  File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/pulpcore/tasking/services/worker_watcher.py", line 67, in handle_worker_heartbeat

In this case 6393 is the resource manager. Everything seems to be working fine however.

History

#1 Updated by bmbouter over 2 years ago

I've observed this several times in my dev environment. It's happening right now actually. I observe this most commonly when I suspend and resume the vagrant vm I have. Since my VM is currently showing this here are some details about my system.

1. The resource manager is running. I verified via a `ps | grep resource_manager`
2. There are 0 TaskLock records. in my database

^ should not exist. The TaskLock record is created when the resource_manager starts and since it's still running it should not have been deleted. I can see that the reason it got deleted is because Pulp thought the resource manager was missing. I know this because it says:

May 03 13:08:04 pulp3.dev celery[21105]: [2018-05-03 13:08:04,453: ERROR/MainProcess] Worker 'resource_manager@pulp3.dev' has gone missing, removing from list of

^ message was emitted exactly when I reopened my laptop. So what I think is happening is:

1. System is normal
2. suspend VM
3. Wait longer than 30 seconds
4. resume VM
5. observe that Pulp observes the resource_manager is lost and deletes its TaskLock
6. The resource_manager continues heartbeating
7. The handling of each of those heartbeats can't find the record it wants to "update"

Maybe that update should be a get_or_create?
dalley what do you think?

Note that the RQ branch removes TaskLock entirely so maybe we just wait for that to merge and then close this?

#2 Updated by dalley over 2 years ago

  • Tags Pulp 3 MVP added

@jsherrill, are you installing from source or from PyPI?

If it's a source install, we should go ahead and fix this now. If PyPI, I'm fine with waiting until RQ merges.

#3 Updated by dalley over 2 years ago

  • Triaged changed from No to Yes

#4 Updated by bmbouter over 2 years ago

The problematic code was removed with the adoption of RQ. You won't see this issue anymore. Please reopen if you do.

#5 Updated by bmbouter over 2 years ago

  • Status changed from NEW to CLOSED - WORKSFORME

#6 Updated by bmbouter over 2 years ago

  • Status changed from CLOSED - WORKSFORME to CLOSED - WONTFIX

We removed the line that was causing this error with the adoption of RQ. We didn't fix it, so I'm saying WONTFIX

#7 Updated by bmbouter over 1 year ago

  • Tags Pulp 2 added

Please register to edit this issue

Also available in: Atom PDF