Issue #3656
closed"TaskLock matching query does not exist" in logs every few seconds
Description
May 2 16:41:38 balmora celery[6393]: [2018-05-02 20:41:38,116: ERROR/MainProcess] Error in timer: DoesNotExist('TaskLock matching query does not exist.',)
May 2 16:41:38 balmora celery[6393]: Traceback (most recent call last):
May 2 16:41:38 balmora celery[6393]: File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/kombu/async/hub.py", line 136, in fire_timers
May 2 16:41:38 balmora celery[6393]: entry()
May 2 16:41:38 balmora celery[6393]: File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/kombu/async/timer.py", line 68, in __call__
May 2 16:41:38 balmora celery[6393]: return self.fun(*self.args, **self.kwargs)
May 2 16:41:38 balmora celery[6393]: File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/kombu/async/timer.py", line 127, in _reschedules
May 2 16:41:38 balmora celery[6393]: return fun(*args, **kwargs)
May 2 16:41:38 balmora celery[6393]: File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/pulpcore/tasking/celery_app.py", line 120, in _on_tick
May 2 16:41:38 balmora celery[6393]: handle_worker_heartbeat(consumer.hostname)
May 2 16:41:38 balmora celery[6393]: File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/pulpcore/tasking/services/worker_watcher.py", line 67, in handle_worker_heartbeat
May 2 16:41:38 balmora celery[6393]: lock = TaskLock.objects.get(name=worker.name)
May 2 16:41:38 balmora celery[6393]: File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
May 2 16:41:38 balmora celery[6393]: return getattr(self.get_queryset(), name)(*args, **kwargs)
May 2 16:41:38 balmora celery[6393]: File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/django/db/models/query.py", line 403, in get
May 2 16:41:38 balmora celery[6393]: self.model._meta.object_name
May 2 16:41:38 balmora celery[6393]: pulpcore.app.models.task.DoesNotExist: TaskLock matching query does not exist.
May 2 16:41:43 balmora celery[6393]: [2018-05-02 20:41:43,127: ERROR/MainProcess] Error in timer: DoesNotExist('TaskLock matching query does not exist.',)
May 2 16:41:43 balmora celery[6393]: Traceback (most recent call last):
May 2 16:41:43 balmora celery[6393]: File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/kombu/async/hub.py", line 136, in fire_timers
May 2 16:41:43 balmora celery[6393]: entry()
May 2 16:41:43 balmora celery[6393]: File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/kombu/async/timer.py", line 68, in __call__
May 2 16:41:43 balmora celery[6393]: return self.fun(*self.args, **self.kwargs)
May 2 16:41:43 balmora celery[6393]: File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/kombu/async/timer.py", line 127, in _reschedules
May 2 16:41:43 balmora celery[6393]: return fun(*args, **kwargs)
May 2 16:41:43 balmora celery[6393]: File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/pulpcore/tasking/celery_app.py", line 120, in _on_tick
May 2 16:41:43 balmora celery[6393]: handle_worker_heartbeat(consumer.hostname)
May 2 16:41:43 balmora celery[6393]: File "/home/jlsherri/pulpvenv/lib64/python3.6/site-packages/pulpcore/tasking/services/worker_watcher.py", line 67, in handle_worker_heartbeat
In this case 6393 is the resource manager. Everything seems to be working fine however.
Updated by bmbouter over 6 years ago
I've observed this several times in my dev environment. It's happening right now actually. I observe this most commonly when I suspend and resume the vagrant vm I have. Since my VM is currently showing this here are some details about my system.
1. The resource manager is running. I verified via a `ps | grep resource_manager`
2. There are 0 TaskLock records. in my database
^ should not exist. The TaskLock record is created when the resource_manager starts and since it's still running it should not have been deleted. I can see that the reason it got deleted is because Pulp thought the resource manager was missing. I know this because it says:
May 03 13:08:04 pulp3.dev celery[21105]: [2018-05-03 13:08:04,453: ERROR/MainProcess] Worker 'resource_manager@pulp3.dev' has gone missing, removing from list of
^ message was emitted exactly when I reopened my laptop. So what I think is happening is:
1. System is normal
2. suspend VM
3. Wait longer than 30 seconds
4. resume VM
5. observe that Pulp observes the resource_manager is lost and deletes its TaskLock
6. The resource_manager continues heartbeating
7. The handling of each of those heartbeats can't find the record it wants to "update"
Maybe that update should be a get_or_create?
dalley what do you think?
Note that the RQ branch removes TaskLock entirely so maybe we just wait for that to merge and then close this?
Updated by dalley over 6 years ago
- Tags Pulp 3 MVP added
@jsherrill, are you installing from source or from PyPI?
If it's a source install, we should go ahead and fix this now. If PyPI, I'm fine with waiting until RQ merges.
Updated by bmbouter over 6 years ago
The problematic code was removed with the adoption of RQ. You won't see this issue anymore. Please reopen if you do.
Updated by bmbouter over 6 years ago
- Status changed from NEW to CLOSED - WORKSFORME
Updated by bmbouter over 6 years ago
- Status changed from CLOSED - WORKSFORME to CLOSED - WONTFIX
We removed the line that was causing this error with the adoption of RQ. We didn't fix it, so I'm saying WONTFIX