
Issue #736

multiple resource_managers on the same database

Added by jluza almost 6 years ago. Updated almost 2 years ago.

Severity: 2. Medium
Tags: Pulp 2


Consider the following situation:
One server runs celerybeat, the workers, and the resource_manager.
The user wants to run a couple more workers on a second server, but accidentally also starts a resource_manager there.
The database now contains worker[0-x]_srv1 and worker[0-x]_srv2. The user then kills the resource_manager and the workers on srv2.
However, the worker[0-x]_srv2 records remain in the database, because resource_manager_srv1 only takes care of worker[0-x]_srv1.
The workers from srv2 are already dead but are still listed in the database, and Pulp will happily assign tasks to them.
The workaround is to start the resource_manager on srv2 again and wait until it clears the dead workers from the database, or to remove them manually.
If the workers are removed from the database manually, the user also needs to stop all services and then start them again.
Possible ways to prevent this:
- use the resource_manager for assigning tasks as well.
- add a mechanism that prevents running two or more resource_managers against one database.
- have the resource_manager manage all workers in db.workers, not only the ones registered to it.
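The second proposal above could be approached with a lease: a resource_manager may only run while it holds one shared lease record, which it must keep renewing before the lease expires. The following is a hypothetical sketch, not Pulp code. `LeaseTable`, `try_acquire`, `release`, and the holder names are invented for illustration, and an in-memory object stands in for the database. In a real deployment the check-and-set in `try_acquire` would have to be a single atomic database operation; otherwise two managers could both succeed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str        # e.g. "resource_manager@srv1" (hypothetical naming)
    expires_at: float  # timestamp after which the lease is considered abandoned


class LeaseTable:
    """Hypothetical in-memory stand-in for a database collection holding one lease."""

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None

    def try_acquire(self, holder: str, ttl: float, now: float) -> bool:
        # Assumption: in a real database this check-and-set must be atomic.
        lease = self._lease
        if lease is None or lease.expires_at <= now or lease.holder == holder:
            # Free, expired, or already ours: take it (or renew it).
            self._lease = Lease(holder, now + ttl)
            return True
        # Another resource_manager holds a live lease.
        return False

    def release(self, holder: str) -> None:
        # Only the current holder may release the lease.
        if self._lease is not None and self._lease.holder == holder:
            self._lease = None
```

With this scheme, a resource_manager accidentally started on srv2 would fail to acquire the lease and could refuse to start instead of registering itself. If the manager on srv1 dies without releasing the lease, the lease simply expires after `ttl` seconds and another manager can take over.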


#1 Updated by bmbouter almost 6 years ago

  • Status changed from NEW to 7

In the upcoming 2.6.0 release, the tasking system has received significant improvements. See #157 for more details on the expected behavior. It will dispatch tasks to, discover, and monitor workers across any number of machines.

Running two resource managers is not correct, but it should still give mostly correct operation. The resource locking done by the resource_manager code was designed to be concurrent, so we expect it to perform adequately even if two are started by mistake.

The important thing is that when the second resource manager stops, its records are correctly removed. I just did some testing with the upcoming 2.6.0 release and I correctly see:

pulp.server.async.scheduler:ERROR: Workers '' has gone missing, removing from list of workers
pulp.server.async.tasks:ERROR: The worker named is missing. Canceling the tasks in its queue

That is the expected behavior, so I'm going to close this issue for now. Try reproducing the problem with the 2.6 beta; if you can reproduce it there, reopen the issue.
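The cleanup described in this comment depends on noticing that a worker has stopped reporting, regardless of which host or resource_manager registered it. This also covers the third proposal in the original report. A minimal sketch of that idea follows, assuming each worker record carries a last-heartbeat timestamp and a list of queued task ids. `WorkerRecord`, `reap_missing_workers`, and the timeout value are hypothetical and do not represent Pulp's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkerRecord:
    name: str
    last_heartbeat: float  # assumption: workers periodically update this timestamp
    queued_tasks: List[str] = field(default_factory=list)


def reap_missing_workers(
    workers: Dict[str, WorkerRecord], now: float, timeout: float
) -> List[str]:
    """Remove every worker whose heartbeat is older than `timeout`,
    whichever host registered it, and return the ids of the tasks
    that were queued to removed workers so they can be canceled."""
    to_cancel: List[str] = []
    for name in list(workers):  # copy the keys because we delete while iterating
        record = workers[name]
        if now - record.last_heartbeat > timeout:
            to_cancel.extend(record.queued_tasks)
            del workers[name]
    return to_cancel
```

For example, with `worker0@srv1` last seen at t=95 and `worker0@srv2` last seen at t=10 holding tasks `t1` and `t2`, calling `reap_missing_workers(workers, now=100.0, timeout=30.0)` keeps only `worker0@srv1` and returns `["t1", "t2"]` for cancellation. Because the sweep scans every record rather than only the ones a given manager registered, dead srv2 workers are removed even if no manager on srv2 is running.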

#2 Updated by bmbouter almost 6 years ago

  • Status changed from 7 to CLOSED - NOTABUG

#3 Updated by bmbouter almost 6 years ago

  • Severity set to 2. Medium

#4 Updated by bmbouter almost 6 years ago

  • Triaged changed from No to Yes

#5 Updated by bmbouter almost 2 years ago

  • Tags Pulp 2 added
