Story #2014
As a user, Pulp can have smart task distribution
Description
Use case:
=========
- I have dozens of repo groups defined in Pulp and every repo group has several repositories.
- I export all the repo groups via the group export distributor as DVD ISO images on a regular basis.
- The biggest repo group currently contains 21 DVD ISOs (more than 90GB).
- There are ~10 repo groups that have more than 10 DVD ISOs (usually about 14-19 DVD ISOs).
- There is a machine with almost 0.5 TB of work dir space, which is shared between the 8 Pulp workers (8 threads) that run there.
Problem:
========
Pulp often fails with "No space left on device", because it delegates group export tasks to all available workers (threads) on the machine and the available 0.5 TB is not sufficient.
I was told that there is no way to tell Pulp to behave more intelligently, along the lines of "Hey, here you have 8 workers; you can use all of them for regular work, but for group export distribution use at most 4 of them at a time".
Proposed solution:
==================
Let's take inspiration from Koji [1]. It provides several options for load-balancing tasks:
1) Channels
-----------
Every host can be a member of several channels [2].
A host can only take tasks that belong to the channels it is a member of.
2) Task weight and capacity
---------------------------
Every host can work on several tasks at a time. Every task has some weight (createrepo, build, ...) and every host has a capacity that reflects its resources. For example, a host A with capacity 10 can work on 10 FOO tasks with weight 1 at a time, or the same host can work on 2 BAR tasks with weight 5, or any other possible combination.
Thanks to these features, one can easily build and run a reliable infrastructure full of heterogeneous tasks.
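To make the weight/capacity idea concrete, here is a minimal illustrative sketch in Python (not Koji's actual code; all names and numbers are made up for the example):

    # Illustrative sketch of weight/capacity admission, loosely modeled on the
    # Koji behavior described above; names and numbers are made up.
    class Host:
        def __init__(self, name, capacity):
            self.name = name
            self.capacity = capacity   # resources this host offers
            self.load = 0              # sum of weights of the tasks it is running

        def can_take(self, task_weight):
            # A host accepts a task only if its weight fits in the free capacity.
            return self.load + task_weight <= self.capacity

        def start(self, task_weight):
            self.load += task_weight

    # A host with capacity 10 can run ten weight-1 tasks, or two weight-5 tasks.
    host_a = Host("A", capacity=10)
    host_a.start(5)
    host_a.start(5)
    assert not host_a.can_take(1)   # host A is now fully loaded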
Because Pulp tasks are also heterogeneous (it is a big difference whether you generate 20 MB of repodata for one repo or 90 GB of ISOs for one repo group), it should have similar features. I would suggest using Koji at least as an inspiration, because it has been around for a long time and is time-proven.
Regards
Tomas
[1] https://fedoraproject.org/wiki/Koji
[2] http://koji.fedoraproject.org/koji/hostinfo?hostID=154
Updated by bmbouter over 8 years ago
- Subject changed from [RFE] Smart task distribution to As a user, Pulp can have smart task distribution
- Platform Release deleted (2.8.1)
The target platform field is set when the ticket goes to MODIFIED.
Updated by bmbouter over 8 years ago
This story seems to simultaneously talk about adding restrictions while also improving performance. These seem like entirely different goals.
For the restrictions:
For the disk problem, Pulp seems to be doing too good a job of running tasks in parallel. For the size of data you want to handle, you want to restrict the number of workers. Unfortunately, concurrency can only be restricted across all task types, not just one task type. The throughput of the Pulp tasking system is generally restricted as a whole using the PULP_CONCURRENCY option. Have you tried restricting your concurrency to 4?
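For reference, PULP_CONCURRENCY is set in the pulp_workers defaults file; a minimal example (the path below is the usual location, but it may differ on your system):

    # /etc/default/pulp_workers
    # Limit the tasking system as a whole to 4 worker processes;
    # restart the pulp_workers service afterwards for the change to take effect.
    PULP_CONCURRENCY=4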
For the performance improvements:
Pulp could probably get more out of physical hardware by running complementary workloads, for example pairing high-CPU workloads with high-I/O workloads, to avoid resource contention. I think this is the type of speedup you are suggesting. Usually Pulp has focused on horizontal scaling and less on this type of performance. It would also require a lot of knowledge about each task's effect on the system, which would be complex at best.
What are your thoughts given ^?
Updated by tmlcoch over 8 years ago
Hi Bryan,
I'm not talking about either restrictions or performance improvements here. I'm saying that Pulp really needs smarter task distribution, ideally configurable, which takes into account the character/difficulty of the task.
I'm going to write it again:
It is a really big difference whether a task (A) generates repodata for a single rpm repo or (B) exports a repo group.
Task (A) lasts a few minutes at most and requires a few dozen MB in the workdir - easy peasy.
Task (B) may last hours and requires up to 100 GB in the workdir - a pretty damn heavy load.
Currently Pulp is naive and doesn't care about the fact that tasks are heterogeneous and have different complexity and different requirements on computing resources.
In peak hours, our system receives hundreds of tasks every hour.
Most of the tasks are simple ones like repodata regeneration (A), but from time to time we need to export repo groups (B), and that's the situation where Pulp, with its naive approach to task distribution, falls flat.
Reducing the concurrency from 8 to 4 would significantly decrease system performance and the throughput of our infrastructure, and would leave the resources of our beefy worker machine underused. The machine can work on 8 (or even more) simple tasks (A) at the same time without any problem, so limiting the overall number of tasks is not an option.
Updated by bmbouter over 8 years ago
One thing to be aware of is the practical challenge of how the tasking works. Tasks lock "resources", and the locking system ensures each resource has at most 1 task running at a time; tasks for the same resource are run serially. For example, a "resource" is the name of a repo. This is critically important; otherwise a sync and a publish task for repo "foo" could both run at the same time, which would leave the publish in an inconsistent state. To accomplish this, each worker executes at most 1 task at a time, and the concurrency comes from pulp_workers spawning many processes. This is critically important for correctness. This is incompatible with the concurrency model you've outlined.
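As a purely illustrative sketch of that reservation model (not the real code, which lives in pulp.server.async.tasks and differs in detail):

    # Illustrative only: one reservation per resource, one task at a time per worker.
    reservations = {}                            # resource_id -> worker holding the lock
    queues = {"worker-1": [], "worker-2": []}    # per-worker FIFO; each worker runs 1 task at a time
    idle = set(queues)                           # workers with no current reservation

    def dispatch(task):
        worker = reservations.get(task["resource"])
        if worker is None:
            if not idle:
                return None                      # wait: every worker is busy
            worker = idle.pop()
            reservations[task["resource"]] = worker
        # Every task for the same resource lands on the same single-task worker,
        # so a sync and a publish of repo "foo" can never overlap.
        queues[worker].append(task)
        return worker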
Another thing to be aware of is the need for First Come First Served (FCFS). Tasks are expected to run in the order they were submitted with respect to resources. So a dispatch of A1, B1, A2, B2 requires that A1 run before A2 and B1 run before B2. This is important because when users dispatch a sync (without autopublish) and then a publish, they expect the publish to run after the sync. Because of this, when the resource manager cannot find a worker to handle its work, it waits until a worker becomes idle. This is OK because no one could do the work anyway (all busy), and you don't want to pre-allocate a worker because you don't know who will become available first. This is very efficient.
One thing we could do is restrict which workers are able to handle the work by task type. So for example group export task could only run on up to 4 workers. Or perhaps these 4 specific workers. This would allow tasks to be spread out across infrastructure in a more controlled way which I could see as valuable and very easy to accomplish. Note that the resource manager would wait until one of these workers becomes free such that it can dispatch the work which will hold up tasks behind it, but it is still pretty efficient.
What do you think about these things?
As an aside, for any task type, knowing the maximum number of concurrent tasks is probably difficult. You're expecting that users would set this, right? You're not thinking that Pulp would automatically determine this, right?
Another aside is an observation that resource workloads for the same task type change drastically from run to run. Look at a sync, for example: the remote data dictates the network and disk resources used by the task. The same goes for publish: the number of units to be published determines the resource needs. I've wondered about solutions to this myself, but I don't have anything solid.
Updated by tmlcoch over 8 years ago
bmbouter wrote:
One thing to be aware of is the practical challenge of how the tasking works. Tasks lock "resources", and the locking system ensures each resource has at most 1 task running at a time; tasks for the same resource are run serially. For example, a "resource" is the name of a repo. This is critically important; otherwise a sync and a publish task for repo "foo" could both run at the same time, which would leave the publish in an inconsistent state. To accomplish this, each worker executes at most 1 task at a time, and the concurrency comes from pulp_workers spawning many processes. This is critically important for correctness. This is incompatible with the concurrency model you've outlined.
I'm not proposing a new concurrency model. What I want is for the "task manager" which distributes tasks to take into consideration that not all tasks are the same, so that the workers aren't overloaded and failing.
Another thing to be aware of is the need for First Come First Served (FCFS). Tasks are expected to run in the order they were submitted with respect to resources. So a dispatch of A1, B1, A2, B2 requires that A1 run before A2 and B1 run before B2. This is important because when users dispatch a sync (without autopublish) and then a publish, they expect the publish to run after the sync. Because of this, when the resource manager cannot find a worker to handle its work, it waits until a worker becomes idle. This is OK because no one could do the work anyway (all busy), and you don't want to pre-allocate a worker because you don't know who will become available first. This is very efficient.
This FCFS shouldn't be affected by my request at all.
I understand that some tasks must be run in an exact order, but I expect that such tasks are wrapped into some "parent" task which takes care of the ordering, or that there is something like a dependency condition on the task (don't schedule this task until a specified task is done). Am I wrong?
For example, in our system it's quite common to run hundreds of publishes on different repos and, at the same time, run the group export distributor on a different set of repos. If the set of repos being published and the set being exported via the group export distributor have no intersection, then it's safe to run these tasks simultaneously without any dependency on each other, right?
One thing we could do is restrict which workers are able to handle the work by task type. So for example group export task could only run on up to 4 workers. Or perhaps these 4 specific workers. This would allow tasks to be spread out across infrastructure in a more controlled way which I could see as valuable and very easy to accomplish. Note that the resource manager would wait until one of these workers becomes free such that it can dispatch the work which will hold up tasks behind it, but it is still pretty efficient.
This is pretty much what the "Channels" in Koji do.
You can have, let's say, five workers A-E and say that all workers (A-E) are members of the "sync" channel, plus A is also a member of the group export channel. This results in a situation where group export tasks are only done on "A", while sync tasks can be done on an arbitrary worker.
What do you think about these things?
As an aside, for any task type, knowing the maximum number of concurrent tasks is probably difficult. You're expecting that users would set this, right? You're not thinking that Pulp would automatically determine this, right?
Actually, I would prefer some smart automatic detection done by Pulp, but I wanted to keep this feature request simple from the start. (But I would still like to see a possibility to set/override the values manually in the configuration.)
Another aside is an observation that resource workloads for the same task type change drastically from run to run. Look at a sync, for example: the remote data dictates the network and disk resources used by the task. The same goes for publish: the number of units to be published determines the resource needs. I've wondered about solutions to this myself, but I don't have anything solid.
The weights of the tasks don't have to be super-accurate; the default values should be based on sane assumptions and test measurements.
Some assumptions can be made easily. For example: the repo sync task syncs 1 repo at a time, while a repo group export usually has to export multiple repositories (that's the point of groups, to contain multiple repos), so we can expect that a repo group export distributor task will consume more space than a repo sync task.
Btw, this is a place where an option to manually set the weights of the tasks could be beneficial - just in case someone runs Pulp in an environment where it's common to sync repos that take dozens of gigabytes while group export distributors are run on tiny repositories with a few small packages.
Updated by bmbouter over 8 years ago
tmlcoch wrote:
I'm not proposing a new concurrency model. What I want is for the "task manager" which distributes tasks to take into consideration that not all tasks are the same, so that the workers aren't overloaded and failing.
Oh yes, I see now that you aren't proposing a new concurrency model. I had misread "host" as "worker".
This FCFS shouldn't be affected by my request at all.
I agree it shouldn't be affected, but it is something to be aware of because it will sometimes cause workers to be idle.
I understand that some tasks must be run in an exact order, but I expect that such tasks are wrapped into some "parent" task which takes care of the ordering, or that there is something like a dependency condition on the task (don't schedule this task until a specified task is done). Am I wrong?
Tasks are serialized on a "resource". So, for instance, a sync and a publish of repo "foo" both lock "foo" as the resource, so they will be processed synchronously. Workers are reserved by resource, and the resource_manager does the dispatching.
For example, in our system it's quite common to run hundreds of publishes on different repos and, at the same time, run the group export distributor on a different set of repos. If the set of repos being published and the set being exported via the group export distributor have no intersection, then it's safe to run these tasks simultaneously without any dependency on each other, right?
It is safe to run them concurrently because they deal with different resources, but they are not completely independent, because both task types flow through the resource manager, which only reads from the HEAD of the "resource_manager" queue. So while they can be executed in parallel safely, the resource manager may have to wait to dispatch a task, which could block other tasks in the resource_manager queue from being assigned a worker by the resource_manager. Skipping ahead in the queue is not something that celery and kombu support, as far as I know.
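A tiny illustration of that head-of-queue limitation (simplified, not the actual resource_manager code):

    # The resource_manager only looks at the HEAD of its queue; if the head task
    # cannot be placed, everything queued behind it waits as well.
    from collections import deque

    queue = deque([
        {"type": "group_export", "resource": "repo-group-1"},
        {"type": "sync", "resource": "repo-42"},
    ])

    def drain(find_worker, dispatch_to):
        while queue:
            worker = find_worker(queue[0])   # only the HEAD is considered
            if worker is None:
                break                        # the sync behind it must wait too
            dispatch_to(worker, queue.popleft())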
This is pretty much what the "Channels" in Koji do.
You can have, let's say, five workers A-E and say that all workers (A-E) are members of the "sync" channel, plus A is also a member of the group export channel. This results in a situation where group export tasks are only done on "A", while sync tasks can be done on an arbitrary worker.
This sounds fine. Note that because the resource_manager only reads from the HEAD of the queue, if the current task cannot be dispatched because none of its channel workers are available, then any sync tasks behind it will also not be dispatched.
In server.conf there would be the opportunity to put in task names (python class path) and a list of worker names. When the resource manager looks for an available worker, it would restrict itself to only the workers listed. This allows for workload routing by task name. If no entry for the task type is found, it uses any available worker for its reservation.
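A hypothetical server.conf entry might look something like this (the section name, option syntax, and task path are made up for illustration; no such option exists today):

    # Hypothetical example only -- not an existing server.conf section.
    # task name (python class path): workers allowed to run that task type
    [task_routing]
    pulp.server.managers.repo.group.export: reserved_resource_worker-0@pulp.example.com, reserved_resource_worker-1@pulp.example.com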
Actually, I would prefer some smart automatic detection done by Pulp, but I wanted to keep this feature request simple from the start. (But I would still like to see a possibility to set/override the values manually in the configuration.)
If implemented as described above, the default would be off, which is the current behavior: all workers are candidates to handle any task with a reservation. I think we would need to implement a manual version first, and then maybe we could look at profiling.
The weights of the tasks don't have to be super-accurate; the default values should be based on sane assumptions and test measurements.
Some assumptions can be made easily. For example: the repo sync task syncs 1 repo at a time, while a repo group export usually has to export multiple repositories (that's the point of groups, to contain multiple repos), so we can expect that a repo group export distributor task will consume more space than a repo sync task.
Btw, this is a place where an option to manually set the weights of the tasks could be beneficial - just in case someone runs Pulp in an environment where it's common to sync repos that take dozens of gigabytes while group export distributors are run on tiny repositories with a few small packages.
I'm not clear on the weights idea. How are weights related to worker selection? How are weights related to the feature description above ^?
I'm very interested in the "placement" problem of workloads on hosts with resource constraints. I've studied it for a while, and I've learned that predicting the resource needs of Pulp workloads is a complex task. Also, users have different goals, which is another dimension of planning. For example, you have a disk restriction goal. Knowing how much disk Pulp operations will use, knowing your resource constraints, and adhering to your disk goal are all very difficult problems. For this reason I think the best Pulp can do is to give the user knobs to place and restrict workloads by task type.
Are you interested in making this? It wouldn't be hard: you would make the search for an "open" worker [0] adhere to settings in server.conf, which I would put here [1]. Those settings would need some examples written in this ticket, which could become the basis for the documentation. I am willing to help coach throughout the process. Also see the vagrant environment here [2], which is really great for development!
[0]: https://github.com/pulp/pulp/blob/ed36708132e18bac4ab8b58d0b56cdb2ffe95685/server/pulp/server/async/tasks.py#L141
[1]: https://github.com/pulp/pulp/blob/0f100dbb81db860753cc97958bc315bc57eee4bc/server/etc/pulp/server.conf#L328
[2]: https://docs.pulpproject.org/dev-guide/contributing/dev_setup.html#vagrant
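To make that concrete, here is a rough sketch of how the worker search could honor such settings (TASK_ROUTING and pick_worker are placeholders, not existing Pulp API):

    # Placeholder sketch: filter candidate workers by task type before reserving one.
    TASK_ROUTING = {
        # task name (python class path) -> workers allowed to run it
        "pulp.server.managers.repo.group.export": [
            "reserved_resource_worker-0@pulp.example.com",
        ],
    }

    def pick_worker(task_name, available_workers):
        allowed = TASK_ROUTING.get(task_name)
        if allowed is not None:
            # Restrict the search to the workers listed for this task type.
            available_workers = [w for w in available_workers if w in allowed]
        # Otherwise fall back to current behavior: any available worker will do.
        return available_workers[0] if available_workers else None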
Updated by bmbouter over 5 years ago
- Status changed from NEW to CLOSED - WONTFIX
Updated by bmbouter over 5 years ago
Pulp 2 is approaching maintenance mode, and this Pulp 2 ticket is not being actively worked on. As such, it is being closed as WONTFIX. Pulp 2 is still accepting contributions though, so if you want to contribute a fix for this ticket, please reopen or comment on it. If you don't have permissions to reopen this ticket, or you want to discuss an issue, please reach out via the developer mailing list.