Story #2014

closed

As a user, Pulp can have smart task distribution

Added by tmlcoch over 7 years ago. Updated almost 5 years ago.

Status:
CLOSED - WONTFIX
Priority:
Normal
Assignee:
-
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
% Done:
0%

Estimated time:
Platform Release:
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Quarter:

Description

Use case:
=========

  • I have dozens of repo groups defined in Pulp and every repo group has several repositories.
  • I regularly export all the repo groups as DVD ISO images via the group export distributor.
  • The biggest repo group currently contains 21 DVD ISOs (more than 90GB).
  • There are ~10 repo groups that have more than 10 DVD ISOs (usually about 14-19 DVD ISOs).
  • A single machine with a work dir of almost 0.5 TB runs 8 Pulp workers (8 threads), which all share that work dir.

Problem:
========
Pulp often fails with "No space left on device" because it delegates group export tasks to all available workers (threads) on the machine at once, and the available 0.5 TB is not enough for that many concurrent exports.

I was told that there is no way to tell Pulp to behave more intelligently, for example: "You have 8 workers here; use all of them for regular work, but use at most 4 of them at a time for group export distribution."
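
The rule requested above can be sketched in a few lines of Python, assuming a single dispatcher that tracks how many tasks of each type are running on a shared pool of workers. The class name, the `type_limits` parameter and the task-type strings are hypothetical illustrations, not part of Pulp's API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TypeLimitedDispatcher:
    """Hypothetical dispatcher: a shared worker pool plus per-task-type caps."""

    num_workers: int
    # Max concurrent tasks per type; types not listed are limited only by num_workers.
    type_limits: Dict[str, int]
    running: Counter = field(default_factory=Counter)

    def busy_workers(self) -> int:
        return sum(self.running.values())

    def try_dispatch(self, task_type: str) -> bool:
        """Start a task if a worker is free and its type is below its cap."""
        if self.busy_workers() >= self.num_workers:
            return False
        limit = self.type_limits.get(task_type)
        if limit is not None and self.running[task_type] >= limit:
            return False
        self.running[task_type] += 1
        return True

    def finish(self, task_type: str) -> None:
        """Free the worker held by a finished task of this type."""
        if self.running[task_type] <= 0:
            raise ValueError(f"no running {task_type!r} task to finish")
        self.running[task_type] -= 1


# 8 workers, but at most 4 group exports at a time.
dispatcher = TypeLimitedDispatcher(num_workers=8, type_limits={"group_export": 4})
print([dispatcher.try_dispatch("group_export") for _ in range(6)])
# [True, True, True, True, False, False]
print(dispatcher.try_dispatch("publish"))  # True: 4 workers are still free
```

The 5th and 6th exports are refused even though workers are idle, while regular work can still use the remaining workers.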

Proposed solution:
==================
Let's take inspiration from Koji [1], which provides several options for load-balancing tasks:

1) Channels
-----------
Every host can be a member of several channels [2].
A host can only take tasks that belong to channels it is a member of.
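
The channel rule amounts to a membership filter. A minimal Python sketch, assuming each task is tagged with exactly one channel and each host carries a set of channel names; `Host`, `Task`, `eligible_hosts` and the channel names are illustrative, not Koji's or Pulp's actual code.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Host:
    name: str
    channels: FrozenSet[str]  # channels this host is a member of


@dataclass(frozen=True)
class Task:
    task_id: int
    channel: str  # assumption: each task belongs to exactly one channel


def eligible_hosts(task: Task, hosts: List[Host]) -> List[Host]:
    """Return only the hosts allowed to take this task."""
    return [h for h in hosts if task.channel in h.channels]


hosts = [
    Host("worker-a", frozenset({"default"})),
    Host("worker-b", frozenset({"default", "export"})),
]
print([h.name for h in eligible_hosts(Task(1, "export"), hosts)])   # ['worker-b']
print([h.name for h in eligible_hosts(Task(2, "default"), hosts)])  # ['worker-a', 'worker-b']
```

Putting only a few workers into a hypothetical "export" channel would cap how many group exports run concurrently, since only those workers could pick them up.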

2) Task weight and capacity
---------------------------
Every host can work on several tasks at a time. Every task type (createrepo, build, ...) has a weight, and every host has a capacity that reflects its resources. For example, host A with capacity 10 can work on 10 FOO tasks of weight 1 at a time, on 2 BAR tasks of weight 5, or on any other combination whose total weight does not exceed 10.
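
The weight/capacity check can be sketched in Python as follows, assuming a host accepts a task only while the summed weight of its running tasks stays within its capacity. The `Host` class and its methods are hypothetical, written for illustration rather than taken from Koji's code.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    capacity: float
    load: float = 0.0  # total weight of the tasks currently running on this host

    def can_take(self, weight: float) -> bool:
        """A task fits only if the summed weight stays within capacity."""
        return self.load + weight <= self.capacity

    def assign(self, weight: float) -> None:
        if not self.can_take(weight):
            raise ValueError(f"{self.name} has no room for a task of weight {weight}")
        self.load += weight

    def release(self, weight: float) -> None:
        """Return a finished task's weight to the host."""
        self.load = max(0.0, self.load - weight)


# Host A from the example: capacity 10.
host_a = Host("A", capacity=10)
host_a.assign(5)                 # one BAR task (weight 5)
for _ in range(5):
    host_a.assign(1)             # five FOO tasks (weight 1)
print(host_a.load, host_a.can_take(1))  # 10.0 False
```

Applied to the use case, giving group export a hypothetical weight of 2 on a machine with capacity 8 would allow at most 4 concurrent exports while still letting 8 tasks of weight 1 run.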

Thanks to these features, one can easily build and run a reliable infrastructure full of heterogeneous tasks.
Pulp tasks are also heterogeneous (generating 20MB of repodata for one repo is very different from generating 90GB of ISOs for one repo group), so Pulp should have similar features. I would suggest using Koji at least as an inspiration, because it has been around for a long time and is time-proven.

Regards
Tomas

[1] https://fedoraproject.org/wiki/Koji
[2] http://koji.fedoraproject.org/koji/hostinfo?hostID=154
