Document Pulp3 Hardware Requirements recommendations
This question came up in our channel. We should add this info to the docs on the Architecture and Deploying page, in a new section called "Hardware Requirements".
Here's some text that was written in the channel about it:
13:26 <cognifloyd> For a single VM install of Pulp 3 (using a django-storages backend for artifact storage so that artifacts aren't in the VM) how much CPU/RAM/Disk should I expect to need in that VM? There will be yum repos for CentOS 6/7/8 + EPEL 6/7/8, and the pypi index, and a few custom file repos. Are there any rule of thumbs to help me initially size this thing?
15:05 <cognifloyd> Next question:
15:08 <cognifloyd> Once I get the basic pulp set up, I'll be looking at building a pulp 3 plugin for a file-like artifact I have to deal with that has some annoying encryption requirements. ie The artifact should be encrypted in the django-storages backend, and pulp must not have the key to decrypt it. Clients will be given a key to decrypt those artifacts. Has anything like this been done? I think a plain file repo would work, but I'm wondering if pulp
15:08 <cognifloyd> d need special support since these would be encrypted.
15:26 <bmbouter> cognifloyd: we don't have sizing recommendations unfortunately, but I can give some anecdotal info
15:26 <bmbouter> cpu count should equal the number of pulp workers you start, which allows you to perform N repository operations concurrently
15:26 <bmbouter> so 2 cpus, you can sync 2 repos concurrently
15:28 <bmbouter> RAM tends to hit its high watermark during sync and then go back down to nominal levels, so for N workers I'd say plan on a gig for each and then maybe 1 gig for postgres as a start
15:28 <bmbouter> so for 2 workers, 3 gigs total (2 for sync use, 1 for postgresql)
15:28 <bmbouter> our dev machines typically have 2-4 G and we never oom
15:29 <bmbouter> for disk it's the size of the repos you want all added together. pulp de-duplicates content so even as you sync those over time they tend not to grow very much
15:29 <bmbouter> I'm not sure what centos6/7/8 + el 6/7/8 is these days but maybe 400G?
15:30 <cognifloyd> 400G (ish) for the artifacts or the metadata?
15:30 <bmbouter> in terms of the encryption requirements I think pulp_file would work just fine for you, pulp doesn't need to read/parse the binary data it stores ever, it just needs to calculate the checksums and it can do that on the encrypted data
15:31 <bmbouter> 400G ish for the artifacts
15:31 <bmbouter> the metadata is very small and lives in the db
15:31 <cognifloyd> I'm not concerned about the filesize of the artifacts as I'll have them stored in azure blob storage.
15:31 <bmbouter> oh right
15:31 <bmbouter> you said that
15:31 * cognifloyd would prefer to use GCP, but a client demanded we use azure instead. Bummer
15:31 <cognifloyd> ;)
15:32 <bmbouter> your disk can be small enough to provide working storage during sync prior to blobs being placed on the backend, so maybe 50G would do it all
15:32 <bmbouter> pulp verifies checksum data locally and artifacts download/verify in parallel so 50G is probably more than you'll need but it's a bit hard to predict
15:32 <cognifloyd> ah. ok. Thanks for some starting point rules of thumb. I should be able to adjust from there :)
15:33 <bmbouter> yw, if you can share what you find with us we'd love to hear. also let us know if anything could be better or doesn't work
15:33 <cognifloyd> will do
15:34 <cognifloyd> I really like the pulp 3 architecture with versioned repos (an entire repo metadata rollback sounds awesome). And I hate running Java, so a lot of the other artifact repositories left me with a horrible taste in my mouth. Python is awesome.
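
For the docs section, the rules of thumb from the chat could be summarized as simple arithmetic. Below is a rough sketch only: the function name `estimate_pulp_sizing` and its parameters are made up for illustration and are not part of Pulp, and the numbers (one CPU and about 1 GB RAM per worker, about 1 GB for PostgreSQL, about 50 GB of working disk when artifacts live in an external storage backend) are the anecdotal figures quoted above, not measured recommendations.

```python
def estimate_pulp_sizing(workers, repo_sizes_gb=(), external_storage=False):
    """Rough single-VM sizing based on the anecdotal figures from the chat.

    Hypothetical helper for illustration; not part of Pulp.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")

    # One CPU per worker: N workers allow N concurrent repository operations.
    cpus = workers

    # About 1 GB per worker at the sync high watermark, plus about 1 GB for PostgreSQL.
    ram_gb = workers * 1 + 1

    if external_storage:
        # Only working storage is needed during sync; 50 GB was the rough guess.
        disk_gb = 50
    else:
        # Local artifact storage: roughly the sum of the repo sizes,
        # since de-duplication keeps growth over time small.
        disk_gb = sum(repo_sizes_gb)

    return {"cpus": cpus, "ram_gb": ram_gb, "disk_gb": disk_gb}


# Two workers with artifacts stored in an external backend.
print(estimate_pulp_sizing(2, external_storage=True))
# {'cpus': 2, 'ram_gb': 3, 'disk_gb': 50}
```

With local artifact storage, pass the expected repo sizes instead, e.g. `estimate_pulp_sizing(2, repo_sizes_gb=[400])` for the "maybe 400G" guess given for the CentOS and EPEL repos.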