Issue #6098: Pulp gunicorn times out when trying to list all packages - Pulp

Added by mdepaulo@redhat.com about 4 years ago. Updated over 3 years ago.

Status: CLOSED - WONTFIX
Priority: Normal
Assignee:
Category: Installer - Moved to GitHub issues
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release:
OS:
Triaged: Yes
Groomed:
Sprint Candidate: Yes
Tags:
Sprint:
Quarter:

Description

See this thread for discussion and proposed solution: https://www.redhat.com/archives/pulp-list/2020-February/msg00011.html

This will affect containers as well.
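
For orientation: gunicorn kills and restarts a worker that stays silent longer than its `timeout` setting, which defaults to 30 seconds, and gunicorn config files are plain Python. The fragment below is only a sketch of raising that limit. The file name and its use for the pulpcore-api service are assumptions, the 120 comes from the triage discussion in this issue, and how the Pulp installer actually wires the value is not shown here.

```python
# gunicorn.conf.py (hypothetical config file for the pulpcore-api service)

# Seconds a worker may stay silent before gunicorn kills and restarts it.
# gunicorn's built-in default is 30; 120 is the value discussed in this issue.
timeout = 120
```

The same setting can be passed on the command line as `--timeout 120`.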

Updated by fao89 about 4 years ago

Triaged changed from No to Yes

Updated by mdepaulo@redhat.com about 4 years ago

Bug triage discussion:

<mikedep333> Q: Would pulpcore-content need a longer timeout, or just pulpcore-api?
10:35 <daviddavis (hay_sup)> just pulpcore-api I think
10:35 <mikedep333> Yeah. The question is what should be the default value then. We could make it default to undefined, and if undefined, let the system's default take effect.
10:36 <daviddavis (hay_sup)> I'm fine with that I suppose
10:36 <mikedep333> which is 30s
10:36 <mikedep333> But I want the ansible-pulp value to be safe for most users, so we might want to set it to 120s.
10:36 <fao89 (Fabricio Aguiar)> 120s sounds good to me
10:36 <ggainey (Grant Gainey)> yeah, user said "Set the timeout to 120s. The query of all packages in rhel7 actually took less than a minute. Default 30s seems a little too short."
10:37 <ggainey (Grant Gainey)> so +1 to 120s
10:37 <daviddavis (hay_sup)> I was thinking 60 but 120 is ok
10:37 <fao89 (Fabricio Aguiar)> we can try both
10:37 <ggainey (Grant Gainey)> sure
10:37 <mikedep333> Great, I'll do that.
10:37 <fao89 (Fabricio Aguiar)> !accept
10:37 <mikedep333> 120s, since a user's virtual machine might be having a slow day. Or a user may have 20,000 packages.
10:37 <ggainey (Grant Gainey)> mikedep333: yeah - some rhel repos are 15-17K, iirc
10:39 <mikedep333> ggainey: EPEL is 13512. If a RHEL repo is 15-17K, then 90s is probably typical, and a slow machine would need like 180s.
10:39 <ggainey (Grant Gainey)> +1
10:40 <fao89 (Fabricio Aguiar)> RHEL 7 is 20000+
10:40 <fao89 (Fabricio Aguiar)> !accept
10:40 <daviddavis (hay_sup)> mikedep333 ggainey any idea what the use case is for querying everything at once?
10:40 <ggainey (Grant Gainey)> yeah, the longer since release, the larger, since rhel keeps all nevras
10:40 <fao89 (Fabricio Aguiar)> Open floor!
10:40 <daviddavis (hay_sup)> we sync from APIs all the time and just page through them
10:41 <mikedep333> fao89: https://pkgs.org/ says that CentOS 7's main repo (the RHEL7 main repo + RHEL7 optional) is 10,047.
10:42 <ggainey (Grant Gainey)> daviddavis: "what's in this repo" is def something ppl want to know - "I want to pull *everything* into my graphing-routine", for example - you def could say "use the paging API", but I guarantee users will push back, and it's a pretty simple fix to just raise the default to something larger-but-not-unreasonable
10:42 ⇐ ttereshc quit (ttereshc@nat/redhat/x-tlieoomzpmejjtnt) Quit: Leaving
10:42 <mikedep333> We have to assume that some people are going to get the entire list, and then parse it in something else.
10:42 <ggainey (Grant Gainey)> yup
10:43 <ggainey (Grant Gainey)> ETL is A Thing
10:43 <fao89 (Fabricio Aguiar)> mikedep333, I got it when I was looking into performance problem: https://www.redhat.com/archives/pulp-dev/2019-November/msg00084.html
10:43 <daviddavis (hay_sup)> mikedep333 ggainey yea, but I would assume it should be on users to raise their default. my concern is that a high default isn't good for most users
10:43 <mikedep333> When I was doing systems engineering at a Linux distro user org with, we did that
10:44 <ggainey (Grant Gainey)> daviddavis: when does a high default hurt?
10:44 <mikedep333> ^
10:44 <ggainey (Grant Gainey)> daviddavis: just in the "you can't really get there" case?
10:44 <daviddavis (hay_sup)> when you don't realize you're running a huge query
10:44 <mikedep333> I mean, if a task will hang, it will hurt users.
10:44 <ggainey (Grant Gainey)> (which is def A Thing as well, yeah)
10:45 <mikedep333> What I could do is set the value in the pulp_rpm_prerequisites role. So the order of precedence would be:
10:45 <mikedep333> highest: user-specified
10:45 <mikedep333> medium: pulp_rpm_prerequisites specified
10:45 <daviddavis (hay_sup)> if users are submitting requests that take 180s or more then they're really taxing the system and they may not realize it
10:45 <mikedep333> lowest: pulp3 default
10:46 <daviddavis (hay_sup)> IMO, I think we should set the default to 120 and make it easy for users to change it
10:46 <daviddavis (hay_sup)> mikedep333: that makes sense to me
10:46 <mikedep333> daviddavis: I say we do that. We can always increase it later.
10:46 <ggainey (Grant Gainey)> that works for me - 120s is a reasonable compromise (imho, anyway)
10:46 <mikedep333> And users will have a simple variable to change it.
10:46 <daviddavis (hay_sup)> +1
10:46 <fao89 (Fabricio Aguiar)> +1
10:46 <ggainey (Grant Gainey)> plus, "and users can change it" gives us a chance in the "how to change it" doc to say "you think you want to do this, but you really don't, and here's why" :)
10:47 <ggainey (Grant Gainey)> coolio
10:47 <mikedep333> ggainey: I agree, I'll list the implications.
10:47 <daviddavis (hay_sup)> cool
10:47 <ggainey (Grant Gainey)> sounds like a fine plan
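
The "just page through them" alternative raised in the discussion can be sketched as follows. This is an illustration only, not code from Pulp: the function name, the fake URLs, and the `results` / `next` field names are assumptions about what a paginated listing looks like, and the HTTP call is injected as a plain callable so the example runs without a server.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_results(fetch: Callable[[str], Page], first_url: str) -> Iterator[Any]:
    """Yield every item of a paginated listing, fetching one page at a time.

    `fetch` takes a URL and returns the decoded page. The "results" and
    "next" keys are assumed field names, not confirmed by this issue.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch(url)                   # one short request per page
        yield from page.get("results", [])
        url = page.get("next")              # None or missing on the last page


# In-memory stand-in for the HTTP layer (hypothetical URLs and data).
FAKE_PAGES = {
    "/packages/?limit=2": {
        "results": ["a", "b"],
        "next": "/packages/?limit=2&offset=2",
    },
    "/packages/?limit=2&offset=2": {"results": ["c"], "next": None},
}

packages = list(iter_results(FAKE_PAGES.__getitem__, "/packages/?limit=2"))
```

Because each request covers only one page, no single request has to finish the whole listing within the worker timeout. The trade-off is more round trips for clients that want everything.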
