Issue #4387

Epel is shipping a new version of celery which doesn't work with Pulp

Added by bherring about 1 year ago. Updated 11 months ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee: -
Category: -
Sprint/Milestone: -
Start date:
Due date:
Severity: 2. Medium
Version:
Platform Release:
Blocks Release:
OS:
Backwards Incompatible: No
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
QA Contact:
Complexity:
Smash Test:
Verified: No
Verification Required: No
Sprint: Sprint 49

Description

Problem

On the latest build of 2.18.1b1 for 2019-02-06, the following test fails on non-FIPS boxes: after amqp is killed, the service never recovers. FIPS-enabled boxes do not have this problem, and they also have a different version of python-amqp installed:

RPM diff out

[root@rhel76-2181b1 ~]# diff all_fips.out all_nonfips.out 
66,67d65
< dracut-fips-033-554.el7.x86_64
< dracut-fips-aesni-033-554.el7.x86_64
89a88
> fpaste-0.3.7.4.1-2.el7.noarch
124d122
< hmaccalc-0.9.13-4.el7.x86_64
399c397
< python2-amqp-2.2.2-3.el7.noarch
---
> python2-amqp-2.4.0-1.el7.noarch
401,402c399,400
< python2-billiard-3.5.0.3-3.el7.x86_64
< python2-celery-4.0.2-5.el7.noarch
---
> python2-billiard-3.5.0.5-1.el7.x86_64
> python2-celery-4.2.1-3.el7.noarch
407c405
< python2-django-1.11.17-1.el7.noarch
---
> python2-django-1.11.18-1.el7.noarch
410c408
< python2-kombu-4.0.2-9.el7.noarch
---
> python2-kombu-4.2.2-1.el7.noarch
418,419c416
< python2-vine-1.1.3-4.el7.noarch
< python-anyjson-0.3.3-3.el7.noarch
---
> python2-vine-1.2.0-1.el7.noarch
434a432
> python-django-bash-completion-1.11.18-1.el7.noarch
555a554
> shark-0.1-1.noarch
562a562
> stork-0.12-2.noarch
583a584
> whale-0.2-1.noarch
592d592
< yum-plugin-priorities-1.1.31-50.el7.noarch
[root@rhel76-2181b1 ~]#
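
Most of the diff above is noise; the lines that matter are the Celery stack packages. As a rough sketch, assuming each `.out` file holds one installed package per line (as the diff suggests), the stack-only differences could be isolated like this. The helper names and the prefix list are illustrative, not part of Pulp or its test suite.

```python
# Hypothetical helper: prefixes of the Celery stack packages named in the diff above.
CELERY_STACK = (
    "python2-celery",
    "python2-kombu",
    "python2-amqp",
    "python2-billiard",
    "python2-vine",
)


def stack_packages(lines):
    """Return the set of package lines that belong to the Celery stack."""
    return {
        line.strip()
        for line in lines
        if line.strip().startswith(CELERY_STACK)
    }


def stack_diff(fips_lines, nonfips_lines):
    """Return (only_on_fips, only_on_nonfips) as sorted lists."""
    fips = stack_packages(fips_lines)
    nonfips = stack_packages(nonfips_lines)
    return sorted(fips - nonfips), sorted(nonfips - fips)


# A few package lines copied from the diff above.
fips_sample = [
    "hmaccalc-0.9.13-4.el7.x86_64",
    "python2-amqp-2.2.2-3.el7.noarch",
    "python2-celery-4.0.2-5.el7.noarch",
]
nonfips_sample = [
    "fpaste-0.3.7.4.1-2.el7.noarch",
    "python2-amqp-2.4.0-1.el7.noarch",
    "python2-celery-4.2.1-3.el7.noarch",
]

only_fips, only_nonfips = stack_diff(fips_sample, nonfips_sample)
print(only_fips)
print(only_nonfips)
```

With real files, the two lists would come from reading `all_fips.out` and `all_nonfips.out` line by line instead of the inline samples.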

Test Failure

Jan 29 16:51:37 localhost.localdomain systemd[1]: Stopping An AMQP message broker daemon....
Jan 29 16:51:37 localhost.localdomain pulp[31599]: gofer.messaging.adapter.qpid.reliability:WARNING: connection aborted
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256) consumer: Connection to broker lost. Trying to re-establish the connection...
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256) Traceback (most recent call last):
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256)   File "/usr/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 317, in start
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256)     blueprint.start(self)
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256)   File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256)     step.start(parent)
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256)   File "/usr/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 593, in start
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256)     c.loop(*c.loop_args())
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256)   File "/usr/lib/python2.7/site-packages/celery/worker/loops.py", line 121, in synloop
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256)     connection.drain_events(timeout=2.0)
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 301, in drain_events
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256)     return self.transport.drain_events(self.connection, **kwargs)
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1693, in drain_events
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256)     receiver = self.session.next_receiver(timeout=timeout)
Jan 29 16:51:37 localhost.localdomain pulp[31580]: celery.worker.consumer.consumer:WARNING: (31580-24256)   File "<string>", line 6, in next_receiver

Test Recreation

Run the following tests from Pulp-2-Test

  • pytest -sv pulp_2_tests/tests/rpm/api_v2/test_service_resiliency.py

amqp-failure.log (24.5 KB) Failure seen on bherring, 02/06/2019 09:10 PM
all_fips.out (19.8 KB) bherring, 02/06/2019 09:14 PM
all_nonfips.out (19.8 KB) bherring, 02/06/2019 09:14 PM
fips.out (1.68 KB) bherring, 02/06/2019 09:14 PM
nonfips.out (1.68 KB) bherring, 02/06/2019 09:14 PM

Related issues

Related to Pulp - Test #4359: 2.18.1 Testing (CLOSED - COMPLETE)
Related to Pulp - Task #4388: Upgrade to Celery 4.3 (CLOSED - WONTFIX)
Related to Pulp - Task #4402: Add docs to tell users they have to use Pulp's version of celery (CLOSED - WONTFIX)

Associated revisions

Revision c15ed69d
Added by bherring about 1 year ago

Changing Ansible Installer priorities with current 2.18.1 definition

To solve an EPEL version problem with the python-celery stack, the final solution during the build changed the EPOCH in the NEVRA of those RPMs to 10.

The resulting files needed to be updated to:

  • use priority on FIPS with EPEL
  • not use priority on non-FIPS with EPEL

Each of those upgrade permutations leaves the box in the same RPM state, without the end user having to adjust yum or repository priorities.

Removing the pulp-priority role as it is now deprecated with these changes.

The included, tested changes are the output of the solution for CI.

refs #4387

History

#1 Updated by bherring about 1 year ago

#2 Updated by daviddavis about 1 year ago

  • Related to Task #4388: Upgrade to Celery 4.3 added

#3 Updated by daviddavis about 1 year ago

  • Subject changed from "Difference between python2-amqp on fips vs. non-fips causing regression" to "Epel is shipping a new version of celery which doesn't work with Pulp"

#4 Updated by daviddavis about 1 year ago

We can't upgrade celery beyond 4.0 due to some regressions like this one:

https://github.com/celery/celery/issues/3802#issuecomment-407024765

#5 Updated by bmbouter about 1 year ago

I filed a bugzilla against EPEL7 requesting it be downgraded. That is here: https://bugzilla.redhat.com/show_bug.cgi?id=1674032

#6 Updated by daviddavis about 1 year ago

  • Related to Task #4402: Add docs to tell users they have to use Pulp's version of celery added

#7 Updated by CodeHeeler about 1 year ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 48

#8 Updated by bmbouter about 1 year ago

We heard back from the EPEL ticket. EPEL won't be downgrading to the lower stack so that leaves us with 2 options:

1. "fix celery"
2. figure out how to get users to use the previous bits they were using

I think option 1 will create instability for our users. Pulp should not try to fix Celery. We've spent over 1000 engineering hours on Celery-only bugs, and many of them either regressed in a later version or were never resolved due to "architectural problems" like fork-thread-safety issues, etc. Celery has no CI so every upgrade is risky. Pulp isn't in a position to resolve this for Celery. Pulp3 has moved away from using Celery entirely, so at this point I think it's mainly about minimizing risk for the Pulp 2 users.

I recommend option 2, which means configuring repository priorities so that the Pulp-shipped repo takes precedence over the EPEL bits. It's not great, but I do think it will limit risk, which I think is a good thing on the 2.y line. I believe Katello configures repo priorities currently.

#9 Updated by gmbnomis about 1 year ago

Perhaps it's naive, but can't we change the dependency on celery (currently python-celery >= 4.0.0-0) in the pulp-server RPM (and possibly other dependencies)?

#10 Updated by bmbouter about 1 year ago

That would resolve the packaging issue, but the newer version in EPEL has regressions that affect Pulp, so after installation Pulp would error in various ways.

#11 Updated by daviddavis about 1 year ago

I think setting Requires: python-celery >= 4.0.0, python-celery < 4.1.0 in the pulp spec[0] would work. We're going to try that out to see if it does.

[0] https://git.io/fh7PE
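
To make the proposed range concrete, here is a minimal sketch of which Celery versions it would admit. It assumes purely numeric dotted versions and plain tuple ordering, which is a simplification of how RPM actually compares versions; `parse_version` and `satisfies_pin` are hypothetical names used only for illustration.

```python
def parse_version(text):
    """Turn '4.2.1' into (4, 2, 1); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def satisfies_pin(version, lower="4.0.0", upper="4.1.0"):
    """Check lower <= version < upper, mirroring the proposed Requires range."""
    parsed = parse_version(version)
    return parse_version(lower) <= parsed < parse_version(upper)


# Versions taken from the RPM diff in the description.
print(satisfies_pin("4.0.2"))  # the Pulp-shipped python2-celery
print(satisfies_pin("4.2.1"))  # the python2-celery now in EPEL
```

Under these assumptions the Pulp-shipped 4.0.2 falls inside the range and the EPEL 4.2.1 falls outside it.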

#12 Updated by daviddavis about 1 year ago

Ran into a problem. It looks like when a user runs "yum update," it still tries to update celery to 4.2 and then hits this package conflict:

---> Package python2-celery.noarch 0:4.0.2-5.el7 will be updated
--> Processing Dependency: python2-celery < 4.1 for package: pulp-server-2.18.1-0.1.beta.git.107.5a7de45.el7.noarch
--> Processing Conflict: pulp-server-2.18.1-0.1.beta.git.107.5a7de45.el7.noarch conflicts python2-celery >= 4.1
--> Finished Dependency Resolution
Error: Package: pulp-server-2.18.1-0.1.beta.git.107.5a7de45.el7.noarch (pulp)
           Requires: python2-celery < 4.1
           Removing: python2-celery-4.0.2-5.el7.noarch (@pulp)
               python2-celery = 4.0.2-5.el7
           Updated By: python2-celery-4.2.1-3.el7.noarch (epel)
               python2-celery = 4.2.1-3.el7
Error: pulp-server conflicts with python2-celery-4.2.1-3.el7.noarch
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

#13 Updated by rchan about 1 year ago

  • Sprint changed from Sprint 48 to Sprint 49

#14 Updated by daviddavis about 1 year ago

We have decided to proceed with pinning python2-celery to < 4.1. Users should add --exclude=python2-celery when running yum update.

#15 Updated by mono2stereo about 1 year ago

--exclude=python2-celery is not enough. I had to add --exclude=python2-celery --exclude=python2-kombu --exclude=python2-amqp --exclude=python2-vine because of transitive version dependencies.
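
For anyone scripting this workaround, a small sketch that assembles the command from a package list, so the whole stack is excluded together. The function name is hypothetical; the package names and the `--exclude` option are the ones given in the comment above.

```python
# Packages that had to be excluded together because of transitive version dependencies.
CELERY_STACK = [
    "python2-celery",
    "python2-kombu",
    "python2-amqp",
    "python2-vine",
]


def yum_update_command(excluded):
    """Build the argument list for 'yum update' with one --exclude per package."""
    return ["yum", "update"] + ["--exclude={}".format(name) for name in excluded]


command = yum_update_command(CELERY_STACK)
print(" ".join(command))
# prints: yum update --exclude=python2-celery --exclude=python2-kombu --exclude=python2-amqp --exclude=python2-vine
```

The list form can be passed to `subprocess.run` as is, which avoids shell quoting issues.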

#16 Updated by daviddavis about 1 year ago

@mono2stereo thank you.

#17 Updated by bherring about 1 year ago

QE Notes

@mono2stereo , thanks for the notes.

The following excludes were initially used [0]:

exclude=python2-kombu python2-celery python2-amqp python2-vine

The final solution that was pushed out changed the EPOCH (10:) on the NEVRA for these Pulp-owned RPMs. Therefore, the excludes are not required with the 2.18.1 GA bits that are currently pushed.

This has been verified in #4359 with:

  • FIPS, EPEL enabled, priority enabled
  • Non-FIPS, EPEL not enabled, priority NOT enabled

I will be pushing a PR for the CI that is run on upgrade, covering the permutations listed above.

[0] - https://pulp.plan.io/issues/4359#note-22

#18 Updated by daviddavis 12 months ago

  • Status changed from NEW to CLOSED - CURRENTRELEASE

This has been resolved by bumping the epoch on celery and its dependencies to 10.
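
The reason an epoch bump works is that RPM compares the epoch before the version and release, so a package with epoch 10 outranks any package with a lower epoch regardless of its version. The sketch below illustrates that ordering under simplifying assumptions: it only looks at numeric dot-separated fields and is not RPM's real comparison algorithm. The `10:4.0.2-5.el7` string is a hypothetical example of an epoch-bumped package, not a confirmed build.

```python
def split_evr(evr):
    """Split 'E:V-R' into (epoch, version, release); a missing epoch counts as 0."""
    epoch, sep, rest = evr.partition(":")
    if not sep:
        epoch, rest = "0", evr
    version, _, release = rest.partition("-")
    return int(epoch), version, release


def numeric_key(text):
    """Simplification: keep only the purely numeric dot-separated fields."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def evr_key(evr):
    """Sort key with the epoch first, so it dominates version and release."""
    epoch, version, release = split_evr(evr)
    return (epoch, numeric_key(version), numeric_key(release))


def is_newer(candidate, installed):
    """Return True if candidate would be treated as an upgrade over installed."""
    return evr_key(candidate) > evr_key(installed)


# Without an epoch bump, the EPEL build looks like an upgrade.
print(is_newer("4.2.1-3.el7", "4.0.2-5.el7"))
# With a (hypothetical) epoch-10 Pulp build installed, it no longer does.
print(is_newer("4.2.1-3.el7", "10:4.0.2-5.el7"))
```

This is why the excludes and repository priorities become unnecessary once the Pulp-owned packages carry the higher epoch.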

#19 Updated by bmbouter 11 months ago

  • Tags Pulp 2 added
