Project

Profile

Help

Story #23

closed

As a user, I can rest easy in the knowledge that Pulp's Celery tasks are not serialized dangerously

Added by Anonymous over 9 years ago. Updated almost 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
% Done:

100%

Estimated time:
Platform Release:
2.8.0
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
Pulp 2
Sprint:
Quarter:

Description

Our tasks currently need to use pickle for serialization, because we pass objects as arguments to some of them. We likely also return objects from some tasks.We should either make our objects json serializable, or change our tasks to accept simple types that json can handle.Once this is complete, we should configure Celery to use json, and configure the workers not to accept pickle.


Related issues

Has duplicate Pulp - Issue #575: Warning in the log when starting celery servicesCLOSED - DUPLICATEActions
Actions #1

Updated by rbarlow over 9 years ago

  • Project changed from 22 to Pulp
Actions #2

Updated by bmbouter over 8 years ago

  • Groomed set to No
  • Sprint Candidate set to No
Actions #3

Updated by bmbouter over 8 years ago

  • Groomed changed from No to Yes
Actions #4

Updated by bmbouter over 8 years ago

  • Sprint Candidate changed from No to Yes
Actions #5

Updated by bmbouter over 8 years ago

The branch with the current work is here: https://github.com/bmbouter/pulp/tree/23-farmers-ale

It's currently branched from 2.6, and has 2 additional commits (50d785d and 8bf9633).

Here is what needs to be done:
1) First, rebase or cherry pick against master. The two existing commits will have their hashes changed in the process, but please leave them and their commit messages in place (for attribution). Once rebased into a branch that branches from master, make your commits on top of the existing 2.

2) Reproduce the known issue with this code by having a working pulp installation and running:

pulp-admin rpm repo create --repo-id bar
pulp-admin rpm repo create --repo-id foo
pulp-admin rpm repo copy rpm --from-repo-id=bar --to-repo-id=foo

3) The issue is that the UnitAssociationCriteria is arriving as a dict in the task code, when it needs to be a UnitAssociationCriteria (which inherits from dict). To resolve this, the UnitAssociationCriteria object needs to be explicitly serialized to a dictionary prior to it being sent as an argument of apply_async(). Add a to_dict() method on UnitAssociationCriteria() to return the dict. Find the place where the UnitAssociationCriteria is passed as an argument by looking at what API endpoint is being used by the repo copy rpm command.

4) Add a classmethod to UnitAssociationCriteria which can instantiate and return a UnitAssociationCriteria object from a dictionary that was produced by to_dict().

5) Use the classmethod above in the task code to produce the UnitAssociationCriteria from the dict passed in as an argument

6) Do some other general testing if you like (not required)

7) Write test code for everything

8) Write one specific test which looks through the celery registry and asserts that each entry in it is a subclass of PulpTask.

Actions #6

Updated by bmbouter over 8 years ago

Here is what I put together as a start for item #8 above.

from pulp.server.async import app
from pulp.server import tasks # you have to import this to discover all pulp tasks
from pulp.server.async.tasks import PulpTask

for task_name, task in app.celery.tasks.iteritems():
    if task.startswith('pulp'):
        if not isinstance(task, PulpTask):
            self.fail('task named %s must inherit from %s) % (task_name, PulpTask)
Actions #7

Updated by ipanova@redhat.com over 8 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ipanova@redhat.com
Actions #8

Updated by ipanova@redhat.com over 8 years ago

  • Status changed from ASSIGNED to MODIFIED
Actions #9

Updated by bmbouter over 8 years ago

It's not merged yet, so moving back to POST.

Added by Shubham Bhawsinka over 8 years ago

Revision 2ed8b7f5 | View on GitHub

Progress for story 23

  • Recursive serialization and deserialization
  • Force all Pulp Celery tasks to inherit from PulpTask

Added by bmbouter over 8 years ago

Revision 4ac8b815 | View on GitHub

Removes explicit type validation, and cleans up docstrings some.

Added by ipanova@redhat.com over 8 years ago

Revision ed367081 | View on GitHub

Introducing from_dict() and to_dict()

closes #23 https://pulp.plan.io/issues/23

UnitAssociationCriteria arrives as dict in the task code, when it needs to be UnitAssociationCriteria object. to_dict() method was introduced to serialize UnitAssociationCriteria to a dictionary prior to it being sent as an argument to apply_sync(). the from_dict() method was introduced to return UnitAssociationCriteria object from a dictionary that was produced by to_dict()

This commit also introduces a testcase that checks that all Celery tasks defined in Pulp inhherit from PulpTask and not from Celery directly.

There have been done also changes to criteria. Due to the introduction of de/sereliazions methods, criteria's value cannot be None. Code and tests were updated accordigly.

Added by ipanova@redhat.com over 8 years ago

Revision ed367081 | View on GitHub

Introducing from_dict() and to_dict()

closes #23 https://pulp.plan.io/issues/23

UnitAssociationCriteria arrives as dict in the task code, when it needs to be UnitAssociationCriteria object. to_dict() method was introduced to serialize UnitAssociationCriteria to a dictionary prior to it being sent as an argument to apply_sync(). the from_dict() method was introduced to return UnitAssociationCriteria object from a dictionary that was produced by to_dict()

This commit also introduces a testcase that checks that all Celery tasks defined in Pulp inhherit from PulpTask and not from Celery directly.

There have been done also changes to criteria. Due to the introduction of de/sereliazions methods, criteria's value cannot be None. Code and tests were updated accordigly.

Actions #10

Updated by ipanova@redhat.com over 8 years ago

  • % Done changed from 0 to 100
Actions #11

Updated by amacdona@redhat.com over 8 years ago

  • Platform Release set to 2.8.0
Actions #12

Updated by bmbouter about 8 years ago

This merged change does switch our usage to use the JSON serializer, but we haven't actually disabled the other serializers. As such the following banner is still visible:

Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000) /usr/lib/python2.7/site-packages/celery/apps/worker.py:161: CDeprecationWarning:
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000) Starting from version 3.2 Celery will refuse to accept pickle by default.
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000)
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000) The pickle serializer is a security concern as it may give attackers
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000) the ability to execute any command.  It's important to secure
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000) your broker from unauthorized access when using pickle, so we think
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000) that enabling pickle should require a deliberate action and not be
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000) the default choice.
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000)
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000) If you depend on pickle then you should set a setting to disable this
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000) warning and to be sure that everything will continue working
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000) when you upgrade to Celery 3.2::
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000)
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000)     CELERY_ACCEPT_CONTENT = ['pickle', 'json', 'msgpack', 'yaml']
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000)
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000) You must only enable the serializers that you will actually use.
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000)
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000)
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000)   warnings.warn(CDeprecationWarning(W_PICKLE_DEPRECATED))
Jan 22 01:35:01 dev pulp[22849]: py.warnings:WARNING: (22849-36000)

I am fixing this now with a small PR to disable the other serializers.

Added by bmbouter about 8 years ago

Revision 478443a7 | View on GitHub

Disables all other celery serializers except JSON

This removes a scary, multiline warning from celery workers when they start up

https://pulp.plan.io/issues/23 re #23

Added by bmbouter about 8 years ago

Revision 478443a7 | View on GitHub

Disables all other celery serializers except JSON

This removes a scary, multiline warning from celery workers when they start up

https://pulp.plan.io/issues/23 re #23

Actions #14

Updated by bmbouter about 8 years ago

  • Has duplicate Issue #575: Warning in the log when starting celery services added
Actions #15

Updated by dkliban@redhat.com about 8 years ago

  • Status changed from MODIFIED to 5
Actions #16

Updated by dkliban@redhat.com about 8 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #18

Updated by bmbouter almost 5 years ago

  • Tags Pulp 2 added

Also available in: Atom PDF