Issue #8997

closed

Ensure that Pulp can function if Redis was configured and available when Pulp starts, but has an availability issue while Pulp is running

Added by dalley almost 3 years ago. Updated over 2 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
High
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Katello
Sprint:
Sprint 110
Quarter:

Description

With the non-Redis tasking system, all attempts to connect to or communicate with Redis should tolerate both connection failures and keys "going missing", especially where serving content is concerned. Redis should be a non-critical operational component in all respects.
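As a rough illustration of what "non-critical" could mean for content serving, here is a minimal sketch in which both an unreachable cache and a missing key are treated as a plain cache miss. Everything named here (`FlakyCache`, `serve`, `build_response`) is hypothetical and is not pulpcore code. The built-in `ConnectionError` stands in for `redis.exceptions.ConnectionError`, which is what a real redis-py client raises and which is not a subclass of the built-in.

```python
class FlakyCache:
    """Hypothetical stand-in for a Redis client (not part of Pulp or redis-py)."""

    def __init__(self):
        self.store = {}
        self.available = True

    def _check(self):
        if not self.available:
            # Real code would see redis.exceptions.ConnectionError here.
            raise ConnectionError("Error 111 connecting to localhost:6379.")

    def get(self, key):
        self._check()
        return self.store.get(key)  # None when the key is missing or evicted

    def set(self, key, value):
        self._check()
        self.store[key] = value


def serve(cache, key, build_response):
    """Return the cached response if possible, otherwise build it from scratch."""
    try:
        cached = cache.get(key)
    except ConnectionError:
        cached = None  # an unreachable cache is handled like a miss
    if cached is not None:
        return cached

    response = build_response()
    try:
        cache.set(key, response)
    except ConnectionError:
        pass  # storing failed; the response is still served
    return response
```

With this shape, the caller never needs to know whether the cache was down or the key had simply disappeared. It only pays the cost of rebuilding the response.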


Related issues

Related to Pulp - Task #8805: Cache the responses of the content app (CLOSED - CURRENTRELEASE, gerrod)

Blocked by Pulp - Issue #9070: Remove Redis from status information if unused (CLOSED - CURRENTRELEASE)
Actions #1

Updated by dalley almost 3 years ago

  • Related to Task #8805: Cache the responses of the content app added
Actions #2

Updated by dalley almost 3 years ago

  • Description updated (diff)
Actions #3

Updated by dalley almost 3 years ago

  • Description updated (diff)
Actions #4

Updated by dalley almost 3 years ago

  • Sprint set to Sprint 100
Actions #5

Updated by ekohl almost 3 years ago

  • Blocked by Issue #9070: Remove Redis from status information if unused added
Actions #6

Updated by dkliban@redhat.com almost 3 years ago

  • Triaged changed from No to Yes
Actions #7

Updated by rchan almost 3 years ago

  • Sprint changed from Sprint 100 to Sprint 101
Actions #8

Updated by ipanova@redhat.com over 2 years ago

  • Sprint changed from Sprint 101 to Sprint 102
Actions #9

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 102 to Sprint 103
Actions #10

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 103 to Sprint 104
Actions #11

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 104 to Sprint 105
Actions #12

Updated by lmjachky over 2 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to lmjachky
Actions #13

Updated by lmjachky over 2 years ago

When running some functional tests on a VM that does not have Redis pre-installed (I removed the pulp_redis role and https://github.com/pulp/pulp_installer/blob/adecdf7b8c146c88916d63d88419b5e9083f4e1f/roles/pulp_redis/defaults/main.yml in pulp_installer), the following error is raised:

        {
            "child_tasks": [],
            "created_resources": [],
            "error": {
                "description": "Error 111 connecting to localhost:6379. Connection refused.",
                "traceback": "  File \"/home/vagrant/devel/pulpcore/pulpcore/tasking/pulpcore_worker.py\", line 350, in _perform_task\n    result = func(*args, **kwargs)\n  File \"/home/vagrant/devel/pulpcore/pulpcore/app/tasks/base.py\", line 88, in general_delete\n    instance.delete()\n  File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/django_lifecycle/mixins.py\", line 145, in delete\n    self._run_hooked_methods(BEFORE_DELETE)\n  File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/django_lifecycle/mixins.py\", line 218, in _run_hooked_methods\n    method(self)\n  File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/django_lifecycle/decorators.py\", line 69, in func\n    hooked_method(*args, **kwargs)\n  File \"/home/vagrant/devel/pulpcore/pulpcore/app/models/publication.py\", line 501, in invalidate_cache\n    Cache().delete(base_key=self.base_path)\n  File \"/home/vagrant/devel/pulpcore/pulpcore/cache/cache.py\", line 72, in delete\n    return self.redis.delete(*base_key)\n  File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/redis/client.py\", line 1567, in delete\n    return self.execute_command('DEL', *names)\n  File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/redis/client.py\", line 898, in execute_command\n    conn = self.connection or pool.get_connection(command_name, **options)\n  File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/redis/connection.py\", line 1192, in get_connection\n    connection.connect()\n  File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/redis/connection.py\", line 563, in connect\n    raise ConnectionError(self._error_message(e))\n"
            },
            "finished_at": "2021-09-22T13:33:41.590589Z",
            "logging_cid": "a855e8e8df074006881ab6720e7bb99a",
            "name": "pulpcore.app.tasks.base.general_delete",
            "parent_task": null,
            "progress_reports": [],
            "pulp_created": "2021-09-22T13:33:41.515332Z",
            "pulp_href": "/pulp/api/v3/tasks/20fade55-d769-48c4-a46a-1856323958d2/",
            "reserved_resources_record": [
                "/api/v3/distributions/"
            ],
            "started_at": "2021-09-22T13:33:41.564240Z",
            "state": "failed",
            "task_group": null,
            "worker": "/pulp/api/v3/workers/39da0498-71c0-461d-b97c-9493d7b984e9/"
        },

Actions #14

Updated by lmjachky over 2 years ago

It seems we need to update pulpcore's settings file (REDIS_*) and the unit tests, along with the caching workflow, because we implicitly expect Redis to be set up (https://github.com/pulp/pulpcore/blob/55b33d0d7fe9144a2f4dd33a020a627a59a8843a/pulpcore/cache/cache.py#L29 -> https://github.com/pulp/pulpcore/blob/55b33d0d7fe9144a2f4dd33a020a627a59a8843a/pulpcore/app/redis_connection.py#L10-L31). When Redis is not present, errors are raised throughout the functions that use caching. This state can also be caused by a user misconfiguring Pulp, e.g., setting CACHE_ENABLED=True when no Redis is available (I think we should make Pulp fault-tolerant either way).

Actions #15

Updated by bmbouter over 2 years ago

The situation as I understand it is that Redis is expected by default, so these settings https://github.com/pulp/pulpcore/blob/master/pulpcore/app/settings.py#L201-L207 would be "required settings", which we likely aren't calling out as required the way we should. For example here https://github.com/pulp/pulpcore/blob/master/docs/configuration/settings.rst#redis-settings

The fact that Redis is expected and those settings are required is a problem, but I don't believe it is the problem this ticket is about. To me, a separate issue should be filed to switch Pulp to not requiring Redis by default and to use Redis only if it is configured. We should talk about this at the pulpcore meeting on Tuesday.

This issue though (I believe) is only for deployments that have Redis configured, where Pulp connects to Redis just fine at startup, but Redis then becomes unavailable while Pulp is running, either because Redis was shut down (simulating a failure of the Redis service) or because a network issue prevents Pulp from reaching Redis. To solve that I think you should:

  1. Start Pulp and have it connect to Redis (default installer config)
  2. Run the functional test suite for pulpcore, pulp_file, and pulp_rpm (as plugin examples) and verify there are no failures
  3. Stop Redis
  4. Run that same test suite again, and watch for failures <--- this is the key to identifying how Pulp isn't fault tolerant of Redis availability issues
Actions #16

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 105 to Sprint 106
Actions #17

Updated by bmbouter over 2 years ago

  • Subject changed from Ensure that Pulp can function without Redis to Ensure that Pulp can function if Redis was configured and available when Pulp starts, but has an availability issue while Pulp is running
Actions #18

Updated by pulpbot over 2 years ago

  • Status changed from ASSIGNED to POST
Actions #19

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 106 to Sprint 107
Actions #20

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 107 to Sprint 108
Actions #21

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 108 to Sprint 109
Actions #22

Updated by rchan over 2 years ago

  • Sprint changed from Sprint 109 to Sprint 110

Added by Lubos Mjachky over 2 years ago

Revision 206e8135 | View on GitHub

Add a wrapper around all redis calls

closes #8997
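
The commit itself is not reproduced on this page. As a hedged sketch only, a wrapper of this kind could be a decorator that catches connection errors, logs them, and returns a neutral default, so that a call such as the failing `delete` in the traceback above no longer fails the task. All names below (`tolerate_connection_errors`, `FakeRedis`, `SafeCache`) are hypothetical, and the built-in `ConnectionError` again stands in for `redis.exceptions.ConnectionError`.

```python
import functools
import logging

log = logging.getLogger(__name__)


def tolerate_connection_errors(default=None):
    """Decorator factory: swallow connection errors and return `default` instead."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except ConnectionError as exc:
                log.warning("Cache call %s skipped: %s", func.__name__, exc)
                return default

        return wrapper

    return decorator


class FakeRedis:
    """Hypothetical in-memory client whose availability can be toggled."""

    def __init__(self):
        self.data = {}
        self.up = True

    def _check(self):
        if not self.up:
            raise ConnectionError("Connection refused.")

    def get(self, key):
        self._check()
        return self.data.get(key)

    def set(self, key, value):
        self._check()
        self.data[key] = value

    def delete(self, *keys):
        self._check()
        # Like Redis DEL, report how many keys were actually removed.
        return sum(1 for k in keys if self.data.pop(k, None) is not None)


class SafeCache:
    """Cache facade whose methods never raise on connection failure."""

    def __init__(self, client):
        self.client = client

    @tolerate_connection_errors(default=None)
    def get(self, key):
        return self.client.get(key)

    @tolerate_connection_errors(default=False)
    def set(self, key, value):
        self.client.set(key, value)
        return True

    @tolerate_connection_errors(default=0)
    def delete(self, *keys):
        return self.client.delete(*keys)
```

Setting `client.up = False` after a few successful calls mimics the scenario in the issue title: the cache was reachable at startup and disappears later. One consequence of this design is that a skipped `delete` can leave a stale entry behind once the cache comes back, so some form of expiry on cached keys may be worth considering alongside it.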

Actions #23

Updated by Anonymous over 2 years ago

  • Status changed from POST to MODIFIED
Actions #24

Updated by pulpbot over 2 years ago

  • Sprint/Milestone set to 3.17.0
Actions #25

Updated by pulpbot over 2 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
