Issue #8997

Ensure that Pulp can function without Redis

Added by dalley 3 months ago. Updated 4 days ago.

Status: ASSIGNED
Priority: High
Assignee:
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Katello
Sprint: Sprint 106
Quarter:

Description

With the non-Redis tasking system, all attempts to connect to or communicate with Redis should tolerate both connection failures and keys "going missing", especially when serving content. Redis should be a non-critical operational component in all respects.
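
To make the requirement concrete, here is a minimal sketch of a read path that treats both a missing key and a connection failure as a plain cache miss. It is not pulpcore's actual code: `get_or_compute`, the `cache` interface (`get`/`set`), and the `errors` parameter are hypothetical names chosen for illustration.

```python
def get_or_compute(cache, key, compute, errors=(OSError,)):
    """Return the cached value for key, or compute it if the cache cannot help.

    `cache` is any object with get(key) and set(key, value) (hypothetical
    interface). `errors` lists the exception types treated as "cache
    unavailable"; with redis-py one would likely pass
    redis.exceptions.RedisError here (an assumption, not taken from pulpcore).
    """
    try:
        cached = cache.get(key)
    except errors:
        cached = None  # connection failure: behave like a cache miss
    if cached is not None:
        return cached
    value = compute()  # key missing (or cache down): build the response directly
    try:
        cache.set(key, value)
    except errors:
        pass  # failing to store the value must not fail the request
    return value
```

With this shape, content is still served when Redis is down or a key has been evicted; the only cost is recomputing the response.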


Related issues

Related to Pulp - Task #8805: Cache the responses of the content app (CLOSED - CURRENTRELEASE)
Blocked by Pulp - Issue #9070: Remove Redis from status information if unused (CLOSED - CURRENTRELEASE)

History

#1 Updated by dalley 3 months ago

  • Related to Task #8805: Cache the responses of the content app added

#2 Updated by dalley 3 months ago

  • Description updated (diff)

#3 Updated by dalley 3 months ago

  • Description updated (diff)

#4 Updated by dalley 3 months ago

  • Sprint set to Sprint 100

#5 Updated by ekohl 3 months ago

  • Blocked by Issue #9070: Remove Redis from status information if unused added

#6 Updated by dkliban@redhat.com 3 months ago

  • Triaged changed from No to Yes

#7 Updated by rchan 2 months ago

  • Sprint changed from Sprint 100 to Sprint 101

#8 Updated by ipanova@redhat.com about 2 months ago

  • Sprint changed from Sprint 101 to Sprint 102

#9 Updated by rchan about 2 months ago

  • Sprint changed from Sprint 102 to Sprint 103

#10 Updated by rchan about 1 month ago

  • Sprint changed from Sprint 103 to Sprint 104

#11 Updated by rchan 18 days ago

  • Sprint changed from Sprint 104 to Sprint 105

#12 Updated by lmjachky 11 days ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to lmjachky

#13 Updated by lmjachky 6 days ago

When running some functional tests on a VM that does not have Redis pre-installed (I removed the pulp_redis role and https://github.com/pulp/pulp_installer/blob/adecdf7b8c146c88916d63d88419b5e9083f4e1f/roles/pulp_redis/defaults/main.yml from pulp_installer), the following error is raised:

        {
            "child_tasks": [],
            "created_resources": [],
            "error": {
                "description": "Error 111 connecting to localhost:6379. Connection refused.",
                "traceback": "  File \"/home/vagrant/devel/pulpcore/pulpcore/tasking/pulpcore_worker.py\", line 350, in _perform_task\n    result = func(*args, **kwargs)\n  File \"/home/vagrant/devel/pulpcore/pulpcore/app/tasks/base.py\", line 88, in general_delete\n    instance.delete()\n  File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/django_lifecycle/mixins.py\", line 145, in delete\n    self._run_hooked_methods(BEFORE_DELETE)\n  File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/django_lifecycle/mixins.py\", line 218, in _run_hooked_methods\n    method(self)\n  File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/django_lifecycle/decorators.py\", line 69, in func\n    hooked_method(*args, **kwargs)\n  File \"/home/vagrant/devel/pulpcore/pulpcore/app/models/publication.py\", line 501, in invalidate_cache\n    Cache().delete(base_key=self.base_path)\n  File \"/home/vagrant/devel/pulpcore/pulpcore/cache/cache.py\", line 72, in delete\n    return self.redis.delete(*base_key)\n  File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/redis/client.py\", line 1567, in delete\n    return self.execute_command('DEL', *names)\n  File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/redis/client.py\", line 898, in execute_command\n    conn = self.connection or pool.get_connection(command_name, **options)\n  File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/redis/connection.py\", line 1192, in get_connection\n    connection.connect()\n  File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/redis/connection.py\", line 563, in connect\n    raise ConnectionError(self._error_message(e))\n"
            },
            "finished_at": "2021-09-22T13:33:41.590589Z",
            "logging_cid": "a855e8e8df074006881ab6720e7bb99a",
            "name": "pulpcore.app.tasks.base.general_delete",
            "parent_task": null,
            "progress_reports": [],
            "pulp_created": "2021-09-22T13:33:41.515332Z",
            "pulp_href": "/pulp/api/v3/tasks/20fade55-d769-48c4-a46a-1856323958d2/",
            "reserved_resources_record": [
                "/api/v3/distributions/"
            ],
            "started_at": "2021-09-22T13:33:41.564240Z",
            "state": "failed",
            "task_group": null,
            "worker": "/pulp/api/v3/workers/39da0498-71c0-461d-b97c-9493d7b984e9/"
        },

#14 Updated by lmjachky 5 days ago

Seems like we need to update pulpcore's settings file (REDIS_*) and unit tests, along with the caching workflow, because we implicitly expect Redis to be set up (https://github.com/pulp/pulpcore/blob/55b33d0d7fe9144a2f4dd33a020a627a59a8843a/pulpcore/cache/cache.py#L29 -> https://github.com/pulp/pulpcore/blob/55b33d0d7fe9144a2f4dd33a020a627a59a8843a/pulpcore/app/redis_connection.py#L10-L31). When Redis is not present, errors are raised in the functions that use caching. The same state can also occur when a user misconfigures Pulp, e.g., CACHE_ENABLED=True with no Redis available (I think we should make Pulp fault-tolerant either way).
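
One possible shape for this, as a rough sketch only: pick a no-op cache when caching is disabled, and otherwise wrap the client so that failures are logged instead of raised. `NullCache`, `SafeCache`, `build_cache`, and `client_factory` are hypothetical names, not existing pulpcore classes; the `errors` default is also an assumption (with redis-py, `redis.exceptions.RedisError` would probably be the type to pass).

```python
import logging

log = logging.getLogger(__name__)


class NullCache:
    """Stand-in used when caching is disabled; every operation is a no-op."""

    def get(self, key):
        return None

    def set(self, key, value):
        return False

    def delete(self, key):
        return 0


class SafeCache:
    """Wraps a cache client and turns backend errors into harmless defaults."""

    def __init__(self, client, errors=(OSError,)):
        self._client = client
        self._errors = errors  # exception types meaning "backend unavailable"

    def _call(self, method, default, *args):
        try:
            return getattr(self._client, method)(*args)
        except self._errors as exc:
            log.warning("Cache %s failed, continuing without cache: %s", method, exc)
            return default

    def get(self, key):
        return self._call("get", None, key)

    def set(self, key, value):
        return self._call("set", False, key, value)

    def delete(self, key):
        return self._call("delete", 0, key)


def build_cache(cache_enabled, client_factory, errors=(OSError,)):
    """Return a cache that never raises, whether or not the backend is reachable."""
    if not cache_enabled:
        return NullCache()
    return SafeCache(client_factory(), errors)
```

With a wrapper like this, a cache invalidation during a delete task, as in the traceback from comment #13, would log a warning and return instead of failing the task.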

#15 Updated by bmbouter 5 days ago

The situation as I understand it is that Redis is expected by default, so these settings https://github.com/pulp/pulpcore/blob/master/pulpcore/app/settings.py#L201-L207 would be "required settings", which we likely aren't calling out as required as we should. For example, here: https://github.com/pulp/pulpcore/blob/master/docs/configuration/settings.rst#redis-settings

The fact that Redis is expected and those settings are required is a problem, but I don't believe it is the problem this ticket is about. To me, a separate issue should be filed to switch Pulp to not requiring Redis by default and to use Redis only if it is configured. We should talk about this at the pulpcore meeting on Tuesday.

This issue though (I believe) is only for deployments that have Redis configured, where Pulp connects to Redis just fine at startup, but Redis later becomes unavailable while Pulp is running, either because Redis was shut down (simulating a failure of the Redis service) or because a network issue prevents Pulp from reaching Redis. To solve that I think you should:

  1. Start Pulp and have it connect to Redis (default installer config)
  2. Run the functional test suite for pulpcore, pulp_file, and pulp_rpm (as plugin examples) and verify there are no failures
  3. Stop Redis
  4. Run that same test suite again, and watch for failures <--- this is the key to identifying how Pulp isn't fault tolerant of Redis availability issues
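
The four steps above could be scripted roughly as follows. This is only a sketch: `run_fault_tolerance_check` is a hypothetical helper, and the actual test command and the command that stops Redis depend on the environment, so both are passed in rather than hard-coded.

```python
import subprocess


def run_fault_tolerance_check(test_cmd, stop_redis_cmd, run=subprocess.run):
    """Run the suite with Redis up, stop Redis, then run the suite again.

    Returns (baseline_exit_code, degraded_exit_code). `run` is injectable so
    the sequence can be exercised without touching a real system.
    """
    # Step 2: with Redis available, the suite is expected to pass.
    baseline = run(test_cmd).returncode
    if baseline != 0:
        raise RuntimeError("baseline run failed; fix these failures before stopping Redis")
    # Step 3: simulate a Redis outage.
    run(stop_redis_cmd, check=True)
    # Step 4: any failure now points at missing fault tolerance.
    degraded = run(test_cmd).returncode
    return baseline, degraded


# Example invocation (both commands are assumptions about the test VM):
# run_fault_tolerance_check(
#     ["pytest", "pulpcore/tests/functional"],
#     ["sudo", "systemctl", "stop", "redis"],
# )
```

A non-zero second exit code then marks the places where Pulp is not yet tolerant of Redis going away.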

#16 Updated by rchan 4 days ago

  • Sprint changed from Sprint 105 to Sprint 106
