Issue #8997
closed
Ensure that Pulp can function if Redis was configured and available when Pulp starts, but has an availability issue while Pulp is running
Description
With the non-Redis tasking system, all attempts to connect to or communicate with Redis should tolerate both connection failures and keys "going missing", especially when serving content. Redis should be a non-critical operational component in all respects.
Related issues
Updated by dalley over 3 years ago
- Related to Task #8805: Cache the responses of the content app added
Updated by ekohl over 3 years ago
- Blocked by Issue #9070: Remove Redis from status information if unused added
Updated by ipanova@redhat.com over 3 years ago
- Sprint changed from Sprint 101 to Sprint 102
Updated by lmjachky over 3 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to lmjachky
Updated by lmjachky about 3 years ago
When running some functional tests on a VM that does not have Redis pre-installed (I removed the pulp_redis role + https://github.com/pulp/pulp_installer/blob/adecdf7b8c146c88916d63d88419b5e9083f4e1f/roles/pulp_redis/defaults/main.yml in pulp_installer), the following error is raised:
{
"child_tasks": [],
"created_resources": [],
"error": {
"description": "Error 111 connecting to localhost:6379. Connection refused.",
"traceback": " File \"/home/vagrant/devel/pulpcore/pulpcore/tasking/pulpcore_worker.py\", line 350, in _perform_task\n result = func(*args, **kwargs)\n File \"/home/vagrant/devel/pulpcore/pulpcore/app/tasks/base.py\", line 88, in general_delete\n instance.delete()\n File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/django_lifecycle/mixins.py\", line 145, in delete\n self._run_hooked_methods(BEFORE_DELETE)\n File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/django_lifecycle/mixins.py\", line 218, in _run_hooked_methods\n method(self)\n File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/django_lifecycle/decorators.py\", line 69, in func\n hooked_method(*args, **kwargs)\n File \"/home/vagrant/devel/pulpcore/pulpcore/app/models/publication.py\", line 501, in invalidate_cache\n Cache().delete(base_key=self.base_path)\n File \"/home/vagrant/devel/pulpcore/pulpcore/cache/cache.py\", line 72, in delete\n return self.redis.delete(*base_key)\n File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/redis/client.py\", line 1567, in delete\n return self.execute_command('DEL', *names)\n File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/redis/client.py\", line 898, in execute_command\n conn = self.connection or pool.get_connection(command_name, **options)\n File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/redis/connection.py\", line 1192, in get_connection\n connection.connect()\n File \"/usr/local/lib/pulp/lib64/python3.9/site-packages/redis/connection.py\", line 563, in connect\n raise ConnectionError(self._error_message(e))\n"
},
"finished_at": "2021-09-22T13:33:41.590589Z",
"logging_cid": "a855e8e8df074006881ab6720e7bb99a",
"name": "pulpcore.app.tasks.base.general_delete",
"parent_task": null,
"progress_reports": [],
"pulp_created": "2021-09-22T13:33:41.515332Z",
"pulp_href": "/pulp/api/v3/tasks/20fade55-d769-48c4-a46a-1856323958d2/",
"reserved_resources_record": [
"/api/v3/distributions/"
],
"started_at": "2021-09-22T13:33:41.564240Z",
"state": "failed",
"task_group": null,
"worker": "/pulp/api/v3/workers/39da0498-71c0-461d-b97c-9493d7b984e9/"
},
Updated by lmjachky about 3 years ago
Seems like we need to update pulpcore's settings file (REDIS_*) and the unit tests along with the caching workflow, because we implicitly expect Redis to be set up (https://github.com/pulp/pulpcore/blob/55b33d0d7fe9144a2f4dd33a020a627a59a8843a/pulpcore/cache/cache.py#L29 -> https://github.com/pulp/pulpcore/blob/55b33d0d7fe9144a2f4dd33a020a627a59a8843a/pulpcore/app/redis_connection.py#L10-L31). When Redis is not present, errors are raised across the functions that use caching. This state can also be caused when a user configures Pulp incorrectly, i.e., CACHE_ENABLED=True without a reachable Redis
(I think we should make Pulp fault-tolerant either way).
Updated by bmbouter about 3 years ago
The situation as I understand it is that Redis is expected by default, so these settings https://github.com/pulp/pulpcore/blob/master/pulpcore/app/settings.py#L201-L207 would be "required settings", which we likely aren't currently calling out as required, as we should. For example, here: https://github.com/pulp/pulpcore/blob/master/docs/configuration/settings.rst#redis-settings
The fact that Redis is expected and those settings are required is a problem, but I don't believe it is the problem this ticket is about. To me, a separate issue should be filed to switch Pulp to not requiring Redis by default, and then to use Redis only if it is configured. We should talk about this at the pulpcore meeting on Tuesday.
This issue though (I believe) is only for deployments that have Redis configured, where Pulp connects to Redis just fine at startup, but Redis then becomes unavailable while Pulp is running, either because Redis was shut down (simulating a failure of the Redis service) or because a network issue prevents Pulp from reaching Redis. To solve that I think you should:
- Start Pulp and have it connect to Redis (default installer config)
- Run the functional test suite for pulpcore, pulp_file, and pulp_rpm (as plugin examples) and verify there are no failures
- Stop Redis
- Run that same test suite again, and watch for failures <--- this is the key to identifying how Pulp isn't fault tolerant of Redis availability issues
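The steps above could be sketched as a small harness. This is only an illustration: `redis_outage_regressions`, `run_suite`, and `stop_redis` are hypothetical names, and how the suite is actually run or how Redis is actually stopped (service manager, container, firewall rule) is left to the caller.

```python
from typing import Callable, Dict, Set

# Hypothetical type: a suite runner returns {test_name: passed}.
SuiteRunner = Callable[[], Dict[str, bool]]


def redis_outage_regressions(
    run_suite: SuiteRunner, stop_redis: Callable[[], None]
) -> Set[str]:
    """Return the tests that pass with Redis up but fail once it is stopped."""
    # Steps 1-2: Pulp is running and connected to Redis; the suite must be green.
    baseline = run_suite()
    failing_before = {name for name, ok in baseline.items() if not ok}
    if failing_before:
        raise RuntimeError(f"baseline must pass, failing: {sorted(failing_before)}")

    # Step 3: take Redis away while Pulp keeps running.
    stop_redis()

    # Step 4: every failure now points at a code path that is not fault tolerant.
    after = run_suite()
    return {name for name, ok in after.items() if not ok}
```

Requiring a green baseline first keeps pre-existing failures from being mistaken for Redis-related ones.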
Updated by rchan about 3 years ago
- Sprint changed from Sprint 105 to Sprint 106
Updated by bmbouter about 3 years ago
- Subject changed from Ensure that Pulp can function without Redis to Ensure that Pulp can function if Redis was configured and available when Pulp starts, but has an availability issue while Pulp is running
Updated by pulpbot about 3 years ago
- Status changed from ASSIGNED to POST
Updated by rchan about 3 years ago
- Sprint changed from Sprint 106 to Sprint 107
Updated by rchan about 3 years ago
- Sprint changed from Sprint 107 to Sprint 108
Updated by rchan about 3 years ago
- Sprint changed from Sprint 108 to Sprint 109
Updated by rchan about 3 years ago
- Sprint changed from Sprint 109 to Sprint 110
Added by Lubos Mjachky about 3 years ago
Updated by Anonymous about 3 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulpcore|206e8135286585febf109cd13617dab702e0773b.
Updated by pulpbot about 3 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Add a wrapper around all redis calls
closes #8997
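For readers who want a picture of what "a wrapper around all redis calls" can look like, here is a minimal sketch. It is not the code from the changeset: `tolerate_redis_outage` and `TolerantCache` are hypothetical names, and the only library detail assumed is that redis-py raises `redis.exceptions.ConnectionError` when the server is unreachable (as in the traceback above). A fallback to the built-in `ConnectionError` is included only so the sketch runs without redis-py installed.

```python
import functools
import logging

try:
    from redis.exceptions import ConnectionError as RedisConnectionError
except ImportError:  # fallback so this sketch runs without redis-py
    RedisConnectionError = ConnectionError

log = logging.getLogger(__name__)


def tolerate_redis_outage(default=None):
    """Decorator factory: return `default` instead of raising when Redis is down."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except RedisConnectionError as exc:
                log.warning("Redis unavailable, skipping %s: %s", func.__name__, exc)
                return default
        return wrapper
    return decorator


class TolerantCache:
    """Hypothetical cache facade; `client` offers redis-py style get/delete."""

    def __init__(self, client):
        self.client = client

    @tolerate_redis_outage(default=0)
    def delete(self, *keys):
        # Number of keys removed; 0 when Redis cannot be reached.
        return self.client.delete(*keys)

    @tolerate_redis_outage(default=None)
    def get(self, key):
        # None for a missing key and for an outage: both are cache misses.
        return self.client.get(key)
```

With a wrapper of this kind, a task such as `general_delete` would log a warning and carry on instead of failing when cache invalidation cannot reach Redis. A fuller version would probably also handle timeouts.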