Project

Profile

Help

Story #4689

closed

As a user, Pulp logs have correlation ids in the logs of all services

Added by dkliban@redhat.com about 5 years ago. Updated over 3 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
% Done:

100%

Estimated time:
Platform Release:
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
Katello
Sprint:
Sprint 87
Quarter:

Description

Motivation

Pulp as a multi-process system needs to make debugging easy. Currently it is hard to:

  1. Identify which log statements are related to a specific API request
  2. Filter out all of those that aren't
  3. Do this across multiple processes

Additionally, there is a use case where an external system wants to provide the correlation id, e.g. Katello wants to submit the correlation ID as part of the request.

Solution

Use django-cid by default, and document that it's available for users to use.

Implementation

Enabling in Pulp itself

  1. Add django-cid as an installed app
  2. Reconfigure the default Django logger config in settings.py to incorporate the CID
  3. Add user documentation identifying the types of things they can additionally configure
  4. Add a test to assert Pulp viewset logs contain an autogenerated CID

Enabling in the App

  1. Add a test to assert content app logs contain an autogenerated CID

Enabling in Tasking System

  1. Create the field on task and set it: Add a field to the Task model called logging_cid. Have the enqueue_with_reservation function set this field when it creates a Task object with the value of get_cid.

  2. Have the CID log on the resource manager itself: Have the _queue_reserved_task method call set_cid() with the value of the new logging_cid from the inner_task. This will enable the correlation ID in the resource_manager logs for that task also.

  3. Have all tasks also use the CID: Have the perform_job call set_cid() with the value of the logging_cid from the task. This will enable the correlation ID in all pulp task logs without them having to change their signatures.

  4. Lastly, expose the logging_cid in the task API for users to see.


Related issues

Related to Pulp - Task #3289: Support for X-Correlation-ID HTTP REST headerCLOSED - WONTFIX

Actions
Related to Pulp - Story #2654: As a User I want Pulp3 to log the first 8 characters of a task id in every log statement emitted from a running taskCLOSED - WONTFIX

Actions
Related to Pulp - Task #7792: Update the gunicorn service file to log correlation idCLOSED - CURRENTRELEASEdaviddavis

Actions
Actions #1

Updated by bmbouter almost 5 years ago

  • Tags deleted (Pulp 3)
Actions #2

Updated by jsherril@redhat.com almost 5 years ago

  • Tags Katello-P3 added
Actions #3

Updated by bmbouter about 4 years ago

  • Related to Task #3289: Support for X-Correlation-ID HTTP REST header added
Actions #5

Updated by ggainey almost 4 years ago

  • Priority changed from Normal to Low
Actions #6

Updated by ggainey almost 4 years ago

  • Tags Katello added
  • Tags deleted (Katello-P3)
Actions #7

Updated by bmbouter almost 4 years ago

  • Subject changed from As a user, I can configure pulp to accept a correlation id header to As a user, Pulp logs have correlation ids in the logs of all services
  • Description updated (diff)
Actions #8

Updated by bmbouter almost 4 years ago

Actions #9

Updated by daviddavis almost 4 years ago

For this kwarg _pulp_logging_cid, will the code calling enqueue_with_reservation have to manually set this if it wants task logging to use this CID?

Actions #10

Updated by bmbouter almost 4 years ago

daviddavis wrote:

For this kwarg _pulp_logging_cid, will the code calling enqueue_with_reservation have to manually set this if it wants task logging to use this CID?

The caller of enqueue_with_reservation will not have to specify the CID, the CID will automatically and transparently be added by enqueue_with_reservation itself. The signature of enqueue_with_reservation won't be adjusted with this change so rolling it out will require no other code to change. Also when rolled out it, will be enabled automatically on all tasks.

Actions #11

Updated by bmbouter almost 4 years ago

  • Description updated (diff)
Actions #12

Updated by bmbouter almost 4 years ago

  • Description updated (diff)
Actions #13

Updated by bmbouter almost 4 years ago

  • Description updated (diff)
Actions #14

Updated by daviddavis almost 4 years ago

  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes
Actions #15

Updated by rchan almost 4 years ago

  • Sprint set to Sprint 74
Actions #16

Updated by daviddavis almost 4 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to daviddavis
  • Priority changed from Low to Normal
Actions #17

Updated by rchan almost 4 years ago

  • Sprint changed from Sprint 74 to Sprint 75
Actions #18

Updated by daviddavis almost 4 years ago

Any idea how to handle the test? Not sure how to use functional tests since tests might be running on a different box than Pulp. Perhaps a unit test that uses mock somehow?

Actions #19

Updated by bmbouter almost 4 years ago

daviddavis wrote:

Any idea how to handle the test? Not sure how to use functional tests since tests might be running on a different box than Pulp. Perhaps a unit test that uses mock somehow?

I believe pulp-smash has a utility that allows you to run commands against the host box. You can see an example of pulp-smash code checking for root on the remote box for example here: https://github.com/pulp/pulp-smash/blob/8b0dbd3febb7a3a30367277c4f838c52a13b4642/pulp_smash/cli.py#L40

Actions #20

Updated by rchan almost 4 years ago

  • Sprint changed from Sprint 75 to Sprint 76
Actions #21

Updated by rchan almost 4 years ago

  • Sprint changed from Sprint 76 to Sprint 77
Actions #22

Updated by daviddavis almost 4 years ago

  • Description updated (diff)
Actions #23

Updated by daviddavis almost 4 years ago

  • Description updated (diff)
Actions #24

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 77 to Sprint 78
Actions #25

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 78 to Sprint 79
Actions #26

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 79 to Sprint 80
Actions #27

Updated by daviddavis over 3 years ago

  • Related to Story #2654: As a User I want Pulp3 to log the first 8 characters of a task id in every log statement emitted from a running task added
Actions #28

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 80 to Sprint 81
Actions #29

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 81 to Sprint 82
Actions #30

Updated by daviddavis over 3 years ago

I modified Pulp per the design in the description. However, there are some quirks. Here are some logs using an autogenerated CID and a CID set by the correlation id header.

Autogenerated CID (4bca4df4-d69d-4c4b-8542-fee55209af68)

Here you can see that gunicorn doesn't log the correlation_id which is unfortunate as the correlation_id cannot be matched to the request. I'm guessing it's not possible to have gunicorn log the CID as the CID gets generated/set in pulp and guincorn logs its output before execution reaches Pulp. So we'll probably have to have Pulp output the request in its logs.

The resource manager has its own generated CID which isn't going to make sense to a user viewing the logs. Also, the worker has its own CID before it picks up the task. Then it uses the task's CID which is perhaps surprising to a user.

Sep 29 20:14:22 pulp3-source-fedora.crake.example.com rq[31262]: pulp (c68135fc-5b5c-487f-86fa-a98e01f73d96): rq.worker:INFO: resource-manager: 986e0d83-db66-480c-a94d-ec6a09f9d119
Sep 29 20:14:22 pulp3-source-fedora.crake.example.com gunicorn[31259]: 127.0.0.1 - admin [29/Sep/2020:20:14:22 +0000] "POST /pulp/api/v3/repositories/file/file/aef43617-a5cf-4e7c-b8dd-52badeec6e7e/versions/0/repair/ HTTP/1.0" 202 67 "-" "HTTPie/2.0.0"
Sep 29 20:14:23 pulp3-source-fedora.crake.example.com rq[31263]: pulp (a4c84ce9-adce-42f4-811d-814d69865da0): rq.worker:INFO: 31263@pulp3-source-fedora.crake.example.com: 868422b5-673b-4ac3-a63d-c2034ab061d7
Sep 29 20:14:23 pulp3-source-fedora.crake.example.com rq[31262]: pulp (c68135fc-5b5c-487f-86fa-a98e01f73d96): rq.worker:INFO: resource-manager: Job OK (986e0d83-db66-480c-a94d-ec6a09f9d119)
Sep 29 20:14:23 pulp3-source-fedora.crake.example.com rq[31263]: pulp (4bca4df4-d69d-4c4b-8542-fee55209af68): pulpcore.app.tasks.repository:INFO: Repairing version 0 of repository test
Sep 29 20:14:23 pulp3-source-fedora.crake.example.com rq[31263]: pulp (4bca4df4-d69d-4c4b-8542-fee55209af68): rq.worker:INFO: 31263@pulp3-source-fedora.crake.example.com: Job OK (868422b5-673b-4ac3-a63d-c2034ab061d7)
Sep 29 20:14:23 pulp3-source-fedora.crake.example.com rq[31263]: pulp (a4c84ce9-adce-42f4-811d-814d69865da0): rq.worker:INFO: 31263@pulp3-source-fedora.crake.example.com: af9570a0-d648-4120-ba86-7c4b830c3f67
Sep 29 20:14:23 pulp3-source-fedora.crake.example.com rq[31263]: pulp (a4c84ce9-adce-42f4-811d-814d69865da0): rq.worker:INFO: 31263@pulp3-source-fedora.crake.example.com: Job OK (af9570a0-d648-4120-ba86-7c4b830c3f67)

Setting CID to MY_CORRELATION_ID using HTTP_X_CORRELATION_ID header

I can probably replace None with an empty string. This log perhaps makes more sense but again gunicorn isn't logging with the CID.

Sep 29 20:10:41 pulp3-source-fedora.crake.example.com rq[31170]: pulp (None): rq.worker:INFO: resource-manager: 17aef9a7-5aba-4e6f-a3cf-c44d46b5e865
Sep 29 20:10:41 pulp3-source-fedora.crake.example.com gunicorn[31168]: 127.0.0.1 - admin [29/Sep/2020:20:10:41 +0000] "POST /pulp/api/v3/repositories/file/file/aef43617-a5cf-4e7c-b8dd-52badeec6e7e/versions/0/repair/ HTTP/1.0" 202 67 "-" "HTTPie/2.0.0"
Sep 29 20:10:41 pulp3-source-fedora.crake.example.com rq[31169]: pulp (None): rq.worker:INFO: 31169@pulp3-source-fedora.crake.example.com: 20ab0ef0-faeb-482c-bae7-62cee69c7ca3
Sep 29 20:10:41 pulp3-source-fedora.crake.example.com rq[31170]: pulp (None): rq.worker:INFO: resource-manager: Job OK (17aef9a7-5aba-4e6f-a3cf-c44d46b5e865)
Sep 29 20:10:41 pulp3-source-fedora.crake.example.com rq[31169]: pulp (MY_CORRELATION_ID): pulpcore.app.tasks.repository:INFO: Repairing version 0 of repository test
Sep 29 20:10:41 pulp3-source-fedora.crake.example.com rq[31169]: pulp (MY_CORRELATION_ID): rq.worker:INFO: 31169@pulp3-source-fedora.crake.example.com: Job OK (20ab0ef0-faeb-482c-bae7-62cee69c7ca3)
Sep 29 20:10:41 pulp3-source-fedora.crake.example.com rq[31169]: pulp (None): rq.worker:INFO: 31169@pulp3-source-fedora.crake.example.com: 60ee9923-7aba-472c-8379-010542704557
Sep 29 20:10:41 pulp3-source-fedora.crake.example.com rq[31169]: pulp (None): rq.worker:INFO: 31169@pulp3-source-fedora.crake.example.com: Job OK (60ee9923-7aba-472c-8379-010542704557)
Actions #31

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 82 to Sprint 83
Actions #32

Updated by bmbouter over 3 years ago

What if we customize the gunicorn response format and have the CID be included as a response header. I see those are gunicorn logging format option here. Since gunicorn includes data only known by the response, I think it is waiting to log until the response is available.

The benefit of a response header is even auto-generated ones would be shown in the logs. It would probably be valuable for the client to have the CID sent all the way back to them so this creates value there too.

I was thinking also that in all cases there would be a CID. The tasking system I was thinking would have that be a tasking system param and the tasking system would always correlate it's task CID with the dispatcher of that task (either another task or a viewset who autogenerated it). Is that also planned to be part of this work too?

Actions #33

Updated by daviddavis over 3 years ago

What if we customize the gunicorn response format and have the CID be included as a response header. I see those are gunicorn logging format option here. Since gunicorn includes data only known by the response, I think it is waiting to log until the response is available.

The benefit of a response header is even auto-generated ones would be shown in the logs. It would probably be valuable for the client to have the CID sent all the way back to them so this creates value there too.

This makes sense and I think it should work.

I was thinking also that in all cases there would be a CID. The tasking system I was thinking would have that be a tasking system param and the tasking system would always correlate it's task CID with the dispatcher of that task (either another task or a viewset who autogenerated it). Is that also planned to be part of this work too?

I'm not sure I understand. Let me give an example: when the resource manager or worker starts up and starts logging (before it's been given a task), what should its CID be? What should it be after it has finished a task?

Actions #34

Updated by bmbouter over 3 years ago

daviddavis wrote:

I'm not sure I understand. Let me give an example: when the resource manager or worker starts up and starts logging (before it's been given a task), what should its CID be? What should it be after it has finished a task?

I see what you mean, and I didn't consider those log statements. In those I expect it should show no CID. Anytime a task is being run the CID would be there.

Actions #35

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 83 to Sprint 84
Actions #36

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 84 to Sprint 85
Actions #37

Updated by daviddavis over 3 years ago

The problem I am running into with django-cid is that it generates uuids when you go to get the cid and log messages[0] which is problematic because in some cases (eg when a worker starts before it has a task) we don't want it to generate a cid. Instead the middleware should handle this so that it only generates correlation IDs once when a request is received.

I also looked at django-guid[1] and I think it might work instead.

[0] https://github.com/Polyconseil/django-cid/blob/0dc7fc50d8aff469b989fb8b4b42242f2989ef1c/cid/locals.py#L15-L28 [1] https://github.com/snok/django-guid

Actions #38

Updated by daviddavis over 3 years ago

  • Related to Task #7792: Update the gunicorn service file to log correlation id added
Actions #39

Updated by pulpbot over 3 years ago

  • Status changed from ASSIGNED to POST
Actions #40

Updated by daviddavis over 3 years ago

  • Sprint/Milestone set to 3.9.0
Actions #41

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 85 to Sprint 86
Actions #42

Updated by daviddavis over 3 years ago

  • Sprint/Milestone deleted (3.9.0)
Actions #43

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 86 to Sprint 87

Added by daviddavis over 3 years ago

Revision ca7dbe15 | View on GitHub

Add support for correlation_id to pulp

fixes #4689

Actions #44

Updated by daviddavis over 3 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100
Actions #45

Updated by daviddavis over 3 years ago

  • Sprint/Milestone set to 3.9.0
Actions #46

Updated by pulpbot over 3 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

Also available in: Atom PDF