Issue #6533

closed

Task get stuck in 'running' state

Added by ipanova@redhat.com almost 4 years ago. Updated over 3 years ago.

Status: CLOSED - WORKSFORME
Priority: Normal
Assignee:
Category: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version:
Platform Release:
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags:
Sprint: Sprint 82
Quarter:

Description

  1. Migrate repos from Pulp 2 (I had two RPM repos).

$ pulp-admin rpm repo list  --details
+----------------------------------------------------------------------+
                            RPM Repositories
+----------------------------------------------------------------------+

Id:                  errata-references
Display Name:        None
Description:         None
Content Unit Counts: 
  Erratum:           4
  Package Category:  1
  Package Group:     2
  Package Langpacks: 1
  Rpm:               35
Notes:               
Scratchpad:          
  Checksum Type: sha256
Importers:           
  Config:               
    Feed: https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-references-updat
          einfo/
  Id:                   yum_importer
  Importer Type Id:     yum_importer
  Last Override Config: 
  Last Sync:            2020-04-20T10:56:57Z
  Last Updated:         2020-04-20T16:44:48Z
  Repo Id:              errata-references
  Scratchpad:           
    Repomd Checksum: ff6522cf2ff781799d7cf0d70362d93ada5f22856abb4691daa04ef9aa0
                     36f87
    Repomd Revision: 1582978135
Distributors:        
  Auto Publish:         True
  Config:               
    Http:         False
    Https:        True
    Relative URL: ulp/pulp/fixtures/rpm-references-updateinfo/
  Distributor Type Id:  yum_distributor
  Id:                   yum_distributor
  Last Override Config: 
  Last Publish:         2020-04-20T10:56:58Z
  Last Updated:         2020-04-20T10:56:38Z
  Repo Id:              errata-references
  Scratchpad:           
  Auto Publish:         False
  Config:               
    Http:         False
    Https:        True
    Relative URL: ulp/pulp/fixtures/rpm-references-updateinfo/
  Distributor Type Id:  export_distributor
  Id:                   export_distributor
  Last Override Config: 
  Last Publish:         None
  Last Updated:         2020-04-20T10:56:38Z
  Repo Id:              errata-references
  Scratchpad:           

Id:                  test-modulariy
Display Name:        None
Description:         None
Content Unit Counts: 
  Erratum:           6
  Modulemd:          8
  Modulemd Defaults: 3
  Package Category:  1
  Package Group:     2
  Package Langpacks: 1
  Rpm:               35
Notes:               
Scratchpad:          
  Checksum Type: sha256
Importers:           
  Config:               
    Feed: https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-test-modularity/
  Id:                   yum_importer
  Importer Type Id:     yum_importer
  Last Override Config: 
  Last Sync:            2020-04-20T10:57:38Z
  Last Updated:         2020-04-20T10:57:26Z
  Repo Id:              test-modulariy
  Scratchpad:           
    Repomd Checksum: fbcba18148bf67acc812db9ea8efd3d07541bc201ecf8a6773725d1041c
                     f83c9
    Repomd Revision: 1561056354
Distributors:        
  Auto Publish:         True
  Config:               
    Http:         False
    Https:        True
    Relative URL: pulp/pulp/fixtures/rpm-test-modularity/
  Distributor Type Id:  yum_distributor
  Id:                   yum_distributor
  Last Override Config: 
  Last Publish:         2020-04-20T10:57:38Z
  Last Updated:         2020-04-20T10:57:26Z
  Repo Id:              test-modulariy
  Scratchpad:           
  Auto Publish:         False
  Config:               
    Http:         False
    Https:        True
    Relative URL: pulp/pulp/fixtures/rpm-test-modularity/
  Distributor Type Id:  export_distributor
  Id:                   export_distributor
  Last Override Config: 
  Last Publish:         None
  Last Updated:         2020-04-20T10:57:26Z
  Repo Id:              test-modulariy
  Scratchpad:           


(pulp) [vagrant@pulp2-nightly-pulp3-source-centos7 _scripts]$ 
  2. Inspect the migrated repos.
$ http GET $BASE_ADDR/pulp/api/v3/pulp2repositories/
HTTP/1.1 200 OK
Allow: GET, HEAD, OPTIONS
Connection: close
Content-Length: 1600
Content-Type: application/json
Date: Mon, 20 Apr 2020 17:42:47 GMT
Server: gunicorn/20.0.4
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN

{
    "count": 2,
    "next": null,
    "previous": null,
    "results": [
        {
            "is_migrated": true,
            "not_in_plan": false,
            "pulp2_object_id": "5e9d7fe6c998ac0be6c0fb21",
            "pulp2_repo_id": "errata-references",
            "pulp2_repo_type": "rpm",
            "pulp3_distribution_hrefs": [
                "/pulp/api/v3/distributions/rpm/rpm/76476a41-632f-4a16-bfa9-434f6c215c8f/"
            ],
            "pulp3_publication_href": "/pulp/api/v3/publications/rpm/rpm/1c5a1ff1-3d38-4278-b865-77e03db4b0b6/",
            "pulp3_remote_href": "/pulp/api/v3/remotes/rpm/rpm/21b131cd-12b0-4a2a-a173-c105c9f089af/",
            "pulp3_repository_href": "/pulp/api/v3/repositories/rpm/rpm/0f6f95e9-e05d-449d-ac7d-2e22e7cdd081/",
            "pulp3_repository_version": "/pulp/api/v3/repositories/rpm/rpm/0f6f95e9-e05d-449d-ac7d-2e22e7cdd081/versions/1/",
            "pulp_created": "2020-04-20T17:42:02.710317Z",
            "pulp_href": "/pulp/api/v3/pulp2repositories/961cf6eb-046c-4abc-a7b3-aff77314c937/"
        },
        {
            "is_migrated": true,
            "not_in_plan": false,
            "pulp2_object_id": "5e9d8016c998ac0be7056e35",
            "pulp2_repo_id": "test-modulariy",
            "pulp2_repo_type": "rpm",
            "pulp3_distribution_hrefs": [
                "/pulp/api/v3/distributions/rpm/rpm/87aafa7b-52f7-47c2-b184-67a2147387cc/"
            ],
            "pulp3_publication_href": "/pulp/api/v3/publications/rpm/rpm/a4f3e70e-d2bb-438a-a433-bce99faadb2c/",
            "pulp3_remote_href": "/pulp/api/v3/remotes/rpm/rpm/a2d25025-a126-44d6-8a91-56615869f05a/",
            "pulp3_repository_href": "/pulp/api/v3/repositories/rpm/rpm/f7b5ef5c-d76f-41fb-ae8d-56b2168c7daa/",
            "pulp3_repository_version": "/pulp/api/v3/repositories/rpm/rpm/f7b5ef5c-d76f-41fb-ae8d-56b2168c7daa/versions/1/",
            "pulp_created": "2020-04-20T17:42:02.686249Z",
            "pulp_href": "/pulp/api/v3/pulp2repositories/f9a3a7e2-8a68-434d-b458-8fdca2b29d0b/"
        }
    ]
}

  3. Sync the migrated repo with its migrated remote.

(pulp) [vagrant@pulp2-nightly-pulp3-source-centos7 _scripts]$ http POST $BASE_ADDR/pulp/api/v3/repositories/rpm/rpm/0f6f95e9-e05d-449d-ac7d-2e22e7cdd081/sync/     remote=/pulp/api/v3/remotes/rpm/rpm/21b131cd-12b0-4a2a-a173-c105c9f089af/
HTTP/1.1 202 Accepted
Allow: POST, OPTIONS
Connection: close
Content-Length: 67
Content-Type: application/json
Date: Mon, 20 Apr 2020 17:43:08 GMT
Server: gunicorn/20.0.4
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN

{
    "task": "/pulp/api/v3/tasks/6e3f15b7-9fbc-4f8b-97ae-367ba4d03aba/"
}

(pulp) [vagrant@pulp2-nightly-pulp3-source-centos7 _scripts]$ http GET $BASE_ADDR/pulp/api/v3/tasks/6e3f15b7-9fbc-4f8b-97ae-367ba4d03aba/
HTTP/1.1 200 OK
Allow: GET, PATCH, DELETE, HEAD, OPTIONS
Connection: close
Content-Length: 606
Content-Type: application/json
Date: Mon, 20 Apr 2020 17:43:14 GMT
Server: gunicorn/20.0.4
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN

{
    "child_tasks": [],
    "created_resources": [],
    "error": null,
    "finished_at": null,
    "name": "pulp_rpm.app.tasks.synchronizing.synchronize",
    "parent_task": null,
    "progress_reports": [],
    "pulp_created": "2020-04-20T17:43:08.424482Z",
    "pulp_href": "/pulp/api/v3/tasks/6e3f15b7-9fbc-4f8b-97ae-367ba4d03aba/",
    "reserved_resources_record": [
        "/pulp/api/v3/remotes/rpm/rpm/21b131cd-12b0-4a2a-a173-c105c9f089af/",
        "/pulp/api/v3/repositories/rpm/rpm/0f6f95e9-e05d-449d-ac7d-2e22e7cdd081/"
    ],
    "started_at": "2020-04-20T17:43:08.535930Z",
    "state": "running",
    "task_group": null,
    "worker": "/pulp/api/v3/workers/2740bb8c-057a-40f0-b4b7-c7a9231a42bf/"
}

  4. After 6 minutes the task was still in the running state, although it usually takes less than 30 seconds.

  5. After prestart the task gets canceled, and a newly triggered sync then works.

$ prestart
systemctl restart pulpcore-content pulpcore-worker@1 pulpcore-worker@2 pulpcore-resource-manager pulpcore-api
(pulp) [vagrant@pulp2-nightly-pulp3-source-centos7 _scripts]$ http GET $BASE_ADDR/pulp/api/v3/tasks/6e3f15b7-9fbc-4f8b-97ae-367ba4d03aba/
HTTP/1.1 200 OK
Allow: GET, PATCH, DELETE, HEAD, OPTIONS
Connection: close
Content-Length: 607
Content-Type: application/json
Date: Mon, 20 Apr 2020 17:52:48 GMT
Server: gunicorn/20.0.4
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN

{
    "child_tasks": [],
    "created_resources": [],
    "error": null,
    "finished_at": null,
    "name": "pulp_rpm.app.tasks.synchronizing.synchronize",
    "parent_task": null,
    "progress_reports": [],
    "pulp_created": "2020-04-20T17:43:08.424482Z",
    "pulp_href": "/pulp/api/v3/tasks/6e3f15b7-9fbc-4f8b-97ae-367ba4d03aba/",
    "reserved_resources_record": [
        "/pulp/api/v3/remotes/rpm/rpm/21b131cd-12b0-4a2a-a173-c105c9f089af/",
        "/pulp/api/v3/repositories/rpm/rpm/0f6f95e9-e05d-449d-ac7d-2e22e7cdd081/"
    ],
    "started_at": "2020-04-20T17:43:08.535930Z",
    "state": "canceled",
    "task_group": null,
    "worker": "/pulp/api/v3/workers/2740bb8c-057a-40f0-b4b7-c7a9231a42bf/"
}

(pulp) [vagrant@pulp2-nightly-pulp3-source-centos7 _scripts]$ http POST $BASE_ADDR/pulp/api/v3/repositories/rpm/rpm/0f6f95e9-e05d-449d-ac7d-2e22e7cdd081/sync/     remote=/pulp/api/v3/remotes/rpm/rpm/21b131cd-12b0-4a2a-a173-c105c9f089af/
HTTP/1.1 202 Accepted
Allow: POST, OPTIONS
Connection: close
Content-Length: 67
Content-Type: application/json
Date: Mon, 20 Apr 2020 17:52:56 GMT
Server: gunicorn/20.0.4
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN

{
    "task": "/pulp/api/v3/tasks/0e876e06-9b86-4587-b920-b642259c1e93/"
}
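
The manual check above (trigger a sync, then GET the task until it leaves the running state) can be sketched as a polling helper with a timeout. This is only an illustration, not part of Pulp: `wait_for_task`, `fetch_task` and `FINAL_STATES` are hypothetical names, and the set of final states is an assumption based on the states seen in this issue. `fetch_task` stands for whatever performs the GET on the task href and returns the decoded JSON.

```python
import time

# Assumption: states after which a task no longer changes.
# "running" and "waiting" are the non-final states seen in this issue.
FINAL_STATES = {"completed", "failed", "canceled"}


def wait_for_task(fetch_task, timeout=30.0, interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_task() until the task reaches a final state.

    fetch_task is a hypothetical callable returning the task as a dict
    (for example the JSON body of GET /pulp/api/v3/tasks/<id>/).
    Raises TimeoutError if the task is still not final after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        task = fetch_task()
        if task["state"] in FINAL_STATES:
            return task
        if clock() >= deadline:
            raise TimeoutError(
                f"task still {task['state']!r} after {timeout} seconds"
            )
        sleep(interval)
```

With a timeout of around 30 seconds, the stuck sync described here would surface as a TimeoutError instead of being noticed by hand after several minutes.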


Related issues

Related to Pulp - Issue #7387: Tasks not delivered to resource-manager are not cleaned up (CLOSED - DUPLICATE)
Actions #1

Updated by ipanova@redhat.com almost 4 years ago

  • Project changed from RPM Support to Pulp
Actions #2

Updated by ipanova@redhat.com almost 4 years ago

  • Description updated (diff)
Actions #3

Updated by ipanova@redhat.com almost 4 years ago

  • Description updated (diff)
Actions #4

Updated by ttereshc almost 4 years ago

FWIW, after restart my tasks were stuck in the waiting state.

Actions #5

Updated by ipanova@redhat.com almost 4 years ago

Sometimes after prestart the task fails instead of getting canceled: "description": "The task f91d8e47-f38b-4447-82d5-03cba6853b77 exited immediately for some reason. Marking as failed. Check the logs for more details",

Actions #6

Updated by daviddavis almost 4 years ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 71
Actions #7

Updated by daviddavis almost 4 years ago

  • Triaged changed from Yes to No
  • Sprint deleted (Sprint 71)
Actions #8

Updated by daviddavis almost 4 years ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 71
Actions #9

Updated by mdellweg almost 4 years ago

Maybe Python's asyncio debug mode can show what it's waiting for:

https://docs.python.org/3/library/asyncio-dev.html#debug-mode
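
For reference, a minimal standalone sketch of switching that debug mode on. It does not show how to enable it inside a Pulp worker, which this thread does not cover; `report_debug_flag` is a hypothetical name used only for illustration.

```python
import asyncio


async def report_debug_flag() -> bool:
    # In debug mode asyncio logs callbacks and task steps that block the
    # event loop for longer than loop.slow_callback_duration, and it
    # reports coroutines that were never awaited.
    return asyncio.get_running_loop().get_debug()


# debug=True enables debug mode for the loop created by asyncio.run().
debug_enabled = asyncio.run(report_debug_flag(), debug=True)
print(debug_enabled)
```

The same mode can be enabled without code changes by setting the environment variable PYTHONASYNCIODEBUG=1 or by starting Python with -X dev.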

Actions #10

Updated by bmbouter almost 4 years ago

When trying to debug a "stuck" task the first question usually is what is it stuck on? RQ uses a post-forker model so each task forks the main RQ process of the worker to create a child process, which RQ calls "the workhorse".
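
To make the fork-per-task model concrete, here is a minimal POSIX-only sketch of a parent process that forks a child to run one job and waits for it. This is not RQ's code, only an illustration of the idea; `run_in_workhorse` is a hypothetical name.

```python
import os


def run_in_workhorse(job) -> int:
    """Fork a child, run job() in it, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child process (the "workhorse"): run the job, then exit right
        # away so it never returns into the parent's code path.
        try:
            job()
            code = 0
        except BaseException:
            code = 1
        os._exit(code)
    # Parent process (the worker): block until the child terminates.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

In this model a stuck task means the child never exits, so the child's PID is the process to inspect with external tools.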

We tried py-spy to determine what the workhorse is doing while it's stuck in the waiting state, and its output was not helpful. The next tool I recommend is the Python GDB extensions, which show the Python stack from a GDB core dump of the stuck process. I recommend taking a few core dumps over a few seconds and answering two questions:

  • What is it stuck on? The py-gdb tools will tell us this.
  • Is it stuck at a single place, or is it looping over and over? Comparing the core dumps with the py-gdb tools will give insight into this.

Here is the article I usually refer to that goes over the process of using Python GDB tools to inspect a running Python process https://wiki.python.org/moin/DebuggingWithGdb
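
The comparison of several stack dumps can be sketched as follows, assuming each dump has already been reduced to a list of frame descriptions (outermost first). Extracting those frames with the GDB tools is not shown; `classify_samples` and `common_outer_frames` are hypothetical helper names.

```python
def classify_samples(samples):
    """Return "blocked" if every sample shows the same stack, else "moving"."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compare")
    first = list(samples[0])
    if all(list(sample) == first for sample in samples[1:]):
        return "blocked"  # same place every time: waiting on one call
    return "moving"       # stacks differ: likely looping or still progressing


def common_outer_frames(samples):
    """Return the outer frames shared by all samples.

    When the process is looping, the last shared frame points at the
    function that contains the loop.
    """
    shared = []
    for frames in zip(*samples):
        if len(set(frames)) != 1:
            break
        shared.append(frames[0])
    return shared
```

For example, with made-up frame names, the samples ["main", "run", "wait"] and ["main", "run", "download"] are classified as "moving" and share the outer frames ["main", "run"].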

Actions #11

Updated by rchan almost 4 years ago

  • Sprint changed from Sprint 71 to Sprint 72
Actions #12

Updated by rchan almost 4 years ago

  • Sprint changed from Sprint 72 to Sprint 73
Actions #13

Updated by rchan almost 4 years ago

  • Sprint changed from Sprint 73 to Sprint 74
Actions #14

Updated by rchan almost 4 years ago

  • Sprint changed from Sprint 74 to Sprint 75
Actions #15

Updated by rchan almost 4 years ago

  • Sprint changed from Sprint 75 to Sprint 76
Actions #16

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 76 to Sprint 77
Actions #17

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 77 to Sprint 78
Actions #18

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 78 to Sprint 79
Actions #19

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 79 to Sprint 80
Actions #20

Updated by fao89 over 3 years ago

  • Related to Issue #7387: Tasks not delivered to resource-manager are not cleaned up added
Actions #21

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 80 to Sprint 81
Actions #22

Updated by dalley over 3 years ago

  • Status changed from NEW to ASSIGNED
Actions #23

Updated by dalley over 3 years ago

  • Assignee set to dalley
Actions #24

Updated by rchan over 3 years ago

  • Sprint changed from Sprint 81 to Sprint 82
Actions #25

Updated by dalley over 3 years ago

@Ipanova, when I run this now with the latest code, I'm able to sync the migrated repositories with the migrated remotes without issue and the tasks complete immediately.

Do you remember if there were any other details that might be important? If not, could you try reproducing this again and see if you still see it happening on your machine?

Actions #26

Updated by ipanova@redhat.com over 3 years ago

  • Status changed from ASSIGNED to CLOSED - WORKSFORME

I can't reproduce this issue anymore, I am going to close it.
