Issue #6533
closed
Task gets stuck in 'running' state
Description
1. Migrate the repos from Pulp 2 (I had 2 RPM repos).
$ pulp-admin rpm repo list --details
+----------------------------------------------------------------------+
RPM Repositories
+----------------------------------------------------------------------+
Id: errata-references
Display Name: None
Description: None
Content Unit Counts:
Erratum: 4
Package Category: 1
Package Group: 2
Package Langpacks: 1
Rpm: 35
Notes:
Scratchpad:
Checksum Type: sha256
Importers:
Config:
Feed: https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-references-updateinfo/
Id: yum_importer
Importer Type Id: yum_importer
Last Override Config:
Last Sync: 2020-04-20T10:56:57Z
Last Updated: 2020-04-20T16:44:48Z
Repo Id: errata-references
Scratchpad:
Repomd Checksum: ff6522cf2ff781799d7cf0d70362d93ada5f22856abb4691daa04ef9aa036f87
Repomd Revision: 1582978135
Distributors:
Auto Publish: True
Config:
Http: False
Https: True
Relative URL: ulp/pulp/fixtures/rpm-references-updateinfo/
Distributor Type Id: yum_distributor
Id: yum_distributor
Last Override Config:
Last Publish: 2020-04-20T10:56:58Z
Last Updated: 2020-04-20T10:56:38Z
Repo Id: errata-references
Scratchpad:
Auto Publish: False
Config:
Http: False
Https: True
Relative URL: ulp/pulp/fixtures/rpm-references-updateinfo/
Distributor Type Id: export_distributor
Id: export_distributor
Last Override Config:
Last Publish: None
Last Updated: 2020-04-20T10:56:38Z
Repo Id: errata-references
Scratchpad:
Id: test-modulariy
Display Name: None
Description: None
Content Unit Counts:
Erratum: 6
Modulemd: 8
Modulemd Defaults: 3
Package Category: 1
Package Group: 2
Package Langpacks: 1
Rpm: 35
Notes:
Scratchpad:
Checksum Type: sha256
Importers:
Config:
Feed: https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-test-modularity/
Id: yum_importer
Importer Type Id: yum_importer
Last Override Config:
Last Sync: 2020-04-20T10:57:38Z
Last Updated: 2020-04-20T10:57:26Z
Repo Id: test-modulariy
Scratchpad:
Repomd Checksum: fbcba18148bf67acc812db9ea8efd3d07541bc201ecf8a6773725d1041cf83c9
Repomd Revision: 1561056354
Distributors:
Auto Publish: True
Config:
Http: False
Https: True
Relative URL: pulp/pulp/fixtures/rpm-test-modularity/
Distributor Type Id: yum_distributor
Id: yum_distributor
Last Override Config:
Last Publish: 2020-04-20T10:57:38Z
Last Updated: 2020-04-20T10:57:26Z
Repo Id: test-modulariy
Scratchpad:
Auto Publish: False
Config:
Http: False
Https: True
Relative URL: pulp/pulp/fixtures/rpm-test-modularity/
Distributor Type Id: export_distributor
Id: export_distributor
Last Override Config:
Last Publish: None
Last Updated: 2020-04-20T10:57:26Z
Repo Id: test-modulariy
Scratchpad:
(pulp) [vagrant@pulp2-nightly-pulp3-source-centos7 _scripts]$
2. Inspect the migrated repos.
$ http GET $BASE_ADDR/pulp/api/v3/pulp2repositories/
HTTP/1.1 200 OK
Allow: GET, HEAD, OPTIONS
Connection: close
Content-Length: 1600
Content-Type: application/json
Date: Mon, 20 Apr 2020 17:42:47 GMT
Server: gunicorn/20.0.4
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN
{
"count": 2,
"next": null,
"previous": null,
"results": [
{
"is_migrated": true,
"not_in_plan": false,
"pulp2_object_id": "5e9d7fe6c998ac0be6c0fb21",
"pulp2_repo_id": "errata-references",
"pulp2_repo_type": "rpm",
"pulp3_distribution_hrefs": [
"/pulp/api/v3/distributions/rpm/rpm/76476a41-632f-4a16-bfa9-434f6c215c8f/"
],
"pulp3_publication_href": "/pulp/api/v3/publications/rpm/rpm/1c5a1ff1-3d38-4278-b865-77e03db4b0b6/",
"pulp3_remote_href": "/pulp/api/v3/remotes/rpm/rpm/21b131cd-12b0-4a2a-a173-c105c9f089af/",
"pulp3_repository_href": "/pulp/api/v3/repositories/rpm/rpm/0f6f95e9-e05d-449d-ac7d-2e22e7cdd081/",
"pulp3_repository_version": "/pulp/api/v3/repositories/rpm/rpm/0f6f95e9-e05d-449d-ac7d-2e22e7cdd081/versions/1/",
"pulp_created": "2020-04-20T17:42:02.710317Z",
"pulp_href": "/pulp/api/v3/pulp2repositories/961cf6eb-046c-4abc-a7b3-aff77314c937/"
},
{
"is_migrated": true,
"not_in_plan": false,
"pulp2_object_id": "5e9d8016c998ac0be7056e35",
"pulp2_repo_id": "test-modulariy",
"pulp2_repo_type": "rpm",
"pulp3_distribution_hrefs": [
"/pulp/api/v3/distributions/rpm/rpm/87aafa7b-52f7-47c2-b184-67a2147387cc/"
],
"pulp3_publication_href": "/pulp/api/v3/publications/rpm/rpm/a4f3e70e-d2bb-438a-a433-bce99faadb2c/",
"pulp3_remote_href": "/pulp/api/v3/remotes/rpm/rpm/a2d25025-a126-44d6-8a91-56615869f05a/",
"pulp3_repository_href": "/pulp/api/v3/repositories/rpm/rpm/f7b5ef5c-d76f-41fb-ae8d-56b2168c7daa/",
"pulp3_repository_version": "/pulp/api/v3/repositories/rpm/rpm/f7b5ef5c-d76f-41fb-ae8d-56b2168c7daa/versions/1/",
"pulp_created": "2020-04-20T17:42:02.686249Z",
"pulp_href": "/pulp/api/v3/pulp2repositories/f9a3a7e2-8a68-434d-b458-8fdca2b29d0b/"
}
]
}
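The hrefs needed for a sync can be read straight out of this listing. A minimal Python sketch, assuming the response shape shown above; `sync_targets` and `LISTING` are hypothetical names introduced for illustration and are not part of Pulp:

```python
import json

# Trimmed copy of the listing above; only the fields used below are kept.
LISTING = json.loads("""
{
  "count": 2,
  "results": [
    {
      "is_migrated": true,
      "pulp2_repo_id": "errata-references",
      "pulp3_remote_href": "/pulp/api/v3/remotes/rpm/rpm/21b131cd-12b0-4a2a-a173-c105c9f089af/",
      "pulp3_repository_href": "/pulp/api/v3/repositories/rpm/rpm/0f6f95e9-e05d-449d-ac7d-2e22e7cdd081/"
    },
    {
      "is_migrated": true,
      "pulp2_repo_id": "test-modulariy",
      "pulp3_remote_href": "/pulp/api/v3/remotes/rpm/rpm/a2d25025-a126-44d6-8a91-56615869f05a/",
      "pulp3_repository_href": "/pulp/api/v3/repositories/rpm/rpm/f7b5ef5c-d76f-41fb-ae8d-56b2168c7daa/"
    }
  ]
}
""")


def sync_targets(listing):
    """Return (pulp2_repo_id, sync_url, remote_href) for every migrated repo
    that has both a Pulp 3 repository and a Pulp 3 remote."""
    targets = []
    for repo in listing["results"]:
        repo_href = repo.get("pulp3_repository_href")
        remote_href = repo.get("pulp3_remote_href")
        if repo.get("is_migrated") and repo_href and remote_href:
            # The sync endpoint is the repository href followed by "sync/".
            targets.append((repo["pulp2_repo_id"], repo_href + "sync/", remote_href))
    return targets


targets = sync_targets(LISTING)
for repo_id, sync_url, remote_href in targets:
    print(f"{repo_id}: POST {sync_url} remote={remote_href}")
```

The printed lines correspond to the kind of POST request used in the next step; the sketch only builds the paths and does not contact a server.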
3. Sync a migrated repo with its migrated remote.
(pulp) [vagrant@pulp2-nightly-pulp3-source-centos7 _scripts]$ http POST $BASE_ADDR/pulp/api/v3/repositories/rpm/rpm/0f6f95e9-e05d-449d-ac7d-2e22e7cdd081/sync/ remote=/pulp/api/v3/remotes/rpm/rpm/21b131cd-12b0-4a2a-a173-c105c9f089af/
HTTP/1.1 202 Accepted
Allow: POST, OPTIONS
Connection: close
Content-Length: 67
Content-Type: application/json
Date: Mon, 20 Apr 2020 17:43:08 GMT
Server: gunicorn/20.0.4
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN
{
"task": "/pulp/api/v3/tasks/6e3f15b7-9fbc-4f8b-97ae-367ba4d03aba/"
}
(pulp) [vagrant@pulp2-nightly-pulp3-source-centos7 _scripts]$ http GET $BASE_ADDR/pulp/api/v3/tasks/6e3f15b7-9fbc-4f8b-97ae-367ba4d03aba/
HTTP/1.1 200 OK
Allow: GET, PATCH, DELETE, HEAD, OPTIONS
Connection: close
Content-Length: 606
Content-Type: application/json
Date: Mon, 20 Apr 2020 17:43:14 GMT
Server: gunicorn/20.0.4
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN
{
"child_tasks": [],
"created_resources": [],
"error": null,
"finished_at": null,
"name": "pulp_rpm.app.tasks.synchronizing.synchronize",
"parent_task": null,
"progress_reports": [],
"pulp_created": "2020-04-20T17:43:08.424482Z",
"pulp_href": "/pulp/api/v3/tasks/6e3f15b7-9fbc-4f8b-97ae-367ba4d03aba/",
"reserved_resources_record": [
"/pulp/api/v3/remotes/rpm/rpm/21b131cd-12b0-4a2a-a173-c105c9f089af/",
"/pulp/api/v3/repositories/rpm/rpm/0f6f95e9-e05d-449d-ac7d-2e22e7cdd081/"
],
"started_at": "2020-04-20T17:43:08.535930Z",
"state": "running",
"task_group": null,
"worker": "/pulp/api/v3/workers/2740bb8c-057a-40f0-b4b7-c7a9231a42bf/"
}
- After 6 minutes the task was still in the 'running' state, although it usually takes less than 30 seconds.
- After prestart the task gets canceled, and when a new sync is triggered everything works:
$ prestart
systemctl restart pulpcore-content pulpcore-worker@1 pulpcore-worker@2 pulpcore-resource-manager pulpcore-api
(pulp) [vagrant@pulp2-nightly-pulp3-source-centos7 _scripts]$ http GET $BASE_ADDR/pulp/api/v3/tasks/6e3f15b7-9fbc-4f8b-97ae-367ba4d03aba/
HTTP/1.1 200 OK
Allow: GET, PATCH, DELETE, HEAD, OPTIONS
Connection: close
Content-Length: 607
Content-Type: application/json
Date: Mon, 20 Apr 2020 17:52:48 GMT
Server: gunicorn/20.0.4
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN
{
"child_tasks": [],
"created_resources": [],
"error": null,
"finished_at": null,
"name": "pulp_rpm.app.tasks.synchronizing.synchronize",
"parent_task": null,
"progress_reports": [],
"pulp_created": "2020-04-20T17:43:08.424482Z",
"pulp_href": "/pulp/api/v3/tasks/6e3f15b7-9fbc-4f8b-97ae-367ba4d03aba/",
"reserved_resources_record": [
"/pulp/api/v3/remotes/rpm/rpm/21b131cd-12b0-4a2a-a173-c105c9f089af/",
"/pulp/api/v3/repositories/rpm/rpm/0f6f95e9-e05d-449d-ac7d-2e22e7cdd081/"
],
"started_at": "2020-04-20T17:43:08.535930Z",
"state": "canceled",
"task_group": null,
"worker": "/pulp/api/v3/workers/2740bb8c-057a-40f0-b4b7-c7a9231a42bf/"
}
(pulp) [vagrant@pulp2-nightly-pulp3-source-centos7 _scripts]$ http POST $BASE_ADDR/pulp/api/v3/repositories/rpm/rpm/0f6f95e9-e05d-449d-ac7d-2e22e7cdd081/sync/ remote=/pulp/api/v3/remotes/rpm/rpm/21b131cd-12b0-4a2a-a173-c105c9f089af/
HTTP/1.1 202 Accepted
Allow: POST, OPTIONS
Connection: close
Content-Length: 67
Content-Type: application/json
Date: Mon, 20 Apr 2020 17:52:56 GMT
Server: gunicorn/20.0.4
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN
{
"task": "/pulp/api/v3/tasks/0e876e06-9b86-4587-b920-b642259c1e93/"
}
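The manual checks in this report (GET the task, look at "state", try again later) can be sketched as a polling loop with a timeout, so that a stuck task is reported instead of being waited on indefinitely. This is only a sketch: `wait_for_task` is a hypothetical helper, `fetch_task` stands for whatever performs the GET on the task href and returns the decoded JSON, and the set of final states is an assumption based on the states seen in this issue plus "completed".

```python
import time

# Assumption: states after which the task will not change any more.
FINAL_STATES = {"completed", "failed", "canceled"}


def wait_for_task(fetch_task, timeout=60.0, interval=2.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_task() until the task reaches a final state.

    Raises TimeoutError if the task is still unfinished after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        task = fetch_task()
        if task["state"] in FINAL_STATES:
            return task
        if clock() >= deadline:
            raise TimeoutError(
                f"task still in state {task['state']!r} after {timeout} seconds"
            )
        sleep(interval)


# Demo with a fake fetcher, so the sketch runs without a Pulp server.
states = iter(["waiting", "running", "running", "completed"])
finished = wait_for_task(lambda: {"state": next(states)}, timeout=5, interval=0)
print(finished["state"])

# A task that never leaves 'running' ends in a TimeoutError.
try:
    wait_for_task(lambda: {"state": "running"}, timeout=0, interval=0)
    stuck_detected = False
except TimeoutError as exc:
    stuck_detected = True
    print(exc)
```

Passing the fetcher in as a callable keeps the loop independent of the HTTP client in use; the timeout itself would have to be chosen relative to how long a sync normally takes.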
Related issues
Updated by ipanova@redhat.com over 4 years ago
- Project changed from RPM Support to Pulp
Updated by ttereshc over 4 years ago
FWIW, after restart my tasks were stuck in the waiting state.
Updated by ipanova@redhat.com over 4 years ago
Sometimes after prestart the task fails instead of getting canceled: "description": "The task f91d8e47-f38b-4447-82d5-03cba6853b77 exited immediately for some reason. Marking as failed. Check the logs for more details",
Updated by daviddavis over 4 years ago
- Triaged changed from No to Yes
- Sprint set to Sprint 71
Updated by daviddavis over 4 years ago
- Triaged changed from Yes to No
- Sprint deleted (Sprint 71)
Updated by daviddavis over 4 years ago
- Triaged changed from No to Yes
- Sprint set to Sprint 71
Updated by mdellweg over 4 years ago
Maybe Python's asyncio debug mode can show what it's waiting for:
https://docs.python.org/3/library/asyncio-dev.html#debug-mode
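For reference, a minimal standalone sketch of turning on asyncio debug mode. It is not taken from the Pulp code base; `blocking_step` and `main` are made-up names, and the `time.sleep` call is only a stand-in for a step that blocks the event loop.

```python
import asyncio
import logging
import time

# Debug mode reports through the "asyncio" logger, so logging must be configured.
logging.basicConfig(level=logging.DEBUG)


async def blocking_step():
    # Stand-in for a call that blocks the event loop instead of awaiting.
    time.sleep(0.2)


async def main():
    loop = asyncio.get_running_loop()
    # Steps that take longer than this many seconds are logged in debug mode.
    loop.slow_callback_duration = 0.1
    await blocking_step()
    return loop.get_debug()


# debug=True enables debug mode for this run; setting PYTHONASYNCIODEBUG=1
# in the environment is another way to enable it.
debug_enabled = asyncio.run(main(), debug=True)
print(debug_enabled)
```

One limitation to keep in mind: the slow-callback warning is logged only after the slow step returns, so a call that never returns would not be reported this way.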
Updated by bmbouter over 4 years ago
When trying to debug a "stuck" task, the first question is usually: what is it stuck on? RQ uses a post-forker model, so each task forks the main RQ process of the worker to create a child process, which RQ calls "the workhorse".
We tried py-spy to determine what the workhorse is doing while it's stuck in the waiting state, but its output was not helpful. The next tool I recommend trying is the Python GDB extensions, which show the Python stack from a GDB core dump of the stuck process. I recommend taking a few core dumps over a few seconds and answering two questions:
- What is it stuck on? The py-gdb tools will tell us this.
- Is it stuck at a single place, or is it looping over and over? Comparing the core dumps with the py-gdb tools will give insight into this.
Here is the article I usually refer to, which goes over the process of using the Python GDB tools to inspect a running Python process: https://wiki.python.org/moin/DebuggingWithGdb
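The "single place or loop" comparison can be illustrated in plain Python. Note that this swaps in a different technique: instead of GDB core dumps, it samples a thread's stack from inside the same process with `sys._current_frames()`, which only works when you can run code in that process. It is a sketch of the comparison idea, not a replacement for the GDB workflow; `sample_stacks` and `classify` are hypothetical helper names.

```python
import sys
import threading
import time
import traceback


def sample_stacks(thread_id, samples=3, interval=0.05):
    """Capture the Python stack of one thread several times."""
    captured = []
    for _ in range(samples):
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            summary = traceback.extract_stack(frame)
            # Keep only file, line and function so that samples are comparable.
            captured.append(tuple((f.filename, f.lineno, f.name) for f in summary))
        time.sleep(interval)
    return captured


def classify(captured):
    """Say whether all samples show exactly the same stack."""
    if not captured:
        return "no samples"
    return "single place" if len(set(captured)) == 1 else "moving"


# Demo: a thread that blocks on an Event nobody sets during sampling.
blocker = threading.Event()
worker = threading.Thread(target=blocker.wait, daemon=True)
worker.start()
time.sleep(0.2)  # give the thread time to reach the blocking call

stacks = sample_stacks(worker.ident)
verdict = classify(stacks)
print(verdict)
if stacks:
    print("innermost frame:", stacks[0][-1])

blocker.set()  # release the demo thread
```

Identical samples point to a single blocking call, while differing samples suggest the code is still executing, for example in a loop.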
Updated by fao89 over 4 years ago
- Related to Issue #7387: Tasks not delivered to resource-manager are not cleaned up added
Updated by dalley about 4 years ago
@Ipanova, when I run this now with the latest code, I'm able to sync the migrated repositories with the migrated remotes without issue and the tasks complete immediately.
Do you remember if there were any other details that might be important? If not, could you try reproducing this again and see if you still see it happening on your machine?
Updated by ipanova@redhat.com about 4 years ago
- Status changed from ASSIGNED to CLOSED - WORKSFORME
I can't reproduce this issue anymore, I am going to close it.