Story #5613

closed

As a user, I have an API based way to report and redownload (if possible) corrupted content on the file system for one repository

Added by jsherril@redhat.com about 5 years ago. Updated over 4 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee:
Category: -
Sprint/Milestone:
Start date:
Due date:
% Done: 100%
Estimated time:
Platform Release:
Groomed: Yes
Sprint Candidate: Yes
Tags: Katello
Sprint: Sprint 70
Quarter:

Description

Motivation

Users can end up with corrupted files, either from bit rot on mechanical hard drives or from commands run by mistake. We should make it as easy as possible to correct this situation when we can.

Idea

Make a new API endpoint at /pulp/api/v3/repair/ with one required parameter, repository, which is the href of the repository to be repaired. Initially, every request will be for a single repository.
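
A minimal sketch of what a client call to the proposed endpoint could look like. Only the path /pulp/api/v3/repair/ and the repository parameter come from this story; the host, the placeholder repository href, the use of POST, and the JSON body encoding are assumptions made for illustration. The request is built but not sent.

```python
import json
import urllib.request

# Assumptions: host, HTTP method and JSON encoding are illustrative only.
BASE_URL = "http://localhost:24817"
REPAIR_PATH = "/pulp/api/v3/repair/"  # endpoint path proposed in this story


def build_repair_request(repository_href: str) -> urllib.request.Request:
    """Build (but do not send) a request for the proposed repair endpoint."""
    body = json.dumps({"repository": repository_href}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + REPAIR_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical placeholder href; a real one would come from the repository API.
request = build_repair_request("/pulp/api/v3/repositories/<repository-id>/")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; under the proposal the caller would then track the resulting task for the outcome.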

Details

1. examines all downloaded content in the latest version and checks to see if each unit matches its expected checksum
2. re-downloads any corrupted files if possible based on RemoteArtifact entries in db
3. reports any units that could not be re-downloaded as counts
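
The checksum comparison in step 1 can be sketched roughly as follows. This is not Pulp's implementation: the function names, the choice of SHA-256, and the mapping of file paths to expected digests are all assumptions for illustration, and the sketch assumes every listed file still exists on disk.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted(expected: Dict[Path, str]) -> List[Path]:
    """Return the paths whose on-disk digest differs from the expected digest."""
    return [path for path, checksum in expected.items()
            if sha256_of(path) != checksum]


# Demo: record digests for two files, then damage one of them.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.bin"
    bad = Path(tmp) / "bad.bin"
    good.write_bytes(b"intact payload")
    bad.write_bytes(b"intact payload")
    expected = {good: sha256_of(good), bad: sha256_of(bad)}
    bad.write_bytes(b"intact pay1oad")  # simulate corruption on disk
    corrupted_names = [path.name for path in find_corrupted(expected)]

print(corrupted_names)
```

Files flagged this way would be the candidates for step 2; whether they can be re-downloaded depends on a remote source being known for them.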

API Response

{
	"pulp_created": "2019-07-23T08:18:12.927007Z",
	"pulp_href": "/pulp/api/v3/task-groups/59f8a786-c7d7-4e2b-ad07-701479d403c5/",
	"repository_version_href": "/pulp/api/v3/repository/<path to the repository version>/",
	"repaired": [
		"/pulp/api/v3/content/file/files/c23def43-44bc-45f4-8a6f-0310285f5339/",
		"/pulp/api/v3/content/file/files/18swef43-98s1-8d71-s8u1-0310285ffiq/"
	],
	"unrepairable": [
		"/pulp/api/v3/content/file/files/c23def43-44bc-45f4-8a6f-0310285f5339/"
	]
}

Note

This is only going to fix bit-rotted or otherwise corrupted files. It only inspects the content contained in the latest repository version, and does not consider whether that repository version is an accurate or complete representation of the remote filesystem.


Related issues

Has duplicate Pulp - Story #5159: As a user, I can optionally validate and repair content at sync time (CLOSED - DUPLICATE)

Actions #1

Updated by bmbouter about 5 years ago

  • Sprint/Milestone set to 3.1.0
Actions #2

Updated by daviddavis about 5 years ago

  • Sprint/Milestone deleted (3.1.0)
Actions #3

Updated by bmbouter about 5 years ago

Question 1: which endpoint?

I was wondering if this should be an action endpoint or not underneath sync, e.g. /fix/ or /repair/. I decided it should not be because:

a) we may want to offer this functionality system-wide later
b) plugin writers would have to enable that endpoint. We could enable it for all plugins without their involvement, but that weakens the practice that repo detail endpoints are plugin owned.

So if we did it system-wide, which of these would it be?

/pulp/api/v3/repair/
/pulp/api/v3/filesystem-repair/
/pulp/api/v3/repo-repair/
/pulp/api/v3/fix/
/pulp/api/v3/filesystem-fix/
/pulp/api/v3/repo-fix/

Question 2: What Pulp locks need to guard this task?

I think we need to lock on the repository so that other repository operations won't make new repo versions underneath the content while it's being examined. System-wide checking (not proposed in this ticket) would be a whole different story.

Question 3: To confirm, this only fixes "downloaded content" right?

So if we implement a "continue even if 404s occur" and say two RPMs are missing from the latest repo version. This utility won't also fix that. If it's expected to also fix that then we need to integrate this deeply with sync also somehow. What can we do that is the best regarding this case?

Actions #4

Updated by jsherril@redhat.com about 5 years ago

Couple of comments:

Question 2: This makes sense to me

Question 3:

"So if we implement a "continue even if 404s occur" and say two RPMs are missing from the latest repo version. This utility won't also fix that. If it's expected to also fix that then we need to integrate this deeply with sync also somehow. What can we do that is the best regarding this case?"

I think this repair task not fixing is fine, since it was never downloaded in the first place. But what would fix that? A further sync? (assuming the underlying upstream repository is fixed too).

Actions #5

Updated by bmbouter about 5 years ago

jsherril@redhat.com wrote:

I think this repair task not fixing is fine, since it was never downloaded in the first place. But what would fix that? A further sync? (assuming the underlying upstream repository is fixed too).

Yes a further sync would resolve this if the upstream repository was also fixed. Pulp is going to recognize it does not have the content unit during that sync.

Actions #6

Updated by ttereshc about 5 years ago

Question 3:

"So if we implement a "continue even if 404s occur" and say two RPMs are missing from the latest repo version. This utility won't also fix that. If it's expected to also fix that then we need to integrate this deeply with sync also somehow. What can we do that is the best regarding this case?"

I think this repair task not fixing is fine, since it was never downloaded in the first place. But what would fix that? A further sync? (assuming the underlying upstream repository is fixed too).

For the upcoming sync to fix it, a sync should be always operational, with no condition to skip it.
How confident are we that every sync being operational is good enough performance-wise? If nothing changed upstream, we'll still analyse all the repodata, we'll check if every content exists in pulp and then we may perform all the checks to ensure that repo is valid (I'm not sure if we are performing this check if nothing changed in a repo). Maybe it's fast and no need to be worried.

If we want any optimization, we'll either need to introduce force_full to pulp3, or we may need to do full repair here and not only downloaded content.
What do you think?

Also we potentially can have different options for the suggested endpoints:
/pulp/api/v3/repair/ - full repair = corrupted files + download missing ones
/pulp/api/v3/filesystem-repair/ - only corrupted files
/pulp/api/v3/repo-repair/ - corrupted and downloaded for a specific repo

I'm worried that having all those options adds complexity but at the same time it gives flexibility.

Actions #7

Updated by bmbouter about 5 years ago

ttereshc wrote:

For the upcoming sync to fix it, a sync should be always operational, with no condition to skip it.

Agreed

How confident are we that every sync being operational is good enough performance-wise? If nothing changed upstream, we'll still analyse all the repodata, we'll check if every content exists in pulp and then we may perform all the checks to ensure that repo is valid (I'm not sure if we are performing this check if nothing changed in a repo). Maybe it's fast and no need to be worried.

I suspect no matter how fast it is, users will want it to be faster in cases when it can be. I'm interested in us measuring the resync time in the case that nothing changed for RPM first so we can understand how much opportunity for speedup there is.

If we want any optimization, we'll either need to introduce force_full to pulp3, or we may need to do full repair here and not only downloaded content.
What do you think?

What I learned from pulp2 is that anytime we introduce an optimization we need a way to turn it (or all of them) off.

Also we potentially can have different options for the suggested endpoints:
/pulp/api/v3/repair/ - full repair = corrupted files + download missing ones
/pulp/api/v3/filesystem-repair/ - only corrupted files
/pulp/api/v3/repo-repair/ - corrupted and downloaded for a specific repo

I'm worried that having all those options adds complexity but at the same time it gives flexibility.

This is my primary concern also.

Actions #8

Updated by bmbouter about 5 years ago

  • Description updated (diff)

I added clarification to the body that it only inspects the filesystem and won't check the remote metadata to consider if the repository version is "complete".

Question: how will the "unable to be fixed" units be reported?

In terms of current task reporting capabilities, progress reports are only prepared to report counts. Is that acceptable? It could look like:

{
    "progress_reports": [{
            "code": "fixed_count",
            "done": 5,
            "message": "The count of items that were fixed.",
            "state": "completed",
            "suffix": null,
            "total": 5
        },
        {
            "code": "pre_fix_corrupted_count",
            "done": 10,
            "message": "The count of content items that were corrupted prior to fixing",
            "state": "completed",
            "suffix": null,
            "total": 10
        }
    ]
}
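
If only counts are reported, a client could derive the number of units that remain corrupted from the two reports. A rough sketch, assuming the report codes fixed_count and pre_fix_corrupted_count from the example above (the final implementation may name them differently); the summarize helper is hypothetical:

```python
from typing import Dict, List

# Trimmed copy of the example task payload above.
task = {
    "progress_reports": [
        {"code": "fixed_count", "done": 5, "state": "completed", "total": 5},
        {"code": "pre_fix_corrupted_count", "done": 10, "state": "completed", "total": 10},
    ]
}


def summarize(progress_reports: List[dict]) -> Dict[str, int]:
    """Derive how many corrupted units were left unrepaired from the counts."""
    done_by_code = {report["code"]: report["done"] for report in progress_reports}
    corrupted = done_by_code.get("pre_fix_corrupted_count", 0)
    fixed = done_by_code.get("fixed_count", 0)
    return {"corrupted": corrupted, "fixed": fixed, "unrepaired": corrupted - fixed}


summary = summarize(task["progress_reports"])
print(summary)
```

Counts alone tell the user how many units are still corrupted, but not which ones.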
Actions #9

Updated by jsherril@redhat.com almost 5 years ago

That looks good to me!

Actions #10

Updated by ttereshc almost 5 years ago

  • Related to Story #5159: As a user, I can optionally validate and repair content at sync time added
Actions #11

Updated by ttereshc almost 5 years ago

FWIW, in RPM plugin we have sync optimizations, so if nothing changed in a remote repo and no changes have been made to a pulp repo, then the upcoming sync will be a no-op one. I think it's ok, it's a rare case when one hits 404 for content download and then it started working with no changes.

Actions #12

Updated by ttereshc almost 5 years ago

  • Related to deleted (Story #5159: As a user, I can optionally validate and repair content at sync time)
Actions #13

Updated by ttereshc almost 5 years ago

  • Has duplicate Story #5159: As a user, I can optionally validate and repair content at sync time added
Actions #14

Updated by jsherril@redhat.com almost 5 years ago

  • Tags Katello-P1 added
  • Tags deleted (Katello-P2)
Actions #15

Updated by bmbouter almost 5 years ago

  • Subject changed from Provide ability to report and redownload (if possible) corrupted content on the file system to As a user, I have an API based way to report and redownload (if possible) corrupted content on the file system for one repository
  • Description updated (diff)
Actions #16

Updated by jsherril@redhat.com almost 5 years ago

New response format looks good to me. +1

Actions #17

Updated by bmbouter almost 5 years ago

  • Description updated (diff)

Revised after concerns about users not being able to know what was and was not repairable.

Actions #18

Updated by daviddavis almost 5 years ago

  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes
Actions #19

Updated by daviddavis almost 5 years ago

  • Sprint set to Sprint 69

Adding to current sprint since 3.3 deadline is soon.

Actions #20

Updated by daviddavis almost 5 years ago

  • Sprint/Milestone set to 3.3.0
Actions #21

Updated by mdellweg almost 5 years ago

  • Assignee set to mdellweg
Actions #22

Updated by mdellweg almost 5 years ago

I am in the process of creating a POC with a slightly different design. As the operation is really repairing the artifacts in a repository version, I am adding a task that is triggered by a detail endpoint of repository_version with the name 'repair'. To support the "repair a list of repositories" workflow, a second endpoint can be added to trigger a task with subtasks of the above type.

The Python bindings to access the nested endpoint look like: pulpcore.client.pulp_file.api.repositories_file_versions_api.RepositoriesFileVersionsApi.repair

Actions #23

Updated by pulpbot almost 5 years ago

  • Status changed from NEW to POST
Actions #24

Updated by rchan almost 5 years ago

  • Sprint changed from Sprint 69 to Sprint 70
Actions #25

Updated by mdellweg almost 5 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100
Actions #26

Updated by ttereshc almost 5 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Actions #27

Updated by ggainey over 4 years ago

  • Tags Katello added
  • Tags deleted (Katello-P1)
