Issue #7549: Need a dry run option for import/export api - Pulp

Actions

Send by e-mail Copy link

Issue #7549

closed

Need a dry run option for import/export api

Added by paji@redhat.com over 4 years ago. Updated almost 4 years ago.

Status:

CLOSED - CURRENTRELEASE

Priority:

Normal

Assignee:

ggainey

Category:

Sprint/Milestone:

3.10.0

Start date:

Due date:

Estimated time:

Severity:

2. Medium

Version:

Platform Release:

OS:

Triaged:

Yes

Groomed:

Sprint Candidate:

Tags:

Katello

Sprint:

Sprint 89

Quarter:

Q4-2020

Description

Katello checks things like

Is the import path exists
Does it have the right permissions

It would be ideal if pulp provided a dry-run operation for import and export operations. Katello can then use that to figure out the correct course of action and not create stuff if its broken,

Actions

Copy link

Updated by fao89 over 4 years ago

Triaged changed from No to Yes
Quarter set to Q4-2020

Actions

Copy link

Updated by ggainey over 4 years ago

Discussion from IRC today:

<bmbouter> partha jsherrill do you know the answer to this quesiton about how katello sets pulp settings? https://pulp.plan.io/issues/7623#note-3
<daviddavis> bmbouter: we talked about this already and they're set by the foreman installer. that doesn't provide a way to read the settings unfortunately.
<jsherrill> bmbouter: an alternative might be to allow katello to submit a path to the import/export api and for it to tell us if its valid or not
<jsherrill> per the configuration
<jsherrill> kinda like a mini-dry run, but without having to create all the needed repositories beforehand
<partha> bmbouter: jsherrill I thought we just update /etc/pulp/settings.py?
<partha> bmbouter: well you'll have to be picky on what settings you expose
<jsherrill> partha: what do you think of my alternative ?  maybe it could also check if the export is actually at the path, and readable.  
<daviddavis> fwiw, we've talked about adding a dry run action before 
<jsherrill> yeah, its kinda a dry run-lite 
<jsherrill> we don't want to have to provide the full details, because that has some implications iirc
<daviddavis> well we'll probably have a dry run endpoint that does everything and then some params or something that can narrow the scope of the dry run to certain checks
<jsherrill> we really just want to check the validity and readability of the export data
<jsherrill> that sounds good
<daviddavis> ggainey: fyi ^
<ggainey> daviddavis: yup, was just reading - pretty much what I had in mind for dry-run, yus
<daviddavis> if partha agrees maybe someone can file an issue
<daviddavis> I got a meeting
<ggainey> don't we have one?
<ggainey> I think we have one
<partha> depends on the dry-run .. Do I need to pass it a repo mapping and stuff 
<ggainey> one sec
<jsherrill> partha: right, thats what i was saying, we need an option to not pass that
<ggainey> partha: jsherrill daviddavis : add thoughts to https://pulp.plan.io/issues/7549 please
<daviddavis> yea, my thought is that there are params to control the scope of the dry run
<daviddavis> so if you just want to validate the file, path, etc you can do that
<ggainey> or, does it just do everything, and not-fail at the first problem?
<partha> jturel: so if pulp implements ^ dry-run we could remove the accessible permissions checking and all that
<daviddavis> ggainey: that would work too. it would return like areport or something
<jturel> partha: that's a big huge +1 from me
<daviddavis> although some of the later checks it might not be able to perform though if the earlier checks fail
<ggainey> partha: you could call dry-run, you'd still have to respond to the (various) errors
<ggainey> daviddavis: sure
<ggainey> a;;" http://yum.theforeman.org/pulpcore/3.7/el8/x86_64/ is a reasonable place to find our RPM builds, yes? I pointed a user at it, and now I'm worried that there's a better location
<ggainey> all, even
* ggainey looks suspiciously at fingers
<partha> ggainey: yes it depends on the scope. If pulp is going to require us to provice repo-mapping for the dry run 
<partha> it might get harder
<ggainey> partha: /shrug I'd just write it to note "absence of mapping" as Just One More Error
<ggainey> I'd think decusions would be summat like "the place you want me to read from exists. I have permissions to read from it. the thing you want me to read exists. The thing you want me to read passes validation. you gave me a repo-mapping. the repo-mapping makes sense"
<ggainey> decisions
<ggainey> jeebus, I cannot type today
<partha> ggainey: daviddavis ok at the simplest level how about a general can "pulp read this dir"
<ggainey> that's part of "I have permissions to read from it"
<ggainey> I am pretty averse to having a pulp-api be a passthru to 'os' calls tho :)
<partha> "available_for_import"
<partha> :)

Actions

Copy link

Updated by paji@redhat.com over 4 years ago

In short

Make repo-mapping optional
Generate a report along the lines of

could read export = PASS
export is valid = PASS
mapping is valid = FAIL

Actions

Copy link

Updated by daviddavis over 4 years ago

Tags Katello added

Actions

Copy link

Updated by daviddavis over 4 years ago

My initial thought was that dry run would do an entire actual import/export but without actually changing any data or creating an export (respectively). This could be very expensive and invasive for users who don't want it (eg import would have to combine chunks and parse it to do a real dry run). So one idea I had is to have parameter(s) to limit the scope of the dry run.

Actions

Copy link

Updated by ggainey about 4 years ago

Discussion¶

Going over my to-do list and (finally) got back to this. Looking though the IRC conversation in #c2, and @paji 's summary in #c3, I see three separate kinds-of Things that katello wants to check:

Settings - is the location specified in --path/--toc permitted, based on ALLOWED_IMPORT_PATHS?
Mapping-sanity - optional; if provided, do the destination-repositories mentioned exist in this Pulp instance?
Import-file-existence - does the file pointed to by --path/--toc exist?
Import-file-validity - validates chunks/checksums of import-file

An implied assumption is that, if --validity is specified, nothing else happens - it is completely idempotent/nondestructive.

mapping-sanity¶

Do the specified destination-repositories exist in the Pulp instance?
Checked/enforced at IMPORTER-creation time
Can be reported via PulpImporterSerializer

settings¶

Is a specified import-file permitted based on ALLOWED_IMPORT_PATHS?
Can only be checked based on --toc or --path parameters
Checked/enforced at IMPORT time

import-file-existence¶

Does specified import-file exist?
Can only be checked based on --toc or --path parameters
Checked/enforced at IMPORT time

import-file-validity¶

Is specified import-file valid?
Can only be checked based on --toc or --path parameters
Checked/enforced at IMPORT time

import-file-existence and import-file-validity are already checked at the beginning of the import-process. Checking the file-validity in particular can be a time-consuming process, as it requires recomputing the checksum of the import-file chunks (if using --toc) and the resulting .tar.gz.

Conclusion¶

What we have, then, is a mix of things to validate, that are currently provided to two different endpoints, one of which is synchronous and one of which returns a task, a subset of which can take an appreciable/long time to compute, and most of which are already being done and reported on when invoking the entities in question.

The only thing we don't currently expose, is a way to ask for ALLOWED_IMPORT_PATHS so that the import-file can either be put in the right place, or the settings fixed in advance.

The net of this, for me, is threefold:

What katello needs isn't what we're envisioning for --dry-run
There is not currently a single endpoint that is set up to answer this question.
What we're thinking for --dry-run may not be advisable, given the existence of chunked import-files (which is likely to be the main path for this feature on any 'real' system with significant repositories)

Proposal¶

It feels to me like what is needed here is a new pulpimporter endpoint - something like

GET /pulp/api/v3/importers/core/pulp/validate?path=&toc=&mapping=

that returns a dictionary of results, with the keys being the four checks specified above.

@daviddavis jsherril@redhat.com paji@redhat.com What do you think about this? Feedback here in the issue please.

Actions

Copy link

Updated by daviddavis about 4 years ago

This sounds good to me. The only nitpick I would have here is that the API could be more restful (ie prefer the use of nouns in the API path over a verb 'validate').

Actions

Copy link

Updated by ggainey about 4 years ago

daviddavis wrote:

This sounds good to me. The only nitpick I would have here is that the API could be more restful (ie prefer the use of nouns in the API path over a verb 'validate').

Yeah, the specific name is def up-for-grabs yet. Maybe 'sanity'?

Actions

Copy link

Updated by daviddavis about 4 years ago

What about maybe an import-check?

Actions

Copy link

#10

Updated by ggainey about 4 years ago

daviddavis wrote:

What about maybe an import-check?

I love this - perfect!

Actions

Copy link

#11

Updated by daviddavis about 4 years ago

Sprint/Milestone set to 3.10.0

Actions

Copy link

#12

Updated by ggainey about 4 years ago

daviddavis jsherril@redhat.com paji@redhat.com - I am working on implementing this for Pulp3.10, and need a final agreement on the API and what we check.

Details:

Proposed API is GET /pulp/api/v3/importers/core/pulp/import-check?path=&toc=&mapping=
- This call is synchronous (returns results directly, not a task)
We ARE checking
- path (if specified) is in allowed-imports
- toc (if specified) is in allowed imports
- mapping (if specified) is valid JSON
We ARE NOT checking:
- file- or chunk-checksums

Questions:

should we be checking for existence of repositories from the provided mapping, or not?
- You might want to be checking before creating them?
should we be checking for existence of the provided path/toc files, or just 'legality'?
if we're not requiring the files to exist - should we instead take an import-dir= param?
should we be checking for write-perms in that dir?
- write-perm in dir is needed for chunked imports

I'm going to start wiring this up, but I need definite answers before I can finish the API.

Actions

Copy link

#13

Updated by daviddavis about 4 years ago

The one thing I might recommend is making the API more RESTful (ie a user creates an import check even though it's not persisted):

POST /pulp/api/v3/importers/core/pulp/import-checks

POST params: path, toc, mapping

The questions you bring up are good ones. I'll defer to Katello though since I don't have any strong opinions.

Actions

Copy link

#14

Updated by ggainey about 4 years ago

daviddavis wrote:

The one thing I might recommend is making the API more RESTful (ie a user creates an import check even though it's not persisted):

I don't know that I agree that should make it look like we're creating anything here. In my mind this is more like the /status endpoint, where the GET is returning information. I would find it more surprising that a POST call didn't actually change/persist anything on the server. But, I could be convinced otherwise, if it makes everyone more comfortable.

The questions you bring up are good ones. I'll defer to Katello though since I don't have any strong opinions.

Yeah, those are def more aimed to jsherrill and paji.

Actions

Copy link

#15

Updated by paji@redhat.com about 4 years ago

ggainey wrote:

daviddavis jsherril@redhat.com paji@redhat.com - I am working on implementing this for Pulp3.10, and need a final agreement on the API and what we check.

Details:

Proposed API is GET /pulp/api/v3/importers/core/pulp/import-check?path=&toc=&mapping=

This call is synchronous (returns results directly, not a task)

We ARE checking

path (if specified) is in allowed-imports

toc (if specified) is in allowed imports

mapping (if specified) is valid JSON

We ARE NOT checking:

file- or chunk-checksums

These sound ok to me. Katello is not checking on the checksums either. but I am ok with user having to clean up and start over again if their tar.gz checksum is incorrect.

Questions:

should we be checking for existence of repositories from the provided mapping, or not?

You might want to be checking before creating them?

Katello already checks in its database before calling pulp. Not necessary.

should we be checking for existence of the provided path/toc files, or just 'legality'?

What do you mean by legality? I can see checking permissions. Katello user doesnt have admin rights on pulp user. so we are entirely going to rely on pulp to validate if the data can be imported.

if we're not requiring the files to exist - should we instead take an import-dir= param?

Can you expand on this? What is the point of the validation if the file does not exist? Katello will likely send you the toc location rather than the directory?.

should we be checking for write-perms in that dir?

write-perm in dir is needed for chunked imports

Oh absolutely!. That's a great point. I didn't realize write permissions are needed.

I'm going to start wiring this up, but I need definite answers before I can finish the API.

Actions

Copy link

#16

Updated by ggainey about 4 years ago

paji@redhat.com wrote:

ggainey wrote:

daviddavis jsherril@redhat.com paji@redhat.com - I am working on implementing this for Pulp3.10, and need a final agreement on the API and what we check.

Details:

Proposed API is GET /pulp/api/v3/importers/core/pulp/import-check?path=&toc=&mapping=

This call is synchronous (returns results directly, not a task)

We ARE checking

path (if specified) is in allowed-imports

toc (if specified) is in allowed imports

mapping (if specified) is valid JSON

We ARE NOT checking:

file- or chunk-checksums

These sound ok to me. Katello is not checking on the checksums either. but I am ok with user having to clean up and start over again if their tar.gz checksum is incorrect.

Note that when you actually go to do the import, that does verify checksums. But it takes a long time, not suitable for synchronous quick-check.

Questions:

should we be checking for existence of repositories from the provided mapping, or not?

You might want to be checking before creating them?

Katello already checks in its database before calling pulp. Not necessary.

should we be checking for existence of the provided path/toc files, or just 'legality'?

What do you mean by legality? I can see checking permissions. Katello user doesnt have admin rights on pulp user. so we are entirely going to rely on pulp to validate if the data can be imported.

OK - one thing that had been briefly discussed was using this API before an actual export is available, as a pre-check/is-this-the-right-directory sort of thing. In that context, it would make more sense to be passing in the directory and checking it for allowed-imports and permissions. However, I think having the file-paths is the more useful workflow, so "making sure the files exist" is a useful check.

('legality' in this context was a shortcut for "in a place allowed under allowed-imports")

if we're not requiring the files to exist - should we instead take an import-dir= param?

Can you expand on this? What is the point of the validation if the file does not exist? Katello will likely send you the toc location rather than the directory?.

See previously - I think I was considering a workflow that isn't what anyone actually wants :)

should we be checking for write-perms in that dir?

write-perm in dir is needed for chunked imports

Oh absolutely!. That's a great point. I didn't realize write permissions are needed.

Yeah, because pulp has to reassemble the chunks listed in the TOC, it needs to be able to write to the same directory.