Issue #7549
closedNeed a dry run option for import/export api
Description
Katello checks things like
- Is the import path exists
- Does it have the right permissions
It would be ideal if pulp provided a dry-run operation for import and export operations. Katello can then use that to figure out the correct course of action and not create stuff if its broken,
Updated by fao89 over 4 years ago
- Triaged changed from No to Yes
- Quarter set to Q4-2020
Updated by ggainey over 4 years ago
Discussion from IRC today:
<bmbouter> partha jsherrill do you know the answer to this quesiton about how katello sets pulp settings? https://pulp.plan.io/issues/7623#note-3
<daviddavis> bmbouter: we talked about this already and they're set by the foreman installer. that doesn't provide a way to read the settings unfortunately.
<jsherrill> bmbouter: an alternative might be to allow katello to submit a path to the import/export api and for it to tell us if its valid or not
<jsherrill> per the configuration
<jsherrill> kinda like a mini-dry run, but without having to create all the needed repositories beforehand
<partha> bmbouter: jsherrill I thought we just update /etc/pulp/settings.py?
<partha> bmbouter: well you'll have to be picky on what settings you expose
<jsherrill> partha: what do you think of my alternative ? maybe it could also check if the export is actually at the path, and readable.
<daviddavis> fwiw, we've talked about adding a dry run action before
<jsherrill> yeah, its kinda a dry run-lite
<jsherrill> we don't want to have to provide the full details, because that has some implications iirc
<daviddavis> well we'll probably have a dry run endpoint that does everything and then some params or something that can narrow the scope of the dry run to certain checks
<jsherrill> we really just want to check the validity and readability of the export data
<jsherrill> that sounds good
<daviddavis> ggainey: fyi ^
<ggainey> daviddavis: yup, was just reading - pretty much what I had in mind for dry-run, yus
<daviddavis> if partha agrees maybe someone can file an issue
<daviddavis> I got a meeting
<ggainey> don't we have one?
<ggainey> I think we have one
<partha> depends on the dry-run .. Do I need to pass it a repo mapping and stuff
<ggainey> one sec
<jsherrill> partha: right, thats what i was saying, we need an option to not pass that
<ggainey> partha: jsherrill daviddavis : add thoughts to https://pulp.plan.io/issues/7549 please
<daviddavis> yea, my thought is that there are params to control the scope of the dry run
<daviddavis> so if you just want to validate the file, path, etc you can do that
<ggainey> or, does it just do everything, and not-fail at the first problem?
<partha> jturel: so if pulp implements ^ dry-run we could remove the accessible permissions checking and all that
<daviddavis> ggainey: that would work too. it would return like areport or something
<jturel> partha: that's a big huge +1 from me
<daviddavis> although some of the later checks it might not be able to perform though if the earlier checks fail
<ggainey> partha: you could call dry-run, you'd still have to respond to the (various) errors
<ggainey> daviddavis: sure
<ggainey> a;;" http://yum.theforeman.org/pulpcore/3.7/el8/x86_64/ is a reasonable place to find our RPM builds, yes? I pointed a user at it, and now I'm worried that there's a better location
<ggainey> all, even
* ggainey looks suspiciously at fingers
<partha> ggainey: yes it depends on the scope. If pulp is going to require us to provice repo-mapping for the dry run
<partha> it might get harder
<ggainey> partha: /shrug I'd just write it to note "absence of mapping" as Just One More Error
<ggainey> I'd think decusions would be summat like "the place you want me to read from exists. I have permissions to read from it. the thing you want me to read exists. The thing you want me to read passes validation. you gave me a repo-mapping. the repo-mapping makes sense"
<ggainey> decisions
<ggainey> jeebus, I cannot type today
<partha> ggainey: daviddavis ok at the simplest level how about a general can "pulp read this dir"
<ggainey> that's part of "I have permissions to read from it"
<ggainey> I am pretty averse to having a pulp-api be a passthru to 'os' calls tho :)
<partha> "available_for_import"
<partha> :)
Updated by paji@redhat.com over 4 years ago
In short
- Make repo-mapping optional
- Generate a report along the lines of
could read export = PASS export is valid = PASS mapping is valid = FAIL
Updated by daviddavis over 4 years ago
My initial thought was that dry run would do an entire actual import/export but without actually changing any data or creating an export (respectively). This could be very expensive and invasive for users who don't want it (eg import would have to combine chunks and parse it to do a real dry run). So one idea I had is to have parameter(s) to limit the scope of the dry run.
Updated by ggainey about 4 years ago
Discussion¶
Going over my to-do list and (finally) got back to this. Looking though the IRC conversation in #c2, and @paji 's summary in #c3, I see three separate kinds-of Things that katello wants to check:
-
Settings - is the location specified in
--path
/--toc
permitted, based onALLOWED_IMPORT_PATHS
? - Mapping-sanity - optional; if provided, do the destination-repositories mentioned exist in this Pulp instance?
-
Import-file-existence - does the file pointed to by
--path
/--toc
exist? - Import-file-validity - validates chunks/checksums of import-file
An implied assumption is that, if --validity
is specified, nothing else happens - it is completely idempotent/nondestructive.
mapping-sanity¶
- Do the specified destination-repositories exist in the Pulp instance?
- Checked/enforced at IMPORTER-creation time
- Can be reported via
PulpImporterSerializer
settings¶
- Is a specified import-file permitted based on
ALLOWED_IMPORT_PATHS
? - Can only be checked based on
--toc
or--path
parameters - Checked/enforced at IMPORT time
import-file-existence¶
- Does specified import-file exist?
- Can only be checked based on
--toc
or--path
parameters - Checked/enforced at IMPORT time
import-file-validity¶
- Is specified import-file valid?
- Can only be checked based on
--toc
or--path
parameters - Checked/enforced at IMPORT time
import-file-existence and import-file-validity are already checked at the beginning of the import-process. Checking the file-validity in particular can be a time-consuming process, as it requires recomputing the checksum of the import-file chunks (if using --toc
) and the resulting .tar.gz.
Conclusion¶
What we have, then, is a mix of things to validate, that are currently provided to two different endpoints, one of which is synchronous and one of which returns a task, a subset of which can take an appreciable/long time to compute, and most of which are already being done and reported on when invoking the entities in question.
The only thing we don't currently expose, is a way to ask for ALLOWED_IMPORT_PATHS
so that the import-file can either be put in the right place, or the settings fixed in advance.
The net of this, for me, is threefold:
- What katello needs isn't what we're envisioning for
--dry-run
- There is not currently a single endpoint that is set up to answer this question.
- What we're thinking for
--dry-run
may not be advisable, given the existence of chunked import-files (which is likely to be the main path for this feature on any 'real' system with significant repositories)
Proposal¶
It feels to me like what is needed here is a new pulpimporter endpoint - something like
GET /pulp/api/v3/importers/core/pulp/validate?path=&toc=&mapping=
that returns a dictionary of results, with the keys being the four checks specified above.
@daviddavis jsherril@redhat.com paji@redhat.com What do you think about this? Feedback here in the issue please.
Updated by daviddavis about 4 years ago
This sounds good to me. The only nitpick I would have here is that the API could be more restful (ie prefer the use of nouns in the API path over a verb 'validate').
Updated by ggainey about 4 years ago
daviddavis wrote:
This sounds good to me. The only nitpick I would have here is that the API could be more restful (ie prefer the use of nouns in the API path over a verb 'validate').
Yeah, the specific name is def up-for-grabs yet. Maybe 'sanity'?
Updated by ggainey about 4 years ago
daviddavis wrote:
What about maybe an
import-check
?
I love this - perfect!
Updated by ggainey about 4 years ago
daviddavis jsherril@redhat.com paji@redhat.com - I am working on implementing this for Pulp3.10, and need a final agreement on the API and what we check.
Details:
- Proposed API is
GET /pulp/api/v3/importers/core/pulp/import-check?path=&toc=&mapping=
- This call is synchronous (returns results directly, not a task)
- We ARE checking
- path (if specified) is in allowed-imports
- toc (if specified) is in allowed imports
- mapping (if specified) is valid JSON
- We ARE NOT checking:
- file- or chunk-checksums
Questions:
- should we be checking for existence of repositories from the provided mapping, or not?
- You might want to be checking before creating them?
- should we be checking for existence of the provided path/toc files, or just 'legality'?
- if we're not requiring the files to exist - should we instead take an
import-dir=
param? - should we be checking for write-perms in that dir?
- write-perm in dir is needed for chunked imports
I'm going to start wiring this up, but I need definite answers before I can finish the API.
Updated by daviddavis about 4 years ago
The one thing I might recommend is making the API more RESTful (ie a user creates an import check even though it's not persisted):
POST /pulp/api/v3/importers/core/pulp/import-checks
POST params: path, toc, mapping
The questions you bring up are good ones. I'll defer to Katello though since I don't have any strong opinions.
Updated by ggainey about 4 years ago
daviddavis wrote:
The one thing I might recommend is making the API more RESTful (ie a user creates an import check even though it's not persisted):
I don't know that I agree that should make it look like we're creating anything here. In my mind this is more like the /status endpoint, where the GET is returning information. I would find it more surprising that a POST call didn't actually change/persist anything on the server. But, I could be convinced otherwise, if it makes everyone more comfortable.
The questions you bring up are good ones. I'll defer to Katello though since I don't have any strong opinions.
Yeah, those are def more aimed to jsherrill and paji.
Updated by paji@redhat.com about 4 years ago
ggainey wrote:
daviddavis jsherril@redhat.com paji@redhat.com - I am working on implementing this for Pulp3.10, and need a final agreement on the API and what we check.
Details:
- Proposed API is
GET /pulp/api/v3/importers/core/pulp/import-check?path=&toc=&mapping=
- This call is synchronous (returns results directly, not a task)
- We ARE checking
- path (if specified) is in allowed-imports
- toc (if specified) is in allowed imports
- mapping (if specified) is valid JSON
- We ARE NOT checking:
- file- or chunk-checksums
These sound ok to me. Katello is not checking on the checksums either. but I am ok with user having to clean up and start over again if their tar.gz checksum is incorrect.
Questions:
- should we be checking for existence of repositories from the provided mapping, or not?
- You might want to be checking before creating them?
Katello already checks in its database before calling pulp. Not necessary.
- should we be checking for existence of the provided path/toc files, or just 'legality'?
What do you mean by legality? I can see checking permissions. Katello user doesnt have admin rights on pulp user. so we are entirely going to rely on pulp to validate if the data can be imported.
- if we're not requiring the files to exist - should we instead take an
import-dir=
param?
Can you expand on this? What is the point of the validation if the file does not exist? Katello will likely send you the toc location rather than the directory?.
- should we be checking for write-perms in that dir?
- write-perm in dir is needed for chunked imports
Oh absolutely!. That's a great point. I didn't realize write permissions are needed.
I'm going to start wiring this up, but I need definite answers before I can finish the API.
Updated by ggainey about 4 years ago
paji@redhat.com wrote:
ggainey wrote:
daviddavis jsherril@redhat.com paji@redhat.com - I am working on implementing this for Pulp3.10, and need a final agreement on the API and what we check.
Details:
- Proposed API is
GET /pulp/api/v3/importers/core/pulp/import-check?path=&toc=&mapping=
- This call is synchronous (returns results directly, not a task)
- We ARE checking
- path (if specified) is in allowed-imports
- toc (if specified) is in allowed imports
- mapping (if specified) is valid JSON
- We ARE NOT checking:
- file- or chunk-checksums
These sound ok to me. Katello is not checking on the checksums either. but I am ok with user having to clean up and start over again if their tar.gz checksum is incorrect.
Note that when you actually go to do the import, that does verify checksums. But it takes a long time, not suitable for synchronous quick-check.
Questions:
- should we be checking for existence of repositories from the provided mapping, or not?
- You might want to be checking before creating them?
Katello already checks in its database before calling pulp. Not necessary.
+1
- should we be checking for existence of the provided path/toc files, or just 'legality'?
What do you mean by legality? I can see checking permissions. Katello user doesnt have admin rights on pulp user. so we are entirely going to rely on pulp to validate if the data can be imported.
OK - one thing that had been briefly discussed was using this API before an actual export is available, as a pre-check/is-this-the-right-directory sort of thing. In that context, it would make more sense to be passing in the directory and checking it for allowed-imports and permissions. However, I think having the file-paths is the more useful workflow, so "making sure the files exist" is a useful check.
('legality' in this context was a shortcut for "in a place allowed under allowed-imports")
- if we're not requiring the files to exist - should we instead take an
import-dir=
param?Can you expand on this? What is the point of the validation if the file does not exist? Katello will likely send you the toc location rather than the directory?.
See previously - I think I was considering a workflow that isn't what anyone actually wants :)
- should we be checking for write-perms in that dir?
- write-perm in dir is needed for chunked imports
Oh absolutely!. That's a great point. I didn't realize write permissions are needed.
Yeah, because pulp has to reassemble the chunks listed in the TOC, it needs to be able to write to the same directory.
I'm going to start wiring this up, but I need definite answers before I can finish the API.
I think you've answered all my questions, partha - thanks!
Updated by pulpbot almost 4 years ago
- Status changed from ASSIGNED to POST
Added by ggainey almost 4 years ago
Updated by ggainey almost 4 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulpcore|2cddc5d4a02ec976373c3d5050de2cabdfbb10ba.
Updated by pulpbot almost 4 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Added import-check command, tests, and doc.
closes #7549