Story #1976
closedAs user, I can have packages sorted in "Packages/$x" directories when published
100%
Description
See comment 32 for the use case requirements. See comment 42 for a clarification on the requirements.
Updated by bmbouter over 8 years ago
Can more detail be put into this? There are some docs written on how to write good stories here[0]. To keep the discussion going, here are some questions I came up with:
This is for the yum_distributor, right?
What would the option be and can a more formal description be written for the behavior it provides?
What would the CLI options be called?
[0]: http://pulp.readthedocs.io/en/latest/dev-guide/contributing/features.html
Updated by jluza over 8 years ago
Ok, let me answer/edit description here in comment, because I don't have permissions to do that in regular way.
User Identification: arbitrary user with push permissions. Or owner of the repository but I suppose anyone who is able to modify the repository, should be able to do this.
Context
There was decision made in the past which led to this split (repodata and packages in separate directory) and that's why we have to keep this structure compatible. Although we have some repositories where repodata are in the same directory as packages. I don't know historical details, but as I mentioned above, fedora repositories also use this.
Acceptance Criteria
- attribute has to be optional
- attribute has to be in format that is allowed by mkdir
- if attribute is not set, packages have to be sync to relative_path of repository
Updated by bmbouter over 8 years ago
jluza you should have edit rights on issues. I believe you need to click 'Edit' and then next to the label 'Descrpition' is an icon of a pencil. Clicking the pencil icon will show you the editable description textbox.
Updated by jluza over 8 years ago
- Status changed from NEW to POST
Updated by bmbouter over 8 years ago
- Subject changed from as user, I can specify directory where all rpms will be published to As user, I can specify directory where all rpms will be published
jluza would you be willing to write one or more pulp-smash issues[0] on how to test this?
Added by jluza over 8 years ago
Updated by jluza over 8 years ago
- Status changed from POST to MODIFIED
- % Done changed from 0 to 100
Applied in changeset ac76ec18f3662dc6c95e34f70565240265b6fa2d.
Updated by pthomas@redhat.com over 8 years ago
I have created a pulp-smash issue. But need to add specific test cases around this
Updated by pthomas@redhat.com over 8 years ago
The CLI option doesn't seem to be available for package directory
[root@ibm-x3550m3-11 ~]# rpm -qa |grep pulp-server
pulp-server-2.9.0-0.3.beta.el7.noarch
[root@ibm-x3550m3-11 ~]#
[root@ibm-x3550m3-11 ~]# pulp-admin rpm repo update --repo-id rhel7 --packages_directory /tmp/
[root@ibm-x3550m3-11 ~]# pulp-admin rpm repo update --repo-id rhel7 --packages_directory /tmp/
Command: update
Description: changes metadata on an existing repository
Available Arguments:
--bg - if specified, the client process will end immediately (the
task will continue to run on the server)
--repo-id - (required) unique identifier; only alphanumeric, ., -, and _
allowed
--display-name - user-readable display name (may contain i18n characters)
--description - user-readable description (may contain i18n characters)
--note - adds/updates/deletes notes to programmatically identify the
resource; key-value pairs must be separated by an equal sign
(e.g. key=value); multiple notes can be changed by specifying
this option multiple times; notes are deleted by specifying
"" as the value
Synchronization
--feed - URL of the external source repository to sync
--validate - if "true", the size and checksum of each synchronized file will
be verified against the repo metadata
--skip - comma-separated list of types to omit when synchronizing, if not
specified all types will be synchronized; valid values are: rpm,
drpm, distribution, erratum
Feed SSL
--feed-ca-cert - full path to the CA certificate that should be used to
verify the external repo server's SSL certificate
--verify-feed-ssl - if "true", the feed's SSL certificate will be verified
against the feed_ca_cert
--feed-cert - full path to the certificate to use for authorization when
accessing the external feed
--feed-key - full path to the private key for feed_cert
Feed Proxy
--proxy-host - proxy server url to use
--proxy-port - port on the proxy server to make requests
--proxy-user - username used to authenticate with the proxy server
--proxy-pass - password used to authenticate with the proxy server
Basic Authentication
--basicauth-user - username used to authenticate with sync location via HTTP
basic auth
--basicauth-pass - password used to authenticate with sync location via HTTP
basic auth
Throttling
--max-downloads - maximum number of downloads that will run concurrently
--max-speed - maximum bandwidth used per download thread, in bytes/sec,
when synchronizing the repo
Repository Content Behavior
--remove-missing - if "true", units that were previously in the external
feed but are no longer found will be removed from the
repository; defaults to false
--retain-old-count - count indicating how many non-latest versions of a unit
to keep in a repository
Download Policy
--download-policy - content downloading policy (immediate | background |
on_demand)
Publishing
--relative-url - relative path the repository will be served from. Only
alphanumeric characters, forward slashes, underscores and
dashes are allowed. It defaults to the relative path of
the feed URL
--serve-http - if "true", the repository will be served over HTTP;
defaults to false
--serve-https - if "true", the repository will be served over HTTPS;
defaults to true
--checksum-type - type of checksum to use during metadata generation
--gpg-key - GPG key used to sign and verify packages in the repository
--generate-sqlite - if "true", sqlite files will be generated for the
repository metadata during publish
--repoview - if "true", static HTML files will be generated during
publish by the repoview tool for faster browsing of the
repository. Enables --generate-sqlite flag.
Consumer Authentication
--host-ca - full path to the CA certificate that signed the repository
hosts's SSL certificate when serving over HTTPS
--auth-ca - full path to the CA certificate that should be used to verify
client authentication certificates; setting this turns on client
authentication for the repository
--auth-cert - full path to the entitlement certificate that will be given to
bound consumers to grant access to this repository
The following options were specified but do not exist on the command:
--packages_directory
[root@ibm-x3550m3-11 ~]# pulp-admin rpm repo create
Command: create
Description: creates a new repository
Available Arguments:
--repo-id - (required) unique identifier; only alphanumeric, ., -, and _
allowed
--display-name - user-readable display name (may contain i18n characters)
--description - user-readable description (may contain i18n characters)
--note - adds/updates/deletes notes to programmatically identify the
resource; key-value pairs must be separated by an equal sign
(e.g. key=value); multiple notes can be changed by specifying
this option multiple times; notes are deleted by specifying
"" as the value
Synchronization
--feed - URL of the external source repository to sync
--validate - if "true", the size and checksum of each synchronized file will
be verified against the repo metadata
--skip - comma-separated list of types to omit when synchronizing, if not
specified all types will be synchronized; valid values are: rpm,
drpm, distribution, erratum
Feed SSL
--feed-ca-cert - full path to the CA certificate that should be used to
verify the external repo server's SSL certificate
--verify-feed-ssl - if "true", the feed's SSL certificate will be verified
against the feed_ca_cert
--feed-cert - full path to the certificate to use for authorization when
accessing the external feed
--feed-key - full path to the private key for feed_cert
Feed Proxy
--proxy-host - proxy server url to use
--proxy-port - port on the proxy server to make requests
--proxy-user - username used to authenticate with the proxy server
--proxy-pass - password used to authenticate with the proxy server
Basic Authentication
--basicauth-user - username used to authenticate with sync location via HTTP
basic auth
--basicauth-pass - password used to authenticate with sync location via HTTP
basic auth
Throttling
--max-downloads - maximum number of downloads that will run concurrently
--max-speed - maximum bandwidth used per download thread, in bytes/sec,
when synchronizing the repo
Repository Content Behavior
--remove-missing - if "true", units that were previously in the external
feed but are no longer found will be removed from the
repository; defaults to false
--retain-old-count - count indicating how many non-latest versions of a unit
to keep in a repository
Download Policy
--download-policy - content downloading policy (immediate | background |
on_demand)
Publishing
--relative-url - relative path the repository will be served from. Only
alphanumeric characters, forward slashes, underscores and
dashes are allowed. It defaults to the relative path of
the feed URL
--serve-http - if "true", the repository will be served over HTTP;
defaults to false
--serve-https - if "true", the repository will be served over HTTPS;
defaults to true
--checksum-type - type of checksum to use during metadata generation
--gpg-key - GPG key used to sign and verify packages in the repository
--generate-sqlite - if "true", sqlite files will be generated for the
repository metadata during publish
--repoview - if "true", static HTML files will be generated during
publish by the repoview tool for faster browsing of the
repository. Enables --generate-sqlite flag.
Consumer Authentication
--host-ca - full path to the CA certificate that signed the repository
hosts's SSL certificate when serving over HTTPS
--auth-ca - full path to the CA certificate that should be used to verify
client authentication certificates; setting this turns on client
authentication for the repository
--auth-cert - full path to the entitlement certificate that will be given to
bound consumers to grant access to this repository
The following options are required but were not specified:
--repo-id
[root@ibm-x3550m3-11 ~]#
Updated by pthomas@redhat.com over 8 years ago
- Status changed from MODIFIED to ASSIGNED
As per my previous comment and further talking with bmbouter, moving it to assigned.
Updated by mhrivnak over 8 years ago
Based on a quick glance at the pull request, I don't see anything that would add an option to pulp-admin.
Assuming that's true, I suggest we make a separate story to add the option to pulp-admin in 2.10, and change the scope of this story to only include the server-side parts.
Updated by semyers over 8 years ago
mhrivnak wrote:
Based on a quick glance at the pull request, I don't see anything that would add an option to pulp-admin.
Assuming that's true, I suggest we make a separate story to add the option to pulp-admin in 2.10, and change the scope of this story to only include the server-side parts.
+1
The functionality is there in 2.9 as far as I can tell, minus pulp-admin support:
$ git tag --contains ac76ec18f3662dc6c95e34f70565240265b6fa2d pulp-rpm-2.9.0-0.1.beta pulp-rpm-2.9.0-0.2.beta pulp-rpm-2.9.0-0.3.beta pulp-rpm-2.9.0-0.4.rc
Updated by bmbouter over 8 years ago
- Status changed from ASSIGNED to MODIFIED
Per discussion on the bug, I'm moving back to MODIFIED for verification of just the API bits. I've created a new story [0] to track the addition of the pulp-admin options. The story never specified pulp-admin requirements in the description so it didn't require any modification.
Updated by Ichimonji10 over 8 years ago
- Status changed from MODIFIED to ASSIGNED
It's still unclear to me what is being requested here.
- May
packages_directory
be an absolute path, a relative path, or either?
* Ifpackages_directory
is an absolute path, does that mean that one may publish RPMs to anywhere on the Pulp system's filesystem?
* Ifpackages_directory
is a relative path, what is the path relative to? The distributor'srelative_url
? - Are trailing slashes required, not required, or optional?
- Can this option be specified at publish-time via the
override_config
option?
I've taken jluza's pull reuqest (https://github.com/PulpQE/pulp-smash/pull/312) so that they're a bit more sane (https://github.com/PulpQE/pulp-smash/compare/master...Ichimonji10:pr/312). Unfortunately, the tests still fail. I have no idea if it's b/c the tests are wrong, or b/c the feature is wrong, or what. Some additional requirements need to be specified out before this feature can even be tested.
Updated by mhrivnak over 8 years ago
It looks like the new config option did not get documented. I think adding documentation to the yum_distributor's config docs would provide what Ichimonji10 is looking for. That's a basic expectation anyway for a new config option.
Updated by bmbouter over 8 years ago
Yes I agree documentation is a basic requirement [0]. I remember seeing them in the PR before, but I can't find the commit where they are present. I think it was lost in a rebase, along with another rebase error that was introduced [1]. Also there is a type on this line which was always there[2].
The best thing we can do now is add the documentation so that there are no code changes which would require us to go from RC back to beta.
[0]: https://github.com/pulp/pulp_rpm/pull/897#issuecomment-224296088
[1]: https://github.com/pulp/pulp_rpm/pull/897/files#diff-00f13f6321363e2b071e468d88945cf5R23
[2]: https://github.com/pulp/pulp_rpm/pull/897/files#diff-00f13f6321363e2b071e468d88945cf5R24
Updated by bmbouter over 8 years ago
The docs were definitely here because I acknowledged that he had completed that: https://github.com/pulp/pulp_rpm/pull/897#issuecomment-224594888
Unfortunately those were lost in the rebase.
Updated by jluza over 8 years ago
Ichimonji10 wrote:
It's still unclear to me what is being requested here.
- May
packages_directory
be an absolute path, a relative path, or either?- If
packages_directory
is an absolute path, does that mean that one may publish RPMs to anywhere on the Pulp system's filesystem?- If
packages_directory
is a relative path, what is the path relative to? The distributor'srelative_url
?- Are trailing slashes required, not required, or optional?
- Can this option be specified at publish-time via the
override_config
option?I've taken jluza's pull reuqest (https://github.com/PulpQE/pulp-smash/pull/312) so that they're a bit more sane (https://github.com/PulpQE/pulp-smash/compare/master...Ichimonji10:pr/312). Unfortunately, the tests still fail. I have no idea if it's b/c the tests are wrong, or b/c the feature is wrong, or what. Some additional requirements need to be specified out before this feature can even be tested.
The features is called packages "directory", because it should be really only directory, not path. I hope it will be clear from the documentation. However you have point there, because in theory I think it's probably possible to have absolute/relative path as part of destination for packages. I would have to look at repo metadata specifications. I think location for packages can be only relative to the "root" directory of repository.
Is there a way how to determine if attribute can be specified in override_config or not?
packages directory should be os.path.joinable, so no trailing slashes doesn't matter, however string can't start with slash.
Updated by Ichimonji10 over 8 years ago
The features is called packages "directory", because it should be really only directory, not path. [...] in theory I think it's probably possible to have absolute/relative path as part of destination for packages.
How does one identify a directory? With a path:
mkdir foo
mkdir ./bar
mkdir ../biz
mkdir /tmp/baz
mkdir -p one/two
mkdir -p ./three/four/
Are the paths above valid? According to note 2, yes: "attribute has to be in format that is allowed by mkdir."
I would have to look at repo metadata specifications. I think location for packages can be only relative to the "root" directory of repository.
Is there a way how to determine if attribute can be specified in override_config or not?
packages directory should be os.path.joinable, so no trailing slashes doesn't matter, however string can't start with slash.
So, it sounds like:
-
packages_directory
must be a relative path. It must not be an absolute path. -
packages_directory
specifies where a distributor should place packages, relative to itsrelative_url
option. (By default, packages are placed in the current directory, relative to therelative_url
option.) -
packages_directory
may end with a trailing slash. If present, it is stripped. Directory names with literal trailing slashes are not created.
Outstanding questions include:
- May
packages_directory
may be passed at publish-time via theoverride_config
option? - May
packages_directory
specify a parent directory? That is, may one pass a value such as../foo
? - May
packages_directory
specify multiple path components? That is, may one pass a value such asfoo/bar
or./foo/bar/
? - What is the rationale for introducing this option? Is this being introduced so that RCM's workflow doesn't break? Because we're trying to reach parity with an existing RPM-management tool? Because it's neat? Knowing why we're introducing this option will help out when bugs are inevitably filed against the feature: we'll be able to consult with the original stakeholder/document/specification/tool/etc when sorting out the bug. It'll also help out if this feature is ever brought up as a candidate for deprecation.
Updated by jluza over 8 years ago
Ichimonji10 wrote:
Outstanding questions include:
- May
packages_directory
may be passed at publish-time via theoverride_config
option?
I always assumed, any distributor's option can be passed to override_config. So answer is yes.
- May
packages_directory
specify a parent directory? That is, may one pass a value such as../foo
?
That's probably something that depends on implementation in yum/dnf. We don't need this to be possible.
- May
packages_directory
specify multiple path components? That is, may one pass a value such asfoo/bar
or./foo/bar/
?
Same as above. I will try to get statement from packaging team today.
- What is the rationale for introducing this option? Is this being introduced so that RCM's workflow doesn't break? Because we're trying to reach parity with an existing RPM-management tool? Because it's neat? Knowing why we're introducing this option will help out when bugs are inevitably filed against the feature: we'll be able to consult with the original stakeholder/document/specification/tool/etc when sorting out the bug. It'll also help out if this feature is ever brought up as a candidate for deprecation.
Yes, it's due compatibility. I think even fedora has packages in separate directory. You can easily create repodata like that with following steps:
- cd repodir
- mkdir packages
- download some rpms to "packages"
- run createrepo_c .
and in output primary.xml, packages locations will be "packages/*"
Updated by Ichimonji10 over 8 years ago
- May
packages_directory
may be passed at publish-time via theoverride_config
option?I always assumed, any distributor's option can be passed to override_config. So answer is yes.
OK.
- May
packages_directory
specify a parent directory? That is, may one pass a value such as../foo
?That's probably something that depends on implementation in yum/dnf. We don't need this to be possible.
I'd advocate for this not being possible. If a user passes a path of ../2000000000.00
, then they've got an interesting situation on their hands. You can imagine what else a user might do with relative paths.
- May
packages_directory
specify multiple path components? That is, may one pass a value such asfoo/bar
or./foo/bar/
?Same as above. I will try to get statement from packaging team today.
Thank you.
- What is the rationale for introducing this option? Is this being introduced so that RCM's workflow doesn't break? Because we're trying to reach parity with an existing RPM-management tool? Because it's neat? Knowing why we're introducing this option will help out when bugs are inevitably filed against the feature: we'll be able to consult with the original stakeholder/document/specification/tool/etc when sorting out the bug. It'll also help out if this feature is ever brought up as a candidate for deprecation.
Yes, it's due compatibility. I think even fedora has packages in separate directory. You can easily create repodata like that with following steps:
- cd repodir
- mkdir packages
- download some rpms to "packages"
- run createrepo_c .and in output primary.xml, packages locations will be "packages/*"
What are the exact commands that Pulp should be compatible with? Here's what I tried:
mkdir -p myrepo/packages
cd myrepo/packages
wget \
https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm/bear-4.1-1.noarch.rpm \
https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm/camel-0.1-1.noarch.rpm \
https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm/cat-1.0-1.noarch.rpm
cd .. # go to myrepo
createrepo_c packages
This was the final result:
# tree myrepo
myrepo
└── packages
├── bear-4.1-1.noarch.rpm
├── camel-0.1-1.noarch.rpm
├── cat-1.0-1.noarch.rpm
└── repodata
├── 2549caec4b86ab2b41d949d7f2fcf0c8a76227ec75abe52b8b8f8cda0404e47e-other.xml.gz
├── 2785256942bd4c8272c270dc24cf87705b2a51b33e0cd48821ceb3e8f4115c40-primary.xml.gz
├── 2ae9d62070e80329023358fa1f42a2cf0efdf54093c1d280c37b8d7e82362f3d-filelists.xml.gz
├── 6579534a6d9950cf921b315b27823c4ce7019b5ece186c4f1505d6727f48aab0-filelists.sqlite.bz2
├── e9b13712c3b81d38bdc6c8d3e5211c81cff402fcf89e665e05c0baa6ee8618d3-other.sqlite.bz2
├── ff252618bc3be91c8ebd5cfd460b0e972ab6cf79a89762128dee56882561793d-primary.sqlite.bz2
└── repomd.xml
2 directories, 10 files
Updated by jluza over 8 years ago
Ichimonji10 wrote:
- What is the rationale for introducing this option? Is this being introduced so that RCM's workflow doesn't break? Because we're trying to reach parity with an existing RPM-management tool? Because it's neat? Knowing why we're introducing this option will help out when bugs are inevitably filed against the feature: we'll be able to consult with the original stakeholder/document/specification/tool/etc when sorting out the bug. It'll also help out if this feature is ever brought up as a candidate for deprecation.
Yes, it's due compatibility. I think even fedora has packages in separate directory. You can easily create repodata like that with following steps:
- cd repodir
- mkdir packages
- download some rpms to "packages"
- run createrepo_c .and in output primary.xml, packages locations will be "packages/*"
What are the exact commands that Pulp should be compatible with? Here's what I tried:
[...]
This was the final result:
[...]
I forgot to mention you need to run createrepo_c ., not createrepo_c packages
Updated by Ichimonji10 over 8 years ago
What are the exact commands that Pulp should be compatible with? Here's what I tried: [...]
I forgot to mention you need to run createrepo_c ., not createrepo_c packages
Aha. It looks like you wrote that above, and I just misread things. With that in mind, this script:
mkdir -p myrepo/packages
cd myrepo/packages
wget \
https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm/bear-4.1-1.noarch.rpm \
https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm/camel-0.1-1.noarch.rpm \
https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm/cat-1.0-1.noarch.rpm
cd .. # go to myrepo
createrepo_c .
...produces this result:
$ tree myrepo/
myrepo/
├── packages
│ ├── bear-4.1-1.noarch.rpm
│ ├── camel-0.1-1.noarch.rpm
│ └── cat-1.0-1.noarch.rpm
└── repodata
├── 2549caec4b86ab2b41d949d7f2fcf0c8a76227ec75abe52b8b8f8cda0404e47e-other.xml.gz
├── 2ae9d62070e80329023358fa1f42a2cf0efdf54093c1d280c37b8d7e82362f3d-filelists.xml.gz
├── 6579534a6d9950cf921b315b27823c4ce7019b5ece186c4f1505d6727f48aab0-filelists.sqlite.bz2
├── 89c3b359eee148e584e63bbffa1d388c0361d45d94cb0921c7ee03d246276542-primary.sqlite.bz2
├── 8e7ba82c2f7e1eee8251f1a93c15959b083e032912d0549364709453584a9512-primary.xml.gz
├── e9b13712c3b81d38bdc6c8d3e5211c81cff402fcf89e665e05c0baa6ee8618d3-other.sqlite.bz2
└── repomd.xml
For the record, the …primary.xml.gz
file has lines like this:
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://linux.duke.edu/metadata/common" xmlns:rpm="http://linux.duke.edu/metadata/rpm" packages="3">
<package type="rpm">
<location href="packages/bear-4.1-1.noarch.rpm"/>
I think that's enough information to confidently make an initial Pulp Smash test.
Updated by jluza over 8 years ago
btw, I talked to dnf guys and it's possible to have ../packages/ ,all/packages/ in location and maybe it's even possible to have absolute path in location. But I need to confirm that last part with tmlcoch.
Updated by bmbouter over 8 years ago
- Platform Release changed from 2.9.0 to 2.10.0
The code merged to 2.9-dev only uses the specified directory for Distribution unit types (not all RPMs). As such it not match the expectations for the feature. This is being removed from the docs and moved to 2.10 so we can try to get it right before releasing it.
Updated by bmbouter over 8 years ago
- Platform Release deleted (
2.10.0)
Unsetting Target Release so that it can be set once code is merged.
Added by bmbouter over 8 years ago
Revision dc6ba71c | View on GitHub
Removes packages_directory release note
This feature is being pulled from 2.9.0.
Added by bmbouter over 8 years ago
Revision 6145a71b | View on GitHub
Revert "added packages_directory option for yum_distributor"
This reverts commit ac76ec18f3662dc6c95e34f70565240265b6fa2d.
This feature is being pulled so that it can be re-introduced later when it is working.
The docs portain is being negative commited separately so there are no 2.9.x.rst changes in this negative commit.
Updated by Ichimonji10 over 8 years ago
Updated by jcline@redhat.com over 8 years ago
I did a little bit of investigation into this issue. There are two sub-cases for this particular feature which are complicated by the model Pulp uses for distribution trees. In both these cases I will assume the href for the packages in the repository are relative. This is not, as far as I know, required[0], but we will assume it is the case for now.
Pure Yum repository (case 1)
This is the straight-forward case. In this situation, the following needs to happen:
1. The directory structure needs to be built in the root of the repository. The configured path should probably be checked to make sure it's within the repository root.
2. The symlinks need to be placed in that directory structure.
3. The repodata generated must have the ``location`` tag set to properly reference the symlinks. This is, unfortunately, not terribly simple. Pulp currently stores the primary.xml, filelist.xml, and other.xml snippets as strings in the database. If I recall correctly, the only one with a ``location`` tag is the primary.xml. For issue 1618, I believe these were turned into templates that contained placeholders for the checksum fields. We'll have to do something similar here for the location tag (assuming we want to continue down this template path). During the publish the template will have to be rendered before being written to the XML file.
A Distribution Tree with Yum Repository
In the case of a distribution tree, in addition to all the things we need to do for a pure yum repository, we have to account for both the repository setting and the distribution tree ``packagedir``/``packages``[1] setting. To do this, I think that we might have to symlink one to the other if they differ.
[0] That is, you could have metadata living at https://pulp.example.com/somerepo/repodata and RPMs at https://somehost.example.com/PackageDir/ and that's perfectly acceptable and workable. Yum/DNF clients should handle that just fine, I think.
[1] http://release-engineering.github.io/productmd/treeinfo-1.0.html
Updated by jgreguske over 8 years ago
I was asked to provide what RCM requires with respect to this feature. There are 2 use cases.
Number 1¶
In our CDN we include big sha checksums in the file names of repodata so that proxy caches and the caches of our content provider never get in the way of critical updates from the client's perspective. Thus every time the repodata is regenerated, an additional set of repodata files is written to disk beside the originals previously written in the repodata directory. For long standing repositories (RHEL gets updates for 13 years) this file list can become quite large. We found it prudent to exclude it from the RPMs which correspondingly get quite large too, and we have hit filesytem limitations where we exceeded the number of files allowed in a directory. That is how we ended up with this simple structure:
10:37 AM zyzyx@thrull:/tmp $ tree -d os
os
├── Packages
└── repodata
I'm not sure it is technically needed at this point, but many parts of our infrastructure assume this layout, so as usual legacy is dictating a feature to us.
Number 2¶
We need separate subdirectories to store RPMs because of filesystem limitations in the Fedora space too. Specifically, the mirrors we sync Fedora content to have a variety of different filesystems in place including silly ones like AFS in some universities. The number of files that can be present in a directory is limited, thus we often separate RPMs into different subdirectories by the first letter of the RPM. It looks like this:
10:39 AM zyzyx@thrull:/tmp $ tree -d os
os
├── Packages
│ ├── 3
│ ├── a
│ ├── b
[...]
│ ├── x
│ ├── y
│ └── z
└── repodata
The RPM subdirectories should always start as peers of the repodata directory. They're always relative and the href links in the repodata should never reach outside of the root of the repository. yum/dnf support that, but RCM does not need it. I recommend against supporting it for security reasons.
Hope this clarifies things.
Updated by bmbouter over 8 years ago
- Status changed from ASSIGNED to NEW
- Assignee deleted (
jluza)
Moving back to NEW after IRC convo w/ jluza
Updated by bmbouter over 8 years ago
Would making a repo publish as described in Comment 32 in 2.11 by default be a backwards incompatible change? I can see it both ways. One thing is that the repodata specifies where content lives so I think the real question is: is fetching published content without using repodata a valid use case?
If we did make it publish with this new layout for 2.11 by default each repo that existed before and was published again would be a non-incremental publish. We could handle this without much fuss using some internal bookeeping on if repos had been published with the new layout or not.
Updated by semyers over 8 years ago
bmbouter wrote:
Would making a repo publish as described in Comment 32 in 2.11 by default be a backwards incompatible change? I can see it both ways. One thing is that the repodata specifies where content lives so I think the real question is: is fetching published content without using repodata a valid use case?
Comment #32 describes two methods of publishing content, and both include repodata. Can you elaborate on what you mean here?
Updated by jsherril@redhat.com over 8 years ago
i don't think its considered a backwards incompatible change. Pulp is publishing a yum repo, and a yum repo doesn't dictate any sort of structure (as long as the metadata is correct). I don't think fetching rpms directly via a given url should be 'supported' in a backwards compatible way
I also would love having the a,b,c,d directory structure. +1 to that.
Updated by bmbouter over 8 years ago
semyers wrote:
bmbouter wrote:
Would making a repo publish as described in Comment 32 in 2.11 by default be a backwards incompatible change? I can see it both ways. One thing is that the repodata specifies where content lives so I think the real question is: is fetching published content without using repodata a valid use case?
Comment #32 describes two methods of publishing content, and both include repodata. Can you elaborate on what you mean here?
I had imagined that use case 1 and 2 in Comment 32 would be used together at all times. Use case 2 looks very clear, but use case 1 I may not fully understand.
To restate my understanding of the use cases
- use case 1 amounts to "don't put rpms in the root of the repo. Instead have a directory called Packages".
- use case 2 would be "don't put rpms directly into the Packages directory, organize them into folders by the first name of the rpm"
Updated by semyers over 8 years ago
That makes more sense, thanks. I think if we stick to the current behavior as a default, but support other layouts as a config option, we could potentially have our cake and eat it too, so to speak. That said, I agree with jsherrill that the layout of the repositories we create is probably safe to change at any time, provided yum/dnf/zypper/pulp/whatever can pull the content from it.
Updated by bmbouter over 8 years ago
@jgreguske can you confirm that my understanding in Comment 39 is correct?
Also the original title suggests that the name 'Package' can be provided as an option. Is that a requirement?
Lastly, would you want this behavior in all publishes or in some kind of optional way? If optional, would you want use case 1 and use case 2 to be enabled separately or always together?
Updated by jgreguske about 8 years ago
Sorry for the late response. I circulated this with the RCM team leads and have some answers:
- We will only ever use the subdirectory name "Packages", and that is a requirement. RCM does not need that to be configurable to have other names like "SRPMS" or "RPMS" or whatever.
- We do require support of "Packages/x" where x is a bunch of lower case characters that match the first character in an rpm file name. It is also a requirement that they be lower case (or numbers in rare cases like 389).
Updated by mhrivnak about 8 years ago
- Subject changed from As user, I can specify directory where all rpms will be published to As user, I can have packages sorted in "Packages/$x" directories when published
- Description updated (diff)
- Sprint Candidate changed from No to Yes
Updated by mhrivnak about 8 years ago
- Blocks Story #2064: Add pulp-admin support for the 'packages_directory' option added
Updated by bmbouter about 8 years ago
- Description updated (diff)
- Sprint/Milestone set to 29
- Tags deleted (
2.11)
Updating the bug after some discussion. Adding to Sprint 11 based on this update.
Updated by bmbouter about 8 years ago
- Blocks deleted (Story #2064: Add pulp-admin support for the 'packages_directory' option)
Updated by fdobrovo about 8 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to fdobrovo
Updated by bmbouter about 8 years ago
We need to ensure that the first publish of each repo after upgrade is full publish and not an incremental one. Otherwise the new metadata provided by the migration will not point to the original package layout which would be preserved by the incremental publish.
I'm adding a checklist item for us to require that the first publish of a repo that was published in 2.11 or earlier set the force_full flag upon the first publish of that repo.
[update: let's not do ^]
Updated by mhrivnak about 8 years ago
I was thinking that an incremental publish might work out ok with the new logic, but you'd get a mix of layouts. All rpms published previous to the upgrade would have their symlinks in the root of the repo, and the location in primary.xml would reference them correctly.
On an incremental publish post-upgrade, newly-added rpms would have symlinks created in the new layout, and new entries added to primary.xml would reference the new and correct locations for those rpms.
It might look confusing to users, which itself might be a good reason to do full publishes, but I think the output would likely be correct and usable either way.
Updated by bmbouter about 8 years ago
mhrivnak: oh yes I can see what you are saying. I reverted my changes on the checklist.
Updated by bmbouter about 8 years ago
I posted a blog post on this change here: http://pulpproject.org/2016/11/14/yum-repo-layout-changes/
Updated by ipanova@redhat.com about 8 years ago
- Status changed from ASSIGNED to POST
Updated by ttereshc almost 8 years ago
Pulp-smash issue: https://github.com/PulpQE/pulp-smash/issues/472
Updated by dkliban@redhat.com almost 8 years ago
Here are the changes needed in platform to make the RPM rsync distributor consistent with the new repository layout generated by yum distributor.
Updated by fdobrovo almost 8 years ago
New PR against master branch https://github.com/pulp/pulp_rpm/pull/1016
Added by fdobrovo almost 8 years ago
Revision e154cb6d | View on GitHub
Change publish layout to Packages/[a-z]/*
Updated by fdobrovo almost 8 years ago
- Status changed from POST to MODIFIED
Applied in changeset e154cb6d63d5b518a8d82e969a2e231f4070aeb9.
Added by dkliban@redhat.com almost 8 years ago
Revision 97259ea1 | View on GitHub
Problem: RSyncFastForwardUnitPublishStep publishes incorrectly
Solution: Enable RSyncFastForwardUnitPublishStep to publish the same type of units into different directories by extracting the publish path of a unit from it's symlink location.
This problem was introduced when the yum distributor was modified to publish repositories with a new layout. This patch makes the RPM rsync distributor consistent with the yum distributor.
Added by dkliban@redhat.com almost 8 years ago
Revision 97259ea1 | View on GitHub
Problem: RSyncFastForwardUnitPublishStep publishes incorrectly
Solution: Enable RSyncFastForwardUnitPublishStep to publish the same type of units into different directories by extracting the publish path of a unit from it's symlink location.
This problem was introduced when the yum distributor was modified to publish repositories with a new layout. This patch makes the RPM rsync distributor consistent with the yum distributor.
Updated by semyers almost 8 years ago
- Status changed from 5 to CLOSED - CURRENTRELEASE
added packages_directory option for yum_distributor
fixes #1976