Project

Profile

Help

Story #1976

As user, I can have packages sorted in "Packages/$x" directories when published

Added by jluza about 4 years ago. Updated over 1 year ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Sprint/Milestone:
-
Start date:
Due date:
% Done:

100%

Estimated time:
Platform Release:
2.12.0
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
Pulp 2
Sprint:
Sprint 12

Description

See comment 32 for the use case requirements. See comment 42 for a clarification on the requirements.


Checklist

Associated revisions

Revision ac76ec18 View on GitHub
Added by jluza about 4 years ago

added packages_directory option for yum_distributor

fixes #1976

Revision dc6ba71c View on GitHub
Added by bmbouter about 4 years ago

Removes packages_directory release note

This feature is being pulled from 2.9.0.

https://pulp.plan.io/issues/1976 re #1976

Revision 6145a71b View on GitHub
Added by bmbouter about 4 years ago

Revert "added packages_directory option for yum_distributor"

This reverts commit ac76ec18f3662dc6c95e34f70565240265b6fa2d.

This feature is being pulled so that it can be re-introduced later when it is working.

The docs portain is being negative commited separately so there are no 2.9.x.rst changes in this negative commit.

https://pulp.plan.io/issues/1976 re #1976

Revision e154cb6d View on GitHub
Added by fdobrovo over 3 years ago

Change publish layout to Packages/[a-z]/*

closes #1976 https://pulp.plan.io/issues/1976

Revision 97259ea1 View on GitHub
Added by dkliban@redhat.com over 3 years ago

Problem: RSyncFastForwardUnitPublishStep publishes incorrectly

Solution: Enable RSyncFastForwardUnitPublishStep to publish the same type of units into different directories by extracting the publish path of a unit from it's symlink location.

This problem was introduced when the yum distributor was modified to publish repositories with a new layout. This patch makes the RPM rsync distributor consistent with the yum distributor.

re #1976 https://pulp.plan.io/issues/1976

Revision 97259ea1 View on GitHub
Added by dkliban@redhat.com over 3 years ago

Problem: RSyncFastForwardUnitPublishStep publishes incorrectly

Solution: Enable RSyncFastForwardUnitPublishStep to publish the same type of units into different directories by extracting the publish path of a unit from it's symlink location.

This problem was introduced when the yum distributor was modified to publish repositories with a new layout. This patch makes the RPM rsync distributor consistent with the yum distributor.

re #1976 https://pulp.plan.io/issues/1976

History

#1 Updated by bmbouter about 4 years ago

Can more detail be put into this? There are some docs written on how to write good stories here[0]. To keep the discussion going, here are some questions I came up with:

This is for the yum_distributor, right?
What would the option be and can a more formal description be written for the behavior it provides?
What would the CLI options be called?

[0]: http://pulp.readthedocs.io/en/latest/dev-guide/contributing/features.html

#2 Updated by jluza about 4 years ago

Ok, let me answer/edit description here in comment, because I don't have permissions to do that in regular way.
User Identification: arbitrary user with push permissions. Or owner of the repository but I suppose anyone who is able to modify the repository, should be able to do this.

Context
There was decision made in the past which led to this split (repodata and packages in separate directory) and that's why we have to keep this structure compatible. Although we have some repositories where repodata are in the same directory as packages. I don't know historical details, but as I mentioned above, fedora repositories also use this.

Acceptance Criteria
- attribute has to be optional
- attribute has to be in format that is allowed by mkdir
- if attribute is not set, packages have to be sync to relative_path of repository

#4 Updated by bmbouter about 4 years ago

jluza you should have edit rights on issues. I believe you need to click 'Edit' and then next to the label 'Descrpition' is an icon of a pencil. Clicking the pencil icon will show you the editable description textbox.

#5 Updated by jluza about 4 years ago

  • Assignee set to jluza

Ah! You were right :-).

#6 Updated by jluza about 4 years ago

  • Status changed from NEW to POST

#7 Updated by bmbouter about 4 years ago

  • Subject changed from as user, I can specify directory where all rpms will be published to As user, I can specify directory where all rpms will be published

jluza would you be willing to write one or more pulp-smash issues[0] on how to test this?

[0]: https://github.com/PulpQE/pulp-smash/issues

#8 Updated by mhrivnak about 4 years ago

  • Project changed from Pulp to RPM Support

#10 Updated by jluza about 4 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100

#11 Updated by pthomas@redhat.com about 4 years ago

I have created a pulp-smash issue. But need to add specific test cases around this

https://github.com/PulpQE/pulp-smash/issues/300

#12 Updated by pthomas@redhat.com about 4 years ago

The CLI option doesn't seem to be available for package directory

[root@ibm-x3550m3-11 ~]# rpm -qa |grep pulp-server
pulp-server-2.9.0-0.3.beta.el7.noarch
[root@ibm-x3550m3-11 ~]#

[root@ibm-x3550m3-11 ~]# pulp-admin rpm repo update --repo-id rhel7 --packages_directory /tmp/
[root@ibm-x3550m3-11 ~]# pulp-admin rpm repo update --repo-id rhel7 --packages_directory /tmp/
Command: update
Description: changes metadata on an existing repository

Available Arguments:

  --bg           - if specified, the client process will end immediately (the
                   task will continue to run on the server)
  --repo-id      - (required) unique identifier; only alphanumeric, ., -, and _
                   allowed
  --display-name - user-readable display name (may contain i18n characters)
  --description  - user-readable description (may contain i18n characters)
  --note         - adds/updates/deletes notes to programmatically identify the
                   resource; key-value pairs must be separated by an equal sign
                   (e.g. key=value); multiple notes can be changed by specifying
                   this option multiple times; notes are deleted by specifying
                   "" as the value

Synchronization
  --feed     - URL of the external source repository to sync
  --validate - if "true", the size and checksum of each synchronized file will
               be verified against the repo metadata
  --skip     - comma-separated list of types to omit when synchronizing, if not
               specified all types will be synchronized; valid values are: rpm,
               drpm, distribution, erratum

Feed SSL
  --feed-ca-cert    - full path to the CA certificate that should be used to
                      verify the external repo server's SSL certificate
  --verify-feed-ssl - if "true", the feed's SSL certificate will be verified
                      against the feed_ca_cert
  --feed-cert       - full path to the certificate to use for authorization when
                      accessing the external feed
  --feed-key        - full path to the private key for feed_cert

Feed Proxy
  --proxy-host - proxy server url to use
  --proxy-port - port on the proxy server to make requests
  --proxy-user - username used to authenticate with the proxy server
  --proxy-pass - password used to authenticate with the proxy server

Basic Authentication
  --basicauth-user - username used to authenticate with sync location via HTTP
                     basic auth
  --basicauth-pass - password used to authenticate with sync location via HTTP
                     basic auth

Throttling
  --max-downloads - maximum number of downloads that will run concurrently
  --max-speed     - maximum bandwidth used per download thread, in bytes/sec,
                    when synchronizing the repo

Repository Content Behavior
  --remove-missing   - if "true", units that were previously in the external
                       feed but are no longer found will be removed from the
                       repository; defaults to false
  --retain-old-count - count indicating how many non-latest versions of a unit
                       to keep in a repository

Download Policy
  --download-policy - content downloading policy (immediate | background |
                      on_demand)

Publishing
  --relative-url    - relative path the repository will be served from. Only
                      alphanumeric characters, forward slashes, underscores and
                      dashes are allowed. It defaults to the relative path of
                      the feed URL
  --serve-http      - if "true", the repository will be served over HTTP;
                      defaults to false
  --serve-https     - if "true", the repository will be served over HTTPS;
                      defaults to true
  --checksum-type   - type of checksum to use during metadata generation
  --gpg-key         - GPG key used to sign and verify packages in the repository
  --generate-sqlite - if "true", sqlite files will be generated for the
                      repository metadata during publish
  --repoview        - if "true", static HTML files will be generated during
                      publish by the repoview tool for faster browsing of the
                      repository. Enables --generate-sqlite flag.

Consumer Authentication
  --host-ca   - full path to the CA certificate that signed the repository
                hosts's SSL certificate when serving over HTTPS
  --auth-ca   - full path to the CA certificate that should be used to verify
                client authentication certificates; setting this turns on client
                authentication for the repository
  --auth-cert - full path to the entitlement certificate that will be given to
                bound consumers to grant access to this repository

The following options were specified but do not exist on the command:
  --packages_directory

[root@ibm-x3550m3-11 ~]# pulp-admin rpm repo create
Command: create
Description: creates a new repository

Available Arguments:

  --repo-id      - (required) unique identifier; only alphanumeric, ., -, and _
                   allowed
  --display-name - user-readable display name (may contain i18n characters)
  --description  - user-readable description (may contain i18n characters)
  --note         - adds/updates/deletes notes to programmatically identify the
                   resource; key-value pairs must be separated by an equal sign
                   (e.g. key=value); multiple notes can be changed by specifying
                   this option multiple times; notes are deleted by specifying
                   "" as the value

Synchronization
  --feed     - URL of the external source repository to sync
  --validate - if "true", the size and checksum of each synchronized file will
               be verified against the repo metadata
  --skip     - comma-separated list of types to omit when synchronizing, if not
               specified all types will be synchronized; valid values are: rpm,
               drpm, distribution, erratum

Feed SSL
  --feed-ca-cert    - full path to the CA certificate that should be used to
                      verify the external repo server's SSL certificate
  --verify-feed-ssl - if "true", the feed's SSL certificate will be verified
                      against the feed_ca_cert
  --feed-cert       - full path to the certificate to use for authorization when
                      accessing the external feed
  --feed-key        - full path to the private key for feed_cert

Feed Proxy
  --proxy-host - proxy server url to use
  --proxy-port - port on the proxy server to make requests
  --proxy-user - username used to authenticate with the proxy server
  --proxy-pass - password used to authenticate with the proxy server

Basic Authentication
  --basicauth-user - username used to authenticate with sync location via HTTP
                     basic auth
  --basicauth-pass - password used to authenticate with sync location via HTTP
                     basic auth

Throttling
  --max-downloads - maximum number of downloads that will run concurrently
  --max-speed     - maximum bandwidth used per download thread, in bytes/sec,
                    when synchronizing the repo

Repository Content Behavior
  --remove-missing   - if "true", units that were previously in the external
                       feed but are no longer found will be removed from the
                       repository; defaults to false
  --retain-old-count - count indicating how many non-latest versions of a unit
                       to keep in a repository

Download Policy
  --download-policy - content downloading policy (immediate | background |
                      on_demand)

Publishing
  --relative-url    - relative path the repository will be served from. Only
                      alphanumeric characters, forward slashes, underscores and
                      dashes are allowed. It defaults to the relative path of
                      the feed URL
  --serve-http      - if "true", the repository will be served over HTTP;
                      defaults to false
  --serve-https     - if "true", the repository will be served over HTTPS;
                      defaults to true
  --checksum-type   - type of checksum to use during metadata generation
  --gpg-key         - GPG key used to sign and verify packages in the repository
  --generate-sqlite - if "true", sqlite files will be generated for the
                      repository metadata during publish
  --repoview        - if "true", static HTML files will be generated during
                      publish by the repoview tool for faster browsing of the
                      repository. Enables --generate-sqlite flag.

Consumer Authentication
  --host-ca   - full path to the CA certificate that signed the repository
                hosts's SSL certificate when serving over HTTPS
  --auth-ca   - full path to the CA certificate that should be used to verify
                client authentication certificates; setting this turns on client
                authentication for the repository
  --auth-cert - full path to the entitlement certificate that will be given to
                bound consumers to grant access to this repository

The following options are required but were not specified:
  --repo-id
[root@ibm-x3550m3-11 ~]# 

#13 Updated by pthomas@redhat.com about 4 years ago

  • Status changed from MODIFIED to ASSIGNED

As per my previous comment and further talking with bmbouter, moving it to assigned.

#14 Updated by mhrivnak about 4 years ago

Based on a quick glance at the pull request, I don't see anything that would add an option to pulp-admin.

Assuming that's true, I suggest we make a separate story to add the option to pulp-admin in 2.10, and change the scope of this story to only include the server-side parts.

#15 Updated by semyers about 4 years ago

mhrivnak wrote:

Based on a quick glance at the pull request, I don't see anything that would add an option to pulp-admin.

Assuming that's true, I suggest we make a separate story to add the option to pulp-admin in 2.10, and change the scope of this story to only include the server-side parts.

+1

The functionality is there in 2.9 as far as I can tell, minus pulp-admin support:

$ git tag --contains ac76ec18f3662dc6c95e34f70565240265b6fa2d pulp-rpm-2.9.0-0.1.beta pulp-rpm-2.9.0-0.2.beta pulp-rpm-2.9.0-0.3.beta pulp-rpm-2.9.0-0.4.rc

#16 Updated by bmbouter about 4 years ago

  • Status changed from ASSIGNED to MODIFIED

Per discussion on the bug, I'm moving back to MODIFIED for verification of just the API bits. I've created a new story [0] to track the addition of the pulp-admin options. The story never specified pulp-admin requirements in the description so it didn't require any modification.

[0]: https://pulp.plan.io/issues/2064

#17 Updated by Ichimonji10 about 4 years ago

  • Status changed from MODIFIED to ASSIGNED

It's still unclear to me what is being requested here.

  • May packages_directory be an absolute path, a relative path, or either?
    * If packages_directory is an absolute path, does that mean that one may publish RPMs to anywhere on the Pulp system's filesystem?
    * If packages_directory is a relative path, what is the path relative to? The distributor's relative_url?
  • Are trailing slashes required, not required, or optional?
  • Can this option be specified at publish-time via the override_config option?

I've taken jluza's pull reuqest (https://github.com/PulpQE/pulp-smash/pull/312) so that they're a bit more sane (https://github.com/PulpQE/pulp-smash/compare/master...Ichimonji10:pr/312). Unfortunately, the tests still fail. I have no idea if it's b/c the tests are wrong, or b/c the feature is wrong, or what. Some additional requirements need to be specified out before this feature can even be tested.

#18 Updated by mhrivnak about 4 years ago

It looks like the new config option did not get documented. I think adding documentation to the yum_distributor's config docs would provide what Ichimonji10 is looking for. That's a basic expectation anyway for a new config option.

#19 Updated by bmbouter about 4 years ago

Yes I agree documentation is a basic requirement [0]. I remember seeing them in the PR before, but I can't find the commit where they are present. I think it was lost in a rebase, along with another rebase error that was introduced [1]. Also there is a type on this line which was always there[2].

The best thing we can do now is add the documentation so that there are no code changes which would require us to go from RC back to beta.

[0]: https://github.com/pulp/pulp_rpm/pull/897#issuecomment-224296088
[1]: https://github.com/pulp/pulp_rpm/pull/897/files#diff-00f13f6321363e2b071e468d88945cf5R23
[2]: https://github.com/pulp/pulp_rpm/pull/897/files#diff-00f13f6321363e2b071e468d88945cf5R24

#20 Updated by bmbouter about 4 years ago

The docs were definitely here because I acknowledged that he had completed that: https://github.com/pulp/pulp_rpm/pull/897#issuecomment-224594888

Unfortunately those were lost in the rebase.

#21 Updated by jluza about 4 years ago

Ichimonji10 wrote:

It's still unclear to me what is being requested here.

  • May packages_directory be an absolute path, a relative path, or either?
  • If packages_directory is an absolute path, does that mean that one may publish RPMs to anywhere on the Pulp system's filesystem?
  • If packages_directory is a relative path, what is the path relative to? The distributor's relative_url?
  • Are trailing slashes required, not required, or optional?
  • Can this option be specified at publish-time via the override_config option?

I've taken jluza's pull reuqest (https://github.com/PulpQE/pulp-smash/pull/312) so that they're a bit more sane (https://github.com/PulpQE/pulp-smash/compare/master...Ichimonji10:pr/312). Unfortunately, the tests still fail. I have no idea if it's b/c the tests are wrong, or b/c the feature is wrong, or what. Some additional requirements need to be specified out before this feature can even be tested.

The features is called packages "directory", because it should be really only directory, not path. I hope it will be clear from the documentation. However you have point there, because in theory I think it's probably possible to have absolute/relative path as part of destination for packages. I would have to look at repo metadata specifications. I think location for packages can be only relative to the "root" directory of repository.

Is there a way how to determine if attribute can be specified in override_config or not?

packages directory should be os.path.joinable, so no trailing slashes doesn't matter, however string can't start with slash.

#22 Updated by Ichimonji10 about 4 years ago

The features is called packages "directory", because it should be really only directory, not path. [...] in theory I think it's probably possible to have absolute/relative path as part of destination for packages.

How does one identify a directory? With a path:

mkdir foo
mkdir ./bar
mkdir ../biz
mkdir /tmp/baz
mkdir -p one/two
mkdir -p ./three/four/

Are the paths above valid? According to note 2, yes: "attribute has to be in format that is allowed by mkdir."

I would have to look at repo metadata specifications. I think location for packages can be only relative to the "root" directory of repository.

Is there a way how to determine if attribute can be specified in override_config or not?

packages directory should be os.path.joinable, so no trailing slashes doesn't matter, however string can't start with slash.

So, it sounds like:

  • packages_directory must be a relative path. It must not be an absolute path.
  • packages_directory specifies where a distributor should place packages, relative to its relative_url option. (By default, packages are placed in the current directory, relative to the relative_url option.)
  • packages_directory may end with a trailing slash. If present, it is stripped. Directory names with literal trailing slashes are not created.

Outstanding questions include:

  • May packages_directory may be passed at publish-time via the override_config option?
  • May packages_directory specify a parent directory? That is, may one pass a value such as ../foo?
  • May packages_directory specify multiple path components? That is, may one pass a value such as foo/bar or ./foo/bar/?
  • What is the rationale for introducing this option? Is this being introduced so that RCM's workflow doesn't break? Because we're trying to reach parity with an existing RPM-management tool? Because it's neat? Knowing why we're introducing this option will help out when bugs are inevitably filed against the feature: we'll be able to consult with the original stakeholder/document/specification/tool/etc when sorting out the bug. It'll also help out if this feature is ever brought up as a candidate for deprecation.

#23 Updated by jluza about 4 years ago

Ichimonji10 wrote:

Outstanding questions include:

  • May packages_directory may be passed at publish-time via the override_config option?

I always assumed, any distributor's option can be passed to override_config. So answer is yes.

  • May packages_directory specify a parent directory? That is, may one pass a value such as ../foo?

That's probably something that depends on implementation in yum/dnf. We don't need this to be possible.

  • May packages_directory specify multiple path components? That is, may one pass a value such as foo/bar or ./foo/bar/?

Same as above. I will try to get statement from packaging team today.

  • What is the rationale for introducing this option? Is this being introduced so that RCM's workflow doesn't break? Because we're trying to reach parity with an existing RPM-management tool? Because it's neat? Knowing why we're introducing this option will help out when bugs are inevitably filed against the feature: we'll be able to consult with the original stakeholder/document/specification/tool/etc when sorting out the bug. It'll also help out if this feature is ever brought up as a candidate for deprecation.

Yes, it's due compatibility. I think even fedora has packages in separate directory. You can easily create repodata like that with following steps:
- cd repodir
- mkdir packages
- download some rpms to "packages"
- run createrepo_c .

and in output primary.xml, packages locations will be "packages/*"

#24 Updated by Ichimonji10 about 4 years ago

  • May packages_directory may be passed at publish-time via the override_config option?

I always assumed, any distributor's option can be passed to override_config. So answer is yes.

OK.

  • May packages_directory specify a parent directory? That is, may one pass a value such as ../foo?

That's probably something that depends on implementation in yum/dnf. We don't need this to be possible.

I'd advocate for this not being possible. If a user passes a path of ../2000000000.00, then they've got an interesting situation on their hands. You can imagine what else a user might do with relative paths.

  • May packages_directory specify multiple path components? That is, may one pass a value such as foo/bar or ./foo/bar/?

Same as above. I will try to get statement from packaging team today.

Thank you.

  • What is the rationale for introducing this option? Is this being introduced so that RCM's workflow doesn't break? Because we're trying to reach parity with an existing RPM-management tool? Because it's neat? Knowing why we're introducing this option will help out when bugs are inevitably filed against the feature: we'll be able to consult with the original stakeholder/document/specification/tool/etc when sorting out the bug. It'll also help out if this feature is ever brought up as a candidate for deprecation.

Yes, it's due compatibility. I think even fedora has packages in separate directory. You can easily create repodata like that with following steps:
- cd repodir
- mkdir packages
- download some rpms to "packages"
- run createrepo_c .

and in output primary.xml, packages locations will be "packages/*"

What are the exact commands that Pulp should be compatible with? Here's what I tried:

mkdir -p myrepo/packages
cd myrepo/packages
wget \
  https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm/bear-4.1-1.noarch.rpm \
  https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm/camel-0.1-1.noarch.rpm \
  https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm/cat-1.0-1.noarch.rpm
cd ..  # go to myrepo
createrepo_c packages

This was the final result:

# tree myrepo
myrepo
└── packages
    ├── bear-4.1-1.noarch.rpm
    ├── camel-0.1-1.noarch.rpm
    ├── cat-1.0-1.noarch.rpm
    └── repodata
        ├── 2549caec4b86ab2b41d949d7f2fcf0c8a76227ec75abe52b8b8f8cda0404e47e-other.xml.gz
        ├── 2785256942bd4c8272c270dc24cf87705b2a51b33e0cd48821ceb3e8f4115c40-primary.xml.gz
        ├── 2ae9d62070e80329023358fa1f42a2cf0efdf54093c1d280c37b8d7e82362f3d-filelists.xml.gz
        ├── 6579534a6d9950cf921b315b27823c4ce7019b5ece186c4f1505d6727f48aab0-filelists.sqlite.bz2
        ├── e9b13712c3b81d38bdc6c8d3e5211c81cff402fcf89e665e05c0baa6ee8618d3-other.sqlite.bz2
        ├── ff252618bc3be91c8ebd5cfd460b0e972ab6cf79a89762128dee56882561793d-primary.sqlite.bz2
        └── repomd.xml

2 directories, 10 files

#25 Updated by jluza about 4 years ago

Ichimonji10 wrote:

  • What is the rationale for introducing this option? Is this being introduced so that RCM's workflow doesn't break? Because we're trying to reach parity with an existing RPM-management tool? Because it's neat? Knowing why we're introducing this option will help out when bugs are inevitably filed against the feature: we'll be able to consult with the original stakeholder/document/specification/tool/etc when sorting out the bug. It'll also help out if this feature is ever brought up as a candidate for deprecation.

Yes, it's due compatibility. I think even fedora has packages in separate directory. You can easily create repodata like that with following steps:
- cd repodir
- mkdir packages
- download some rpms to "packages"
- run createrepo_c .

and in output primary.xml, packages locations will be "packages/*"

What are the exact commands that Pulp should be compatible with? Here's what I tried:

[...]

This was the final result:

[...]

I forgot to mention you need to run createrepo_c ., not createrepo_c packages

#26 Updated by Ichimonji10 about 4 years ago

What are the exact commands that Pulp should be compatible with? Here's what I tried: [...]

I forgot to mention you need to run createrepo_c ., not createrepo_c packages

Aha. It looks like you wrote that above, and I just misread things. With that in mind, this script:

mkdir -p myrepo/packages
cd myrepo/packages
wget \
  https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm/bear-4.1-1.noarch.rpm \
  https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm/camel-0.1-1.noarch.rpm \
  https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm/cat-1.0-1.noarch.rpm
cd ..  # go to myrepo
createrepo_c .

...produces this result:

$ tree myrepo/
myrepo/
├── packages
│   ├── bear-4.1-1.noarch.rpm
│   ├── camel-0.1-1.noarch.rpm
│   └── cat-1.0-1.noarch.rpm
└── repodata
    ├── 2549caec4b86ab2b41d949d7f2fcf0c8a76227ec75abe52b8b8f8cda0404e47e-other.xml.gz
    ├── 2ae9d62070e80329023358fa1f42a2cf0efdf54093c1d280c37b8d7e82362f3d-filelists.xml.gz
    ├── 6579534a6d9950cf921b315b27823c4ce7019b5ece186c4f1505d6727f48aab0-filelists.sqlite.bz2
    ├── 89c3b359eee148e584e63bbffa1d388c0361d45d94cb0921c7ee03d246276542-primary.sqlite.bz2
    ├── 8e7ba82c2f7e1eee8251f1a93c15959b083e032912d0549364709453584a9512-primary.xml.gz
    ├── e9b13712c3b81d38bdc6c8d3e5211c81cff402fcf89e665e05c0baa6ee8618d3-other.sqlite.bz2
    └── repomd.xml

For the record, the …primary.xml.gz file has lines like this:

<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://linux.duke.edu/metadata/common" xmlns:rpm="http://linux.duke.edu/metadata/rpm" packages="3">
<package type="rpm">
  <location href="packages/bear-4.1-1.noarch.rpm"/>

I think that's enough information to confidently make an initial Pulp Smash test.

#27 Updated by jluza about 4 years ago

btw, I talked to dnf guys and it's possible to have ../packages/ ,all/packages/ in location and maybe it's even possible to have absolute path in location. But I need to confirm that last part with tmlcoch.

#28 Updated by bmbouter about 4 years ago

  • Platform Release changed from 2.9.0 to 2.10.0

The code merged to 2.9-dev only uses the specified directory for Distribution unit types (not all RPMs). As such it not match the expectations for the feature. This is being removed from the docs and moved to 2.10 so we can try to get it right before releasing it.

#29 Updated by bmbouter about 4 years ago

  • Platform Release deleted (2.10.0)

Unsetting Target Release so that it can be set once code is merged.

#31 Updated by jcline@redhat.com about 4 years ago

I did a little bit of investigation into this issue. There are two sub-cases for this particular feature which are complicated by the model Pulp uses for distribution trees. In both these cases I will assume the href for the packages in the repository are relative. This is not, as far as I know, required[0], but we will assume it is the case for now.

Pure Yum repository (case 1)

This is the straight-forward case. In this situation, the following needs to happen:

1. The directory structure needs to be built in the root of the repository. The configured path should probably be checked to make sure it's within the repository root.

2. The symlinks need to be placed in that directory structure.

3. The repodata generated must have the ``location`` tag set to properly reference the symlinks. This is, unfortunately, not terribly simple. Pulp currently stores the primary.xml, filelist.xml, and other.xml snippets as strings in the database. If I recall correctly, the only one with a ``location`` tag is the primary.xml. For issue 1618, I believe these were turned into templates that contained placeholders for the checksum fields. We'll have to do something similar here for the location tag (assuming we want to continue down this template path). During the publish the template will have to be rendered before being written to the XML file.

A Distribution Tree with Yum Repository

In the case of a distribution tree, in addition to all the things we need to do for a pure yum repository, we have to account for both the repository setting and the distribution tree ``packagedir``/``packages``[1] setting. To do this, I think that we might have to symlink one to the other if they differ.

[0] That is, you could have metadata living at https://pulp.example.com/somerepo/repodata and RPMs at https://somehost.example.com/PackageDir/ and that's perfectly acceptable and workable. Yum/DNF clients should handle that just fine, I think.

[1] http://release-engineering.github.io/productmd/treeinfo-1.0.html

#32 Updated by jgreguske almost 4 years ago

I was asked to provide what RCM requires with respect to this feature. There are 2 use cases.

Number 1

In our CDN we include big sha checksums in the file names of repodata so that proxy caches and the caches of our content provider never get in the way of critical updates from the client's perspective. Thus every time the repodata is regenerated, an additional set of repodata files is written to disk beside the originals previously written in the repodata directory. For long standing repositories (RHEL gets updates for 13 years) this file list can become quite large. We found it prudent to exclude it from the RPMs which correspondingly get quite large too, and we have hit filesytem limitations where we exceeded the number of files allowed in a directory. That is how we ended up with this simple structure:

10:37 AM zyzyx@thrull:/tmp $ tree -d os
os
├── Packages
└── repodata

I'm not sure it is technically needed at this point, but many parts of our infrastructure assume this layout, so as usual legacy is dictating a feature to us.

Number 2

We need separate subdirectories to store RPMs because of filesystem limitations in the Fedora space too. Specifically, the mirrors we sync Fedora content to have a variety of different filesystems in place including silly ones like AFS in some universities. The number of files that can be present in a directory is limited, thus we often separate RPMs into different subdirectories by the first letter of the RPM. It looks like this:

10:39 AM zyzyx@thrull:/tmp $ tree -d os
os
├── Packages
│   ├── 3
│   ├── a
│   ├── b

  [...]

│   ├── x
│   ├── y
│   └── z
└── repodata

The RPM subdirectories should always start as peers of the repodata directory. They're always relative and the href links in the repodata should never reach outside of the root of the repository. yum/dnf support that, but RCM does not need it. I recommend against supporting it for security reasons.

Hope this clarifies things.

#33 Updated by bmbouter almost 4 years ago

  • Tags 2.11 added

#34 Updated by bmbouter almost 4 years ago

  • Status changed from ASSIGNED to NEW
  • Assignee deleted (jluza)

Moving back to NEW after IRC convo w/ jluza

#35 Updated by bmbouter almost 4 years ago

@jgreguske Thank you for the description

#36 Updated by bmbouter almost 4 years ago

Would making a repo publish as described in Comment 32 in 2.11 by default be a backwards incompatible change? I can see it both ways. One thing is that the repodata specifies where content lives so I think the real question is: is fetching published content without using repodata a valid use case?

If we did make it publish with this new layout for 2.11 by default each repo that existed before and was published again would be a non-incremental publish. We could handle this without much fuss using some internal bookeeping on if repos had been published with the new layout or not.

#37 Updated by semyers almost 4 years ago

bmbouter wrote:

Would making a repo publish as described in Comment 32 in 2.11 by default be a backwards incompatible change? I can see it both ways. One thing is that the repodata specifies where content lives so I think the real question is: is fetching published content without using repodata a valid use case?

Comment #32 describes two methods of publishing content, and both include repodata. Can you elaborate on what you mean here?

#38 Updated by jsherril@redhat.com almost 4 years ago

i don't think its considered a backwards incompatible change. Pulp is publishing a yum repo, and a yum repo doesn't dictate any sort of structure (as long as the metadata is correct). I don't think fetching rpms directly via a given url should be 'supported' in a backwards compatible way

I also would love having the a,b,c,d directory structure. +1 to that.

#39 Updated by bmbouter almost 4 years ago

semyers wrote:

bmbouter wrote:

Would making a repo publish as described in Comment 32 in 2.11 by default be a backwards incompatible change? I can see it both ways. One thing is that the repodata specifies where content lives so I think the real question is: is fetching published content without using repodata a valid use case?

Comment #32 describes two methods of publishing content, and both include repodata. Can you elaborate on what you mean here?

I had imagined that use case 1 and 2 in Comment 32 would be used together at all times. Use case 2 looks very clear, but use case 1 I may not fully understand.

To restate my understanding of the use cases

  • use case 1 amounts to "don't put rpms in the root of the repo. Instead have a directory called Packages".
  • use case 2 would be "don't put rpms directly into the Packages directory, organize them into folders by the first name of the rpm"

#40 Updated by semyers almost 4 years ago

That makes more sense, thanks. I think if we stick to the current behavior as a default, but support other layouts as a config option, we could potentially have our cake and eat it too, so to speak. That said, I agree with jsherrill that the layout of the repositories we create is probably safe to change at any time, provided yum/dnf/zypper/pulp/whatever can pull the content from it.

#41 Updated by bmbouter almost 4 years ago

@jgreguske can you confirm that my understanding in Comment 39 is correct?

Also the original title suggests that the name 'Package' can be provided as an option. Is that a requirement?

Lastly, would you want this behavior in all publishes or in some kind of optional way? If optional, would you want use case 1 and use case 2 to be enabled separately or always together?

#42 Updated by jgreguske almost 4 years ago

Sorry for the late response. I circulated this with the RCM team leads and have some answers:

  • We will only ever use the subdirectory name "Packages", and that is a requirement. RCM does not need that to be configurable to have other names like "SRPMS" or "RPMS" or whatever.
  • We do require support of "Packages/x" where x is a bunch of lower case characters that match the first character in an rpm file name. It is also a requirement that they be lower case (or numbers in rare cases like 389).

#43 Updated by mhrivnak over 3 years ago

  • Checklist item Document the new setting in distributor technical reference added
  • Checklist item Release notes added
  • Subject changed from As user, I can specify directory where all rpms will be published to As user, I can have packages sorted in "Packages/$x" directories when published
  • Description updated (diff)
  • Sprint Candidate changed from No to Yes

#44 Updated by mhrivnak over 3 years ago

  • Blocks Story #2064: Add pulp-admin support for the 'packages_directory' option added

#45 Updated by pcreech over 3 years ago

  • Groomed changed from No to Yes

#46 Updated by bmbouter over 3 years ago

  • Checklist item deleted (Document the new setting in distributor technical reference)
  • Checklist item changed from Release notes to Add release note
  • Checklist item update the publish layout machinery to create the new directory structure added
  • Checklist item write a migration to update existing metadata stored in the db to point their rpms to the new layout added
  • Checklist item update the metadata processing steps to create new metadata stored in the db correctly added
  • Description updated (diff)
  • Sprint/Milestone set to 29
  • Tags deleted (2.11)

Updating the bug after some discussion. Adding to Sprint 11 based on this update.

#47 Updated by bmbouter over 3 years ago

  • Blocks deleted (Story #2064: Add pulp-admin support for the 'packages_directory' option)

#48 Updated by fdobrovo over 3 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to fdobrovo

#49 Updated by bmbouter over 3 years ago

  • Checklist item changed from Add release note to Add release note about feature. Be sure to include the part about the first publish post-upgrade being a full publish.
  • Checklist item ensure that if a repo was published in 2.11 or earlier, the first publish sets the force_full flag added

We need to ensure that the first publish of each repo after upgrade is full publish and not an incremental one. Otherwise the new metadata provided by the migration will not point to the original package layout which would be preserved by the incremental publish.

I'm adding a checklist item for us to require that the first publish of a repo that was published in 2.11 or earlier set the force_full flag upon the first publish of that repo.

[update: let's not do ^]

#50 Updated by mhrivnak over 3 years ago

I was thinking that an incremental publish might work out ok with the new logic, but you'd get a mix of layouts. All rpms published previous to the upgrade would have their symlinks in the root of the repo, and the location in primary.xml would reference them correctly.

On an incremental publish post-upgrade, newly-added rpms would have symlinks created in the new layout, and new entries added to primary.xml would reference the new and correct locations for those rpms.

It might look confusing to users, which itself might be a good reason to do full publishes, but I think the output would likely be correct and usable either way.

#51 Updated by bmbouter over 3 years ago

  • Checklist item deleted (ensure that if a repo was published in 2.11 or earlier, the first publish sets the force_full flag)
  • Checklist item changed from Add release note about feature. Be sure to include the part about the first publish post-upgrade being a full publish. to Add release note about feature.

mhrivnak: oh yes I can see what you are saying. I reverted my changes on the checklist.

#52 Updated by bmbouter over 3 years ago

I posted a blog post on this change here: http://pulpproject.org/2016/11/14/yum-repo-layout-changes/

#53 Updated by mhrivnak over 3 years ago

  • Sprint/Milestone changed from 29 to 30

#54 Updated by ipanova@redhat.com over 3 years ago

  • Status changed from ASSIGNED to POST

#57 Updated by dkliban@redhat.com over 3 years ago

Here are the changes needed in platform to make the RPM rsync distributor consistent with the new repository layout generated by yum distributor.

https://github.com/pulp/pulp/pull/2913

#58 Updated by fdobrovo over 3 years ago

  • Checklist item Add release note about feature. set to Done
  • Checklist item update the publish layout machinery to create the new directory structure set to Done
  • Checklist item write a migration to update existing metadata stored in the db to point their rpms to the new layout set to Done
  • Checklist item update the metadata processing steps to create new metadata stored in the db correctly set to Done

#59 Updated by fdobrovo over 3 years ago

New PR against master branch https://github.com/pulp/pulp_rpm/pull/1016

#61 Updated by fdobrovo over 3 years ago

  • Status changed from POST to MODIFIED

#62 Updated by semyers over 3 years ago

  • Platform Release set to 2.12.0

#63 Updated by semyers over 3 years ago

  • Status changed from MODIFIED to 5

#65 Updated by semyers over 3 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE

#66 Updated by bmbouter over 2 years ago

  • Sprint set to Sprint 12

#67 Updated by bmbouter over 2 years ago

  • Sprint/Milestone deleted (30)

#68 Updated by bmbouter over 1 year ago

  • Tags Pulp 2 added

Please register to edit this issue

Also available in: Atom PDF