Project

Profile

Help

Story #136

closed

As a user, I can upload wheel packages to Pulp

Added by rbarlow almost 10 years ago. Updated over 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Sprint/Milestone:
-
Start date:
Due date:
% Done:

100%

Estimated time:
Platform Release:
2.13.0
Target Release - Python:
2.0.0
Groomed:
Yes
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Quarter:

Description

Pulp only allows sdist packages to be uploaded currently. We should add support for wheel (bdist) packages as well. Deliverables:

  • CLI Support for wheel packages
  • Importer support for uploaded wheel packages
  • Add a wheel example to the getting started guide in the documentation
  • Write release notes
  • Write tests

Note that #135 has the task of creating the wheel type, so it must be done first.


Related issues

Blocked by Python Support - Story #135: As a user, I can synchronize wheels from PyPICLOSED - CURRENTRELEASEamacdona@redhat.com

Actions
Blocked by Python Support - Issue #2561: pulp_python 2.0 new features are not documentedCLOSED - CURRENTRELEASEamacdona@redhat.comActions
Actions #1

Updated by rbarlow over 9 years ago

  • Subject changed from As a user, I can upload wheel packages to Pulp to As a user, I can use wheel packages with Pulp
  • Description updated (diff)
Actions #2

Updated by rbarlow over 9 years ago

  • Subject changed from As a user, I can use wheel packages with Pulp to As a user, I can upload wheel packages to Pulp
  • Description updated (diff)
Actions #3

Updated by rbarlow over 9 years ago

  • Sprint Candidate set to Yes
Actions #4

Updated by mhrivnak over 9 years ago

  • Groomed set to Yes
Actions #5

Updated by mhrivnak over 8 years ago

  • Sprint Candidate changed from Yes to No
Actions #6

Updated by amacdona@redhat.com over 8 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to amacdona@redhat.com
Actions #7

Updated by amacdona@redhat.com over 8 years ago

  • Description updated (diff)

Description of current upload process

The root of the problem is that our upload API functions this way:

1. We make an API call to initialize an upload which returns the `upload_id`. This call 
   does not require or accept any additional information. Internally Pulp creates a file
   in the working directory, filename = upload_id. `upload_id` is a randomly generated uuid
   and it contains no information about itself, specifically, there is no file extension.
2. The `upload_id` is used by additional API calls that contain the actual bits. The upload
   id is used to tell core Pulp where to put the bits. Note: Because the `upload_id` 
   parameter is included in the url, it cannot have '/' in it.

Problem 1:

I am using twine to pull the metadata out of the a Python package, however twine relies on
the filename in order to know how to process that metadata. Since the filename is a uuid without
an extension, twine cannot do this.

Problem 2:

Python packages of type wheel can be created specifically for certain architectures, so name and
version are no longer enough to be the unit key. Instead, I have chosen filename because the spec
demonstrates that filenames must be unique. Current Python upload code rebuilds the filename from
metadata. Though this continues to be possible, adding the architectures significantly increases
the complexity of building the filename and increases the risk that we will generate these
filenames differently.

Possible solutions:

We could add an optional parameter of the body of the API call to initialize an upload that
contains the filename. Because the first call does nothing but create a file and return the
upload id (which is just the file name), we can either:

  • Keep the filename of the file we are uploading by making the upload_id/filename of the temporary file the same as the original filename.
  • We could write the filename into the file so it can be read later. Once the upload is complete, we could mv or copy the file to its original filename (or just add the appropriate extension).

Unfortunately, the devil will be in the details so I need to explain implementation for each of
these possibilities.

Solution 1

Python filenames are unique, but I am not sure that we can say the same thing about every
plugin. If filenames are not unique, it is possible that we could simultaneously upload
bits into the same file. If it is a requirement to avoid collisions between filenames at
upload time, we could work around the problem by creating a directory (perhaps the uuid) with the
filename inside of it. If we take this approach, the upload_id will need to be the relative path
inside of the working directory, but it must be urlencoded to prevent '/' from breaking the
upload_id parameter in the url. When the upload is finished, we know can decode the path, split at
'/' and take the last part to find the file name.

There are a lot of places that upload_id is used, and we would need to really explore that to make
sure that this wouldn't break something.

Solution 2

If we read the filename, when it comes time to initialize a unit for the file, we can change the
filename just before it is passedd to twine. This would probably have fewer places it could break,
but seems even more hacky and this behavior would be very surprising to any developers who were
trying to fix bugs.

Conclusions

This is messy, and all solutions that I have explored so far are fragile. Additionally, I think
that they will be unusual solutions to common problems that rely on surprising behaviors. In my
opinion, a better path forward would be to create a new upload api as a part of the next release,
depricating our old upload API. This needs to be done anyway, but if we created new endpoints, we
could get this done before we go to Pulp 3.0. Honestly, this is ideal anyway because this allows
us to depricate before we drop. This approach would continue to conform to semantic versioning
because the only plugin that would require the new API and would no longer work with the old API
would be the Python plugin, which we are already planning on bumping to 2.0 for the next release.

Actions #8

Updated by amacdona@redhat.com over 8 years ago

  • Description updated (diff)
Actions #9

Updated by bmbouter over 8 years ago

I've written up some stories on redesigning the upload API, but for now we should make it work on the 2.y line.

Keeping the filename out of the data file would be good. Maybe we could allow the filename to be added to the import call? After the data is delivered the file is imported via POST /pulp/api/v2/repositories/<repoid>/actions/import_upload/

Perhaps that could pass in the argument as a parameter to the python upload code, or maybe it could rename the file in the directory before calling into the import code.

Actions #10

Updated by mhrivnak over 8 years ago

I suggested the same thing in an in-person discussion. Especially since the filename is in the unit key, and the unit key is a required parameter for the import_upload call, I think the problem is solved.

Added by Austin Macdonald over 8 years ago

Revision 5afab5b6 | View on GitHub

Refactor to process uploaded files with twine

By generating the unit key (which for Python is the filename), we are able to rename the tempororary file (the filename is a random uuid) to the actual filename. This allows twine to determine the file type so it can be processed accordingly.

closes #136

Actions #11

Updated by amacdona@redhat.com over 8 years ago

  • Status changed from ASSIGNED to POST
Actions #12

Updated by Anonymous over 8 years ago

  • Status changed from POST to MODIFIED
  • % Done changed from 0 to 100
Actions #13

Updated by semyers over 8 years ago

  • Status changed from MODIFIED to POST

Added by Austin Macdonald about 8 years ago

Revision f63de8d2 | View on GitHub

Refactor to process uploaded files with twine

By generating the unit key (which for Python is the filename), we are able to rename the tempororary file (the filename is a random uuid) to the actual filename. This allows twine to determine the file type so it can be processed accordingly.

closes #136

Actions #14

Updated by Anonymous about 8 years ago

  • Status changed from POST to MODIFIED
Actions #15

Updated by semyers about 8 years ago

  • Platform Release set to 2.12.0
Actions #16

Updated by semyers almost 8 years ago

  • Status changed from MODIFIED to 5
Actions #18

Updated by semyers almost 8 years ago

  • Status changed from 5 to MODIFIED
  • Platform Release deleted (2.12.0)
  • Target Release - Python set to 2.0.0

This issue has been removed from the 2.12.0 release, and returned to its MODIFIED state for inclusion in a future release of Pulp.

Actions #19

Updated by semyers almost 8 years ago

  • Blocked by Issue #2561: pulp_python 2.0 new features are not documented added
Actions #20

Updated by semyers almost 8 years ago

  • Status changed from MODIFIED to POST

I'm moving this back to POST to flag this bug as untestable in pulp-smash. This status is also appropriate because while the changes are merged to the 2.0-dev branch of pulp_python, that entire branch was deemed unreleasable for pulp 2.12, and the situation has not improved since then.

Actions #21

Updated by semyers almost 8 years ago

  • Status changed from POST to MODIFIED

2.0-dev is now being included in Platform 2.13 builds again, returning this to a testable state. Note that any smash tests related to this issue should not be run prior to 2.13.0; this is tracked in pulp-smash issue https://github.com/PulpQE/pulp-smash/issues/588.

Actions #22

Updated by pcreech over 7 years ago

  • Platform Release set to 2.13.0
Actions #23

Updated by pcreech over 7 years ago

  • Status changed from MODIFIED to 5
Actions #24

Updated by pcreech over 7 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #25

Updated by bmbouter over 5 years ago

  • Tags Pulp 2 added

Also available in: Atom PDF