Project

Profile

Help

Refactor #2950

Updated by dkliban@redhat.com over 6 years ago

The Django FileField[0] can be set using either use a file path as a string or an instance variety of File[1]. 

 When storage backends. By default it comes with a string is passed in, the file is considered committed filesystem storage backend. Pulp 3.0 should switch to disk. When saving an instance of a model that is using that FileField, the string that was used to set the file field is stored in the database with the rest of the model fields. 

 When an instance of File is used to set the field, the file is considered not committed. When saving a model instance with a field set that way, a default storage backend is used to save backend. This ticket would remove the file to it's destination defined by 'upload_to' parameter of the FileField. After the file is put in place, it's final path is stored in the database along with other model field values. "custom backend":https://github.com/pulp/pulp/blob/884ae9d7cd20ab612e3a635302c9bd963e07fb47/platform/pulpcore/app/models/storage.py#L103-L365 we currently have. 

 The Artifact model uses a FileField to store only difference I noticed between the path to an Artifact stored in /var/lib/pulp/artifact. The 'upload_to' is a callable that returns a path in /var/lib/pulp/artifact based on the Artifact's sha256 field value. All this leaves the plugin writer with two options when creating an Artifact: move the file into /var/lib/pulp on her own or she can rely on the default storage backend to move the file into the correct location. The latter provides a much simpler experience for plugin writers and a consistent behavior across plugins. 

 The FileSystemStorage[2] backend provided by Django is designed to work with File, TemporaryUploadedFile[3], and InMemoryUploadedFile[4] objects. The latter two inherit from the first. The type of File determines how the storage backend places it in it's final destination. The Artifact upload API always uses a TemporaryUploadedFile, which writes the file to a temporary location on disk during the upload. Below is an outline of how FileSystemStorage saves a file to its destination for the two File types custom one we are interested in: 

 TemporaryUploadedFile 
 ------------------------------ 
     1) suing currently is name available? 
          2a) yes, os.rename() 
                  3a) no exception, you are done 
                  3b) exception, copy from source to destination using python and delete that the original 
          2b) no, append random characters to to the name and go back to 1 

 File 
 ----- 
     1)    serialized Artifact's file field is name available? 
          2a) yes, copy from source to destination using python 
          2b) no, append random characters to to the name and go back to 1 

 The save behavior for TemporaryUploadedFile is the most efficient for Pulp's use case. However, the behavior from 2b can result in duplicate files being created, but not associated with any Artifact. serialized as <hostname>/var/lib/pulp/artifact/ba/asldkfkl instead of file:///var/lib/pulp/ba/askldsj. There is a race condition if the same file is being uploaded in are two parallel requests. One of options for resolving this problem:  

 1) don't return the requests will create the Artifact in /var/lib/pulp/artifact and then save it to the database. The second request will also create a file in /var/lib/pulp/artifacts, but it will have field at all since a random name. The database insert will fail due to uniqueness constraint and the user will receive a 400 error. I don't think any cleanup happens when likely not need this occurs. 

 Due to how os.rename() works, there is also a race condition that would allow an existing information.  
 2) override the file field serializer to be overwritten by a newer one. Since Pulp uses content addressable storage, this does not pose a problem. 

 The FileSystemStorage backend has been in Django's codebase for 9 years. Extending it to stop fix the file renaming behavior would allow us to take advantage of all the testing that has been performed on it's file moving and copying code. The Pulp plugin API should provide a TemporaryDownloadedFile that plugin writers can use to create Artifacts. The TemporaryDownloadedFile needs to have a temporary_file_path() method that tells FileSystemStorage to use the code path for TemporaryUploadedFile. 


 [0] https://github.com/django/django/blob/1.11/django/db/models/fields/files.py#L220 
 [1] https://github.com/django/django/blob/1.11/django/core/files/base.py#L14 
 [2] https://github.com/django/django/blob/1.11/django/core/files/storage.py#L249 
 [3] https://github.com/django/django/blob/1.11/django/core/files/uploadedfile.py#L59 
 [4] https://github.com/django/django/blob/1.11/django/core/files/uploadedfile.py#L84

Back