bmbouter wrote:
I agree it's possible for a plugin writer to do something unsafe if they are doing it wrong. That's a usability problem, but I'm not sure how we can improve it (see the last bit).
The main problem statement suggests that it's unexpected that the file will be moved in to place. I don't understand that problem statement. Django works that way, and Pulp documents it too here: https://github.com/pulp/pulp/blob/master/pulpcore/pulpcore/app/models/content.py#L15-L16
I don't agree that "Django works that way". The delete only happens because our ArtifactFileField is implemented using (indirectly) TemporaryUploadFile which deletes the source by default. The stock django FileField does not behave this way. Except for [1]. So, I think most django developers will find the delete surprising.
As for being documented. Agreed. I missed that. Although the term "move" literally means that the file will no longer be located at the source, It would be easy read that figuratively and the delete could be unexpected. That said, clearly saying that the source will be DELETED would be an improvement.
[1] The FileSystemStorage does move the file instead of copy when on the same filesystem which means that it isn't consistent behavior or consistent with the Storage API.
In practice how would this simple mistake happen? plugin writers don't type in these paths, they get them using Artifact.storage_path() and use the path unmodified.
Per this example in the description, the plugin writer could mean to be doing the right thing but accidentally do something else. Our artifact storage API should guard against foreseeable pitfalls much as possible/practical.
I could see adding some validation as one way to improve this. The problem is though, that validation would be only be called when calling save() itself, that validation would never be called when plugin writers are moving files into place themsleves as with Artifact.bulk_create().
So even if we add validation plugin writers can still mess things up. There isn't any way we can stop them because they have access to the filesystem directly.
True, but I think plugin code that touches the filesystem directly would be a clear indicator that writer did not have good intentions.
@jortel so with all ^ comments what do you think we should do?
Adding clearer, more explicit language in the Artifact docstrings that the source file will be deleted would help.
I'm mainly concerned about setting file= to something already in storage (within MEDIA_ROOT) when creating an Artifact given the (move) delete behavior. I don't think we need to permit file= being a path or anything in side the MEDIA_ROOT to support bulk create and we should prevent it.
This reproducer bulk creates an artifact using file=/tmp/jeff and the file is moved into storage.
def reproducer():
artifact = Artifact(
file='/tmp/jeff',
size=1,
md5='a',
sha1='a',
sha224='a',
sha256='apple',
sha384='a',
sha512='a')
Artifact.objects.bulk_create([artifact])
File /var/lib/pulp/artifact/ap/ple is created with jeff content.
[jortel@f27a artifact]$ tree .
.
├── 03
│ └── f03d345a6e562ad6618cfab11615dec474f1f593265fceff7694ce5f8024d8
├── a7
│ └── ad7899772cd1148c1b103cf661899890cd4e325582f947b05e1d36ecc94ae1
└── ap
└── ple <------- HERE
pulp=# select id,file from pulp_app_artifact;
id | file
----+------------------------------------------------------------------------------------------
19 | /var/lib/pulp/artifact/03/f03d345a6e562ad6618cfab11615dec474f1f593265fceff7694ce5f8024d8
22 | /var/lib/pulp/artifact/49/bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14
23 | /var/lib/pulp/artifact/ap/ple <----- HERE
(5 rows)
Problem: ArtifactFileField can delete an existing artifact's file
Solution: Add a check to see if the source file is already in artifact storage
Prior to this patch it was possible to break an existing artifact by passing a path to it's file as the file for a new Artifact.
This patch also removes extraneous code in the ArtifactSaver stage. The code was present because an assumption was made that bulk_save() on Artifacts would not call the presave handler on the ArtifactFileField. However, this is not the case and the pre_save handler is getting called.
closes: #3915 https://pulp.plan.io/issues/3915