Story #8088: As a user, I can configure Pulp to virus scan files prior to saving or serving them

## Motivation 

 I've set up an on_demand PyPI remote. 
 Business policy here is that all downloaded files should be scanned for viruses. The on_demand policy is being used with a PyPI remote because syncing all of PyPI 
 to just use a few files is not practical. 

 ## User Experience 

 1. The user sets up a virus scanner like `clamav` in daemon mode. 
 2. The administrator configures the setting `SECURITY_SCAN_SHELL`. 
 3. The user restarts Pulp. 
 4. All content already saved in Pulp is *not* scanned. Any new content brought into Pulp through any policy type, e.g. `immediate|on_demand|streamed`, is first scanned before being saved or handed to a client. 
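
 As an illustration of step 2, the setting might look like the line below in the Pulp settings file. This is only a sketch: the story does not define the value format, and the `{path}` placeholder is an assumption made here to mark where the file to scan would go.

```python
# Hypothetical settings entry; the value format is not decided in this story.
# Assumption: "{path}" marks where Pulp would put the path of the file to scan.
# clamdscan talks to the running clamd daemon and exits 0 when no virus is found.
SECURITY_SCAN_SHELL = "clamdscan --no-summary {path}"
```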

 ## Implementation 

 ### on_demand and streamed policies 

 The [`pulpcore_content app`](https://github.com/pulp/pulpcore/blob/master/pulpcore/content/handler.py) handles the `policy=on_demand` and `policy=streamed` modes. Currently those modes "stream" bits to clients, meaning they begin serving data prior to receiving all of it. When scanning is enabled those modes can no longer do this; they have to "store, scan, and forward" the data to clients because security scanners can only scan entire files, not a stream of data. So the first change to prototype is a store-and-forward mode that is used when this setting is in place. 
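
 A minimal sketch of "store, scan, and forward" with plain `asyncio`, independent of the real content handler. The names `store_scan_forward` and `ScanRejected` are hypothetical, and `scan` stands in for whatever ends up calling `SECURITY_SCAN_SHELL`.

```python
import asyncio
import os
import tempfile
from typing import AsyncIterator, Callable


class ScanRejected(Exception):
    """Raised when the scanner does not accept the file (hypothetical name)."""


async def store_scan_forward(
    upstream: AsyncIterator[bytes],
    scan: Callable[[str], bool],
    chunk_size: int = 64 * 1024,
) -> AsyncIterator[bytes]:
    """Yield the upstream bytes only after the whole file passed the scan."""
    fd, path = tempfile.mkstemp()
    try:
        # Store: write every chunk to a temporary file before serving anything.
        with os.fdopen(fd, "wb") as tmp:
            async for chunk in upstream:
                tmp.write(chunk)
        # Scan: run the blocking scanner call in a thread so the event loop stays free.
        loop = asyncio.get_running_loop()
        accepted = await loop.run_in_executor(None, scan, path)
        if not accepted:
            raise ScanRejected(path)
        # Forward: only now hand the stored bytes to the client.
        with open(path, "rb") as stored:
            while True:
                chunk = stored.read(chunk_size)
                if not chunk:
                    break
                yield chunk
    finally:
        # The temporary file is removed whether the scan passed or not.
        os.unlink(path)
```

 The cost of this approach is latency: the client receives no bytes until the full file has been downloaded and scanned.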

 ### immediate policy 

 To handle the immediate policy a new "Artifact" stage should be created. The [Artifact stages live here](https://github.com/pulp/pulpcore/blob/master/pulpcore/plugin/stages/artifact_stages.py). 

 The pipeline is a series of stages, and this stage should only be used when the `SECURITY_SCAN_SHELL` setting is set. The default stage construction [happens here](https://github.com/pulp/pulpcore/blob/master/pulpcore/plugin/stages/declarative_version.py#L107-L133). 
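
 The conditional construction could be sketched as below. Stage names are plain strings here so the example runs without pulpcore. `ArtifactScanner` is a hypothetical name for the new stage, and placing it directly before `ArtifactSaver` is an assumption based on "scanned before being saved".

```python
from typing import List, Optional


def build_stages(default_stages: List[str], scan_command: Optional[str]) -> List[str]:
    """Return the stage list, adding a scan stage only when scanning is configured."""
    stages = list(default_stages)
    if not scan_command:
        # Setting is unset: the pipeline is unchanged.
        return stages
    # Assumption: the scan runs after download and right before artifacts are saved.
    save_index = stages.index("ArtifactSaver")
    stages.insert(save_index, "ArtifactScanner")  # hypothetical new stage
    return stages


defaults = ["QueryExistingArtifacts", "ArtifactDownloader", "ArtifactSaver"]
print(build_stages(defaults, None))
print(build_stages(defaults, "clamdscan {path}"))
```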

 ## Submitting data for scanning 

 In all policy types data should be submitted for scanning by calling the `SECURITY_SCAN_SHELL` command with the path to the temporary file added into that command somehow. A return code of 0 from the AV scanner subprocess call tells Pulp that the file is safe and to accept it. Any other return code tells Pulp to throw away the file right away and send the client an error code of some kind. 
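
 One possible shape for that call, as a sketch only. The function name `scan_file` and the `{path}` placeholder are assumptions; the story leaves open how the path gets into the command.

```python
import shlex
import subprocess


def scan_file(command_template: str, path: str) -> bool:
    """Run the configured scan command on `path`; True only for return code 0."""
    # Split the template first and substitute afterwards, so a path that
    # contains spaces still reaches the scanner as a single argument.
    argv = [arg.replace("{path}", path) for arg in shlex.split(command_template)]
    completed = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Any non-zero return code (infected file, scanner error) means "reject".
    return completed.returncode == 0
```

 This sketch passes an argument list to `subprocess.run` instead of going through a shell, which avoids quoting problems with file paths. If the command does not exist, `subprocess.run` raises `FileNotFoundError`, which a caller would presumably also treat as a rejection.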

 ## Notes 

 * The on_demand and streamed policies have to submit files to the scanner one-by-one. This is unavoidable. 
 * Having the `clamav` scanner run in daemon mode should resolve the "startup time" problem. 
 * The calls to the shell need to be validly callable by the Pulp user wherever the `pulpcore_content` process is running *and* on any `pulpcore reserved worker`. 

 ## TBD 

 * What HTTP error code should be sent to clients when the AV scanner says the file is not safe? 
 * The file saved may have a filename like `/path/to/tmp/dirs/e0e9fe76-a80a-47c7-8255-1ec4c8f8c7da`. This doesn't have a filetype. Does it need a filetype? 
 * How will the subprocess call know where in the command to put the path to the file to scan? 

 I've attached the IRC log for reference.