Implementing virus scan on on_demand repository
I've set up an on_demand PyPI remote. Business policy here is that all downloaded files must be scanned for viruses. I'd like to run a virus scanner between the download from PyPI and the publish / stream to the client.
bmbouter pointed out that the file is streamed to the client right away, so maybe some kind of hook could be implemented to support scans in between?
I've attached the IRC log for reference.
Extending the content app to make a call during the workflow could be pretty straightforward. It would mean the entire file has to be downloaded before any part of it can be served. Is that OK?
So the workflow for the content app would be:
1. The user requests a file from a repo with, e.g., policy="on_demand"
2. pulpcore-content downloads that file (without streaming any of it to the user yet)
3. It calls out to the virus scanner, passing the file path
4. It reads an "ok" to proceed, perhaps based on the return code
5. It serves and saves the file per the on_demand policy
For steps 3 and 4, I imagined a system-wide setting, configured by an admin, that names a script for Pulp to call. The path to the file would be the first positional argument to the script. Then for step 4, an exit code of 0 from the script would tell Pulp to proceed; any other exit code would tell it not to serve the file.
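To make the proposed script contract concrete, here is a minimal Python sketch of steps 3 and 4. It is not existing Pulp code: the function name `file_is_clean`, the `scan_command` parameter, and the timeout are all hypothetical, and treating a missing or hung script as "do not serve" is an assumption about what the policy would want.

```python
import subprocess
import sys
from typing import Sequence


def file_is_clean(scan_command: Sequence[str], file_path: str, timeout: float = 300.0) -> bool:
    """Run the admin-configured command with file_path as its first positional argument.

    Returns True only when the command exits with code 0.
    The name, parameters, and timeout are hypothetical, not a Pulp API.
    """
    try:
        result = subprocess.run([*scan_command, file_path], timeout=timeout, check=False)
    except (OSError, subprocess.TimeoutExpired):
        # Assumption: fail closed, so a missing script or a hung scanner blocks the file.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in "scanner" that always exits 0; a real deployment would point at the admin's script.
    always_ok = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(file_is_clean(always_ok, "/tmp/example.whl"))
```

The content app would call something like this after the download completes and only serve the file when it returns True.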
@ByteSore what do you think about all this?
I was thinking: since the scanner takes a while to spin up, scan the files, and do its thing, it may be worth checking whether multiple files are being downloaded (e.g. a package with all its dependencies). Park them all in a folder, or store all the file paths in a text file that the scanner can use. ClamAV, for example, has an option to scan the files listed in a text file: --file-list=FILE
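A rough Python sketch of the batching idea, assuming `clamscan` is installed and on the PATH. The helper names are hypothetical. It writes the collected paths to a temporary text file, one per line, and passes that file to `clamscan --file-list=...`, treating exit code 0 as clean (clamscan exits with 1 when a virus is found and 2 on errors).

```python
import os
import subprocess
import tempfile
from typing import Iterable, List


def write_file_list(paths: Iterable[str]) -> str:
    """Write one path per line to a temporary text file and return its name."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        for path in paths:
            handle.write(f"{path}\n")
        return handle.name


def build_clamscan_command(list_path: str) -> List[str]:
    """Build the clamscan invocation that scans every file named in list_path."""
    return ["clamscan", f"--file-list={list_path}"]


def batch_is_clean(paths: Iterable[str]) -> bool:
    """Scan all paths in one clamscan run; True only if clamscan exits with 0."""
    list_path = write_file_list(paths)
    try:
        result = subprocess.run(build_clamscan_command(list_path), check=False)
        return result.returncode == 0
    except OSError:
        # Assumption: if clamscan cannot be started, treat the batch as not approved.
        return False
    finally:
        os.unlink(list_path)
```

One trade-off: a non-zero exit code for the batch does not say which file was flagged, so the caller would have to parse the scanner output or rescan files individually before deciding what to serve.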