Story #5216
Updated by bmbouter about 4 years ago
## Background
Some users would like to disallow the use of certain checksums now determined to be insecure, e.g. md5 or sha1. It is desirable to allow users to configure which checksum types they want to use with Pulp.
## When does Pulp call checksums?
When an Artifact is created, a variety of checksums are [computed here](https://github.com/pulp/pulpcore/blob/5c77622365eb1e9b03f835dca5a4f536b1382cd6/pulpcore/app/models/content.py#L220-L230) and then stored on [the Artifact model's checksum fields](https://github.com/pulp/pulpcore/blob/5c77622365eb1e9b03f835dca5a4f536b1382cd6/pulpcore/app/models/content.py#L135-L140).
## Feature plan
Introduce a new setting called `CONTENT_CHECKSUMS` which would identify the set of checksums that Pulp should use. Here's an example of the default:
`CONTENT_CHECKSUMS = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}`
In this case, all checksums would be computed and stored as they are today.
If a user configured this with:
`CONTENT_CHECKSUMS = {"sha1", "sha224", "sha256", "sha384", "sha512"}`
Then all checksums would be computed and used except md5.
If a user configured this with:
`CONTENT_CHECKSUMS = {"sha224", "sha256", "sha384", "sha512"}`
Then all checksums would be computed and used except md5 and sha1.
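As a rough illustration of what the setting would control, the sketch below computes only the configured digests for a blob of data using `hashlib`. The `compute_digests` helper is hypothetical and not part of pulpcore; it only shows the intended effect of leaving md5 and sha1 out of `CONTENT_CHECKSUMS`.

```python
import hashlib

# Hypothetical configuration: md5 and sha1 are disallowed.
CONTENT_CHECKSUMS = {"sha224", "sha256", "sha384", "sha512"}


def compute_digests(data, allowed=CONTENT_CHECKSUMS):
    """Return {algorithm: hexdigest} for the configured algorithms only.

    Illustrative helper, not an existing pulpcore function.
    """
    hashers = {name: hashlib.new(name) for name in allowed}
    for hasher in hashers.values():
        hasher.update(data)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


digests = compute_digests(b"example artifact content")
```

With this configuration, `digests` has no `md5` or `sha1` keys, so nothing would be stored in those fields.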
## sha256 cannot be removed
sha256 cannot be removed and must always be present in `CONTENT_CHECKSUMS` because Pulp's content addressable storage requires sha256 to lay the files out on disk.
All Pulp processes should refuse to start if sha256 is not present in `CONTENT_CHECKSUMS`, raising a [`django.core.exceptions.ImproperlyConfigured` exception](https://docs.djangoproject.com/en/2.2/ref/exceptions/#improperlyconfigured) indicating that sha256 is required in `CONTENT_CHECKSUMS`.
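A minimal sketch of such a startup check follows. The function name `validate_content_checksums` and where it would be called from are assumptions. The `ImportError` fallback exists only so the snippet runs without Django installed.

```python
try:
    from django.core.exceptions import ImproperlyConfigured
except ImportError:
    # Stand-in so the sketch runs without Django installed.
    class ImproperlyConfigured(Exception):
        pass


def validate_content_checksums(content_checksums):
    """Hypothetical startup check: sha256 must always be configured."""
    if "sha256" not in content_checksums:
        raise ImproperlyConfigured(
            "CONTENT_CHECKSUMS must contain 'sha256'; Pulp's content "
            "addressable storage requires it."
        )


# A valid configuration passes silently.
validate_content_checksums({"sha256", "sha512"})
```

Calling `validate_content_checksums({"sha512"})` would raise `ImproperlyConfigured`.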
## Model changes
The checksum fields on the Artifact model should likely become:
```
md5 = models.CharField(max_length=32, null=True, unique=False, db_index=True)
sha1 = models.CharField(max_length=40, null=True, unique=False, db_index=True)
sha224 = models.CharField(max_length=56, null=True, unique=False, db_index=True)
sha256 = models.CharField(max_length=64, null=False, unique=True, db_index=True)
sha384 = models.CharField(max_length=96, null=True, unique=True, db_index=True)
sha512 = models.CharField(max_length=128, null=True, unique=True, db_index=True)
```
## Class attribute re-work
The `DIGEST_FIELDS`, `COMMON_DIGEST_FIELDS`, and `RELIABLE_DIGEST_FIELDS` should become properties which are memoized computations that are built from the configured `CONTENT_CHECKSUMS`.
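One possible shape for this, sketched with `functools.cached_property`. The `DigestFields` holder class is hypothetical, and the orderings of the three base tuples (including which algorithms count as reliable) are assumptions made for illustration rather than values taken from this story.

```python
from functools import cached_property

# Assumed orderings, for illustration only.
ALL_DIGEST_FIELDS = ("sha512", "sha384", "sha256", "sha224", "sha1", "md5")
ALL_COMMON_DIGEST_FIELDS = ("sha256", "sha512", "sha384", "sha224", "sha1", "md5")
ALL_RELIABLE_DIGEST_FIELDS = ("sha512", "sha384", "sha256")


class DigestFields:
    """Hypothetical holder that filters each tuple by CONTENT_CHECKSUMS."""

    def __init__(self, content_checksums):
        self._allowed = frozenset(content_checksums)

    def _filter(self, fields):
        # Keep the original ordering, drop anything not configured.
        return tuple(name for name in fields if name in self._allowed)

    @cached_property
    def DIGEST_FIELDS(self):
        return self._filter(ALL_DIGEST_FIELDS)

    @cached_property
    def COMMON_DIGEST_FIELDS(self):
        return self._filter(ALL_COMMON_DIGEST_FIELDS)

    @cached_property
    def RELIABLE_DIGEST_FIELDS(self):
        return self._filter(ALL_RELIABLE_DIGEST_FIELDS)


fields = DigestFields({"sha224", "sha256", "sha384", "sha512"})
```

Each property is computed on first access and cached on the instance afterwards, which is safe only because the setting is not expected to change at runtime.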
## Docs
The new setting should have documentation on [this page in the Pulp Settings area](https://docs.pulpproject.org/pulpcore/settings.html#pulp-settings).
NOTE: this setting must be chosen before any data is loaded into Pulp and can never be changed afterwards. We do not validate this because it is difficult to validate. Please document this with a `.. warning::` block in the settings documentation.
## An additional check at Artifact instantiation time
The stages pipeline creates in-memory Artifacts, and these are later used to query the db to see whether those Artifacts exist. We need to add a new `Artifact.__init__` which checks that all checksum values being set are in the set of `CONTENT_CHECKSUMS` available. If they are not, raise a [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError).
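The check could look roughly like the sketch below. The `Artifact` class here is a plain-Python stand-in, not the Django model, and `ALL_CHECKSUM_FIELDS` is a hypothetical helper constant. A real implementation would call `super().__init__(*args, **kwargs)` on the model instead of setting attributes by hand.

```python
# Hypothetical configuration: md5 and sha1 are disallowed.
CONTENT_CHECKSUMS = {"sha224", "sha256", "sha384", "sha512"}
ALL_CHECKSUM_FIELDS = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


class Artifact:
    """Plain-Python stand-in for the Django model, for illustration only."""

    def __init__(self, **kwargs):
        # Collect any checksum field that is being set but is not configured.
        forbidden = sorted(
            name
            for name, value in kwargs.items()
            if name in ALL_CHECKSUM_FIELDS
            and name not in CONTENT_CHECKSUMS
            and value is not None
        )
        if forbidden:
            raise TypeError(
                "Checksums not in CONTENT_CHECKSUMS were set: "
                + ", ".join(forbidden)
            )
        # A real model would delegate to super().__init__ here.
        for name, value in kwargs.items():
            setattr(self, name, value)


artifact = Artifact(sha256="a" * 64)
```

Constructing `Artifact(md5="d41d8cd98f00b204e9800998ecf8427e")` under this configuration would raise `TypeError`.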