https://pulp.plan.io/https://pulp.plan.io/favicon.ico2016-02-16T19:15:26ZPulpPulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=90142016-02-16T19:15:26Zbmbouterbmbouter@redhat.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-11 priority-6 priority-default closed child" href="/issues/1619">Issue #1619</a>: as user, I can export repo groups with different checksum than sha256</i> added</li></ul> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=90972016-02-19T13:45:23Zbmbouterbmbouter@redhat.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-11 priority-7 priority-high2 closed child" href="/issues/1618">Issue #1618</a>: --checksum-type is broken</i> added</li></ul> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=105922016-04-12T13:49:14ZAnonymous
<ul><li><strong>Sprint/Milestone</strong> set to <i>19</i></li></ul> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=108962016-04-20T15:48:13Zjcline@redhat.comjcline@redhat.com
<ul></ul><p>I feel like this issue is part of a larger issue, namely designing a plugin API. Furthermore, I imagine this 'unification' requires actually tracking individual files in Pulp with the metadata required to validate the integrity of the file (checksums/checksum types, size, maybe permissions/ownership, location, etc.). Our current data models don't do this.</p>
<p>Since we don't have either of those things fleshed out, would it be reasonable to figure out what we're doing there before trying to work on this story? Or is this about having a very high-level idea of where we'd like to be?</p> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=109022016-04-20T17:58:08Zjcline@redhat.comjcline@redhat.com
<ul><li><strong>Related to</strong> deleted (<i><a class="issue tracker-1 status-11 priority-6 priority-default closed child" href="/issues/1619">Issue #1619</a>: as user, I can export repo groups with different checksum than sha256</i>)</li></ul> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=109062016-04-20T18:34:20Zbmbouterbmbouter@redhat.com
<ul></ul><p>@jcline I agree with all of the observations you made, especially the part that requires us to know what we want out of this story.</p>
<p>For myself, I was thinking the latter thing you suggested ... This story is to write out "a very high-level idea of where we'd like to be".</p> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=109072016-04-20T19:54:12Zjcline@redhat.comjcline@redhat.com
<ul></ul><p>Okay, given that, how about I outline what I'm thinking and if no one objects we can edit the description on this story.</p>
<a name="Content-Integrity-in-Pulp"></a>
<h1 >Content Integrity in Pulp<a href="#Content-Integrity-in-Pulp" class="wiki-anchor">¶</a></h1>
<p>As a user of Pulp, I would like to ensure the content I am serving to my clients is correct (that is to say, it is what I think it is). Content can become incorrect for several reasons. The most likely reason for incorrect content is probably a bug in either Pulp's code or one of the libraries we use. However, these bugs are not the only reason content could be incorrect. Bit rot can occur, as can bugs in hard drive firmware[0]. Some file systems are capable of detecting bit rot of whole files and potentially repairing it[1][2][3], but many are not (like ext4 and XFS). Even if they were, we should not rely on a software layer below us for the integrity of the content we manage.</p>
<p>Things we want to be able to do:</p>
<ul>
<li>Tell Pulp to check every file's integrity (a pulp-scrub if you will)</li>
<li>Tell Pulp to attempt to fix problems it finds (re-download the file or similar)</li>
<li>Tell Pulp a particular file is <em>bad</em> and it should retrieve it again (I'm thinking about content types that don't have checksums as part of their metadata, or to recover from bugs in the two bullet points above)</li>
</ul>
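<p>A minimal sketch of what the first bullet (a scrub pass) could look like. The names <code>file_digest</code> and <code>scrub</code> are hypothetical, and the <code>expected</code> mapping is an in-memory stand-in for whatever records the platform would actually keep; none of this is existing Pulp code.</p>

```python
import hashlib
from pathlib import Path


def file_digest(path, checksum_type="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are never fully read into memory."""
    hasher = hashlib.new(checksum_type)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def scrub(expected):
    """Return the paths that are missing or no longer match their checksum.

    `expected` maps a path to a (checksum_type, checksum) pair. It is a
    hypothetical stand-in for the per-file records discussed in this story.
    """
    bad = []
    for path, (checksum_type, checksum) in expected.items():
        p = Path(path)
        if not p.is_file() or file_digest(p, checksum_type) != checksum:
            bad.append(path)
    return bad
```

<p>The list returned by <code>scrub</code> is what a repair step (the second and third bullets) would then try to retrieve again.</p>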
<a name="How-Our-Data-Model-Must-Change"></a>
<h2 >How Our Data Model Must Change<a href="#How-Our-Data-Model-Must-Change" class="wiki-anchor">¶</a></h2>
<p>To have any chance of providing content integrity validation and potential repair (a pulp-scrub, if you will), we must track each and every file for all content units. We don't currently do this. There are multi-file units (like Distributions, and maybe OSTree?). Information we probably want to track for each file:</p>
<ul>
<li>checksum</li>
<li>checksum_type (although we might want to just stick with sha256 and not tie this to any potential metadata we know about the file)</li>
<li>size of the file</li>
<li>the origin of the file</li>
<li>the storage location of the file</li>
<li>access control settings?</li>
</ul>
<p>It might not be a bad idea to also track the integrity of this metadata by hashing it and storing the hash with it. This should happen for every file we manage, regardless of content type. Therefore, this should live in the platform and leads us to...</p>
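<p>To make the per-file record and its self-check concrete, here is a rough sketch. <code>FileRecord</code>, its field names, and the choice of JSON plus sha256 for hashing the metadata are all assumptions for illustration, not an existing Pulp model. The access control field is left out because it is still an open question.</p>

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class FileRecord:
    """Hypothetical per-file record; field names are illustrative only."""
    checksum: str
    checksum_type: str
    size: int
    origins: List[str]   # assumed to be a list of URLs the file came from
    storage_path: str
    metadata_hash: str = ""

    def compute_metadata_hash(self):
        """Hash every field except the stored hash itself."""
        data = asdict(self)
        data.pop("metadata_hash")
        # sort_keys gives a stable serialization, so the hash is reproducible
        canonical = json.dumps(data, sort_keys=True).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()

    def seal(self):
        """Store the hash of the current metadata alongside it."""
        self.metadata_hash = self.compute_metadata_hash()
        return self

    def metadata_intact(self):
        """True if the metadata still matches the hash stored with it."""
        return self.metadata_hash == self.compute_metadata_hash()
```

<p>A record would be sealed when it is written and checked with <code>metadata_intact()</code> before its checksum is trusted during a scrub.</p>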
<a name="Plugin-API"></a>
<h2 >Plugin API<a href="#Plugin-API" class="wiki-anchor">¶</a></h2>
<p>We need to define a plugin API with this feature in mind. It could potentially happen as part of the file retrieval. The user would provide a URL and, if it is available to them, the data integrity information (think RPM's primary.xml metadata file, which provides locations and checksums). The platform would handle retrieving the files, validating that the download went smoothly (with the provided checksums) or generating the initial checksum, creating the database record for the file, etc. This is just a thought though, and worth fleshing out.</p>
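<p>As a sketch of the validate-or-generate step only (retrieval and the database write are left out), something like the following could sit in the platform. <code>ingest</code> and <code>ChecksumMismatch</code> are hypothetical names, and the returned dict stands in for the database record.</p>

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when downloaded bytes do not match the checksum the plugin gave."""


def ingest(data, expected_checksum=None, checksum_type="sha256"):
    """Validate downloaded bytes, or generate the initial checksum.

    `data` is the already-downloaded content. If the plugin supplied a
    checksum (for example from repository metadata), it is verified;
    otherwise the computed checksum becomes the recorded one.
    """
    actual = hashlib.new(checksum_type, data).hexdigest()
    if expected_checksum is not None and actual != expected_checksum.lower():
        raise ChecksumMismatch(
            "expected %s, got %s" % (expected_checksum, actual)
        )
    # Stand-in for creating the database record for the file.
    return {"checksum": actual, "checksum_type": checksum_type, "size": len(data)}
```

<p>Keeping this in the platform means a plugin only has to say where a file lives and, when it knows, what its checksum should be.</p>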
<p>[0] <a href="http://indico.cern.ch/event/13797/contributions/1362288/attachments/115080/163419/Data_integrity_v3.pdf" class="external">http://indico.cern.ch/event/13797/contributions/1362288/attachments/115080/163419/Data_integrity_v3.pdf</a><br>
[1] <a href="https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.7/BitRot.md" class="external">https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.7/BitRot.md</a><br>
[2] <a href="https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-scrub" class="external">https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-scrub</a><br>
[3] <a href="https://pthree.org/2012/12/11/zfs-administration-part-vi-scrub-and-resilver/" class="external">https://pthree.org/2012/12/11/zfs-administration-part-vi-scrub-and-resilver/</a></p> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=109272016-04-22T13:42:01Zmhrivnakmhrivnak@redhat.com
<ul></ul><p>This is a good plan that will go well with other plans for data model improvements in 3.y.</p>
<p>ostree is a strange case. It may not make sense to track all of its files. Otherwise, this should work well for other content types.</p>
<p>Thinking of this as a wishlist, in at least some cases we want to store a gpg signature with a file.</p>
<p>What do you have in mind for access control settings? I don't think we have anything like that on units today.</p>
<p>When you say "It might not be a bad idea to also track the integrity of this metadata by hashing it and storing the hash with it.", what are you trying to guard against? Database corruption?</p> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=109282016-04-22T13:59:20Zjcline@redhat.comjcline@redhat.com
<ul></ul><blockquote>
<p>ostree is a strange case. It may not make sense to track all of its files. Otherwise, this should work well for other content types.</p>
</blockquote>
<p>There are, of course, types of content that provide their own integrity checks. That's fine, and whether or not we introduce additional safeguards really depends on the situation, but we should ensure there is a way to "get at" that feature of OSTree (or Git, or whatever) to scrub the content unit and repair it.</p>
<blockquote>
<p>Thinking of this as a wishlist, in at least some cases we want to store a gpg signature with a file.</p>
</blockquote>
<p>It feels like it might be a layer up, abstraction-wise, but I'm not familiar enough with how each content type that supports GPG-signing does it. It's worth investigating further, in any case.</p>
<blockquote>
<p>What do you have in mind for access control settings? I don't think we have anything like that on units today.</p>
</blockquote>
<p>Just trying to be forward-thinking. Suppose permissions in /var/lib/pulp/content get trashed - it'd be nice to recover from that.</p>
<blockquote>
<p>When you say "It might not be a bad idea to also track the integrity of this metadata by hashing it and storing the hash with it.", what are you trying to guard against? Database corruption?</p>
</blockquote>
<p>Sure. I don't know much about the integrity checks databases provide (I'd be surprised if they didn't have some), but I also know people write bugs. It'd be good to have an additional check and a good way to recover when something inevitably goes wrong. I don't feel as strongly about this particular feature because it pales in comparison to the other problems we have, but while we're thinking about it I think it's worth looking into.</p> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=111642016-05-02T15:30:00Zmhrivnakmhrivnak@redhat.com
<ul><li><strong>Sprint/Milestone</strong> changed from <i>19</i> to <i>20</i></li></ul> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=116532016-05-23T02:27:29Zmhrivnakmhrivnak@redhat.com
<ul><li><strong>Sprint/Milestone</strong> changed from <i>20</i> to <i>21</i></li></ul> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=122202016-06-09T14:40:56Zjortel@redhat.comjortel@redhat.com
<ul></ul><p>Under model changes:</p>
<blockquote>
<p>the origin of the file</p>
</blockquote>
<p>What is an <em>origin</em> and why would we track it?</p> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=122222016-06-09T14:45:19Zbmbouterbmbouter@redhat.com
<ul><li><strong>Sprint/Milestone</strong> deleted (<del><i>21</i></del>)</li><li><strong>Platform Release</strong> changed from <i>2.9.0</i> to <i>3.0.0</i></li><li><strong>Groomed</strong> changed from <i>No</i> to <i>Yes</i></li></ul> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=122242016-06-09T14:48:02Zbmbouterbmbouter@redhat.com
<ul><li><strong>Sprint Candidate</strong> changed from <i>Yes</i> to <i>No</i></li></ul> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=122872016-06-10T15:15:51Zjcline@redhat.comjcline@redhat.com
<ul></ul><p><a href="mailto:jortel@redhat.com" class="email">jortel@redhat.com</a> wrote:</p>
<blockquote>
<p>Under model changes:</p>
<blockquote>
<p>the origin of the file</p>
</blockquote>
<p>What is an <em>origin</em> and why would we track it?</p>
</blockquote>
<p>Where the file came from (a list of urls, I guess) so we can make an attempt to automatically retrieve and repair the file.</p> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=367732019-04-12T19:24:26Zbmbouterbmbouter@redhat.com
<ul><li><strong>Status</strong> changed from <i>NEW</i> to <i>CLOSED - WONTFIX</i></li></ul> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=368022019-04-12T19:27:04Zbmbouterbmbouter@redhat.com
<ul></ul><p>Pulp 2 is approaching maintenance mode, and this Pulp 2 ticket is not being actively worked on. As such, it is being closed as WONTFIX. Pulp 2 is still accepting contributions though, so if you want to contribute a fix for this ticket, please reopen or comment on it. If you don't have permissions to reopen this ticket, or you want to discuss an issue, please reach out via the <a href="https://www.redhat.com/mailman/listinfo/pulp-dev" class="external">developer mailing list</a>.</p> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=391842019-04-15T20:35:55Zbmbouterbmbouter@redhat.com
<ul><li><strong>Tags</strong> <i>Pulp 2</i> added</li></ul> Pulp - Story #1647: Unify checksum management to the platform and add some featureshttps://pulp.plan.io/issues/1647?journal_id=408672019-04-16T17:09:24Zdaviddavis
<ul><li><strong>Platform Release</strong> deleted (<del><i>3.0.0</i></del>)</li></ul>