4420_IRC_Conversation.log

Issue #4420 » 4420_IRC_Conversation.log

Short conversation about the problem from IRC - bherring, 02/18/2019 02:04 PM

       [11:09:58] <x9c4> dkliban: about #4420: Some of the tests run on travis explicitely use 'random.choice' which might end up hitting code paths only occasionally. That is not going to produce stable test results, and it is quite visible when coverage reports missing lines you did not even touch.
       [11:10:24] <asmacdo> x9c4: pulp 2 or pulp 3?
       [11:10:34] <x9c4> pulp3 at least.
       [11:11:35] <bherring> x9c4: so, you would prefer static test run order?
       [11:12:24] <x9c4> It's not about the order.
       [11:12:47] <bherring> Consistent coverage?
       [11:13:52] <asmacdo> yeah that would be cool. the coverage report sometimes drops (and indicates failure) when it shouldn't have
       [11:13:54] <x9c4> Yes.
       [11:14:15] <dawalker> +1
       [11:14:27] <x9c4> I've seen it drop like 2% without even changing a line.
       [11:14:28] <dawalker> have experienced that and it's annoying
       [11:14:29] <bherring> asmacdo: could you present an example?
       [11:15:06] <asmacdo> bherring: don't know of any off the top of my head
       [11:15:18] <asmacdo> x9c4: did you see this on your PR?
       [11:15:47] <dawalker> bherring, I don't have a recent one off the top of my head either, but I made like a character change in one line or something that didn't even change line count and it failed on coverage dropping, when the line change was 0%
       [11:16:18] <bherring> `failed on coverage` -- I need ot understand better on what this means. It sounds like a vanity metric.
       [11:17:41] <asmacdo> yeah-- its something we mostly ignore, but we mostly ignore it because its not reliable
       [11:17:44] <dawalker> I don't know how to pull up the types of messages travis gives.  basically something along the lines of, coverage isn't allowed to drop and then it registered it has and fails blocking merge.  I don't even remember which repo that was on.
       [11:18:09] <dawalker> I think it was back when I was working w/ dkliban maybe he can clarify
       [11:18:16] <bherring> hmm... I would like to have a few more examples of this, if possible.
       [11:18:31] <bherring> In general, random order is something functional strives for, not runs away from.
       [11:19:43] <x9c4> But the set of codelines that are hit by tests should not depend on that order.
       [11:19:43] <bherring> ... iff I am understanding correctly. Provide some more examples, and I am sure QE is more than happy to take input/suggestions.
       [11:20:13] <bherring> Maybe? I mean, the simple answer is `add more tests`
       [11:20:35] <dawalker> someone said it's due to removing lines and thus the remaining skipped ones count for a greater percent and hit the limits set? https://github.com/lemurheavy/coveralls-public/issues/565
       [11:21:27] <bherring> Is code coverage including unit + functional, or functional only?
       [11:22:20] <x9c4> it is both.
       [11:22:24] <bherring> From what dawalker , there appears to be `COVERAGE DECREASE THRESHOLD FOR FAILURE` that can be be evaluated.
       [11:23:34] <asmacdo> hmm. well, personally I think coverage is a useful metric-- but probably something we can defer for now until all the [everything] calms down a bit
       [11:24:27] <bherring> I think I would need more information to understand if there is concern, as there may be. I would just like to know what tolerances are acceptable, how often it happens, and what is causing the issues. All good things to talk about! :)
       [11:24:36] <dawalker> yeah, https://github.com/gonum/internal/pull/25#issuecomment-177295526
       [11:25:22] <bherring> I am never against MORE useful/valid testing and ensuring there are no holes in coverage, for valid KPIs
       [11:26:07] <bherring> nice, x9c4 and dawalker . QE will look into it as well
       [11:26:51] <dawalker> bherring, next time it happens, I'll be sure to ping you.  It's been a while since I've run into it.  x9c4 likely has the most recent example.
       [11:27:16] <x9c4> I am still searching for a good example, but if for example we choose the sync_policy at random, different code paths are tested in different runs.
       [11:27:38] <bherring> x9c4: that is true and accurate
       [11:27:50] <bherring> However, should not unit test cover a mock of all of those (possibly)?
       [11:28:01] <bherring> Therefore, and aggregate usage of them should cover all code paths.
       [11:28:39] <x9c4> haven't looked enough into unit tests...
       [11:29:18] <bherring> For download-policy, I am pretty sure we cover all the base uses of a flag + perms that are useful. There may be other tests that engage download-policies that choose to use random, as it isn't the main purpose of the test.
       [11:29:39] <x9c4> I will try to observe and report the next time i see it.
       [11:29:57] <bherring> x9c4: awesome! Add it to the ticket. I will watch that ticket as well! Thank you!
       [11:30:10] <x9c4> bherring++
       [11:30:10] <pulpbot> x9c4: bherring's karma is now 8
       [11:31:31] <bherring> x9c4++ thanks for helping to improve test coverage and quality ;)
       [11:31:43] <bherring> x9c4++
       [11:31:44] <pulpbot> bherring: x9c4's karma is now 15

(1-1/1)

Project

Profile

Help

Pulp

Issue #4420 » 4420_IRC_Conversation.log