Improve testing framework #37
base: main
Conversation
@djeedai This PR is ready, and it's already showing its value by making it super easy to see that a bunch of stuff is broken (the transforms, completion counts, progress). I didn't bother converting the sequence/track/delay tests because I'll do those in a pre-PR before rewriting the implementations.
src/tweenable.rs (Outdated)

```rust
repeat(TweenState::Active),
[
    Transform::from_translation(Vec3::splat(0.6)),
    Transform::from_translation(Vec3::splat(-0.2)),
```
How is it ever possible that a tween animating a transform between x=0 and x=1 produces a negative x value?!
Because `main` is broken. :) Progress currently increases forever, so when we pong you get `1. - 1.2`.
I'm actually super happy that you spotted this, because it was broken in #19 but the tests were too complicated for us to even notice.
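To spell out the arithmetic, here is a minimal sketch of the failure mode (my reconstruction from the "progress increases forever" description above, not the crate's actual code):

```rust
fn main() {
    // A 1-second tween that has ticked for 1.2 seconds: progress is never
    // clamped to [0, 1], so it sits at 1.2.
    let progress: f32 = 1.2;

    // On the pong (reversed) leg, the interpolation factor is mirrored as
    // `1.0 - progress`; unclamped, 1.2 yields -0.2 instead of 0.8.
    let factor = 1.0 - progress;
    assert!((factor - (-0.2)).abs() < 1e-6);

    // Lerping a lens from 0.0 to 1.0 with that factor produces the negative
    // translation captured in the test expectations above.
    let value = 0.0 + (1.0 - 0.0) * factor;
    assert!((value - (-0.2)).abs() < 1e-6);
}
```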
I'm now in favor of progress being just this thing, always in `[0, 1]`, that only tells you how far along you are in a single loop. Anything else, especially since it's (or will be) a second-class concept, is too complicated I think and no longer valuable.
I'm personally still in favor of not including a progress API at all, since it's not that hard to implement yourself with whatever semantics you like. That said, this won't be used internally, so it doesn't matter too much and I'm fine implementing it either way.
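For illustration, a user-side helper with per-loop semantics could be as small as this (a sketch; `loop_progress` is an invented name, and it assumes you track the elapsed time and duration yourself):

```rust
use std::time::Duration;

/// Fraction of the current loop that has elapsed, always in [0, 1).
/// Invented helper for illustration, not part of the library's API.
fn loop_progress(elapsed: Duration, duration: Duration) -> f32 {
    (elapsed.as_secs_f32() / duration.as_secs_f32()).fract()
}

fn main() {
    // 1.25 loops into a 1-second tween => 25% through the current loop.
    let p = loop_progress(Duration::from_millis(1250), Duration::from_secs(1));
    assert!((p - 0.25).abs() < 1e-6);
}
```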
Though that's a discussion for #38.
Can we write the test as if there were no bug, and merge a failing test, rather than merge a test that ensures there is a bug? I'm uncomfortable having tests that are wrong on purpose.
Or, comment out the test for now if the concern is that CI will fail.
I actually disagree on this one, opinions forthcoming.
Tests validate behavior, not correctness; only a human can assert correctness. In other words, tests document the current behavior.
By removing the tests, we lose the ability to see how the code currently works. That has value in and of itself: someone who believes something is broken can find a test that confirms their understanding of the current behavior, and then propose the new behavior that the fix must adhere to.
Furthermore, if there are no existing tests, then when a change comes along to "fix" the behavior, we won't have old tests to validate that the bug was actually fixed. That is, adding a new test alongside the fix prevents you from seeing the transition from incorrect to correct, meaning you can't verify that the test is doing anything at all, because you as the reviewer have never seen it fail. Again, this transition provides effortless documentation of the change in behavior that would otherwise be invisible... are the new tests just adding more coverage, or validating some new behavior? Who knows!
Adding tests that validate behavior, right or wrong, documents the state of the world and its transitions.
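As a self-contained illustration of that workflow (the `pong_factor` helper is invented here to stand in for the library's internals), a behavior-documenting test pins the buggy value and gets flipped in the same diff as the fix:

```rust
/// Invented stand-in for the mirroring logic discussed above: the factor
/// is not clamped, which is the current (buggy) behavior.
fn pong_factor(progress: f32) -> f32 {
    1.0 - progress
}

#[test]
fn documents_current_pong_overshoot() {
    // Documents *current* behavior, not correct behavior: with progress
    // left unclamped at 1.2, the pong leg yields -0.2 instead of 0.8.
    // When the underlying bug is fixed, this test fails; flipping the
    // expected value to 0.8 then makes the incorrect-to-correct
    // transition visible in the fix's diff.
    let factor = pong_factor(1.2);
    assert!((factor - (-0.2)).abs() < 1e-6);
}
```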
Ok, I was not convinced at first, but that does look a lot easier to read now indeed. I've left a couple of comments, mostly nit-picking that you may skip if you wish, but I'd like to clarify #43 and the negative-transform thing before merging:
- on #43 (Figure out what to do in tick when tween is completed via set_{elapsed,progress}), it feels like there's maybe a bug?
- on the negative-transform thing, I'd rather comment out a correct but failing test than merge a test that passes but asserts a wrong result.
Ok, I've been thinking about these tests some more and I'm realizing that I've been dumb. What I'm trying to do is replicate expect-tests, but I'm forcing us to write code to generate the goldens instead of just using a golden file test. I'm going to switch to that, which should be way less code and even easier to understand.
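For reference, a minimal golden-file harness might look like this (a sketch: the `tests/goldens` path and the `UPDATE_GOLDENS` environment variable are invented conventions, not this PR's actual code):

```rust
use std::{env, fs, path::Path};

/// Compare a rendered transcript against a checked-in golden file.
/// Running with UPDATE_GOLDENS=1 regenerates the file instead of asserting.
fn assert_golden(name: &str, actual: &str) {
    let path = Path::new("tests/goldens").join(name);
    if env::var_os("UPDATE_GOLDENS").is_some() {
        fs::write(&path, actual).expect("failed to write golden");
        return;
    }
    let expected = fs::read_to_string(&path).expect("missing golden file");
    assert_eq!(expected, actual, "golden mismatch for {name}");
}
```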
@djeedai Ok, WDYT?
I think I don't like goldens conceptually. I can see the value for something like a codegen tool, or some logs, or other text-heavy things. But using that to compare numerical values output as text sounds pretty contrived when you can compare the values directly. This also removes the value of having an […]
I strongly disagree here. The asserts are basically useless, which is part of why I'm trying to get rid of them. Let me demonstrate with an incorrect completion count. Here's what a failing golden test looks like: […]
The OG tests: […]
The previous iteration of the new framework: […]
For me, the goldens are hands down the easiest to understand. The OG one requires you to look at the code to find out which value is wrong, while the previous iteration clearly needs some context on the current state of the tween. That aside, I think this goes to your next point:
I kind of agree here. But! The problem I'm trying to solve is understandability and ease of test addition. The OG tests are pretty much impossible to understand with all the indexing and branching (which also means it's impossible to add/update them). In the previous iteration, I think it became reasonable to understand which values were expected, but it was still hard to reason about the state context (i.e. after the 7th tick, what should my completion count be?). That meant writing the tests was still a pain, because you have to come up with the math to generate the right values, and it's super easy to make off-by-one errors or just goof the math. I do agree that the goldens are a bit of a weird approach, but they make writing tests effortless (just create a tween and say what your deltas are) and, most importantly, they're super easy to audit (they read like a story). I generally bias towards doing whatever makes the tests easiest to write, because that will lead to the most coverage and therefore the most bugs found. Alrighty, WDYT now? 😁 Any of that convincing?
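To illustrate the "create a tween and say what your deltas are" shape, a golden-backed test could reduce to something like this (hypothetical API: `make_tween`, `run_ticks`, and the file name are invented, and `assert_golden` is the sketch from earlier):

```rust
#[test]
fn ping_pong_three_loops() {
    // The author supplies only the tween and the tick deltas; the
    // transcript of resulting states is what the golden file captures.
    let tween = make_tween();
    let transcript = run_ticks(tween, &[0.4, 0.4, 0.4, 0.4]);
    assert_golden("ping_pong_three_loops.txt", &transcript);
}
```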
I like the idea and the rationale of pushing for tests which are both easier to understand and to maintain. But I'm not convinced we need goldens for that. I think the value argued here is that the version with goldens outputs human-understandable text that developers can look at to get some context on the failure. On the other hand, the fact that goldens use text comparison to check correctness is not required, and probably not wanted at all. Conversely, the assert version misses clarity for developers. So can we have the best of both worlds? Can we output to console the same paragraph of summary text for each step, and then right after that assert on the values directly?
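A sketch of that hybrid (the `step_summary` helper and the surrounding state are invented): print the per-step summary for human context, then assert on the values directly so correctness never depends on text comparison:

```rust
for (i, (delta, expected)) in deltas.iter().zip(&expected_states).enumerate() {
    let actual = tween.tick(*delta);
    // libtest captures stdout, so this context only shows up on failure.
    println!("step {i}: delta = {delta:?}\n{}", step_summary(&actual));
    assert_eq!(actual, *expected, "mismatch at step {i}");
}
```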
Yes, except for the last part. I've been thinking about this off and on, and I've discovered the following underlying assumption motivating the golden approach: I believe writing the math to generate correct values is too difficult. That is, generating the sequence of correct values takes too long to think about (so fewer tests will get written; to put numbers on this, writing the 6 tests in […]). Hence, the golden approach embraces the idea that figuring out what the correct values are is too difficult, and just says, "Hey, check out what this program did!" Correctness is then determined by deciding whether or not you agree with what the program did, rather than pre-determining what the program should do.
I think this misses the key difference presented above: do you want to write a test that checks what the program did or one that determines what the program should do? The argument is that "did" is much easier to write and understand (for correctness) than "should do."
If the golden approach is abandoned, I'll definitely do this, since it will make understanding the source of failures much easier, but it won't address the difficulty of writing the test or reasoning about its correctness when it's passing.

Appendix

Is the transform correct?

```rust
let tween = Tween::new(
    EaseMethod::Linear,
    Duration::from_secs(1),
    TransformPositionLens {
        start: Vec3::ZERO,
        end: Vec3::ONE,
    },
)
.with_direction(TweeningDirection::Forward)
.with_repeat_count(RepeatCount::Finite(3))
.with_repeat_strategy(RepeatStrategy::Repeat);

let expected_values = ExpectedValues {
    // ...
    transforms: successors(Some(0.), |progress| Some(f32::min(3., progress + 1. / 3.)))
        .map(|progress| Transform::from_translation(Vec3::splat(progress))),
};
```

We both know it's not, because of #42, but I would argue that's not immediately obvious. You could make it easier to see by expanding out the iterator into explicit values, as sketched below.
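For example, here is my expansion of the first few values that iterator yields; written out by hand, translations past `Vec3::ONE` are obviously wrong for a lens that ends at `Vec3::ONE`:

```rust
let expected_transforms = [
    Transform::from_translation(Vec3::splat(0.0)),
    Transform::from_translation(Vec3::splat(1.0 / 3.0)),
    Transform::from_translation(Vec3::splat(2.0 / 3.0)),
    Transform::from_translation(Vec3::splat(1.0)),
    Transform::from_translation(Vec3::splat(4.0 / 3.0)), // already past the lens's end!
    // ... and so on, saturating at Vec3::splat(3.0)
];
```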
Fair enough; it's hard to write animation tests because there's a fair number of moving variables, and we're testing them in groups, so we need to reason about all of them at once. And maybe that's the issue: we should test individual quantities (repeat count, duration, progress, etc.) in separate tests. But that doesn't mean making it easier to write those tests should take priority over having tests that are correct.
Yes, "did" is much easier to write. It's also not what a test should do. I know you disagree with that, but I'll repeat my position, which is that a test is there to confirm the developer's assumptions about how the code they wrote is supposed to behave. You have a mental model, convert it into some code, then write a test that says "here, I wrote the code to achieve this result". And then you run the test and confirm that assumption. Taken to the extreme, with test-driven development, you actually write the test first. I see very little value in testing what the program actually does (as opposed to what it was meant to do): […]
In short, testing the actual behavior serves as a second set of documentation (you said it yourself) that necessarily conflicts with the textual docs (comments); otherwise you'd have a feature, not a bug. It's also all too easy to forget about such a "wrong behavior" test in the middle of all the other tests. It encourages users looking at the test code to think of bugs as features and take a dependency on them, backing developers into a corner where the "bug" stops being a bug and can never be fixed again (looking at you, Windows API, and so many others). Finally, I don't see where you draw the line between an "acceptable" wrong behavior and an "unacceptable" one. Is the behavior acceptable for users if the library produces "3.999999" instead of the "4.0" that the developer intended? Probably, for most users at least, even if not 100% accurate. What about negative […]? With intended testing, you test what the behavior should be, as imagined by whoever wrote the code. If something is found wrong in practice, you either fix it now or, if that's not possible, log a bug and comment out the test. Also, small digression, but on that topic we should have a clean […]
The current tests are quite painful to work with, since it's very difficult to tell what the expected values should actually be. This PR flips the testing around by passing in a sequence of expected values that the test framework then validates.
Note: depends on #19.
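For a flavor of what that looks like (hypothetical names echoing the diff excerpt above, not necessarily the final API), a test hands the framework a tween, a fixed tick delta, and the sequence of states and transforms it should produce:

```rust
use std::iter::repeat;

check_tween(
    make_linear_tween(),        // the Tweenable under test (invented helper)
    Duration::from_millis(200), // fixed delta applied on every tick
    repeat(TweenState::Active), // expected state after each tick
    [
        // expected transform after each tick of a 1-second, 0-to-1 tween
        Transform::from_translation(Vec3::splat(0.2)),
        Transform::from_translation(Vec3::splat(0.4)),
    ],
);
```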