Feature: cross validate timings #233
base: main
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##              main      #233       +/-   ##
============================================
  Coverage   100.00%   100.00%
============================================
  Files           45        59      +14
  Lines         2242      3893    +1651
============================================
+ Hits          2242      3893    +1651

☔ View full report in Codecov by Sentry.
if compute_timings:
    for data in actual["metrics"]:
        assert len(expected_keys.intersection(set(data.keys()))) == 2
We are checking only the intersection of keys because the timings differ a lot between runs? If that is the case, let's just round the actual results to some reasonable number of digits and check the values. Or check that these values are less than some threshold.
Also, we need to check that we didn't mess up the metrics results, so we need to check all of the values like you do below for the case when timings are not needed (but, as I wrote in another place, you don't need to check the compute_timings=False scenario in this test).
done, chose threshold = 0.5
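For reference, a minimal sketch of how that threshold check could look (the "recommend_time" key name is an assumption, only "fit_time" is visible in this diff; the helper name is illustrative, not the PR's actual test code):

def assert_timings_within_threshold(actual: dict, threshold: float = 0.5) -> None:
    # Timings vary between runs, so instead of comparing exact values we only
    # require each recorded timing to be non-negative and below the threshold.
    for split_metrics in actual["metrics"]:
        for key in ("fit_time", "recommend_time"):
            assert 0.0 <= split_metrics[key] < threshold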
if timings is not None:
    start_time = time.time()
    yield
    timings[label] = round(time.time() - start_time, 2)
We needed to round in the tests below, not in the actual code; we are adding rounding just to pass the tests, so it shouldn't affect the framework code itself. Let's round to 5 digits here.
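As an illustration only (the helper name and signature here are guesses based on the snippet above, not the PR's actual code), rounding to 5 digits inside the timing context manager would look roughly like this:

import time
from contextlib import contextmanager

@contextmanager
def log_time(timings, label):
    # Store the elapsed wall-clock time of the wrapped block under `label`,
    # rounded to 5 digits so the precision loss is negligible.
    if timings is not None:
        start_time = time.time()
        yield
        timings[label] = round(time.time() - start_time, 5)
    else:
        yield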
"precision@2": 0.375, | ||
"recall@1": 0.5, | ||
"intersection_popular": 0.75, | ||
"fit_time": 0.0, |
Let's drop the timings from the expected dicts and keep the threshold comparison: just pop the timings from the actual dict.
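A hedged sketch of that comparison (function and key names are assumptions; only "fit_time" appears in the diff above): timings are popped out of each actual per-split dict and only range-checked, while the remaining metric values are compared exactly against expected dicts that contain no timing keys.

def assert_metrics_with_timings(actual, expected_metrics, threshold=0.5):
    # Remove the timing keys before the exact comparison, checking only that
    # each timing falls within a reasonable range.
    for split_metrics, expected in zip(actual["metrics"], expected_metrics):
        for key in ("fit_time", "recommend_time"):
            timing = split_metrics.pop(key)
            assert 0.0 <= timing < threshold
        assert split_metrics == expected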
"ref_models,validate_ref_models,expected_metrics,compute_timings", | ||
( | ||
( | ||
["popular"], |
Let's keep only ["popular"] and not put ref_models in parametrize. Let's iterate over validate_ref_models and compute_timings; only 4 test cases are needed.
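A possible shape for that parametrization (the test name and body are made up for illustration): stacking two parametrize decorators over the boolean flags yields the 2 x 2 = 4 cases, while ref_models stays fixed inside the test.

import pytest

@pytest.mark.parametrize("compute_timings", (False, True))
@pytest.mark.parametrize("validate_ref_models", (False, True))
def test_cross_validate_timings(validate_ref_models, compute_timings):
    # ref_models is fixed instead of being parametrized.
    ref_models = ["popular"]
    ...  # call cross_validate and check metrics/timings as discussed above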
Description
Closes issue #138
Type of change
How Has This Been Tested?
Before submitting a PR, please check yourself against the following list. It would save us quite a lot of time.