Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add util function to help automatically get horizon #1509

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open
Prev Previous commit
Next Next commit
Update tests
chenditc committed May 12, 2023
commit 091dae6c60ef6dd3153ac896ebfde19fd3552f1b
2 changes: 0 additions & 2 deletions qlib/contrib/data/dataset.py
Original file line number Diff line number Diff line change
@@ -140,8 +140,6 @@ def __init__(
handler = init_instance_by_config(handler)
label = handler.data_loader.fields["label"][0][0]
horizon = guess_horizon(label)
if horizon is None:
horizon = 0

assert num_states == 0 or horizon > 0, "please specify `horizon` to avoid data leakage"
assert memory_mode in ["sample", "daily"], "unsupported memory mode"
4 changes: 2 additions & 2 deletions qlib/utils/data.py
Original file line number Diff line number Diff line change
@@ -115,9 +115,9 @@ def guess_horizon(label):
horizon_list = [int(x) for x in re.findall(regex, label)]

if len(horizon_list) == 0:
return None
return 0
max_horizon = max(horizon_list)
# Unlikely the label doesn't use future information
if max_horizon < 2:
return None
return 0
return max_horizon + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it return max_horizon+1 here instead of return max_horizon or return max_horizon-min_horizon?

Copy link
Contributor Author

@chenditc chenditc May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is consistent with exisiting deinition of "horizon" in other code like DDG-DA and TRA. There is no official documentation, TRA workflow config use max_horizon to truncate and DDG-DA use max_horizon + 1. I use max_horizon + 1 here just to make things safe.

Any suggestion from qlib team?

2 changes: 1 addition & 1 deletion tests/misc/test_utils.py
Original file line number Diff line number Diff line change
@@ -129,7 +129,7 @@ def test_guess_horizon(self):

label = "Ref($close, -1) / Ref($close, -1) - 1"
result = guess_horizon(label)
assert result is None
assert result is 0


if __name__ == "__main__":