ValueError: could not convert string to float: 'SH600000' when I use dump_bin.py #1852

Open
nb7123 opened this issue Oct 12, 2024 · 3 comments
Labels
bug Something isn't working

nb7123 commented Oct 12, 2024

🐛 Bug Description

To Reproduce

Steps to reproduce the behavior:

  1. python scripts/data_collector/baostock_5min/collector.py download_data --source_dir ~/.qlib/stock_data/source/hs300_5min_original --start 2022-01-01 --end 2022-01-30 --interval 5min --region HS300

  2. python scripts/dump_bin.py dump_all --csv_path ~/.qlib/stock_data/source/hs300_5min_original --qlib_dir ~/.qlib/qlib_data/samples

Error:
"""
Traceback (most recent call last):
File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/concurrent/futures/process.py", line 198, in _process_chunk
return [fn(*args) for args in chunk]
File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/concurrent/futures/process.py", line 198, in <listcomp>
return [fn(*args) for args in chunk]
File "/Users/didi/Code/Github/qlib/scripts/dump_bin.py", line 264, in _dump_bin
self._data_to_bin(df, calendar_list, features_dir)
File "/Users/didi/Code/Github/qlib/scripts/dump_bin.py", line 238, in _data_to_bin
np.hstack([date_index, _df[field]]).astype("<f").tofile(str(bin_path.resolve()))
ValueError: could not convert string to float: 'SH600000'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/didi/Code/Github/qlib/scripts/dump_bin.py", line 508, in <module>
fire.Fire({"dump_all": DumpDataAll, "dump_fix": DumpDataFix, "dump_update": DumpDataUpdate})
File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/site-packages/fire/core.py", line 568, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/Users/didi/Code/Github/qlib/scripts/dump_bin.py", line 271, in __call__
self.dump()
File "/Users/didi/Code/Github/qlib/scripts/dump_bin.py", line 322, in dump
self._dump_features()
File "/Users/didi/Code/Github/qlib/scripts/dump_bin.py", line 313, in _dump_features
for _ in executor.map(_dump_func, self.csv_files):
File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
for element in iterable:
File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
yield fs.pop().result()
File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
ValueError: could not convert string to float: 'SH600000'
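
The innermost worker frame casts a column with `.astype("<f")`, and the message matches what Python's built-in `float()` raises for a non-numeric string. That suggests a string-valued column (here the stock symbol) was among the fields being converted. A minimal, stdlib-only sketch for checking which columns of a CSV would fail such a conversion follows. The helper name `non_numeric_columns` and the sample rows are hypothetical, made up for illustration, and are not part of qlib or of the real download.

```python
import csv
import io


def non_numeric_columns(csv_text, skip=("date",)):
    """Return names of columns holding at least one value that float() rejects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for row in reader:
        for name, value in row.items():
            if name in skip or name in bad:
                continue
            try:
                float(value)
            except (TypeError, ValueError):
                # TypeError covers a missing cell (None); ValueError covers text.
                bad.append(name)
    return bad


# Made-up sample rows for illustration; not taken from the real data.
SAMPLE = """date,symbol,open,close
2022-01-04 09:35:00,SH600000,8.55,8.56
2022-01-04 09:40:00,SH600000,8.56,8.57
"""

offending = non_numeric_columns(SAMPLE)
print(offending)
```

Any column this reports other than the symbol column would also need attention before the files are dumped.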

Screenshot

Environment

Darwin
arm64
macOS-14.2-arm64-arm-64bit
Darwin Kernel Version 23.2.0: Wed Nov 15 21:54:55 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T8122

Python version: 3.8.19 (default, Mar 20 2024, 15:27:52) [Clang 14.0.6 ]

Qlib version: 0.9.5
numpy==1.23.5
pandas==2.0.3
scipy==1.10.1
requests==2.32.3
sacred==0.8.6
python-socketio==5.11.4
redis==5.0.8
python-redis-lock==4.0.0
schedule==1.2.2
cvxpy==1.5.2
hyperopt==0.1.2
fire==0.6.0
statsmodels==0.14.1
xlrd==2.0.1
plotly==5.24.1
matplotlib==3.7.5
tables==3.7.0
pyyaml==6.0.2
mlflow==1.14.1
tqdm==4.66.5
loguru==0.7.2
lightgbm==4.5.0
tornado==6.4.1
joblib==1.4.2
fire==0.6.0
ruamel.yaml==0.17.36

Additional Notes

nb7123 added the bug label Oct 12, 2024

ghyzx commented Oct 13, 2024

same issue

zqingr commented Oct 15, 2024

same issue

SunsetWolf commented (Collaborator)

Based on the information you provided, a step is missing from your workflow: after download_data, you need to run normalize_data first, and only then use dump_bin to convert the data to bin format. The documentation describes this workflow.
Note that before normalizing, you need to prepare daily-frequency data, and its time range must cover the time range of the data being normalized.
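
The coverage requirement in the reply above can be checked before running normalize_data. The following is a minimal sketch, assuming the daily dates and the 5-minute timestamps have already been loaded into Python lists. The function name `daily_range_covers` and the sample values are hypothetical and are not part of qlib.

```python
from datetime import date, datetime


def daily_range_covers(daily_days, intraday_stamps):
    """Return True if every intraday timestamp lies within the daily date range."""
    first, last = min(daily_days), max(daily_days)
    return all(first <= stamp.date() <= last for stamp in intraday_stamps)


# Illustrative values only; they fall within the --start/--end window in the report.
daily_days = [date(2022, 1, 4), date(2022, 1, 28)]
stamps_inside = [datetime(2022, 1, 4, 9, 35), datetime(2022, 1, 28, 15, 0)]
stamps_outside = [datetime(2022, 2, 7, 9, 35)]

print(daily_range_covers(daily_days, stamps_inside))
print(daily_range_covers(daily_days, stamps_outside))
```

This only compares the first and last daily dates, so it would not notice gaps inside the daily data.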

Nathan-Bransby-NMT added a commit to Nathan-Bransby-NMT/qlib that referenced this issue Dec 3, 2024
…sion

Fixes microsoft#1852

Modify `scripts/dump_bin.py` to handle the conversion of string 'SH600000' to float correctly.

* **Exclude 'symbol' field from conversion**:
  - Modify `_data_to_bin` method to exclude the 'symbol' field from conversion to float.
  - Add a check to ensure 'symbol' field is not included in the fields to be converted.

* **Update `normalize_data` method**:
  - Ensure `normalize_data` method in `scripts/data_collector/baostock_5min/collector.py` processes data correctly without converting 'symbol' to float.
  - Update `normalize_baostock` method to retain 'symbol' field as a string.

* **Documentation update**:
  - Emphasize the importance of `normalize_data` before using `dump_bin.py` in `scripts/data_collector/baostock_5min/README.md`.
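
The idea in the first bullet of the commit message, leaving the symbol field out of the float conversion, can be illustrated with a short stdlib-only sketch. This is not the code from the referenced commit or from `dump_bin.py`. The function `to_float_columns`, its `exclude` parameter, and the sample rows are assumptions made for illustration.

```python
def to_float_columns(rows, exclude=("symbol", "date")):
    """Convert every non-excluded column of dict rows into a list of floats."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            if name in exclude:
                # String-valued fields such as the symbol are never cast.
                continue
            columns.setdefault(name, []).append(float(value))
    return columns


# Made-up rows for illustration only.
rows = [
    {"date": "2022-01-04 09:35:00", "symbol": "SH600000", "open": "8.55", "close": "8.56"},
    {"date": "2022-01-04 09:40:00", "symbol": "SH600000", "open": "8.56", "close": "8.57"},
]

result = to_float_columns(rows)
print(sorted(result))
```

With the symbol excluded, `float()` is only ever applied to the price columns, so the conversion in this sketch cannot fail on `'SH600000'`.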