
Relax nanosecond datetime restriction in CF time decoding #9618

Open: wants to merge 92 commits into `main`


@kmuehlbauer (Contributor) commented Oct 13, 2024

This is another attempt to resolve #7493. This goes a step further than #9580.

The idea of this PR is to automatically infer the resolution needed for decoding/encoding and to keep only the constraints pandas imposes ("s" as the lowest resolution, "ns" as the highest). There is still the notion of a default resolution, but it should only take precedence if it doesn't clash with the automatic inference. This is open for discussion, though.

Update: As a first try, I've implemented a time-unit kwarg to set a default resolution on decode. It overrides the inferred resolution only towards a higher resolution (e.g. 's' -> 'ns'). To work towards #4490, the time decoding options (decode_time and use_cftime) are bundled within CFDatetimeCoder, which is passed via the decode_times kwarg. The use_cftime kwarg is deprecated.
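A minimal sketch of that resolution logic, assuming a simplified mapping from CF unit names to numpy unit codes. `CF_TO_NUMPY`, `infer_resolution` and `choose_resolution` are hypothetical names, not xarray's API, and the sketch ignores the reference date and floating point values (floating point handling is still a todo here).

```python
# Hypothetical sketch: infer a decoding resolution from a CF "units" string,
# then let a default resolution only make it finer, never coarser.

# Resolutions pandas supports, ordered from coarsest to finest.
SUPPORTED = ["s", "ms", "us", "ns"]

# Assumed, simplified mapping from CF time-unit names to numpy unit codes.
CF_TO_NUMPY = {
    "days": "D",
    "hours": "h",
    "minutes": "m",
    "seconds": "s",
    "milliseconds": "ms",
    "microseconds": "us",
    "nanoseconds": "ns",
}


def infer_resolution(cf_units: str) -> str:
    """Numpy unit implied by e.g. 'days since 2000-01-01', clamped to s..ns."""
    delta = cf_units.split(" since ")[0].strip().lower()
    unit = CF_TO_NUMPY[delta]
    # Units coarser than seconds (D, h, m) fall back to the lowest pandas resolution.
    return unit if unit in SUPPORTED else "s"


def choose_resolution(cf_units: str, default: str = "s") -> str:
    """Combine inferred and default resolution, keeping whichever is finer."""
    inferred = infer_resolution(cf_units)
    return max(inferred, default, key=SUPPORTED.index)


print(choose_resolution("days since 2000-01-01"))               # s
print(choose_resolution("days since 2000-01-01", default="ns"))  # ns
print(choose_resolution("microseconds since 1970-01-01"))       # us
```

Taking the finer of the two means a default such as "ns" can raise the resolution, while a coarse default never truncates values that the units imply need sub-second precision.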

For sanity checking, and also for my own good, I've created a documentation page on time coding in the internal dev section. Any suggestions (especially on grammar) or ideas for improvement are much appreciated.

There still might be room for consolidation of functions/methods (mostly in coding/times.py), but I have to leave it alone for some days. I went down that rabbit hole and need to relax, too 😬.

Looking forward to getting your insights here, @spencerkclark, @ChrisBarker-NOAA, @pydata/xarray.

Todo:

  • floating point handling
  • update decoding tests to iterate over time_units (where appropriate)
  • CFDatetimeCoder as input for decode_times kwarg
  • ...

@kmuehlbauer (Contributor Author)

Nice, mypy 1.12 is out and breaks our typing, 😭.

@TomNicholas (Member)

> Nice, mypy 1.12 is out and breaks our typing, 😭

Can we pin it in the CI temporarily?

@TomNicholas mentioned this pull request Oct 14, 2024
@kmuehlbauer (Contributor Author)

> Can we pin it in the CI temporarily?

Yes, 1.11.2 was the last release before 1.12.

@kmuehlbauer force-pushed the any-time-resolution-2 branch from ca5050d to f7396cf on October 14, 2024 16:09
@kmuehlbauer marked this pull request as ready for review on October 14, 2024 18:05
@kmuehlbauer (Contributor Author)

This is now ready for a first round of review. I think it is already quite usable.

But no rush, this should be thoroughly tested.

@spencerkclark (Member)

Sounds good @kmuehlbauer! I’ll try and take an initial look this weekend.

@alippai commented Dec 12, 2024

I see many notes about the units s-us. Will this work with minutes or days too? E.g. date_range and other sources.

@kmuehlbauer (Contributor Author) left a review

Thank you for the review, Deepak ❤️

I'll try to work through it over the weekend.

@@ -75,18 +74,14 @@ using a standard calendar, but outside the `nanosecond-precision range`_
any of the following are true:

- The dates are from a non-standard calendar
- Any dates are outside the nanosecond-precision range.
- Any dates are outside the nanosecond-precision range (prior xarray version 2024.11)
Contributor Author

Todo: Fix version

@@ -75,18 +74,14 @@ using a standard calendar, but outside the `nanosecond-precision range`_
any of the following are true:

- The dates are from a non-standard calendar
- Any dates are outside the nanosecond-precision range.
- Any dates are outside the nanosecond-precision range (prior xarray version 2024.11)
- Any dates are outside the time span limited by the resolution (from xarray version v2024.11)
Contributor Author

TODO: Fix version
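Why the resolution limits the time span: numpy's datetime64 stores an int64 count of the unit since 1970-01-01, with the smallest int64 reserved for NaT. The sketch computes the representable bounds per unit with plain numpy; `representable_range` is a hypothetical helper, not part of xarray.

```python
import numpy as np

# datetime64 values are int64 counts of the unit since 1970-01-01; the smallest
# int64 is reserved for NaT, so the usable range is [min + 1, max].
INT64 = np.iinfo(np.int64)


def representable_range(unit: str) -> tuple[np.datetime64, np.datetime64]:
    """Earliest and latest datetime64 values for a given unit."""
    return (
        np.datetime64(int(INT64.min) + 1, unit),
        np.datetime64(int(INT64.max), unit),
    )


for unit in ["ns", "us", "ms", "s"]:
    earliest, latest = representable_range(unit)
    print(unit, earliest, latest)
```

For "ns" this gives 1677-09-21 to 2262-04-11, while each coarser step widens the span by a factor of 1000 (about ±292,000 years around 1970 for "us").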

doc/user-guide/weather-climate.rst (outdated, resolved)
doc/user-guide/time-series.rst (outdated, resolved)
@@ -644,13 +644,14 @@ def to_datetimeindex(self, unsafe=False):
CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00],
dtype='object', length=2, calendar='standard', freq=None)
>>> times.to_datetimeindex()
Contributor Author

I'll look into it.

@kmuehlbauer (Contributor Author)

> I see many notes about the units s-us. Will this work with minutes or days too? Eg date_range and other sources

@alippai Can you give an example of what you have in mind? Currently, xarray stores datetimes as np.datetime64[ns] under the hood. This PR relaxes this restriction so that datetimes can also be represented as np.datetime64[s], np.datetime64[ms] and np.datetime64[us].
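What the relaxed restriction means at the numpy level, as a small illustration with plain numpy rather than xarray internals: the same instant can be held at any of the four resolutions, and dates outside the nanosecond range fit at coarser ones.

```python
import numpy as np

# The same instant held at four resolutions.
instant = "2000-01-01T12:00:00"
values = {unit: np.datetime64(instant, unit) for unit in ["s", "ms", "us", "ns"]}

# numpy converts to a common unit before comparing, so all four compare equal.
print(all(v == values["ns"] for v in values.values()))  # True

# Dates outside the nanosecond range (1677 to 2262) fit at second resolution.
early = np.array(["1500-01-01", "2500-12-31"], dtype="datetime64[s]")
print(early.dtype)  # datetime64[s]
```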

@kmuehlbauer (Contributor Author)

> We need to have a big note in whats-new.rst about behaviour changes.
>
> 1. The default for `pd.date_range` seems to have changed, and we now preserve the unit instead of casting. This will impact downstream code.
>
> 2. Has the default unit for `xr.date_range` changed? We'll need to add the `unit` kwarg with default `ns` and start warning about switching to `us` by default.
>
> 3. What is the impact on `polyfit` and `differentiate`? These are places where I (and many others) have manually rescaled from `ns` to other units dividing by `1e9`. We don't want to break this silently.

Thanks @dcherian for bringing this to attention. I think we can circumvent these behaviour changes. Let me try first.
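To make point 3 concrete: code that divides the raw int64 counts by `1e9` silently assumes nanoseconds, whereas dividing by a one-second `np.timedelta64` is unit-agnostic. This is a sketch of user-side rescaling with plain numpy; `elapsed_seconds` is a hypothetical helper and not how `polyfit` or `differentiate` are implemented.

```python
import numpy as np


def elapsed_seconds(times: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Seconds since the first element, computed two ways (hypothetical helper)."""
    delta = times - times[0]  # timedelta64 in the same unit as `times`
    # Fragile: assumes the underlying int64 counts are nanoseconds.
    hardcoded = delta.astype("int64") / 1e9
    # Unit-agnostic: dividing by a one-second timedelta lets numpy handle the unit.
    robust = delta / np.timedelta64(1, "s")
    return hardcoded, robust


for unit in ["ns", "s"]:
    times = np.array(["2000-01-01T00:00", "2000-01-02T00:00"], dtype=f"datetime64[{unit}]")
    hardcoded, robust = elapsed_seconds(times)
    print(unit, hardcoded, robust)
```

With `datetime64[s]` input, the hard-coded division yields a value 1e9 times too small, while the timedelta division gives 86400 seconds for both resolutions.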

@alippai commented Dec 13, 2024

>> I see many notes about the units s-us. Will this work with minutes or days too? Eg date_range and other sources
>
> @alippai Can you give an example what you have in mind? Currently xarray holds datetimes as np.datetime64[ns] under the hood. This PR relaxes this restriction so datetimes can be represented as np.datetime64[s], np.datetime64[ms] and np.datetime64[us], too.

Larger units, like np.datetime64[D] and [m] types.
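For context on that question, a plain-numpy illustration: numpy itself supports day ("D") and minute ("m") units, but since "s" is the lowest resolution pandas allows, such values would have to be cast to seconds, which is exact. Whether and where xarray performs this cast is not settled in this thread; the sketch only shows the numpy side.

```python
import numpy as np

# numpy supports coarser units such as days ("D") and minutes ("m") ...
days = np.array(["2000-01-01", "2000-03-01"], dtype="datetime64[D]")
minutes = np.array(["2000-01-01T00:30", "2000-01-01T01:45"], dtype="datetime64[m]")

# ... but the coarsest resolution pandas supports is seconds, so such values
# would need casting to "s". Casting to a finer unit loses no information.
days_s = days.astype("datetime64[s]")
minutes_s = minutes.astype("datetime64[s]")

print(days_s)     # ['2000-01-01T00:00:00' '2000-03-01T00:00:00']
print(minutes_s)  # ['2000-01-01T00:30:00' '2000-01-01T01:45:00']
```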

@kmuehlbauer (Contributor Author) commented Dec 17, 2024

@kmuehlbauer mentioned this pull request Dec 18, 2024
9 participants