infer_freq() doesn't recognize monthly output if the time dimension is the middle of each month #9877

mnlevy1981 · 2024-12-11T22:31:47Z

What happened?

CESM used to write the time dimension of its output files at the end of the averaging period, so for monthly output the following would hold:

January averages would have a time dimension of midnight on February 1
February averages would have a time dimension of midnight on March 1
etc

The version currently being developed uses the middle of the averaging period, so

January averages now have a time dimension of noon on January 16 (15.5 days into a 31 day month)
February averages now have a time dimension of midnight on February 15 (14 days into a 28 day month)
etc
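
The arithmetic behind those midpoints can be checked with a short standard-library sketch. It assumes a 365-day ("noleap") calendar, and the names `MONTH_LENGTHS` and `month_midpoints` are made up for this illustration; they are not part of CESM or xarray. It also lists the gaps between consecutive midpoints, which are not constant. That irregular spacing, together with the stamps not sitting on a month start or end, is presumably what keeps frequency inference from matching a monthly pattern, though that reading is an assumption and has not been checked against the xarray source.

```python
# Month lengths for a 365-day ("noleap") calendar, January through December.
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def month_midpoints(month_lengths):
    """Return (day_of_month, hour) for the midpoint of each month."""
    midpoints = []
    for length in month_lengths:
        half = length / 2              # days elapsed since 00:00 on day 1
        day_of_month = int(half) + 1   # day 1 starts at offset 0
        hour = int((half % 1) * 24)    # fractional day converted to hours
        midpoints.append((day_of_month, hour))
    return midpoints


midpoints = month_midpoints(MONTH_LENGTHS)

# Distance in days between consecutive mid-month stamps.
gaps = [(a + b) / 2 for a, b in zip(MONTH_LENGTHS, MONTH_LENGTHS[1:])]

print(midpoints[:4])
print(gaps)
```

January and March land on day 16 at 12:00, February on day 15 at 00:00, and April on day 16 at 00:00, matching the dates listed above.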

Some of our diagnostic packages (for example geocat-comp's climatology_average: https://geocat-comp.readthedocs.io/en/latest/user_api/generated/geocat.comp.climatologies.climatology_average.html) require uniformly spaced data and rely on xr.infer_freq() to enforce that. infer_freq() recognizes February 1, March 1, April 1, ... as monthly, but it does not recognize January 16 (12:00), February 15, March 16 (12:00), April 16, ... as monthly.

What did you expect to happen?

It would be great if infer_freq() could recognize a time dimension of monthly midpoints as having a monthly frequency.
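
In the meantime, a downstream package could apply its own looser test for "monthly" data. The sketch below is only one possible workaround, not anything xarray provides: the helper name `steps_one_month_at_a_time` is hypothetical, and it relies only on each time stamp exposing `year` and `month` attributes. Standard-library `datetime` objects are used here so the example runs without extra dependencies; the assumption is that cftime objects, which also carry `year` and `month`, could be passed in the same way.

```python
from datetime import datetime


def steps_one_month_at_a_time(times):
    """Return True if each stamp falls in the calendar month right after the previous one."""
    # Collapse each stamp to a running month count so that year boundaries
    # (December -> January) also count as a step of exactly one month.
    months = [t.year * 12 + (t.month - 1) for t in times]
    return len(months) > 1 and all(b - a == 1 for a, b in zip(months, months[1:]))


# Mid-month stamps like the ones described in this issue (2001 is not a leap year).
mid_month = [
    datetime(2001, 1, 16, 12),
    datetime(2001, 2, 15),
    datetime(2001, 3, 16, 12),
    datetime(2001, 4, 16),
]

# Same series with March missing, which should not count as monthly.
skips_march = [datetime(2001, 1, 16, 12), datetime(2001, 2, 15), datetime(2001, 4, 16)]

print(steps_one_month_at_a_time(mid_month))
print(steps_one_month_at_a_time(skips_march))
```

This check ignores where in the month each stamp falls, so it accepts start-of-month, mid-month, and end-of-month conventions alike. It would also accept irregular days within consecutive months, which may or may not be acceptable for a given diagnostic.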

Minimal Complete Verifiable Example

import numpy as np
import xarray as xr

# Day offsets of the month boundaries in a 365-day ("noleap") year
month_bounds = np.array([0., 31., 59., 90., 120., 151., 181., 212., 243., 273., 304., 334., 365.])
attrs = {'units': 'days since 0001-01-01 00:00:00', 'calendar': 'noleap'}

# Time stamps at the middle of each month
mid_month = xr.decode_cf(
    xr.DataArray(0.5 * (month_bounds[:-1] + month_bounds[1:]), attrs=attrs).to_dataset(name='time')
)['time']
# Time stamps at the end of each month (midnight on the first of the next month)
end_month = xr.decode_cf(
    xr.DataArray(month_bounds[1:], attrs=attrs).to_dataset(name='time')
)['time']

print(f'infer_freq(mid_month) = {xr.infer_freq(mid_month)}') # None
print(f'infer_freq(end_month) = {xr.infer_freq(end_month)}') # 'MS'

MVCE confirmation

Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
Complete example — the example is self-contained, including all data and the text of any traceback.
Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
New issue — a search of GitHub Issues suggests this is not a duplicate.
Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

>>> print(f'infer_freq(mid_month) = {xr.infer_freq(mid_month)}') # None
infer_freq(mid_month) = None
>>> print(f'infer_freq(end_month) = {xr.infer_freq(end_month)}') # 'MS'
infer_freq(end_month) = MS

Anything else we need to know?

I'm not familiar enough with xarray to be able to offer up a solution, but I figured logging the issue was a good first step. Sorry I can't do more!

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.8 | packaged by conda-forge | (main, Dec 5 2024, 14:24:40) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150400.24.18-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.11.0
pandas: 2.2.3
numpy: 2.2.0
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.6.0
pip: 24.3.1
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

mnlevy1981 added the "bug" and "needs triage" labels on Dec 11, 2024
