Add temporal bounds and center times for group_average() API #717

Draft
wants to merge 3 commits into base: main

Conversation

tomvothecoder
Collaborator

Description

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@tomvothecoder added the type: enhancement New enhancement request label Nov 22, 2024
@tomvothecoder self-assigned this Nov 22, 2024
@tomvothecoder force-pushed the feature/565-temporal-bnds branch from 12afd27 to 3bf227c November 22, 2024 22:11
@tomvothecoder
Collaborator Author

tomvothecoder commented Nov 22, 2024

@pochedls and @oliviermarti this PR should address this GH issue (same as this comment from @oliviermarti).

If you can check this branch out and try it that'd be great.

import numpy as np
import pandas as pd
import xarray as xr  # needed for xr.Dataset below
import xcdat as xc

# Create a dummy xarray dataset
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
data = np.random.rand(len(time))
dummy_ds = xr.Dataset({"dummy_var": (["time"], data)}, coords={"time": time})
dummy_ds["time"].encoding["calendar"] = "standard"
dummy_ds = dummy_ds.bounds.add_missing_bounds(axes=["T"])

ds_avg = dummy_ds.temporal.group_average("dummy_var", freq="month")

Before -- no time_bnds and time starts at the beginning of the averaged period

ds_avg.time

<xarray.DataArray 'time' (time: 24)> Size: 192B
array([cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 2, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 3, 1, 0, 0, 0, 0, has_year_zero=False),
		...
      dtype=object)
Coordinates:
  * time     (time) object 192B 2000-01-01 00:00:00 ... 2001-12-01 00:00:00
Attributes:
    bounds:   time_bnds

Result -- time is now centered using time_bnds

ds_avg.time

array([cftime.DatetimeGregorian(2000, 1, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 2, 15, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
		...
      dtype=object)
ds_avg.time_bnds

array([[cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 2, 1, 0, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2000, 2, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 3, 1, 0, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2000, 3, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 4, 1, 0, 0, 0, 0, has_year_zero=False)],
		...
      dtype=object)
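
For reference, the centered times above are consistent with taking the midpoint of each pair of bounds. The snippet below is only a minimal sketch of that arithmetic using the standard-library `datetime` in place of `cftime` objects; `center_time` is a hypothetical helper name, not the function this PR adds.

```python
from datetime import datetime


def center_time(lower, upper):
    """Return the midpoint of a (lower, upper) time interval.

    Hypothetical helper for illustration only; not part of xcdat.
    """
    return lower + (upper - lower) / 2


# Monthly bounds matching the first three rows of ds_avg.time_bnds above.
bounds = [
    (datetime(2000, 1, 1), datetime(2000, 2, 1)),
    (datetime(2000, 2, 1), datetime(2000, 3, 1)),
    (datetime(2000, 3, 1), datetime(2000, 4, 1)),
]

centers = [center_time(lo, hi) for lo, hi in bounds]
for c in centers:
    print(c)
```

February 2000 has 29 days, so its midpoint falls on the 15th at 12:00 rather than the 16th.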

@pochedls
Collaborator

@tomvothecoder – this is great – thanks for pushing this forward so quickly.

I think add_missing_bounds will work in most cases, but will fail for seasonal averages (and definitely custom seasons).

I think we'll need to collect the bounds for each group (e.g., group_bounds_array = [("2000-01-01 00:00", "2000-01-02 00:00"), ("2000-01-02 00:00", "2000-01-03 00:00"), ..., ("2000-01-31 00:00", "2000-02-01 00:00")]) and then take the min of the lower bounds and the max of the upper bounds (i.e., group_bnd = [np.min(group_bounds_array[:, 0]), np.max(group_bounds_array[:, 1])]).
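
A rough sketch of the min/max idea described above, assuming the per-timestep bounds are available as an `(n, 2)` NumPy `datetime64` array and that each timestep already has a group label. The function name `group_bounds` and the month-based labels are assumptions for illustration, not xcdat's API; seasonal or custom-season labels could be substituted for the month labels.

```python
import numpy as np

# Daily (lower, upper) bounds covering January and February 2000.
days = np.arange("2000-01-01", "2000-03-02", dtype="datetime64[D]")
bounds = np.stack([days[:-1], days[1:]], axis=1)  # shape (60, 2)

# One label per timestep; here the month of the lower bound (an assumption).
labels = bounds[:, 0].astype("datetime64[M]")


def group_bounds(bounds, labels):
    """Collapse per-timestep bounds into one (min lower, max upper) per group."""
    result = {}
    for label in np.unique(labels):
        members = bounds[labels == label]
        result[str(label)] = (members[:, 0].min(), members[:, 1].max())
    return result


result = group_bounds(bounds, labels)
for key, (lower, upper) in result.items():
    print(key, lower, upper)
```

Because the group bound is derived from the members' own bounds rather than regenerated from the averaged time axis, groups of unequal or irregular length are handled the same way as regular months.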

@tomvothecoder
Collaborator Author

I think we'll need to collect the bounds for each group (e.g., group_bounds_array = [("2000-01-01 00:00", "2000-01-02 00:00"), ("2000-01-02 00:00", "2000-01-03 00:00"), ..., ("2000-01-31 00:00", "2000-02-01 00:00")]) and then take the min of the lower bounds and the max of the upper bounds (i.e., group_bnd = [np.min(group_bounds_array[:, 0]), np.max(group_bounds_array[:, 1])]).

This makes sense to me. I'll think of an algorithm.

Labels
type: enhancement New enhancement request
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

[Feature]: Retain bounds and compute time point for group averaging operations
2 participants