Change history time to be equal to the middle of the time bounds #2838

Open
wants to merge 11 commits into
base: master

Conversation

slevis-lmwg
Contributor

@slevis-lmwg slevis-lmwg commented Oct 18, 2024

Description of changes

This PR covers a subset of the scope of issue #1059 and PR #2445, following the October 2024 conversation in #2445.
This PR changes history time to be equal to the middle of the time bounds.
This PR does not put instantaneous fields on their own separate history files.
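
A minimal sketch of the idea, not the Fortran code touched by this PR: the time stamp of an averaging interval is placed at the midpoint of its two time bounds. The dates and the helper name `midpoint` are illustrative assumptions, and the "end of interval" stamp shown for comparison is inferred from the later "exact middle" vs. "end of" discussion in this thread.

```python
from datetime import datetime

def midpoint(lower: datetime, upper: datetime) -> datetime:
    """Return the instant halfway between two time bounds."""
    return lower + (upper - lower) / 2

# Hypothetical monthly averaging interval (January of a non-leap year).
lower = datetime(2001, 1, 1)
upper = datetime(2001, 2, 1)

old_time = upper                   # assumed earlier behavior: end of the interval
new_time = midpoint(lower, upper)  # behavior described here: middle of the interval

print(old_time, new_time)
```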

I will also bring submodule changes from ESCOMP/MOSART#106 (was ESCOMP/MOSART#69) and ESCOMP/RTM#39.

Specific notes

Contributors other than yourself, if any:

Are answers expected to change (and if so in what way)?
No.

Does this create a need to change or add documentation? Did you do so?
Maybe. No.

Testing performed, if any:
Plan to run aux_clm, mosart, rtm test-suites.

slevis-lmwg and others added 3 commits March 28, 2024 17:27
...and other mods that I'm preserving from closed PR ESCOMP#2019, such as
- changes to long_names and
- treating avgflag as a tape (not field) trait for 'I' and 'L' tapes
@slevis-lmwg slevis-lmwg self-assigned this Oct 18, 2024
@slevis-lmwg slevis-lmwg added enhancement new capability or improved behavior of existing capability bfb bit-for-bit size: small labels Oct 18, 2024
@slevis-lmwg
Contributor Author

I submitted this manual test to confirm that the committed modifications work as intended:
./create_test SMS_Lm1.f10_f10_mg37.I1850Clm60BgcCropCmip6waccm.derecho_gnu.clm-basic -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.009
Check on Friday.

@slevis-lmwg
Contributor Author

slevis-lmwg commented Oct 18, 2024

The previous test completed its 1 month and the monthly output looked good, but there were also annual history files that I could not evaluate. So I started another test (the default is Ly1, but I changed it to Ly2) and added hist_avgflag_pertape(6) = 'I' to see what happens:
SMS_Ly2_Mmpi-serial.1x1_brazil.IHistClm60BgcQianRs.derecho_intel.clm-output_bgc_highfreq

PASS

@slevis-lmwg
Contributor Author

slevis-lmwg commented Oct 18, 2024

I updated the submodules to point to ESCOMP/MOSART#106 and ESCOMP/RTM#39 and submitted the three corresponding test-suites:

OK ./run_sys_tests -s rtm -c rtm1_0_80-ctsm5.2.029 --skip-generate
OK ./run_sys_tests -s mosart -c mosart1.1.02-ctsm5.2.029 --skip-generate
OK ./run_sys_tests -s aux_clm -c ctsm5.3.009 --skip-generate

All the cases that differ from the baseline differ only in the time variable.

UPDATE
Repeating the rtm and mosart test-suites with the suggested code modification:
ESCOMP/MOSART#106 (comment)

@slevis-lmwg slevis-lmwg marked this pull request as ready for review October 23, 2024 21:11
@slevis-lmwg slevis-lmwg added the external issue needs to be addressed elsewhere (submodule); issue here for the sake of project tracking label Oct 24, 2024
@wwieder wwieder removed the bfb bit-for-bit label Oct 24, 2024
@slevis-lmwg slevis-lmwg added PR status: ready PR: this is ready to merge in, with all tests satisfactory and reviews complete PR status: awaiting review Work on this PR is paused while waiting for review. labels Oct 24, 2024
@slevis-lmwg
Contributor Author

@ekluzek and I agreed on the order in which the "hist" PRs would get merged. The order as shown in Upcoming Tags is
#2838
#2084
#2052
...and we will follow the same order in mosart/rtm. Before I merge the mosart/rtm "hist" PRs, @ekluzek will merge the work in the "simple bfb" mosart/rtm cards.

slevis-lmwg added a commit to olyson/MOSART that referenced this pull request Nov 11, 2024
time in hist now equals the middle of the time_bounds

MOSART equivalent to CTSM work done in ESCOMP/CTSM#2838
Answers change only for the time variable.

slevis resolved conflicts:
src/riverroute/RtmHistFile.F90
src/riverroute/RtmTimeManager.F90
slevis-lmwg added a commit to olyson/RTM that referenced this pull request Nov 11, 2024
time in hist now equals the middle of the time_bounds

RTM equivalent to CTSM work done in ESCOMP/CTSM#2838
Answers change only for the time variable.
@slevis-lmwg slevis-lmwg requested a review from ekluzek November 12, 2024 22:47
@slevis-lmwg
Contributor Author

@ekluzek review and approval of this PR should take 5 minutes, as it looks the same as the corresponding RTM and MOSART PRs that you reviewed/approved. Thanks :-)

@slevis-lmwg
Contributor Author

slevis-lmwg commented Nov 12, 2024

TODOs left for me:

  • Update to ctsm5.3.012
  • Update .gitmodules
  • mosart/rtm test suites
  • Run aux_clm
  • Merge and make tag

@slevis-lmwg
Contributor Author

slevis-lmwg commented Nov 13, 2024

izumi testing
OK ./run_sys_tests -s aux_clm -c ctsm5.3.012 -g ctsm5.3.013
OK ./run_sys_tests -s mosart -c mosart1.1.04_ctsm5.3.009 -g mosart1.1.04_ctsm5.3.013

derecho testing
OK ./run_sys_tests -s rtm -c rtm1_0_82-ctsm5.3.009 -g rtm1_0_82-ctsm5.3.013
OK ./run_sys_tests -s mosart -c mosart1.1.04-ctsm5.3.009 -g mosart1.1.04-ctsm5.3.013
FAIL ./run_sys_tests -s aux_clm -c ctsm5.3.012 -g ctsm5.3.013
RXCROPMATURITYSKIPGEN_Ld1097.f10_f10_mg37.IHistClm60BgcCrop.derecho_intel.clm-cropMonthOutput
with a conda error. Troubleshooting with @samsrabin

Also I'm getting diffs in the cpl and mosart output of two tests BUT both are 3-yr tests:

ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.cpl.hi.1853-01-01-00000.nc.cprnc.out: RMS rofImp_Forr_rofi_glc             5.2778E-06            NORMALIZED  5.5871E+02
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.cpl.hi.1853-01-01-00000.nc.cprnc.out: RMS rofImp_Forr_rofl_glc             7.6124E-12            NORMALIZED  7.1083E+02
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.mosart.h0.1852-12.nc.cprnc.out: RMS DIRECT_DISCHARGE_TO_OCEAN_GLC_IC 2.7946E+00            NORMALIZED  5.2264E+02
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.mosart.h0.1852-12.nc.cprnc.out: RMS DIRECT_DISCHARGE_TO_OCEAN_GLC_LI 8.5847E-06            NORMALIZED  7.1784E+02
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.mosart.h0.1852-12.nc.cprnc.out: RMS QGLC_ICE_INPUT                   2.0309E+00            NORMALIZED  3.7981E+02
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.mosart.h0.1852-12.nc.cprnc.out: RMS QGLC_LIQ_INPUT                   8.2673E-06            NORMALIZED  6.9130E+02
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.mosart.h0.1852-12.nc.cprnc.out: RMS TOTAL_DISCHARGE_TO_OCEAN_ICE     2.7946E+00            NORMALIZED  1.4117E+01
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.mosart.h0.1852-12.nc.cprnc.out: RMS TOTAL_DISCHARGE_TO_OCEAN_LIQ     8.5847E-06            NORMALIZED  2.3013E-06

SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.cpl.hi.0004-02-01-00000.nc.cprnc.out: RMS rofImp_Forr_rofi_glc             3.2518E-06            NORMALIZED  4.0968E+02
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.cpl.hi.0004-02-01-00000.nc.cprnc.out: RMS rofImp_Forr_rofl_glc             3.9452E-11            NORMALIZED  7.5332E+02
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.mosart.h0.0004-01.nc.cprnc.out: RMS DIRECT_DISCHARGE_TO_OCEAN_GLC_IC 1.8679E+00            NORMALIZED  3.8169E+02
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.mosart.h0.0004-01.nc.cprnc.out: RMS DIRECT_DISCHARGE_TO_OCEAN_GLC_LI 4.4943E-05            NORMALIZED  7.8313E+02
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.mosart.h0.0004-01.nc.cprnc.out: RMS QGLC_ICE_INPUT                   1.5340E+00            NORMALIZED  3.1346E+02
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.mosart.h0.0004-01.nc.cprnc.out: RMS QGLC_LIQ_INPUT                   4.4188E-05            NORMALIZED  7.6998E+02
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.mosart.h0.0004-01.nc.cprnc.out: RMS TOTAL_DISCHARGE_TO_OCEAN_ICE     1.8679E+00            NORMALIZED  7.2051E+00
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.mosart.h0.0004-01.nc.cprnc.out: RMS TOTAL_DISCHARGE_TO_OCEAN_LIQ     4.4943E-05            NORMALIZED  1.5745E-05

@slevis-lmwg
Contributor Author

slevis-lmwg commented Nov 14, 2024

The diffs of the two tests above seem vaguely related to the earlier update to mosart1.1.02 (ESCOMP/MOSART#94), but Adrianna's test passed just fine pointing to mosart1.1.02. So I will try the two tests pointing to mosart1.1.02 and mosart1.1.03:

./create_test SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.012
DIFF in mosart1.1.03
OK in mosart1.1.02

The same test from ctsm5.3.012: PASS

DID THESE TESTS EXIST WHEN I LAST RAN aux_clm? Yes (ctsm5.3.009). So now I checked out 1e81456 from above, pointed to mosart1.1.04/rtm1_0_82, and submitted:
DIFF ./create_test SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.009
which tells me that the diffs were there, and I didn't notice the first time I ran aux_clm.
BUT the same test pointing to mosart1.1.02 passes.

@slevis-lmwg
Contributor Author

I brainstormed for a bit with @billsacks and Bill pointed out/suggested:

  • these two are the only long tests with active cism
  • the changes appear in the coupler due to the changes in mosart and not due to changes in ctsm
  • that he would not expect these diffs (as I also didn't), so he would recommend going through a methodical way of testing, making baselines, and updating the code and the baselines to confirm whether I still get these diffs

@slevis-lmwg
Contributor Author

@billsacks I mentioned to you a vague memory I had of an issue that could relate to these diffs, and it is this one:
#2542

@billsacks
Member

I mentioned to you a vague memory I had of an issue that could relate to these diffs, and it is this one:
#2542

Ah, yes. But I don't think that should be the issue with these tests, right?

@slevis-lmwg
Contributor Author

Right, I don't think so. I think I have now found that the problem starts with the introduction of mosart1.1.03. I see this in a new test today and in my testing from yesterday (somehow I missed the sign when I looked originally). First I will confirm beyond doubt and then I will try bisecting mosart1.1.03 to find the culprit.

src/main/histFileMod.F90 Outdated
@slevis-lmwg
Contributor Author

slevis-lmwg commented Nov 14, 2024

then I will try bisecting mosart1.1.03 to find the culprit.

Looking at ESCOMP/MOSART#70, I have converged on two commits:
We removed two lines: 7749459
Instead of removing the two lines, we added an if-statement that is .false.: 692d183

I have confirmed that the two lines that we removed caused the diffs.
@ekluzek I will check with you how to resolve this.

My first guess: Keep the if-statement but need changes elsewhere to make the if-statement be true as suggested in ESCOMP/MOSART#103

@slevis-lmwg
Contributor Author

slevis-lmwg commented Nov 15, 2024

Trying a case with cism NOT active and the if-statement still commented out (as in my last test)
PASS ./create_test SMS_D.f10_f10_mg37.I2000Clm60Bgc.derecho_intel -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.012

Submitted ./run_sys_tests -s aux_clm -c ctsm5.3.012 -g ctsm5.3.013
with the if-statement still commented out, to take advantage of the computer overnight.
FAIL RXCROPMATURITYSKIPGEN_Ld1097.f10_f10_mg37.IHistClm60BgcCrop.derecho_intel.clm-cropMonthOutput
Troubleshooting with @samsrabin

@slevis-lmwg
Contributor Author

slevis-lmwg commented Nov 15, 2024

Rerunning this test to generate a baseline, but stuck in the SHAREDLIB_BUILD phase, so I will kill it and try again next week. Besides, I will need to generate a ctsm5.3.014 baseline, so rerunning right now is redundant:
./create_test RXCROPMATURITYSKIPGEN_Ld1097.f10_f10_mg37.IHistClm60BgcCrop.derecho_intel.clm-cropMonthOutput -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.012 -g /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.012_hist_time_mid

@samsrabin
Collaborator

I'm wondering if the "treat a file as instantaneous if its first variable is" might be premature to bring in here instead of #2445. Specifically, I think the time_bounds variable should for now be saved no matter what. Not having it messed up the RXCROPMATURITY test, and though I was able to work around it, others might not be.

@slevis-lmwg
Contributor Author

[...] Specifically, I think the time_bounds variable should for now be saved no matter what. Not having it messed up the RXCROPMATURITY test, and though I was able to work around it, others might not be.

@olyson what do you think about @samsrabin's comment? Most concerning to me would be any vulnerability in the land diagnostic package.

@olyson
Contributor

olyson commented Nov 21, 2024

I don't think the land diagnostics package uses time_bounds. But I may not understand the issue here. Seems like we could do a short simulation using this branch once stable and see if there are any problems? Also should check ILAMB. So maybe an I2000 case for a test.

@samsrabin
Collaborator

samsrabin commented Nov 21, 2024

@olyson Okay, good that the land diagnostics package probably doesn't use it, but yes would be nice to check. However, I'm thinking that other people's scripts might rely on the presence of time_bounds, as mine did. Removing it at this stage seems premature because it is still possible to have both instantaneous and average/etc. variables on the same file. Users might wonder (as I did) why time_bounds disappeared just because they happened to put an instantaneous variable first in the hist_fincl list.

@samsrabin
Collaborator

And actually, this comment goes for the "exact middle" vs. "end of" change as well. It seems arbitrary (and against the "principle of least astonishment") that the first variable in the hist_fincl list should affect this. My understanding was that this would be changed to "exact middle" for all history files for now, and we would just accept that being wrong for instantaneous variables.

@samsrabin
Collaborator

A bonus from what I'm proposing: Always (a) including time_bounds and (b) setting time to the exact middle means that people always have what they need for postprocessing either instantaneous variables (just look at the second value in time_bounds) or averaged/etc. variables (either both values in time_bounds or the value in time).
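
As a rough illustration of that postprocessing logic, the sketch below picks a time stamp per variable type from time_bounds alone. The values (days since an arbitrary reference date) and the helper names are hypothetical, and reading real history files is left out.

```python
# Hypothetical time_bounds pairs (lower, upper) for three monthly records.
time_bounds = [(0.0, 31.0), (31.0, 59.0), (59.0, 90.0)]

def instantaneous_times(bounds):
    """Instantaneous variables: use the second (upper) value of each pair."""
    return [upper for _lower, upper in bounds]

def averaged_times(bounds):
    """Averaged variables: use the exact middle of each pair."""
    return [0.5 * (lower + upper) for lower, upper in bounds]

print(instantaneous_times(time_bounds))
print(averaged_times(time_bounds))
```

If time is always written as the exact middle, the result of `averaged_times` would simply match the time variable already on the file.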

@olyson
Contributor

olyson commented Nov 21, 2024

The standard diagnostics package doesn't use time_bounds as far as I can tell. ILAMB may, it just needs to be tested. Always including time_bounds sounds fine to me.

@ekluzek
Collaborator

ekluzek commented Nov 21, 2024

time_bounds is an expected part of the CF convention, so I do endorse using it for anything with time. That's likely why some tools might expect it to be there.

For instantaneous output, it should likely be the time bounds of the time-step that was output: the previous time-step's time as the start and the ending time-step's time as the endpoint. You could set both to the same ending time-step, but that wouldn't show that it is a model with a finite time-step.

Here's information on the CF Convention attributes. Look up "bounds"...

https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#attribute-appendix

@samsrabin
Collaborator

@ekluzek That link contains the following text, which implies that instantaneous files should actually not have time_bounds:

It is often the case that data values are not representative of single points in time and/or space, but rather of intervals or multidimensional cells. This convention defines a bounds attribute to specify the extent of intervals or cells.

@ekluzek
Collaborator

ekluzek commented Nov 21, 2024

Good point @samsrabin. That convinces me we should remove it for I fields then. That would then be in line with the convention.

We don't similarly report on the grid cell bounds either (which could be done for 2D grids, but would be harder for unstructured grids), so we shouldn't for Instantaneous time fields either.

@billsacks
Member

Earlier discussion led to the conclusion that we should not have time_bounds in instantaneous files (though CAM's mistakenly still does): ESCOMP/CAM#1166

@slevis-lmwg
Contributor Author

slevis-lmwg commented Nov 21, 2024

I had a quick meeting with @samsrabin 40 minutes ago:
I proposed and he agreed to an alternate version of the if-statement that rightly concerned him. The alternate version eliminates the risk of wrongly labeling a tape "instantaneous" just because the first field is instantaneous.
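
The exact form of the alternate if-statement is not spelled out in this thread. Purely as an illustration of the concern, the sketch below contrasts a "first field decides" rule with one hypothetical stricter rule that requires every field on the tape to be instantaneous. Both function names and the stricter rule itself are assumptions, not the Fortran that was committed.

```python
def first_field_rule(avgflags):
    """Rule that raised the concern: only the first field's flag is checked."""
    return bool(avgflags) and avgflags[0] == "I"

def all_fields_rule(avgflags):
    """Hypothetical stricter rule: every field must be instantaneous ('I')."""
    return bool(avgflags) and all(flag == "I" for flag in avgflags)

# A tape whose first field is instantaneous but whose other fields are averaged.
mixed_tape = ["I", "A", "A"]
pure_tape = ["I", "I"]

print(first_field_rule(mixed_tape), all_fields_rule(mixed_tape))
print(first_field_rule(pure_tape), all_fields_rule(pure_tape))
```

Under the stricter rule the mixed tape is no longer labeled instantaneous, so the order of fields in the hist_fincl list would not matter.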

Oh, good, looks as though we're removing time_bounds from instantaneous tapes, as originally planned :-)

Ok, so I'm testing the alternate if-statement with aux_clm right now and then I will push it to the PR.

  • I must still make the same change to the corresponding rtm/mosart file.

How I'm checking whether diffs are expected:

./cs.status.fails | grep -v PASS | grep -v 'wise bit-for' | grep -v 'd_1: DIF'
grep NORM */*cprnc.out | grep -v time | grep -v 'ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.derecho_intel.clm-FatesColdTwoStream' | grep -v 'P_P64x2_Lm13.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings'
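
For readers who prefer a script, the second command above could be approximated in Python roughly as follows. The function name and the sample lines are made up for illustration; like `grep -v time`, the filter drops any line containing the substring "time".

```python
def unexpected_diffs(lines, ignored_tests=()):
    """Keep lines containing 'NORM' that mention neither 'time' nor an ignored test."""
    kept = []
    for line in lines:
        if "NORM" not in line:
            continue  # not a normalized-RMS diff line
        if "time" in line:
            continue  # diffs in the time variable are expected
        if any(test in line for test in ignored_tests):
            continue  # tests already known to differ
        kept.append(line)
    return kept

# Hypothetical, shortened stand-ins for lines from */*cprnc.out files.
sample = [
    "caseA/file.cprnc.out: RMS time 1.0E+00 NORMALIZED 1.0E+00",
    "caseA/file.cprnc.out: RMS QGLC_ICE_INPUT 2.0E+00 NORMALIZED 3.0E+02",
    "caseB/file.cprnc.out: RMS TLAI 1.0E-03 NORMALIZED 1.0E-01",
    "caseA/file.cprnc.out: some other line",
]

for line in unexpected_diffs(sample, ignored_tests=("caseB",)):
    print(line)
```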

OK aux_clm except failure reported in the following post.

@slevis-lmwg
Contributor Author

@samsrabin back to this test
/glade/derecho/scratch/slevis/tests_1121-124652de/RXCROPMATURITYSKIPGEN_Ld1097.f10_f10_mg37.IHistClm60BgcCrop.derecho_intel.clm-cropMonthOutput.GC.1121-124652de_int.gddgen
I get a new error. The problem and solution are not obvious to me from a quick look, but I'm happy to look together if you want.

@slevis-lmwg
Contributor Author

@samsrabin please hold off on my last post for now, because in my aux_clm tests with the two subsequent PRs, which included this one, the test passed.

@samsrabin samsrabin removed the PR status: ready PR: this is ready to merge in, with all tests satisfactory and reviews complete label Dec 5, 2024
Labels
enhancement new capability or improved behavior of existing capability external issue needs to be addressed elsewhere (submodule); issue here for the sake of project tracking PR status: awaiting review Work on this PR is paused while waiting for review. size: small
Projects
Status: In progress - master/b4b-dev
Status: Done

6 participants