pyCECT test

The CESM Ensemble Consistency Test (CECT) suite contains multiple tests that compare the results of a set of new (modified) CESM or MPAS-A simulations against an accepted ensemble. Each test designates an overall pass or fail. Current functionality in the CECT suite includes:

CESM Atmosphere Component (CAM):

CAM-ECT: examines yearly-average files from CAM
UF-CAM-ECT: examines history files from CAM

Both CAM-ECT and UF-CAM-ECT require a summary file generated by pyEnsSum.py. UF-CAM-ECT uses simulations nine time steps in length, while CAM-ECT uses yearly averages. We recommend always starting with the faster UF-CAM-ECT; CAM-ECT is typically used only in the case of an unexpected UF-CAM-ECT failure. Three simulation runs from the new test environment are recommended for both tests. More information is available in:

Daniel J. Milroy, Allison H. Baker, Dorit M. Hammerling, and Elizabeth R. Jessup, “Nine time steps: ultra-fast statistical consistency testing of the Community Earth System Model (pyCECT v3.0)”, Geoscientific Model Development, 11, pp. 697-711, 2018.

https://gmd.copernicus.org/articles/11/697/2018/

CESM Ocean Component (POP):

POP-ECT: examines monthly-average files from POP

POP-ECT requires a summary file generated by pyEnsSumPop.py and uses monthly output, typically from a single year. One simulation run from the new test environment is needed. More information is available in:

A.H. Baker, Y. Hu, D.M. Hammerling, Y. Tseng, X. Hu, X. Huang, F.O. Bryan, and G. Yang, “Evaluating Statistical Consistency in the Ocean Model Component of the Community Earth System Model (pyCECT v2.0).” Geoscientific Model Development, 9, pp. 2391-2406, 2016.

https://gmd.copernicus.org/articles/9/2391/2016/

MPAS Atmosphere Component:

MPAS-ECT: examines history files from MPAS-A

MPAS-ECT requires a summary file generated by pyEnsSumMPAS.py and uses short simulations, typically 18 time steps in length. Three simulation runs from the new test environment are recommended.

Manuscript in preparation.

To use pyCECT:

On NCAR’s Derecho machine:

Example scripts are given in test_uf_cam_ect.sh, test_pop_CECT.sh, and test_mpas_CECT.sh.

Modify as needed and submit the relevant script with one of:

qsub test_uf_cam_ect.sh
qsub test_pop_CECT.sh
qsub test_mpas_CECT.sh

Note that the Python environment is loaded in the script:

module load conda
conda activate npl

Otherwise, you need these packages (see requirements.txt):
- numpy
- scipy
- netcdf4
- mpi4py
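If you set up your own environment instead of loading npl, a quick check that the packages listed above are importable might look like the following minimal sketch. It assumes the import names numpy, scipy, netCDF4 (for the netcdf4 package), and mpi4py; the helper missing_modules is hypothetical and not part of pyCECT.

```python
import importlib.util

# Import names for the packages in requirements.txt (assumed).
# The netcdf4 package is imported as "netCDF4".
REQUIRED_MODULES = ["numpy", "scipy", "netCDF4", "mpi4py"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located (without importing them)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```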
To see all options (and defaults):

python pyCECT.py -h

Notes and examples:

Options for all CECT approaches:

Required:
- To specify the summary file (generated by pyEnsSum.py, pyEnsSumPop.py, or pyEnsSumMPAS.py):
  
  --sumfile ens.summary.nc
- To specify the directory path that contains the run(s) to be evaluated:
  
  --indir /glade/campaign/cisl/asap/pycect_sample_data/cam_c1.2.2.1/uf_cam_test_files
Optional:

For verbose information:

--verbose

CAM-ECT, UF-CAM-ECT, and MPAS-ECT specific options (with a summary file generated by pyEnsSum.py or pyEnsSumMPAS.py):
- Note that CAM-ECT/UF-CAM-ECT is the default test.
  
  For MPAS, you MUST add --mpas
- The parameters setting the pass/fail criteria are all set by default (i.e., sigMul, minPCFail, minRunFail, numRunFile, and nPC) but can be modified with command-line parameters if desired (see python pyCECT.py -h).
- If the specified indir contains more files than the number specified by
  
  --numRunFile <num>
  
  (default: 3), then <num> files will be chosen at random from that directory.
- The Ensemble Exhaustive Test (EET) is specified by
  
  --eet <num>
  
  This tool computes the failure rate over all combinations of <num> test runs taken <numRunFile> at a time. Therefore, when specifying --eet <num>, <num> must be greater than or equal to <numRunFile>.
- Please make sure that the timeslice you are comparing matches the one collected in the summary file. The default is 0, corresponding to the most recent CESM releases, but note that older versions of CESM output the initial time step at timeslice 0 and the annual average (CAM-ECT) or ninth time step (UF-CAM-ECT) as timeslice 1. The MPAS-ECT example given in test_mpas_CECT.sh, for instance, uses timeslice 3.
  
  --tslice <num>
- To modify the number of PCs (principal components) used for the test. (The default is typically recommended.)
  
  --nPC <num>
- To modify the number of standard deviations away from the mean for the acceptance region. (The default is typically recommended.)
  
  --sigMul <num>
- To enable printing a sorted list of variables that fall outside of the global mean ensemble distribution in the case of a passing result (on by default for a failure):
  
  --printStdMean
- To save a netCDF file (called savefile.nc) with the scores and standardized global means from the test runs, as well as ensemble information (this file can be helpful for further analysis in the case of a failure):
  
  --saveResults
- Example for CAM-ECT and UF-CAM-ECT
  
  (Here we are modifying nPC and sigMul as this is from an older version of CAM than the current default.)
python pyCECT.py --sumfile /glade/campaign/cisl/asap/pycect_sample_data/cam_c1.2.2.1/summary_files/uf.ens.c1.2.2.1_fc5.ne30.nc --indir /glade/campaign/cisl/asap/pycect_sample_data/cam_c1.2.2.1/uf_cam_test_files --tslice 1 --nPC 50 --sigMul 2.0
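To make the roles of sigMul, minPCFail, and minRunFail more concrete, here is a minimal sketch of one way such criteria can be combined. It assumes that a PC score fails when it lies more than sigMul ensemble standard deviations from the ensemble mean (taken as zero), that a PC is flagged when this happens in at least minRunFail of the test runs, and that the overall test fails when at least minPCFail PCs are flagged. The function pca_ect_result is hypothetical and is not pyCECT's implementation; consult python pyCECT.py -h and the Milroy et al. paper for the exact definitions.

```python
import numpy as np


def pca_ect_result(scores, ens_std, sig_mul, min_pc_fail, min_run_fail):
    """Return (passed, failing_pc_indices) for a set of test runs.

    scores:  shape (n_runs, n_pc), PC scores of the test runs.
    ens_std: shape (n_pc,), ensemble standard deviation of each PC score;
             the ensemble mean of each score is assumed to be zero.
    """
    scores = np.asarray(scores, dtype=float)
    ens_std = np.asarray(ens_std, dtype=float)
    # True where a run's score lies outside +/- sig_mul standard deviations.
    outside = np.abs(scores) > sig_mul * ens_std
    # Flag a PC if it falls outside the region in at least min_run_fail runs.
    failing_pcs = np.flatnonzero(outside.sum(axis=0) >= min_run_fail)
    # The overall test fails once at least min_pc_fail PCs are flagged.
    passed = failing_pcs.size < min_pc_fail
    return passed, failing_pcs


if __name__ == "__main__":
    # Three hypothetical runs, five PCs, unit ensemble spread.
    scores = [
        [2.5, 0.1, -2.2, 0.0, 3.0],
        [2.1, 0.3, -2.4, 0.2, 0.5],
        [0.4, -0.2, 0.1, 0.1, 2.6],
    ]
    passed, failing = pca_ect_result(scores, np.ones(5), 2.0, 3, 2)
    print("PASS" if passed else "FAIL", failing.tolist())  # FAIL [0, 2, 4]
```

In this made-up example, PCs 0, 2, and 4 each exceed two standard deviations in two of the three runs, so three PCs are flagged and the set fails.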

Example using EET (note that EET takes longer to run, especially for a large number of tests):

python pyCECT.py --sumfile /glade/campaign/cisl/asap/pycect_sample_data/cam_c1.2.2.1/summary_files/uf.ens.c1.2.2.1_fc5.ne30.nc --indir /glade/campaign/cisl/asap/pycect_sample_data/cam_c1.2.2.1/uf_cam_test_files --tslice 1 --eet 10 --nPC 50 --sigMul 2.0
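Under the reading that EET evaluates every combination of the <num> runs taken <numRunFile> at a time, --eet 10 with 3 runs per test corresponds to C(10, 3) = 120 individual tests, which is why EET runs longer. The minimal sketch below computes such a failure rate; run_test is a hypothetical stand-in for a single ECT evaluation, not a pyCECT function.

```python
from itertools import combinations
from math import comb


def eet_failure_rate(run_ids, num_run_file, run_test):
    """Return the fraction of num_run_file-sized subsets of run_ids that fail.

    run_test is a hypothetical caller-supplied function: it receives a
    tuple of run ids and returns True if that set of runs passes.
    """
    if len(run_ids) < num_run_file:
        raise ValueError("need at least num_run_file runs for EET")
    n_tests = comb(len(run_ids), num_run_file)
    failures = sum(
        1 for subset in combinations(run_ids, num_run_file) if not run_test(subset)
    )
    return failures / n_tests


if __name__ == "__main__":
    runs = [f"run{i}" for i in range(10)]
    # Hypothetical rule: any test that includes "run0" fails.
    rate = eet_failure_rate(runs, 3, lambda subset: "run0" not in subset)
    print(comb(len(runs), 3), rate)  # 120 tests; 36 include run0, so 0.3
```

The ValueError mirrors the requirement above that <num> be at least <numRunFile>.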

Example for MPAS-ECT

python pyCECT.py --sumfile /glade/campaign/cisl/asap/pycect_sample_data/mpas_a.v7.3/summary_files/mpas_sum.nc --indir /glade/campaign/cisl/asap/pycect_sample_data/mpas_a.v7.3/mpas_test_files --tslice 3 --mpas
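For any of the examples above, if --indir holds more files than --numRunFile, that many files are chosen at random. A minimal sketch of such a selection follows; the function names, the regular-file filter, and the optional seed are assumptions for illustration, not pyCECT's API.

```python
import os
import random


def list_run_files(indir):
    """Return the sorted names of regular files in indir."""
    return sorted(
        name for name in os.listdir(indir)
        if os.path.isfile(os.path.join(indir, name))
    )


def choose_run_files(candidates, num_run_file=3, seed=None):
    """Return all candidates if there are at most num_run_file of them,
    otherwise num_run_file of them chosen at random without replacement."""
    files = sorted(candidates)
    if len(files) <= num_run_file:
        return files
    rng = random.Random(seed)  # a fixed seed makes the choice reproducible
    return sorted(rng.sample(files, num_run_file))
```

A call such as choose_run_files(list_run_files(indir)) would return three file names when the directory holds more than three files.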

POP-ECT specific options (with a summary file generated by pyEnsSumPop.py):
- To use POP-ECT, you MUST add the following to enable this test (otherwise it will run UF-CAM-ECT/CAM-ECT):
--popens
- Be sure to use a POP-ECT summary file:
--sumfile /glade/campaign/cisl/asap/pycect_sample_data/pop_c2.0.b10/summary_files/pop.cesm2.0.b10.nc
- Directory path that contains the run(s) to be evaluated.
--indir /glade/campaign/cisl/asap/pycect_sample_data/pop_c2.0.b10/pop_test_files/C96
- The above directory may contain many POP history files that follow the standard
  CESM-POP naming convention. To specify which file or files you wish to test, simply specify the test case file prefix (like a wildcard expansion).
  - To compare against all months in year 2 from the input directory above:
  --input_glob C96.pop.000.pop.h.0002
  - To compare only against month 12 in year 1:
  --input_glob C96.pop.000.pop.h.0001-12
  - (Note: if --input_glob is not specified, all files in --indir will be compared.)
  - (Note: the recommendation is to just compare year 1, month 12)
- Be sure to specify the JSON file that lists the variables on which the test will be run:
--jsonfile pop_ensemble.json
- The parameters setting the pass/fail criteria are all set by default (i.e., pop_tol and pop_threshold) but may be modified:
  - To specify the test tolerance (the Z-score threshold):
  --pop_tol 3.0
  - To specify the POP threshold (the fraction of points that must satisfy the Z-score tolerance):
  --pop_threshold 0.9
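The --input_glob behavior described above amounts to keeping only the files whose names begin with the given prefix, much like the shell pattern PREFIX*. The minimal sketch below assumes that reading; select_pop_files is a hypothetical helper, and the file names are made up to follow the C96.pop.000.pop.h.YYYY-MM pattern used above.

```python
def select_pop_files(filenames, input_glob=None):
    """Return the sorted file names that start with input_glob.

    Matching a prefix this way is equivalent to the shell pattern
    input_glob + "*". With no input_glob, every file is kept.
    """
    if input_glob is None:
        return sorted(filenames)
    return sorted(name for name in filenames if name.startswith(input_glob))


if __name__ == "__main__":
    # Hypothetical directory listing following the naming pattern above.
    names = [
        "C96.pop.000.pop.h.0001-11.nc",
        "C96.pop.000.pop.h.0001-12.nc",
        "C96.pop.000.pop.h.0002-01.nc",
        "C96.pop.000.pop.h.0002-02.nc",
    ]
    print(select_pop_files(names, "C96.pop.000.pop.h.0002"))     # both year-2 files
    print(select_pop_files(names, "C96.pop.000.pop.h.0001-12"))  # month 12 of year 1
```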

Example:

python pyCECT.py --popens --sumfile /glade/campaign/cisl/asap/pycect_sample_data/pop_c2.0.b10/summary_files/pop.cesm2.0.b10.nc --indir /glade/campaign/cisl/asap/pycect_sample_data/pop_c2.0.b10/pop_test_files/C96 --jsonfile pop_ensemble.json --input_glob C96.pop.000.pop.h.0001-12
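The pop_tol and pop_threshold criteria can be illustrated for a single variable as follows. This minimal sketch assumes that the Z-score at each grid point is (test value - ensemble mean) / ensemble standard deviation, that a point satisfies the tolerance when |Z| <= pop_tol, that points with missing values or zero ensemble spread (e.g., land points) are skipped, and that a variable passes when the fraction of satisfying points is at least pop_threshold. The function name and masking rule are hypothetical; see the Baker et al. (2016) paper for the method itself.

```python
import numpy as np


def pop_variable_passes(test_field, ens_mean, ens_std, pop_tol, pop_threshold):
    """Return (passed, fraction_within_tol) for one variable.

    All three arrays must share a shape (e.g. one 2D ocean field).
    """
    test_field = np.asarray(test_field, dtype=float)
    ens_mean = np.asarray(ens_mean, dtype=float)
    ens_std = np.asarray(ens_std, dtype=float)
    # Skip points with missing data or no ensemble spread (e.g. land);
    # this masking rule is an assumption made for the sketch.
    valid = (
        np.isfinite(test_field) & np.isfinite(ens_mean)
        & np.isfinite(ens_std) & (ens_std > 0)
    )
    if not valid.any():
        raise ValueError("no valid grid points to compare")
    z = (test_field[valid] - ens_mean[valid]) / ens_std[valid]
    frac_ok = float(np.mean(np.abs(z) <= pop_tol))
    return frac_ok >= pop_threshold, frac_ok
```

With pop_tol 3.0 and pop_threshold 0.9 as in the options above, a field where 19 of 20 valid points lie within three ensemble standard deviations (fraction 0.95) would pass, while one with only 7 of 10 (fraction 0.7) would not.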