icclim.index()#

icclim exposes a main entry point, icclim.index(). It can be used to compute ECA&D indices, DCSC indices or generic indices. It accepts numerous arguments, but only a few are needed to compute simple indices. Our How to recipes are also a good place to start to get an idea of how to use icclim.index.

Note

Since version 5.2.0, the icclim API also includes each individual index as a standalone function. Check ECA&D indices to see how to call them.

Compute climate indices#

icclim.index(**kwargs)[source]

Compute climate index.

This is the main entry point for icclim.

Warning

The user_index parameter is deprecated. Please use the generic indices API instead.

Parameters:

in_files (str | list[str] | Dataset | DataArray | InputDictionary) – Absolute path(s) to NetCDF dataset(s), including OPeNDAP URLs, or path to zarr store, or xarray.Dataset or xarray.DataArray.
index_name (str | StandardIndex) – Climate index name. For ECA&D index, case insensitive name used to lookup the index. For user index, it’s the name of the output variable.
var_name (str | list[str] | None) – optional Target variable name to process corresponding to in_files. If None (default) on ECA&D index, the variable is guessed based on the climate index wanted. Mandatory for a user index.
slice_mode (FrequencyLike | Frequency) – Type of temporal aggregation: The possibles values are {"year", "month", "DJF", "MAM", "JJA", "SON", "ONDJFM" or "AMJJAS", ("season", [1,2,3]), ("month", [1,2,3,])} (where season and month lists can be customized) or any valid pandas frequency. A season can also be defined between two exact dates: ("season", ("19 july", "14 august")). Spatially varying seasons can be defined by providing a tuple of two xarray.DataArray objects (start day-of-year and end day-of-year): (start_da, end_da). Default is “year”. See slice_mode for details.
time_range (list[datetime.datetime] | list[str] | tuple[str, str] | None) – optional Temporal range: lower and upper bounds for temporal subsetting. If None, the whole period of the input files will be processed. The dates can be given either as instances of datetime.datetime or as string values. For strings, many formats are accepted. Default is None.
out_file (str | None) – Output NetCDF file name. Default is “icclim_out.nc” in the current directory. If in_files is a Dataset, out_file is ignored; use the value returned by the function to retrieve the computed index instead. If out_file already exists, icclim will overwrite it!
threshold (float | list[float] | None) – optional User defined threshold for certain indices. The default depends on the index, see their individual definitions. When a list of thresholds is provided, the index will be computed for each threshold.
transfer_limit_Mbytes (float) – Deprecated, does not have any effect.
callback (Callable[[int], None]) – optional Progress bar printing. If None, progress bar will not be printed.
callback_percentage_start_value (int) – optional Initial value of percentage of the progress bar (default: 0).
callback_percentage_total (int) – optional Total percentage value (default: 100).
base_period_time_range (list[datetime.datetime] | list[str] | tuple[str, str] | None) – optional Temporal range of the reference period. The dates can be given either as instances of datetime.datetime or as string values. It is used either: (1) to compute percentiles if threshold is filled. When missing, the studied period is used to compute percentiles. The study period is either the dataset filtered by time_range or the whole dataset if time_range is missing. For day of year percentiles (doy_per), on extreme percentiles the overlapping period between base_period_time_range and the study period is bootstrapped. (2) to compute a reference period for indices such as difference_of_mean (a.k.a anomaly) if a single variable is given in input.
doy_window_width (int) – optional Window width used to aggregate day of year values when computing day of year percentiles (doy_per). Default: 5 (5 days).
min_spell_length (int) – optional Minimum spell duration to be taken into account when computing the sum_of_spell_lengths.
rolling_window_width (int) – optional Window width of the rolling window for indicators such as {max_of_rolling_sum, max_of_rolling_average, min_of_rolling_sum, min_of_rolling_average}
run_index (str | None) – optional The index to use for the run length encoding (e.g. “first”, “last”, “mid”). Default is “first”. Ignored for non spell indices.
only_leap_years (bool) – optional Option for February 29th (default: False).
ignore_Feb29th (bool) – optional Ignoring or not February 29th (default: False).
interpolation (str | QuantileInterpolation | None) – optional Interpolation method to compute percentile values: {"linear", "median_unbiased"} Default is “median_unbiased”, a.k.a type 8 or method 8. Ignored for non percentile based indices.
out_unit (str | None) – optional Output unit for certain indices: “days” or “%” (default: “days”).
netcdf_version (str | NetcdfVersion) – optional NetCDF version to create (default: “NETCDF3_CLASSIC”).
user_index (UserIndexDict) – optional A dictionary with parameters for user defined index. Deprecated, use generic indices instead. See Create your own index with user_index. Ignored for ECA&D indices.
save_thresholds (bool) – optional True if the thresholds should be saved within the resulting netcdf file (default: False).
date_event (bool) – When True, the date of the event (such as when a maximum is reached) will be stored in coordinate variables. Warning: this option may significantly slow down computation.
logs_verbosity (str | Verbosity) – optional Configure how verbose icclim is. Possible values: {"LOW", "HIGH", "SILENT"} (default: “LOW”)
sampling_method (str) – Choose whether the output sampling configured in slice_mode is a groupby operation or a resample operation (as per xarray definitions). Possible values: {"groupby", "resample", "groupby_ref_and_resample_study"} (default: “resample”) groupby_ref_and_resample_study may only be used when computing the difference_of_means (a.k.a the anomaly).
indice_name (str | None) – DEPRECATED, use index_name instead.
user_indice (dict | None) – DEPRECATED, use user_index instead.
window_width (int) – DEPRECATED, use doy_window_width, min_spell_length or rolling_window_width instead.
save_percentile (bool) – DEPRECATED, use save_thresholds instead.
allow_partial_seasons (bool | "start" | "end") – Flag indicating whether to allow partial seasons to be included in the index calculation. Possible values: True (unmasks both the first and last periods), False (masks any incomplete periods, the standard behavior), “start” (unmasks only the first period), “end” (unmasks only the last period). Default is False.

Examples

Compute Summer Days (SU) from an in-memory xarray DataArray:

>>> import numpy as np, pandas as pd, xarray as xr, icclim
>>> time = pd.date_range("2000-01-01", periods=365, freq="D")
>>> tasmax = xr.DataArray(
...     np.full(365, 303.15),
...     coords={"time": time},
...     dims=["time"],
...     attrs={"units": "K"},
... )
>>> result = icclim.index(in_files=tasmax, index_name="SU", var_name="tasmax")
>>> int(result["SU"].isel(time=0).values)
365

Compute a generic index with spatially varying seasons:

>>> import numpy as np, pandas as pd, xarray as xr, icclim
>>> time = pd.date_range("2000-01-01", periods=366, freq="D")
>>> tas = xr.DataArray(
...     np.full((366, 1, 2), 300.0),
...     coords={"time": time, "lat": [45], "lon": [5, 10]},
...     dims=["time", "lat", "lon"],
...     attrs={"units": "K"},
... )
>>> # Pixel 1: season is JJA (doy 153 to 244 in leap year)
>>> # Pixel 2: season is SON (doy 245 to 335 in leap year)
>>> start = xr.DataArray(
...     [[153, 245]], dims=["lat", "lon"], coords={"lat": [45], "lon": [5, 10]}
... )
>>> end = xr.DataArray(
...     [[244, 335]], dims=["lat", "lon"], coords={"lat": [45], "lon": [5, 10]}
... )
>>> result = icclim.index(
...     in_files=tas,
...     index_name="TG",
...     slice_mode=(start, end),
... )
>>> # TG is the mean temperature over the season.
>>> # Since all values are 300K (26.85°C), the result should be ~26.85
>>> round(float(result["TG"].isel(time=0, lat=0, lon=0).values), 2)
26.85

Compute an index with an incomplete season at the end (Hellmann style):

>>> import numpy as np, pandas as pd, xarray as xr, icclim
>>> time = pd.date_range("2020-01-01", "2021-12-31", freq="D")
>>> tas = xr.DataArray(
...     np.full(len(time), 30.0),
...     coords={"time": time},
...     dims=["time"],
...     attrs={"units": "degC"},
... )
>>> slice_mode = ("season", ("1 november", "31 march"))
>>> result = icclim.index(
...     in_files=tas,
...     index_name="SU",
...     slice_mode=slice_mode,
...     allow_partial_seasons=True,
... )
>>> # The last season (2021-11-01 to 2022-03-31) is partial (61 days in 2021)
>>> int(result["SU"].values[-1])
61

Note

For the variable names see the correspondence table “index - source variable”

Below is some additional information about the input parameters.

`in_files` and `var_name`#

The in_files parameter can be

A string path to a netCDF file or a zarr store
A list of strings to represent multiple netCDF files to combine
A xarray.Dataset
A xarray.DataArray
A python dictionary

var_name is an optional string, or a list of strings, that specifies which variables must be used from the input in_files. var_name can be omitted if the variable names can be guessed for standard indices.

For example, the Cold and Wet (CW()) index needs a daily mean temperature variable named ‘tas’ (or any of its aliases) and a precipitation variable named ‘pr’ (or any of its aliases). See StandardVariable for a list of all the aliases of each standardized variable. So, if ‘mean_temperatures.nc’ contains a tas variable and ‘precipitations.nc’ a pr variable, the following is sufficient to compute CW.

icclim.cw(in_files=["mean_temperatures.nc", "precipitations.nc"]).CW.compute()

In case variables’ name cannot be guessed, you can explicitly name the variable you wish to read from the input file:

icclim.cw(in_files={"custom_tas": "mean_temperatures.nc", "pr": "precipitations.nc"})

# equivalent to
icclim.cw(in_files=["mean_temperatures.nc", "precipitations.nc"], var_name=["custom_tas", "pr"])

The order in which variables are passed matters and must follow the input_variables property defined in the respective index of EcadIndexRegistry.

Starting with icclim 5.3, in_files can also carry the variable names, formerly set in var_name, by using a dictionary. The dictionary keys are variable names and the values are the usual in_files types (netCDF, zarr, Dataset, DataArray).

in_files = {"tasmax": "tasmax.nc", "pr": "precip.zarr"}

Moreover, this new dictionary syntax can be used to specify a different set of files for percentiles.

in_files = {"tasmax": "tasmax.nc", "thresholds": "tasmax-90p.zarr"}

The thresholds input should contain percentile thresholds that will be used instead of computing them. This makes it possible to reuse pre-computed percentiles stored in a file. Note, however, that these percentiles will not be bootstrapped.

Note

Percentiles can be saved with the save_thresholds parameter of icclim.index (save_percentile is deprecated).

`slice_mode`#

The slice_mode parameter defines a desired temporal aggregation. Thus, each index can be calculated at annual, winter half-year, summer half-year, winter, spring, summer, autumn and monthly frequency:

Value (string)	Description
`year` (default)	annual
`month`	monthly (all months)
`ONDJFM`	winter half-year
`AMJJAS`	summer half-year
`DJF`	winter
`MAM`	spring
`JJA`	summer
`SON`	autumn
`['month', [4,5,11]]`	monthly sampling filtered
`['season', [4,5,6]]`	seasonal (1 value per season)
`['clipped_season', [4,5,6]]`	seasonal (1 value per season) spells starting before season start are not accounted
`3W`	A valid pandas frequency (3 weeks here)

The winter season (DJF) of 2000 is composed of December 2000, January 2001 and February 2001.

Likewise, the winter half-year (ONDJFM) of 2000 includes October 2000, November 2000, December 2000, January 2001, February 2001 and March 2001.
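The year labelling described above can be sketched in plain Python. This is only an illustration of the convention, not icclim code: `season_months` is a hypothetical helper, and it assumes, as stated above, that a season is labelled by the year in which it starts.

```python
def season_months(start_year, months):
    """Return (year, month) pairs for a season labelled by its starting year.

    Hypothetical helper, not part of icclim. `months` is given in season
    order, e.g. [12, 1, 2] for DJF. The year is incremented whenever the
    month number wraps around (December followed by January).
    """
    pairs = []
    year = start_year
    previous = None
    for month in months:
        if previous is not None and month < previous:
            year += 1  # the season crosses into the next calendar year
        pairs.append((year, month))
        previous = month
    return pairs


print(season_months(2000, [12, 1, 2]))
# [(2000, 12), (2001, 1), (2001, 2)]
```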

Monthly time series filter#

Monthly time series with months selected by user (the keyword can be either month or months):

# index will be computed only for April, May and November
slice_mode = ["month", [4, 5, 11]]
# index will be computed only for April
slice_mode = ["month", [4]]

User defined seasons#

You can define either seasons that are aware of data outside their bounds (keyword season) or seasons that clip all data outside their bounds (keyword clipped_season).

The latter is most useful for indices computing spells, if you want to completely ignore spells that could have started before your custom season.

slice_mode = ["season", [4, 5, 6, 7]]  # April to July un-clipped
slice_mode = ["clipped_season", [4, 5, 6, 7]]  # April to July clipped

slice_mode = ["season", [11, 12, 1]]  # November to January un-clipped
slice_mode = ["clipped_season", [11, 12, 1]]  # November to January clipped

Additionally, you can define a season between two exact dates:

slice_mode = ["season", ["07-19", "08-14"]]

slice_mode = ["clipped_season", ["07-19", "08-14"]]

Spatially varying seasons#

For indices where the season changes across the grid (e.g., following a crop growth stage or a local climatic boundary), you can provide a tuple of two xarray.DataArray objects. These DataArrays must contain the start and end day-of-year for each grid point.

This feature is available for all indices in the generic framework.

# start_da and end_da are DataArrays with same spatial dimensions as input data
slice_mode = (start_da, end_da)

# You can also specify the resampling frequency (default is "YS")
slice_mode = ((start_da, end_da), "2YS")
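The day-of-year values that go into the start and end DataArrays can be derived from calendar dates with the standard library. The sketch below is an illustration only: `day_of_year` is a hypothetical helper, not an icclim function. It also shows that the same calendar date maps to a different day-of-year in leap and non-leap years, which is worth keeping in mind when choosing the bounds.

```python
from datetime import date


def day_of_year(year, month, day):
    """Return the 1-based day-of-year of a calendar date (hypothetical helper)."""
    return date(year, month, day).timetuple().tm_yday


# Leap year 2000: bounds of JJA and SON
print(day_of_year(2000, 6, 1), day_of_year(2000, 8, 31))   # 153 244
print(day_of_year(2000, 9, 1), day_of_year(2000, 11, 30))  # 245 335

# Non-leap year 2001: June 1st falls one day earlier in the year
print(day_of_year(2001, 6, 1))  # 152
```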

`allow_partial_seasons`#

The allow_partial_seasons parameter (default False) determines whether incomplete seasons at the very beginning or the very end of the time series should be included in the results.

By default, icclim masks seasons that do not have daily data for their entire duration (reporting them as NaN). Enabling this flag preserves these partial periods, which is useful for near-real-time monitoring or for replicating datasets like the KNMI Hellmann values.

# Include the ongoing season at the end of the data
icclim.index(
    in_files=ds,
    index_name="SU",
    slice_mode=("season", ("1 november", "31 march")),
    allow_partial_seasons=True
)
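To see which seasons are partial for a given dataset, one can compare the days covered by the data with the full length of each season. The following is a minimal standard-library sketch, not icclim's implementation; `season_coverage` is a hypothetical helper. It uses a dataset spanning 2020-01-01 to 2021-12-31 and a season running from 1 November to 31 March.

```python
from datetime import date


def season_coverage(data_start, data_end, season_start, season_end):
    """Return (days_available, days_in_season) for one season occurrence.

    Hypothetical helper: counts the days of the season that fall inside the
    period covered by the data, and the total length of the season.
    """
    first = max(data_start, season_start)
    last = min(data_end, season_end)
    available = max((last - first).days + 1, 0)
    total = (season_end - season_start).days + 1
    return available, total


data_start, data_end = date(2020, 1, 1), date(2021, 12, 31)

# First season (starts before the data): only January to March 2020 is covered
print(season_coverage(data_start, data_end, date(2019, 11, 1), date(2020, 3, 31)))
# (91, 152)

# Last season (ends after the data): only November and December 2021 are covered
print(season_coverage(data_start, data_end, date(2021, 11, 1), date(2022, 3, 31)))
# (61, 151)
```

A season is complete when both numbers are equal; the two seasons above are the ones affected by allow_partial_seasons.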

Note

Since 5.3.0, icclim accepts pandas frequency strings for slice_mode to resample the output data to a given frequency. Multiple combinations are possible, such as “2YS-FEB” to resample data over two (2) years (Y) starting (S) in February (FEB). For further information, refer to pandas offset aliases.

`threshold`#

It is possible to set a user-defined threshold for the following indices:

SU (default threshold: 25°C)
CSU (default threshold: 25°C)
TR (default threshold: 20°C)
CSDI (default 10th percentile)
WSDI (default 90th percentile)
TX90p (default 90th percentile)
TG90p (default 90th percentile)
TN90p (default 90th percentile)
TX10p (default 10th percentile)
TG10p (default 10th percentile)
TN10p (default 10th percentile)

The threshold could be one value:

threshold = 30

or a list of values:

threshold = [20, 25, 30]
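Conceptually, passing a list means that the index is evaluated once per threshold. The plain-Python sketch below illustrates this for a day count in the style of SU; it is not icclim code. `count_days_above` is a hypothetical helper, and the strict comparison (value > threshold) is an assumption of this sketch.

```python
def count_days_above(values_degc, thresholds):
    """Count days strictly above each threshold (hypothetical helper).

    `thresholds` may be a single number or a list of numbers, mirroring the
    two forms accepted by the threshold parameter.
    """
    if isinstance(thresholds, (int, float)):
        thresholds = [thresholds]
    return {t: sum(1 for v in values_degc if v > t) for t in thresholds}


tasmax = [18.0, 22.5, 26.0, 31.2, 25.0]  # daily maximum temperatures in degC
print(count_days_above(tasmax, 30))            # {30: 1}
print(count_days_above(tasmax, [20, 25, 30]))  # {20: 4, 25: 2, 30: 1}
```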

Note

Thresholds should be floats. The unit is expected to be degrees Celsius, or unit-less for percentiles.

`transfer_limit_Mbytes`#

!Deprecated

transfer_limit_Mbytes is now ignored and will be removed in a future version. See how to chunk data and parallelize computation to configure dask chunking.

`callback`#

!Deprecated

callback can be used to output an estimated progress of the computation. However, when using dask, the computations are done lazily at the very end of icclim’s process. Thus the values passed to callback are irrelevant with dask.

`ignore_Feb29th`#

If True, February 29th is excluded from the computation.
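The effect on the set of calendar days can be illustrated with the standard library. This is only a sketch of the idea, not icclim's implementation; `calendar_days` is a hypothetical helper that returns the (month, day) pairs of a year.

```python
from datetime import date, timedelta


def calendar_days(year, ignore_feb29):
    """Return the set of (month, day) pairs of `year` (hypothetical helper).

    When `ignore_feb29` is True, February 29th is left out.
    """
    keys = set()
    day = date(year, 1, 1)
    while day.year == year:
        if not (ignore_feb29 and (day.month, day.day) == (2, 29)):
            keys.add((day.month, day.day))
        day += timedelta(days=1)
    return keys


# 2000 is a leap year
print(len(calendar_days(2000, ignore_feb29=False)))  # 366
print(len(calendar_days(2000, ignore_feb29=True)))   # 365
```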

Computing percentile thresholds#

Percentile thresholds are used as thresholds for the calculation of percentile-based indices. They are computed from values inside a reference period, called the base period, which is usually 30 years long (base_period_time_range parameter).

There are two methods for calculation of percentile thresholds:

1. For temperature indices, these thresholds are computed for each calendar day from samples (5-day window centred on the calendar day in the base period), which depend on the window_width, only_leap_years and ignore_Feb29th parameters.

In icclim these thresholds are represented as a dictionary with 365 (if ignore_Feb29th is True) or 366 (if ignore_Feb29th is False) calendar days as keys, and 2D arrays as values.

Note

A calendar day key of the dictionary is composed of the corresponding month and day, separated by a comma. For example, the 2D array with percentiles for April 13th is accessed with my_perc_dict[4,13].

The percentile thresholds are different for “in-base” years (years inside the base period) and “out-of-base” years. For “in-base” years, icclim uses the bootstrapping procedure, which is explained in this article: Avoiding Inhomogeneity in Percentile-Based Indices of Temperature Extremes (Zhang et al.) - see the resampling algorithm in the section 4. Removing the “jump”.

Warning

Computing percentile thresholds with the bootstrapping procedure may take some time! For example, a 30-year base period requires computing percentiles (30-1) times for each “in-base” year, i.e. 30*(30-1) times in total (plus one computation without bootstrapping for “out-of-base” years).

2. For precipitation indices, the thresholds are computed from the set of wet days (i.e. days when the daily precipitation amount is >= 1.0 mm) in the base period. In icclim these thresholds are represented as a 2D array.
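The wet-day selection of the second method can be sketched in plain Python for a single grid point. This is an illustration under stated assumptions, not icclim code: `wet_day_percentile` is a hypothetical helper, and it uses simple linear interpolation between sorted values for brevity, which is not icclim's default interpolation.

```python
def wet_day_percentile(precip_mm, q, wet_threshold=1.0):
    """Percentile `q` (0-100) of the wet days of a daily precipitation series.

    Hypothetical helper. Wet days are days with precipitation >= wet_threshold.
    Linear interpolation between sorted values is used here for simplicity.
    """
    wet = sorted(v for v in precip_mm if v >= wet_threshold)
    if not wet:
        return None  # no wet day in the base period
    h = (len(wet) - 1) * q / 100.0  # 0-based fractional rank
    low = int(h)
    high = min(low + 1, len(wet) - 1)
    return wet[low] + (h - low) * (wet[high] - wet[low])


precip = [0.0, 0.2, 1.0, 2.0, 3.0, 4.0, 5.0, 0.5]  # mm/day
# Dry days (0.0, 0.2 and 0.5 mm) are excluded before the percentile is taken
print(wet_day_percentile(precip, 75))  # 4.0
```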

Both methods can use either of the two interpolation types described in the interpolation section.

The calc_percentiles.py module contains get_percentile_dict and get_percentile_arr functions for the described methods.
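The cost of the bootstrapping procedure mentioned in the warning above follows from simple arithmetic. The short sketch below reproduces it; `bootstrap_percentile_computations` is a hypothetical helper name used only for this illustration.

```python
def bootstrap_percentile_computations(base_years):
    """Number of percentile computations for a base period of `base_years` years.

    Each in-base year requires (base_years - 1) computations, and one extra
    computation without bootstrapping serves all out-of-base years.
    """
    in_base = base_years * (base_years - 1)
    out_of_base = 1
    return in_base + out_of_base


print(bootstrap_percentile_computations(30))  # 871
```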

`window_width`#

The window width is used to select the samples for computing percentiles and is set to 5 by default: percentile-based indices use a 5-day window. The window is centred on a given calendar day, for example:

For April 13th, we take the values for April 11th, April 12th, April 13th, April 14th and April 15th of each year of the base period.
For January 1st, we take the values for December 30th, December 31st, January 1st, January 2nd and January 3rd.

Hence, for a base period of 30 years and a 5-day window width, each calendar day (except February 29th) has 150 values (30 * 5) from which to compute its percentile value.
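The sample selection can be sketched with the standard library. This is an illustration only, not icclim's implementation: `window_dates` is a hypothetical helper, and the 1961-1990 base period is an arbitrary example. Note that the sketch does not clip the window to the base period, so for days close to January 1st or December 31st it would reach into the neighbouring years.

```python
from datetime import date, timedelta


def window_dates(base_years, month, day, window_width=5):
    """Dates sampled for one calendar day over the base period.

    Hypothetical helper: for each year, takes the `window_width` days
    centred on the given calendar day.
    """
    half = window_width // 2
    dates = []
    for year in base_years:
        center = date(year, month, day)
        for offset in range(-half, half + 1):
            dates.append(center + timedelta(days=offset))
    return dates


samples = window_dates(range(1961, 1991), 4, 13)  # 30-year base period
print(len(samples))  # 150
print([d.isoformat() for d in samples[:5]])
# ['1961-04-11', '1961-04-12', '1961-04-13', '1961-04-14', '1961-04-15']
```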

`only_leap_years`#

The only_leap_years parameter selects which of two methods to use for calculating a percentile value for the calendar day of February 29th:

if True, we take only leap years. For example, for the base period 1980-1990 and a 5-day window width, we take the values corresponding to the following dates:

1980-02-27, 1980-02-28, 1980-02-29, 1980-03-01, 1980-03-02,

1984-02-27, 1984-02-28, 1984-02-29, 1984-03-01, 1984-03-02,

1988-02-27, 1988-02-28, 1988-02-29, 1988-03-01, 1988-03-02

if False, for the same base period and window width, we have:

1980-02-27, 1980-02-28, 1980-02-29, 1980-03-01, 1980-03-02,

1981-02-27, 1981-02-28, 1981-03-01, 1981-03-02,

1982-02-27, 1982-02-28, 1982-03-01, 1982-03-02,

1983-02-27, 1983-02-28, 1983-03-01, 1983-03-02,

1984-02-27, 1984-02-28, 1984-02-29, 1984-03-01, 1984-03-02,

1985-02-27, 1985-02-28, 1985-03-01, 1985-03-02,

1986-02-27, 1986-02-28, 1986-03-01, 1986-03-02,

1987-02-27, 1987-02-28, 1987-03-01, 1987-03-02,

1988-02-27, 1988-02-28, 1988-02-29, 1988-03-01, 1988-03-02

1989-02-27, 1989-02-28, 1989-03-01, 1989-03-02,

1990-02-27, 1990-02-28, 1990-03-01, 1990-03-02

The second way is preferable, because we have more samples.
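The two selections listed above can be reproduced with a short standard-library sketch. It is an illustration only, not icclim code; `feb29_samples` is a hypothetical helper. For non-leap years it takes the days on either side of the missing February 29th, as in the second listing.

```python
import calendar
from datetime import date, timedelta


def feb29_samples(first_year, last_year, only_leap_years, window_width=5):
    """Dates used to compute the percentile of February 29th (hypothetical helper)."""
    half = window_width // 2
    samples = []
    for year in range(first_year, last_year + 1):
        if calendar.isleap(year):
            # Full window centred on February 29th
            center = date(year, 2, 29)
            samples += [center + timedelta(days=o) for o in range(-half, half + 1)]
        elif not only_leap_years:
            # No February 29th: Feb 27, Feb 28, Mar 1 and Mar 2 for a 5-day window
            march_1 = date(year, 3, 1)
            samples += [march_1 + timedelta(days=o) for o in range(-half, half)]
    return samples


print(len(feb29_samples(1980, 1990, only_leap_years=True)))   # 15
print(len(feb29_samples(1980, 1990, only_leap_years=False)))  # 47
```

With only_leap_years=False the same base period yields about three times as many samples, which is why the second way is preferable.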

Warning

If ignore_Feb29th is True, only_leap_years has no effect, since February 29th is excluded altogether.

`interpolation`#

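A percentile value can be computed with either linear interpolation, also known as type 7 in other software, or the interpolation proposed by Hyndman and Fan (1996), also known as type 8. The latter corresponds to the “median_unbiased” value of the interpolation parameter and is also referred to as hyndman_fan interpolation.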
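The difference between the two methods can be illustrated with a small pure-Python sketch based on the Hyndman and Fan (1996) definitions of types 7 and 8. It is not icclim's implementation: `quantile` is a hypothetical helper, and its method names simply reuse the values of the interpolation parameter.

```python
import math


def quantile(values, p, method="median_unbiased"):
    """Sample quantile for probability p in [0, 1] (hypothetical helper).

    "linear" is Hyndman and Fan type 7, "median_unbiased" is type 8.
    Both interpolate linearly between two order statistics; they differ
    only in how the fractional rank h is computed.
    """
    x = sorted(values)
    n = len(x)
    if method == "linear":
        h = (n - 1) * p + 1          # type 7, 1-based fractional rank
    elif method == "median_unbiased":
        h = (n + 1 / 3) * p + 1 / 3  # type 8, 1-based fractional rank
    else:
        raise ValueError(f"unknown method: {method}")
    h = min(max(h, 1.0), float(n))   # keep the rank inside the sample
    low = math.floor(h)
    high = min(low + 1, n)
    return x[low - 1] + (h - low) * (x[high - 1] - x[low - 1])


data = list(range(1, 11))  # 1, 2, ..., 10
print(quantile(data, 0.9, "linear"))           # about 9.1
print(quantile(data, 0.9, "median_unbiased"))  # about 9.63
```

For high percentiles the type 8 estimate lies further into the tail than the type 7 estimate, while both agree on the median of this sample.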

`out_unit`#

Percentile-based indices (TX10p, TX90p, TN10p, TN90p, TG10p, TG90p, R75p, R95p and R99p) can be returned as a number of days (default) or as a percentage of days (out_unit = “%”).

Custom indices#

Custom indices are now described in their own chapter: Create your own index with user_index

Correspondence table “index - source variable”#

Using common names for the source variable, icclim is able to lookup the proper variable in the given input to compute an index.

index	Source variable
TG, GD4, HD17, TG10p, TG90p	daily mean temperature
TN, TNx, TNn, TR, FD, CFD, TN10p, TN90p, CSDI	daily minimum temperature
TX, TXx, TXn, SU, CSU, ID, TX10p, TX90p, WSDI	daily maximum temperature
DTR, ETR, vDTR	daily maximum + daily minimum temperature
PRCPTOT, RR1, SDII, CWD, CDD, R10mm, R20mm, RX1day, RX5day, R75p, R75pTOT, R95p, R95pTOT, R99p, R99pTOT	daily precipitation flux (liquid phase)
SD, SD1, SD5cm, SD50cm	daily snowfall flux (solid phase)
CD, CW, WD, WW	daily mean temperature + daily precipitation flux (liquid phase)
