icclim._core.input_parsing#

Module to parse input data and make it usable for icclim.

Module Contents#

class icclim._core.input_parsing.PercentileDataArray(data: Any = dtypes.NA, coords: collections.abc.Sequence[collections.abc.Sequence | pandas.Index | DataArray] | collections.abc.Mapping | None = None, dims: str | collections.abc.Iterable[collections.abc.Hashable] | None = None, name: collections.abc.Hashable | None = None, attrs: collections.abc.Mapping | None = None, indexes: collections.abc.Mapping[Any, xarray.core.indexes.Index] | None = None, fastpath: bool = False)[source]#

Wrap an xarray DataArray that holds percentile values.

classmethod is_compatible(source: xarray.DataArray) bool[source]#

Evaluate whether source has the fields expected of a PercentileDataArray.

A PercentileDataArray must have a climatology_bounds attribute and either a quantile or a percentiles coordinate. The window is not mandatory.

classmethod from_da(source: xarray.DataArray, climatology_bounds: list[str] | None = None) PercentileDataArray[source]#

Create a PercentileDataArray from an xarray.DataArray.

Parameters:
  • source (xr.DataArray) – A DataArray whose content is percentile values. It must also have a coordinate variable named percentiles or quantile.

  • climatology_bounds (list[str]) – Optional. A list of two strings giving the period over which the percentiles were computed. See xclim.core.calendar.build_climatology_bounds to build this list from a DataArray.

Returns:

The source DataArray wrapped in the PercentileDataArray class. The data is unchanged; only the climatology_bounds attribute is overridden, and only if a new value is given as input.

Return type:

PercentileDataArray
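As a rough illustration of the requirements stated for is_compatible, the sketch below builds a plain xarray.DataArray with a percentiles coordinate and a climatology_bounds attribute, then checks both conditions by hand. The helper looks_like_percentiles and all data values are hypothetical; this mirrors the documented rule and is not icclim's implementation.

```python
import numpy as np
import xarray as xr

# A small array of 3 percentile levels over 4 grid points (made-up values).
per_da = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("percentiles", "lat"),
    coords={"percentiles": [10, 50, 90], "lat": [40.0, 41.0, 42.0, 43.0]},
    attrs={"climatology_bounds": ["1991-01-01", "2020-12-31"]},
)


def looks_like_percentiles(source: xr.DataArray) -> bool:
    """Hypothetical check mirroring the rule stated for is_compatible."""
    has_bounds = "climatology_bounds" in source.attrs
    has_coord = "percentiles" in source.coords or "quantile" in source.coords
    return has_bounds and has_coord


compatible = looks_like_percentiles(per_da)

# Dropping the attribute makes the same data fail the check.
stripped = per_da.copy()
stripped.attrs = {}
incompatible = looks_like_percentiles(stripped)
```

Here compatible is True and incompatible is False, since only the attribute differs between the two arrays.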

icclim._core.input_parsing.guess_var_names(ds: xarray.core.dataset.Dataset, var_names: str | collections.abc.Sequence[str] | None, standard_index: icclim._core.model.standard_index.StandardIndex | None) list[collections.abc.Hashable][source]#

Attempt to guess the variable names from the dataset and the standard index.

Parameters:
  • ds (Dataset) – The dataset to guess the variable names from.

  • var_names (str | Sequence[str] | None) – The variable names to use. If None, the function will attempt to guess the variable names.

  • standard_index (StandardIndex | None) – The standard index to use to guess the variable names.

Returns:

The list of guessed variable names.

Return type:

list[Hashable]

icclim._core.input_parsing.read_dataset(in_files: icclim._core.model.icclim_types.InFileBaseType, standard_var: icclim._core.model.standard_variable.StandardVariable | None = None, var_name: str | collections.abc.Sequence[str] | None = None) xarray.core.dataset.Dataset[source]#

Read a dataset from input files.

Parameters:
  • in_files (InFileBaseType) – The input files to read the dataset from. It can be a single file path, a list of file paths, a glob pattern, a netCDF file, or a Zarr store.

  • standard_var (StandardVariable | None, optional) – The standard variable to use for parsing the dataset, by default None.

  • var_name (str | Sequence[str] | None, optional) – The variable name(s) to extract from the dataset, by default None.

Returns:

The parsed dataset.

Return type:

Dataset

Raises:

NotImplementedError – If the format of in_files is not recognized.

Notes

This function supports reading datasets from various file formats, including netCDF and Zarr. It can handle single files, multiple files, and glob patterns. If standard_var is provided, it will use the specified standard variable for parsing the dataset. If var_name is provided, it will extract the specified variable(s) from the dataset.

Examples

>>> from icclim._core.input_parsing import read_dataset
>>> files = ["data1.nc", "data2.nc"]
>>> ds = read_dataset(files, standard_var="temperature", var_name="temp")
icclim._core.input_parsing.update_to_standard_coords(ds: xarray.core.dataset.Dataset) xarray.core.dataset.Dataset[source]#

Mutate the input ds to use more icclim-friendly coordinate names.

icclim._core.input_parsing.is_zarr_path(path: icclim._core.model.icclim_types.InFileBaseType) bool[source]#

Check if the input path is a Zarr store.

icclim._core.input_parsing.is_netcdf_path(path: icclim._core.model.icclim_types.InFileBaseType) bool[source]#

Check if the input path is a netCDF file.

icclim._core.input_parsing.is_glob_path(path: icclim._core.model.icclim_types.InFileBaseType) bool[source]#

Check if the input path is a glob pattern.
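The three path checks above can be pictured with a small standard-library sketch. The detection rules used here (a "*" in the string for globs, ".nc" and ".zarr" suffixes) are assumptions made for illustration, and the function names are hypothetical; the actual icclim predicates may use different criteria.

```python
from pathlib import Path


def looks_like_glob(path) -> bool:
    # Assumption: a string containing "*" is treated as a glob pattern.
    return isinstance(path, str) and "*" in path


def looks_like_netcdf(path) -> bool:
    # Assumption: netCDF files are recognised by their ".nc" suffix.
    return isinstance(path, (str, Path)) and str(path).endswith(".nc")


def looks_like_zarr(path) -> bool:
    # Assumption: Zarr stores are recognised by their ".zarr" suffix.
    return isinstance(path, (str, Path)) and str(path).endswith(".zarr")


def classify(path) -> str:
    """Return a label for the kind of input, checking globs first."""
    if looks_like_glob(path):
        return "glob"
    if looks_like_netcdf(path):
        return "netcdf"
    if looks_like_zarr(path):
        return "zarr"
    return "unknown"


labels = [classify(p) for p in ("data/*.nc", "tas.nc", Path("store.zarr"), 42)]
```

Checking the glob case first matters in this sketch because a pattern such as "data/*.nc" also ends in ".nc".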

icclim._core.input_parsing.standardize_percentile_dim_name(per_da: xarray.core.dataarray.DataArray) xarray.core.dataarray.DataArray[source]#

Standardizes the name of the percentile dimension in the input DataArray.

Parameters:

per_da (DataArray) – The input DataArray containing percentile data.

Returns:

The input DataArray with the percentile dimension standardized.

Return type:

DataArray

Raises:

InvalidIcclimArgumentError – If the percentile data does not contain a recognizable percentiles dimension.

Notes

This function standardizes the name of the percentile dimension in the input DataArray to “percentiles”.

If the percentile dimension name contains the word “quantile”, the values in the “percentiles” coordinate are multiplied by 100.
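The renaming and rescaling described in these notes might look like the following sketch. The helper to_percentiles_dim is hypothetical, the data is made up, and ValueError stands in for InvalidIcclimArgumentError; only the xarray calls rename and assign_coords are real API.

```python
import numpy as np
import xarray as xr

# Percentile levels stored as quantiles in [0, 1] (made-up data).
per_da = xr.DataArray(
    np.array([1.0, 2.0, 3.0]),
    dims=("quantile",),
    coords={"quantile": [0.25, 0.5, 0.75]},
)


def to_percentiles_dim(da: xr.DataArray) -> xr.DataArray:
    """Hypothetical helper: rename a quantile dimension and rescale it."""
    for dim in da.dims:
        if "quantile" in str(dim):
            renamed = da.rename({dim: "percentiles"})
            # Quantiles in [0, 1] become percentiles in [0, 100].
            return renamed.assign_coords(percentiles=renamed["percentiles"] * 100)
    if "percentiles" in da.dims:
        # Already using the standard name: nothing to do.
        return da
    # ValueError stands in for InvalidIcclimArgumentError.
    raise ValueError("no recognizable percentiles dimension")


standardized = to_percentiles_dim(per_da)
```

After the call, the dimension is named percentiles and its coordinate values are 25, 50 and 75, while the data values are untouched.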

icclim._core.input_parsing.get_date_to_iso_format(in_date: str | datetime.datetime) str[source]#

Get a date in ISO format from a string or a datetime object.

Parameters:

in_date (str | datetime) – A string representing a date or a datetime object.

Returns:

A string representing a date in ISO format.

Return type:

str
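A minimal sketch of this kind of normalisation, using only the standard library, is shown below. The helper to_iso is hypothetical and assumes string inputs are already ISO-like; the real function may accept a wider range of date formats.

```python
from datetime import datetime


def to_iso(in_date) -> str:
    """Hypothetical helper: normalise a date-like input to an ISO 8601 string."""
    if isinstance(in_date, datetime):
        return in_date.isoformat()
    # Assumption: string inputs are already ISO-like, e.g. "2000-01-31".
    return datetime.fromisoformat(in_date).isoformat()


from_string = to_iso("2000-01-31")
from_datetime = to_iso(datetime(2000, 1, 31, 12, 30))
```

Both inputs end up as strings of the same shape, which makes later comparisons and attribute storage uniform.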

icclim._core.input_parsing.read_clim_bounds(climatology_bounds: collections.abc.Sequence[str, str] | None, per_da: xarray.DataArray) list[str][source]#

Read climatology bounds from input.

The climatology bounds represent the start and end dates of the climatology period.

Parameters:
  • climatology_bounds (sequence of str or None) – The climatology bounds as a sequence of two strings representing the start and end dates. If None, the climatology bounds will be retrieved from the climatology_bounds attribute of per_da.

  • per_da (xr.DataArray) – The input data array.

Returns:

A list of climatology bounds converted to ISO format.

Return type:

list of str

Raises:

InvalidIcclimArgumentError – If the length of climatology_bounds is not equal to 2.

Notes

If climatology_bounds is None, the function will attempt to retrieve the climatology bounds from the climatology_bounds attribute of per_da.
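The fallback and the length check described above can be sketched without xarray by passing the attributes as a plain dictionary. The helper resolve_bounds is hypothetical, ValueError stands in for InvalidIcclimArgumentError, and the ISO conversion step is left out to keep the sketch short.

```python
from collections.abc import Sequence
from typing import Optional


def resolve_bounds(climatology_bounds: Optional[Sequence[str]], attrs: dict) -> list:
    """Hypothetical helper mirroring the fallback and length check."""
    # Fall back to the attribute stored alongside the percentile data.
    if climatology_bounds is None:
        climatology_bounds = attrs.get("climatology_bounds")
    if climatology_bounds is None or len(climatology_bounds) != 2:
        # ValueError stands in for InvalidIcclimArgumentError.
        raise ValueError("climatology_bounds must contain exactly two dates")
    return [str(bound) for bound in climatology_bounds]


explicit = resolve_bounds(["1991-01-01", "2020-12-31"], attrs={})
from_attrs = resolve_bounds(
    None, attrs={"climatology_bounds": ["1961-01-01", "1990-12-31"]}
)

# A sequence of the wrong length is rejected.
try:
    resolve_bounds(["1991-01-01"], attrs={})
    rejected = False
except ValueError:
    rejected = True
```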

icclim._core.input_parsing.build_input_dict(in_files: icclim._core.model.icclim_types.InFileLike, var_names: collections.abc.Sequence[str] | None, threshold: icclim._core.model.threshold.Threshold | collections.abc.Sequence[icclim._core.model.threshold.Threshold] | None, standard_index: icclim._core.model.standard_index.StandardIndex | None) dict[str, icclim._core.model.in_file_dictionary.InFileDictionary][source]#

Build an input dictionary based on the provided input files and variables.

The input dictionary is used to map which input files correspond to which variables.

Parameters:
  • in_files (InFileLike) – The input files. It can be a dictionary where the keys represent the variable names and the values represent the file paths, or a single file path.

  • var_names (Sequence[str] | None) – The variable names. If in_files is a dictionary, this parameter must be None. Otherwise, it should be a sequence of variable names corresponding to the single file path.

  • threshold (Threshold | Sequence[Threshold] | None) – The threshold values. It can be a single threshold value, a sequence of threshold values, or None.

  • standard_index (StandardIndex | None) – The standard index. It can be a standard index value or None.

Returns:

The built input dictionary.

Return type:

dict[str, InFileDictionary]

Raises:

InvalidIcclimArgumentError – If var_names is not None when in_files is a dictionary.

Notes

  • If in_files is a dictionary, the dictionary keys are used as variable names.

  • If in_files is a dictionary and the dictionary values are also dictionaries, the input dictionary is returned as is.

  • If in_files is a dictionary and the dictionary values are file paths, the input dictionary is built using the file paths and variable names.

  • If in_files is a single file path and var_names is a single variable name, the input dictionary is built using the file path and variable name.
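The cases listed in these notes can be summarised in a simplified sketch. Everything here is hypothetical: the function name, the "study" key used for the nested dictionaries, and the use of ValueError in place of InvalidIcclimArgumentError. Thresholds and standard indices are ignored.

```python
def sketch_input_dict(in_files, var_names=None):
    """Hypothetical, simplified version of the cases listed in the notes."""
    if isinstance(in_files, dict):
        if var_names is not None:
            # ValueError stands in for InvalidIcclimArgumentError.
            raise ValueError("var_names must be None when in_files is a dictionary")
        if all(isinstance(value, dict) for value in in_files.values()):
            # Already in the expanded form: returned as is.
            return in_files
        # Keys are variable names, values are file paths.
        return {name: {"study": path} for name, path in in_files.items()}
    # Single path: pair it with each requested variable name.
    return {name: {"study": in_files} for name in (var_names or [])}


by_var = sketch_input_dict({"tasmax": "tasmax.nc", "pr": "pr.nc"})
single = sketch_input_dict("data.nc", var_names=["tas"])
nested = {"tas": {"study": "tas.nc"}}
unchanged = sketch_input_dict(nested)
```

In this sketch every branch returns the same shape, a mapping from variable name to a small dictionary, so downstream code does not need to know which input form was used.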

icclim._core.input_parsing.find_standard_vars(ds: xarray.core.dataset.Dataset) list[collections.abc.Hashable][source]#

Find standard variables in a dataset.

Parameters:

ds (Dataset) – The input dataset.

Returns:

A list of standard variables found in the dataset.

Return type:

list[Hashable]

icclim._core.input_parsing.guess_standard_variable(data: xarray.core.dataarray.DataArray) icclim._core.model.standard_variable.StandardVariable | None[source]#

Guesses the standard variable based on the metadata of data.

Parameters:

data (DataArray) – The input data.

Returns:

The guessed standard variable, or None if no standard variable is found.

Return type:

StandardVariable or None

Notes

StandardVariableRegistry is used as a lookup table to find the standard variable using the dataarray name or standard name attribute.

icclim._core.input_parsing.is_precipitation_amount(source: xarray.DataArray) bool[source]#

Return True if the source is a precipitation amount.

Parameters:

source (xr.DataArray) – A DataArray object.

Returns:

True if the source is a precipitation amount, False otherwise.

Return type:

bool

Notes

With pint, a rate is a quantity whose dimensionality includes [time]^-1; an amount has no such time component.

icclim._core.input_parsing.build_studied_data(original_da: xarray.core.dataarray.DataArray, time_range: collections.abc.Sequence[datetime.datetime | str] | None, ignore_Feb29th: bool, default_units: str | None) xarray.core.dataarray.DataArray[source]#

Preprocess the input data to select the period of interest.

Parameters:
  • original_da (DataArray) – The original data array.

  • time_range (Sequence[datetime | str] | None) – The time range to select from the data array. If None, the entire time range is used.

  • ignore_Feb29th (bool) – Whether to ignore February 29th when processing the data.

  • default_units (str | None) – The default units to use for the data array if it is unitless. If None and the data array is unitless, the “units” attribute remains unset.

Returns:

The processed data array.

Return type:

DataArray

Raises:

InvalidIcclimArgumentError – If the given time_range is out of the dataset time period.
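The three preprocessing steps implied by the parameters (period selection, February 29th removal, default units) could be reproduced with plain xarray as in the sketch below. The data, the chosen period and the "K" unit are made up, and this is an illustration of the idea rather than the function's actual code.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Daily series over the leap year 2020 (made-up values).
time = pd.date_range("2020-01-01", "2020-12-31", freq="D")
da = xr.DataArray(np.zeros(time.size), dims=("time",), coords={"time": time})

# Select the period of interest, as time_range would.
selected = da.sel(time=slice("2020-02-01", "2020-03-31"))

# Drop February 29th, as ignore_Feb29th=True would.
is_feb29 = (selected.time.dt.month == 2) & (selected.time.dt.day == 29)
without_feb29 = selected.isel(time=(~is_feb29).values)

# Attach default units only when the array has none.
if "units" not in without_feb29.attrs:
    without_feb29.attrs["units"] = "K"
```

The selection keeps 60 days (29 in February, 31 in March), and removing February 29th leaves 59.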

icclim._core.input_parsing.get_name_of_first_var(ds: xarray.core.dataset.Dataset) str[source]#

Get the name of the first variable in the given Dataset.

Parameters:

ds (Dataset) – The input Dataset.

Returns:

The name of the first variable in the Dataset.

Return type:

str

Raises:

IndexError – If the Dataset is empty.

icclim._core.input_parsing.is_dataset_path(query: icclim._core.model.icclim_types.InFileBaseType) bool[source]#

Check if the given query is a valid dataset path.

Parameters:

query (InFileBaseType) – The query to check. It can be a single path or a list/tuple of paths.

Returns:

True if the query is a valid dataset path, False otherwise.

Return type:

bool

Notes

A valid dataset path can be either a NetCDF path, a Zarr path, a glob path, or a list/tuple of valid paths.

icclim._core.input_parsing.build_reference_da(original_da: xarray.core.dataarray.DataArray, base_period_time_range: collections.abc.Sequence[datetime.datetime | str] | None, only_leap_years: bool, percentile_min_value: pint.Quantity | None) xarray.core.dataarray.DataArray[source]#

Build a reference DataArray to be used for day-of-year (doy) percentile computation.

Parameters:
  • original_da (DataArray) – The DataArray used as a base.

  • base_period_time_range (Sequence[datetime | str] | None) – The period to slice in the base DataArray.

  • only_leap_years (bool) – Flag to only use leap years (years with 366 days).

  • percentile_min_value (Quantity | None) – Optional. If set, every value of the base DataArray below percentile_min_value is replaced with np.nan.
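The masking behaviour described for percentile_min_value can be illustrated with DataArray.where. In this sketch the threshold is a plain float assumed to be in the same units as the data, whereas the real parameter is a pint Quantity; the values are made up.

```python
import numpy as np
import xarray as xr

# Daily precipitation in mm (made-up values).
pr = xr.DataArray([0.0, 0.4, 1.0, 5.2], dims=("time",), attrs={"units": "mm"})

# Hypothetical threshold playing the role of percentile_min_value.
min_value = 1.0

# Keep values at or above the threshold; everything below becomes NaN.
reference = pr.where(pr >= min_value)
```

Masked values stay in place as NaN, so the time axis keeps its original length and later percentile computations can skip them.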

icclim._core.input_parsing.get_dataarray_from_dataset(var_name: str | None, value: xarray.Dataset | str, standard_var: icclim._core.model.standard_variable.StandardVariable | None = None) xarray.DataArray[source]#

Extract a DataArray from a Dataset based on the provided variable name.

Parameters:
  • var_name (str or None) – The name of the variable to extract from the Dataset. If None, the function will try to guess the variable based on the Dataset’s contents.

  • value (xr.Dataset or str) – The Dataset object or the path to the Dataset file.

  • standard_var (StandardVariable) – The standard variable used to find a matching variable in the Dataset.

Returns:

The extracted DataArray.

Return type:

xr.DataArray

Raises:

InvalidIcclimArgumentError – If the variable name cannot be guessed and var_name is None.

Notes

This function can be used to extract a specific variable from a Dataset object or a Dataset file. If var_name is None, the function will try to guess the variable based on the Dataset’s contents.

icclim._core.input_parsing._guess_dataset_var_names(standard_index: icclim._core.model.standard_index.StandardIndex, ds: xarray.core.dataset.Dataset) list[collections.abc.Hashable][source]#

Try to guess the variable names.

The expected kind of variable of the index is used to guess the variable names.