qsprpred.data.processing package

Submodules

qsprpred.data.processing.applicability_domain module

class qsprpred.data.processing.applicability_domain.ApplicabilityDomain(threshold: float | None = None, direction: str | None = None)[source]

Bases: JSONSerializable, ABC

Define the applicability domain for a dataset.

A class to define the applicability domain for a dataset. A fitted applicability domain can be used to filter out molecules that are not in the applicability domain, or just to check whether a molecule is in the applicability domain.

Initialize the applicability domain with a threshold.

Parameters:
  • threshold (float | None) – threshold value

  • direction (str | None) – direction of the threshold, should be set if threshold is set (“>”, “<”, “>=”, “<=”)
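The threshold and direction together turn a continuous applicability score into an inside/outside decision. The sketch below is a minimal, stdlib-only illustration of that comparison, assuming each score is compared against the threshold with the chosen operator; the helper name apply_threshold and the DIRECTIONS mapping are hypothetical and not part of qsprpred.

```python
import operator

# Hypothetical mapping from the documented direction strings to comparison functions.
DIRECTIONS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def apply_threshold(scores, threshold, direction):
    """Return one boolean per score: True if `score <direction> threshold` holds."""
    if direction not in DIRECTIONS:
        raise ValueError(f"direction must be one of {list(DIRECTIONS)}, got {direction!r}")
    compare = DIRECTIONS[direction]
    return [compare(score, threshold) for score in scores]


# Distance-like scores, where a smaller value means "more inside" the domain.
print(apply_threshold([0.2, 0.8, 1.5], threshold=1.0, direction="<"))  # [True, True, False]
```

For distance scores a "<" or "<=" direction is the natural choice; for probability-like scores ">" or ">=" would usually fit better.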

contains(X: DataFrame) DataFrame[source]

Check if the applicability domain contains the features.

Parameters:

X (pd.DataFrame) – array of features to check

Returns:

pd.Series of booleans indicating if the features are in the applicability domain

Return type:

pd.Series

property direction: str

Return the direction of the threshold.

The direction should be ‘>’, ‘<’, ‘>=’, ‘<=’

abstract fit(X: DataFrame) None[source]

Fit the applicability domain model.

Parameters:

X (pd.DataFrame) – array of features to fit model on

abstract property fitted: bool

Return whether the applicability domain is fitted or not.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)
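The serialization methods above follow a round-trip contract: whatever toJSON or toFile writes must be enough for fromJSON or fromFile to rebuild an equivalent object. The sketch below illustrates that contract with the standard json module on a hypothetical ThresholdSettings class; it is not qsprpred's JSONSerializable, whose stored format may differ.

```python
import json
import os
import tempfile


class ThresholdSettings:
    """Hypothetical JSON-serializable object; not qsprpred's JSONSerializable."""

    def __init__(self, threshold=None, direction=None):
        self.threshold = threshold
        self.direction = direction

    def toJSON(self) -> str:
        # Store every attribute needed to rebuild the object.
        return json.dumps({"threshold": self.threshold, "direction": self.direction})

    @classmethod
    def fromJSON(cls, json_str: str) -> "ThresholdSettings":
        # Build a fresh instance from the stored attributes.
        return cls(**json.loads(json_str))

    def toFile(self, filename: str) -> str:
        with open(filename, "w") as fh:
            fh.write(self.toJSON())
        return os.path.abspath(filename)  # mirrors the documented return value

    @classmethod
    def fromFile(cls, filename: str) -> "ThresholdSettings":
        with open(filename) as fh:
            return cls.fromJSON(fh.read())


original = ThresholdSettings(threshold=0.5, direction="<=")
with tempfile.TemporaryDirectory() as tmp:
    path = original.toFile(os.path.join(tmp, "ad.json"))
    restored = ThresholdSettings.fromFile(path)
print(restored.threshold, restored.direction)  # 0.5 <=
```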

abstract transform(X: DataFrame) DataFrame[source]

Transform the features to a score for the applicability domain.

The result could be a boolean array indicating if the features are in the applicability domain or a continuous score indicating a measure of applicability (e.g., a probability or a distance).

Parameters:

X (pd.DataFrame) – array of features

Returns:

scores for the applicability domain

Return type:

pd.Series
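A concrete applicability domain implements fit and transform and reports whether it has been fitted. The sketch below shows that division of labour with a bounding-box domain, chosen only for illustration: a sample is inside if every feature lies within the range seen during fitting. It uses plain lists of rows instead of pandas DataFrames to stay dependency-free, and SimpleAD and BoundingBoxAD are hypothetical names, not qsprpred classes.

```python
from abc import ABC, abstractmethod


class SimpleAD(ABC):
    """Hypothetical, stripped-down version of the fit/transform/contains interface."""

    @abstractmethod
    def fit(self, X): ...

    @abstractmethod
    def transform(self, X): ...

    def contains(self, X):
        # In this sketch, transform() already returns booleans.
        return self.transform(X)


class BoundingBoxAD(SimpleAD):
    """A sample is inside the domain if every feature lies within the training range."""

    def __init__(self):
        self.mins = None
        self.maxs = None

    @property
    def fitted(self) -> bool:
        return self.mins is not None

    def fit(self, X):
        # X is a list of rows (lists of floats); record per-feature min and max.
        columns = list(zip(*X))
        self.mins = [min(col) for col in columns]
        self.maxs = [max(col) for col in columns]

    def transform(self, X):
        if not self.fitted:
            raise RuntimeError("fit() must be called first")
        return [
            all(lo <= value <= hi for value, lo, hi in zip(row, self.mins, self.maxs))
            for row in X
        ]


ad = BoundingBoxAD()
ad.fit([[0.0, 1.0], [2.0, 3.0]])
print(ad.contains([[1.0, 2.0], [5.0, 2.0]]))  # [True, False]
```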

class qsprpred.data.processing.applicability_domain.KNNApplicabilityDomain(k: int = 5, alpha: float | None = None, hard_threshold: float | None = None, scaling: str | None = 'robust', dist: str = 'euclidean', scaler_kwargs=None, n_jobs: int = 1, astype: str | None = 'float64')[source]

Bases: ApplicabilityDomain

Applicability domain defined using K-nearest neighbours.

This class is adapted from the KNNApplicabilityDomain class in the mlchemad package.

Create the k-Nearest Neighbor applicability domain.

Parameters:
  • k (int) – number of nearest neighbors

  • alpha (float) – ratio of inlier samples calculated from the training set; ignored if hard_threshold is set

  • hard_threshold (float) – samples with a distance greater than or equal to this threshold will be considered outliers

  • scaling (str) – scaling method; must be one of ‘robust’, ‘minmax’, ‘maxabs’, ‘standard’ or None (default: ‘robust’)

  • dist (str) – kNN distance metric to use (default: ‘euclidean’); must be one of the keys of dist_fns. Jaccard is recommended for binary fingerprints.

  • scaler_kwargs (dict) – additional parameters to supply to the scaler

  • n_jobs (int) – number of parallel processes used to fit the kNN model
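The interaction between k, alpha and hard_threshold can be sketched as follows. This is a simplified illustration, not the qsprpred or mlchemad implementation: it assumes the score is the mean Euclidean distance to the k nearest training samples, skips scaling, and assumes alpha picks the threshold so that that fraction of training samples counts as inliers. KNNDomainSketch and knn_mean_distance are hypothetical names.

```python
import math


def knn_mean_distance(point, reference, k, exclude_self=False):
    """Mean Euclidean distance from `point` to its k nearest rows in `reference`."""
    distances = sorted(math.dist(point, ref) for ref in reference)
    if exclude_self:
        distances = distances[1:]  # drop the zero distance of a training row to itself
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


class KNNDomainSketch:
    """Hypothetical kNN applicability domain (no scaling, Euclidean distance only)."""

    def __init__(self, k=5, alpha=None, hard_threshold=None):
        if alpha is None and hard_threshold is None:
            raise ValueError("set either alpha or hard_threshold")
        self.k = k
        self.alpha = alpha
        self.hard_threshold = hard_threshold
        self.reference = None
        self.threshold = None

    def fit(self, X):
        self.reference = [list(row) for row in X]
        if self.hard_threshold is not None:
            self.threshold = self.hard_threshold  # alpha is ignored
        else:
            # Assumed rule: the alpha fraction of training rows with the smallest
            # kNN distance are inliers; the threshold is the largest of those distances.
            train = sorted(
                knn_mean_distance(row, self.reference, self.k, exclude_self=True)
                for row in self.reference
            )
            index = max(0, math.ceil(self.alpha * len(train)) - 1)
            self.threshold = train[index]
        return self

    def transform(self, X):
        # Continuous score: a larger distance means further from the training data.
        return [knn_mean_distance(row, self.reference, self.k) for row in X]

    def contains(self, X):
        scores = self.transform(X)
        if self.hard_threshold is not None:
            # Distances greater than or equal to the hard threshold are outliers.
            return [d < self.threshold for d in scores]
        return [d <= self.threshold for d in scores]


train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
ad = KNNDomainSketch(k=1, alpha=0.75).fit(train)
print(ad.threshold)                           # 1.0
print(ad.contains([[0.5, 0.0], [5.0, 5.0]]))  # [True, False]
```

With alpha=0.75 the isolated training point at (10, 10) falls outside its own domain, which is the intended effect of keeping only a fraction of the training set as inliers.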

contains(X: DataFrame) DataFrame

Check if the applicability domain contains the features.

Parameters:

X (pd.DataFrame) – array of features to check

Returns:

pd.Series of booleans indicating if the features are in the applicability domain

Return type:

pd.Series

property direction: str

Return the direction of the threshold.

The direction should be ‘>’, ‘<’, ‘>=’, ‘<=’

fit(X)[source]

Fit the applicability domain to the given feature matrix

Parameters:

X – feature matrix

property fitted: bool

Return whether the applicability domain is fitted or not.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transform(X)[source]

Get the distances to the k nearest neighbors for the given feature matrix.

Parameters:

X – feature matrix

Returns:

array of distances to the k nearest neighbors

class qsprpred.data.processing.applicability_domain.MLChemAD(applicability_domain: ApplicabilityDomain, astype: str | None = 'float64')[source]

Bases: ApplicabilityDomain

Define the applicability domain for a dataset using the MLChemAD package.

This class uses the MLChemAD package to filter out molecules that are not in the applicability domain. The MLChemAD package is available at https://github.com/OlivierBeq/MLChemAD

Variables:
  • applicabilityDomain (MLChemApplicabilityDomain) – applicability domain object

  • fitted (bool) – whether the applicability domain is fitted or not

Initialize the MLChemAD wrapper with an applicability domain object and an optional feature type.

Parameters:
  • applicability_domain (MLChemAD) – applicability domain object

  • astype (str | None) – type to cast the features to before fitting or checking the applicability domain

contains(X: DataFrame) DataFrame

Check if the applicability domain contains the features.

Parameters:

X (pd.DataFrame) – array of features to check

Returns:

pd.Series of booleans indicating if the features are in the applicability domain

Return type:

pd.Series

property direction: str

Return the direction of the threshold.

The direction should be ‘>’, ‘<’, ‘>=’, ‘<=’

fit(X: DataFrame) None[source]

Fit the applicability domain model.

Parameters:

X (pd.DataFrame) – array of features to fit model on

property fitted: bool

Return whether the applicability domain is fitted or not.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transform(X: DataFrame) DataFrame[source]

Check if the applicability domain contains the features.

Parameters:

X (pd.DataFrame) – array of features to check

Returns:

pd.Series of booleans indicating if the features are in the applicability domain

Return type:

pd.Series

qsprpred.data.processing.data_filters module

Filters for QSPR Datasets.

To add a new filter:

  • Add a DataFilter subclass for your new filter.

class qsprpred.data.processing.data_filters.CategoryFilter(prop: str, values: list[str], data_set: QSPRDataSet | None = None, keep: bool = False)[source]

Bases: DataFilter

Filter out rows based on the values in a column.

Variables:
  • prop (str) – column based on which to filter.

  • values (list[str]) – filter values.

  • keep (bool) – whether to keep or discard values.

Initialize the CategoryFilter with the prop, values and keep attributes.

Parameters:
  • prop (str) – column based on which to filter.

  • values (list) – list of values to filter on in the prop column.

  • data_set (QSPRDataSet) – dataset to filter.

  • keep (bool, optional) – whether to keep or discard the values. Defaults to False.
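The keep flag inverts the selection: with keep=False rows whose prop value is in values are discarded, with keep=True only those rows survive. The sketch below shows that logic on a list of dict rows; category_filter and the "assay" column are hypothetical illustrations, not qsprpred code. With pandas, the same selection would typically be written with `X[prop].isin(values)` as a boolean mask.

```python
def category_filter(rows, prop, values, keep=False):
    """Keep (keep=True) or drop (keep=False) rows whose `prop` value is in `values`."""
    wanted = set(values)
    # A row survives when its membership test matches the keep flag.
    return [row for row in rows if (row[prop] in wanted) == keep]


rows = [
    {"SMILES": "CCO", "assay": "binding"},
    {"SMILES": "CCN", "assay": "functional"},
    {"SMILES": "CCC", "assay": "binding"},
]
print([r["SMILES"] for r in category_filter(rows, "assay", ["binding"])])             # ['CCN']
print([r["SMILES"] for r in category_filter(rows, "assay", ["binding"], keep=True)])  # ['CCO', 'CCC']
```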

fit(X: DataFrame, y: None | DataFrame = None)

Fit the step to the dataset

If the step requires fitting to the data, this method should be implemented.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

the transformed data and the (transformed) target data

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDataSet() QSPRDataSet

Get the data set attached to this object.

Returns:

The data set attached to this object

Return type:

QSPRDataSet

Raises:

ValueError – If no data set is attached to this object.

property hasDataSet: bool

Indicates if this object has a data set attached to it.

setDataSet(dataset: QSPRDataSet | None) None

Set the data set for this object.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transform(X: DataFrame, y: DataFrame | None = None) tuple[DataFrame, DataFrame | None][source]

Filter rows from dataframe.

Parameters:
  • X (pd.DataFrame) – dataframe to filter.

  • y (pd.DataFrame, optional) – output dataframe if the filtering method requires it

Returns:

the filtered dataframe and the target dataframe

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

class qsprpred.data.processing.data_filters.DataFilter(**kwargs)[source]

Bases: Step, DataSetDependent

Filter out some rows from a dataframe.

Initialize the step

fit(X: DataFrame, y: None | DataFrame = None)

Fit the step to the dataset

If the step requires fitting to the data, this method should be implemented.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

the transformed data and the (transformed) target data

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDataSet() QSPRDataSet

Get the data set attached to this object.

Returns:

The data set attached to this object

Return type:

QSPRDataSet

Raises:

ValueError – If no data set is attached to this object.

property hasDataSet: bool

Indicates if this object has a data set attached to it.

setDataSet(dataset: QSPRDataSet | None) None

Set the data set for this object.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

abstract transform(X: DataFrame, y: DataFrame | None = None) DataFrame[source]

Remove rows from a dataframe.

Parameters:
  • X (pd.DataFrame) – dataframe to be filtered

  • y (pd.DataFrame, optional) – output dataframe if the filtering method requires it
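A new filter only has to implement transform, which removes rows from X and, when targets are given, the same rows from y so both stay aligned. The sketch below illustrates that pattern with a hypothetical stand-in base class (RowFilterBase) and a hypothetical subclass (MaxValueFilter) that drops rows above a cutoff; it uses lists instead of DataFrames and is not the actual DataFilter or Step code.

```python
from abc import ABC, abstractmethod


class RowFilterBase(ABC):
    """Hypothetical stand-in for DataFilter: transform(X, y) drops rows from both."""

    def fit(self, X, y=None):
        return self  # most row filters need no fitting

    def fitTransform(self, X, y=None):
        return self.fit(X, y).transform(X, y)

    @abstractmethod
    def transform(self, X, y=None):
        ...


class MaxValueFilter(RowFilterBase):
    """Drop rows whose `column` value exceeds `max_value` (illustrative only)."""

    def __init__(self, column, max_value):
        self.column = column
        self.max_value = max_value

    def transform(self, X, y=None):
        mask = [row[self.column] <= self.max_value for row in X]
        X_out = [row for row, ok in zip(X, mask) if ok]
        # Keep targets aligned with the surviving rows.
        y_out = None if y is None else [t for t, ok in zip(y, mask) if ok]
        return X_out, y_out


X = [{"MW": 180.2}, {"MW": 950.0}, {"MW": 310.4}]
y = [1.2, 3.4, 0.7]
X_out, y_out = MaxValueFilter("MW", 500).fitTransform(X, y)
print(X_out, y_out)  # [{'MW': 180.2}, {'MW': 310.4}] [1.2, 0.7]
```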

class qsprpred.data.processing.data_filters.NaNFilter(features: list[str] | None = None, keep: bool = False)[source]

Bases: DataFilter

Step that removes rows containing NaN values in the specified columns

Initialize the step with the columns to check for NaN values

If no columns are specified, all columns are checked for NaN values.

Parameters:
  • features (list[str] | None) – columns to check for NaN values

  • keep (bool) – whether to keep or discard rows with NaN values. If True, only warn about NaN values; if False, remove the rows that contain them.
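The two modes described above can be sketched on a list of dict rows: check only the listed columns (or every column when none are given), then either drop flagged rows or keep everything and emit a warning. nan_filter is a hypothetical helper; the real NaNFilter operates on DataFrames and its exact warning text is not documented here.

```python
import math
import warnings


def nan_filter(rows, features=None, keep=False):
    """Remove rows with NaN in `features` (all columns if None); with keep=True only warn."""

    def has_nan(row):
        cols = features if features is not None else row.keys()
        return any(isinstance(row[c], float) and math.isnan(row[c]) for c in cols)

    flagged = [has_nan(row) for row in rows]
    if keep:
        if any(flagged):
            warnings.warn(f"{sum(flagged)} row(s) contain NaN values")
        return list(rows)
    return [row for row, bad in zip(rows, flagged) if not bad]


nan = float("nan")
rows = [
    {"a": 1.0, "b": 2.0},
    {"a": nan, "b": 3.0},
    {"a": 4.0, "b": nan},
]
print(len(nan_filter(rows, features=["a"])))  # 2: only column "a" is checked
print(len(nan_filter(rows)))                  # 1: all columns are checked
print(len(nan_filter(rows, keep=True)))       # 3, plus a warning
```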

fit(X: DataFrame, y: None | DataFrame = None)

Fit the step to the dataset

If the step requires fitting to the data, this method should be implemented.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

the transformed data and the (transformed) target data

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDataSet() QSPRDataSet

Get the data set attached to this object.

Returns:

The data set attached to this object

Return type:

QSPRDataSet

Raises:

ValueError – If no data set is attached to this object.

property hasDataSet: bool

Indicates if this object has a data set attached to it.

setDataSet(dataset: QSPRDataSet | None) None

Set the data set for this object.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame][source]

Remove rows containing NaN values in the specified columns

class qsprpred.data.processing.data_filters.OutlierFilter(ad: ApplicabilityDomain)[source]

Bases: DataFilter

Remove outliers based on an applicability domain

Initialize the OutlierFilter with an applicability domain from MLChemAD.

Parameters:

ad (MLChemAD | MLChemADApplicabilityDomain) – The applicability domain to use.

fit(X: DataFrame, y: None | DataFrame = None)[source]

Fit the applicability domain to the data.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame, optional) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

the transformed data and the (transformed) target data

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDataSet() QSPRDataSet

Get the data set attached to this object.

Returns:

The data set attached to this object

Return type:

QSPRDataSet

Raises:

ValueError – If no data set is attached to this object.

property hasDataSet: bool

Indicates if this object has a data set attached to it.

setDataSet(dataset: QSPRDataSet | None) None

Set the data set for this object.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transform(X: DataFrame, y: DataFrame | None = None) tuple[DataFrame, DataFrame | None][source]

Remove samples outside the applicability domain.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame, optional) – training targets

Returns:

filtered training data and targets

Return type:

tuple[pd.DataFrame, pd.DataFrame]
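The OutlierFilter delegates the decision to its applicability domain: fit trains the domain, and transform keeps only the rows (and matching targets) the domain contains. The sketch below mirrors that delegation with a swapped-in toy z-score domain so it runs without qsprpred or MLChemAD; ZScoreAD and OutlierFilterSketch are hypothetical names.

```python
import statistics


class ZScoreAD:
    """Toy AD: a row is inside if every feature is within `max_z` standard deviations."""

    def __init__(self, max_z=3.0):
        self.max_z = max_z
        self.means = self.stds = None

    def fit(self, X):
        columns = list(zip(*X))
        self.means = [statistics.mean(c) for c in columns]
        self.stds = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid division by zero

    def contains(self, X):
        return [
            all(abs(v - m) / s <= self.max_z for v, m, s in zip(row, self.means, self.stds))
            for row in X
        ]


class OutlierFilterSketch:
    """Fit an AD on the training data, then drop rows (and targets) outside it."""

    def __init__(self, ad):
        self.ad = ad

    def fit(self, X, y=None):
        self.ad.fit(X)
        return self

    def transform(self, X, y=None):
        inside = self.ad.contains(X)
        X_out = [row for row, ok in zip(X, inside) if ok]
        y_out = None if y is None else [t for t, ok in zip(y, inside) if ok]
        return X_out, y_out


train = [[1.0], [2.0], [3.0], [2.0]]
filt = OutlierFilterSketch(ZScoreAD(max_z=2.0)).fit(train)
X_new, y_new = filt.transform([[2.5], [10.0]], [0.1, 0.2])
print(X_new, y_new)  # [[2.5]] [0.1]
```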

class qsprpred.data.processing.data_filters.RepeatsFilter(keep: str | bool = False, timecol: str | None = None, additional_cols: list[str] | None = None, data_set: QSPRDataSet | None = None)[source]

Bases: DataFilter

Filter out duplicate molecules based on descriptor values.

Variables:
  • keep (str | bool) – determines how duplicate entries are treated: if False, remove all duplicate entries; if True, keep them all; if ‘first’, keep the row of the first entry (based on time); if ‘last’, keep the row of the last entry (based on time). Options: ‘first’, ‘last’, True, False

  • timeCol (str, optional) – name of the column containing the time of publication; used if keep is ‘first’ or ‘last’

  • additionalCols (list[str], optional) – additional columns to use for determining duplicates (e.g. proteinid in the case of PCM modelling), so that compounds with the same X but a different proteinid are not removed.

Initialize the RepeatsFilter with the keep, timecol and additional_cols attributes.

Parameters:
  • keep (str | bool, optional) – determines how duplicate entries are treated: if False, remove all duplicate entries; if True, keep them all; if ‘first’, keep the row of the first entry (based on time); if ‘last’, keep the row of the last entry (based on time). Defaults to False.

  • timecol (str, optional) – name of the column containing the time of publication; used if keep is ‘first’ or ‘last’. Defaults to None.

  • additional_cols (list[str], optional) – additional columns to use for determining duplicates (e.g. proteinid in the case of PCM modelling), so that compounds with the same X but a different proteinid are not removed. Defaults to None.

  • data_set (QSPRDataSet, optional) – dataset to filter. Defaults to None.
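The keep options can be illustrated by grouping rows on their descriptor values plus any additional columns, then deciding per group. The sketch below is a hypothetical re-implementation on dict rows (repeats_filter, the MW/logP descriptors and the year column are illustrative assumptions); it follows the documented semantics but not necessarily qsprpred's internal logic, such as how ties in time are broken.

```python
from collections import defaultdict


def repeats_filter(rows, descriptor_cols, keep=False, timecol=None, additional_cols=None):
    """keep=False drops duplicated groups; True keeps all; 'first'/'last' keep one row by time."""
    if keep is True:
        return list(rows)
    key_cols = list(descriptor_cols) + list(additional_cols or [])
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in key_cols)].append(row)
    result = []
    for group in groups.values():
        if len(group) == 1:
            result.append(group[0])
        elif keep in ("first", "last"):
            ordered = sorted(group, key=lambda r: r[timecol])
            result.append(ordered[0] if keep == "first" else ordered[-1])
        # keep is False: every row of a duplicated group is dropped
    return result


rows = [
    {"MW": 180.2, "logP": 1.2, "protein": "P1", "year": 2001},
    {"MW": 180.2, "logP": 1.2, "protein": "P1", "year": 2010},
    {"MW": 180.2, "logP": 1.2, "protein": "P2", "year": 2005},
    {"MW": 250.3, "logP": 2.8, "protein": "P1", "year": 2003},
]
desc = ["MW", "logP"]
print(len(repeats_filter(rows, desc)))                             # 1
print(len(repeats_filter(rows, desc, additional_cols=["protein"])))  # 2
print([r["year"] for r in repeats_filter(rows, desc, keep="last", timecol="year",
                                         additional_cols=["protein"])])  # [2010, 2005, 2003]
```

Adding "protein" as an additional column is what stops the P2 measurement from being treated as a duplicate of the P1 measurements.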

fit(X: DataFrame, y: None | DataFrame = None)

Fit the step to the dataset

If the step requires fitting to the data, this method should be implemented.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

the transformed data and the (transformed) target data

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDataSet() QSPRDataSet

Get the data set attached to this object.

Returns:

The data set attached to this object

Return type:

QSPRDataSet

Raises:

ValueError – If no data set is attached to this object.

property hasDataSet: bool

Indicates if this object has a data set attached to it.

setDataSet(dataset: QSPRDataSet | None) None

Set the data set for this object.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transform(X: DataFrame, y: DataFrame | None = None) tuple[DataFrame, DataFrame | None][source]

Filter rows from dataframe.

Parameters:
  • X (pd.DataFrame) – dataframe to filter

  • y (pd.DataFrame, optional) – output dataframe if the filtering method requires it

Returns:

filtered dataframe and target dataframe if provided.

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

qsprpred.data.processing.feature_filters module

Filters to select features from the training set.

To add a new feature filter:

  • Add a FeatureFilter subclass for your new filter.

class qsprpred.data.processing.feature_filters.BorutaFilter(boruta_feat_selector: BorutaPy = None, seed: int | None = None)[source]

Bases: FeatureFilter, Randomized

Boruta filter from BorutaPy: Boruta all-relevant feature selection.

Variables:
  • featSelector (BorutaPy) – BorutaPy feature selector

  • droppedFeatures (pd.Index) – columns dropped by Boruta filter

  • seed (int) – Random state to use for shuffling and other random operations.

Initialize the BorutaFilter class.

Parameters:
  • boruta_feat_selector (BorutaPy, optional) – The BorutaPy feature selector. If not provided, a default BorutaPy instance will be created.

  • seed (int | None, optional) – Random state to use for shuffling and other random operations. If None, the random state set in the BorutaPy instance is used. Defaults to None.

fit(X: DataFrame, y: DataFrame)[source]

Fit the Boruta filter to the data.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

the transformed data and the (transformed) target data

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) BorutaFilter[source]

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

property randomState: int

Get the random state for the object.

toFile(filename: str) str[source]

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transform(X: DataFrame, y: DataFrame | None = None) tuple[DataFrame, DataFrame][source]

Filter out uninformative features from a dataframe using BorutaPy.

Parameters:
  • X (pd.DataFrame) – dataframe to be filtered

  • y (pd.DataFrame, optional) – output dataframe if the filtering method requires it

Returns:

the filtered dataframe and the target dataframe

Return type:

tuple[pd.DataFrame, pd.DataFrame]

class qsprpred.data.processing.feature_filters.FeatureFilter(**kwargs)[source]

Bases: Step

Filter out uninformative features from a dataframe.

Initialize the step

fit(X: DataFrame, y: None | DataFrame = None)

Fit the step to the dataset

If the step requires fitting to the data, this method should be implemented.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

the transformed data and the (transformed) target data

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

abstract transform(X: DataFrame, y: DataFrame | None = None) DataFrame[source]

Filter out uninformative features from a dataframe.

Parameters:
  • X (pd.DataFrame) – dataframe to be filtered

  • y (pd.DataFrame, optional) – output dataframe if the filtering method requires it

Returns:

The filtered pd.DataFrame

class qsprpred.data.processing.feature_filters.HighCorrelationFilter(th: float)[source]

Bases: FeatureFilter

Remove features with correlation higher than a given threshold.

Variables:
  • th (float) – threshold for correlation

  • high_corr_cols (pd.Index) – columns with high correlation (if fitted)
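One common way to choose which of two highly correlated features to drop is to scan the upper triangle of the correlation matrix and drop any feature whose absolute Pearson correlation with an earlier feature exceeds th. The sketch below implements that common recipe on a dict of columns; whether HighCorrelationFilter uses absolute correlation and this ordering rule is an assumption, and pearson and high_corr_cols are hypothetical helpers.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return cov / norm if norm else 0.0


def high_corr_cols(columns, th):
    """Names of columns whose |r| with any earlier column is greater than th."""
    names = list(columns)
    return [
        name
        for i, name in enumerate(names)
        if any(abs(pearson(columns[other], columns[name])) > th for other in names[:i])
    ]


columns = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8.5],  # almost a linear function of "a"
    "c": [4, 1, 3, 2],
}
print(high_corr_cols(columns, th=0.9))  # ['b']
```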

Initialize the step

fit(X: DataFrame, y: None | DataFrame = None)[source]

Find features with correlation higher than a given threshold.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame, optional) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

the transformed data and the (transformed) target data

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transform(X: DataFrame, y: DataFrame | None = None) tuple[DataFrame, DataFrame][source]

Filter out high correlation features from a dataframe.

Parameters:
  • X (pd.DataFrame) – dataframe to be filtered

  • y (pd.DataFrame, optional) – output dataframe if the filtering method requires it

Returns:

the filtered dataframe and the target dataframe

Return type:

tuple[pd.DataFrame, pd.DataFrame]

class qsprpred.data.processing.feature_filters.LowVarianceFilter(th: float)[source]

Bases: FeatureFilter

Remove features with variance equal to or lower than a given threshold after MinMax scaling.

Variables:
  • th (float) – threshold for removing features

  • low_var_cols (pd.Index) – columns with low variance (if fitted)
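Min-max scaling first maps every feature to [0, 1], so a single threshold can be compared across features measured on different scales. The sketch below applies that rule to a dict of columns; it assumes population variance and treats constant columns as all zeros after scaling, which may differ from the actual implementation. low_variance_cols is a hypothetical helper.

```python
import statistics


def low_variance_cols(columns, th):
    """Names of columns whose variance after min-max scaling is <= th."""
    dropped = []
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant column scales to all zeros, i.e. variance 0.
        scaled = [(v - lo) / span if span else 0.0 for v in values]
        if statistics.pvariance(scaled) <= th:
            dropped.append(name)
    return dropped


columns = {
    "const": [5, 5, 5, 5],        # variance 0 after scaling
    "binary_rare": [0, 0, 0, 1],  # variance 0.1875 after scaling
    "spread": [0, 1, 2, 3],       # variance ~0.139 after scaling
}
print(low_variance_cols(columns, th=0.1))   # ['const']
print(low_variance_cols(columns, th=0.15))  # ['const', 'spread']
```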

Initialize the step

fit(X: DataFrame, y: None | DataFrame = None)[source]

Find features with variance equal to or lower than a given threshold after MinMax scaling.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame, optional) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

the transformed data and the (transformed) target data

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)
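The serialization methods documented here follow a simple contract: toJSON/fromJSON convert to and from a string, and toFile/fromFile do the same through a file, with toFile returning the absolute path. The sketch below illustrates that contract with the standard-library `json` module and a hypothetical `JSONRoundTrip` class that stores only plain attributes. It is not qsprpred's `JSONSerializable`, whose encoding may differ and can handle richer objects.

```python
import json
import os
import tempfile
from typing import Any


class JSONRoundTrip:
    """Hypothetical stand-in for a JSON-serializable step (not qsprpred's class)."""

    def __init__(self, th: float = 0.05):
        self.th = th

    def toJSON(self) -> str:
        # Only plain attributes (numbers, strings, lists, dicts) survive json.dumps.
        return json.dumps(self.__dict__)

    @classmethod
    def fromJSON(cls, json_str: str) -> Any:
        obj = cls.__new__(cls)  # bypass __init__ and restore the saved state
        obj.__dict__.update(json.loads(json_str))
        return obj

    def toFile(self, filename: str) -> str:
        with open(filename, "w") as f:
            f.write(self.toJSON())
        return os.path.abspath(filename)

    @classmethod
    def fromFile(cls, filename: str) -> Any:
        with open(filename) as f:
            return cls.fromJSON(f.read())


step = JSONRoundTrip(th=0.1)
path = step.toFile(os.path.join(tempfile.mkdtemp(), "step.json"))
restored = JSONRoundTrip.fromFile(path)
```

The parameter is named `json_str` rather than `json` here only to avoid shadowing the imported module.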

transform(X: DataFrame, y: DataFrame | None = None) tuple[DataFrame, DataFrame][source]

Filter out low variance features from a dataframe.

Parameters:
  • X (pd.DataFrame) – dataframe to be filtered

  • y (pd.DataFrame, optional) – output dataframe if the filtering method requires it

Returns:

tuple of the filtered dataframe (pd.DataFrame) and the target dataframe (pd.DataFrame)

Return type:

tuple[pd.DataFrame, pd.DataFrame]

qsprpred.data.processing.feature_transformers module

This module is used for feature standardization and transformation in a pipeline.

class qsprpred.data.processing.feature_transformers.FeatureTransformer(**kwargs)[source]

Bases: Step

Base class for feature transformers

This class is used to standardize or transform feature sets in a pipeline. It should be subclassed to implement specific transformations.

Currently, only the SklearnStep class is implemented, which wraps a scikit-learn transformer.
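A subclass only needs to implement transform, plus fit if it learns anything from the data. The sketch below uses a hypothetical minimal base class, `StepSketch`, in place of qsprpred's `Step`/`FeatureTransformer` so that it runs without the library; `Log1pTransformer` is likewise an illustrative name, not a class shipped with qsprpred.

```python
from __future__ import annotations

from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class StepSketch(ABC):
    """Hypothetical stand-in for the Step/FeatureTransformer interface."""

    def fit(self, X: pd.DataFrame, y: pd.DataFrame | None = None):
        return self  # stateless steps have nothing to learn

    @abstractmethod
    def transform(self, X: pd.DataFrame, y: pd.DataFrame | None = None):
        """Return (transformed X, transformed y) without modifying the inputs."""

    def fitTransform(self, X: pd.DataFrame, y: pd.DataFrame | None = None):
        self.fit(X, y)
        return self.transform(X, y)


class Log1pTransformer(StepSketch):
    """Apply log(1 + x) to every feature; targets pass through unchanged."""

    def transform(self, X, y=None):
        # np.log1p on a DataFrame returns a new DataFrame, leaving X untouched.
        return np.log1p(X), y


X = pd.DataFrame({"count": [0.0, 9.0, 99.0]})
X_t, y_t = Log1pTransformer().fitTransform(X)
```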

Initialize the step

fit(X: DataFrame, y: None | DataFrame = None)

Fit the step to the dataset

If the step requires fitting to the data, this method should be implemented.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

tuple of the transformed data (pd.DataFrame) and the (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

abstract transform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Apply the step to the dataset

Note: the step should not modify the original data.

Parameters:
  • X (pd.DataFrame) – data to be transformed

  • y (pd.DataFrame) – target data to be transformed

Returns:

tuple of the transformed data (pd.DataFrame) and the (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

class qsprpred.data.processing.feature_transformers.SklearnStep(transformer: BaseEstimator)[source]

Bases: FeatureTransformer

Step that wraps a scikit-learn transformer

For example, this can be used to wrap a scikit-learn StandardScaler

Variables:

transformer (BaseEstimator) – scikit-learn transformer to wrap, should have implementations of the fit and transform methods.

Initialize the SklearnStep

Parameters:

transformer (BaseEstimator) – scikit-learn transformer to wrap, should have implementations of the fit and transform methods.
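Because scikit-learn transformers return NumPy arrays, a wrapper like SklearnStep has to restore the DataFrame index and column names. The following is a minimal sketch of that idea, assuming the wrapped transformer keeps the number and order of columns (true for `StandardScaler`, not for e.g. `PCA`). `SklearnStepSketch` is a hypothetical name, and the real class may handle more cases.

```python
from __future__ import annotations

import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.preprocessing import StandardScaler


class SklearnStepSketch:
    """Hypothetical wrapper mirroring SklearnStep's documented behaviour."""

    def __init__(self, transformer: BaseEstimator):
        self.transformer = transformer

    def fit(self, X: pd.DataFrame, y: pd.DataFrame | None = None):
        self.transformer.fit(X)
        return self

    def transform(self, X: pd.DataFrame, y: pd.DataFrame | None = None):
        # scikit-learn returns a NumPy array; rebuild a DataFrame so the
        # index and column names survive. Targets are passed through.
        values = self.transformer.transform(X)
        return pd.DataFrame(values, index=X.index, columns=X.columns), y

    def fitTransform(self, X: pd.DataFrame, y: pd.DataFrame | None = None):
        return self.fit(X, y).transform(X, y)


X = pd.DataFrame({"MolWt": [100.0, 200.0, 300.0]}, index=["m1", "m2", "m3"])
X_scaled, _ = SklearnStepSketch(StandardScaler()).fitTransform(X)
```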

fit(X: DataFrame, y: None | DataFrame = None)[source]

Fit the transformer to the data

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame | None) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

tuple of the transformed data (pd.DataFrame) and the (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None][source]

Transform the data using the transformer

Parameters:
  • X (pd.DataFrame) – data to be transformed

  • y (pd.DataFrame | None) – target data to be transformed

Returns:

tuple of the transformed data (pd.DataFrame) and the (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

qsprpred.data.processing.imputers module

class qsprpred.data.processing.imputers.FeatureImputer(imputer: _BaseImputer, feature_properties: list[str] | None = None)[source]

Bases: Imputer

Initialize the feature imputer.

Parameters:
  • imputer (_BaseImputer) – imputer object, e.g. from sklearn.impute, that implements fit and transform methods

  • feature_properties (list[str], optional) – feature properties to impute; if None, all features will be imputed. Entries can be DescriptorSet names, feature names prefixed by the DescriptorSet name, or a mix of both, e.g. [‘RDKitDesc’, ‘MorganFP_0’, ‘MorganFP_1’]
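How feature_properties might be resolved against column names, followed by imputation with a scikit-learn imputer, is sketched below. The matching rule (an entry selects the column with exactly that name, or every column whose name starts with the entry followed by an underscore) is an assumption inferred from the example above, and `features_to_impute` is a hypothetical helper. The actual selection is whatever FeatureImputer.get_features_to_be_imputed returns.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def features_to_impute(columns, feature_properties=None):
    """Hypothetical rule: exact feature names or DescriptorSet-name prefixes."""
    if feature_properties is None:
        return list(columns)  # None means impute every feature
    selected = []
    for col in columns:
        for prop in feature_properties:
            if col == prop or col.startswith(f"{prop}_"):
                selected.append(col)
                break
    return selected


X = pd.DataFrame({
    "RDKitDesc_MolWt": [180.2, np.nan, 46.1],
    "MorganFP_0": [1.0, np.nan, 0.0],
    "MorganFP_1": [np.nan, 1.0, 1.0],
})
cols = features_to_impute(X.columns, ["RDKitDesc", "MorganFP_0"])
X_imputed = X.copy()  # leave the input untouched
X_imputed[cols] = SimpleImputer(strategy="median").fit_transform(X[cols])
```

Here MorganFP_1 is not selected, so its missing value is left as is.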

fit(X: DataFrame, y: DataFrame)[source]

Fit the imputer to the dataset

Parameters:
  • X (pd.DataFrame) – training data features

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

tuple of the transformed data (pd.DataFrame) and the (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

get_features_to_be_imputed(X: DataFrame) list[str][source]

Get the features that will be imputed.

Parameters:

X (pd.DataFrame) – features

Returns:

features to be imputed

Return type:

list[str]

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transform(X: DataFrame, y: DataFrame) tuple[DataFrame, DataFrame][source]

Impute values in the dataset.

Parameters:
  • X (pd.DataFrame) – features (to be imputed)

  • y (pd.DataFrame) – target data (to be imputed)

Returns:

tuple of the (imputed) features (pd.DataFrame) and the (imputed) target data (pd.DataFrame)

Return type:

tuple[pd.DataFrame, pd.DataFrame]

class qsprpred.data.processing.imputers.Imputer(**kwargs)[source]

Bases: Step

Initialize the step

fit(X: DataFrame, y: None | DataFrame = None)

Fit the step to the dataset

If the step requires fitting to the data, this method should be implemented.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

tuple of the transformed data (pd.DataFrame) and the (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

abstract transform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame][source]

Impute values in the dataset.

Parameters:
  • X (pd.DataFrame) – features (to be imputed)

  • y (pd.DataFrame) – target data (to be imputed)

Returns:

tuple of the (imputed) features (pd.DataFrame) and the (imputed) target data (pd.DataFrame)

Return type:

tuple[pd.DataFrame, pd.DataFrame]

class qsprpred.data.processing.imputers.TargetImputer(imputer: _BaseImputer, target_properties: list[str] | None = None)[source]

Bases: Imputer

Initialize the target imputer.

Parameters:
  • imputer (_BaseImputer) – imputer object, e.g. from sklearn.impute, that implements fit and transform methods

  • target_properties (list[str], optional) – target properties to impute; if None, all targets will be imputed.

fit(X: DataFrame, y: DataFrame)[source]

Fit the imputer to the dataset

Parameters:
  • X (pd.DataFrame) – training data features

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

tuple of the transformed data (pd.DataFrame) and the (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transform(X: DataFrame, y: DataFrame) tuple[DataFrame, DataFrame][source]

Impute values in the dataset.

Parameters:
  • X (pd.DataFrame) – features (to be imputed)

  • y (pd.DataFrame) – target data (to be imputed)

Returns:

tuple of the (imputed) features (pd.DataFrame) and the (imputed) target data (pd.DataFrame)

Return type:

tuple[pd.DataFrame, pd.DataFrame]

qsprpred.data.processing.mol_processor module

Abstract class that defines a simple callback interface to process molecules.

class qsprpred.data.processing.mol_processor.MolProcessor[source]

Bases: ABC

A callable that processes a list of molecules specified as SMILES strings, RDKit molecules, or StoredMol instances. The processor can also accept additional properties related to the molecules if the caller provides them.

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

abstract property supportsParallel: bool

Whether the processor supports parallel processing.
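The callback interface can be sketched as an abstract base class with a __call__ method and the two properties above. This is a simplified stand-in that accepts SMILES strings only (no RDKit molecules or StoredMol instances). The `(mols, props)` call signature is an assumption based on the description, and `MolProcessorSketch` and `SmilesLength` are hypothetical names.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class MolProcessorSketch(ABC):
    """Hypothetical stand-in for MolProcessor, restricted to SMILES strings."""

    @abstractmethod
    def __call__(self, mols: list[str], props: dict[str, list] | None = None) -> Any:
        """Process a batch of molecules, optionally with per-molecule properties."""

    @property
    def requiredProps(self) -> list[str]:
        return []  # no extra properties needed unless a subclass says otherwise

    @property
    @abstractmethod
    def supportsParallel(self) -> bool:
        """Whether batches can be processed in parallel."""


class SmilesLength(MolProcessorSketch):
    """Toy processor: returns the length of each SMILES string."""

    def __call__(self, mols, props=None):
        return [len(smiles) for smiles in mols]

    @property
    def supportsParallel(self) -> bool:
        return True  # each molecule is handled independently


processor = SmilesLength()
lengths = processor(["CCO", "c1ccccc1"])
```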

class qsprpred.data.processing.mol_processor.MolProcessorWithID(id_prop: str | None = 'ID')[source]

Bases: MolProcessor, ABC

A processor that requires a unique identifier for each molecule. Its requiredProps attribute tells callers to pass this property.

Variables:

idProp (str) – The name of the passed property that contains the molecule’s unique identifier.

Initialize the processor with the name of the property that contains the molecule’s unique identifier.

Parameters:

id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “ID”.

iterMolsAndIDs(mols, props: dict[str, list] | None)[source]

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This helper detects the input format and yields each molecule together with its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES strings, RDKit molecules, or StoredMol instances to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of each molecule and its ID.

Return type:

tuple[Mol, str]

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

abstract property supportsParallel: bool

Whether the processor supports parallel processing.
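For plain SMILES input, pairing molecules with IDs amounts to reading the ID list from props under idProp and zipping it with the molecules. This is a minimal sketch that assumes props maps property names to lists aligned with mols. `IDProcessorSketch` is hypothetical, and the real iterMolsAndIDs also reads IDs from StoredMol instances, which this sketch does not cover.

```python
from __future__ import annotations

from collections.abc import Iterator


class IDProcessorSketch:
    """Hypothetical stand-in for MolProcessorWithID (SMILES strings only)."""

    def __init__(self, id_prop: str = "ID"):
        self.idProp = id_prop

    @property
    def requiredProps(self) -> list[str]:
        # Tell the caller that the ID property must be supplied.
        return [self.idProp]

    def iterMolsAndIDs(self, mols: list[str], props: dict[str, list]) -> Iterator[tuple[str, str]]:
        ids = props[self.idProp]
        # Refuse misaligned inputs; zip alone would silently truncate.
        if len(ids) != len(mols):
            raise ValueError("Expected one ID per molecule.")
        # Pair each molecule with the ID at the same position.
        yield from zip(mols, ids)


processor = IDProcessorSketch()
pairs = list(processor.iterMolsAndIDs(["CCO", "C"], {"ID": ["mol1", "mol2"]}))
```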

qsprpred.data.processing.pipeline module

class qsprpred.data.processing.pipeline.DatasetPipeline(feature_calculators: list[DescriptorSet] | None = None, steps: dict[str, Step | BaseEstimator] | None = None, fixed: list[str] | None = None, fit_on: dict[str, str] | None = None, apply_to: dict[str, str] | None = None, skip: list[str] | None = None, seed: int | None = None)[source]

Bases: Pipeline

Pipeline class for applying data preprocessing steps to a QSPRDataset.

Variables:
  • feature_calculators (list[DescriptorSet] | None) – List of feature calculators to apply to the dataset. If None, no feature calculators are applied.

  • originalfeatureNames (list[str] | None) – Original feature names in the dataset before applying the pipeline.

Initialize the DatasetPipeline

Parameters:
  • feature_calculators (list[DescriptorSet] | None) – List of feature calculators to apply to the dataset.

  • steps (dict[str, Step | BaseEstimator]) – Dictionary of named steps in the pipeline. Scikit-learn transformers are wrapped in a SklearnStep.

  • fixed (list[str]) – List of step names that should not be fitted, only transformed

  • fit_on (dict[str, str]) – Settings for which data a step should be fitted on: ‘train’, ‘test’ or ‘both’. Steps that are not listed are fitted on the training data.

  • apply_to (dict[str, str]) – Settings for which data a step should be applied to: ‘train’, ‘test’ or ‘both’. Steps that are not listed are applied to both.

  • skip (list[str]) – List of step names to skip

  • seed (int | None) – Random state for the pipeline

addSkip(name: str)

Add a step to the skip list

Parameters:

name (str) – name of the step to skip

addStep(name: str, step: Step, fit_on: str = 'train', apply_to: str = 'both', fixed: bool = False)

Add a step to the pipeline

Parameters:
  • name (str) – name of the step

  • step (Step) – step to add to the pipeline

  • fit_on (str) – whether to fit the step on ‘train’, ‘test’ or ‘both’

  • apply_to (str) – whether to apply the step on ‘train’, ‘test’ or ‘both’

  • fixed (bool) – whether the step should be fixed and not fitted

apply(X_train: DataFrame, y_train: DataFrame | None = None, X_test: DataFrame | None = None, y_test: DataFrame | None = None, fit: bool = True) tuple[DataFrame, DataFrame | None, DataFrame | None, DataFrame | None]

Apply the pipeline to the data

If fit is True, the pipeline is fitted to the training data and then applied to the train and test data. If fit is False, the pipeline is only applied to the data.

Parameters:
  • X_train (pd.DataFrame) – training data to apply the pipeline to

  • y_train (pd.DataFrame | None) – training target data to apply the pipeline to

  • X_test (pd.DataFrame | None) – test data to apply the pipeline to

  • y_test (pd.DataFrame | None) – test target data to apply the pipeline to

  • fit (bool) – whether to fit the pipeline

Returns:

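A tuple containing:

  • X_train (pd.DataFrame) – transformed training data

  • y_train (pd.DataFrame | None) – transformed training targets

  • X_test (pd.DataFrame | None) – transformed test data

  • y_test (pd.DataFrame | None) – transformed test targets

Return type:

tuple[pd.DataFrame, pd.DataFrame | None, pd.DataFrame | None, pd.DataFrame | None]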

applyOnDataSet(dataset: QSPRTable, split: DataSplit | None = None, fit: bool = True, seed: int | None = None) Generator[tuple[DataFrame, DataFrame, DataFrame, DataFrame] | tuple[DataFrame, DataFrame], None, None][source]

Apply the pipeline to the dataset

Note: the random state of the dataset is used to randomize the pipeline when the seed of the feature calculators, splits or steps is not set.

Parameters:
  • dataset (QSPRTable) – dataset to apply the pipeline to

  • split (DataSplit) – split to apply to the dataset

  • seed (int | None) – seed to randomize the pipeline; if None, the random state of the dataset is used

  • fit (bool) – whether to fit the pipeline

Yields:

  • X_train (pd.DataFrame) – transformed training data

  • y_train (pd.DataFrame) – transformed training targets

  • X_test (pd.DataFrame | None) – transformed test data, only yielded if split is not None

  • y_test (pd.DataFrame | None) – transformed test targets, only yielded if split is not None

property fitted: bool

Check if the pipeline is fitted

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

orderSteps(order: list[str])

Order the steps in the pipeline

Parameters:

order (list[str]) – list of step names in the desired order

property randomState: int | None

Get the random state for the object.

removeSkip(name: str)

Remove a step from the skip list

Parameters:

name (str) – name of the step to remove from the skip list

removeStep(name: str)

Remove a step from the pipeline

Parameters:

name (str) – name of the step to remove

property skip: list[str]

Get the steps to skip

The steps to skip are not fitted or transformed, but are still present in the pipeline.

Returns:

list of step names to skip

Return type:

list[str]

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

class qsprpred.data.processing.pipeline.Pipeline(steps: dict[str, Step | BaseEstimator] | None = None, fixed: list[str] | None = None, fit_on: dict[str, str] | None = None, apply_to: dict[str, str] | None = None, skip: list[str] | None = None, seed: int | None = None)[source]

Bases: Randomized, JSONSerializable

Pipeline class for sequentially applying data preprocessing steps.

Variables:
  • steps (dict[str, Step | BaseEstimator]) – Dictionary of named steps in the pipeline. Scikit-learn transformers are wrapped in a SklearnStep.

  • fixed (list[str]) – List of step names that should not be fitted, only transformed

  • fitOn (dict[str, str]) – Settings for which data a step should be fitted on: ‘train’, ‘test’ or ‘both’. Steps that are not listed are fitted on the training data.

  • applyTo (dict[str, str]) – Settings for which data a step should be applied to: ‘train’, ‘test’ or ‘both’. Steps that are not listed are applied to both.

  • randomState (int | None) – Random state for the pipeline

  • skip (list[str]) – List of step names to skip

  • fitted (bool) – Whether the pipeline is fitted

Initialize the Pipeline

Parameters:
  • steps (dict[str, Step | BaseEstimator]) – Dictionary of named steps in the pipeline. Scikit-learn transformers are wrapped in a SklearnStep.

  • fixed (list[str]) – List of step names that should not be fitted, only transformed

  • fit_on (dict[str, str]) – Settings for which data a step should be fitted on: ‘train’, ‘test’ or ‘both’. Steps that are not listed are fitted on the training data.

  • apply_to (dict[str, str]) – Settings for which data a step should be applied to: ‘train’, ‘test’ or ‘both’. Steps that are not listed are applied to both.

  • skip (list[str]) – List of step names to skip

  • seed (int | None) – Random state for the pipeline
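The interplay of fixed, fit_on, apply_to and skip can be illustrated with a simplified model of apply. This sketch rests on stated assumptions: steps run in insertion order, a step fitted on 'both' sees the concatenated train and test data, and each step's transform returns an (X, y) pair. `PipelineSketch` and `CenterStep` are hypothetical names, and the real Pipeline also wraps scikit-learn transformers in SklearnStep and manages the random state, which the sketch omits.

```python
import pandas as pd


class CenterStep:
    """Toy step: subtracts the per-column mean learned in fit."""

    def fit(self, X, y=None):
        self.mean_ = X.mean()
        return self

    def transform(self, X, y=None):
        return X - self.mean_, y


class PipelineSketch:
    """Hypothetical, simplified model of how fixed/fit_on/apply_to/skip interact."""

    def __init__(self, steps, fixed=None, fit_on=None, apply_to=None, skip=None):
        self.steps = dict(steps)
        self.fixed = list(fixed or [])
        self.fitOn = dict(fit_on or {})
        self.applyTo = dict(apply_to or {})
        self.skip = list(skip or [])

    def apply(self, X_train, y_train=None, X_test=None, y_test=None, fit=True):
        for name, step in self.steps.items():  # steps run in insertion order
            if name in self.skip:
                continue  # skipped steps stay in the pipeline but do nothing
            fit_on = self.fitOn.get(name, "train")
            apply_to = self.applyTo.get(name, "both")
            if fit and name not in self.fixed:
                if fit_on == "train":
                    step.fit(X_train, y_train)
                elif fit_on == "test":
                    step.fit(X_test, y_test)
                else:  # "both": fit on the concatenated train and test data
                    y_both = None if y_train is None else pd.concat([y_train, y_test])
                    step.fit(pd.concat([X_train, X_test]), y_both)
            if apply_to in ("train", "both"):
                X_train, y_train = step.transform(X_train, y_train)
            if X_test is not None and apply_to in ("test", "both"):
                X_test, y_test = step.transform(X_test, y_test)
        return X_train, y_train, X_test, y_test


X_train = pd.DataFrame({"f": [1.0, 3.0]})
X_test = pd.DataFrame({"f": [5.0]})
X_tr, _, X_te, _ = PipelineSketch({"center": CenterStep()}).apply(X_train, X_test=X_test)
```

With the defaults, the mean is learned from the training rows only (2.0) and then subtracted from both sets, which is what keeps test information out of the fitted step.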

addSkip(name: str)[source]

Add a step to the skip list

Parameters:

name (str) – name of the step to skip

addStep(name: str, step: Step, fit_on: str = 'train', apply_to: str = 'both', fixed: bool = False)[source]

Add a step to the pipeline

Parameters:
  • name (str) – name of the step

  • step (Step) – step to add to the pipeline

  • fit_on (str) – whether to fit the step on ‘train’, ‘test’ or ‘both’

  • apply_to (str) – whether to apply the step on ‘train’, ‘test’ or ‘both’

  • fixed (bool) – whether the step should be fixed and not fitted

apply(X_train: DataFrame, y_train: DataFrame | None = None, X_test: DataFrame | None = None, y_test: DataFrame | None = None, fit: bool = True) tuple[DataFrame, DataFrame | None, DataFrame | None, DataFrame | None][source]

Apply the pipeline to the data

If fit is True, the pipeline is fitted to the training data and then applied to the train and test data. If fit is False, the pipeline is only applied to the data.

Parameters:
  • X_train (pd.DataFrame) – training data to apply the pipeline to

  • y_train (pd.DataFrame | None) – training target data to apply the pipeline to

  • X_test (pd.DataFrame | None) – test data to apply the pipeline to

  • y_test (pd.DataFrame | None) – test target data to apply the pipeline to

  • fit (bool) – whether to fit the pipeline

Returns:

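A tuple containing:

  • X_train (pd.DataFrame) – transformed training data

  • y_train (pd.DataFrame | None) – transformed training targets

  • X_test (pd.DataFrame | None) – transformed test data

  • y_test (pd.DataFrame | None) – transformed test targets

Return type:

tuple[pd.DataFrame, pd.DataFrame | None, pd.DataFrame | None, pd.DataFrame | None]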

property fitted: bool

Check if the pipeline is fitted

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

orderSteps(order: list[str])[source]

Order the steps in the pipeline

Parameters:

order (list[str]) – list of step names in the desired order

property randomState: int | None

Get the random state for the object.

removeSkip(name: str)[source]

Remove a step from the skip list

Parameters:

name (str) – name of the step to remove from the skip list

removeStep(name: str)[source]

Remove a step from the pipeline

Parameters:

name (str) – name of the step to remove

property skip: list[str]

Get the steps to skip

The steps to skip are not fitted or transformed, but are still present in the pipeline.

Returns:

list of step names to skip

Return type:

list[str]

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

qsprpred.data.processing.step module

class qsprpred.data.processing.step.DummyStep(**kwargs)[source]

Bases: Step

Dummy step that does nothing

Initialize the step

fit(X: DataFrame, y: None | DataFrame = None)

Fit the step to the dataset

If the step requires fitting to the data, this method should be implemented.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

tuple of the transformed data (pd.DataFrame) and the (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None][source]

Return the input data unchanged

Parameters:
  • X (pd.DataFrame) – data to be transformed

  • y (pd.DataFrame | None) – target data to be transformed

Returns:

tuple of the unchanged data (pd.DataFrame) and the unchanged target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

class qsprpred.data.processing.step.Shuffle(seed: int | None = None)[source]

Bases: Step, Randomized

Step that shuffles the data

Variables:

randomState (int | None) – Seed to randomize the shuffle.

Initialize the shuffle step

Parameters:

seed (int | None) – Seed to randomize the shuffle.
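Shuffling features and targets together only requires drawing one permutation and applying it to both frames. This is a minimal pandas sketch that assumes X and y share a unique index; `shuffle` here is a hypothetical function, not the Shuffle.transform implementation.

```python
from __future__ import annotations

import pandas as pd


def shuffle(X: pd.DataFrame, y: pd.DataFrame | None = None, seed: int | None = None):
    """Shuffle the rows of X and reorder y identically (assumes a unique index)."""
    # sample(frac=1) returns every row exactly once, in random order.
    X_shuffled = X.sample(frac=1, random_state=seed)
    # Reorder the targets with the shuffled index so rows stay paired.
    y_shuffled = None if y is None else y.loc[X_shuffled.index]
    return X_shuffled, y_shuffled


X = pd.DataFrame({"f": range(10)})
y = pd.DataFrame({"t": [10 * i for i in range(10)]})
X_s, y_s = shuffle(X, y, seed=42)
```

Passing the same seed reproduces the same order, which is what makes a seeded shuffle step repeatable.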

fit(X: DataFrame, y: None | DataFrame = None)

Fit the step to the dataset

If the step requires fitting to the data, this method should be implemented.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

tuple of the transformed data (pd.DataFrame) and the (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

property randomState: int | None

Get the random state for the object.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None][source]

Shuffle the data

Parameters:
  • X (pd.DataFrame) – data to be shuffled

  • y (pd.DataFrame | None) – target data to be shuffled

Returns:

tuple of the shuffled data (pd.DataFrame) and the shuffled target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

class qsprpred.data.processing.step.Step(**kwargs)[source]

Bases: JSONSerializable

A data preprocessing step that can be applied to a dataset

Initialize the step

fit(X: DataFrame, y: None | DataFrame = None)[source]

Fit the step to the dataset

If the step requires fitting to the data, this method should be implemented.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None][source]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

tuple of the transformed data (pd.DataFrame) and the (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

abstract transform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None][source]

Apply the step to the dataset

Note: the step should not modify the original data.

Parameters:
  • X (pd.DataFrame) – data to be transformed

  • y (pd.DataFrame) – target data to be transformed

Returns:

tuple of the transformed data (pd.DataFrame) and the (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

qsprpred.data.processing.target_transformers module

class qsprpred.data.processing.target_transformers.Discretizer(target: str, th: list[float] | float)[source]

Bases: TargetTransformer

Discretizes the target data into bins.

Note: using this step in a pipeline may break subsequent model training, because the discretizer does not update the targetProperties of the dataset. Use the makeClassification method of the dataset instead; see the documentation of the QSPRDataset class.

Variables:
  • target (str) – name of the target property to be discretized

  • th (list[float]) – thresholds for the bins

  • le (LabelEncoder) – label encoder for multi-class discretization, only used if more than one threshold is provided.

Initialize the discretizer.

Parameters:
  • target (str) – name of the target property to be discretized

  • th (list[float] | float) – thresholds for the bins. If a single float is provided, it will be used as a single threshold. If a list is provided, it should contain at least one value.
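The binning rule can be illustrated in plain Python. This is a hypothetical stand-in, not the Discretizer implementation: it assumes that a single threshold yields a binary label (1 when the value is strictly greater than the threshold) and that several thresholds act as bin edges with right-closed intervals and the lowest edge included, similar to `pandas.cut(..., include_lowest=True)`. The function name `discretize` is invented, and values outside every bin map to None here; the real class may handle them differently.

```python
from bisect import bisect_left


def discretize(values, th):
    """Hypothetical sketch of threshold binning (not the qsprpred code).

    A single threshold gives binary labels; several thresholds are treated
    as bin edges with right-closed bins (a, b], lowest edge included.
    """
    if isinstance(th, (int, float)):
        th = [th]
    if len(th) == 0:
        raise ValueError("at least one threshold is required")
    if len(th) == 1:
        # Binary case (assumption): 1 if strictly above the threshold.
        return [int(v > th[0]) for v in values]
    labels = []
    for v in values:
        if v < th[0] or v > th[-1]:
            labels.append(None)  # outside every bin
        else:
            # bisect_left finds the first edge >= v, so v belongs to the bin
            # ending at that edge; the lowest edge itself goes to bin 0.
            labels.append(max(bisect_left(th, v) - 1, 0))
    return labels


# Same thresholds as the multitask test fixture (createLargeMultitaskDataSet).
print(discretize([-1, 0, 1, 1.5, 2, 50, 100, 101], [-1, 1, 2, 100]))
print(discretize([0.5, 1, 2], 1))
```

With three or more edges the result is a multi-class label, which is where the documented `le` (LabelEncoder) attribute would come into play in the real class.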

fit(X: DataFrame, y: None | DataFrame = None)

Fit the step to the dataset

If the step requires fitting to the data, this method should be implemented.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

transformed data (pd.DataFrame) and (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getIntervals(discrete_values: Series) Series[source]

Transform the discretized values to intervals.

Parameters:

discrete_values (pd.Series) – discretized values

Returns:

intervals corresponding to the discretized values

Return type:

pd.Series
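Mapping bin indices back to intervals can be sketched as pairing consecutive thresholds. The function `get_intervals` and its `"(low, high]"` label format are hypothetical choices for illustration; the string format produced by the real getIntervals is not documented here, and the label does not show that the lowest edge is included in the first bin.

```python
def get_intervals(discrete_values, th):
    """Hypothetical inverse of threshold binning: bin index -> interval label.

    Assumes right-closed bins (a, b] whose edges are consecutive entries
    of ``th``.
    """
    edges = list(zip(th[:-1], th[1:]))  # one (low, high) pair per bin
    labels = []
    for idx in discrete_values:
        if idx is None:
            labels.append(None)  # value was outside all bins
        else:
            low, high = edges[idx]
            labels.append(f"({low}, {high}]")
    return labels


print(get_intervals([0, 1, 2], [-1, 1, 2, 100]))
```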

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transform(X: DataFrame, y: DataFrame | None = None) tuple[DataFrame, DataFrame | None][source]

Discretize the target data into bins.

Parameters:
  • X (pd.DataFrame) – features

  • y (pd.DataFrame | None) – target data to be discretized

Returns:

features (pd.DataFrame) and (discretized) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

class qsprpred.data.processing.target_transformers.SimpleTargetTransformer(target: str, transformation: Literal['log10', 'log2', 'log', 'sqrt', 'cbrt', 'exp', 'square', 'cube', 'reciprocal'])[source]

Bases: TargetTransformer

Applies a simple transformation to the target data.

Variables:
  • transform_dict (dict) – dictionary of available transformations

  • transformer (callable) – numpy function

Initialize the SimpleTargetTransformer.

Parameters:
  • target (str) – name of the target property to be transformed

  • transformation (str) – transformation to apply, one of: log10, log2, log, sqrt, cbrt, exp, square, cube, reciprocal

fit(X: DataFrame, y: None | DataFrame = None)

Fit the step to the dataset

If the step requires fitting to the data, this method should be implemented.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

transformed data (pd.DataFrame) and (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getInverseTransformer() callable[source]

Get the inverse transformer function

Returns:

inverse transformer function

Return type:

callable

getTransformer() callable[source]

Get the transformer function

Returns:

transformer function

Return type:

callable

inverseTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None][source]

Inverse transform the data using the inverse transformer

Parameters:
  • X (pd.DataFrame) – data to be transformed

  • y (pd.DataFrame | None) – target data to be transformed

Returns:

transformed data (pd.DataFrame) and (inverse-transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

inverse_transform_dict = {'cbrt': <function SimpleTargetTransformer.<lambda>>, 'cube': <function SimpleTargetTransformer.<lambda>>, 'exp': <function SimpleTargetTransformer.<lambda>>, 'log': <function SimpleTargetTransformer.<lambda>>, 'log10': <function SimpleTargetTransformer.<lambda>>, 'log2': <function SimpleTargetTransformer.<lambda>>, 'reciprocal': <function SimpleTargetTransformer.<lambda>>, 'sqrt': <function SimpleTargetTransformer.<lambda>>, 'square': <function SimpleTargetTransformer.<lambda>>}

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)
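Because `transform_dict` holds lambdas, which the `json` module cannot encode, one plausible way to satisfy "all data necessary to reconstruct the object" is to serialize only the constructor arguments and look the function up again on load. The class `TinyTargetTransformer` below is hypothetical and deliberately simplified; it does not describe how qsprpred's JSONSerializable base actually stores objects.

```python
import json
import math


class TinyTargetTransformer:
    """Hypothetical, simplified transformer used only to show the idea."""

    # Functions are looked up by name, so only the name must be serialized.
    transform_dict = {"log10": math.log10, "sqrt": math.sqrt}

    def __init__(self, target, transformation):
        self.target = target
        self.transformation = transformation
        self.transformer = self.transform_dict[transformation]

    def toJSON(self):
        # Callables are not JSON-serializable; store the lookup key instead.
        return json.dumps({"target": self.target,
                           "transformation": self.transformation})

    @classmethod
    def fromJSON(cls, text):
        # Rebuilding through __init__ re-resolves the callable from its name.
        return cls(**json.loads(text))


original = TinyTargetTransformer("CL", "log10")
restored = TinyTargetTransformer.fromJSON(original.toJSON())
print(restored.target, restored.transformation, restored.transformer(100.0))
```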

transform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None][source]

Transform the data using the transformer

Parameters:
  • X (pd.DataFrame) – data to be transformed

  • y (pd.DataFrame | None) – target data to be transformed

Returns:

transformed data (pd.DataFrame) and (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

transform_dict = {'cbrt': <function SimpleTargetTransformer.<lambda>>, 'cube': <function SimpleTargetTransformer.<lambda>>, 'exp': <function SimpleTargetTransformer.<lambda>>, 'log': <function SimpleTargetTransformer.<lambda>>, 'log10': <function SimpleTargetTransformer.<lambda>>, 'log2': <function SimpleTargetTransformer.<lambda>>, 'reciprocal': <function SimpleTargetTransformer.<lambda>>, 'sqrt': <function SimpleTargetTransformer.<lambda>>, 'square': <function SimpleTargetTransformer.<lambda>>}

class qsprpred.data.processing.target_transformers.TargetTransformer(**kwargs)[source]

Bases: Step

Initialize the step

fit(X: DataFrame, y: None | DataFrame = None)

Fit the step to the dataset

If the step requires fitting to the data, this method should be implemented.

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

fitTransform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame | None]

Fit the step to the dataset and apply it

Parameters:
  • X (pd.DataFrame) – training data

  • y (pd.DataFrame) – training targets

Returns:

transformed data (pd.DataFrame) and (transformed) target data (pd.DataFrame | None)

Return type:

tuple[pd.DataFrame, pd.DataFrame | None]

property fitted: bool

Check if the step is fitted

Returns:

True if the step is fitted, False otherwise

Return type:

bool

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

abstract transform(X: DataFrame, y: None | DataFrame = None) tuple[DataFrame, DataFrame][source]

Transform the target data.

Parameters:
  • X (pd.DataFrame) – features

  • y (pd.DataFrame) – target data (to be transformed)

Returns:

data (pd.DataFrame) and (transformed) target data (pd.DataFrame)

Return type:

tuple[pd.DataFrame, pd.DataFrame]
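A custom target transformer mainly needs a transform that returns the features together with the transformed targets while leaving the inputs untouched, plus a fit if it learns state. Since the real Step base class may require more than is documented here, the sketch defines a hypothetical `TargetTransformerSketch` base mirroring only the documented members (fit, fitTransform, fitted, transform). `MinMaxTarget` is an invented example step, not part of qsprpred. The sketch requires pandas.

```python
from abc import ABC, abstractmethod
from typing import Optional

import pandas as pd


class TargetTransformerSketch(ABC):
    """Hypothetical stand-in mirroring the documented TargetTransformer API."""

    def __init__(self):
        self._fitted = False

    @property
    def fitted(self) -> bool:
        return self._fitted

    def fit(self, X: pd.DataFrame, y: Optional[pd.DataFrame] = None):
        # Stateless steps learn nothing; just record that fit was called.
        self._fitted = True
        return self

    def fitTransform(self, X: pd.DataFrame, y: Optional[pd.DataFrame] = None):
        self.fit(X, y)
        return self.transform(X, y)

    @abstractmethod
    def transform(self, X: pd.DataFrame, y: Optional[pd.DataFrame] = None):
        """Return (X, transformed y) without modifying the inputs."""


class MinMaxTarget(TargetTransformerSketch):
    """Hypothetical example step: rescale one target column to [0, 1]."""

    def __init__(self, target: str):
        super().__init__()
        self.target = target
        self.min_ = None
        self.max_ = None

    def fit(self, X, y=None):
        # This step needs the targets to learn its scaling range.
        self.min_ = y[self.target].min()
        self.max_ = y[self.target].max()
        return super().fit(X, y)

    def transform(self, X, y=None):
        if y is None:
            return X, None
        y = y.copy()  # work on a copy so the caller's frame is untouched
        y[self.target] = (y[self.target] - self.min_) / (self.max_ - self.min_)
        return X, y


X = pd.DataFrame({"feat": [1.0, 2.0, 3.0]})
y = pd.DataFrame({"CL": [10.0, 20.0, 30.0]})
step = MinMaxTarget("CL")
X_out, y_out = step.fitTransform(X, y)
print(y_out["CL"].tolist(), y["CL"].tolist())
```

Copying `y` before assignment is what keeps the documented "should not modify the original data" contract; writing into `y` directly would silently change the caller's frame.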

qsprpred.data.processing.tests module

class qsprpred.data.processing.tests.TestApplicabilityDomain(methodName='runTest')[source]

Bases: DataSetsPathMixIn, QSPRTestCase

Test the applicability domain.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
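The comparison rule above can be restated as a small predicate. `almost_equal` is a hypothetical, simplified helper that returns a bool instead of raising; unittest's own method additionally builds a failure message.

```python
def almost_equal(first, second, places=None, delta=None):
    """Simplified restatement of the assertAlmostEqual rule (returns bool)."""
    if first == second:
        return True  # objects that compare equal are always almost equal
    if places is not None and delta is not None:
        raise TypeError("specify delta or places not both")
    diff = abs(first - second)
    if delta is not None:
        return diff <= delta
    if places is None:
        places = 7  # documented default
    # Round the difference to `places` decimal places and compare to zero.
    return round(diff, places) == 0


print(almost_equal(0.1 + 0.2, 0.3))          # difference is about 5.6e-17
print(almost_equal(1.0, 1.00001, places=4))  # round(1e-05, 4) is 0.0
print(almost_equal(1.0, 1.001, places=4))    # round(0.001, 4) is 0.001
print(almost_equal(100, 101, delta=2))
```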

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

Equivalent to self.assertEqual(Counter(list(first)), Counter(list(second))).

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.
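The Counter equivalence can be checked directly on the two examples above. `same_elements` is a hypothetical helper name; note that `Counter` needs hashable elements, and unittest falls back to a slower pairwise comparison when they are not.

```python
from collections import Counter


def same_elements(first, second):
    """Sketch of the multiset check that assertCountEqual describes."""
    # Same items with the same multiplicities, ignoring order.
    return Counter(list(first)) == Counter(list(second))


print(same_elements([0, 1, 1], [1, 0, 1]))
print(same_elements([0, 0, 1], [0, 1]))
```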

assertDictEqual(d1, d2, msg=None)
assertEndsWith(s, suffix, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)
assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)
assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger_name or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger_name or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)
assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)

assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertStartsWith(s, prefix, msg=None)
assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large multitask dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug()

Run the test without collecting errors in a TestResult

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now, they need to be added manually to the list in this method's source code.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of descriptor set objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return a generator over many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. Again, this is not exhaustive, but should cover a lot of cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

generator
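A grid like this is naturally expressed with `itertools.product`, which yields one combination at a time. The function `data_prep_grid` and the option labels below are hypothetical placeholders; the real method builds actual descriptor calculators, splitters, standardizers and filters, and its exact option lists are not documented here.

```python
from itertools import product


def data_prep_grid(calculators, splits, standardizers, feature_filters,
                   data_filters):
    """Hypothetical sketch: lazily yield every combination of the options."""
    # product() is lazy, so tuples are produced one at a time.
    yield from product(calculators, splits, standardizers,
                       feature_filters, data_filters)


grid = data_prep_grid(
    calculators=["MorganFP", "RDKitDescs"],  # placeholder labels, not objects
    splits=[None, "random"],
    standardizers=[None, "StandardScaler"],
    feature_filters=[None],
    data_filters=[None],
)
combos = list(grid)
print(len(combos))  # 2 * 2 * 2 * 1 * 1 = 8
print(combos[0])
```

The grid size is the product of the option counts, so each extra option multiplies the number of test cases; that growth is one reason the docstring stresses the grid is not exhaustive.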

classmethod getDefaultCalculatorCombo()

Makes a list of default descriptor calculators that can be used in tests.

It creates one calculator with only Morgan fingerprints, one with only RDKit descriptors, and one with both, to test behaviour with multiple descriptor sets. Override this method if you want to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists of all possible preparation combinations

Return type:

list
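Named combinations can drive parameterized tests. The sketch uses the standard library's `subTest` so each named combination is reported separately; whether qsprpred's own tests use `subTest` or a parameterization library is not stated here. `prep_combos`, `PrepComboTest` and the `[name, *settings]` layout are hypothetical.

```python
import unittest
from itertools import product


def prep_combos():
    """Hypothetical sketch: return [name, *settings] for every combination."""
    splits = [None, "random"]  # placeholder option labels
    standardizers = [None, "scaler"]
    combos = []
    for split, std in product(splits, standardizers):
        combos.append([f"split={split}_std={std}", split, std])
    return combos


class PrepComboTest(unittest.TestCase):
    def test_each_combo(self):
        for name, split, std in prep_combos():
            # A failure inside subTest is reported under this combo's name.
            with self.subTest(name=name):
                self.assertIn(split, (None, "random"))
                self.assertIn(std, (None, "scaler"))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PrepComboTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```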

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()[source]

Create a small test dataset with MorganFP descriptors.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will return the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Remove all files and directories that are used for testing.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testApplicabilityDomain()[source]

Test the applicability domain fitting, transforming and serialization.

testContinousAD()[source]

Test the applicability domain for continuous data.

class qsprpred.data.processing.tests.TestDataFilters(methodName='runTest')[source]

Bases: QSPRTestCase, StepCheckMixIn

Test the data filters, which filter the dataset based on properties.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

Equivalent to self.assertEqual(Counter(list(first)), Counter(list(second))).

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)
assertEndsWith(s, suffix, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)
assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)
assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger_name or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger_name or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)
assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
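A short sketch of both calling styles, assuming only the standard library. int("x") is used because it reliably raises ValueError with the message "invalid literal for int() with base 10: 'x'". The regex is matched with re.search, so it need not cover the whole message.

```python
import unittest

tc = unittest.TestCase()  # bare instance, used only for its assert* methods

# Callable form: the callable and its arguments follow the regex.
tc.assertRaisesRegex(ValueError, r"invalid literal", int, "x")

# Context-manager form: msg= is only accepted here.
with tc.assertRaisesRegex(ValueError, r"^invalid literal for int\(\)",
                          msg="int() should reject 'x'") as cm:
    int("x")

# The caught exception is kept for further inspection.
caught = cm.exception
```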

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.
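A sketch of how seq_type changes the outcome, assuming only the standard library. Without seq_type, sequences of different types but equal items pass; with it, both arguments must be instances of the given type.

```python
import unittest

tc = unittest.TestCase()  # bare instance, used only for its assert* methods

# Without seq_type, a list and a tuple holding the same items are accepted.
tc.assertSequenceEqual([1, 2, 3], (1, 2, 3))

# With seq_type=list, the tuple is rejected before any element is compared.
message = ""
try:
    tc.assertSequenceEqual([1, 2, 3], (1, 2, 3), seq_type=list)
except tc.failureException as exc:
    message = str(exc)
```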

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses duck typing to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).
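A sketch of the duck-typing behaviour, assuming only the standard library: set and frozenset both provide difference(), so they can be compared, while a list does not, so the assertion fails.

```python
import unittest

tc = unittest.TestCase()  # bare instance, used only for its assert* methods

# set and frozenset both implement .difference(), so this comparison passes.
tc.assertSetEqual({1, 2, 3}, frozenset([3, 2, 1]))

# A list has no .difference() method, so the second argument is rejected.
list_rejected = False
try:
    tc.assertSetEqual({1, 2}, [1, 2])
except tc.failureException:
    list_rejected = True
```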

assertStartsWith(s, prefix, msg=None)
assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in the warning message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
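A minimal sketch of assertWarnsRegex as a context manager. The function old_api is hypothetical and exists only for this example. The context manager records the matching warning, so its text can be inspected afterwards.

```python
import unittest
import warnings


def old_api():
    # Hypothetical deprecated function, used only for this example.
    warnings.warn("old_api() is deprecated; use new_api()", DeprecationWarning)
    return 42


tc = unittest.TestCase()  # bare instance, used only for its assert* methods

# Only DeprecationWarnings whose message matches the regex count as a match.
with tc.assertWarnsRegex(DeprecationWarning, r"deprecated; use new_api") as cm:
    value = old_api()

warning_text = str(cm.warning)
```

DeprecationWarning is normally hidden outside __main__, but the context manager installs an "always" filter for the expected class while it is active, so the warning is still recorded.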

checkFitTransform(step: Step, dataset: QSPRTable, fromfile=False) Tuple[DataFrame, DataFrame | None]

Check basic step fit and transform functionality.

checkStep(step: Step, dataset: QSPRTable) Tuple[DataFrame, DataFrame | None]

Check basic step functionality and serialization.

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large multitask dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

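  • random_state (int) – random state to use for splitting and shuffling

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing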

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now, they need to be added manually to the list in the method's source code.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of DescriptorCalculator objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Generate many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but it should cover many cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid

classmethod getDefaultCalculatorCombo()

Make a list of default descriptor calculators that can be used in tests.

It creates one calculator with only Morgan fingerprints, one with only RDKit descriptors, and one with both, to test behaviour with multiple descriptor sets. Override this method to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists of all possible preparation combinations

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()[source]

Hook method for setting up the test fixture before exercising it.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.
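A short sketch of the default shortDescription behaviour, assuming only the standard library. The test class and docstrings are hypothetical; only the first docstring line is returned, and a method without a docstring yields None.

```python
import unittest


class DescribedTest(unittest.TestCase):
    def test_with_docstring(self):
        """Check that parsing accepts leading whitespace.

        Further lines are ignored by shortDescription().
        """

    def test_without_docstring(self):
        pass


# Instantiate each test by method name and ask for its description.
with_doc = DescribedTest("test_with_docstring").shortDescription()
without_doc = DescribedTest("test_without_docstring").shortDescription()
```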

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Hook method for deconstructing the test fixture after testing it.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testCategoryFilter()[source]

Test the category filter that drops values from a dataset property.

testNaNFilter()[source]

Test the NaN filter, which drops rows with NaN values from dataset.

testOutlierFilter()[source]

Test the outlier filter, which removes outliers from the dataset.

testRepeatsFilter()[source]

Test the duplicate filter, which drops rows with identical descriptors from dataset.

class qsprpred.data.processing.tests.TestDatasetPipeline(methodName='runTest')[source]

Bases: DataSetsPathMixIn, QSPRTestCase

Test the dataset pipeline.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type-specific assertEqual-style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.
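A minimal sketch of registering a custom equality function. The Point class and assertPointEqual are hypothetical. Point uses eq=False, so plain == would compare identity; once the function is registered, assertEqual compares coordinates instead and reports them on failure.

```python
import io
import unittest
from dataclasses import dataclass


@dataclass(eq=False)
class Point:
    # Hypothetical type; eq=False means == falls back to identity.
    x: float
    y: float


class PointTests(unittest.TestCase):
    def setUp(self):
        # Use assertPointEqual whenever both assertEqual arguments are Points.
        self.addTypeEqualityFunc(Point, self.assertPointEqual)

    def assertPointEqual(self, first, second, msg=None):
        if (first.x, first.y) != (second.x, second.y):
            self.fail(msg or f"Point({first.x}, {first.y}) != "
                             f"Point({second.x}, {second.y})")

    def test_equal_points(self):
        self.assertEqual(Point(1, 2), Point(1, 2))

    def test_unequal_points_report_coordinates(self):
        with self.assertRaisesRegex(self.failureException,
                                    r"Point\(1, 2\) != Point\(3, 4\)"):
            self.assertEqual(Point(1, 2), Point(3, 4))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PointTests)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```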

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
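A worked example of places versus delta, assuming only the standard library. It also shows the note above: places counts decimal places, so two large numbers that agree in their first seven significant digits can still fail.

```python
import unittest

tc = unittest.TestCase()  # bare instance, used only for its assert* methods

# round(abs(0.1 + 0.2 - 0.3), 7) == 0, so this passes even though
# 0.1 + 0.2 != 0.3 in binary floating point.
tc.assertAlmostEqual(0.1 + 0.2, 0.3)

# The difference here is about 0.1, which does not round to 0 at 7 places.
large_values_failed = False
try:
    tc.assertAlmostEqual(1_000_000.0, 1_000_000.1)
except tc.failureException:
    large_values_failed = True

# delta is an absolute tolerance: abs(100.0 - 100.4) <= 0.5, so this passes.
tc.assertAlmostEqual(100.0, 100.4, delta=0.5)
```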

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

This is equivalent to:

self.assertEqual(Counter(list(first)),
                 Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.
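The two examples above can be checked directly, assuming only the standard library. The Counter comparisons mirror the equivalence stated for this method.

```python
import unittest
from collections import Counter

tc = unittest.TestCase()  # bare instance, used only for its assert* methods

# Same elements with the same multiplicities, order ignored: passes.
tc.assertCountEqual([0, 1, 1], [1, 0, 1])

# Same distinct values but different multiplicities: fails.
multiplicity_checked = False
try:
    tc.assertCountEqual([0, 0, 1], [0, 1])
except tc.failureException:
    multiplicity_checked = True

# The equivalent Counter comparisons give the same verdicts.
same = Counter([0, 1, 1]) == Counter([1, 0, 1])
different = Counter([0, 0, 1]) == Counter([0, 1])
```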

assertDictEqual(d1, d2, msg=None)
assertEndsWith(s, suffix, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)
assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)
assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of the given level or higher is emitted on the given logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of the given level or higher are emitted on the given logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)
assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses duck typing to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertStartsWith(s, prefix, msg=None)
assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in the warning message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large multitask dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

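  • random_state (int) – random state to use for splitting and shuffling

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing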

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now, they need to be added manually to the list in the method's source code.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of DescriptorCalculator objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Generate many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but it should cover many cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid

classmethod getDefaultCalculatorCombo()

Make a list of default descriptor calculators that can be used in tests.

It creates one calculator with only Morgan fingerprints, one with only RDKit descriptors, and one with both, to test behaviour with multiple descriptor sets. Override this method to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists of all possible preparation combinations

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()[source]

Create a small test dataset for the dataset pipeline.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Remove all files and directories that are used for testing.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testApply()[source]

Test the dataset pipeline apply method.

testSerialization()[source]

Test the dataset pipeline serialization.

class qsprpred.data.processing.tests.TestDummyStep(methodName='runTest')[source]

Bases: QSPRTestCase, StepCheckMixIn

Test the dummy step.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type-specific assertEqual-style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

This is equivalent to:

self.assertEqual(Counter(list(first)),
                 Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)
assertEndsWith(s, suffix, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)
assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)
assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of the given level or higher is emitted on the given logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of the given level or higher are emitted on the given logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.
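The difference between places and delta can be checked directly with the standard library. The numeric values are arbitrary illustrations, and the bare unittest.TestCase() instance is used only to call the assertion helpers outside a test run.

```python
import unittest

tc = unittest.TestCase()

# places (default 7): round(abs(first - second), places) must be 0.
tc.assertAlmostEqual(1.0, 1.00000001)        # diff 1e-8 rounds to 0.0
tc.assertAlmostEqual(1.0, 1.001, places=2)   # diff ~0.001 rounds to 0.0
tc.assertNotAlmostEqual(1.0, 1.1)            # diff ~0.1 does not round to 0

# delta: abs(first - second) is compared directly with the tolerance.
tc.assertAlmostEqual(100, 104, delta=5)      # 4 <= 5
tc.assertNotAlmostEqual(100, 110, delta=5)   # 10 > 5

# Equal objects always fail assertNotAlmostEqual, whatever places/delta say.
try:
    tc.assertNotAlmostEqual(0.5, 0.5, delta=0)
    equal_failed = False
except AssertionError:
    equal_failed = True
print(equal_failed)  # True
```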

assertNotEndsWith(s, suffix, msg=None)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)
assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
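A short sketch of assertRaisesRegex in both its callable and context-manager forms. int("abc") and the empty-dict lookup are just convenient ways to trigger a ValueError and a KeyError.

```python
import unittest

tc = unittest.TestCase()

# Callable form: int("abc") must raise ValueError whose message matches the regex.
tc.assertRaisesRegex(ValueError, r"invalid literal", int, "abc")

# Context-manager form; only here may 'msg' be passed.
with tc.assertRaisesRegex(KeyError, "missing", msg="lookup should fail") as cm:
    {}["missing"]
print(repr(cm.exception))  # KeyError('missing')

# Right exception type but a non-matching message: the assertion itself fails.
try:
    tc.assertRaisesRegex(ValueError, r"^no match$", int, "abc")
    mismatch_failed = False
except AssertionError:
    mismatch_failed = True
```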

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.
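How seq_type changes the outcome, as a small standard-library sketch with arbitrary values:

```python
import unittest

tc = unittest.TestCase()

# Without seq_type, a list and a tuple holding the same items compare equal.
tc.assertSequenceEqual([1, 2, 3], (1, 2, 3))

# With seq_type=list, the tuple argument is rejected.
try:
    tc.assertSequenceEqual([1, 2, 3], (1, 2, 3), seq_type=list)
    type_enforced = False
except AssertionError:
    type_enforced = True
print(type_enforced)  # True
```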

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses duck typing to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).
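The duck-typing note can be illustrated as follows; the values are arbitrary.

```python
import unittest

tc = unittest.TestCase()

# set and frozenset both provide .difference(), so they compare by content.
tc.assertSetEqual({1, 2}, frozenset({2, 1}))

# A list has no .difference() method, so the assertion reports a failure.
try:
    tc.assertSetEqual([1, 2], {1, 2})
    list_rejected = False
except AssertionError:
    list_rejected = True
print(list_rejected)  # True
```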

assertStartsWith(s, prefix, msg=None)
assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in the warning message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
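A sketch of assertWarnsRegex; legacy_api is a hypothetical function introduced only to emit a DeprecationWarning.

```python
import unittest
import warnings


def legacy_api():
    """Hypothetical function that emits a deprecation warning."""
    warnings.warn("legacy_api is deprecated; use new_api", DeprecationWarning)
    return 42


tc = unittest.TestCase()

# Passes: a DeprecationWarning whose message matches the regex is triggered.
with tc.assertWarnsRegex(DeprecationWarning, r"use new_api") as cm:
    value = legacy_api()
print(value, cm.warning)  # 42 legacy_api is deprecated; use new_api

# Fails: the warning class matches, but its message does not match the regex.
try:
    with tc.assertWarnsRegex(DeprecationWarning, r"totally different"):
        legacy_api()
    mismatch_failed = False
except AssertionError:
    mismatch_failed = True
```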

checkFitTransform(step: Step, dataset: QSPRTable, fromfile=False) Tuple[DataFrame, DataFrame | None]

Check basic step fit and transform functionality.

checkStep(step: Step, dataset: QSPRTable) Tuple[DataFrame, DataFrame | None]

Check basic step functionality and serialization.

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet
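The 'th' entry of the MULTICLASS target property in the signature above lists class thresholds. The sketch below assumes they act as right-closed bin edges, the convention pandas.cut uses by default, so four edges give three classes. This illustrates that assumption only; it is not QSPRpred's own code, and the helper to_class is hypothetical.

```python
from bisect import bisect_left


def to_class(value, edges):
    """Hypothetical helper: map value to a 0-based class using bins (edges[i], edges[i+1]].

    Returns None when value lies outside (edges[0], edges[-1]].
    """
    if not edges[0] < value <= edges[-1]:
        return None
    # bisect_left finds the first edge >= value; the bin index is one less.
    return bisect_left(edges, value) - 1


th = [-1, 1, 2, 100]  # 4 edges -> 3 classes
print([to_class(v, th) for v in [0, 1, 2, 3, 150]])  # [0, 0, 1, 2, None]
```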

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now, they need to be added manually to the list in the method body.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of DescriptorCalculator objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return a generator over many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. This is not exhaustive, but should cover many cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid

classmethod getDefaultCalculatorCombo()

Makes a list of default descriptor calculators that can be used in tests.

It creates one calculator with only Morgan fingerprints, one with only RDKit descriptors, and one with both, to test behaviour with multiple descriptor sets. Override this method if you want to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of `list`s containing all possible preparation combinations together with their names

Return type:

list
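The grid and its named combinations can be pictured with itertools.product. In this sketch the option lists are hypothetical strings standing in for descriptor calculators, splits, standardizers and filters, and the [name, *combo] row layout is an assumption, not necessarily what getPrepCombos returns.

```python
from itertools import product

# Hypothetical placeholder options; the real grid uses calculator, split,
# standardizer and filter objects instead of strings.
calculators = ["MorganFP", "RDKitDesc"]
splits = [None, "RandomSplit"]
standardizers = [None, "StandardScaler"]
feature_filters = [None]
data_filters = [None]


def data_prep_grid():
    """Yield (calculator, split, standardizer, feature_filters, data_filters) tuples."""
    yield from product(calculators, splits, standardizers, feature_filters, data_filters)


def prep_combos():
    """Prefix each combination with a readable name, e.g. for test parameterization."""
    combos = []
    for i, combo in enumerate(data_prep_grid()):
        name = f"{i}_" + "_".join(str(part) for part in combo)
        combos.append([name, *combo])
    return combos


for row in prep_combos()[:2]:
    print(row)
```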

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()[source]

Create a small test dataset with random descriptors.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.
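A small sketch of the default shortDescription behaviour; the test class is hypothetical.

```python
import unittest


class DescribedTest(unittest.TestCase):
    def test_documented(self):
        """Check that addition works.

        Only the first docstring line is used by shortDescription().
        """
        self.assertEqual(1 + 1, 2)

    def test_undocumented(self):
        self.assertTrue(True)


print(DescribedTest("test_documented").shortDescription())    # Check that addition works.
print(DescribedTest("test_undocumented").shortDescription())  # None
```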

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will return the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Hook method for deconstructing the test fixture after testing it.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testDummyStep()[source]

Test the dummy step.

class qsprpred.data.processing.tests.TestFeatureFilters(methodName='runTest')[source]

Bases: QSPRTestCase, StepCheckMixIn

Tests to check if the feature filters work on their own.

Note: This also tests the DataframeDescriptorSet, as it is used to add test descriptors.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.
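A sketch of registering a type-specific equality function. Point is a hypothetical class without __eq__, so a plain == comparison would check identity and fail for two separate instances.

```python
import io
import unittest


class Point:
    """Hypothetical value type without __eq__, so == compares identity."""

    def __init__(self, x, y):
        self.x, self.y = x, y


class PointEqualityExample(unittest.TestCase):
    def setUp(self):
        # assertEqual dispatches here when both arguments are Points.
        self.addTypeEqualityFunc(Point, self.assertPointEqual)

    def assertPointEqual(self, first, second, msg=None):
        if (first.x, first.y) != (second.x, second.y):
            standard = f"Point({first.x}, {first.y}) != Point({second.x}, {second.y})"
            raise self.failureException(msg or standard)

    def test_equal_points(self):
        self.assertEqual(Point(1, 2), Point(1, 2))

    def test_unequal_points(self):
        with self.assertRaises(AssertionError):
            self.assertEqual(Point(1, 2), Point(1, 3))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PointEqualityExample)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.testsRun, result.wasSuccessful())  # 2 True
```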

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

For hashable elements, this is equivalent to self.assertEqual(Counter(list(first)), Counter(list(second))), but it also works with unhashable elements.

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)
assertEndsWith(s, suffix, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)
assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)
assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of the given level or higher is emitted on the given logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of the given level or higher are emitted on the given logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)
assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses duck typing to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertStartsWith(s, prefix, msg=None)
assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in the warning message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.

checkFitTransform(step: Step, dataset: QSPRTable, fromfile=False) Tuple[DataFrame, DataFrame | None]

Check basic step fit and transform functionality.

checkStep(step: Step, dataset: QSPRTable) Tuple[DataFrame, DataFrame | None]

Check basic step functionality and serialization.

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now, they need to be added manually to the list in the method body.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of DescriptorCalculator objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return a generator over many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. This is not exhaustive, but should cover many cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid

classmethod getDefaultCalculatorCombo()

Makes a list of default descriptor calculators that can be used in tests.

It creates one calculator with only Morgan fingerprints, one with only RDKit descriptors, and one with both, to test behaviour with multiple descriptor sets. Override this method if you want to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of `list`s containing all possible preparation combinations together with their names

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
recalculateWithMultiIndex()[source]

Change the dataset to have a multi-index.

run(result=None)
setUp()[source]

Set up the small test Dataframe.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will return the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Hook method for deconstructing the test fixture after testing it.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testBorutaFilter = None
testBorutaFilter_0(**kw)

Test the Boruta filter, which removes features that are statistically as relevant as random features [with use_index_cols=True].

testBorutaFilter_1(**kw)

Test the Boruta filter, which removes features that are statistically as relevant as random features [with use_index_cols=False].

testDefaultDescriptorAdd()[source]

Test adding without index columns.

testHighCorrelationFilter = None
testHighCorrelationFilter_0(**kw)

Test the high correlation filter, which drops features with a correlation above a threshold [with use_index_cols=True].

testHighCorrelationFilter_1(**kw)

Test the high correlation filter, which drops features with a correlation above a threshold [with use_index_cols=False].

testLowVarianceFilter = None
testLowVarianceFilter_0(**kw)

Test the low variance filter, which drops features with a variance below a threshold [with use_index_cols=True].

Parameters:

use_index_cols (bool) – If True, a multi-index is used for the dataset.

testLowVarianceFilter_1(**kw)

Test the low variance filter, which drops features with a variance below a threshold [with use_index_cols=False].

Parameters:

use_index_cols (bool) – If True, a multi-index is used for the dataset.
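The low variance and high correlation filters exercised by these tests can be pictured with a plain-Python sketch. It is not QSPRpred's implementation: the helper names, the use of population variance, and the rule of dropping the later column of each highly correlated pair are all assumptions made for illustration, and the Boruta filter is not sketched.

```python
from itertools import combinations
from statistics import pvariance


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)


def low_variance_filter(features, threshold):
    """Hypothetical helper: keep columns whose population variance is >= threshold."""
    return {name: col for name, col in features.items() if pvariance(col) >= threshold}


def high_correlation_filter(features, threshold):
    """Hypothetical helper: drop the later column of each pair with |r| > threshold."""
    dropped = set()
    for (n1, c1), (n2, c2) in combinations(features.items(), 2):
        if n1 in dropped or n2 in dropped:
            continue
        if abs(pearson(c1, c2)) > threshold:
            dropped.add(n2)
    return {n: c for n, c in features.items() if n not in dropped}


features = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],  # almost perfectly correlated with "a"
    "c": [5.0, 5.0, 5.0, 5.0],  # constant, so zero variance
    "d": [1.0, 0.0, 1.0, 0.0],
}
kept = low_variance_filter(features, threshold=0.01)
kept = high_correlation_filter(kept, threshold=0.9)
print(sorted(kept))  # ['a', 'd']
```

Applying the variance filter first also avoids a division by zero in pearson for constant columns such as "c".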

class qsprpred.data.processing.tests.TestFeatureTransformers(methodName='runTest')[source]

Bases: QSPRTestCase, StepCheckMixIn

Test the sklearn step which wraps a sklearn transformer.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
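The places and delta modes can be contrasted in a small standalone test case. The names are illustrative, and the runner lines only execute the demo.

```python
import io
import unittest


class AlmostEqualDemo(unittest.TestCase):
    def test_places(self):
        # round(abs(first - second), 7) == 0, so the float rounding error is tolerated
        self.assertAlmostEqual(0.1 + 0.2, 0.3)
        # places counts digits after the decimal point, not significant digits:
        # round(abs(1001.0 - 1000.0), 2) == 1.0, which is not zero
        self.assertNotAlmostEqual(1001.0, 1000.0, places=2)

    def test_delta(self):
        # With delta, the check is abs(first - second) <= delta
        self.assertAlmostEqual(100.0, 100.4, delta=0.5)
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(100.0, 101.0, delta=0.5)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AlmostEqualDemo)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```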

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

This is equivalent to:

self.assertEqual(Counter(list(first)),
                 Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.
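The two examples above can be checked in a standalone test case. The names are illustrative.

```python
import io
import unittest
from collections import Counter


class CountEqualDemo(unittest.TestCase):
    def test_examples(self):
        # Same elements with the same multiplicities; order is ignored
        self.assertCountEqual([0, 1, 1], [1, 0, 1])
        # The equivalent check written out with collections.Counter
        self.assertEqual(Counter([0, 1, 1]), Counter([1, 0, 1]))
        # 0 appears twice on the left but once on the right, so this fails
        with self.assertRaises(AssertionError):
            self.assertCountEqual([0, 0, 1], [0, 1])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountEqualDemo)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```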

assertDictEqual(d1, d2, msg=None)
assertEndsWith(s, suffix, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)
assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)
assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of the given level or higher is emitted on the given logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of the given level or higher are emitted on the given logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)
assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
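The following sketch shows both calling styles of assertRaisesRegex with built-in exceptions. The regex is searched for in the string form of the raised exception. The names are illustrative.

```python
import io
import unittest


class RaisesRegexDemo(unittest.TestCase):
    def test_context_manager_form(self):
        # int() raises ValueError("invalid literal for int() with base 10: ...")
        with self.assertRaisesRegex(ValueError, r"invalid literal"):
            int("not a number")

    def test_callable_form(self):
        # Callable form: the function (and any arguments) follow the regex
        self.assertRaisesRegex(ZeroDivisionError, "division", lambda: 1 / 0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RaisesRegexDemo)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```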

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).
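The duck-typing behaviour shows up when a set is compared with a frozenset, and when a list is passed instead, since a list lacks a difference method. The names are illustrative.

```python
import io
import unittest


class SetEqualDemo(unittest.TestCase):
    def test_set_and_frozenset(self):
        # Both types implement difference(), so they compare directly
        self.assertSetEqual({1, 2, 3}, frozenset([3, 2, 1]))

    def test_list_is_rejected(self):
        # A list has no difference() method, so the assertion fails
        with self.assertRaises(AssertionError):
            self.assertSetEqual([1, 2], {1, 2})


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SetEqualDemo)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```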

assertStartsWith(s, prefix, msg=None)
assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with the specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.

checkFitTransform(step: Step, dataset: QSPRTable, fromfile=False) Tuple[DataFrame, DataFrame | None]

Check basic step fit and transform functionality.

checkStep(step: Step, dataset: QSPRTable) Tuple[DataFrame, DataFrame | None]

Check basic step functionality and serialization.

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now, they need to be added manually to the list in the method's source code.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of DescriptorCalculator objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return a generator over many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. This is not exhaustive, but should cover many cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid
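The grid behaves like a Cartesian product over the option lists. The sketch below uses itertools.product with hypothetical string placeholders in place of the real calculator, split, standardizer and filter objects. The option values and the function name data_prep_grid are assumptions.

```python
from itertools import product

# Hypothetical placeholder options; the real test suite uses objects, not strings
descriptor_calculators = ["MorganFP", "RDKitDescs"]
splits = [None, "RandomSplit"]
feature_standardizers = [None, "StandardScaler"]
feature_filters = [None, ["LowVarianceFilter"]]
data_filters = [None]


def data_prep_grid():
    """Yield (descriptor_calculator, split, feature_standardizer,
    feature_filters, data_filters) tuples, one per combination."""
    yield from product(
        descriptor_calculators, splits, feature_standardizers, feature_filters, data_filters
    )


grid = list(data_prep_grid())
print(len(grid))  # 2 * 2 * 2 * 2 * 1 = 16
```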

classmethod getDefaultCalculatorCombo()

Make a list of default descriptor calculators that can be used in tests.

It creates one calculator with only Morgan fingerprints, one with only RDKit descriptors, and one with both, to test behaviour with multiple descriptor sets. Override this method to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists of all possible preparation combinations

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()[source]

Create a small test dataset with random descriptors.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will return the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Hook method for deconstructing the test fixture after testing it.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testSklearnStep()[source]

Test the sklearn step.

class qsprpred.data.processing.tests.TestImputers(methodName='runTest')[source]

Bases: QSPRTestCase, StepCheckMixIn

Test the sklearn step, which wraps a scikit-learn imputer.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.
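The scikit-learn side of the imputers under test can be sketched directly. The sketch assumes SimpleImputer with mean imputation for features and median imputation for a single target column. These strategies and the column names are illustrative choices, not necessarily what the tests use.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Feature imputation: replace each NaN with the column mean
features = pd.DataFrame({"x1": [1.0, np.nan, 3.0], "x2": [4.0, 5.0, np.nan]})
feature_imputer = SimpleImputer(strategy="mean")
imputed = pd.DataFrame(
    feature_imputer.fit_transform(features), columns=features.columns, index=features.index
)

# Target imputation: replace the missing target value with the median
target = pd.DataFrame({"CL": [1.0, np.nan, 5.0, 2.0]})
target_imputed = SimpleImputer(strategy="median").fit_transform(target)  # ndarray, shape (4, 1)

print(imputed)
print(target_imputed.ravel())
```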

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

This is equivalent to:

self.assertEqual(Counter(list(first)),
                 Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)
assertEndsWith(s, suffix, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)
assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)
assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of the given level or higher is emitted on the given logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of the given level or higher are emitted on the given logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)
assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertStartsWith(s, prefix, msg=None)
assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with the specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.

checkFitTransform(step: Step, dataset: QSPRTable, fromfile=False) Tuple[DataFrame, DataFrame | None]

Check basic step fit and transform functionality.

checkStep(step: Step, dataset: QSPRTable) Tuple[DataFrame, DataFrame | None]

Check basic step functionality and serialization.

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now, they need to be added manually to the list in the method's source code.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of DescriptorCalculator objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return a generator over many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. This is not exhaustive, but should cover many cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid

classmethod getDefaultCalculatorCombo()

Make a list of default descriptor calculators that can be used in tests.

It creates one calculator with only Morgan fingerprints, one with only RDKit descriptors, and one with both, to test behaviour with multiple descriptor sets. Override this method to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists of all possible preparation combinations

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()[source]

Create a small test dataset with random descriptors.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will return the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Hook method for deconstructing the test fixture after testing it.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testFeatureImputer()

Test the feature imputer step.

testTargetImputer()

Test the target imputer step.

class qsprpred.data.processing.tests.TestMolProcessor(methodName='runTest')

Bases: DataSetsPathMixIn, QSPRTestCase

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

class TestingProcessor(id_prop)

Bases: MolProcessor

property requiredProps: list[str]

The properties required by the processor. Callers use this list to know which properties must be passed to the __call__ method or supplied via the props attribute of StoredMol instances.

property supportsParallel

Whether the processor supports parallel processing.
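The contract described by requiredProps can be illustrated with a plain class. This is a hypothetical stand-in that mimics the two properties. The class name, the __call__ signature and the QSPRID property name are all assumptions. It does not subclass qsprpred's MolProcessor and is not the TestingProcessor source.

```python
from typing import Any


class EchoIdProcessor:
    """Hypothetical stand-in for the processor contract described above.

    It does not subclass qsprpred's MolProcessor; it only mimics the
    requiredProps and supportsParallel properties.
    """

    def __init__(self, id_prop: str):
        self.idProp = id_prop

    @property
    def requiredProps(self) -> list[str]:
        # Tells callers which properties must accompany the molecules.
        return [self.idProp]

    @property
    def supportsParallel(self) -> bool:
        # Each molecule is handled independently, so chunks could run
        # in parallel.
        return True

    def __call__(self, mols: list[str], props: dict[str, list[Any]]) -> list[tuple[str, Any]]:
        missing = [name for name in self.requiredProps if name not in props]
        if missing:
            raise ValueError(f"missing required properties: {missing}")
        # Pair each molecule with its value of the required property.
        return list(zip(mols, props[self.idProp]))


processor = EchoIdProcessor("QSPRID")
result = processor(["CCO", "c1ccccc1"], {"QSPRID": ["mol_0", "mol_1"]})
print(result)  # [('CCO', 'mol_0'), ('c1ccccc1', 'mol_1')]
```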

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).
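The standard unittest module alone is enough to check two things: cleanups run in LIFO order, and they run after tearDown. The cleanup labels are illustrative.

```python
import io
import unittest


def cleanup_order():
    calls = []

    class CleanupDemo(unittest.TestCase):
        def setUp(self):
            # Registered first, so it runs last (LIFO order).
            self.addCleanup(calls.append, "close connection")
            self.addCleanup(calls.append, "delete temp dir")

        def tearDown(self):
            calls.append("tearDown")

        def test_noop(self):
            calls.append("test body")

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanupDemo)
    unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return calls


print(cleanup_order())
# ['test body', 'tearDown', 'delete temp dir', 'close connection']
```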

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and compared to zero, or if the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
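The example below shows how places and delta change the outcome. It uses a bare TestCase instance, which can be created without a method name since Python 3.2.

```python
import unittest

tc = unittest.TestCase()  # bare instance, usable for ad-hoc assertions

# Default: round(first - second, 7) must be 0.
tc.assertAlmostEqual(0.1 + 0.2, 0.3)

# 1.0001 and 1.0002 differ by 1e-4, which rounds to 0 at 3 places.
tc.assertAlmostEqual(1.0001, 1.0002, places=3)

# Absolute tolerance instead of rounding (places and delta are exclusive).
tc.assertAlmostEqual(100.0, 100.4, delta=0.5)

# 1.0 and 1.1 differ in the first decimal place, so this must fail.
try:
    tc.assertAlmostEqual(1.0, 1.1)
    failed = False
except AssertionError:
    failed = True
print(failed)  # True
```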

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

This is equivalent to:

self.assertEqual(Counter(list(first)), Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.
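For hashable elements, the equivalence stated above can be checked directly with collections.Counter. The helper same_elements is hypothetical. The real assertCountEqual also falls back to a slower comparison when items are unhashable.

```python
import unittest
from collections import Counter


def same_elements(first, second) -> bool:
    # Same elements, same multiplicities, order ignored.
    return Counter(list(first)) == Counter(list(second))


print(same_elements([0, 1, 1], [1, 0, 1]))  # True
print(same_elements([0, 0, 1], [0, 1]))     # False

tc = unittest.TestCase()
tc.assertCountEqual([0, 1, 1], [1, 0, 1])  # passes
```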

assertDictEqual(d1, d2, msg=None)
assertEndsWith(s, suffix, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)
assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)
assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of the given level or higher is emitted on logger or one of its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of the given level or higher are emitted on logger or any of its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and compared to zero, or if the difference between the two objects is less than or equal to the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)
assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
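This example shows both calling forms side by side. parse_threshold is a hypothetical helper that exists only to raise. The regex is matched with re.search against the string form of the exception, so a partial match is enough.

```python
import unittest


def parse_threshold(text: str) -> float:
    # Hypothetical helper used only to have something that raises.
    value = float(text)
    if value < 0:
        raise ValueError(f"threshold must be non-negative, got {value}")
    return value


tc = unittest.TestCase()

# Callable form: the callable and its arguments follow the regex.
tc.assertRaisesRegex(ValueError, r"non-negative", parse_threshold, "-1")

# Context-manager form: msg= is accepted only here.
with tc.assertRaisesRegex(ValueError, r"got -2\.0", msg="bad message"):
    parse_threshold("-2")
print("both assertions passed")
```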

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).
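Because of the duck typing, any object with a difference method qualifies. A set and a frozenset therefore compare without trouble, while a list is rejected outright.

```python
import unittest

tc = unittest.TestCase()

# set and frozenset both provide .difference(), so they can be compared.
tc.assertSetEqual({"CL", "fu"}, frozenset(["fu", "CL"]))

# A list has no .difference(), so the assertion fails with a type
# complaint rather than comparing the items.
message = ""
try:
    tc.assertSetEqual(["CL", "fu"], {"CL", "fu"})
except AssertionError as exc:
    message = str(exc)
print(message)
```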

assertStartsWith(s, prefix, msg=None)
assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
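Here is a sketch of the context-manager form, using a hypothetical load_legacy function. Inside the block, unittest switches the expected warning category to "always". As a result, even a DeprecationWarning that would normally be hidden is captured.

```python
import unittest
import warnings


def load_legacy(path: str) -> str:
    # Hypothetical function that emits a deprecation warning.
    warnings.warn(f"{path}: legacy format is deprecated", DeprecationWarning)
    return path


tc = unittest.TestCase()
with tc.assertWarnsRegex(DeprecationWarning, r"legacy format") as cm:
    load_legacy("data.csv")

# cm.warning is the matched warning instance.
print(cm.warning)  # data.csv: legacy format is deprecated
```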

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large multitask dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset
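Suppose the th list is read as bin edges of right-closed intervals, which is how pandas.cut treats bins by default. Under that reading, the default HBD thresholds split counts into three classes. This is an assumption about how qsprpred applies multiclass thresholds and is not confirmed by this page. Below is a stdlib-only sketch with a hypothetical to_class helper.

```python
from bisect import bisect_left

# Thresholds from the default target_props above, read (by assumption)
# as edges of the right-closed intervals (-1, 1], (1, 2], (2, 100].
TH = [-1, 1, 2, 100]


def to_class(value: float, edges: list[float] = TH) -> int:
    """Return the 0-based bin index of value; raise if out of range."""
    if not edges[0] < value <= edges[-1]:
        raise ValueError(f"{value} is outside ({edges[0]}, {edges[-1]}]")
    # bisect_left finds i with edges[i - 1] < value <= edges[i].
    return bisect_left(edges, value) - 1


print([to_class(hbd) for hbd in [0, 1, 2, 3, 5]])  # [0, 0, 1, 2, 2]
```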

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataset object

Return type:

QSPRDataset

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now, descriptor sets must be added to this list manually in the method's source code.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of DescriptorCalculator objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Generate many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. This is not exhaustive, but it should cover many common cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

generator

classmethod getDefaultCalculatorCombo()

Makes a list of default descriptor calculators that can be used in tests.

It creates one calculator with only Morgan fingerprints, one with only RDKit descriptors, and one with both, to test behaviour with multiple descriptor sets. Override this method if you want to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

a list of lists covering all possible preparation combinations together with their names

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()

Hook method for setting up the test fixture before exercising it.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that runs the enclosed block of code as a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Remove all files and directories that are used for testing.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testMolProcess = None
testMolProcess_00_1_50_None_True_None_None(**kw)
testMolProcess_01_1_50_None_True_None__a_1_(**kw)
testMolProcess_02_1_50_None_True__1_2__None(**kw)
testMolProcess_03_1_50_None_True__1_2___a_1_(**kw)
testMolProcess_04_1_50_None_False_None_None(**kw)
testMolProcess_05_1_50_None_False_None__a_1_(**kw)
testMolProcess_06_1_50_None_False__1_2__None(**kw)
testMolProcess_07_1_50_None_False__1_2___a_1_(**kw)
testMolProcess_08_1_50__fu_CL__True_None_None(**kw)
testMolProcess_09_1_50__fu_CL__True_None__a_1_(**kw)
testMolProcess_10_1_50__fu_CL__True__1_2__None(**kw)
testMolProcess_11_1_50__fu_CL__True__1_2___a_1_(**kw)
testMolProcess_12_1_50__fu_CL__False_None_None(**kw)
testMolProcess_13_1_50__fu_CL__False_None__a_1_(**kw)
testMolProcess_14_1_50__fu_CL__False__1_2__None(**kw)
testMolProcess_15_1_50__fu_CL__False__1_2___a_1_(**kw)
testMolProcess_16_1_50__SMILES__True_None_None(**kw)
testMolProcess_17_1_50__SMILES__True_None__a_1_(**kw)
testMolProcess_18_1_50__SMILES__True__1_2__None(**kw)
testMolProcess_19_1_50__SMILES__True__1_2___a_1_(**kw)
testMolProcess_20_1_50__SMILES__False_None_None(**kw)
testMolProcess_21_1_50__SMILES__False_None__a_1_(**kw)
testMolProcess_22_1_50__SMILES__False__1_2__None(**kw)
testMolProcess_23_1_50__SMILES__False__1_2___a_1_(**kw)
testMolProcess_24_1_None_None_True_None_None(**kw)
testMolProcess_25_1_None_None_True_None__a_1_(**kw)
testMolProcess_26_1_None_None_True__1_2__None(**kw)
testMolProcess_27_1_None_None_True__1_2___a_1_(**kw)
testMolProcess_28_1_None_None_False_None_None(**kw)
testMolProcess_29_1_None_None_False_None__a_1_(**kw)
testMolProcess_30_1_None_None_False__1_2__None(**kw)
testMolProcess_31_1_None_None_False__1_2___a_1_(**kw)
testMolProcess_32_1_None__fu_CL__True_None_None(**kw)
testMolProcess_33_1_None__fu_CL__True_None__a_1_(**kw)
testMolProcess_34_1_None__fu_CL__True__1_2__None(**kw)
testMolProcess_35_1_None__fu_CL__True__1_2___a_1_(**kw)
testMolProcess_36_1_None__fu_CL__False_None_None(**kw)
testMolProcess_37_1_None__fu_CL__False_None__a_1_(**kw)
testMolProcess_38_1_None__fu_CL__False__1_2__None(**kw)
testMolProcess_39_1_None__fu_CL__False__1_2___a_1_(**kw)
testMolProcess_40_1_None__SMILES__True_None_None(**kw)
testMolProcess_41_1_None__SMILES__True_None__a_1_(**kw)
testMolProcess_42_1_None__SMILES__True__1_2__None(**kw)
testMolProcess_43_1_None__SMILES__True__1_2___a_1_(**kw)
testMolProcess_44_1_None__SMILES__False_None_None(**kw)
testMolProcess_45_1_None__SMILES__False_None__a_1_(**kw)
testMolProcess_46_1_None__SMILES__False__1_2__None(**kw)
testMolProcess_47_1_None__SMILES__False__1_2___a_1_(**kw)
testMolProcess_48_2_50_None_True_None_None(**kw)
testMolProcess_49_2_50_None_True_None__a_1_(**kw)
testMolProcess_50_2_50_None_True__1_2__None(**kw)
testMolProcess_51_2_50_None_True__1_2___a_1_(**kw)
testMolProcess_52_2_50_None_False_None_None(**kw)
testMolProcess_53_2_50_None_False_None__a_1_(**kw)
testMolProcess_54_2_50_None_False__1_2__None(**kw)
testMolProcess_55_2_50_None_False__1_2___a_1_(**kw)
testMolProcess_56_2_50__fu_CL__True_None_None(**kw)
testMolProcess_57_2_50__fu_CL__True_None__a_1_(**kw)
testMolProcess_58_2_50__fu_CL__True__1_2__None(**kw)
testMolProcess_59_2_50__fu_CL__True__1_2___a_1_(**kw)
testMolProcess_60_2_50__fu_CL__False_None_None(**kw)
testMolProcess_61_2_50__fu_CL__False_None__a_1_(**kw)
testMolProcess_62_2_50__fu_CL__False__1_2__None(**kw)
testMolProcess_63_2_50__fu_CL__False__1_2___a_1_(**kw)
testMolProcess_64_2_50__SMILES__True_None_None(**kw)
testMolProcess_65_2_50__SMILES__True_None__a_1_(**kw)
testMolProcess_66_2_50__SMILES__True__1_2__None(**kw)
testMolProcess_67_2_50__SMILES__True__1_2___a_1_(**kw)
testMolProcess_68_2_50__SMILES__False_None_None(**kw)
testMolProcess_69_2_50__SMILES__False_None__a_1_(**kw)
testMolProcess_70_2_50__SMILES__False__1_2__None(**kw)
testMolProcess_71_2_50__SMILES__False__1_2___a_1_(**kw)
testMolProcess_72_2_None_None_True_None_None(**kw)
testMolProcess_73_2_None_None_True_None__a_1_(**kw)
testMolProcess_74_2_None_None_True__1_2__None(**kw)
testMolProcess_75_2_None_None_True__1_2___a_1_(**kw)
testMolProcess_76_2_None_None_False_None_None(**kw)
testMolProcess_77_2_None_None_False_None__a_1_(**kw)
testMolProcess_78_2_None_None_False__1_2__None(**kw)
testMolProcess_79_2_None_None_False__1_2___a_1_(**kw)
testMolProcess_80_2_None__fu_CL__True_None_None(**kw)
testMolProcess_81_2_None__fu_CL__True_None__a_1_(**kw)
testMolProcess_82_2_None__fu_CL__True__1_2__None(**kw)
testMolProcess_83_2_None__fu_CL__True__1_2___a_1_(**kw)
testMolProcess_84_2_None__fu_CL__False_None_None(**kw)
testMolProcess_85_2_None__fu_CL__False_None__a_1_(**kw)
testMolProcess_86_2_None__fu_CL__False__1_2__None(**kw)
testMolProcess_87_2_None__fu_CL__False__1_2___a_1_(**kw)
testMolProcess_88_2_None__SMILES__True_None_None(**kw)
testMolProcess_89_2_None__SMILES__True_None__a_1_(**kw)
testMolProcess_90_2_None__SMILES__True__1_2__None(**kw)
testMolProcess_91_2_None__SMILES__True__1_2___a_1_(**kw)
testMolProcess_92_2_None__SMILES__False_None_None(**kw)
testMolProcess_93_2_None__SMILES__False_None__a_1_(**kw)
testMolProcess_94_2_None__SMILES__False__1_2__None(**kw)
testMolProcess_95_2_None__SMILES__False__1_2___a_1_(**kw)
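The 96 generated names follow a regular pattern. Each starts with a zero-padded index, followed by the parameter values joined with underscores, with every run of non-identifier characters replaced by a single underscore. This resembles the safe-name rule of the parameterized package. The sketch below rebuilds the list from six value sets read off the names themselves. What each position means is not stated here. OPTIONS, to_safe_name and expanded_names are reconstructions, not the test module's source.

```python
import re
from itertools import product

# Value sets read off the generated names; their meaning is not stated
# on this page.
OPTIONS = [
    [1, 2],
    [50, None],
    [None, ["fu", "CL"], ["SMILES"]],
    [True, False],
    [None, (1, 2)],
    [None, {"a": 1}],
]


def to_safe_name(text: str) -> str:
    # Replace each run of characters not valid in identifiers with "_".
    return re.sub(r"[^a-zA-Z0-9_]+", "_", text)


def expanded_names(base: str = "testMolProcess") -> list[str]:
    combos = list(product(*OPTIONS))
    width = len(str(len(combos) - 1))  # zero-pad indices: 00..95
    return [
        f"{base}_{i:0{width}d}_" + to_safe_name("_".join(map(str, combo)))
        for i, combo in enumerate(combos)
    ]


names = expanded_names()
print(len(names))  # 96
print(names[11])   # testMolProcess_11_1_50__fu_CL__True__1_2___a_1_
```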
class qsprpred.data.processing.tests.TestPipeline(methodName='runTest')

Bases: DataSetsPathMixIn, QSPRTestCase

Test the dataset pipeline.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and compared to zero, or if the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

This is equivalent to:

self.assertEqual(Counter(list(first)), Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)
assertEndsWith(s, suffix, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)
assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)
assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of the given level or higher is emitted on logger or one of its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of the given level or higher are emitted on logger or any of its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and compared to zero, or if the difference between the two objects is less than or equal to the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.
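Here is a quick check of the inverse assertion with places and delta, again on a bare TestCase instance.

```python
import unittest

tc = unittest.TestCase()

# At the default 7 places, 1.0 and 1.001 are clearly different: passes.
tc.assertNotAlmostEqual(1.0, 1.001)

# At 2 places their difference rounds to zero, so the assertion fails.
try:
    tc.assertNotAlmostEqual(1.0, 1.001, places=2)
    rejected = False
except AssertionError:
    rejected = True

# With delta, the difference must exceed the tolerance to pass.
tc.assertNotAlmostEqual(10.0, 10.6, delta=0.5)
print(rejected)  # True
```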

assertNotEndsWith(s, suffix, msg=None)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)
assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.
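In brief, seq_type works like this. Without it, a list and a tuple holding the same items pass. With seq_type=list, the tuple is rejected before any item is compared.

```python
import unittest

tc = unittest.TestCase()

# Without seq_type, a list and a tuple with the same items compare equal.
tc.assertSequenceEqual([1, 2, 3], (1, 2, 3))

# With seq_type=list, the tuple fails the type check.
message = ""
try:
    tc.assertSequenceEqual([1, 2, 3], (1, 2, 3), seq_type=list)
except AssertionError as exc:
    message = str(exc)
print(message)
```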

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertStartsWith(s, prefix, msg=None)
assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
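Both calling forms of assertWarnsRegex can be shown with a short sketch based only on the standard library. The function legacy_api and the test class are hypothetical. Note that msg is passed only in the context-manager form, as the parameter list above requires.

```python
import unittest
import warnings


def legacy_api():
    # hypothetical function that emits a deprecation warning
    warnings.warn("legacy_api() is deprecated; use new_api()", DeprecationWarning)
    return 42


class WarnsRegexExample(unittest.TestCase):
    def test_callable_form(self):
        # callable and its arguments are passed directly; no msg allowed here
        self.assertWarnsRegex(DeprecationWarning, r"deprecated", legacy_api)

    def test_context_manager_form(self):
        with self.assertWarnsRegex(DeprecationWarning, r"use new_api", msg="no hint") as cm:
            value = legacy_api()
        # the block still runs normally and the matching warning is kept on cm
        self.assertEqual(value, 42)
        self.assertIsInstance(cm.warning, DeprecationWarning)


def run_case(case_cls: type) -> unittest.TestResult:
    # hypothetical helper: run every test_* method of case_cls into a fresh result
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(case_cls).run(result)
    return result
```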

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now, descriptor sets have to be added to this list manually in the method's source code.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of DescriptorCalculator objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but it should cover a large number of cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid

classmethod getDefaultCalculatorCombo()

Make a list of default descriptor calculators that can be used in tests.

It creates calculators with only Morgan fingerprints and with only RDKit descriptors, as well as one with both to test behaviour with multiple descriptor sets. Override this method if you want to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists holding all possible preparation combinations and their names

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
run(result=None)
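The longMessage = True attribute listed above is inherited from unittest.TestCase. It controls whether a custom msg is appended to the standard failure message or replaces it. A minimal sketch using only the standard library (the class name is hypothetical):

```python
import unittest


class MessageExample(unittest.TestCase):
    def test_long_message(self):
        # longMessage=True (the default): msg is appended to the standard message
        with self.assertRaises(AssertionError) as cm:
            self.assertEqual(1, 2, msg="target count mismatch")
        self.assertEqual(str(cm.exception), "1 != 2 : target count mismatch")

    def test_short_message(self):
        # longMessage=False: msg replaces the standard message entirely
        self.longMessage = False
        with self.assertRaises(AssertionError) as cm:
            self.assertEqual(1, 2, msg="target count mismatch")
        self.assertEqual(str(cm.exception), "target count mismatch")


def run_case(case_cls: type) -> unittest.TestResult:
    # hypothetical helper: run every test_* method of case_cls into a fresh result
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(case_cls).run(result)
    return result
```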
setUp()[source]

Create a small test dataset with random descriptors.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that runs the enclosed block of code as a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Remove all files and directories that are used for testing.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testAddStep()[source]

Test the pipeline add step method.

testApply()[source]

Test the pipeline apply method.

testApplyWithApplyTo()[source]

Test the pipeline apply method with apply_to argument.

testApplyWithFitOn()[source]

Test the pipeline apply method with fit_on argument.

testApplyWithFixedSteps()[source]

Test the pipeline apply method with fixed steps.

testOrderSteps()[source]

Test the pipeline order steps method.

testRemoveStep()[source]

Test the pipeline remove step method.

testSerialization()[source]

Test the pipeline serialization.

testSkipping()[source]

Test the pipeline skipping steps.

class qsprpred.data.processing.tests.TestShuffle(methodName='runTest')[source]

Bases: QSPRTestCase, StepCheckMixIn

Test the shuffle step in the pipeline.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).
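The LIFO order of cleanups relative to tearDown can be checked with a short sketch that uses only the standard library. The module-level calls list and the class name are hypothetical. Cleanups registered in setUp run after tearDown, in reverse order of registration.

```python
import unittest

# records the order in which the hooks run (illustration only)
calls = []


class CleanupOrderExample(unittest.TestCase):
    def setUp(self):
        self.addCleanup(calls.append, "registered first")
        self.addCleanup(calls.append, "registered second")

    def tearDown(self):
        calls.append("tearDown")

    def test_body(self):
        calls.append("test body")


def run_case(case_cls: type) -> unittest.TestResult:
    # hypothetical helper: run every test_* method of case_cls into a fresh result
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(case_cls).run(result)
    return result
```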

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.
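A minimal sketch of registering a type-specific comparison is shown below. Point and assertPointEqual are hypothetical names. Because Point is declared with eq=False, plain == falls back to identity, so assertEqual would fail for two equal-valued points unless the custom function is registered.

```python
import unittest
from dataclasses import dataclass


@dataclass(eq=False)
class Point:
    # eq=False: no __eq__ is generated, so == compares identity
    x: float
    y: float


class TypeEqualityExample(unittest.TestCase):
    def setUp(self):
        # assertEqual now dispatches to assertPointEqual when both args are Points
        self.addTypeEqualityFunc(Point, self.assertPointEqual)

    def assertPointEqual(self, first, second, msg=None):
        # compare coordinates instead of object identity
        if (first.x, first.y) != (second.x, second.y):
            raise self.failureException(msg or f"{first!r} != {second!r}")

    def test_equal_points(self):
        self.assertEqual(Point(1.0, 2.0), Point(1.0, 2.0))

    def test_different_points_fail(self):
        with self.assertRaises(AssertionError):
            self.assertEqual(Point(1.0, 2.0), Point(1.0, 3.0))


def run_case(case_cls: type) -> unittest.TestResult:
    # hypothetical helper: run every test_* method of case_cls into a fresh result
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(case_cls).run(result)
    return result
```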

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by rounding their difference to the given number of decimal places (default 7) and comparing it to zero, or, if delta is given, if their absolute difference is greater than delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
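The places and delta rules can be made concrete with a few standard-library assertions (the class name is hypothetical). The absolute difference between 1.0 and 1.001 rounds to zero at 2 places but not at 3.

```python
import unittest


class AlmostEqualExample(unittest.TestCase):
    def test_default_places(self):
        # |0.1 + 0.2 - 0.3| is about 5.5e-17, which rounds to 0 at 7 places
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_explicit_places(self):
        # |1.0 - 1.001| rounds to 0.0 at 2 places, so this passes
        self.assertAlmostEqual(1.0, 1.001, places=2)
        # ... but to 0.001 at 3 places, so this fails
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(1.0, 1.001, places=3)

    def test_delta(self):
        # |1000.0 - 1000.4| is about 0.4, which is within delta=0.5
        self.assertAlmostEqual(1000.0, 1000.4, delta=0.5)


def run_case(case_cls: type) -> unittest.TestResult:
    # hypothetical helper: run every test_* method of case_cls into a fresh result
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(case_cls).run(result)
    return result
```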

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)), Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.
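The two examples above can be run directly with the standard library (the class name is hypothetical). The last test also shows that unhashable elements are accepted, because assertCountEqual falls back to a slower pairwise comparison when Counter cannot be built.

```python
import unittest
from collections import Counter


class CountEqualExample(unittest.TestCase):
    def test_same_multiset(self):
        # same elements, same multiplicities, different order
        self.assertCountEqual([0, 1, 1], [1, 0, 1])

    def test_different_multiplicity(self):
        with self.assertRaises(AssertionError):
            self.assertCountEqual([0, 0, 1], [0, 1])

    def test_counter_equivalent(self):
        # the Counter comparison quoted above, for hashable elements
        self.assertEqual(Counter([0, 1, 1]), Counter([1, 0, 1]))

    def test_unhashable_elements(self):
        # lists are unhashable, so Counter cannot be used internally here
        self.assertCountEqual([[1], [2]], [[2], [1]])


def run_case(case_cls: type) -> unittest.TestResult:
    # hypothetical helper: run every test_* method of case_cls into a fresh result
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(case_cls).run(result)
    return result
```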

assertDictEqual(d1, d2, msg=None)
assertEndsWith(s, suffix, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)
assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)
assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by rounding their difference to the given number of decimal places (default 7) and comparing it to zero, or, if delta is given, if their absolute difference is less than or equal to delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)
assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
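Both forms of assertRaisesRegex are shown in the following standard-library sketch. The validator parse_direction is a hypothetical function, loosely modelled on the threshold directions accepted by ApplicabilityDomain, and is not part of qsprpred.

```python
import unittest


def parse_direction(text):
    # hypothetical validator for a threshold direction string
    if text not in (">", "<", ">=", "<="):
        raise ValueError(f"invalid direction: {text!r}")
    return text


class RaisesRegexExample(unittest.TestCase):
    def test_callable_form(self):
        # extra positional args are forwarded to the callable
        self.assertRaisesRegex(ValueError, r"invalid direction", parse_direction, "=>")

    def test_context_manager_form(self):
        with self.assertRaisesRegex(ValueError, r"'=='") as cm:
            parse_direction("==")
        # the raised exception is kept on the context object
        self.assertIsInstance(cm.exception, ValueError)


def run_case(case_cls: type) -> unittest.TestResult:
    # hypothetical helper: run every test_* method of case_cls into a fresh result
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(case_cls).run(result)
    return result
```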

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.
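The effect of seq_type can be illustrated with a short standard-library sketch (the class name is hypothetical). Without seq_type, a list and a tuple holding the same items compare equal. With seq_type=list, the tuple argument fails the type check.

```python
import unittest


class SequenceEqualExample(unittest.TestCase):
    def test_list_vs_tuple(self):
        # no seq_type: equal items are enough, even across list and tuple
        self.assertSequenceEqual([1, 2, 3], (1, 2, 3))

    def test_seq_type_enforced(self):
        # seq_type=list: the tuple is not a list, so the assertion fails
        with self.assertRaises(AssertionError):
            self.assertSequenceEqual([1, 2, 3], (1, 2, 3), seq_type=list)


def run_case(case_cls: type) -> unittest.TestResult:
    # hypothetical helper: run every test_* method of case_cls into a fresh result
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(case_cls).run(result)
    return result
```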

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses duck typing to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertStartsWith(s, prefix, msg=None)
assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with the specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.

checkFitTransform(step: Step, dataset: QSPRTable, fromfile=False) Tuple[DataFrame, DataFrame | None]

Check basic step fit and transform functionality.

checkStep(step: Step, dataset: QSPRTable) Tuple[DataFrame, DataFrame | None]

Check basic step functionality and serialization.

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now, descriptor sets have to be added to this list manually in the method's source code.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of DescriptorCalculator objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but it should cover a large number of cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid

classmethod getDefaultCalculatorCombo()

Make a list of default descriptor calculators that can be used in tests.

It creates calculators with only Morgan fingerprints and with only RDKit descriptors, as well as one with both to test behaviour with multiple descriptor sets. Override this method if you want to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists holding all possible preparation combinations and their names

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()[source]

Create a small test dataset with random descriptors.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that runs the enclosed block of code as a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Hook method for deconstructing the test fixture after testing it.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testShuffle()[source]

Test the shuffle step.

class qsprpred.data.processing.tests.TestTargetTransformers(methodName='runTest')[source]

Bases: QSPRTestCase, StepCheckMixIn

Test the sklearn step that wraps a scikit-learn transformer for targets.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by rounding their difference to the given number of decimal places (default 7) and comparing it to zero, or, if delta is given, if their absolute difference is greater than delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)), Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)
assertEndsWith(s, suffix, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)
assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)
assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by rounding their difference to the given number of decimal places (default 7) and comparing it to zero, or, if delta is given, if their absolute difference is less than or equal to delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)
assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.
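
The following sketch illustrates the effect of `seq_type`; the test class and the `run_case` helper are illustrative only.

```python
import io
import unittest


class SequenceEqualTest(unittest.TestCase):
    def test_list_and_tuple_compare_equal(self):
        # Without seq_type, items are compared one by one, so a list and a
        # tuple holding the same items pass.
        self.assertSequenceEqual([1, 2, 3], (1, 2, 3))

    def test_seq_type_is_enforced(self):
        # With seq_type=list, a tuple argument makes the assertion fail.
        with self.assertRaises(self.failureException):
            self.assertSequenceEqual([1, 2, 3], (1, 2, 3), seq_type=list)


def run_case(case: type) -> unittest.TestResult:
    """Run every test in `case` quietly and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```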

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).
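
The duck typing described above means that any pair of objects offering a `difference` method can be compared. A short sketch (class and helper names are illustrative):

```python
import io
import unittest


class SetEqualTest(unittest.TestCase):
    def test_set_and_frozenset(self):
        # Both types implement .difference(), so they can be compared directly.
        self.assertSetEqual({"a", "b"}, frozenset({"b", "a"}))

    def test_list_argument_fails(self):
        # A list has no .difference() method, so the assertion fails even
        # though it holds the same element.
        with self.assertRaises(self.failureException):
            self.assertSetEqual({"a"}, ["a"])

    def test_failure_lists_differences(self):
        with self.assertRaises(self.failureException) as cm:
            self.assertSetEqual({1, 2}, {2, 3})
        # The default message lists the items found in only one of the sets.
        self.assertIn("Items in the first set but not the second", str(cm.exception))


def run_case(case: type) -> unittest.TestResult:
    """Run every test in `case` quietly and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```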

assertStartsWith(s, prefix, msg=None)
assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in the warning message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
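
A minimal sketch, assuming a hypothetical `legacy_api` function that emits a `DeprecationWarning`; it shows the callable form, the context-manager attributes, and a failure caused by a non-matching message.

```python
import io
import unittest
import warnings


def legacy_api() -> int:
    """Hypothetical function that emits a DeprecationWarning."""
    warnings.warn("legacy_api() is deprecated; use new_api()",
                  DeprecationWarning, stacklevel=2)
    return 42


class LegacyApiTest(unittest.TestCase):
    def test_callable_form(self):
        # Callable form: passes only if the warning message matches the regex.
        self.assertWarnsRegex(DeprecationWarning, r"use new_api\(\)", legacy_api)

    def test_context_manager_form(self):
        with self.assertWarnsRegex(DeprecationWarning, r"deprecated") as cm:
            result = legacy_api()
        self.assertEqual(result, 42)
        # cm.warning holds the matching warning; cm.lineno is also recorded.
        self.assertIn("new_api", str(cm.warning))
        self.assertGreater(cm.lineno, 0)

    def test_non_matching_message_fails(self):
        # Right warning class but non-matching message: the assertion fails.
        with self.assertRaises(self.failureException):
            with self.assertWarnsRegex(DeprecationWarning, r"removed in 2\.0"):
                legacy_api()


def run_case(case: type) -> unittest.TestResult:
    """Run every test in `case` quietly and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```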

checkFitTransform(step: Step, dataset: QSPRTable, fromfile=False) Tuple[DataFrame, DataFrame | None]

Check basic step fit and transform functionality.

checkStep(step: Step, dataset: QSPRTable) Tuple[DataFrame, DataFrame | None]

Check basic step functionality and serialization.

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

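  • random_state (int) – random state to use for splitting and shuffling

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing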

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now, they need to be added manually to the list in this method's source code.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of DescriptorCalculator objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return a generator over many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover many common cases.

Returns:

a generator that yields one tuple per combination, each defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid
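
The source of `getDataPrepGrid` is not reproduced here. As a rough illustration of the technique only, a grid like this can be generated lazily as a Cartesian product with `itertools.product`; the option lists below are hypothetical string placeholders, not qsprpred objects, and the sketch is not the library's implementation.

```python
import itertools
from typing import Iterator, Tuple

# Hypothetical placeholder options; the real method works with qsprpred
# descriptor calculators, splitters, standardizers and filters instead.
DESCRIPTOR_CALCULATORS = ["morgan", "rdkit", "morgan+rdkit"]
SPLITS = [None, "random"]
FEATURE_STANDARDIZERS = [None, "standard_scaler"]
FEATURE_FILTERS = [None, "low_variance"]
DATA_FILTERS = [None, "repeats"]


def data_prep_grid() -> Iterator[Tuple]:
    """Lazily yield every (calculator, split, standardizer, feature_filters, data_filters) tuple."""
    yield from itertools.product(
        DESCRIPTOR_CALCULATORS, SPLITS, FEATURE_STANDARDIZERS, FEATURE_FILTERS, DATA_FILTERS
    )
```

Because the product is generated lazily, combinations are only materialized when a test consumes them.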

classmethod getDefaultCalculatorCombo()

Make a list of default descriptor calculators that can be used in tests.

It creates one calculator with only Morgan fingerprints, one with only RDKit descriptors, and one with both, to test behaviour with multiple descriptor sets. Override this method if you want to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

a list of lists, one for each possible preparation combination

Return type:

list
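
A sketch of how such named combinations might be produced, assuming a hypothetical naming scheme (index plus the string form of each option) and a small stand-in grid; the actual names generated by `getPrepCombos` may differ.

```python
import itertools
from typing import Iterator, List, Tuple


def prep_grid() -> Iterator[Tuple]:
    """Hypothetical stand-in for getDataPrepGrid(): yields option tuples."""
    yield from itertools.product(["morgan", "rdkit"], [None, "random"], [None, "scaler"])


def prep_combos() -> List[list]:
    """Prefix each combination with a readable name for test parameterization."""
    combos = []
    for idx, combo in enumerate(prep_grid()):
        # Hypothetical naming scheme: running index plus each option's str().
        name = f"{idx}_" + "_".join(str(part) for part in combo)
        combos.append([name, *combo])
    return combos
```

Each inner list starts with its name, which makes it convenient to pass to a test parameterization helper or to label a `subTest`.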

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()[source]

Create a small test dataset with random descriptors.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.
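
A short sketch (the test class is illustrative) showing that only the first docstring line is returned, and `None` when a test method has no docstring:

```python
import unittest


class DescribedTest(unittest.TestCase):
    def test_with_docstring(self):
        """Check that addition is commutative.

        Only the first line of this docstring is returned by shortDescription().
        """
        self.assertEqual(1 + 2, 2 + 1)

    def test_without_docstring(self):
        self.assertTrue(True)


# Instantiating a TestCase with a method name does not run the test.
with_doc = DescribedTest("test_with_docstring").shortDescription()
without_doc = DescribedTest("test_without_docstring").shortDescription()
```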

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that runs the enclosed block of code as a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Hook method for deconstructing the test fixture after testing it.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testDiscretizer()[source]

Test the discretizer step.

testSimpleTargetTransformer()[source]

Test the simple target transformer.

qsprpred.data.processing.tests.getCombos()[source]

Module contents