qsprpred.extra.models package

Submodules

qsprpred.extra.models.pcm module

Specialized models for proteochemometric modelling (PCM).

class qsprpred.extra.models.pcm.PCMModel(base_dir: str, alg: Type | None = None, name: str | None = None, parameters: dict | None = None, autoload=True, random_state: int | None = None)

Bases: QSPRModel, ABC

Base class for PCM models.

Extension of QSPRModel for proteochemometric models (PCM). It modifies the predictMols method to handle PCM descriptors and specification of protein ids.

Initialize a QSPR model instance.

If the model is loaded from file, the data set is not required. Note that the data set is required for fitting and optimization.

Parameters:
  • base_dir (str) – base directory of the model, the model files are stored in a subdirectory {baseDir}/{outDir}/

  • alg (Type) – estimator class

  • name (str) – name of the model

  • parameters (dict) – dictionary of algorithm specific parameters

  • autoload (bool) – if True, the estimator is loaded from the serialized file if it exists, otherwise a new instance of alg is created

  • random_state (int) – Random state to use for shuffling and other random operations.
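
The autoload behaviour described above can be pictured with a small stand-alone sketch. It is not QSPRpred's implementation: `load_or_create`, `DummyEstimator` and the JSON file layout are hypothetical stand-ins, and only the "load if a serialized file exists, otherwise instantiate alg" logic follows the parameter description.

```python
import json
import os
import tempfile


class DummyEstimator:
    """Hypothetical stand-in for an estimator class passed as `alg`."""

    def __init__(self, **params):
        self.params = params


def load_or_create(path, alg, parameters=None, autoload=True):
    """Restore an estimator from `path` if allowed and present,
    otherwise build a fresh instance of `alg` from `parameters`."""
    if autoload and os.path.exists(path):
        with open(path) as handle:
            return alg(**json.load(handle))
    return alg(**(parameters or {}))


with tempfile.TemporaryDirectory() as tmp:
    saved = os.path.join(tmp, "estimator.json")
    # No file yet: a new instance is created from the given parameters.
    fresh = load_or_create(saved, DummyEstimator, {"n_estimators": 10})
    with open(saved, "w") as handle:
        json.dump({"n_estimators": 50}, handle)
    # File exists and autoload is on: the serialized state is used.
    restored = load_or_create(saved, DummyEstimator, {"n_estimators": 10})
    # autoload off: the file is ignored and a new instance is created.
    ignored = load_or_create(
        saved, DummyEstimator, {"n_estimators": 10}, autoload=False
    )
```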

checkData(ds: QSPRDataset, exception: bool = True) bool

Check if the model has a data set.

Parameters:
  • ds (QSPRDataset) – data set to check

  • exception (bool) – if true, an exception is raised if no data is set

Returns:

True if data is set, False otherwise (if exception is False)

Return type:

bool
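
A minimal sketch of the exception flag, assuming that "has a data set" simply means the argument is not None and that a ValueError is raised. Both are assumptions made for illustration, and `check_data` is a hypothetical free function rather than the method itself.

```python
def check_data(ds, exception=True):
    """Return True if a data set is present; otherwise raise or return False."""
    has_data = ds is not None
    if not has_data and exception:
        # Assumed exception type; the library may raise something else.
        raise ValueError("No data set specified.")
    return has_data


# With exception=False the caller gets a boolean instead of an error.
quiet_result = check_data(None, exception=False)
```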

property classPath: str

Return the fully qualified path of the model.

Returns:

class path of the model

Return type:

str

cleanFiles()

Clean up the model files.

Removes the model directory and all its contents.

convertToNumpy(X: DataFrame | ndarray | QSPRDataset, y: DataFrame | ndarray | QSPRDataset | None = None) tuple[numpy.ndarray, numpy.ndarray] | ndarray

Convert the given data matrix and target matrix to np.ndarray format.

Parameters:
  • X (pd.DataFrame, np.ndarray, QSPRDataset) – data matrix

  • y (pd.DataFrame, np.ndarray, QSPRDataset) – target matrix

Returns:

data matrix and/or target matrix in np.ndarray format

createPredictionDatasetFromMols(mols: list[str], protein_id: str, smiles_standardizer: str | Callable = 'chembl', n_jobs: int = 1, fill_value: float = nan) tuple[qsprpred.extra.data.tables.pcm.PCMDataSet, numpy.ndarray]

Create a prediction data set of compounds using a PCM model given as a list of SMILES strings and a protein identifier. The protein identifier is used to calculate the protein descriptors.

Parameters:
  • mols (list[str]) – List of SMILES strings.

  • protein_id (str) – Protein identifier.

  • smiles_standardizer (str | Callable, optional) – Smiles standardizer. Defaults to “chembl”.

  • n_jobs (int, optional) – Number of parallel jobs. Defaults to 1.

  • fill_value (float, optional) – Value to fill missing features with. Defaults to np.nan.

Returns:

Dataset with the features calculated for the molecules.

Return type:

PCMDataSet
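
Conceptually, a PCM feature row pairs the descriptors of a compound with the descriptors of the protein named by protein_id. The sketch below only illustrates that idea with plain lists. The descriptor values, the `PROTEIN_DESCRIPTORS` table and both helper functions are hypothetical and are not how QSPRpred calculates descriptors.

```python
# Hypothetical, precomputed protein descriptors keyed by protein identifier.
PROTEIN_DESCRIPTORS = {
    "P_A": [0.1, 0.9],
    "P_B": [0.7, 0.3],
}


def toy_molecule_descriptors(smiles):
    """Toy compound descriptors: string length and number of 'C'/'c' symbols."""
    return [float(len(smiles)), float(smiles.upper().count("C"))]


def build_pcm_rows(mols, protein_id):
    """Append the descriptors of one protein to every compound's descriptors."""
    protein_part = PROTEIN_DESCRIPTORS[protein_id]
    return [toy_molecule_descriptors(smi) + protein_part for smi in mols]


rows = build_pcm_rows(["CCO", "c1ccccc1"], protein_id="P_A")
```

Every row ends with the same protein block because all molecules are evaluated against a single target.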

abstract fit(X: DataFrame | ndarray, y: DataFrame | ndarray, estimator: Any = None, mode: EarlyStoppingMode = EarlyStoppingMode.NOT_RECORDING, monitor: FitMonitor = None, **kwargs) Any | tuple[Any, int] | None

Fit the model to the given data matrix or QSPRDataset.

Note. convertToNumpy can be called here, to convert the input data to np.ndarray format.

Note. If no estimator is given, the estimator instance of the model is used.

Note. If a model supports early stopping, the fit function should have the early_stopping decorator and the mode argument should be used to set the early stopping mode. If the model does not support early stopping, the mode argument is ignored.

Parameters:
  • X (pd.DataFrame, np.ndarray) – data matrix to fit

  • y (pd.DataFrame, np.ndarray) – target matrix to fit

  • estimator (Any) – estimator instance to use for fitting

  • mode (EarlyStoppingMode) – early stopping mode

  • monitor (FitMonitor) – monitor for the fitting process, if None, the base monitor is used

  • kwargs – additional arguments to pass to the fit method of the estimator

Returns:

fitted estimator instance; in case of early stopping, also the number of iterations (int) after which the model stopped training

Return type:

Any | tuple[Any, int] | None

fitDataset(ds: QSPRDataset, monitor=None, mode=EarlyStoppingMode.OPTIMAL, save_model=True, save_data=False, **kwargs) str

Train model on the whole attached data set.

** IMPORTANT ** For models that support early stopping (supportsEarlyStopping), CrossValAssessor should be run first, so that the average number of epochs from the cross-validation with early stopping can be used for fitting the model.

Parameters:
  • ds (QSPRDataset) – data set to fit this model on

  • monitor (FitMonitor) – monitor for the fitting process, if None, the base monitor is used

  • mode (EarlyStoppingMode) – early stopping mode for models that support early stopping; by default the model is fit for the ‘optimal’ number of epochs determined during model assessment on the train or test set, which avoids setting aside extra data for a validation set.

  • save_model (bool) – save the model to file

  • save_data (bool) – save the supplied dataset to file

  • kwargs – additional arguments to pass to fit

Returns:

path to the saved model, if save_model is True

Return type:

str
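
The note on early stopping amounts to averaging the stopping points found in cross-validation and reusing that number for the final fit. A sketch under stated assumptions: `average_epochs` is a hypothetical helper, and rounding to the nearest integer is an assumption rather than documented behaviour.

```python
from statistics import mean


def average_epochs(epochs_per_fold):
    """Average the epochs at which training stopped in each cross-validation fold."""
    if not epochs_per_fold:
        raise ValueError("Run cross-validation with early stopping first.")
    return int(round(mean(epochs_per_fold)))


# Epochs at which five hypothetical folds stopped training.
optimal_epochs = average_epochs([42, 38, 45, 40, 41])
```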

classmethod fromFile(filename: str) QSPRModel

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getParameters(new_parameters) dict | None

Get the model parameters combined with the given parameters.

If both the model and the given parameters contain the same key, the value from the given parameters is used.

Parameters:

new_parameters (dict) – dictionary of new parameters to add

Returns:

dictionary of model parameters

Return type:

dict
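
The override rule can be sketched with plain dictionaries. `merge_parameters` is a hypothetical helper, and treating a missing dictionary as empty is an assumption.

```python
def merge_parameters(model_parameters, new_parameters):
    """Combine two parameter dictionaries; values in `new_parameters` win."""
    merged = dict(model_parameters or {})
    merged.update(new_parameters or {})
    return merged


current = {"n_estimators": 100, "max_depth": 5}
combined = merge_parameters(current, {"max_depth": 10, "n_jobs": 2})
```

The model's own dictionary is copied first, so it is left unchanged by the merge.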

static handleInvalidsInPredictions(mols: list[str], predictions: ndarray | list[numpy.ndarray], failed_mask: ndarray) ndarray

Replace invalid predictions with None.

Parameters:
  • mols (MoleculeTable) – molecules for which the predictions were made

  • predictions (np.ndarray) – predictions made by the model

  • failed_mask (np.ndarray) – boolean mask of failed predictions

Returns:

predictions with invalids replaced by None

Return type:

np.ndarray
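
A pure-Python sketch of the idea, assuming that predictions holds rows only for the molecules that were processed successfully and that the result should have one row per input molecule. `fill_invalid_predictions` is a hypothetical stand-in that uses lists instead of NumPy arrays.

```python
def fill_invalid_predictions(mols, predictions, failed_mask):
    """Return one row per input molecule, with None rows for failed molecules."""
    if len(failed_mask) != len(mols):
        raise ValueError("failed_mask needs one entry per molecule")
    n_targets = len(predictions[0]) if predictions else 1
    valid_rows = iter(predictions)
    return [
        [None] * n_targets if failed else list(next(valid_rows))
        for failed in failed_mask
    ]


mols = ["CCO", "not_a_smiles", "c1ccccc1"]
failed_mask = [False, True, False]
predictions = [[6.2], [7.9]]  # assumed: rows for the two valid molecules only
full = fill_invalid_predictions(mols, predictions, failed_mask)
```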

initFromDataset(data: QSPRDataset | None)
initRandomState(random_state)

Set random state if applicable. Defaults to the random state of the dataset if no random state is provided.

Parameters:

random_state (int) – Random state to use for shuffling and other random operations.

property isMultiTask: bool

Return if model is a multitask model, taken from the data set or deserialized from file if the model is loaded without data.

Returns:

True if model is a multitask model

Return type:

bool

abstract loadEstimator(params: dict | None = None) object

Initialize estimator instance with the given parameters.

If params is None, the default parameters will be used.

Parameters:

params (dict) – algorithm parameters

Returns:

initialized estimator instance

Return type:

object

abstract loadEstimatorFromFile(params: dict | None = None) object

Load estimator instance from file and apply the given parameters.

Parameters:

params (dict) – algorithm parameters

Returns:

initialized estimator instance

Return type:

object

classmethod loadParamsGrid(fname: str, optim_type: str, model_types: str) ndarray

Load parameter grids for bayes or grid search parameter optimization from json file.

Parameters:
  • fname (str) – file name of json file containing array with three columns containing modeltype, optimization type (grid or bayes) and model type

  • optim_type (str) – optimization type (grid or bayes)

  • model_types (list of str) – model type for hyperparameter optimization (e.g. RF)

Returns:

array with three columns containing modeltype, optimization type (grid or bayes) and model type

Return type:

np.ndarray

property metaFile: str
property optimalEpochs: int | None

Return the optimal number of epochs for early stopping.

Returns:

optimal number of epochs

Return type:

int | None

property outDir: str

Return output directory of the model, the model files are stored in this directory ({baseDir}/{name}).

Returns:

output directory of the model

Return type:

str

property outPrefix: str

Return output prefix of the model files.

The model files are stored with this prefix (i.e. {outPrefix}_meta.json).

Returns:

output prefix of the model files

Return type:

str
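
The path patterns quoted for outDir and outPrefix can be spelled out as follows. The helper names are hypothetical, and the choice of prefix (the model name inside the output directory) is an assumption, since only the patterns {baseDir}/{name} and {outPrefix}_meta.json are stated here.

```python
import os


def out_dir(base_dir, name):
    """Directory holding the model files: {baseDir}/{name}."""
    return os.path.join(base_dir, name)


def meta_file(out_prefix):
    """Metadata file written with the output prefix: {outPrefix}_meta.json."""
    return f"{out_prefix}_meta.json"


directory = out_dir("models", "my_pcm_model")
# Assumption: the prefix is the model name inside the output directory.
prefix = os.path.join(directory, "my_pcm_model")
meta = meta_file(prefix)
```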

abstract predict(X: DataFrame | ndarray | QSPRDataset, estimator: Any = None) ndarray

Make predictions for the given data matrix or QSPRDataset.

Note. convertToNumpy can be called here, to convert the input data to np.ndarray format.

Note. If no estimator is given, the estimator instance of the model is used.

Parameters:
  • X (pd.DataFrame, np.ndarray, QSPRDataset) – data matrix to predict

  • estimator (Any) – estimator instance to use for prediction

Returns:

2D array containing the predictions, where each row corresponds to a sample in the data and each column to a target property

Return type:

np.ndarray

predictDataset(dataset: QSPRDataset, use_probas: bool = False) ndarray | list[numpy.ndarray]

Make predictions for the given dataset.

Parameters:
  • dataset – a QSPRDataset instance

  • use_probas – use probabilities if this is a classification model

Returns:

an array of predictions or a list of arrays of predictions (for classification models with use_probas=True)

Return type:

np.ndarray | list[np.ndarray]

predictMols(mols: list[str], protein_id: str, use_probas: bool = False, smiles_standardizer: str | Callable = 'chembl', n_jobs: int = 1, fill_value: float = nan) ndarray

Predict the target properties of a list of molecules using a PCM model. The protein identifier is used to calculate the protein descriptors for a target of interest.

Parameters:
  • mols (list[str]) – List of SMILES strings.

  • protein_id (str) – Protein identifier.

  • use_probas (bool, optional) – Whether to return class probabilities. Defaults to False.

  • smiles_standardizer (str | Callable, optional) – Smiles standardizer. Defaults to “chembl”.

  • n_jobs (int, optional) – Number of parallel jobs. Defaults to 1.

  • fill_value (float, optional) – Value to fill missing features with. Defaults to np.nan.

Returns:

Array of predictions.

Return type:

np.ndarray

abstract predictProba(X: DataFrame | ndarray | QSPRDataset, estimator: Any = None) list[numpy.ndarray]

Make predictions for the given data matrix or QSPRDataset, but use probabilities for classification models. Does not work with regression models.

Note. convertToNumpy can be called here, to convert the input data to np.ndarray format.

Note. If no estimator is given, the estimator instance of the model is used.

Parameters:
  • X (pd.DataFrame, np.ndarray, QSPRDataset) – data matrix to predict

  • estimator (Any) – estimator instance to use for prediction

Returns:

a list of 2D arrays containing the probabilities for each class, where each array corresponds to a target property, each row to a sample in the data and each column to a class

Return type:

list[np.ndarray]
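
The nesting of the returned probabilities is easy to misread, so here is a sketch with nested lists standing in for the arrays. The numbers are invented for illustration, and `most_likely_classes` is a hypothetical helper that is not part of the API.

```python
# Hypothetical output for 2 target properties and 3 samples,
# indexed as probas[target][sample][class].
probas = [
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],  # target 0, two classes
    [[0.1, 0.7, 0.2], [0.3, 0.3, 0.4], [0.8, 0.1, 0.1]],  # target 1, three classes
]


def most_likely_classes(probas):
    """Return, for every sample, the index of the most probable class per target."""
    n_samples = len(probas[0])
    return [
        [max(range(len(target[i])), key=target[i].__getitem__) for target in probas]
        for i in range(n_samples)
    ]


labels = most_likely_classes(probas)
```

The result has one row per sample and one column per target property, which is the layout described for predict.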

save(save_estimator=False)

Save model to file.

Parameters:

save_estimator (bool) – Explicitly save the estimator to file, if True. Note that some models may save the estimator by default even if this argument is False.

Returns:

absolute path to the metafile of the saved model and, if save_estimator is True, the absolute path to the saved estimator

Return type:

str

abstract saveEstimator() str

Save the underlying estimator to file.

Returns:

absolute path to the saved estimator

Return type:

path (str)

setParams(params: dict | None, reset_estimator: bool = True)

Set model parameters. The estimator is also updated with the new parameters if reset_estimator is True.

Parameters:
  • params (dict) – dictionary of model parameters or None to reset the parameters

  • reset_estimator (bool) – if True, the estimator is reinitialized with the new parameters

abstract property supportsEarlyStopping: bool

Return if the model supports early stopping.

Returns:

True if the model supports early stopping

Return type:

bool

property task: ModelTasks

Return the task of the model, taken from the data set or deserialized from file if the model is loaded without data.

Returns:

task of the model

Return type:

ModelTasks

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON()

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)
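
The four serialization methods (toJSON, fromJSON, toFile, fromFile) form a round trip. The toy class below mirrors their names and return values using only the standard library. It is a sketch of the pattern, not QSPRpred's implementation, and `JSONSerializableSketch` with its two attributes is hypothetical.

```python
import json
import os
import tempfile


class JSONSerializableSketch:
    """Toy object mirroring the toJSON/fromJSON/toFile/fromFile quartet."""

    def __init__(self, name, parameters=None):
        self.name = name
        self.parameters = parameters or {}

    def toJSON(self):
        # Everything needed to rebuild the object goes into the string.
        return json.dumps({"name": self.name, "parameters": self.parameters})

    @classmethod
    def fromJSON(cls, json_str):
        return cls(**json.loads(json_str))

    def toFile(self, filename):
        with open(filename, "w") as handle:
            handle.write(self.toJSON())
        return os.path.abspath(filename)

    @classmethod
    def fromFile(cls, filename):
        with open(filename) as handle:
            return cls.fromJSON(handle.read())


with tempfile.TemporaryDirectory() as tmp:
    original = JSONSerializableSketch("demo", {"alpha": 0.5})
    path = original.toFile(os.path.join(tmp, "demo_meta.json"))
    restored = JSONSerializableSketch.fromFile(path)
```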

class qsprpred.extra.models.pcm.SklearnPCMModel(base_dir: str, alg=None, name: str | None = None, parameters: dict | None = None, autoload: bool = True, random_state: int | None = None)

Bases: SklearnModel, PCMModel

Wrapper for sklearn models for PCM.

Just replaces some methods in SklearnModel with those in PCMModel.

Initialize SklearnModel model.

Parameters:
  • base_dir (str) – base directory for model

  • alg (Type) – sklearn model class

  • name (str) – customized model name

  • parameters (dict) – model parameters

  • autoload (bool) – load model from file

  • random_state (int) – seed for the random state

checkData(ds: QSPRDataset, exception: bool = True) bool

Check if the model has a data set.

Parameters:
  • ds (QSPRDataset) – data set to check

  • exception (bool) – if true, an exception is raised if no data is set

Returns:

True if data is set, False otherwise (if exception is False)

Return type:

bool

property classPath: str

Return the fully qualified path of the model.

Returns:

class path of the model

Return type:

str

cleanFiles()

Clean up the model files.

Removes the model directory and all its contents.

convertToNumpy(X: DataFrame | ndarray | QSPRDataset, y: DataFrame | ndarray | QSPRDataset | None = None) tuple[numpy.ndarray, numpy.ndarray] | ndarray

Convert the given data matrix and target matrix to np.ndarray format.

Parameters:
  • X (pd.DataFrame, np.ndarray, QSPRDataset) – data matrix

  • y (pd.DataFrame, np.ndarray, QSPRDataset) – target matrix

Returns:

data matrix and/or target matrix in np.ndarray format

createPredictionDatasetFromMols(mols: list[str], protein_id: str, smiles_standardizer: str | Callable = 'chembl', n_jobs: int = 1, fill_value: float = nan) tuple[qsprpred.extra.data.tables.pcm.PCMDataSet, numpy.ndarray]

Create a prediction data set of compounds using a PCM model given as a list of SMILES strings and a protein identifier. The protein identifier is used to calculate the protein descriptors.

Parameters:
  • mols (list[str]) – List of SMILES strings.

  • protein_id (str) – Protein identifier.

  • smiles_standardizer (str | Callable, optional) – Smiles standardizer. Defaults to “chembl”.

  • n_jobs (int, optional) – Number of parallel jobs. Defaults to 1.

  • fill_value (float, optional) – Value to fill missing features with. Defaults to np.nan.

Returns:

Dataset with the features calculated for the molecules.

Return type:

PCMDataSet

fit(X: DataFrame | ndarray, y: DataFrame | ndarray, estimator: Any = None, mode: Any = None, monitor: None = None, **kwargs)

Fit the model to the given data matrix or QSPRDataset.

Note. convertToNumpy can be called here, to convert the input data to np.ndarray format.

Note. If no estimator is given, the estimator instance of the model is used.

Note. If a model supports early stopping, the fit function should have the early_stopping decorator and the mode argument should be used to set the early stopping mode. If the model does not support early stopping, the mode argument is ignored.

Parameters:
  • X (pd.DataFrame, np.ndarray) – data matrix to fit

  • y (pd.DataFrame, np.ndarray) – target matrix to fit

  • estimator (Any) – estimator instance to use for fitting

  • mode (EarlyStoppingMode) – early stopping mode

  • monitor (FitMonitor) – monitor for the fitting process, if None, the base monitor is used

  • kwargs – additional arguments to pass to the fit method of the estimator

Returns:

fitted estimator instance; in case of early stopping, also the number of iterations (int) after which the model stopped training

Return type:

Any

fitDataset(ds: QSPRDataset, monitor=None, mode=EarlyStoppingMode.OPTIMAL, save_model=True, save_data=False, **kwargs) str

Train model on the whole attached data set.

** IMPORTANT ** For models that support early stopping (supportsEarlyStopping), CrossValAssessor should be run first, so that the average number of epochs from the cross-validation with early stopping can be used for fitting the model.

Parameters:
  • ds (QSPRDataset) – data set to fit this model on

  • monitor (FitMonitor) – monitor for the fitting process, if None, the base monitor is used

  • mode (EarlyStoppingMode) – early stopping mode for models that support early stopping; by default the model is fit for the ‘optimal’ number of epochs determined during model assessment on the train or test set, which avoids setting aside extra data for a validation set.

  • save_model (bool) – save the model to file

  • save_data (bool) – save the supplied dataset to file

  • kwargs – additional arguments to pass to fit

Returns:

path to the saved model, if save_model is True

Return type:

str

classmethod fromFile(filename: str) QSPRModel

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getParameters(new_parameters) dict | None

Get the model parameters combined with the given parameters.

If both the model and the given parameters contain the same key, the value from the given parameters is used.

Parameters:

new_parameters (dict) – dictionary of new parameters to add

Returns:

dictionary of model parameters

Return type:

dict

static handleInvalidsInPredictions(mols: list[str], predictions: ndarray | list[numpy.ndarray], failed_mask: ndarray) ndarray

Replace invalid predictions with None.

Parameters:
  • mols (MoleculeTable) – molecules for which the predictions were made

  • predictions (np.ndarray) – predictions made by the model

  • failed_mask (np.ndarray) – boolean mask of failed predictions

Returns:

predictions with invalids replaced by None

Return type:

np.ndarray

initFromDataset(data: QSPRDataset | None)
initRandomState(random_state)

Set random state if applicable. Defaults to the random state of the dataset if no random state is provided.

Parameters:

random_state (int) – Random state to use for shuffling and other random operations.

property isMultiTask: bool

Return if model is a multitask model, taken from the data set or deserialized from file if the model is loaded without data.

Returns:

True if model is a multitask model

Return type:

bool

loadEstimator(params: dict | None = None) Any

Load estimator from alg and params.

Parameters:

params (dict) – parameters

loadEstimatorFromFile(params: dict | None = None, fallback_load: bool = True)

Load estimator from file.

Parameters:
  • params (dict) – parameters

  • fallback_load (bool) – if True, init estimator from alg and params if no estimator found at path

classmethod loadParamsGrid(fname: str, optim_type: str, model_types: str) ndarray

Load parameter grids for bayes or grid search parameter optimization from json file.

Parameters:
  • fname (str) – file name of json file containing array with three columns containing modeltype, optimization type (grid or bayes) and model type

  • optim_type (str) – optimization type (grid or bayes)

  • model_types (list of str) – model type for hyperparameter optimization (e.g. RF)

Returns:

array with three columns containing modeltype, optimization type (grid or bayes) and model type

Return type:

np.ndarray

property metaFile: str
property optimalEpochs: int | None

Return the optimal number of epochs for early stopping.

Returns:

optimal number of epochs

Return type:

int | None

property outDir: str

Return output directory of the model, the model files are stored in this directory ({baseDir}/{name}).

Returns:

output directory of the model

Return type:

str

property outPrefix: str

Return output prefix of the model files.

The model files are stored with this prefix (i.e. {outPrefix}_meta.json).

Returns:

output prefix of the model files

Return type:

str

predict(X: DataFrame | ndarray | QSPRDataset, estimator: Any = None)

See QSPRModel.predict.

predictDataset(dataset: QSPRDataset, use_probas: bool = False) ndarray | list[numpy.ndarray]

Make predictions for the given dataset.

Parameters:
  • dataset – a QSPRDataset instance

  • use_probas – use probabilities if this is a classification model

Returns:

an array of predictions or a list of arrays of predictions (for classification models with use_probas=True)

Return type:

np.ndarray | list[np.ndarray]

predictMols(mols: list[str], protein_id: str, use_probas: bool = False, smiles_standardizer: str | Callable = 'chembl', n_jobs: int = 1, fill_value: float = nan) ndarray

Predict the target properties of a list of molecules using a PCM model. The protein identifier is used to calculate the protein descriptors for a target of interest.

Parameters:
  • mols (list[str]) – List of SMILES strings.

  • protein_id (str) – Protein identifier.

  • use_probas (bool, optional) – Whether to return class probabilities. Defaults to False.

  • smiles_standardizer (str | Callable, optional) – Smiles standardizer. Defaults to “chembl”.

  • n_jobs (int, optional) – Number of parallel jobs. Defaults to 1.

  • fill_value (float, optional) – Value to fill missing features with. Defaults to np.nan.

Returns:

Array of predictions.

Return type:

np.ndarray

predictProba(X: DataFrame | ndarray | QSPRDataset, estimator: Any = None)

See QSPRModel.predictProba.

save(save_estimator=False)

Save model to file.

Parameters:

save_estimator (bool) – Explicitly save the estimator to file, if True. Note that some models may save the estimator by default even if this argument is False.

Returns:

absolute path to the metafile of the saved model and, if save_estimator is True, the absolute path to the saved estimator

Return type:

str

saveEstimator() str

See QSPRModel.saveEstimator.

setParams(params: dict | None, reset_estimator: bool = True)

Set model parameters. The estimator is also updated with the new parameters if reset_estimator is True.

Parameters:
  • params (dict) – dictionary of model parameters or None to reset the parameters

  • reset_estimator (bool) – if True, the estimator is reinitialized with the new parameters

property supportsEarlyStopping: bool

Whether the model supports early stopping or not.

property task: ModelTasks

Return the task of the model, taken from the data set or deserialized from file if the model is loaded without data.

Returns:

task of the model

Return type:

ModelTasks

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON()

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

qsprpred.extra.models.random module

class qsprpred.extra.models.random.MedianDistributionAlgorithm

Bases: RandomDistributionAlgorithm

fit(y_df: DataFrame)
from_dict(loaded_dict)
get_probas(X_test: ndarray)
to_dict()

class qsprpred.extra.models.random.RandomDistributionAlgorithm

Bases: ABC

abstract fit(y_df: DataFrame)
abstract from_dict(loaded_dict)
abstract get_probas(X_test: ndarray)
abstract to_dict()

class qsprpred.extra.models.random.RandomModel(base_dir: str, alg: RandomDistributionAlgorithm, name: str | None = None, parameters: dict | None = None, autoload=True, random_state: int | None = None)

Bases: QSPRModel

Initialize a QSPR model instance.

If the model is loaded from file, the data set is not required. Note that the data set is required for fitting and optimization.

Parameters:
  • base_dir (str) – base directory of the model, the model files are stored in a subdirectory {baseDir}/{outDir}/

  • name (str) – name of the model

  • parameters (dict) – dictionary of algorithm specific parameters

  • autoload (bool) – if True, the estimator is loaded from the serialized file if it exists, otherwise a new instance of alg is created

checkData(ds: QSPRDataset, exception: bool = True) bool

Check if the model has a data set.

Parameters:
  • ds (QSPRDataset) – data set to check

  • exception (bool) – if true, an exception is raised if no data is set

Returns:

True if data is set, False otherwise (if exception is False)

Return type:

bool

property classPath: str

Return the fully qualified path of the model.

Returns:

class path of the model

Return type:

str

cleanFiles()

Clean up the model files.

Removes the model directory and all its contents.

convertToNumpy(X: DataFrame | ndarray | QSPRDataset, y: DataFrame | ndarray | QSPRDataset | None = None) tuple[numpy.ndarray, numpy.ndarray] | ndarray

Convert the given data matrix and target matrix to np.ndarray format.

Parameters:
  • X (pd.DataFrame, np.ndarray, QSPRDataset) – data matrix

  • y (pd.DataFrame, np.ndarray, QSPRDataset) – target matrix

Returns:

data matrix and/or target matrix in np.ndarray format

createPredictionDatasetFromMols(mols: list[str | rdkit.Chem.rdchem.Mol], smiles_standardizer: str | Callable[[str], str] = 'chembl', n_jobs: int = 1, fill_value: float = nan) tuple[qsprpred.data.tables.qspr.QSPRDataset, numpy.ndarray]

Create a QSPRDataset instance from a list of SMILES strings.

Parameters:
  • mols (list[str | Mol]) – list of SMILES strings

  • smiles_standardizer (str, callable) – smiles standardizer to use

  • n_jobs (int) – number of parallel jobs to use

  • fill_value (float) – value to fill for missing features

Returns:

a tuple containing the QSPRDataset instance and a boolean mask indicating which molecules failed to be processed

Return type:

tuple

fit(X: DataFrame | ndarray | QSPRDataset, y: DataFrame | ndarray | QSPRDataset, estimator: Type[RandomDistributionAlgorithm] = None, mode: EarlyStoppingMode = None, monitor: FitMonitor | None = None, **kwargs) RandomDistributionAlgorithm

Fit the model to the given data matrix or QSPRDataset.

Parameters:
  • X (pd.DataFrame, np.ndarray, QSPRDataset) – data matrix to fit

  • y (pd.DataFrame, np.ndarray, QSPRDataset) – target matrix to fit

  • estimator (Any) – estimator instance to use for fitting

  • mode (EarlyStoppingMode) – early stopping mode, unused

  • monitor (FitMonitor) – monitor instance to track the fitting process, unused

  • kwargs – additional keyword arguments for the fit function

Returns:

fitted estimator instance

Return type:

(RandomDistributionAlgorithm)

fitDataset(ds: QSPRDataset, monitor=None, mode=EarlyStoppingMode.OPTIMAL, save_model=True, save_data=False, **kwargs) str

Train model on the whole attached data set.

** IMPORTANT ** For models that support early stopping (supportsEarlyStopping), CrossValAssessor should be run first, so that the average number of epochs from the cross-validation with early stopping can be used for fitting the model.

Parameters:
  • ds (QSPRDataset) – data set to fit this model on

  • monitor (FitMonitor) – monitor for the fitting process, if None, the base monitor is used

  • mode (EarlyStoppingMode) – early stopping mode for models that support early stopping; by default the model is fit for the ‘optimal’ number of epochs determined during model assessment on the train or test set, which avoids setting aside extra data for a validation set.

  • save_model (bool) – save the model to file

  • save_data (bool) – save the supplied dataset to file

  • kwargs – additional arguments to pass to fit

Returns:

path to the saved model, if save_model is True

Return type:

str

classmethod fromFile(filename: str) QSPRModel

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getParameters(new_parameters) dict | None

Get the model parameters combined with the given parameters.

If both the model and the given parameters contain the same key, the value from the given parameters is used.

Parameters:

new_parameters (dict) – dictionary of new parameters to add

Returns:

dictionary of model parameters

Return type:

dict

static handleInvalidsInPredictions(mols: list[str], predictions: ndarray | list[numpy.ndarray], failed_mask: ndarray) ndarray

Replace invalid predictions with None.

Parameters:
  • mols (MoleculeTable) – molecules for which the predictions were made

  • predictions (np.ndarray) – predictions made by the model

  • failed_mask (np.ndarray) – boolean mask of failed predictions

Returns:

predictions with invalids replaced by None

Return type:

np.ndarray

initFromDataset(data: QSPRDataset | None)
initRandomState(random_state)

Set random state if applicable. Defaults to the random state of the dataset if no random state is provided.

Parameters:

random_state (int) – Random state to use for shuffling and other random operations.

property isMultiTask: bool

Return if model is a multitask model, taken from the data set or deserialized from file if the model is loaded without data.

Returns:

True if model is a multitask model

Return type:

bool

loadEstimator(params: dict | None = None) object

Initialize estimator instance with the given parameters.

If params is None, the default parameters will be used.

Parameters:

params (dict) – algorithm parameters

Returns:

initialized estimator instance

Return type:

object

loadEstimatorFromFile(params: dict | None = None, fallback_load=True) object

Load estimator instance from file and apply the given parameters.

Parameters:

params (dict) – algorithm parameters

Returns:

initialized estimator instance

Return type:

object

classmethod loadParamsGrid(fname: str, optim_type: str, model_types: str) ndarray

Load parameter grids for bayes or grid search parameter optimization from json file.

Parameters:
  • fname (str) – file name of json file containing array with three columns containing modeltype, optimization type (grid or bayes) and model type

  • optim_type (str) – optimization type (grid or bayes)

  • model_types (list of str) – model type for hyperparameter optimization (e.g. RF)

Returns:

array with three columns containing modeltype, optimization type (grid or bayes) and model type

Return type:

np.ndarray

property metaFile: str
property optimalEpochs: int | None

Return the optimal number of epochs for early stopping.

Returns:

optimal number of epochs

Return type:

int | None

property outDir: str

Return output directory of the model, the model files are stored in this directory ({baseDir}/{name}).

Returns:

output directory of the model

Return type:

str

property outPrefix: str

Return output prefix of the model files.

The model files are stored with this prefix (i.e. {outPrefix}_meta.json).

Returns:

output prefix of the model files

Return type:

str
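
The path conventions described for outDir and outPrefix can be sketched with plain string handling. This is a minimal illustration, not the library's code: the values of base_dir and name are hypothetical, and the composition of the prefix (output directory joined with the model name) is an assumption, since the text above only states the {baseDir}/{name} and {outPrefix}_meta.json patterns.

```python
import os

# Hypothetical values, for illustration only.
base_dir = "qspr/models"
name = "MyPCMModel"

# outDir is documented as {baseDir}/{name}.
out_dir = os.path.join(base_dir, name)

# Assumption: the prefix is the output directory joined with the model name.
out_prefix = os.path.join(out_dir, name)

# The meta file is documented as {outPrefix}_meta.json.
meta_file = f"{out_prefix}_meta.json"
print(meta_file)
```

On a real model instance, the outDir, outPrefix and metaFile properties should be read directly instead of being rebuilt by hand.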

predict(X: DataFrame | ndarray | QSPRDataset, estimator: Any = None) ndarray[source]

Make predictions for the given data matrix or QSPRDataset.

Parameters:
  • X (pd.DataFrame, np.ndarray, QSPRDataset) – data matrix to predict

  • estimator (Any) – estimator instance to use for prediction

Returns:

2D array containing the predictions, where each row corresponds to a sample in the data and each column to a target property

Return type:

np.ndarray

predictDataset(dataset: QSPRDataset, use_probas: bool = False) ndarray | list[numpy.ndarray]

Make predictions for the given dataset.

Parameters:
  • dataset – a QSPRDataset instance

  • use_probas – use probabilities if this is a classification model

Returns:

an array of predictions or a list of arrays of predictions (for classification models with use_probas=True)

Return type:

np.ndarray | list[np.ndarray]

predictMols(mols: List[str | Mol], use_probas: bool = False, smiles_standardizer: str | callable = 'chembl', n_jobs: int = 1, fill_value: float = nan, use_applicability_domain: bool = False) ndarray | list[numpy.ndarray]

Make predictions for the given molecules.

Parameters:
  • mols (List[str | Mol]) – list of SMILES strings or Mol objects

  • use_probas (bool) – use probabilities for classification models

  • smiles_standardizer – either chembl, old, or a partial function that reads and standardizes smiles.

  • n_jobs – Number of jobs to use for parallel processing.

  • fill_value – Value to use for missing values in the feature matrix.

  • use_applicability_domain – If True, use the applicability domain to also return whether each molecule is within the applicability domain of the model.

Returns:

an array of predictions or a list of arrays of predictions (for classification models with use_probas=True)

np.ndarray[bool]: boolean mask indicating which molecules fall within the applicability domain of the model

Return type:

np.ndarray | list[np.ndarray]

predictProba(X: DataFrame | ndarray | QSPRDataset, estimator: Any = None)[source]

Make predictions for the given data matrix or QSPRDataset, but use probabilities for classification models. Does not work with regression models.

Note: convertToNumpy can be called here to convert the input data to np.ndarray format.

Note: if no estimator is given, the estimator instance of the model is used.

Parameters:
  • X (pd.DataFrame, np.ndarray, QSPRDataset) – data matrix to make predictions for

  • estimator (Any) – estimator instance to use for prediction

Returns:

a list of 2D arrays containing the probabilities for each class, where each array corresponds to a target property, each row to a sample in the data and each column to a class

Return type:

list[np.ndarray]
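
The return layout described above (one 2D array per target property, rows for samples, columns for classes) can be illustrated without the library. The sketch below uses nested lists as stand-ins for np.ndarray, with made-up probabilities, and a hypothetical helper to_labels that picks the most probable class per sample.

```python
# Stand-in for the documented return value: one 2D array per target property.
# Plain nested lists replace np.ndarray here, and all numbers are made up.
probas = [
    # target property 1: 3 samples x 2 classes
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
    # target property 2: 3 samples x 3 classes
    [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]],
]


def to_labels(target_probas):
    """Return the index of the most probable class for every sample (row)."""
    return [max(range(len(row)), key=row.__getitem__) for row in target_probas]


labels = [to_labels(target) for target in probas]
print(labels)
```

With actual NumPy arrays, the same reduction would typically be written as an argmax along the class axis.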

save(save_estimator=False)

Save model to file.

Parameters:

save_estimator (bool) – Explicitly save the estimator to file, if True. Note that some models may save the estimator by default even if this argument is False.

Returns:

absolute path to the metafile of the saved model

str: absolute path to the saved estimator, if save_estimator is True

Return type:

str

saveEstimator() str[source]

Save the underlying estimator to file.

Returns:

path to the saved estimator

Return type:

str

setParams(params: dict | None, reset_estimator: bool = True)

Set model parameters. The estimator is also updated with the new parameters if ‘reset_estimator’ is True.

Parameters:
  • params (dict) – dictionary of model parameters or None to reset the parameters

  • reset_estimator (bool) – if True, the estimator is reinitialized with the new parameters

property supportsEarlyStopping: bool

Check if the model supports early stopping.

Returns:

whether the model supports early stopping or not

Return type:

bool

property task: ModelTasks

Return the task of the model, taken from the data set or deserialized from file if the model is loaded without data.

Returns:

task of the model

Return type:

ModelTasks

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

str

toJSON()

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

str
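
The requirement that the JSON string contains everything needed to reconstruct the object can be shown with a small round trip. This is a generic sketch using the standard json module; the Settings class and its to_json/from_json methods are hypothetical and do not reflect how QSPRpred serializes its models.

```python
import json


class Settings:
    """Hypothetical serializable object, not a QSPRpred class."""

    def __init__(self, name, parameters):
        self.name = name
        self.parameters = parameters

    def to_json(self):
        # Store every attribute that the constructor needs.
        return json.dumps({"name": self.name, "parameters": self.parameters})

    @classmethod
    def from_json(cls, text):
        # Rebuild the object from the serialized attributes alone.
        return cls(**json.loads(text))


original = Settings("demo", {"n_estimators": 100})
restored = Settings.from_json(original.to_json())
print(restored.name, restored.parameters)
```

If an attribute were left out of the serialized dictionary, from_json could not rebuild an equivalent object, which is the property the docstring asks for.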

class qsprpred.extra.models.random.RatioDistributionAlgorithm(random_state=None)[source]

Bases: RandomDistributionAlgorithm

Categorical distribution using the ratio of categories as probabilities.

Values of X are irrelevant; only the distribution of y is used.

Variables:
  • ratios (pd.DataFrame) – ratio of each category in y

  • random_state (int) – random state for reproducibility

fit(y_df: DataFrame)[source]

Calculate ratio of each category in y_df and store as probability distribution

from_dict(loaded_dict)[source]
get_probas(X_test: ndarray)[source]

Get probabilities of each category for each sample in X_test
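
The idea behind a ratio-based baseline can be sketched in a few lines of standard-library Python. This is only an illustration of the technique under stated assumptions, not the implementation of RatioDistributionAlgorithm: the function names fit_ratios and get_probas and the list-based data are hypothetical.

```python
from collections import Counter


def fit_ratios(y):
    """Return the fraction of samples that falls into each category of y."""
    counts = Counter(y)
    total = len(y)
    return {category: counts[category] / total for category in sorted(counts)}


def get_probas(ratios, n_samples):
    """Return the same class probabilities for every sample; X is ignored."""
    row = list(ratios.values())
    return [row[:] for _ in range(n_samples)]


y_train = [0, 0, 0, 1]
ratios = fit_ratios(y_train)
probas = get_probas(ratios, n_samples=2)
print(ratios, probas)
```

Because the features never enter the computation, such a model gives a reference score that any descriptor-based model should beat.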

to_dict()[source]

class qsprpred.extra.models.random.ScipyDistributionAlgorithm(distribution: scipy.stats._distn_infrastructure.rv_continuous = <scipy.stats._continuous_distns.norm_gen object>, params={}, random_state=None)[source]

Bases: RandomDistributionAlgorithm

fit(y_df: DataFrame)[source]
from_dict(loaded_dict)[source]
get_probas(X_test: ndarray)[source]
to_dict()[source]

qsprpred.extra.models.tests module

Test module for testing extra models.

class qsprpred.extra.models.tests.ModelDataSetsMixInExtras[source]

Bases: ModelDataSetsPathMixIn, DataSetsMixInExtras

This class holds the tests for testing models in extras.

clearGenerated()

Remove the directories that are used for testing.

createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': <TargetTasks.MULTICLASS: 'MULTICLASS'>, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • preparation_settings (dict) – dictionary containing preparation settings

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42, n_jobs=1, chunk_size=None)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createPCMDataSet(name: str = 'QSPRDataset_test_pcm', target_props: list[qsprpred.tasks.TargetProperty] | list[dict] = [{'name': 'pchembl_value_Median', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings: dict | None = None, protein_col: str = 'accession', random_state: int | None = None)

Create a small dataset for testing purposes.

Parameters:
  • name (str, optional) – name of the dataset. Defaults to “QSPRDataset_test_pcm”.

  • target_props (list[TargetProperty] | list[dict], optional) – target properties.

  • preparation_settings (dict | None, optional) – preparation settings. Defaults to None.

  • protein_col (str, optional) – name of the column with protein accessions. Defaults to “accession”.

  • random_state (int, optional) – random seed to use in the dataset. Defaults to None

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], random_state=None, prep=None, n_jobs=1, chunk_size=None)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

classmethod getAllDescriptors() list[qsprpred.data.descriptors.sets.DescriptorSet]

Return a list of all available molecule descriptor sets.

Returns:

list of MoleculeDescriptorSet objects

Return type:

list

classmethod getAllProteinDescriptors() list[qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet]

Return a list of all available protein descriptor sets.

Returns:

list of ProteinDescriptorSet objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid
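
A grid of this shape can be built lazily with itertools.product. The sketch below is an assumption-laden illustration, not the test suite's code: the option lists contain placeholder strings instead of real calculators, splitters, standardizers or filters, and data_prep_grid is a hypothetical name.

```python
from itertools import product

# Placeholder options; the real grid holds calculator, split, standardizer
# and filter objects instead of strings.
descriptor_calculators = ["calc_A", "calc_B"]
splits = [None, "random_split"]
feature_standardizers = [None, "standard_scaler"]
feature_filters = [None]
data_filters = [None]


def data_prep_grid():
    """Lazily yield one tuple per combination, in the documented order."""
    return product(
        descriptor_calculators,
        splits,
        feature_standardizers,
        feature_filters,
        data_filters,
    )


grid = list(data_prep_grid())
print(len(grid), grid[0])
```

Each tuple follows the order (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters), so the number of combinations is the product of the option counts.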

classmethod getDefaultCalculatorCombo()

Return the default descriptor calculator combo.

static getDefaultPrep()

Return a dictionary with default preparation settings.

classmethod getMSAProvider(out_dir: str)
getPCMDF() DataFrame

Return a test dataframe with PCM data.

Returns:

dataframe with PCM data

Return type:

pd.DataFrame

getPCMSeqProvider() Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

Return a function that provides sequences for given accessions.

Returns:

function that provides sequences for given accessions

Return type:

Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

getPCMTargetsDF() DataFrame

Return a test dataframe with PCM targets and their sequences.

Returns:

dataframe with PCM targets and their sequences

Return type:

pd.DataFrame

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists, one for each possible combination of preparation settings

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

setUpPaths()

Set up the test environment.

tearDown()

Remove all files and directories that are used for testing.

validate_split(dataset)

Check if the split has the data it should have after splitting.

class qsprpred.extra.models.tests.RandomBaseModelTestCase(methodName='runTest')[source]

Bases: ModelDataSetsMixInExtras, ModelCheckMixIn, QSPRTestCase

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
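
The difference between places and delta is easiest to see side by side. The following sketch uses only standard unittest calls; the class name AlmostEqualExamples and the numbers are chosen for illustration.

```python
import unittest


class AlmostEqualExamples(unittest.TestCase):
    def test_default_places(self):
        # Floating-point noise disappears when rounding to 7 decimal places.
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_places(self):
        # A difference of about 0.001 rounds to zero at 2 places, not at 3.
        self.assertAlmostEqual(1.0, 1.001, places=2)
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(1.0, 1.001, places=3)

    def test_delta(self):
        # With delta, the absolute difference is compared directly.
        self.assertAlmostEqual(1.0, 1.04, delta=0.05)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AlmostEqualExamples)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Only one of places and delta may be given in a single call; passing both raises a TypeError.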

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)), Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.
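
The two example pairs above translate directly into a runnable test. The class name CountEqualExamples is hypothetical; the assertions are the standard unittest ones.

```python
import unittest


class CountEqualExamples(unittest.TestCase):
    def test_same_elements_any_order(self):
        # Same elements with the same multiplicities, in a different order.
        self.assertCountEqual([0, 1, 1], [1, 0, 1])

    def test_different_counts(self):
        # The element 0 occurs twice on the left but only once on the right.
        with self.assertRaises(AssertionError):
            self.assertCountEqual([0, 0, 1], [0, 1])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountEqualExamples)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```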

assertDictEqual(d1, d2, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger_name or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger_name or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)

assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
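
Both calling styles described by the parameters can be sketched as follows. The class name RaisesRegexExample is made up; the example relies on the ValueError that the built-in int raises for a non-numeric string.

```python
import unittest


class RaisesRegexExample(unittest.TestCase):
    def test_context_manager_form(self):
        # The regex is searched for in the string form of the exception.
        with self.assertRaisesRegex(ValueError, r"invalid literal"):
            int("not a number")

    def test_callable_form(self):
        # The callable and its positional arguments follow the regex.
        self.assertRaisesRegex(ValueError, r"invalid literal", int, "xyz")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RaisesRegexExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The msg keyword is only accepted in the context manager form.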

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class warnClass is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.

checkOptimization(model: QSPRModel, ds: QSPRDataset, optimizer: HyperparameterOptimization)
clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': <TargetTasks.MULTICLASS: 'MULTICLASS'>, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • preparation_settings (dict) – dictionary containing preparation settings

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42, n_jobs=1, chunk_size=None)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createPCMDataSet(name: str = 'QSPRDataset_test_pcm', target_props: list[qsprpred.tasks.TargetProperty] | list[dict] = [{'name': 'pchembl_value_Median', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings: dict | None = None, protein_col: str = 'accession', random_state: int | None = None)

Create a small dataset for testing purposes.

Parameters:
  • name (str, optional) – name of the dataset. Defaults to “QSPRDataset_test_pcm”.

  • target_props (list[TargetProperty] | list[dict], optional) – target properties.

  • preparation_settings (dict | None, optional) – preparation settings. Defaults to None.

  • protein_col (str, optional) – name of the column with protein accessions. Defaults to “accession”.

  • random_state (int, optional) – random seed to use in the dataset. Defaults to None

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], random_state=None, prep=None, n_jobs=1, chunk_size=None)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

debug()

Run the test without collecting errors in a TestResult

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

fitTest(model: QSPRModel, ds: QSPRDataset)

Test model fitting, optimization and evaluation.

Parameters:
  • model (QSPRModel) – The model to test.

  • ds (QSPRDataset) – The dataset to use for the test.

classmethod getAllDescriptors() list[qsprpred.data.descriptors.sets.DescriptorSet]

Return a list of all available molecule descriptor sets.

Returns:

list of MoleculeDescriptorSet objects

Return type:

list

classmethod getAllProteinDescriptors() list[qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet]

Return a list of all available protein descriptor sets.

Returns:

list of ProteinDescriptorSet objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid

classmethod getDefaultCalculatorCombo()

Return the default descriptor calculator combo.

static getDefaultPrep()

Return a dictionary with default preparation settings.

classmethod getMSAProvider(out_dir: str)
getModel(name: str, alg: ~qsprpred.extra.models.random.ScipyDistributionAlgorithm | ~qsprpred.extra.models.random.RatioDistributionAlgorithm = <class 'qsprpred.extra.models.random.ScipyDistributionAlgorithm'>, parameters: dict | None = None, random_state: int | None = None)[source]

Initialize dataset and model.

Parameters:
  • name (str) – Name of the model.

  • alg (Type | None) – Algorithm class.

  • parameters (dict | None) – Parameters to use.

  • random_state (int | None) – Random seed to use.

Returns:

Initialized model.

Return type:

RandomModel

getPCMDF() DataFrame

Return a test dataframe with PCM data.

Returns:

dataframe with PCM data

Return type:

pd.DataFrame

getPCMSeqProvider() Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

Return a function that provides sequences for given accessions.

Returns:

function that provides sequences for given accessions

Return type:

Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

getPCMTargetsDF() DataFrame

Return a test dataframe with PCM targets and their sequences.

Returns:

dataframe with PCM targets and their sequences

Return type:

pd.DataFrame

getParamGrid(model: QSPRModel, grid: str) dict

Get the parameter grid for a model.

Parameters:
  • model (QSPRModel) – The model to get the parameter grid for.

  • grid (str) – The grid type to get the parameter grid for.

Returns:

The parameter grid.

Return type:

dict

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists, one for each possible combination of preparation settings

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

property gridFile
id()
longMessage = True
maxDiff = 640
predictorTest(model: QSPRModel, dataset: QSPRDataset, comparison_model: QSPRModel | None = None, expect_equal_result=True, **pred_kwargs)

Test model predictions.

Checks if the shape of the predictions is as expected and if the predictions of the predictMols function are consistent with the predictions of the predict/predictProba functions. Also checks if the predictions of the model are the same as the predictions of the comparison model if given.

Parameters:
  • model (QSPRModel) – The model to make predictions with.

  • dataset (QSPRDataset) – The dataset to make predictions for.

  • comparison_model (QSPRModel) – another model to compare the predictions with.

  • expect_equal_result (bool) – Whether the expected result should be equal or not equal to the predictions of the comparison model.

  • **pred_kwargs – Extra keyword arguments to pass to the predictor’s predictMols method.

run(result=None)
setUp()[source]

Hook method for setting up the test fixture before exercising it.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Set up the test environment.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.
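
A typical use of subTest is to loop over inputs inside a single test method. The sketch below uses a hypothetical class name, SubTestExample, and a trivial property so that it passes as written.

```python
import unittest


class SubTestExample(unittest.TestCase):
    def test_squares_are_non_negative(self):
        for value in (-2, 0, 3):
            # Each iteration is reported separately if it fails,
            # and the loop continues with the next value.
            with self.subTest(value=value):
                self.assertGreaterEqual(value * value, 0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SubTestExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The keyword arguments passed to subTest appear in the failure report, which makes it clear which input triggered a failure. The whole method still counts as one test.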

tearDown()

Remove all files and directories that are used for testing.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

validate_split(dataset)

Check if the split has the data it should have after splitting.

class qsprpred.extra.models.tests.TestPCM(methodName='runTest')[source]

Bases: ModelDataSetsMixInExtras, ModelCheckMixIn, QSPRTestCase

Test class for testing PCM models.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.
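A possible use of addTypeEqualityFunc is sketched below. Point and assertPointEqual are hypothetical names introduced only for this illustration; Point deliberately defines no __eq__, so the comparison relies entirely on the registered function.

```python
import unittest


class Point:
    """Hypothetical value type used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointTest(unittest.TestCase):
    def setUp(self):
        # assertEqual dispatches to this function when both arguments
        # are exactly of type Point.
        self.addTypeEqualityFunc(Point, self.assertPointEqual)

    def assertPointEqual(self, first, second, msg=None):
        if (first.x, first.y) != (second.x, second.y):
            default = (
                f"Point({first.x}, {first.y}) != "
                f"Point({second.x}, {second.y})"
            )
            raise self.failureException(msg or default)

    def test_equal_points(self):
        self.assertEqual(Point(1, 2), Point(1, 2))

    def test_error_message(self):
        with self.assertRaises(AssertionError) as cm:
            self.assertEqual(Point(1, 2), Point(1, 3))
        self.assertIn("Point(1, 2) != Point(1, 3)", str(cm.exception))


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(PointTest).run(result)
```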

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
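The difference between places and delta, and between decimal places and significant digits, can be seen in a short sketch. The test class and values are illustrative only.

```python
import unittest


class ToleranceTest(unittest.TestCase):
    def test_places(self):
        # Default places=7: a difference of about 1e-8 rounds to zero.
        self.assertAlmostEqual(1.00000001, 1.0)
        # places counts decimal places, not significant digits:
        # 1234.5 and 1234.6 differ by 0.1, which rounds to zero at places=0.
        self.assertAlmostEqual(1234.5, 1234.6, places=0)
        # 0.001 and 0.002 differ by 0.001, which is non-zero at places=3.
        self.assertNotAlmostEqual(0.001, 0.002, places=3)

    def test_delta(self):
        # delta is an absolute tolerance on the difference.
        self.assertAlmostEqual(100.0, 100.4, delta=0.5)

    def test_places_and_delta_are_exclusive(self):
        # Passing both arguments for unequal values raises TypeError.
        with self.assertRaises(TypeError):
            self.assertAlmostEqual(1.0, 1.0001, places=3, delta=0.1)


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(ToleranceTest).run(result)
```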

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)), Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.
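The two list examples above can be written out as a small test. This is an illustrative sketch using only the standard library; the class name is arbitrary.

```python
import unittest
from collections import Counter


class CountEqualTest(unittest.TestCase):
    def test_same_elements_any_order(self):
        self.assertCountEqual([0, 1, 1], [1, 0, 1])
        # Equivalent check for hashable elements.
        self.assertEqual(Counter([0, 1, 1]), Counter([1, 0, 1]))

    def test_different_counts(self):
        with self.assertRaises(AssertionError):
            self.assertCountEqual([0, 0, 1], [0, 1])

    def test_unhashable_elements(self):
        # Lists cannot be counted with Counter, but are still compared.
        self.assertCountEqual([[1], [2]], [[2], [1]])


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(CountEqualTest).run(result)
```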

assertDictEqual(d1, d2, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
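Both calling styles of assertRaisesRegex are sketched below. parse_threshold is a hypothetical helper written for this example and does not exist in QSPRpred.

```python
import unittest


def parse_threshold(text):
    """Hypothetical helper: parse a non-negative float from a string."""
    value = float(text)  # raises ValueError for non-numeric input
    if value < 0:
        raise ValueError(f"threshold must be non-negative, got {value}")
    return value


class RaisesRegexTest(unittest.TestCase):
    def test_context_manager_form(self):
        with self.assertRaisesRegex(ValueError, r"non-negative, got -1\.0"):
            parse_threshold("-1")

    def test_callable_form(self):
        # The callable and its arguments follow the regex.
        self.assertRaisesRegex(
            ValueError, "could not convert", parse_threshold, "abc"
        )


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(RaisesRegexTest).run(result)
```

The regex is searched for within the exception message, so it does not need to match the whole text.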

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
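A short sketch of assertWarnsRegex used as a context manager follows. old_api is a hypothetical function invented for the example.

```python
import unittest
import warnings


def old_api():
    """Hypothetical deprecated function used only for this illustration."""
    warnings.warn(
        "old_api() is deprecated; use new_api()",
        DeprecationWarning,
        stacklevel=2,
    )
    return 42


class WarnsRegexTest(unittest.TestCase):
    def test_deprecation_message(self):
        with self.assertWarnsRegex(DeprecationWarning, r"use new_api\(\)") as cm:
            value = old_api()
        # The call still returns normally; only the warning is asserted.
        self.assertEqual(value, 42)
        self.assertIsInstance(cm.warning, DeprecationWarning)


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(WarnsRegexTest).run(result)
```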

checkOptimization(model: QSPRModel, ds: QSPRDataset, optimizer: HyperparameterOptimization)
clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': <TargetTasks.MULTICLASS: 'MULTICLASS'>, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • preparation_settings (dict) – dictionary containing preparation settings

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42, n_jobs=1, chunk_size=None)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createPCMDataSet(name: str = 'QSPRDataset_test_pcm', target_props: list[qsprpred.tasks.TargetProperty] | list[dict] = [{'name': 'pchembl_value_Median', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings: dict | None = None, protein_col: str = 'accession', random_state: int | None = None)

Create a small dataset for testing purposes.

Parameters:
  • name (str, optional) – name of the dataset. Defaults to “QSPRDataset_test_pcm”.

  • target_props (list[TargetProperty] | list[dict], optional) – target properties.

  • preparation_settings (dict | None, optional) – preparation settings. Defaults to None.

  • protein_col (str, optional) – name of the column with protein accessions. Defaults to “accession”.

  • random_state (int, optional) – random seed to use in the dataset. Defaults to None.

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], random_state=None, prep=None, n_jobs=1, chunk_size=None)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

fitTest(model: QSPRModel, ds: QSPRDataset)

Test model fitting, optimization and evaluation.

Parameters:
  • model (QSPRModel) – The model to fit.

  • ds (QSPRDataset) – The dataset to fit the model on.

classmethod getAllDescriptors() list[qsprpred.data.descriptors.sets.DescriptorSet]

Return a list of all available molecule descriptor sets.

Returns:

list of MoleculeDescriptorSet objects

Return type:

list

classmethod getAllProteinDescriptors() list[qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet]

Return a list of all available protein descriptor sets.

Returns:

list of ProteinDescriptorSet objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return a list of many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The list is not exhaustive, but should cover a lot of cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid
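The shape of such a grid can be sketched with itertools.product. This is only an illustration of the tuple layout described above: the option lists hold hypothetical placeholder strings, whereas the real method combines QSPRpred objects whose exact contents are not shown in this reference.

```python
from itertools import product

# Hypothetical placeholder options (assumptions, not QSPRpred objects).
descriptor_calculators = ["calc_A", "calc_B"]
splits = [None, "random_split"]
feature_standardizers = [None, "standard_scaler"]
feature_filters = [None, "low_variance_filter"]
data_filters = [None, "category_filter"]


def data_prep_grid():
    """Yield (descriptor_calculator, split, feature_standardizer,
    feature_filters, data_filters) tuples for every combination."""
    return product(
        descriptor_calculators,
        splits,
        feature_standardizers,
        feature_filters,
        data_filters,
    )


combos = list(data_prep_grid())
```

Because the grid is a Cartesian product, its size grows multiplicatively with each option list, which is one reason a test grid is usually kept non-exhaustive.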

classmethod getDefaultCalculatorCombo()

Return the default descriptor calculator combo.

static getDefaultPrep()

Return a dictionary with default preparation settings.

classmethod getMSAProvider(out_dir: str)
getModel(name: str, alg: Type | None = None, parameters: dict | None = None, random_state: int | None = None)[source]

Initialize dataset and model.

Parameters:
  • name (str) – Name of the model.

  • alg (Type | None) – Algorithm class.

  • parameters (dict | None) – Parameters to use.

  • random_state (int | None) – Random seed to use.

Returns:

Initialized model.

Return type:

SklearnPCMModel

getPCMDF() DataFrame

Return a test dataframe with PCM data.

Returns:

dataframe with PCM data

Return type:

pd.DataFrame

getPCMSeqProvider() Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

Return a function that provides sequences for given accessions.

Returns:

function that provides sequences for given accessions

Return type:

Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]
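A callable with this signature could look like the sketch below. The accessions, the sequences and the content of the second dictionary (here a per-accession length record) are assumptions made for illustration; this reference does not specify what the real provider returns beyond the type signature.

```python
from typing import Callable

# Hypothetical accession -> sequence table with made-up placeholder data.
SEQUENCES = {"ACC_1": "MKTAYIAK", "ACC_2": "MSLLTEVE"}

SeqProvider = Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]


def make_seq_provider(table: dict[str, str]) -> SeqProvider:
    """Build a provider that looks up sequences in an in-memory table."""

    def provider(accessions: list[str]) -> tuple[dict[str, str], dict[str, dict]]:
        # Accessions missing from the table are skipped.
        sequences = {acc: table[acc] for acc in accessions if acc in table}
        # Assumed metadata layout: one small dict per accession.
        info = {acc: {"length": len(seq)} for acc, seq in sequences.items()}
        return sequences, info

    return provider


provider = make_seq_provider(SEQUENCES)
seqs, info = provider(["ACC_1", "UNKNOWN"])
```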

getPCMTargetsDF() DataFrame

Return a test dataframe with PCM targets and their sequences.

Returns:

dataframe with PCM targets and their sequences

Return type:

pd.DataFrame

getParamGrid(model: QSPRModel, grid: str) dict

Get the parameter grid for a model.

Parameters:
  • model (QSPRModel) – The model to get the parameter grid for.

  • grid (str) – The grid type to get the parameter grid for.

Returns:

The parameter grid.

Return type:

dict

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists of all possible preparation combinations

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

property gridFile
id()
longMessage = True
maxDiff = 640
predictorTest(model: QSPRModel, dataset: QSPRDataset, comparison_model: QSPRModel | None = None, expect_equal_result=True, **pred_kwargs)

Test model predictions.

Checks if the shape of the predictions is as expected and if the predictions of the predictMols function are consistent with the predictions of the predict/predictProba functions. Also checks if the predictions of the model are the same as the predictions of the comparison model if given.

Parameters:
  • model (QSPRModel) – The model to make predictions with.

  • dataset (QSPRDataset) – The dataset to make predictions for.

  • comparison_model (QSPRModel) – another model to compare the predictions with.

  • expect_equal_result (bool) – Whether the expected result should be equal or not equal to the predictions of the comparison model.

  • **pred_kwargs – Extra keyword arguments to pass to the predictor’s predictMols method.

run(result=None)
setUp()[source]

Hook method for setting up the test fixture before exercising it.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Set up the test environment.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Remove all files and directories that are used for testing.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testFittingPCM = None
testFittingPCM_0_XGBR(**kw)

Test model training for regression models [with _='XGBR', props=[{'name': 'pchembl_value_Median'…asks.REGRESSION: 'REGRESSION'>}], model_name='XGBR', model_class=<class 'xgboost.sklearn.XGBRegressor'>, random_state=[None]].

Parameters:
  • _ – Name of the test.

  • props (list[TargetProperty | dict]) – List of target properties.

  • model_name (str) – Name of the model.

  • model_class (Type) – Class of the model.

testFittingPCM_1_XGBR(**kw)

Test model training for regression models [with _='XGBR', props=[{'name': 'pchembl_value_Median'…asks.REGRESSION: 'REGRESSION'>}], model_name='XGBR', model_class=<class 'xgboost.sklearn.XGBRegressor'>, random_state=[1, 42]].

Parameters:
  • _ – Name of the test.

  • props (list[TargetProperty | dict]) – List of target properties.

  • model_name (str) – Name of the model.

  • model_class (Type) – Class of the model.

testFittingPCM_2_XGBR(**kw)

Test model training for regression models [with _='XGBR', props=[{'name': 'pchembl_value_Median'…asks.REGRESSION: 'REGRESSION'>}], model_name='XGBR', model_class=<class 'xgboost.sklearn.XGBRegressor'>, random_state=[42, 42]].

Parameters:
  • _ – Name of the test.

  • props (list[TargetProperty | dict]) – List of target properties.

  • model_name (str) – Name of the model.

  • model_class (Type) – Class of the model.

testFittingPCM_3_PLSR(**kw)

Test model training for regression models [with _='PLSR', props=[{'name': 'pchembl_value_Median'…asks.REGRESSION: 'REGRESSION'>}], model_name='PLSR', model_class=<class 'sklearn.cross_decomposition._pls.PLSRegression'>, random_state=[None]].

Parameters:
  • _ – Name of the test.

  • props (list[TargetProperty | dict]) – List of target properties.

  • model_name (str) – Name of the model.

  • model_class (Type) – Class of the model.

testFittingPCM_4_XGBC(**kw)

Test model training for classification models [with _='XGBC', props=[{'name': 'pchembl_value_Median'…S: 'SINGLECLASS'>, 'th': [6.5]}], model_name='XGBC', model_class=<class 'xgboost.sklearn.XGBClassifier'>, random_state=[None]].

Parameters:
  • _ – Name of the test.

  • props (list[TargetProperty | dict]) – List of target properties.

  • model_name (str) – Name of the model.

  • model_class (Type) – Class of the model.

testFittingPCM_5_XGBC(**kw)

Test model training for classification models [with _='XGBC', props=[{'name': 'pchembl_value_Median'…S: 'SINGLECLASS'>, 'th': [6.5]}], model_name='XGBC', model_class=<class 'xgboost.sklearn.XGBClassifier'>, random_state=[21, 42]].

Parameters:
  • _ – Name of the test.

  • props (list[TargetProperty | dict]) – List of target properties.

  • model_name (str) – Name of the model.

  • model_class (Type) – Class of the model.

testFittingPCM_6_XGBC(**kw)

Test model training for classification models [with _='XGBC', props=[{'name': 'pchembl_value_Median'…S: 'SINGLECLASS'>, 'th': [6.5]}], model_name='XGBC', model_class=<class 'xgboost.sklearn.XGBClassifier'>, random_state=[42, 42]].

Parameters:
  • _ – Name of the test.

  • props (list[TargetProperty | dict]) – List of target properties.

  • model_name (str) – Name of the model.

  • model_class (Type) – Class of the model.

validate_split(dataset)

Check if the split has the data it should have after splitting.

class qsprpred.extra.models.tests.TestRandomModelClassification(methodName='runTest')[source]

Bases: RandomBaseModelTestCase

Test the RandomModel class for classification models.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)), Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).
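The duck typing mentioned here can be demonstrated with a brief sketch: any pair of objects that both provide a difference method can be compared, while an object without one is rejected. The test class is illustrative only.

```python
import unittest


class SetEqualTest(unittest.TestCase):
    def test_set_and_frozenset(self):
        # Both types provide difference(), so they can be compared.
        self.assertSetEqual({1, 2, 3}, frozenset([3, 2, 1]))

    def test_argument_without_difference_method(self):
        # A dict keys view has no difference() method, so the
        # assertion fails even though the elements match.
        with self.assertRaisesRegex(
            AssertionError, "does not support set difference"
        ):
            self.assertSetEqual({"a"}, {"a": 1}.keys())


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(SetEqualTest).run(result)
```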

assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
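As an illustration of the context manager form, here is a minimal sketch. The function load_legacy_config and its warning text are hypothetical and exist only for this example; the test case is run programmatically so that the outcome can be inspected.

```python
import unittest
import warnings


def load_legacy_config():
    """Hypothetical helper that emits a deprecation warning."""
    warnings.warn(
        "load_legacy_config is deprecated, use load_config",
        DeprecationWarning,
    )
    return {}


class WarningRegexExample(unittest.TestCase):
    def test_message_matches(self):
        # Only warnings of the given class whose message matches the regex count.
        with self.assertWarnsRegex(DeprecationWarning, r"deprecated, use \w+") as cm:
            load_legacy_config()
        # The context object keeps a reference to the first matching warning.
        self.assertIn("load_config", str(cm.warning))


# Run the case without unittest.main() so the result object stays available.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(WarningRegexExample)
result = unittest.TestResult()
suite.run(result)
```

The regular expression is searched for in the warning message, so it does not need to match the whole text.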

checkOptimization(model: QSPRModel, ds: QSPRDataset, optimizer: HyperparameterOptimization)
clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': <TargetTasks.MULTICLASS: 'MULTICLASS'>, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • preparation_settings (dict) – dictionary containing preparation settings

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42, n_jobs=1, chunk_size=None)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createPCMDataSet(name: str = 'QSPRDataset_test_pcm', target_props: list[qsprpred.tasks.TargetProperty] | list[dict] = [{'name': 'pchembl_value_Median', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings: dict | None = None, protein_col: str = 'accession', random_state: int | None = None)

Create a small dataset for testing purposes.

Parameters:
  • name (str, optional) – name of the dataset. Defaults to “QSPRDataset_test_pcm”.

  • target_props (list[TargetProperty] | list[dict], optional) – target properties.

  • preparation_settings (dict | None, optional) – preparation settings. Defaults to None.

  • protein_col (str, optional) – name of the column with protein accessions. Defaults to “accession”.

  • random_state (int, optional) – random seed to use in the dataset. Defaults to None

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], random_state=None, prep=None, n_jobs=1, chunk_size=None)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

debug()

Run the test without collecting errors in a TestResult

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

fitTest(model: QSPRModel, ds: QSPRDataset)

Test model fitting, optimization and evaluation.

Parameters:
  • model (QSPRModel) – the model to test

  • ds (QSPRDataset) – the data set to use for the test

classmethod getAllDescriptors() list[qsprpred.data.descriptors.sets.DescriptorSet]

Return a list of all available molecule descriptor sets.

Returns:

list of MoleculeDescriptorSet objects

Return type:

list

classmethod getAllProteinDescriptors() list[qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet]

Return a list of all available protein descriptor sets.

Returns:

list of ProteinDescriptorSet objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid

classmethod getDefaultCalculatorCombo()

Return the default descriptor calculator combo.

static getDefaultPrep()

Return a dictionary with default preparation settings.

classmethod getMSAProvider(out_dir: str)
getModel(name: str, alg: ~qsprpred.extra.models.random.ScipyDistributionAlgorithm | ~qsprpred.extra.models.random.RatioDistributionAlgorithm = <class 'qsprpred.extra.models.random.ScipyDistributionAlgorithm'>, parameters: dict | None = None, random_state: int | None = None)

Initialize dataset and model.

Parameters:
  • name (str) – Name of the model.

  • alg (Type | None) – Algorithm class.

  • parameters (dict | None) – Parameters to use.

  • random_state (int | None) – Random seed to use.

Returns:

Initialized model.

Return type:

RandomModel

getPCMDF() DataFrame

Return a test dataframe with PCM data.

Returns:

dataframe with PCM data

Return type:

pd.DataFrame

getPCMSeqProvider() Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

Return a function that provides sequences for given accessions.

Returns:

function that provides sequences for given accessions

Return type:

Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

getPCMTargetsDF() DataFrame

Return a test dataframe with PCM targets and their sequences.

Returns:

dataframe with PCM targets and their sequences

Return type:

pd.DataFrame

getParamGrid(model: QSPRModel, grid: str) dict

Get the parameter grid for a model.

Parameters:
  • model (QSPRModel) – The model to get the parameter grid for.

  • grid (str) – The grid type to get the parameter grid for.

Returns:

The parameter grid.

Return type:

dict

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists, one for each named preparation combination

Return type:

list
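The relationship between a preparation grid and a list of named combinations can be sketched as follows. This is not the implementation used by the test suite: the option lists contain placeholder strings instead of real descriptor, split, standardizer and filter objects, and the function names data_prep_grid and prep_combos are assumptions made for this example.

```python
from itertools import product

# Placeholder options; a real grid would hold configured objects instead.
descriptor_calculators = ["descriptors_A"]
splits = [None, "split_A"]
feature_standardizers = [None, "standardizer_A"]
feature_filters = [None, "filter_A"]
data_filters = [None]


def data_prep_grid():
    """Yield (descriptor_calculator, split, feature_standardizer,
    feature_filters, data_filters) tuples for every combination."""
    yield from product(
        descriptor_calculators,
        splits,
        feature_standardizers,
        feature_filters,
        data_filters,
    )


def prep_combos():
    """Prefix each combination with a readable name built from its parts."""
    combos = []
    for combo in data_prep_grid():
        name = "_".join(str(part) for part in combo)
        combos.append([name, *combo])
    return combos


combos = prep_combos()
```

Each inner list starts with a name, which is convenient when a test parameterization tool uses the first element to label the generated test.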

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

property gridFile
id()
longMessage = True
maxDiff = 640
predictorTest(model: QSPRModel, dataset: QSPRDataset, comparison_model: QSPRModel | None = None, expect_equal_result=True, **pred_kwargs)

Test model predictions.

Checks if the shape of the predictions is as expected and if the predictions of the predictMols function are consistent with the predictions of the predict/predictProba functions. Also checks if the predictions of the model are the same as the predictions of the comparison model if given.

Parameters:
  • model (QSPRModel) – The model to make predictions with.

  • dataset (QSPRDataset) – The dataset to make predictions for.

  • comparison_model (QSPRModel) – another model to compare the predictions with.

  • expect_equal_result (bool) – Whether the expected result should be equal or not equal to the predictions of the comparison model.

  • **pred_kwargs – Extra keyword arguments to pass to the predictor’s predictMols method.

run(result=None)
setUp()

Hook method for setting up the test fixture before exercising it.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Set up the test environment.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.
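A short sketch of the default behaviour, using a hypothetical test case defined only for this example:

```python
import unittest


class DescribedExample(unittest.TestCase):
    def test_documented(self):
        """Check the documented case.

        Further detail that is not part of the short description.
        """

    def test_undocumented(self):
        pass


# Instantiate the case for a specific method and ask for its description.
documented = DescribedExample("test_documented").shortDescription()
undocumented = DescribedExample("test_undocumented").shortDescription()
```

Here documented holds the first docstring line and undocumented is None.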

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Remove all files and directories that are used for testing.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testClassificationBasicFit = None
testClassificationBasicFit_0_RandomModel_SINGLECLASS(**kw)

Test model training for classification models [with _=’RandomModel_SINGLECLASS’, task=<TargetTasks.SINGLECLASS: ‘SINGLECLASS’>, th=[6.5], model_name=’RandomModel’, model_class=<class ‘qsprpred.extra.models.ra…dom.RatioDistributionAlgorithm’>, random_state=[None]].

testClassificationBasicFit_1_RandomModel_SINGLECLASS(**kw)

Test model training for classification models [with _=’RandomModel_SINGLECLASS’, task=<TargetTasks.SINGLECLASS: ‘SINGLECLASS’>, th=[6.5], model_name=’RandomModel’, model_class=<class ‘qsprpred.extra.models.ra…dom.RatioDistributionAlgorithm’>, random_state=[42, 42]].

testClassificationBasicFit_2_RandomModel_MULTICLASS(**kw)

Test model training for classification models [with _=’RandomModel_MULTICLASS’, task=<TargetTasks.MULTICLASS: ‘MULTICLASS’>, th=[0, 2, 10, 1100], model_name=’RandomModel’, model_class=<class ‘qsprpred.extra.models.ra…dom.RatioDistributionAlgorithm’>, random_state=[None]].

testClassificationBasicFit_3_RandomModel_MULTICLASS(**kw)

Test model training for classification models [with _=’RandomModel_MULTICLASS’, task=<TargetTasks.MULTICLASS: ‘MULTICLASS’>, th=[0, 2, 10, 1100], model_name=’RandomModel’, model_class=<class ‘qsprpred.extra.models.ra…dom.RatioDistributionAlgorithm’>, random_state=[42, 42]].

validate_split(dataset)

Check if the split has the data it should have after splitting.

class qsprpred.extra.models.tests.TestRandomModelClassificationMultiTask(methodName='runTest')[source]

Bases: RandomBaseModelTestCase

Test the RandomModel class for multi-task classification models.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
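The two tolerance modes can be compared in a short sketch. The test case is hypothetical, and the values were chosen only to show each mode.

```python
import unittest


class ToleranceExample(unittest.TestCase):
    def test_places(self):
        # The difference (about 1e-8) rounds to zero at the default 7 places.
        self.assertAlmostEqual(1.00000001, 1.0)

    def test_delta(self):
        # The absolute difference (about 0.4) does not exceed delta.
        self.assertAlmostEqual(10.4, 10.0, delta=0.5)

    def test_both_rejected(self):
        # places and delta are mutually exclusive.
        with self.assertRaises(TypeError):
            self.assertAlmostEqual(1.0, 1.0001, places=2, delta=0.1)


# Run the case without unittest.main() so the result object stays available.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToleranceExample)
result = unittest.TestResult()
suite.run(result)
```

Use places when the expected precision is a number of decimal digits, and delta when an absolute tolerance is more natural.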

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)),
                 Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)

assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
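Both calling styles are shown in the sketch below. The function parse_threshold is hypothetical and exists only to raise a predictable error.

```python
import unittest


def parse_threshold(text):
    """Hypothetical parser used only for this illustration."""
    value = float(text)
    if value < 0:
        raise ValueError(f"threshold must be non-negative, got {value}")
    return value


class RaisesRegexExample(unittest.TestCase):
    def test_context_manager(self):
        # The regex is searched for in the string form of the exception.
        with self.assertRaisesRegex(ValueError, r"non-negative, got -1\.5"):
            parse_threshold("-1.5")

    def test_callable_form(self):
        # The callable and its arguments follow the regex.
        self.assertRaisesRegex(
            ValueError, "could not convert", parse_threshold, "abc"
        )


# Run the case without unittest.main() so the result object stays available.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RaisesRegexExample)
result = unittest.TestResult()
suite.run(result)
```

The context manager form is usually easier to read when the call under test takes several arguments.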

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).
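The duck typing requirement can be illustrated with a hypothetical test case: a frozenset is accepted because it provides a difference method, while a dict keys view is rejected as the first argument because it does not.

```python
import unittest


class SetEqualExample(unittest.TestCase):
    def test_mixed_set_types(self):
        # set and frozenset both implement difference().
        self.assertSetEqual({1, 2, 3}, frozenset([3, 2, 1]))

    def test_keys_view_rejected(self):
        # A keys view is set-like but has no difference() method,
        # so the assertion fails instead of comparing the elements.
        with self.assertRaises(AssertionError):
            self.assertSetEqual({"a": 1}.keys(), {"a"})


# Run the case without unittest.main() so the result object stays available.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SetEqualExample)
result = unittest.TestResult()
suite.run(result)
```

Converting such an argument with set() first avoids the failure.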

assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.

checkOptimization(model: QSPRModel, ds: QSPRDataset, optimizer: HyperparameterOptimization)
clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': <TargetTasks.MULTICLASS: 'MULTICLASS'>, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • preparation_settings (dict) – dictionary containing preparation settings

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42, n_jobs=1, chunk_size=None)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createPCMDataSet(name: str = 'QSPRDataset_test_pcm', target_props: list[qsprpred.tasks.TargetProperty] | list[dict] = [{'name': 'pchembl_value_Median', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings: dict | None = None, protein_col: str = 'accession', random_state: int | None = None)

Create a small dataset for testing purposes.

Parameters:
  • name (str, optional) – name of the dataset. Defaults to “QSPRDataset_test_pcm”.

  • target_props (list[TargetProperty] | list[dict], optional) – target properties.

  • preparation_settings (dict | None, optional) – preparation settings. Defaults to None.

  • protein_col (str, optional) – name of the column with protein accessions. Defaults to “accession”.

  • random_state (int, optional) – random seed to use in the dataset. Defaults to None

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], random_state=None, prep=None, n_jobs=1, chunk_size=None)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

debug()

Run the test without collecting errors in a TestResult

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

fitTest(model: QSPRModel, ds: QSPRDataset)

Test model fitting, optimization and evaluation.

Parameters:
  • model (QSPRModel) – the model to test

  • ds (QSPRDataset) – the data set to use for the test

classmethod getAllDescriptors() list[qsprpred.data.descriptors.sets.DescriptorSet]

Return a list of all available molecule descriptor sets.

Returns:

list of MoleculeDescriptorSet objects

Return type:

list

classmethod getAllProteinDescriptors() list[qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet]

Return a list of all available protein descriptor sets.

Returns:

list of ProteinDescriptorSet objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid

classmethod getDefaultCalculatorCombo()

Return the default descriptor calculator combo.

static getDefaultPrep()

Return a dictionary with default preparation settings.

classmethod getMSAProvider(out_dir: str)
getModel(name: str, alg: ~qsprpred.extra.models.random.ScipyDistributionAlgorithm | ~qsprpred.extra.models.random.RatioDistributionAlgorithm = <class 'qsprpred.extra.models.random.ScipyDistributionAlgorithm'>, parameters: dict | None = None, random_state: int | None = None)

Initialize dataset and model.

Parameters:
  • name (str) – Name of the model.

  • alg (Type | None) – Algorithm class.

  • parameters (dict | None) – Parameters to use.

  • random_state (int | None) – Random seed to use.

Returns:

Initialized model.

Return type:

RandomModel

getPCMDF() DataFrame

Return a test dataframe with PCM data.

Returns:

dataframe with PCM data

Return type:

pd.DataFrame

getPCMSeqProvider() Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

Return a function that provides sequences for given accessions.

Returns:

function that provides sequences for given accessions

Return type:

Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

getPCMTargetsDF() DataFrame

Return a test dataframe with PCM targets and their sequences.

Returns:

dataframe with PCM targets and their sequences

Return type:

pd.DataFrame

getParamGrid(model: QSPRModel, grid: str) dict

Get the parameter grid for a model.

Parameters:
  • model (QSPRModel) – The model to get the parameter grid for.

  • grid (str) – The grid type to get the parameter grid for.

Returns:

The parameter grid.

Return type:

dict

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists, one for each named preparation combination

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

property gridFile
id()
longMessage = True
maxDiff = 640
predictorTest(model: QSPRModel, dataset: QSPRDataset, comparison_model: QSPRModel | None = None, expect_equal_result=True, **pred_kwargs)

Test model predictions.

Checks if the shape of the predictions is as expected and if the predictions of the predictMols function are consistent with the predictions of the predict/predictProba functions. Also checks if the predictions of the model are the same as the predictions of the comparison model if given.

Parameters:
  • model (QSPRModel) – The model to make predictions with.

  • dataset (QSPRDataset) – The dataset to make predictions for.

  • comparison_model (QSPRModel) – another model to compare the predictions with.

  • expect_equal_result (bool) – Whether the expected result should be equal or not equal to the predictions of the comparison model.

  • **pred_kwargs – Extra keyword arguments to pass to the predictor’s predictMols method.

run(result=None)
setUp()

Hook method for setting up the test fixture before exercising it.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Set up the test environment.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Remove all files and directories that are used for testing.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testClassificationMultiTaskFit = None
testClassificationMultiTaskFit_0_RandomModel(**kw)

Test model training for multitask classification models [with _=’RandomModel’, model_name=’RandomModel’, model_class=<class ‘qsprpred.extra.models.ra…dom.RatioDistributionAlgorithm’>, random_state=[None]].

testClassificationMultiTaskFit_1_RandomModel(**kw)

Test model training for multitask classification models [with _=’RandomModel’, model_name=’RandomModel’, model_class=<class ‘qsprpred.extra.models.ra…dom.RatioDistributionAlgorithm’>, random_state=[42, 42]].

validate_split(dataset)

Check if the split has the data it should have after splitting.

class qsprpred.extra.models.tests.TestRandomModelRegression(methodName='runTest')[source]

Bases: RandomBaseModelTestCase

Test the RandomModel class for regression models.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or, if delta is given, fail if the difference between the two objects is more than delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
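The difference between the places and delta modes can be sketched as follows, using a bare `unittest.TestCase` instance outside of a test run purely for illustration.

```python
import unittest

tc = unittest.TestCase()

# places: the difference is rounded to that many decimal places and compared to zero.
tc.assertAlmostEqual(1.001, 1.0, places=2)  # round(~0.001, 2) == 0.0, passes

try:
    tc.assertAlmostEqual(1.1, 1.0, places=2)  # round(~0.1, 2) == 0.1, fails
    places_failed = False
except AssertionError:
    places_failed = True

# delta: the absolute difference must not exceed delta.
tc.assertAlmostEqual(100.0, 100.4, delta=0.5)  # passes

# places and delta are mutually exclusive.
try:
    tc.assertAlmostEqual(1.0, 1.0001, places=2, delta=0.1)
    both_rejected = False
except TypeError:
    both_rejected = True
```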

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)), Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or, if delta is given, fail if the difference between the two objects is not more than delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)

assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
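Both calling styles listed above can be sketched as follows. `parse_threshold` is a hypothetical helper written for this example and is not a qsprpred function.

```python
import unittest


def parse_threshold(value):
    """Hypothetical helper that rejects negative thresholds."""
    number = float(value)
    if number < 0:
        raise ValueError(f"threshold must be non-negative, got {number}")
    return number


tc = unittest.TestCase()

# Context-manager form: the regex is searched for in str(exception).
with tc.assertRaisesRegex(ValueError, r"non-negative, got -1\.0") as cm:
    parse_threshold("-1")
caught_message = str(cm.exception)

# Callable form: the function and its arguments follow the regex.
tc.assertRaisesRegex(ValueError, "non-negative", parse_threshold, "-2")
```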

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.
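The effect of seq_type can be sketched as follows, again on a bare `unittest.TestCase` instance for illustration only.

```python
import unittest

tc = unittest.TestCase()

# Without seq_type, a list and a tuple with the same items compare equal.
tc.assertSequenceEqual([1, 2, 3], (1, 2, 3))

# With seq_type=list, both arguments must be lists.
try:
    tc.assertSequenceEqual([1, 2, 3], (1, 2, 3), seq_type=list)
    tuple_rejected = False
except AssertionError:
    tuple_rejected = True
```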

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in warning message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
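A short sketch of the context-manager form follows. `old_api` is a hypothetical deprecated function defined only for this example.

```python
import unittest
import warnings


def old_api():
    """Hypothetical deprecated function."""
    warnings.warn("old_api() is deprecated; use new_api()", DeprecationWarning)
    return 42


tc = unittest.TestCase()

# Only warnings of the given class whose message matches the regex count.
with tc.assertWarnsRegex(DeprecationWarning, r"use new_api\(\)") as cm:
    value = old_api()

message = str(cm.warning)  # the first matching warning is kept on the context
```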

checkOptimization(model: QSPRModel, ds: QSPRDataset, optimizer: HyperparameterOptimization)
clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': <TargetTasks.MULTICLASS: 'MULTICLASS'>, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a large multitask dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • preparation_settings (dict) – dictionary containing preparation settings

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42, n_jobs=1, chunk_size=None)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createPCMDataSet(name: str = 'QSPRDataset_test_pcm', target_props: list[qsprpred.tasks.TargetProperty] | list[dict] = [{'name': 'pchembl_value_Median', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings: dict | None = None, protein_col: str = 'accession', random_state: int | None = None)

Create a small PCM dataset for testing purposes.

Parameters:
  • name (str, optional) – name of the dataset. Defaults to “QSPRDataset_test_pcm”.

  • target_props (list[TargetProperty] | list[dict], optional) – target properties.

  • preparation_settings (dict | None, optional) – preparation settings. Defaults to None.

  • protein_col (str, optional) – name of the column with protein accessions. Defaults to “accession”.

  • random_state (int, optional) – random seed to use in the dataset. Defaults to None

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], random_state=None, prep=None, n_jobs=1, chunk_size=None)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

fitTest(model: QSPRModel, ds: QSPRDataset)

Test model fitting, optimization and evaluation.

Parameters:
  • model (QSPRModel) – The model to test.

  • ds (QSPRDataset) – The dataset to use.

classmethod getAllDescriptors() list[qsprpred.data.descriptors.sets.DescriptorSet]

Return a list of all available molecule descriptor sets.

Returns:

list of MoleculeDescriptorSet objects

Return type:

list

classmethod getAllProteinDescriptors() list[qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet]

Return a list of all available protein descriptor sets.

Returns:

list of ProteinDescriptorSet objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return a generator over many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid
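One way such a grid of 5-tuples could be generated is a Cartesian product over the individual option lists. This is only a sketch under that assumption: `prep_grid` and the placeholder strings are hypothetical and do not reproduce the actual qsprpred implementation or its option values.

```python
from itertools import product


def prep_grid(calculators, splits, standardizers, feature_filters, data_filters):
    """Yield (descriptor_calculator, split, feature_standardizer,
    feature_filters, data_filters) tuples for every combination."""
    yield from product(
        calculators, splits, standardizers, feature_filters, data_filters
    )


# Placeholder option values; real code would pass calculator, split,
# standardizer and filter objects instead of strings.
combos = list(
    prep_grid(
        ["calc_a"],
        ["split_a", "split_b"],
        [None, "scaler"],
        [[], ["low_variance"]],
        [[]],
    )
)
```

The number of combinations is the product of the option counts, so each added option multiplies the number of parameterized tests.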

classmethod getDefaultCalculatorCombo()

Return the default descriptor calculator combo.

static getDefaultPrep()

Return a dictionary with default preparation settings.

classmethod getMSAProvider(out_dir: str)
getModel(name: str, alg: ~qsprpred.extra.models.random.ScipyDistributionAlgorithm | ~qsprpred.extra.models.random.RatioDistributionAlgorithm = <class 'qsprpred.extra.models.random.ScipyDistributionAlgorithm'>, parameters: dict | None = None, random_state: int | None = None)

Initialize dataset and model.

Parameters:
  • name (str) – Name of the model.

  • alg (Type | None) – Algorithm class.

  • parameters (dict | None) – Parameters to use.

  • random_state (int | None) – Random seed to use.

Returns:

Initialized model.

Return type:

RandomModel

getPCMDF() DataFrame

Return a test dataframe with PCM data.

Returns:

dataframe with PCM data

Return type:

pd.DataFrame

getPCMSeqProvider() Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

Return a function that provides sequences for given accessions.

Returns:

function that provides sequences for given accessions

Return type:

Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]
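A callable with the signature above might look like the following sketch. The meaning of the two returned dictionaries is an assumption here (sequences keyed by accession, and extra per-accession information), and the lookup table, accessions and `sequence_provider` name are invented for illustration.

```python
# Hypothetical in-memory lookup table; the accessions and sequences are made up.
SEQUENCES = {"P00001": "MKTAYIAK", "P00002": "MGSSHHHH"}


def sequence_provider(
    accessions: list[str],
) -> tuple[dict[str, str], dict[str, dict]]:
    """Return sequences and (assumed) extra information for known accessions."""
    sequences = {acc: SEQUENCES[acc] for acc in accessions if acc in SEQUENCES}
    info = {acc: {"length": len(seq)} for acc, seq in sequences.items()}
    return sequences, info


sequences, info = sequence_provider(["P00001", "UNKNOWN"])
```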

getPCMTargetsDF() DataFrame

Return a test dataframe with PCM targets and their sequences.

Returns:

dataframe with PCM targets and their sequences

Return type:

pd.DataFrame

getParamGrid(model: QSPRModel, grid: str) dict

Get the parameter grid for a model.

Parameters:
  • model (QSPRModel) – The model to get the parameter grid for.

  • grid (str) – The grid type to get the parameter grid for.

Returns:

The parameter grid.

Return type:

dict

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of `list`s of all possible combinations of preparation

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

property gridFile
id()
longMessage = True
maxDiff = 640
predictorTest(model: QSPRModel, dataset: QSPRDataset, comparison_model: QSPRModel | None = None, expect_equal_result=True, **pred_kwargs)

Test model predictions.

Checks if the shape of the predictions is as expected and if the predictions of the predictMols function are consistent with the predictions of the predict/predictProba functions. Also checks if the predictions of the model are the same as the predictions of the comparison model if given.

Parameters:
  • model (QSPRModel) – The model to make predictions with.

  • dataset (QSPRDataset) – The dataset to make predictions for.

  • comparison_model (QSPRModel) – another model to compare the predictions with.

  • expect_equal_result (bool) – Whether the expected result should be equal or not equal to the predictions of the comparison model.

  • **pred_kwargs – Extra keyword arguments to pass to the predictor’s predictMols method.

run(result=None)
setUp()

Hook method for setting up the test fixture before exercising it.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Set up the test environment.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.
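The default behaviour can be sketched with a hypothetical test case that has one documented and one undocumented test method.

```python
import unittest


class DescriptionCase(unittest.TestCase):
    def test_documented(self):
        """Only this first line is used.

        The remaining lines of the docstring are ignored.
        """

    def test_undocumented(self):
        pass


documented = DescriptionCase("test_documented").shortDescription()
undocumented = DescriptionCase("test_undocumented").shortDescription()
```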

skipTest(reason)

Skip this test.
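A minimal sketch of skipping from inside a test body follows. The `feature_available` flag is a hypothetical condition used only for illustration.

```python
import unittest


class OptionalFeatureCase(unittest.TestCase):
    def test_needs_feature(self):
        feature_available = False  # hypothetical condition
        if not feature_available:
            self.skipTest("optional feature not installed")
        self.fail("not reached when the test is skipped")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(OptionalFeatureCase)
result = unittest.TestResult()
suite.run(result)
skip_reasons = [reason for _test, reason in result.skipped]
```

A skipped test is reported separately and does not count as a failure.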

subTest(msg=<object object>, **params)

Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Remove all files and directories that are used for testing.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testRegressionBasicFit = None
testRegressionBasicFit_0_RandomModel(**kw)
testRegressionBasicFit_1_RandomModel(**kw)
testRegressionBasicFit_2_RandomModel(**kw)
testRegressionMultiTaskFit = None
testRegressionMultiTaskFit_0_RandomModel(**kw)

Test model training for multitask regression models [with model_name=’RandomModel’, random_state=[None]].

testRegressionMultiTaskFit_1_RandomModel(**kw)

Test model training for multitask regression models [with model_name=’RandomModel’, random_state=[1, 42]].

testRegressionMultiTaskFit_2_RandomModel(**kw)

Test model training for multitask regression models [with model_name=’RandomModel’, random_state=[42, 42]].

validate_split(dataset)

Check if the split has the data it should have after splitting.

Module contents