qsprpred.extra.gpu.models package
Submodules
qsprpred.extra.gpu.models.base module
- class qsprpred.extra.gpu.models.base.QSPRModelGPU(base_dir: str, alg: Type | None = None, name: str | None = None, parameters: dict | None = None, autoload=True, random_state: int | None = None)
Initialize a QSPR model instance.
If the model is loaded from file, the data set is not required. Note that the data set is required for fitting and optimization.
- Parameters:
base_dir (str) – base directory of the model; the model files are stored in a subdirectory {baseDir}/{outDir}/
alg (Type) – estimator class
name (str) – name of the model
parameters (dict) – dictionary of algorithm-specific parameters
autoload (bool) – if True, the estimator is loaded from the serialized file if it exists; otherwise a new instance of alg is created
random_state (int) – random state to use for shuffling and other random operations
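The autoload behaviour described above can be pictured with a small stand-in. This is a minimal sketch and not QSPRpred's own code: the helper `load_or_create`, the class `DummyAlg`, the file name `estimator.json` and the JSON serialization format are all hypothetical and chosen only to show the "load if a serialized file exists, otherwise create a new instance of alg" decision.

```python
import json
import os
import tempfile


class DummyAlg:
    """Hypothetical estimator class standing in for `alg`."""

    def __init__(self, **params):
        self.params = params


def load_or_create(path, alg, parameters=None, autoload=True):
    """Return (estimator, loaded_from_file).

    Assumption of this sketch: the estimator is serialized as a JSON
    dictionary of its parameters.
    """
    if autoload and os.path.exists(path):
        with open(path) as handle:
            return alg(**json.load(handle)), True
    # No serialized file (or autoload disabled): build a fresh instance.
    return alg(**(parameters or {})), False


with tempfile.TemporaryDirectory() as base_dir:
    path = os.path.join(base_dir, "estimator.json")
    # First call: nothing on disk yet, so a new instance is created.
    fresh, was_loaded_first = load_or_create(path, DummyAlg, {"depth": 3})
    with open(path, "w") as handle:
        json.dump(fresh.params, handle)
    # Second call: the serialized file exists and is used instead.
    restored, was_loaded_second = load_or_create(path, DummyAlg)
```

In this sketch the second call ignores the `parameters` argument because the file takes precedence; whether the real class merges parameters on load is not stated in this section.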
- property applicabilityDomain: Any
Return the applicability domain of the model.
- Returns:
applicability domain of the model
- Return type:
Any
- checkData(ds: QSPRDataSet, exception: bool = True) bool
Check if the model has a data set.
- Parameters:
ds (QSPRDataSet) – data set to check
exception (bool) – if true, an exception is raised if no data is set
- Returns:
True if data is set, False otherwise (if exception is False)
- Return type:
bool
- property classPath: str
Return the fully qualified path of the model.
- Returns:
class path of the model
- Return type:
str
- cleanFiles()
Clean up the model files.
Removes the model directory and all its contents.
- convertToNumpy(X: DataFrame | ndarray, y: DataFrame | ndarray | None = None) tuple[ndarray, ndarray] | ndarray
Convert the given data matrix and target matrix to np.ndarray format.
- Parameters:
X (pd.DataFrame, np.ndarray) – data matrix; if a QSPRDataSet instance is given, the features and targets are extracted from the data set and returned
y (pd.DataFrame, np.ndarray) – target matrix
- Returns:
data matrix and/or target matrix in np.ndarray format
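The conversion contract (one array when only X is given, a pair of arrays when y is given as well) can be illustrated as follows. This is a rough sketch under stated assumptions, not the library's implementation: the function name `to_numpy` is hypothetical, pandas objects are detected by duck typing on their `to_numpy` method, and the QSPRDataSet case mentioned above is not covered.

```python
import numpy as np


def to_numpy(X, y=None):
    """Convert X (and optionally y) to np.ndarray.

    Objects exposing `to_numpy()` (for example pandas DataFrames) are
    converted through that method; anything else goes through np.asarray.
    """
    def convert(data):
        return data.to_numpy() if hasattr(data, "to_numpy") else np.asarray(data)

    if y is None:
        return convert(X)
    return convert(X), convert(y)


# Plain nested lists stand in for the data and target matrices here.
features, targets = to_numpy([[1.0, 2.0], [3.0, 4.0]], [[0], [1]])
features_only = to_numpy([[1.0, 2.0], [3.0, 4.0]])
```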
- createPredictionDatasetFromMols(mols: Iterable[str | Mol], n_jobs: int = 1) tuple[QSPRTable, ndarray]
Create a QSPRTable instance from a list of SMILES strings.
- abstract fit(X: DataFrame | ndarray, y: DataFrame | ndarray, estimator: Any = None, mode: EarlyStoppingMode = EarlyStoppingMode.NOT_RECORDING, monitor: FitMonitor = None, **kwargs) Any | tuple[Any, int] | None
Fit the model to the given data matrix or QSPRDataSet.
- Note. convertToNumpy can be called here to convert the input data to np.ndarray format.
- Note. If no estimator is given, the estimator instance of the model is used.
- Note. If a model supports early stopping, the fit function should have the early_stopping decorator and the mode argument should be used to set the early stopping mode. If the model does not support early stopping, the mode argument is ignored.
- Parameters:
X (pd.DataFrame, np.ndarray) – data matrix to fit
y (pd.DataFrame, np.ndarray) – target matrix to fit
estimator (Any) – estimator instance to use for fitting
mode (EarlyStoppingMode) – early stopping mode
monitor (FitMonitor) – monitor for the fitting process, if None, the base monitor is used
kwargs – additional arguments to pass to the fit method of the estimator
- Returns:
fitted estimator instance
int: in case of early stopping, the number of iterations after which the model stopped training
- Return type:
Any
- fitDataset(ds: QSPRDataSet, pipeline: DatasetPipeline | None = None, monitor=None, mode=EarlyStoppingMode.OPTIMAL, save_model=True, save_data=False, **kwargs) str
Train model on the whole attached data set.
** IMPORTANT ** For models that support early stopping (supportsEarlyStopping), Assessor should be run first, so that the average number of epochs from the cross-validation with early stopping can be used for fitting the model.
- Parameters:
ds (QSPRDataSet) – data set to fit this model on
pipeline (DatasetPipeline) – pipeline to use for fitting
monitor (FitMonitor) – monitor for the fitting process, if None, the base monitor is used
mode (EarlyStoppingMode) – early stopping mode for models that support early stopping; by default the model is fitted for the ‘optimal’ number of epochs at which training previously stopped during model assessment on the train or test set, which avoids setting aside extra data for a validation set.
save_model (bool) – save the model to file
save_data (bool) – save the supplied dataset to file
kwargs – additional arguments to pass to fit
- Returns:
path to the saved model, if save_model is True
- Return type:
str
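The note about running an assessment first comes down to simple arithmetic: the epochs at which each cross-validation fold stopped are averaged, and that average is reused when fitting on the whole data set. The sketch below only illustrates this idea; the helper name `epochs_for_final_fit`, the fold values and the rounding rule are assumptions, not taken from the library.

```python
from statistics import mean


def epochs_for_final_fit(fold_epochs):
    """Average the per-fold stopping epochs and return a whole number of epochs."""
    if not fold_epochs:
        raise ValueError("run an assessment with early stopping first")
    # Rounding to the nearest integer is an assumption of this sketch.
    return max(1, round(mean(fold_epochs)))


# Hypothetical stopping points from a 3-fold cross-validation.
optimal_epochs = epochs_for_final_fit([42, 57, 48])
```

A value obtained this way is the kind of number the optimalEpochs property is described as returning.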
- getParameters(new_parameters: dict | None = None) dict | None
Get the model parameters combined with the given parameters.
If both the model and the given parameters contain the same key, the value from the given parameters is used.
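The precedence rule is the same as a dictionary update in which the given parameters are applied last. A minimal sketch, assuming a standalone helper (`merge_parameters` is a hypothetical name, and the parameter names are made up):

```python
def merge_parameters(model_parameters, new_parameters=None):
    """Combine two parameter dicts; values from new_parameters win on key clashes."""
    merged = dict(model_parameters or {})  # copy so the input is not modified
    merged.update(new_parameters or {})
    return merged


model_params = {"n_estimators": 100, "max_depth": 5}
combined = merge_parameters(model_params, {"max_depth": 8, "gamma": 0.1})
```

Here `max_depth` takes the value from the given parameters, while `n_estimators` is kept from the model. The documented method may return None; the conditions for that are not described in this section, so the sketch always returns a dictionary.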
- static handleInvalidsInPredictions(num_mols: int, predictions: ndarray | list[ndarray], failed_mask: ndarray) ndarray
Replace invalid predictions with None.
- Parameters:
num_mols (int) – number of molecules for which the predictions were made
predictions (np.ndarray) – predictions made by the model
failed_mask (np.ndarray) – boolean mask of failed predictions
- Returns:
predictions with invalids replaced by None
- Return type:
np.ndarray
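One way to picture this step is to scatter the available predictions into a full-size array and leave None where a molecule failed. The sketch below is illustrative only and rests on an assumption that is not stated above: `predictions` is taken to be a single 2D array holding rows only for the molecules that did not fail. The function name `fill_invalids` is hypothetical, and the list-of-arrays case from the signature is not handled.

```python
import numpy as np


def fill_invalids(num_mols, predictions, failed_mask):
    """Return a (num_mols, n_targets) object array with None for failed molecules."""
    filled = np.full((num_mols, predictions.shape[1]), None, dtype=object)
    # Rows of `predictions` are written, in order, to the non-failed positions.
    filled[~failed_mask] = predictions
    return filled


# Three molecules, the second of which failed; values are made up.
predictions = np.array([[6.2], [5.1]])
failed_mask = np.array([False, True, False])
filled = fill_invalids(3, predictions, failed_mask)
```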
- initFromData(data: QSPRDataSet | None, pipeline: DatasetPipeline | None)
Initialize the model from a data set and pipeline.
- Parameters:
data (QSPRDataSet) – data set to initialize the model with
pipeline (DatasetPipeline) – pipeline to use for feature calculation
- initRandomState(random_state)
Set random state if applicable. Defaults to the random state of the data set if no random state is provided.
- Parameters:
random_state (int) – Random state to use for shuffling and other random operations.
- property isMultiTask: bool
Return if model is a multitask model, taken from the data set or deserialized from file if the model is loaded without data.
- Returns:
True if model is a multitask model
- Return type:
bool
- abstract loadEstimator(params: dict | None = None) object
Initialize estimator instance with the given parameters.
If params is None, the default parameters will be used.
- abstract loadEstimatorFromFile(params: dict | None = None) object
Load estimator instance from file and apply the given parameters.
- classmethod loadParamsGrid(fname: str, optim_type: str, model_types: str) ndarray
Load parameter grids for Bayesian or grid search parameter optimization from a JSON file.
- Parameters:
- Returns:
array with three columns containing modeltype, optimization type (grid or bayes) and model type
- Return type:
np.ndarray
- property optimalEpochs: int | None
Return the optimal number of epochs for early stopping.
- Returns:
optimal number of epochs
- Return type:
int | None
- property outDir: str
Return the output directory of the model. The model files are stored in this directory ({baseDir}/{name}).
- Returns:
output directory of the model
- Return type:
str
- property outPrefix: str
Return output prefix of the model files.
The model files are stored with this prefix (e.g. {outPrefix}_meta.json).
- Returns:
output prefix of the model files
- Return type:
str
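The two path patterns given for outDir and outPrefix can be written out as plain string operations. This is a sketch of the naming patterns only: the helper names are hypothetical, and how the prefix itself is built is not specified in this section, so the prefix used in the example is an assumption made purely for illustration.

```python
import os


def model_out_dir(base_dir, name):
    """Directory holding the model files, following the {baseDir}/{name} pattern."""
    return os.path.join(base_dir, name)


def meta_file_path(out_prefix):
    """Meta file name following the {outPrefix}_meta.json pattern."""
    return f"{out_prefix}_meta.json"


directory = model_out_dir("models", "my_model")
# Assumption for illustration only: the prefix is the directory plus the model name.
prefix = os.path.join(directory, "my_model")
meta_path = meta_file_path(prefix)
```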
- abstract predict(X: DataFrame | ndarray | QSPRDataSet, estimator: Any = None) ndarray
Make predictions for the given data matrix or QSPRDataSet.
- Note. convertToNumpy can be called here to convert the input data to np.ndarray format.
- Note. If no estimator is given, the estimator instance of the model is used.
- Parameters:
X (pd.DataFrame, np.ndarray, QSPRDataSet) – data matrix to predict
estimator (Any) – estimator instance to use for prediction
- Returns:
2D array containing the predictions, where each row corresponds to a sample in the data and each column to a target property
- Return type:
np.ndarray
- predictDataset(dataset: QSPRDataSet, use_probas: bool = False) ndarray | list[ndarray]
Make predictions for the given dataset.
- Parameters:
dataset – a QSPRDataSet instance
use_probas – use probabilities if this is a classification model
- Returns:
an array of predictions or a list of arrays of predictions (for classification models with use_probas=True)
- Return type:
np.ndarray | list[np.ndarray]
- predictMols(mols: Iterable[str | Mol], use_probas: bool = False, n_jobs: int = 1, use_applicability_domain: bool = False) ndarray | list[ndarray]
Make predictions for the given molecules.
- Parameters:
- Returns:
- an array of predictions or a list of arrays of predictions (for classification models with use_probas=True)
- np.ndarray[bool]: boolean mask indicating which molecules fall within the applicability domain of the model
- Return type:
np.ndarray | list[np.ndarray]
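When both results listed above are available, the boolean mask can be used to keep only the predictions for molecules inside the applicability domain. The sketch below is a hedged illustration with made-up values; it assumes that predictions and mask are aligned with the order of the input molecules, and `keep_in_domain` is a hypothetical helper, not part of the documented API.

```python
def keep_in_domain(mols, predictions, in_domain):
    """Pair each molecule with its prediction and drop those flagged out of domain."""
    return [
        (mol, pred)
        for mol, pred, ok in zip(mols, predictions, in_domain)
        if ok
    ]


# Made-up values standing in for the two results described above.
smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
predictions = [[6.1], [4.8], [5.3]]
in_domain = [True, False, True]

trusted = keep_in_domain(smiles, predictions, in_domain)
```

The helper relies only on iteration, so it works the same way for lists and for NumPy arrays.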
- abstract predictProba(X: DataFrame | ndarray | QSPRDataSet, estimator: Any = None) list[ndarray]
Make predictions for the given data matrix or QSPRDataSet, but use probabilities for classification models. Does not work with regression models.
- Note. convertToNumpy can be called here to convert the input data to np.ndarray format.
- Note. If no estimator is given, the estimator instance of the model is used.
- Parameters:
X (pd.DataFrame, np.ndarray, QSPRDataSet) – data matrix to predict
estimator (Any) – estimator instance to use for prediction
- Returns:
a list of 2D arrays containing the probabilities for each class, where each array corresponds to a target property, each row to a sample in the data and each column to a class
- Return type:
list[np.ndarray]
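The returned structure (one 2D array per target property, rows for samples, columns for classes) can be reduced to one label per sample and target by taking the most probable class. This is a small sketch with made-up probabilities; `probas_to_labels` is a hypothetical helper, and treating the column index as the class label is an assumption of the example.

```python
import numpy as np


def probas_to_labels(probas):
    """Turn a list of (n_samples, n_classes) arrays into an (n_samples, n_targets) array."""
    # argmax over the class axis gives the most probable class per sample.
    return np.column_stack([np.argmax(p, axis=1) for p in probas])


probas = [
    # target 1: two classes
    np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]),
    # target 2: three classes
    np.array([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]),
]
labels = probas_to_labels(probas)
```

The result has the same layout as the output described for predict: each row is a sample and each column a target property.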
- save(save_estimator=False)
Save model to file.
- Parameters:
save_estimator (bool) – explicitly save the estimator to file if True. Note that some models may save the estimator by default even if this argument is False.
- Returns:
absolute path to the metafile of the saved model
str: absolute path to the saved estimator, if save_estimator is True
- Return type:
str
- abstract saveEstimator() str
Save the underlying estimator to file.
- Returns:
absolute path to the saved estimator
- Return type:
str
- setParams(params: dict | None, reset_estimator: bool = True)
Set model parameters. The estimator is also updated with the new parameters if reset_estimator is True.
- abstract property supportsEarlyStopping: bool
Return if the model supports early stopping.
- Returns:
True if the model supports early stopping
- Return type:
bool
- property task: ModelTasks
Return the task of the model, taken from the data set or deserialized from file if the model is loaded without data.
- Returns:
task of the model
- Return type:
ModelTasks