drugex.training package
Subpackages
- drugex.training.explorers package
- drugex.training.generators package
- Submodules
- drugex.training.generators.graph_transformer module
- drugex.training.generators.interfaces module
- drugex.training.generators.sequence_rnn module
- drugex.training.generators.sequence_transformer module
- drugex.training.generators.utils module
- Module contents
- drugex.training.scorers package
- Submodules
- drugex.training.scorers.interfaces module
- drugex.training.scorers.modifiers module
- drugex.training.scorers.properties module
- drugex.training.scorers.qsprpred module
- drugex.training.scorers.ra_scorer module
- drugex.training.scorers.sascorer module
- drugex.training.scorers.similarity module
- drugex.training.scorers.smiles module
- Module contents
Submodules
drugex.training.environment module
environment
Created by: Martin Sicho On: 06.06.22, 16:51
- class drugex.training.environment.DrugExEnvironment(scorers, thresholds=None, reward_scheme=None)[source]
Bases:
Environment
Original implementation of the environment scoring strategy for DrugEx v3.
- getScores(smiles, frags=None, no_multifrag_smiles=True)[source]
Get the scores from the scorers and check the validity and desirability of each molecule.
- Parameters:
- Returns:
preds – DataFrame with the scores from the scorers and the validity and desirability of the molecules.
- Return type:
pd.DataFrame
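The desirability check mentioned above can be illustrated with a minimal sketch. It assumes that a molecule counts as desirable when every score reaches its threshold; the source does not state the exact comparison, so the use of >= is an assumption, and flag_desired is a hypothetical helper rather than part of DrugEx. Plain lists stand in for the DataFrame.

```python
def flag_desired(score_rows, thresholds):
    """Return one boolean per molecule: True when every score meets its threshold.

    Hypothetical helper; the ">=" comparison is an assumption, not DrugEx behaviour.
    """
    flags = []
    for row in score_rows:
        if len(row) != len(thresholds):
            raise ValueError("each row needs exactly one score per threshold")
        flags.append(all(score >= limit for score, limit in zip(row, thresholds)))
    return flags


# Three molecules scored by two hypothetical scorers,
# with the thresholds used in the test case on this page.
scores = [[0.7, 1.0], [0.4, 1.0], [0.9, 0.5]]
desired = flag_desired(scores, [0.5, 0.99])
```

Only the first molecule passes both thresholds here, so it is the only one flagged as desired.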
drugex.training.interfaces module
interfaces
Created by: Martin Sicho On: 01.06.22, 11:29
- class drugex.training.interfaces.Environment(scorers, thresholds=None, reward_scheme=None)[source]
Bases:
ModelEvaluator
Definition of the generic environment class for DrugEx. The reference implementation is DrugExEnvironment.
- getRewards(smiles, frags=None)[source]
Calculate a single reward value for each molecule, to be used in reinforcement learning.
- getScorerKeys()[source]
Get the keys of the scorers.
- Returns:
List of keys of the scorers.
- Return type:
- class drugex.training.interfaces.Model(device=device(type='cuda'), use_gpus=(0,))[source]
Bases:
Module, ModelProvider, ABC
Generic base class for all PyTorch models in DrugEx. Manages the GPU or CPU devices available to the model.
- abstract attachToGPUs(gpus)[source]
Use this method to handle a request to change the used GPUs. This method is automatically called when the class is instantiated, but may need to be called again in subclasses to move all data to the required devices.
Subclasses should also make sure to set “self.device” to the currently used device and “self.gpus” to the GPU ids of the currently used GPUs.
- Parameters:
gpus (tuple) – Tuple of GPU ids to use.
- abstract fit(train_loader, valid_loader, epochs=1000, monitor=None, **kwargs)[source]
Train and validate the model with a given training and validation loader (see DataSet and the documentation of its implementations to learn how to generate them).
- Parameters:
train_loader (torch.utils.data.DataLoader) – The training data loader.
valid_loader (torch.utils.data.DataLoader) – The validation data loader.
epochs (int, optional) – The number of epochs to train the model for.
monitor (TrainingMonitor, optional) – A TrainingMonitor instance to monitor the training process.
**kwargs – Additional keyword arguments to pass to the training loop.
- class drugex.training.interfaces.ModelEvaluator[source]
Bases:
ABC
A simple function to score a model based on the generated molecules and input fragments if applicable.
- class drugex.training.interfaces.ModelProvider[source]
Bases:
ABC
Any instance that contains a DrugEx Model or its serialized form (i.e. a state dictionary).
- class drugex.training.interfaces.RewardScheme[source]
Bases:
ABC
Reward scheme that enables ranking of molecules based on the calculated objectives and other criteria.
- class drugex.training.interfaces.TrainingMonitor[source]
Bases:
ModelProvider, ABC
Interface used to monitor model training.
- abstract saveModel(model)[source]
Save the state dictionary of the Model instance currently being trained or serialize the model any other way.
- Parameters:
model (Model) – The model to save.
- abstract savePerformanceInfo(performance_dict, df_smiles=None)[source]
Save the performance data for the current epoch.
- Parameters:
performance_dict (dict) – A dictionary with the performance data.
df_smiles (pd.DataFrame) – A DataFrame with the SMILES of the molecules generated in the current epoch.
- abstract saveProgress(current_step=None, current_epoch=None, total_steps=None, total_epochs=None, *args, **kwargs)[source]
Notifies the monitor of the current progress of the training.
- Parameters:
current_step (int, optional) – The current training step (i.e. batch).
current_epoch (int, optional) – The current epoch.
total_steps (int, optional) – The total number of training steps.
total_epochs (int, optional) – The total number of epochs.
*args – Additional arguments depending on the model type.
**kwargs – Additional keyword arguments depending on the model type.
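The three abstract methods above can be sketched as a small in-memory monitor. This is an illustration only: InMemoryMonitor is a hypothetical class that mirrors the documented method names and signatures without importing DrugEx, the "loss_valid" key is made up, and the loop is a stand-in for a real training loop. A real monitor would subclass TrainingMonitor.

```python
class InMemoryMonitor:
    """Hypothetical monitor that keeps everything in memory.

    It only mirrors the documented method names; it does not subclass
    the real TrainingMonitor so that the sketch stays self-contained.
    """

    def __init__(self):
        self.saved_state = None   # last saved model state
        self.history = []         # one performance dict per epoch
        self.progress = None      # (step, epoch, total_steps, total_epochs)

    def saveModel(self, model):
        # Accept an object exposing state_dict() or an already serialized dict.
        state = model.state_dict() if hasattr(model, "state_dict") else model
        self.saved_state = dict(state)

    def savePerformanceInfo(self, performance_dict, df_smiles=None):
        # df_smiles is ignored in this sketch.
        self.history.append(dict(performance_dict))

    def saveProgress(self, current_step=None, current_epoch=None,
                     total_steps=None, total_epochs=None, *args, **kwargs):
        self.progress = (current_step, current_epoch, total_steps, total_epochs)


# Stand-in for a training loop: 2 epochs of 3 steps each.
monitor = InMemoryMonitor()
for epoch in range(1, 3):
    for step in range(1, 4):
        monitor.saveProgress(step, epoch, 3, 2)
    monitor.savePerformanceInfo({"epoch": epoch, "loss_valid": 1.0 / epoch})
    monitor.saveModel({"weights": [epoch]})
```

Keeping the state in plain attributes makes it easy to inspect what a model reported during a short test run.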
drugex.training.monitors module
monitors
Created by: Martin Sicho On: 02.06.22, 13:59
- class drugex.training.monitors.FileMonitor(path, save_smiles=False, reset_directory=False)[source]
Bases:
TrainingMonitor
A simple TrainingMonitor implementation with file outputs.
- savePerformanceInfo(performance_dict, df_smiles=None)[source]
Save the performance data for the current epoch.
- Parameters:
performance_dict (dict) – A dictionary with the performance data.
df_smiles (pd.DataFrame) – A DataFrame with the SMILES of the molecules generated in the current epoch.
- class drugex.training.monitors.NullMonitor[source]
Bases:
TrainingMonitor
- saveModel(model)[source]
Save the state dictionary of the Model instance currently being trained or serialize the model any other way.
- Parameters:
model (Model) – The model to save.
- savePerformanceInfo(performance_dict, df_smiles=None)[source]
Save the performance data for the current epoch.
- Parameters:
performance_dict (dict) – A dictionary with the performance data.
df_smiles (pd.DataFrame) – A DataFrame with the SMILES of the molecules generated in the current epoch.
- saveProgress(current_step=None, current_epoch=None, total_steps=None, total_epochs=None, *args, **kwargs)[source]
Notifies the monitor of the current progress of the training.
- Parameters:
current_step (int, optional) – The current training step (i.e. batch).
current_epoch (int, optional) – The current epoch.
total_steps (int, optional) – The total number of training steps.
total_epochs (int, optional) – The total number of epochs.
*args – Additional arguments depending on the model type.
**kwargs – Additional keyword arguments depending on the model type.
drugex.training.rewards module
rewards
Created by: Martin Sicho On: 26.06.22, 18:07
- class drugex.training.rewards.ParetoCrowdingDistance[source]
Bases:
ParetoRewardScheme
Reward scheme that uses the NSGA-II crowding distance ranking strategy to rank the solutions in the same Pareto frontier.
Paper: Deb, Kalyanmoy, et al. “A fast and elitist multiobjective genetic algorithm: NSGA-II.” IEEE Transactions on Evolutionary Computation 6.2 (2002): 182-197.
- getMoleculeRank(fronts, smiles=None, scores=None)[source]
Crowding distance algorithm to rank the solutions in the same Pareto frontier.
- Parameters:
fronts (list) – list of Pareto fronts. Each front is a list of indices of the molecules in the Pareto front.
smiles (list) – List of SMILES sequences to be ranked (not used in the calculation; required only by the interface because some ranking strategies need it).
scores (np.ndarray) – matrix of scores for the multiple objectives
- Returns:
rank – Indices of the SMILES sequences ranked with the NSGA-II crowding distance method from worst to best
- Return type:
np.array
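The crowding distance ranking described above can be sketched in plain Python. This follows the textbook NSGA-II definition and is not taken from the DrugEx source. Two assumptions are made: fronts[0] is taken to be the best front, and within a front a larger distance (a less crowded solution) is treated as better. The function names are hypothetical, and lists replace the NumPy arrays of the real interface.

```python
def crowding_distance(front, scores):
    """Return {molecule index: crowding distance} for one Pareto front."""
    if len(front) <= 2:
        # With one or two members, every member is a boundary solution.
        return {i: float("inf") for i in front}
    distance = {i: 0.0 for i in front}
    n_objectives = len(scores[front[0]])
    for m in range(n_objectives):
        ordered = sorted(front, key=lambda i: scores[i][m])
        low, high = scores[ordered[0]][m], scores[ordered[-1]][m]
        # Boundary solutions always get an infinite distance.
        distance[ordered[0]] = distance[ordered[-1]] = float("inf")
        if high == low:
            continue  # objective is constant on this front
        for k in range(1, len(ordered) - 1):
            gap = scores[ordered[k + 1]][m] - scores[ordered[k - 1]][m]
            distance[ordered[k]] += gap / (high - low)
    return distance


def rank_worst_to_best(fronts, scores):
    """Worst front first; inside a front, most crowded solutions first.

    Assumes fronts[0] is the best front (not stated in the source).
    """
    rank = []
    for front in reversed(fronts):
        dist = crowding_distance(front, scores)
        rank.extend(sorted(front, key=lambda i: dist[i]))
    return rank


# Five molecules, two objectives; molecule 4 is dominated by the others.
scores = [[0.0, 1.0], [0.4, 0.9], [0.5, 0.5], [1.0, 0.0], [0.1, 0.1]]
fronts = [[0, 1, 2, 3], [4]]
rank = rank_worst_to_best(fronts, scores)
```

In this example the dominated molecule comes first, followed by the two interior solutions of the first front, with the boundary solutions ranked last (best).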
- class drugex.training.rewards.ParetoRewardScheme[source]
Bases:
RewardScheme, ABC
- class drugex.training.rewards.ParetoTanimotoDistance(distance_metric: str = 'min')[source]
Bases:
ParetoRewardScheme
Reward scheme that uses the Tanimoto distance ranking strategy to rank the solutions in the same Pareto frontier.
- getFPs(smiles)[source]
Calculate fingerprints for a list of molecules.
- Parameters:
smiles – smiles to calculate fingerprints for
- Returns:
list of RDKit fingerprints
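As a rough illustration of the Tanimoto distance that this scheme is built on, the sketch below represents each fingerprint as a set of on-bit indices instead of an RDKit fingerprint. The aggregation shown (the minimum distance to any other molecule) is only a guess at what distance_metric='min' refers to; both function names are hypothetical.

```python
def tanimoto_distance(fp_a, fp_b):
    """1 - |A and B| / |A or B| for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def min_distance_to_others(fps):
    """For each fingerprint, the smallest Tanimoto distance to any other one.

    Assumption: this is one plausible reading of distance_metric='min'.
    """
    result = []
    for i, fp in enumerate(fps):
        others = [tanimoto_distance(fp, other)
                  for j, other in enumerate(fps) if j != i]
        result.append(min(others) if others else 0.0)
    return result


# Toy fingerprints: the first two share three of four bits, the third is unrelated.
fps = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8}]
distances = min_distance_to_others(fps)
```

A molecule with a large minimum distance is structurally unlike everything else in its front, which is the kind of diversity a distance-based ranking can favour.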
- class drugex.training.rewards.WeightedSum[source]
Bases:
RewardScheme
Reward scheme that uses the weighted sum ranking strategy to rank the solutions.
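A weighted sum ranking can be sketched as follows. The source does not say how DrugEx derives the weights, so here they are simply supplied by the caller, which is an assumption; weighted_sum_rank is a hypothetical function and the worst-to-best ordering is chosen to match the convention documented for getMoleculeRank.

```python
def weighted_sum_rank(scores, weights):
    """Return molecule indices ordered from worst to best weighted total.

    The caller-supplied weights are an assumption of this sketch.
    """
    totals = [sum(w * s for w, s in zip(weights, row)) for row in scores]
    return sorted(range(len(scores)), key=lambda i: totals[i])


# Three molecules, two objectives, equal weights.
scores = [[0.2, 0.9], [0.8, 0.8], [0.5, 0.1]]
rank = weighted_sum_rank(scores, [0.5, 0.5])
```

Unlike the Pareto-based schemes, this collapses all objectives into one number, so a strong score on one objective can compensate for a weak one on another.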
drugex.training.tests module
tests
Created by: Martin Sicho On: 31.05.22, 10:20
- class drugex.training.tests.MockScorer(modifier=None)[source]
Bases:
Scorer
- class drugex.training.tests.TestModelMonitor(submonitors=None)[source]
Bases:
TrainingMonitor
- saveModel(model)[source]
Save the state dictionary of the Model instance currently being trained or serialize the model any other way.
- Parameters:
model (Model) – The model to save.
- savePerformanceInfo(performance_dict, df_smiles=None)[source]
Save the performance data for the current epoch.
- Parameters:
performance_dict (dict) – A dictionary with the performance data.
df_smiles (pd.DataFrame) – A DataFrame with the SMILES of the molecules generated in the current epoch.
- saveProgress(current_step=None, current_epoch=None, total_steps=None, total_epochs=None, *args, **kwargs)[source]
Notifies the monitor of the current progress of the training.
- Parameters:
current_step (int, optional) – The current training step (i.e. batch).
current_epoch (int, optional) – The current epoch.
total_steps (int, optional) – The total number of training steps.
total_epochs (int, optional) – The total number of epochs.
*args – Additional arguments depending on the model type.
**kwargs – Additional keyword arguments depending on the model type.
- class drugex.training.tests.TrainingTestCase(methodName='runTest')[source]
Bases:
TestCase
- BATCH_SIZE = 8
- MAX_SMILES = 16
- N_EPOCHS = 2
- N_PROC = 2
- SEED = 42
- finetuning_file = '/home/sichom/projects/DrugEx/drugex/training/test_data/A2AR_raw_small.txt'
- static getRandomFile()[source]
Generate a random temporary file and return its path.
- Returns:
The path to the temporary file
- Return type:
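One way to create such a temporary file with the standard library is sketched below. This is not the DrugEx implementation; get_random_file and its suffix parameter are hypothetical.

```python
import os
import tempfile


def get_random_file(suffix=""):
    """Create an empty temporary file and return its path (hypothetical helper)."""
    # mkstemp creates the file securely and returns an open OS-level handle.
    handle, path = tempfile.mkstemp(suffix=suffix)
    os.close(handle)  # keep the file, release the handle
    return path


path = get_random_file(".txt")
```

The caller is responsible for deleting the file, for example with os.remove(path), once it is no longer needed.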
- getTestEnvironment(scheme=None)[source]
Get the testing environment
- Parameters:
scheme (RewardScheme) – The reward scheme to use. If None, the default ParetoTanimotoDistance is used.
- Return type:
- pretraining_file = '/home/sichom/projects/DrugEx/drugex/training/test_data/ZINC_raw_small.txt'
- scorers = [<drugex.training.scorers.properties.Property object>, <drugex.training.tests.MockScorer object>]
- setUpSmilesFragData()[source]
Create inputs for the fragment-based SMILES models.
- Returns:
The tuple of (pretraining training dataloader, pretraining test dataloader, finetuning training dataloader, finetuning test dataloader, vocabulary)
- Return type:
- test_data_dir = '/home/sichom/projects/DrugEx/drugex/training/test_data'
- test_graph_transformer_scaffold()[source]
Test RL with fragment-based graph transformer model with scaffold input.
- test_sequence_transformer_scaffold()[source]
Test RL with fragment-based sequence transformer model with scaffold input.
- thresholds = [0.5, 0.99]
Module contents
__init__.py
Created by: Martin Sicho On: 31.05.22, 10:20