drugex.training.scorers package

Submodules

drugex.training.scorers.interfaces module

class drugex.training.scorers.interfaces.ScoreModifier

Bases: ABC

Defines a function to modify a score value.

class drugex.training.scorers.interfaces.Scorer(modifier=None)

Bases: ABC

Used by the Environment to calculate customized scores.

abstract getKey()

getModifiedScores(scores)

Modify the scores with the given ScoreModifier.

Parameters:

scores (np.array) – The scores to modify.

Returns:

The modified scores.

Return type:

np.array

getModifier()

abstract getScores(mols, frags=None)

Returns the raw scores for the input molecules.

Parameters:

mols (list of rdkit molecules) – The molecules to be scored.
frags (list of rdkit molecules, optional) – The fragments used to generate the molecules, by default None.

Returns:

scores – The scores for the molecules.

Return type:

np.array

setModifier(modifier)
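As a minimal, self-contained sketch of how the two interfaces fit together: the classes below do not import DrugEx. `ScoreModifierSketch` and `ScorerSketch` are hypothetical stand-ins that mirror the methods listed above, and two details are our assumptions rather than documented behavior: that a modifier is applied by calling it on the raw scores, and that getModifiedScores passes the scores through unchanged when no modifier is set. Scores are plain lists instead of np.array to keep the example dependency-free.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence


class ScoreModifierSketch(ABC):
    """Hypothetical stand-in for ScoreModifier: maps raw scores to modified scores."""

    @abstractmethod
    def __call__(self, scores: Sequence[float]) -> List[float]:
        ...


class ScorerSketch(ABC):
    """Hypothetical stand-in for Scorer, holding an optional modifier."""

    def __init__(self, modifier: Optional[ScoreModifierSketch] = None):
        self.modifier = modifier

    @abstractmethod
    def getKey(self) -> str:
        ...

    @abstractmethod
    def getScores(self, mols, frags=None) -> List[float]:
        ...

    def getModifier(self) -> Optional[ScoreModifierSketch]:
        return self.modifier

    def setModifier(self, modifier: Optional[ScoreModifierSketch]) -> None:
        self.modifier = modifier

    def getModifiedScores(self, scores: Sequence[float]) -> List[float]:
        # Assumption: without a modifier the raw scores are returned unchanged.
        return list(scores) if self.modifier is None else self.modifier(scores)


class HalveModifier(ScoreModifierSketch):
    """Toy modifier: divides every score by two."""

    def __call__(self, scores):
        return [s / 2 for s in scores]


class LengthScorer(ScorerSketch):
    """Toy scorer: uses the length of each SMILES string as its raw score."""

    def getKey(self):
        return "SmilesLength"

    def getScores(self, mols, frags=None):
        return [float(len(m)) for m in mols]


scorer = LengthScorer(modifier=HalveModifier())
raw = scorer.getScores(["CCO", "c1ccccc1"])
print(raw, scorer.getModifiedScores(raw))  # [3.0, 8.0] [1.5, 4.0]
```

Keeping the modifier separate from the scorer means the same raw property can be reshaped (clipped, Gaussian-weighted, and so on) without writing a new scorer.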

drugex.training.scorers.modifiers module

class drugex.training.scorers.modifiers.AbsoluteScore(target_value: float)

Bases: ScoreModifier

Score modifier that has a maximum at a given target value, and decreases linearly with increasing distance from the target value.
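A scalar sketch of this shape in plain Python (DrugEx itself works on np.array inputs). The unit slope, i.e. score = 1 - |x - target_value|, is our assumption; the description above only says the score decreases linearly.

```python
def absolute_score(x: float, target_value: float) -> float:
    """Peak of 1.0 at target_value, falling by 1 per unit of distance (assumed slope)."""
    return 1.0 - abs(x - target_value)


print([absolute_score(x, 2.0) for x in (1.5, 2.0, 3.0)])  # [0.5, 1.0, 0.0]
```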

class drugex.training.scorers.modifiers.Chained(modifiers: List[ScoreModifier])

Bases: ScoreModifier

Calls several modifiers one after the other, for instance: score = modifier3(modifier2(modifier1(raw_score)))
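The chaining order can be sketched with plain Python callables standing in for ScoreModifier objects (an assumption made to keep the example self-contained): the first modifier in the list is applied first, matching the nesting shown above.

```python
from functools import reduce
from typing import Callable, Sequence

Modifier = Callable[[float], float]


def chain(modifiers: Sequence[Modifier]) -> Modifier:
    """Compose modifiers so that modifiers[0] is applied first."""
    def chained(x: float) -> float:
        return reduce(lambda value, modifier: modifier(value), modifiers, x)
    return chained


def double(x: float) -> float:
    return 2 * x


def shift(x: float) -> float:
    return x + 1


print(chain([double, shift])(3.0))  # shift(double(3.0)) -> 7.0
```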

class drugex.training.scorers.modifiers.ClippedScore(upper_x: float, lower_x=0.0, high_score=1.0, low_score=0.0)

Bases: ScoreModifier

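Clips a score between the specified low and high scores and interpolates linearly in between. The shape depends on the order of the two x values:

- upper_x < lower_x: the score is high_score for x <= upper_x, falls linearly between upper_x and lower_x, and is low_score for x >= lower_x.
- lower_x < upper_x: the score is low_score for x <= lower_x, rises linearly between lower_x and upper_x, and is high_score for x >= upper_x.

The class works in two steps: first, the input is mapped onto the straight line through the two specified points; then the resulting values are clipped between low_score and high_score.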
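The two steps can be sketched for a single float as follows. Taking the two specified points to be (lower_x, low_score) and (upper_x, high_score) is our reading of the description, and the function name is hypothetical.

```python
def clipped_score(x: float, upper_x: float, lower_x: float = 0.0,
                  high_score: float = 1.0, low_score: float = 0.0) -> float:
    """Linear map through (lower_x, low_score) and (upper_x, high_score), then clip."""
    # Step 1: straight line through both points.
    slope = (high_score - low_score) / (upper_x - lower_x)
    intercept = high_score - slope * upper_x
    y = slope * x + intercept
    # Step 2: clip to the interval spanned by the two scores.
    lo, hi = min(low_score, high_score), max(low_score, high_score)
    return min(max(y, lo), hi)


print([clipped_score(x, upper_x=10.0) for x in (-5.0, 5.0, 15.0)])  # [0.0, 0.5, 1.0]
```

Swapping the x values (upper_x=0.0, lower_x=10.0) gives the decreasing variant.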

class drugex.training.scorers.modifiers.Gaussian(mu: float, sigma: float)

Bases: ScoreModifier

Score modifier that reproduces a Gaussian bell shape.
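A scalar sketch of the bell shape. We assume an unnormalized Gaussian whose peak value is exactly 1.0 at x == mu, which suits a score in [0, 1]; the normalization actually used by DrugEx is not stated above.

```python
import math


def gaussian(x: float, mu: float, sigma: float) -> float:
    """Unnormalized Gaussian bell with peak value 1.0 at x == mu (assumed scaling)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


print(gaussian(2.0, mu=2.0, sigma=1.0))  # 1.0
print(round(gaussian(3.0, mu=2.0, sigma=1.0), 4))  # 0.6065
```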

class drugex.training.scorers.modifiers.Linear(slope=1.0)

Bases: ScoreModifier

Score modifier that multiplies the score by a scalar (default: 1, i.e. do nothing).

class drugex.training.scorers.modifiers.MinMaxGaussian(mu: float, sigma: float, minimize=False)

Bases: ScoreModifier

Score modifier that reproduces a half Gaussian bell shape. For minimize==True, the function is 1.0 for x <= mu and decreases to zero for x > mu. For minimize==False, the function is 1.0 for x >= mu and decreases to zero for x < mu.
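A scalar sketch of the half-Gaussian behavior described above. Using an unnormalized Gaussian with width sigma on the penalized side is our assumption about the exact decay.

```python
import math


def min_max_gaussian(x: float, mu: float, sigma: float, minimize: bool = False) -> float:
    """Half Gaussian: 1.0 on the preferred side of mu, Gaussian decay on the other side."""
    preferred = x <= mu if minimize else x >= mu
    if preferred:
        return 1.0
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


# minimize=True: values at or below mu are fully rewarded, larger ones are penalized.
print(min_max_gaussian(2.0, mu=3.0, sigma=1.0, minimize=True))  # 1.0
print(round(min_max_gaussian(4.0, mu=3.0, sigma=1.0, minimize=True), 4))  # 0.6065
```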

class drugex.training.scorers.modifiers.SmoothClippedScore(upper_x: float, lower_x=0.0, high_score=1.0, low_score=0.0)

Bases: ScoreModifier

Smooth variant of ClippedScore. Implemented as a logistic function that has the same steepness as ClippedScore in the center of the logistic function.
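Matching the steepness works out as follows: a logistic curve L / (1 + exp(-k (x - m))) has slope L·k/4 at its center m, so choosing k = 4 / (upper_x - lower_x) with L = high_score - low_score reproduces the ClippedScore slope (high_score - low_score) / (upper_x - lower_x). A scalar sketch under the assumption that the center is the midpoint of lower_x and upper_x:

```python
import math


def smooth_clipped_score(x: float, upper_x: float, lower_x: float = 0.0,
                         high_score: float = 1.0, low_score: float = 0.0) -> float:
    """Logistic curve with the same central slope as the linear part of ClippedScore."""
    middle_x = (upper_x + lower_x) / 2  # assumed center of the logistic
    k = 4.0 / (upper_x - lower_x)       # gives slope (high - low) / (upper_x - lower_x) at center
    z = k * (x - middle_x)
    # Numerically stable logistic: avoid exp() of large positive arguments.
    if z >= 0:
        s = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        s = e / (1.0 + e)
    return low_score + (high_score - low_score) * s


print(smooth_clipped_score(5.0, upper_x=10.0))  # 0.5
```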

class drugex.training.scorers.modifiers.SmoothHump(lower_x: float, upper_x: float, sigma: float)

Bases: ScoreModifier

Score modifier that reproduces a smooth bump function. The function is 1.0 for x between (lower_x, upper_x) and decreases to zero with a half Gaussian for x < lower_x and x > upper_x.
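A scalar sketch of the plateau with half-Gaussian flanks; measuring the Gaussian from the nearest plateau edge with width sigma is our assumption about the exact decay.

```python
import math


def smooth_hump(x: float, lower_x: float, upper_x: float, sigma: float) -> float:
    """1.0 on [lower_x, upper_x]; half-Gaussian decay with width sigma outside it."""
    if x < lower_x:
        distance = lower_x - x
    elif x > upper_x:
        distance = x - upper_x
    else:
        return 1.0
    return math.exp(-0.5 * (distance / sigma) ** 2)


print([round(smooth_hump(x, 1.0, 3.0, 1.0), 4) for x in (0.0, 2.0, 4.0)])  # [0.6065, 1.0, 0.6065]
```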

class drugex.training.scorers.modifiers.Squared(target_value: float, coefficient=1.0)

Bases: ScoreModifier

Score modifier that has a maximum at a given target value, and decreases quadratically with increasing distance from the target value.
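A scalar sketch assuming the form 1 - coefficient * (x - target_value)**2; this exact formula is our assumption, since the description above only says the score decreases quadratically. In this form the score becomes negative once |x - target_value| exceeds 1/sqrt(coefficient).

```python
def squared_score(x: float, target_value: float, coefficient: float = 1.0) -> float:
    """Peak of 1.0 at target_value, decreasing with the squared distance (assumed form)."""
    return 1.0 - coefficient * (x - target_value) ** 2


print([squared_score(x, 2.0) for x in (2.0, 2.5, 3.0)])  # [1.0, 0.75, 0.0]
```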

class drugex.training.scorers.modifiers.ThresholdedLinear(threshold: float)

Bases: ScoreModifier

Returns a value of min(input, threshold)/threshold.
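The formula above, written out for a single float (the function name is hypothetical): the score rises linearly up to the threshold and is capped at 1.0 beyond it.

```python
def thresholded_linear(x: float, threshold: float) -> float:
    """min(x, threshold) / threshold: linear below the threshold, 1.0 at or above it."""
    return min(x, threshold) / threshold


print([thresholded_linear(x, 4.0) for x in (1.0, 4.0, 10.0)])  # [0.25, 1.0, 1.0]
```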

drugex.training.scorers.properties module

properties

Created by: Martin Sicho On: 06.06.22, 20:17

class drugex.training.scorers.properties.AtomCounter(element: str, modifier=None)

Bases: Scorer

getKey()

getScores(mols, frags=None)

Count the number of atoms of a given type in the molecules.

Parameters:

mols (list of rdkit molecules) – The molecules to score.
frags (list of rdkit molecules, optional) – The fragments used to generate the molecules, by default None.

Returns:

scores – The scores for the molecules.

Return type:

np.array

class drugex.training.scorers.properties.Isomer(formula: str, mean_func='geometric', modifier=None)

Bases: Scorer

Scoring function for closeness to a molecular formula. The score penalizes deviations from the required number of atoms for each element type and from the total number of atoms. For instance, if the target formula is C2H4, the scoring function is the average of three contributions:

- number of C atoms, with a Gaussian modifier with mu=2, sigma=1
- number of H atoms, with a Gaussian modifier with mu=4, sigma=1
- total number of atoms, with a Gaussian modifier with mu=6, sigma=2
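The combination can be sketched from atom counts alone, without RDKit. Assumptions: counts are given as hypothetical element-to-count dictionaries (hydrogens included), the Gaussians are unnormalized with peak 1.0, and the signature default mean_func='geometric' means a geometric mean while any other value falls back to the arithmetic mean used in the example above. This is an illustration, not the DrugEx implementation.

```python
import math
from typing import Dict


def gaussian(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def isomer_score(counts: Dict[str, int], target: Dict[str, int],
                 mean_func: str = "geometric") -> float:
    """Combine per-element (sigma=1) and total-atom (sigma=2) Gaussian scores."""
    parts = [gaussian(counts.get(el, 0), n, 1.0) for el, n in target.items()]
    parts.append(gaussian(sum(counts.values()), sum(target.values()), 2.0))
    if mean_func == "geometric":
        return math.prod(parts) ** (1.0 / len(parts))
    return sum(parts) / len(parts)  # arithmetic mean (assumed fallback)


c2h4 = {"C": 2, "H": 4}
print(isomer_score(c2h4, c2h4))  # 1.0
print(isomer_score({"C": 2, "H": 6}, c2h4, mean_func="arithmetic"))  # ethane vs. C2H4
```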

getKey()

getScores(mols: list, frags=None) → array

Get the scores for the molecules.

Parameters:

mols (list of rdkit molecules) – The molecules to score.
frags (list of rdkit molecules, optional) – The fragments used to generate the molecules, by default None.

Returns:

scores – The scores for the molecules.

Return type:

np.array

static parse_molecular_formula(formula: str)

Parse a molecular formula to get the element types and counts.

Parameters:

formula (str) – The molecular formula to parse.

Returns:

results – A list of tuples containing element types and number of occurrences.

Return type:

list of tuples
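A standalone, regex-based sketch of this kind of parsing; `parse_formula_sketch` is a hypothetical name, not the DrugEx method. It assumes a simple formula syntax (an element symbol of one capital plus an optional lowercase letter, followed by an optional count) and does not handle parentheses, charges, or repeated element symbols.

```python
import re
from typing import List, Tuple

# Element symbol (capital + optional lowercase letter) followed by an optional count.
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula_sketch(formula: str) -> List[Tuple[str, int]]:
    """Return (element, count) tuples; a missing count means 1."""
    return [(element, int(count) if count else 1)
            for element, count in _ELEMENT.findall(formula)]


print(parse_formula_sketch("C6H12O6"))  # [('C', 6), ('H', 12), ('O', 6)]
```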

scoring_functions(formula: str)

Create the scoring functions for the molecular formula.

Parameters:

formula (str) – The molecular formula to score against.

Returns:

objs (list of Scorer objects) – The scoring functions for each element type.
mods (list of ScoreModifier objects) – The modifiers for each scoring function.

class drugex.training.scorers.properties.LigandEfficiency(qsar_scorer, modifier=None)

Bases: Scorer

Calculates the ligand efficiency of a molecule: LE = 1.4 * pChEMBL / nAtoms
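The formula as a small worked example. The factor 1.4 approximates 2.303·RT in kcal/mol near room temperature, so LE is roughly binding free energy per atom. In this sketch the predicted pChEMBL value and the atom count are passed in directly; in DrugEx the former presumably comes from qsar_scorer, and which atoms are counted is not specified here.

```python
def ligand_efficiency(p_chembl: float, n_atoms: int) -> float:
    """LE = 1.4 * pChEMBL / nAtoms."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return 1.4 * p_chembl / n_atoms


print(round(ligand_efficiency(7.0, 20), 3))  # 0.49
```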

getKey()

getScores(mols: List[str], frags=None)

Returns the raw scores for the input molecules.

Parameters:

mols (list of rdkit molecules) – The molecules to be scored.
frags (list of rdkit molecules, optional) – The fragments used to generate the molecules, by default None.

Returns:

scores – The scores for the molecules.

Return type:

np.array

class drugex.training.scorers.properties.LipophilicEfficiency(qsar_scorer, modifier=None)

Bases: Scorer

Calculates the lipophilic efficiency of a molecule: LiPE = pChEMBL value - logP

getKey()

getScores(mols: List[str], frags=None)

Returns the raw scores for the input molecules.

Parameters:

mols (list of rdkit molecules) – The molecules to be scored.
frags (list of rdkit molecules, optional) – The fragments used to generate the molecules, by default None.

Returns:

scores – The scores for the molecules.

Return type:

np.array

class drugex.training.scorers.properties.Property(prop='MW', modifier=None)

Bases: Scorer

getKey()

getScores(mols, frags=None)

Returns the raw scores for the input molecules.

Parameters:

mols (list of rdkit molecules) – The molecules to be scored.
frags (list of rdkit molecules, optional) – The fragments used to generate the molecules, by default None.

Returns:

scores – The scores for the molecules.

Return type:

np.array

class drugex.training.scorers.properties.Scaffold(smart, is_match, modifier=None)

Bases: Scorer

getKey()

getScores(mols, frags=None)

Get the scores for the molecules.

Parameters:

mols (list of rdkit molecules) – The molecules to score.
frags (list of rdkit molecules, optional) – The fragments used to generate the molecules, by default None.

Returns:

scores – The scores for the molecules.

Return type:

np.array

class drugex.training.scorers.properties.Uniqueness(modifier=None)

Bases: Scorer

Calculates the ratio of occurrence of a molecule in a set of molecules.
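One possible reading of "ratio of occurrence", sketched on SMILES strings: each molecule gets the fraction of the batch made up of identical strings. This interpretation and the function name are our assumptions; DrugEx may normalize differently. Raw strings are compared, so two different SMILES for the same molecule count as distinct unless they are canonicalized first.

```python
from collections import Counter
from typing import List


def occurrence_ratio(smiles: List[str]) -> List[float]:
    """For each SMILES, the fraction of the batch made up of identical strings."""
    counts = Counter(smiles)
    total = len(smiles)
    return [counts[s] / total for s in smiles]


print(occurrence_ratio(["CCO", "CCO", "c1ccccc1", "CCN"]))  # [0.5, 0.5, 0.25, 0.25]
```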

getKey()

getScores(mols: List[str], frags=None)

Returns the raw scores for the input molecules.

Parameters:

mols (list of rdkit molecules) – The molecules to be scored.
frags (list of rdkit molecules, optional) – The fragments used to generate the molecules, by default None.

Returns:

scores – The scores for the molecules.

Return type:

np.array

drugex.training.scorers.qsprpred module

qsprpred

Created by: Martin Sicho On: 17.02.23, 13:44

class drugex.training.scorers.qsprpred.QSPRPredScorer(model, invalids_score=0.0, modifier=None, **kwargs)

Bases: Scorer

getKey()

getScores(mols, frags=None)

Returns the raw scores for the input molecules.

Parameters:

mols (list of rdkit molecules) – The molecules to be scored.
frags (list of rdkit molecules, optional) – The fragments used to generate the molecules, by default None.

Returns:

scores – The scores for the molecules.

Return type:

np.array
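The invalids_score parameter suggests a fallback value for molecules that cannot be scored; that reading comes from the parameter name, not from the text above. The pattern can be sketched as follows, where `predict` is a hypothetical stand-in for a QSPRpred model's prediction call and invalid molecules are represented as None (also an assumption).

```python
from typing import Callable, List, Optional, Sequence


def score_with_fallback(mols: Sequence[Optional[str]],
                        predict: Callable[[List[str]], List[float]],
                        invalids_score: float = 0.0) -> List[float]:
    """Predict only for valid inputs; invalid ones (None) receive invalids_score."""
    valid_idx = [i for i, m in enumerate(mols) if m is not None]
    preds = predict([mols[i] for i in valid_idx]) if valid_idx else []
    scores = [invalids_score] * len(mols)
    for i, p in zip(valid_idx, preds):
        scores[i] = p
    return scores


def toy_predict(batch: List[str]) -> List[float]:
    """Hypothetical model: predicts the SMILES length."""
    return [float(len(s)) for s in batch]


print(score_with_fallback(["CCO", None, "CCCN"], toy_predict))  # [3.0, 0.0, 4.0]
```

Batching the valid molecules into a single `predict` call keeps the expensive model invocation to one call per batch.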

drugex.training.scorers.ra_scorer module

drugex.training.scorers.sascorer module

drugex.training.scorers.sascorer.calculateScore(m)

drugex.training.scorers.sascorer.numBridgeheadsAndSpiro(mol, ri=None)

drugex.training.scorers.sascorer.readFragmentScores(name='fpscores')

drugex.training.scorers.similarity module

similarity

Created by: Sohvi Luukkonen On: 07.10.22, 15:05

class drugex.training.scorers.similarity.FraggleSimilarity(smiles: str, trevsky_th: float = 0.8, modifier=None)

Bases: Scorer

Scoring function for similarity to a reference molecule, based on Fraggle similarity: a Python implementation of the Fraggle similarity algorithm developed at GSK and described in this RDKit UGM presentation: https://github.com/rdkit/UGM_2013/blob/master/Presentations/Hussain.Fraggle.pdf

getKey()

getScores(mols, frags=None)

Calculate the Fraggle similarity scores for a list of molecules.

Parameters:

mols (list of rdkit molecules) – List of molecules to be scored.
frags (list of rdkit molecules, optional) – List of fragments used to generate molecules. Not used in this scorer, by default None

Returns:

scores – Array of scores.

Return type:

np.array

class drugex.training.scorers.similarity.GraphEditInverseDistance(smiles, modifier=None)

Bases: Scorer

Scoring function for similarity to a reference molecule, computed as the inverse of the graph edit distance between the two molecular graphs.

WARNING: Extremely slow! TODO: check whether this can be sped up.

getKey()

getScores(mols, frags=None)

Returns the raw scores for the input molecules.

Parameters:

mols (list of rdkit molecules) – The molecules to be scored.
frags (list of rdkit molecules, optional) – The fragments used to generate the molecules, by default None.

Returns:

scores – The scores for the molecules.

Return type:

np.array

get_graph(mol)

class drugex.training.scorers.similarity.TverskyFingerprintSimilarity(smiles: str, fp_type: str, alpha: float = 1.0, beta: float = 1.0, modifier=None)

Bases: Scorer

Scoring function for similarity to a reference molecule. Tversky similarity between fingerprints. If both alpha and beta are set to 1, reduces to Tanimoto similarity.
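The Tversky index on two sets of fingerprint on-bits is |A ∩ B| / (|A ∩ B| + alpha·|A − B| + beta·|B − A|); with alpha = beta = 1 the denominator equals |A ∪ B|, which is the Tanimoto coefficient. A plain-set sketch follows. Which of the scored and reference molecules plays the role of A, and returning 0.0 for two empty sets, are our assumptions.

```python
from typing import Set


def tversky(a: Set[int], b: Set[int], alpha: float = 1.0, beta: float = 1.0) -> float:
    """Tversky index between two sets of fingerprint on-bits."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


query, reference = {1, 2, 3, 4}, {3, 4, 5}
print(tversky(query, reference))                        # 0.4 (Tanimoto: 2 shared / 5 in union)
print(tversky(query, reference, alpha=1.0, beta=0.0))   # 0.5
```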

getKey()

getScores(mols, frags=None)

Get Tversky similarity scores for a list of molecules.

Parameters:

mols (List[str]) – A list of SMILES strings representing molecules.
frags (List[str], optional) – A list of fragments used to generate the molecules. This is not used by this scorer.

class drugex.training.scorers.similarity.TverskyGraphSimilarity(smiles: str, alpha: float = 1.0, beta: float = 1.0, modifier=None)

Bases: Scorer

Scoring function for similarity to a reference molecule. Tversky similarity between graphs. If both alpha and beta are set to 1, reduces to Tanimoto similarity.

getKey()

getScores(mols, frags=None)

Calculate the Tversky graph similarity scores for a list of molecules.

Parameters:

mols (list of rdkit molecules) – The molecules to be scored.
frags (list of rdkit molecules, optional) – The fragments used to generate the molecules, by default None.

Returns:

scores – The scores for the molecules.

Return type:

np.array

drugex.training.scorers.smiles module

scorers

Created by: Martin Sicho On: 03.06.22, 13:28

class drugex.training.scorers.smiles.SmilesChecker

Bases: object

static checkSmiles(smiles, frags=None, no_multifrag_smiles=True)

Checks whether the SMILES strings are valid and whether they contain the given fragments.

Parameters:

smiles (list of str) – List of SMILES strings to check.
frags (list of str, optional) – List of SMILES strings of fragments to check for.
no_multifrag_smiles (bool, optional) – If True, SMILES strings that contain more than one fragment will be marked as invalid.

Returns:

scores – Dataframe with the validity and accuracy of the SMILES strings.

Return type:

pd.DataFrame

Module contents

__init__.py

Created by: Martin Sicho On: 06.06.22, 20:00