qsprpred.data.chem.standardizers package
Submodules
qsprpred.data.chem.standardizers.base module
- exception qsprpred.data.chem.standardizers.base.ChemStandardizationException
Bases: Exception
Exception raised when standardization fails.
- add_note(note, /)
Add a note to the exception
- args
- with_traceback(tb, /)
Set self.__traceback__ to tb and return self.
- class qsprpred.data.chem.standardizers.base.ChemStandardizer
Bases: ABC
Standardizer to convert SMILES to a standardized form.
This class defines the interface of a uniquely identifiable standardizer. The getID method should return a unique identifier for the standardizer based on its settings. Standardizers that have the same ID should produce the same standardized form for a given SMILES.
The main method of the class is convertSMILES, which should convert a given SMILES to a standardized form based on the settings of the standardizer.
- abstract convertSMILES(smiles: str) → str | None
Convert the SMILES to a standardized form.
- Parameters:
smiles (str) – SMILES to be converted
- Returns:
The standardized SMILES string, or None if standardization fails or the molecule is deemed invalid.
- Return type:
str | None
- Raises:
ChemStandardizationException – if standardization fails in a way that the calling code should be notified of and handle.
- abstract classmethod fromSettings(settings: dict) → ChemStandardizer
Create a new standardizer from a settings dictionary.
- classmethod fromSettingsFile(path: str) → ChemStandardizer
Load the standardizer from a settings file in JSON format.
- Parameters:
path (str) – Path to the settings file.
- Returns:
The standardizer loaded from the settings file.
- Return type:
ChemStandardizer
- getHashID() → str
Get the hash ID of the standardizer. This is simply the MD5 hash of the unique identifier of the standardizer.
- Returns:
The hash ID of the standardizer
- Return type:
str
- abstract getID() → str
Return the unique identifier of the standardizer. This method should return a unique identifier based on the settings of the standardizer.
Two standardizers with the same settings should have the same ID and produce the same standardized form for a given SMILES.
- Returns:
The unique identifier of the standardizer.
- Return type:
str
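The contract described above (a settings-based getID, a getHashID that is the MD5 hash of that ID, and a convertSMILES that may return None) can be illustrated with a minimal, self-contained sketch. The classes below are hypothetical stand-ins and are not the qsprpred classes; the ID format (class name plus sorted JSON settings) and the UTF-8 encoding used before hashing are assumptions made for illustration only.

```python
import hashlib
import json
from abc import ABC, abstractmethod
from typing import Optional


class SketchStandardizer(ABC):
    """Hypothetical stand-in that mirrors the documented interface."""

    @abstractmethod
    def convertSMILES(self, smiles: str) -> Optional[str]:
        """Return the standardized SMILES, or None if it is rejected."""

    @abstractmethod
    def getID(self) -> str:
        """Return an identifier derived from the settings."""

    def getHashID(self) -> str:
        # MD5 hash of the unique identifier (UTF-8 encoding is an assumption).
        return hashlib.md5(self.getID().encode("utf-8")).hexdigest()


class StripWhitespace(SketchStandardizer):
    """Toy standardizer: trims whitespace and optionally rejects empty input."""

    def __init__(self, reject_empty: bool = True):
        self.reject_empty = reject_empty

    def convertSMILES(self, smiles: str) -> Optional[str]:
        cleaned = smiles.strip()
        if not cleaned and self.reject_empty:
            return None
        return cleaned

    def getID(self) -> str:
        # Assumed ID format: class name plus settings serialized with sorted keys,
        # so that equal settings always give an equal ID.
        settings = json.dumps({"reject_empty": self.reject_empty}, sort_keys=True)
        return f"{type(self).__name__}~{settings}"
```

Two instances built with the same settings share an ID and hash ID, while changing a setting changes both. This is the property that lets downstream code decide whether stored SMILES were produced by an equivalent standardizer.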
- class qsprpred.data.chem.standardizers.base.Standardizable
Bases: ABC
Interface for objects that use chemical standardization with ChemStandardizer objects.
- abstract applyStandardizer(standardizer: ChemStandardizer)
Apply a standardizer to the SMILES in the store.
- Parameters:
standardizer (ChemStandardizer) – The standardizer to apply
- abstract property standardizer: ChemStandardizer
Get the standardizer used by the store.
- Returns:
The standardizer used by the store.
- Return type:
ChemStandardizer
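As a rough illustration of the Standardizable interface, the sketch below shows a toy in-memory store that applies a standardizer to its SMILES and remembers which one was used. ToyStore and TrimStandardizer are hypothetical names, not part of qsprpred, and dropping entries for which convertSMILES returns None is an assumption about how a store might handle failures.

```python
from typing import List, Optional


class TrimStandardizer:
    """Hypothetical toy standardizer that only exposes convertSMILES."""

    def convertSMILES(self, smiles: str) -> Optional[str]:
        cleaned = smiles.strip()
        # An empty string is treated as invalid and reported as None.
        return cleaned or None


class ToyStore:
    """Hypothetical store following the applyStandardizer/standardizer pattern."""

    def __init__(self, smiles: List[str]):
        self.smiles = list(smiles)
        self._standardizer = None

    @property
    def standardizer(self):
        # The standardizer most recently applied to this store (None if none yet).
        return self._standardizer

    def applyStandardizer(self, standardizer) -> None:
        converted = (standardizer.convertSMILES(s) for s in self.smiles)
        # Assumption: entries that fail standardization are dropped.
        self.smiles = [s for s in converted if s is not None]
        self._standardizer = standardizer
```

Keeping a reference to the applied standardizer lets the store report later which settings produced its current SMILES.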
qsprpred.data.chem.standardizers.check_smiles module
- class qsprpred.data.chem.standardizers.check_smiles.CheckSmilesValid(id_prop: str | None = 'ID')
Bases: MolProcessorWithID
Processor to check the validity of the SMILES.
Initialize the processor with the name of the property that contains the molecule’s unique identifier.
- Parameters:
id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “ID”.
- iterMolsAndIDs(mols, props: dict[str, list] | None)
Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.
- class qsprpred.data.chem.standardizers.check_smiles.ValidationStandardizer
Bases: ChemStandardizer
Standardizer that checks the validity of the SMILES by attempting to sanitize the molecule using RDKit.
- Variables:
checker (CheckSmilesValid) – Processor to check the validity of the SMILES
Initialize the standardizer.
- Raises:
ValueError – If the SMILES is invalid
- classmethod fromSettings(settings: dict) → ValidationStandardizer
Create a standardizer from settings. In this case, the settings are ignored.
- Parameters:
settings (dict) – Settings of the standardizer
- Returns:
The standardizer created from settings
- Return type:
ValidationStandardizer
- classmethod fromSettingsFile(path: str) → ChemStandardizer
Load the standardizer from a settings file in JSON format.
- Parameters:
path (str) – Path to the settings file.
- Returns:
The standardizer loaded from the settings file.
- Return type:
ChemStandardizer
- getHashID() → str
Get the hash ID of the standardizer. This is simply the MD5 hash of the unique identifier of the standardizer.
- Returns:
The hash ID of the standardizer
- Return type:
str
- getID()
Return the unique identifier of the standardizer. In this case, it is just “ValidationStandardizer”. There are no settings to consider.
- property settings
Settings of the standardizer. Empty in this case since there is nothing to set except the default settings.
qsprpred.data.chem.standardizers.chembl module
- class qsprpred.data.chem.standardizers.chembl.ChemblStandardizer(isomeric_smiles: bool = True, sanitize: bool = True)
Bases: ChemStandardizer
Standardizer using the ChEMBL standardizer.
Initialize the ChEMBL standardizer.
- Parameters:
isomeric_smiles (bool) – Defaults to True.
sanitize (bool) – Defaults to True.
- classmethod fromSettings(settings: dict) → ChemblStandardizer
Create a standardizer from settings.
- Parameters:
settings (dict) – Settings of the standardizer
- Returns:
The standardizer created from settings
- Return type:
ChemblStandardizer
- classmethod fromSettingsFile(path: str) → ChemStandardizer
Load the standardizer from a settings file in JSON format.
- Parameters:
path (str) – Path to the settings file.
- Returns:
The standardizer loaded from the settings file.
- Return type:
ChemStandardizer
- getHashID() → str
Get the hash ID of the standardizer. This is simply the MD5 hash of the unique identifier of the standardizer.
- Returns:
The hash ID of the standardizer
- Return type:
str
qsprpred.data.chem.standardizers.naive module
- class qsprpred.data.chem.standardizers.naive.NaiveStandardizer
Bases: ChemStandardizer
Naive standardizer
Briefly, the standardization process involves disconnecting metals, normalizing, removing salts (keeping the largest fragment), and removing charges. See qsprpred.data.chem.standardizers.naive.standardize_mol for more details.
- classmethod fromSettings(settings: dict) → NaiveStandardizer
Create a naive standardizer from settings. In this case, the settings are ignored.
- Parameters:
settings (dict) – settings of the standardizer
- Returns:
a naive standardizer
- Return type:
NaiveStandardizer
- classmethod fromSettingsFile(path: str) → ChemStandardizer
Load the standardizer from a settings file in JSON format.
- Parameters:
path (str) – Path to the settings file.
- Returns:
The standardizer loaded from the settings file.
- Return type:
ChemStandardizer
- getHashID() → str
Get the hash ID of the standardizer. This is simply the MD5 hash of the unique identifier of the standardizer.
- Returns:
The hash ID of the standardizer
- Return type:
str
- qsprpred.data.chem.standardizers.naive.standardize_mol(mol) → str | None
Standardizes SMILES and removes fragments
Standardizes SMILES using RDKit MolStandardize to disconnect metals, normalize, remove salts (keeping the largest fragment), and uncharge. This is followed by a second round of disconnecting metals and normalizing. Finally, the SMILES is canonicalized.
- Parameters:
mol (rdkit.Chem.rdchem.Mol) – RDKit molecule object
- Returns:
Standardized SMILES, or None if the SMILES could not be standardized, does not contain carbon, or still contains salts after standardization.
- Return type:
(str | None)
qsprpred.data.chem.standardizers.papyrus module
- class qsprpred.data.chem.standardizers.papyrus.PapyrusStandardizer(keep_stereo: bool = True, canonize: bool = True, mixture_handling: Literal['keep_largest', 'filter', 'keep'] = 'keep_largest', remove_additional_salts: bool = True, remove_additional_metals: bool = True, filter_inorganic: bool = False, filter_non_small_molecule: bool = True, small_molecule_min_mw: float = 200, small_molecule_max_mw: float = 800, canonicalize_tautomer: bool = True, tautomer_max_tautomers: int = 4294967295, extra_organic_atoms: list | None = None, extra_metals: list | None = None, extra_salts: list | None = None, uncharge: bool = True)
Bases: ChemStandardizer
Papyrus standardizer
Uses the Papyrus (>v05.6) standardization protocol to standardize SMILES.
Béquignon, O.J.M., Bongers, B.J., Jespers, W. et al. Papyrus: a large-scale curated dataset aimed at bioactivity predictions. J Cheminform 15, 3 (2023). https://doi.org/10.1186/s13321-022-00672-x
- Variables:
settings (dict) – Settings of the standardizer
Initialize Papyrus standardizer
- Parameters:
keep_stereo (bool, optional) – Keep stereochemistry.
canonize (bool, optional) – Canonicalize SMILES.
mixture_handling (Literal["keep_largest", "filter", "keep"], optional) – How to handle mixtures. Defaults to “keep_largest”.
remove_additional_salts (bool, optional) – Removes a custom set of fragments if present in the molecule object.
remove_additional_metals (bool, optional) – Removes metal fragments if present in the molecule object. Ignored if remove_additional_salts is set to False.
filter_inorganic (bool, optional) – Filter inorganic molecules.
filter_non_small_molecule (bool, optional) – Filter non-small molecules.
small_molecule_min_mw (float, optional) – Minimum molecular weight of small molecules.
small_molecule_max_mw (float, optional) – Maximum molecular weight of small molecules.
canonicalize_tautomer (bool, optional) – Canonicalize tautomers.
tautomer_max_tautomers (int, optional) – Maximum number of tautomers to consider by the tautomer search algorithm (<2^32).
extra_organic_atoms (list, optional) – Extra organic atoms to consider in addition to the default set (Papyrus_standardizer.ORGANIC_ATOMS).
extra_metals (list, optional) – Extra metals to consider in addition to the default set (Papyrus_standardizer.METALS).
extra_salts (list, optional) – Extra salts to consider in addition to the default set (Papyrus_standardizer.SALTS).
uncharge (bool, optional) – Uncharge molecules.
- convertSMILES(smiles: str, verbose: bool = False) → str | None
Standardize SMILES using Papyrus standardization protocol.
- fromSettings(settings: dict) → PapyrusStandardizer
Create a Papyrus standardizer from settings.
- Parameters:
settings (dict) – settings of the standardizer
- Returns:
a Papyrus standardizer
- Return type:
PapyrusStandardizer
- classmethod fromSettingsFile(path: str) → ChemStandardizer
Load the standardizer from a settings file in JSON format.
- Parameters:
path (str) – Path to the settings file.
- Returns:
The standardizer loaded from the settings file.
- Return type:
ChemStandardizer
- getHashID() → str
Get the hash ID of the standardizer. This is simply the MD5 hash of the unique identifier of the standardizer.
- Returns:
The hash ID of the standardizer
- Return type:
str
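Since the standardizer is configured entirely through keyword settings and can be loaded from a JSON settings file, a settings round trip might look like the sketch below. It uses only the standard library. The helper names save_settings and load_settings are hypothetical, and the assumption that a settings file is a flat JSON object of constructor arguments is not confirmed by this page. The keys and values are a subset of the defaults listed in the signature above.

```python
import json
import os
import tempfile

# Subset of the documented constructor arguments with their documented defaults.
papyrus_settings = {
    "keep_stereo": True,
    "canonize": True,
    "mixture_handling": "keep_largest",
    "filter_non_small_molecule": True,
    "small_molecule_min_mw": 200,
    "small_molecule_max_mw": 800,
    "uncharge": True,
}


def save_settings(settings: dict, path: str) -> None:
    """Hypothetical helper: write settings as JSON with a stable key order."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(settings, handle, sort_keys=True, indent=2)


def load_settings(path: str) -> dict:
    """Hypothetical helper: read the settings dictionary back from JSON."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp_dir:
    settings_path = os.path.join(tmp_dir, "papyrus_settings.json")
    save_settings(papyrus_settings, settings_path)
    restored = load_settings(settings_path)

# With qsprpred installed, the restored dictionary could presumably be passed to
# the documented fromSettings method; that step is omitted here because the
# exact settings layout the library expects is an assumption.
```

Writing the keys in sorted order keeps the file stable across runs, which helps when settings files are compared or version-controlled.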
Module contents
- class qsprpred.data.chem.standardizers.ChemStandardizer
Bases: ABC
Standardizer to convert SMILES to a standardized form.
This class defines the interface of a uniquely identifiable standardizer. The getID method should return a unique identifier for the standardizer based on its settings. Standardizers that have the same ID should produce the same standardized form for a given SMILES.
The main method of the class is convertSMILES, which should convert a given SMILES to a standardized form based on the settings of the standardizer.
- abstract convertSMILES(smiles: str) → str | None
Convert the SMILES to a standardized form.
- Parameters:
smiles (str) – SMILES to be converted
- Returns:
The standardized SMILES string, or None if standardization fails or the molecule is deemed invalid.
- Return type:
str | None
- Raises:
ChemStandardizationException – if standardization fails in a way that the calling code should be notified of and handle.
- abstract classmethod fromSettings(settings: dict) → ChemStandardizer
Create a new standardizer from a settings dictionary.
- classmethod fromSettingsFile(path: str) → ChemStandardizer
Load the standardizer from a settings file in JSON format.
- Parameters:
path (str) – Path to the settings file.
- Returns:
The standardizer loaded from the settings file.
- Return type:
ChemStandardizer
- getHashID() → str
Get the hash ID of the standardizer. This is simply the MD5 hash of the unique identifier of the standardizer.
- Returns:
The hash ID of the standardizer
- Return type:
str
- abstract getID() → str
Return the unique identifier of the standardizer. This method should return a unique identifier based on the settings of the standardizer.
Two standardizers with the same settings should have the same ID and produce the same standardized form for a given SMILES.
- Returns:
The unique identifier of the standardizer.
- Return type:
str