qsprpred.extra.data.descriptors package

Submodules

qsprpred.extra.data.descriptors.fingerprints module

Extra fingerprints from various packages:

CDKFP: CDK fingerprint
CDKExtendedFP: CDK extended fingerprint
CDKEStateFP: CDK EState fingerprint
CDKGraphOnlyFP: CDK fingerprint ignoring bond orders
CDKMACCSFP: CDK MACCS fingerprint
CDKPubchemFP: CDK PubChem fingerprint
CDKSubstructureFP: CDK Substructure fingerprint
CDKAtomPairs2DFP: CDK hashed atom pair fingerprint
CDKKlekotaRothFP: CDK hashed Klekota-Roth fingerprint

class qsprpred.extra.data.descriptors.fingerprints.CDKAtomPairs2DFP(use_counts: bool = False)[source]

Bases: Fingerprint

CDK atom pairs and topological fingerprint.

Variables:: useCounts (bool) – whether to use counts instead of presence/absence

Initialise the fingerprint.

Parameters:: use_counts – whether to use counts instead of presence/absence
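As a rough illustration of what `use_counts` toggles, the sketch below encodes per-pattern match counts either as a presence/absence vector or as a count vector. It is a plain-Python toy that does not call CDK; the helper name `encode_matches` and the example counts are hypothetical.

```python
def encode_matches(match_counts: list[int], use_counts: bool = False) -> list[int]:
    """Encode pattern match counts as counts or as presence/absence bits.

    Hypothetical helper for illustration only; it does not call CDK.
    """
    if use_counts:
        # Keep how many times each pattern matched.
        return list(match_counts)
    # Collapse every non-zero count to 1 (present) and zero to 0 (absent).
    return [1 if count > 0 else 0 for count in match_counts]


# Made-up match counts for four patterns in one molecule.
counts = [0, 3, 1, 0]
binary_fp = encode_matches(counts)
count_fp = encode_matches(counts, use_counts=True)
```

With the default setting, the pattern that matched three times is indistinguishable from the one that matched once; with `use_counts=True` that difference is kept.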

property descriptors: list[str]: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any]], *args, **kwargs) → ndarray[source]

Return the CDK atom pairs and topological fingerprint for the input molecules.

Parameters:: mols (list[Chem.Mol]) – molecules to obtain the fingerprint of
Returns:: np.ndarray of fingerprints for mols
Return type:: np.ndarray

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules
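The generator-or-list behaviour controlled by `to_list` can be sketched as follows. This is not the library's implementation: `iter_parsed` and `fake_parse` are hypothetical names, and the stand-in parser replaces what would be a SMILES parser such as RDKit's `Chem.MolFromSmiles` in real use, so that the sketch runs without RDKit.

```python
from typing import Callable, Generator, Iterable, TypeVar, Union

T = TypeVar("T")


def iter_parsed(
    items: Iterable[Union[str, T]],
    parse: Callable[[str], T],
    to_list: bool = False,
) -> Union[list[T], Generator[T, None, None]]:
    """Yield parsed objects lazily, or return them all in a list."""
    # Strings are parsed; anything else is assumed to be parsed already.
    generator = (parse(item) if isinstance(item, str) else item for item in items)
    return list(generator) if to_list else generator


def fake_parse(smiles: str) -> tuple[str, int]:
    """Stand-in for a SMILES parser: pairs the string with its length."""
    return (smiles, len(smiles))


mixed_input = ["CCO", ("c1ccccc1", 8)]  # one raw string, one "already parsed" item
lazy = iter_parsed(mixed_input, fake_parse)
eager = iter_parsed(mixed_input, fake_parse, to_list=True)
first = next(lazy)  # nothing is parsed until the generator is consumed
```

The generator form avoids holding every parsed molecule in memory at once, while `to_list=True` is convenient when the result is iterated more than once.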

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

property supportsParallel: bool: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)
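The contract of the four serialization methods (`toJSON`, `fromJSON`, `toFile`, `fromFile`) can be illustrated with a small round trip. The class below is a hypothetical stand-in written with the standard library only; it mirrors the documented signatures and return values but says nothing about how the library stores its state internally.

```python
import json
import os
import tempfile


class SerializableSketch:
    """Hypothetical stand-in for a JSON-serializable descriptor object."""

    def __init__(self, use_counts: bool = False):
        self.use_counts = use_counts

    def toJSON(self) -> str:
        # Store everything needed to rebuild the object.
        return json.dumps({"use_counts": self.use_counts})

    @classmethod
    def fromJSON(cls, json_str: str) -> "SerializableSketch":
        # Named json_str here to avoid shadowing the json module.
        return cls(**json.loads(json_str))

    def toFile(self, filename: str) -> str:
        with open(filename, "w", encoding="utf-8") as handle:
            handle.write(self.toJSON())
        # Mirror the documented return value: the absolute path of the file.
        return os.path.abspath(filename)

    @classmethod
    def fromFile(cls, filename: str) -> "SerializableSketch":
        with open(filename, encoding="utf-8") as handle:
            return cls.fromJSON(handle.read())


with tempfile.TemporaryDirectory() as tmp_dir:
    path = SerializableSketch(use_counts=True).toFile(os.path.join(tmp_dir, "fp.json"))
    restored = SerializableSketch.fromFile(path)
```

The restored object carries the same settings as the one that was saved, which is the property the "all data necessary to reconstruct the object" wording asks for.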

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs
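The effect of this treatment can be shown on plain nested lists. The sketch below is standard-library only and uses a hypothetical helper name, `replace_infs`; the documented method operates on a DataFrame instead.

```python
import math


def replace_infs(rows: list[list[float]]) -> list[list[float]]:
    """Return a copy of `rows` with +inf and -inf replaced by NaN."""
    return [
        [math.nan if math.isinf(value) else value for value in row]
        for row in rows
    ]


table = [[1.0, math.inf], [-math.inf, 2.5]]
cleaned = replace_infs(table)  # finite values are left untouched
```

Turning infinities into NaNs lets downstream steps handle both kinds of unusable value through a single missing-value path.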

property usedBits: list[int] | None

class qsprpred.extra.data.descriptors.fingerprints.CDKEStateFP[source]

Bases: Fingerprint

CDK EState fingerprint.

Initialize the processor with the name of the property that contains the molecule’s unique identifier.

Parameters:: id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “QSPRID”.

property descriptors: list[str]: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any]], *args, **kwargs) → ndarray[source]

Return the CDK EState fingerprint for the input molecules.

Parameters:: mols (list[Chem.Mol]) – molecules to obtain the fingerprint of
Returns:: np.ndarray of fingerprints for mols
Return type:: np.ndarray

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

property supportsParallel: bool: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs

property usedBits: list[int] | None

class qsprpred.extra.data.descriptors.fingerprints.CDKExtendedFP[source]

Bases: Fingerprint

CDK extended fingerprint with 25 additional ring features and isotopic masses.

Initialize the processor with the name of the property that contains the molecule’s unique identifier.

Parameters:: id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “QSPRID”.

property descriptors: list[str]: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any]], *args, **kwargs) → ndarray[source]

Return the CDK extended fingerprint for the input molecules.

Parameters:: mols (list[Chem.Mol]) – molecules to obtain the fingerprint of
Returns:: np.ndarray of fingerprints for mols
Return type:: np.ndarray

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

property supportsParallel: bool: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs

property usedBits: list[int] | None

class qsprpred.extra.data.descriptors.fingerprints.CDKFP(size: int = 1024, search_depth: int = 7)[source]

Bases: Fingerprint

The CDK fingerprint.

Variables:

size (int) – size of the fingerprint
searchDepth (int) – search depth of the fingerprint

Initialize the CDK fingerprint.

Parameters:

size (int) – size of the fingerprint
search_depth (int) – search depth of the fingerprint
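To give a feel for what `size` and `search_depth` control in a hashed path fingerprint, here is a toy sketch. It is an assumption-laden illustration, not the CDK algorithm: the function `toy_path_fingerprint`, the string labels, and the use of `zlib.crc32` as the hash are all hypothetical choices made so that the example is deterministic and runs without CDK.

```python
import zlib


def toy_path_fingerprint(
    atoms: list[str],
    bonds: list[tuple[int, int]],
    size: int = 1024,
    search_depth: int = 7,
) -> list[int]:
    """Hash every simple path of up to `search_depth` bonds into `size` bits."""
    neighbours: dict[int, list[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    bits = [0] * size

    def walk(path: list[int]) -> None:
        # Hash the atom symbols along the current path into one bit position.
        label = "-".join(atoms[i] for i in path)
        bits[zlib.crc32(label.encode("utf-8")) % size] = 1
        if len(path) - 1 >= search_depth:
            return  # the path already spans `search_depth` bonds
        for nxt in neighbours[path[-1]]:
            if nxt not in path:  # simple paths only, no revisiting atoms
                walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return bits


# Heavy atoms of ethanol as a toy graph: C-C-O
atoms, bonds = ["C", "C", "O"], [(0, 1), (1, 2)]
deep = toy_path_fingerprint(atoms, bonds, size=64, search_depth=2)
shallow = toy_path_fingerprint(atoms, bonds, size=64, search_depth=0)
```

In this toy, a larger `search_depth` can only add bits, because longer paths are hashed in addition to the shorter ones, and a larger `size` lowers the chance that two different paths collide on the same bit.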

property descriptors: list[str]: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any]], *args, **kwargs) → ndarray[source]

Return the CDK fingerprint for the input molecules.

Parameters:: mols (list[Chem.Mol]) – molecules to obtain the fingerprint of
Returns:: np.ndarray of fingerprints for mols
Return type:: np.ndarray

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

property settings

property supportsParallel: bool: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs

property usedBits: list[int] | None

class qsprpred.extra.data.descriptors.fingerprints.CDKGraphOnlyFP(size: int = 1024, search_depth: int = 7)[source]

Bases: Fingerprint

CDK fingerprint ignoring bond orders.

Variables:

size (int) – Number of bits in the CDK fingerprints (ignored for others)
searchDepth (int) – Search depth for the CDK fingerprints (ignored for others)

Initialize the processor with the name of the property that contains the molecule’s unique identifier.

Parameters:: id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “QSPRID”.

property descriptors: list[str]: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any]], *args, **kwargs) → ndarray[source]

Return the CDK graph-only fingerprint for the input molecules.

Parameters:: mols (list[Chem.Mol]) – molecules to obtain the fingerprint of
Returns:: np.ndarray of fingerprints for mols
Return type:: np.ndarray

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

property supportsParallel: bool: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs

property usedBits: list[int] | None

class qsprpred.extra.data.descriptors.fingerprints.CDKKlekotaRothFP(use_counts: bool = False)[source]

Bases: Fingerprint

CDK Klekota & Roth fingerprint.

Initialise the fingerprint.

Parameters:: use_counts (bool) – whether to use counts instead of presence/absence

property descriptors: list[str]: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any]], *args, **kwargs) → ndarray[source]

Return the CDK Klekota & Roth fingerprint for the input molecules.

Parameters:: mols (list[Chem.Mol]) – molecules to obtain the fingerprint of
Returns:: np.ndarray of fingerprints for mols
Return type:: np.ndarray

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

property supportsParallel: bool: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs

property usedBits: list[int] | None

class qsprpred.extra.data.descriptors.fingerprints.CDKMACCSFP[source]

Bases: Fingerprint

CDK MACCS fingerprint.

Initialize the processor with the name of the property that contains the molecule’s unique identifier.

Parameters:: id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “QSPRID”.

property descriptors: list[str]: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any]], *args, **kwargs) → ndarray[source]

Return the CDK MACCS fingerprint for the input molecules.

Parameters:: mols (list[Chem.Mol]) – molecules to obtain the fingerprint of
Returns:: np.ndarray of fingerprints for mols
Return type:: np.ndarray

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

property supportsParallel: bool: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs

property usedBits: list[int] | None

class qsprpred.extra.data.descriptors.fingerprints.CDKPubchemFP[source]

Bases: Fingerprint

CDK PubChem fingerprint.

Initialize the processor with the name of the property that contains the molecule’s unique identifier.

Parameters:: id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “QSPRID”.

property descriptors: list[str]: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any]], *args, **kwargs) → ndarray[source]

Return the CDK PubChem fingerprint for the input molecules.

Parameters:: mols – molecules to obtain the fingerprint of
Returns:: np.ndarray of fingerprints for mols
Return type:: np.ndarray

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

property supportsParallel: bool: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs

property usedBits: list[int] | None

class qsprpred.extra.data.descriptors.fingerprints.CDKSubstructureFP(use_counts: bool = False)[source]

Bases: Fingerprint

CDK Substructure fingerprint.

Based on SMARTS patterns for functional group classification by Christian Laggner.

Variables:: useCounts (bool) – whether to use counts instead of presence/absence

Initialize the processor with the name of the property that contains the molecule’s unique identifier.

Parameters:: id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “QSPRID”.

property descriptors: list[str]: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any]], *args, **kwargs) → ndarray[source]

Return the CDK Substructure fingerprint for the input molecules.

Parameters:: mols (list[Chem.Mol]) – molecules to obtain the fingerprint of
Returns:: np.ndarray of fingerprints for mols
Return type:: np.ndarray

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

property supportsParallel: bool: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs

property usedBits: list[int] | None

qsprpred.extra.data.descriptors.sets module

Module with definitions of various extra descriptor sets:

Mordred: Descriptors from molecular descriptor calculation software Mordred.
Mold2: Descriptors from molecular descriptor calculation software Mold2.
PaDEL: Descriptors from molecular descriptor calculation software PaDEL.
ProDec: Protein descriptors from the ProDec package.

class qsprpred.extra.data.descriptors.sets.ExtendedValenceSignature(depth: int | list[int])[source]

Bases: DescriptorSet

SMILES signature based on extended valence sequences.

From: Jean-Loup Faulon, Donald P. Visco, and Ramdas S. Pophale. The Signature Molecular Descriptor. 1. Using Extended Valence Sequences in QSAR and QSPR Studies. Journal of Chemical Information and Computer Sciences 2003, 43 (3), 707-720. DOI: 10.1021/ci020345w

Initialize an ExtendedValenceSignature calculator.

Parameters:: depth – depth of the signature

property descriptors: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any]], *args, **kwargs) → ndarray[source]

Method to calculate descriptors for a list of molecules.

This method should use molecules as they are, without any preparation. Any preparation steps should be defined in the DescriptorSet.prepMols method, which is picked up by the main DescriptorSet.__call__.

Parameters:

mols (list) – list of SMILES or RDKit molecules
props (dict) – dictionary of properties
*args – positional arguments
**kwargs – keyword arguments

Returns:

numpy array of descriptor values of shape (n_mols, n_descriptors)
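The division of labour described here, with `__call__` running `prepMols` first and then handing the prepared molecules to `getDescriptors`, can be sketched with a small stand-in hierarchy. Both classes below are hypothetical and use only the standard library: SMILES strings stand in for RDKit molecules and nested lists stand in for the numpy array, so this shows the call flow rather than the library's actual base class.

```python
class DescriptorSetSketch:
    """Hypothetical base class mirroring the call flow described above."""

    def prepMols(self, mols: list[str]) -> list[str]:
        # Default preparation: use the inputs unchanged.
        return mols

    def getDescriptors(self, mols: list[str], props: dict[str, list]) -> list[list[float]]:
        raise NotImplementedError

    def __call__(self, mols: list[str], props: dict[str, list]) -> list[list[float]]:
        # Preparation happens here, so getDescriptors can use molecules as they are.
        return self.getDescriptors(self.prepMols(mols), props)


class AtomCountSketch(DescriptorSetSketch):
    """Toy descriptor set: counts 'C' and 'O' characters in a SMILES string."""

    def prepMols(self, mols: list[str]) -> list[str]:
        return [smiles.strip() for smiles in mols]

    def getDescriptors(self, mols: list[str], props: dict[str, list]) -> list[list[float]]:
        # One row per molecule, one column per descriptor.
        return [[float(s.count("C")), float(s.count("O"))] for s in mols]


values = AtomCountSketch()([" CCO", "CC(=O)O "], props={})
n_mols, n_descriptors = len(values), len(values[0])
```

Keeping preparation out of `getDescriptors` means a subclass can change how molecules are cleaned up without touching the descriptor calculation itself.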

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

property supportsParallel: bool: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs

class qsprpred.extra.data.descriptors.sets.Mold2(descs: list[str] | None = None)[source]

Bases: DescriptorSet

Descriptors from molecular descriptor calculation software Mold2.

From https://github.com/OlivierBeq/Mold2_pywrapper. Initialize the descriptor with no arguments. All descriptors are always calculated.

Parameters:: descs – names of Mold2 descriptors to be calculated (e.g. D001)

Initialize a Mold2 descriptor calculator.

Parameters:: descs (list[str] | None) – names of Mold2 descriptors to be calculated (e.g. D001)

property descriptors: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any]], *args, **kwargs) → ndarray[source]

Method to calculate descriptors for a list of molecules.

This method should use molecules as they are, without any preparation. Any preparation steps should be defined in the DescriptorSet.prepMols method, which is picked up by the main DescriptorSet.__call__.

Parameters:

mols (list) – list of SMILES or RDKit molecules
props (dict) – dictionary of properties
*args – positional arguments
**kwargs – keyword arguments

Returns:

numpy array of descriptor values of shape (n_mols, n_descriptors)

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules
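The generator-or-list behaviour controlled by to_list can be sketched without RDKit. In this minimal illustration, iter_items and its parse argument are hypothetical names, and str.upper merely stands in for a SMILES parser such as RDKit's Chem.MolFromSmiles; it is not the QSPRpred implementation.

```python
def iter_items(items, parse, to_list=False):
    """Hypothetical helper: parse strings lazily, pass other objects through."""
    generator = (parse(item) if isinstance(item, str) else item for item in items)
    # Materialize only when the caller asks for a list.
    return list(generator) if to_list else generator


# str.upper is a stand-in for a real SMILES parser (assumption for illustration).
lazy = iter_items(["cco", "c1ccccc1"], parse=str.upper)
eager = iter_items(["cco", 42], parse=str.upper, to_list=True)
first = next(lazy)  # nothing is parsed until the generator is consumed
```

A generator keeps memory use flat for large inputs, while to_list=True is convenient when the molecules are needed more than once.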

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

property supportsParallel: bool: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)
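The toJSON/fromJSON contract is a round trip: whatever toJSON emits must be enough for fromJSON to rebuild an equivalent object. The sketch below shows that idea with the standard json module; ToyDescriptorSet is a hypothetical stand-in and does not reflect how QSPRpred serializes its classes internally.

```python
import json


class ToyDescriptorSet:
    """Hypothetical stand-in; not the QSPRpred implementation."""

    def __init__(self, descs=None):
        self.descs = descs

    def toJSON(self) -> str:
        # Store every attribute needed to rebuild the object.
        return json.dumps({"descs": self.descs})

    @classmethod
    def fromJSON(cls, json_str: str) -> "ToyDescriptorSet":
        # Feed the stored attributes back into the constructor.
        return cls(**json.loads(json_str))


original = ToyDescriptorSet(descs=["D001", "D002"])
restored = ToyDescriptorSet.fromJSON(original.toJSON())
```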

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs

class qsprpred.extra.data.descriptors.sets.Mordred(descs: list[str] | None = None, version: str | None = None, ignore_3D: bool = False, config: str | None = None)[source]

Bases: DescriptorSet

Descriptors from molecular descriptor calculation software Mordred.

From https://github.com/mordred-descriptor/mordred.

Variables:

descs (list[str]) – List of Mordred descriptor names.
version (str) – version of mordred
ignore_3D (bool) – ignore 3D information
config (str) – path to config file if available

Initialize the descriptor with the same arguments as you would pass to the DescriptorsCalculator function of Mordred. The only difference is the descs argument, which can also be a list of Mordred descriptor names instead of a Mordred descriptor module.

Parameters:

descs (list[str]) – List of Mordred descriptor names, a Mordred descriptor module or None for all mordred descriptors
version (str) – version of mordred
ignore_3D (bool) – ignore 3D information
config (str) – path to config file, if available

property descriptors: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any]], *args, **kwargs) → ndarray[source]

Method to calculate descriptors for a list of molecules.

This method should use molecules as they are, without any preparation. Any preparation steps should be defined in the DescriptorSet.prepMols method, which is picked up by the main DescriptorSet.__call__.

Parameters:

mols (list) – list of SMILES or RDKit molecules
props (dict) – dictionary of properties
*args – positional arguments
**kwargs – keyword arguments

Returns:

numpy array of descriptor values of shape (n_mols, n_descriptors)

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

property supportsParallel: bool: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)
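The file-based pair works the same way as the string-based one, with the documented detail that toFile returns the absolute path of the file it wrote. A minimal sketch under that assumption follows; ToyDescriptorSet and the descriptor names are hypothetical, and the real classes may store more than shown here.

```python
import json
import os
import tempfile


class ToyDescriptorSet:
    """Hypothetical stand-in; not the QSPRpred implementation."""

    def __init__(self, descs=None):
        self.descs = descs

    def toFile(self, filename: str) -> str:
        with open(filename, "w", encoding="utf-8") as handle:
            json.dump({"descs": self.descs}, handle)
        # Mirror the documented contract: return the absolute path.
        return os.path.abspath(filename)

    @classmethod
    def fromFile(cls, filename: str) -> "ToyDescriptorSet":
        with open(filename, encoding="utf-8") as handle:
            return cls(**json.load(handle))


with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = ToyDescriptorSet(descs=["desc_a", "desc_b"]).toFile(
        os.path.join(tmp_dir, "toy_set.json")
    )
    reloaded = ToyDescriptorSet.fromFile(saved_path)
```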

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs
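The transformation itself is simple: positive and negative infinity become NaN, and every other value is left alone. The dependency-free sketch below applies that rule to a plain dict of columns rather than a DataFrame; treat_infs is a hypothetical name, and the documented method operates on pandas objects instead.

```python
import math


def treat_infs(columns: dict[str, list[float]]) -> dict[str, list[float]]:
    """Return a copy in which +inf and -inf are replaced by NaN."""
    return {
        name: [math.nan if math.isinf(value) else value for value in values]
        for name, values in columns.items()
    }


cleaned = treat_infs({"d1": [1.0, math.inf], "d2": [-math.inf, 0.5]})
```

Mapping infinities to NaN lets downstream steps handle both kinds of invalid value with a single missing-data rule.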

class qsprpred.extra.data.descriptors.sets.PaDEL(descs: list[str] | None = None, ignore_3d: bool = True, n_jobs: int | None = None)[source]

Bases: DescriptorSet

Descriptors from molecular descriptor calculation software PaDEL.

From https://github.com/OlivierBeq/PaDEL_pywrapper.

Variables:: descriptors (list[str]) – list of PaDEL descriptor names

Initialize a PaDEL calculator.

Parameters:

descs – list of PaDEL descriptor short names
ignore_3d (bool) – skip 3D descriptor calculation

property descriptors: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any]], *args, **kwargs) → ndarray[source]

Method to calculate descriptors for a list of molecules.

This method should use molecules as they are, without any preparation. Any preparation steps should be defined in the DescriptorSet.prepMols method, which is picked up by the main DescriptorSet.__call__.

Parameters:

mols (list) – list of SMILES or RDKit molecules
props (dict) – dictionary of properties
*args – positional arguments
**kwargs – keyword arguments

Returns:

numpy array of descriptor values of shape (n_mols, n_descriptors)

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

property supportsParallel: bool: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs

class qsprpred.extra.data.descriptors.sets.ProDec(sets: list[str] | None = None, msa_provider: ~qsprpred.extra.data.utils.msa_calculator.MSAProvider = <qsprpred.extra.data.utils.msa_calculator.ClustalMSA object>)[source]

Bases: ProteinDescriptorSet

Protein descriptors from the ProDec package.

See https://github.com/OlivierBeq/ProDEC.

Variables:

sets (list[str]) – list of ProDec descriptor names (see https://github.com/OlivierBeq/ProDEC)
factory (prodec.ProteinDescriptors) – factory to calculate descriptors

Initialize a ProDec calculator.

Parameters:: sets – list of ProDec descriptor names, if None, all available are used (see https://github.com/OlivierBeq/ProDEC)

static calculateDescriptor(factory: ProteinDescriptors, msa: dict[str, str], descriptor: str)[source]

Calculate a protein descriptor for given targets using a given multiple sequence alignment.

Parameters:

factory (ProteinDescriptors) – factory to create the descriptor
msa (dict) – mapping of accession keys to sequences from the multiple sequence alignment
descriptor (str) – name of the descriptor to calculate (see https://github.com/OlivierBeq/ProDEC)

Returns:

a data frame of descriptor values of shape (acc_keys, n_descriptors), indexed by acc_keys

property descriptors: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any] | dict[str, str]], *args, **kwargs) → ndarray

Get array of calculated protein descriptors for given targets.

Parameters:

mols (list[Mol]) – list of molecules, not used
props (dict[str, list[Any] | dict[str, str]]) – dictionary of properties for the molecules, including the accession keys
*args – additional arguments, not used
**kwargs – additional keyword arguments, passed to getProteinDescriptors

Returns:

array of calculated protein descriptors

Return type:

np.ndarray

getProteinDescriptors(acc_keys: list[str], sequences: dict[str, str] | None = None, **kwargs) → DataFrame[source]

Calculate the protein descriptors for a given target.

Parameters:

acc_keys – target accession keys, defines the resulting index of the returned pd.DataFrame
sequences – optional mapping of accession keys to protein sequences
**kwargs – any additional data passed from ProteinDescriptorCalculator

Returns:

a data frame of descriptor values of shape (acc_keys, n_descriptors), indexed by acc_keys

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

supportsParallel() → bool: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs

class qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet(id_prop: str | None = None)[source]

Bases: DescriptorSet

Abstract base class for protein descriptor sets.

Initialize the processor with the name of the property that contains the molecule’s unique identifier.

Parameters:: id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “QSPRID”.

abstract property descriptors: list[str]: Return a list of current descriptor names.

property dtype: Convert the descriptor values to this type.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getDescriptors(mols: list[rdkit.Chem.rdchem.Mol], props: dict[str, list[Any] | dict[str, str]], *args, **kwargs) → ndarray[source]

Get array of calculated protein descriptors for given targets.

Parameters:

mols (list[Mol]) – list of molecules, not used
props (dict[str, list[Any] | dict[str, str]]) – dictionary of properties for the molecules, including the accession keys
*args – additional arguments, not used
**kwargs – additional keyword arguments, passed to getProteinDescriptors

Returns:

array of calculated protein descriptors

Return type:

np.ndarray

abstract getProteinDescriptors(acc_keys: list[str], sequences: dict[str, str] | None = None, **kwargs) → DataFrame[source]

Calculate the protein descriptors for a given target.

Parameters:

acc_keys (list[str]) – target accession keys, the resulting data frame will be indexed by these keys
sequences (dict[str, str]) – optional mapping of accession keys to protein sequences
**kwargs – additional data passed from ProteinDescriptorCalculator

Returns:

a data frame of descriptor values of shape (acc_keys, n_descriptors), indexed by acc_keys

Return type:

pd.DataFrame
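To show what a concrete subclass has to provide, here is a rough sketch of the abstract-method pattern using only the standard library. ToyProteinDescriptorSet and SequenceLength are hypothetical, and the result is a plain dict keyed by accession rather than the pd.DataFrame the real method returns.

```python
from abc import ABC, abstractmethod


class ToyProteinDescriptorSet(ABC):
    """Hypothetical stand-in for an abstract protein descriptor set."""

    @property
    @abstractmethod
    def descriptors(self) -> list[str]:
        """Names of the descriptors this set calculates."""

    @abstractmethod
    def getProteinDescriptors(self, acc_keys, sequences=None, **kwargs):
        """Return descriptor values keyed by accession."""


class SequenceLength(ToyProteinDescriptorSet):
    """Toy descriptor: the length of each protein sequence."""

    @property
    def descriptors(self) -> list[str]:
        return ["length"]

    def getProteinDescriptors(self, acc_keys, sequences=None, **kwargs):
        sequences = sequences or {}
        # One row per accession key, in the order the keys were given.
        return {key: [float(len(sequences.get(key, "")))] for key in acc_keys}


rows = SequenceLength().getProteinDescriptors(
    ["ACC1", "ACC2"], sequences={"ACC1": "MKT", "ACC2": "MSDNG"}
)
```

Instantiating ToyProteinDescriptorSet directly raises a TypeError, which is how the abstract markers enforce that subclasses implement both members.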

property isFP: Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | rdkit.Chem.rdchem.Mol], to_list=False) → list[rdkit.Chem.rdchem.Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:

mols – list of molecules (SMILES str or RDKit Mol)
to_list – if True, return a list instead of a generator

Returns:

a list or generator of RDKit molecules

prepMols(mols: list[str | rdkit.Chem.rdchem.Mol]) → list[rdkit.Chem.rdchem.Mol]: Prepare the molecules for descriptor calculation.

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method. By default, no properties are required.

supportsParallel() → bool[source]: Return True if the descriptor set supports parallel calculation.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

transformToFeatureNames()

static treatInfs(df: DataFrame) → DataFrame

Replace infinite values by NaNs.

Parameters:: df – dataframe to treat
Returns:: dataframe with infinite values replaced by NaNs

qsprpred.extra.data.descriptors.tests module

class qsprpred.extra.data.descriptors.tests.TestDescriptorSetsExtra(methodName='runTest')[source]

Bases: DataSetsMixInExtras, QSPRTestCase

Test descriptor sets with extra features.

Variables:: dataset (QSPRDataset) – dataset for testing, shuffled

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs): Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:

typeobj – The data type to call this function on when both values are of the same type in assertEqual().
function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.
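As an illustration of registering a type-specific comparison, the sketch below defines a hypothetical Point class and a custom assertion that produces a readable failure message. Only standard unittest calls are used.

```python
import unittest


class Point:
    """Hypothetical value type used only for this illustration."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


class PointTest(unittest.TestCase):
    def setUp(self):
        # assertEqual dispatches here when both arguments are exactly Point.
        self.addTypeEqualityFunc(Point, self.assertPointEqual)

    def assertPointEqual(self, first, second, msg=None):
        if first != second:
            standard = f"Point({first.x}, {first.y}) != Point({second.x}, {second.y})"
            raise self.failureException(msg or standard)

    def test_points_differ(self):
        with self.assertRaises(AssertionError) as cm:
            self.assertEqual(Point(1, 2), Point(1, 3))
        self.assertIn("Point(1, 2) != Point(1, 3)", str(cm.exception))


suite = unittest.TestLoader().loadTestsFromTestCase(PointTest)
result = unittest.TestResult()
suite.run(result)
```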

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
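The difference between places, delta and significant digits can be checked with a few small cases. This is a sketch with the standard unittest module; the numbers are chosen only for illustration.

```python
import unittest


class AlmostEqualTest(unittest.TestCase):
    def test_places(self):
        # The difference is about 1e-8, which rounds to 0 at 7 places.
        self.assertAlmostEqual(1.00000001, 1.0)

    def test_delta(self):
        # The difference is about 0.4, which does not exceed delta=0.5.
        self.assertAlmostEqual(10.4, 10.0, delta=0.5)

    def test_places_is_not_significant_digits(self):
        # The values agree to four significant digits but differ at the
        # first decimal place, so places=1 fails.
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(1234.5, 1234.6, places=1)


suite = unittest.TestLoader().loadTestsFromTestCase(AlmostEqualTest)
result = unittest.TestResult()
suite.run(result)
```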

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)),
                 Counter(list(second)))

Example:

[0, 1, 1] and [1, 0, 1] compare equal.

[0, 0, 1] and [0, 1] compare unequal.
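The two cases from the example can be turned into a runnable check with the standard unittest module; the test class name is illustrative.

```python
import unittest


class CountEqualTest(unittest.TestCase):
    def test_same_elements_any_order(self):
        # Same elements with the same multiplicities, different order.
        self.assertCountEqual([0, 1, 1], [1, 0, 1])

    def test_different_multiplicity(self):
        # 0 appears twice on the left but once on the right.
        with self.assertRaises(AssertionError):
            self.assertCountEqual([0, 0, 1], [0, 1])


suite = unittest.TestLoader().loadTestsFromTestCase(CountEqualTest)
result = unittest.TestResult()
suite.run(result)
```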

assertDictEqual(d1, d2, msg=None)

assertEqual(first, second, msg=None): Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None): Check that the expression is false.

assertGreater(a, b, msg=None): Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None): Just like self.assertTrue(a >= b), but with a nicer default message.

assertIn(member, container, msg=None): Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None): Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None): Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None): Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None): Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None): Included for symmetry with assertIsNone.

assertLess(a, b, msg=None): Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None): Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:

list1 – The first list to compare.
list2 – The second list to compare.
msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger_name or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None): Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger_name or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEqual(first, second, msg=None): Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotIn(member, container, msg=None): Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None): Included for symmetry with assertIsInstance.

assertNotRegex(text, unexpected_regex, msg=None): Fail the test if the text matches the regular expression.

assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)

assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:

expected_exception – Exception class expected to be raised.
expected_regex – Regex (re.Pattern object or string) expected to be found in error message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
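Both calling styles, the callable form and the context-manager form, are sketched below with the standard unittest module. The built-in int is used because it raises a ValueError whose message contains "invalid literal" when given non-numeric text.

```python
import unittest


class RaisesRegexTest(unittest.TestCase):
    def test_callable_form(self):
        # args: the function to call, followed by its positional arguments.
        self.assertRaisesRegex(ValueError, r"invalid literal", int, "not a number")

    def test_context_manager_form(self):
        # msg is only accepted in the context-manager form.
        with self.assertRaisesRegex(
            ValueError, r"invalid literal", msg="int() should reject text"
        ):
            int("not a number")


suite = unittest.TestLoader().loadTestsFromTestCase(RaisesRegexTest)
result = unittest.TestResult()
suite.run(result)
```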

assertRegex(text, expected_regex, msg=None): Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:

seq1 – The first sequence to compare.
seq2 – The second sequence to compare.
seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.
msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:

set1 – The first set to compare.
set2 – The second set to compare.
msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertTrue(expr, msg=None): Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:

tuple1 – The first tuple to compare.
tuple2 – The second tuple to compare.
msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class warnClass is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:

expected_warning – Warning class expected to be triggered.
expected_regex – Regex (re.Pattern object or string) expected to be found in error message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
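A short sketch of the context-manager form follows, using the standard unittest and warnings modules. legacy_call is a hypothetical function invented for the example.

```python
import unittest
import warnings


def legacy_call():
    """Hypothetical function that emits a deprecation warning."""
    warnings.warn("legacy_call is deprecated", DeprecationWarning)
    return 1


class WarnsRegexTest(unittest.TestCase):
    def test_warning_message(self):
        with self.assertWarnsRegex(DeprecationWarning, r"deprecated") as cm:
            legacy_call()
        # The matching warning instance is kept on the context object.
        self.assertIsInstance(cm.warning, DeprecationWarning)


suite = unittest.TestLoader().loadTestsFromTestCase(WarnsRegexTest)
result = unittest.TestResult()
suite.run(result)
```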

clearGenerated(): Remove the directories that are used for testing.

countTestCases()

createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': <TargetTasks.MULTICLASS: 'MULTICLASS'>, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
preparation_settings (dict) – dictionary containing preparation settings
random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42, n_jobs=1, chunk_size=None)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createPCMDataSet(name: str = 'QSPRDataset_test_pcm', target_props: list[qsprpred.tasks.TargetProperty] | list[dict] = [{'name': 'pchembl_value_Median', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings: dict | None = None, protein_col: str = 'accession', random_state: int | None = None)

Create a small dataset for testing purposes.

Parameters:

name (str, optional) – name of the dataset. Defaults to “QSPRDataset_test_pcm”.
target_props (list[TargetProperty] | list[dict], optional) – target properties.
preparation_settings (dict | None, optional) – preparation settings. Defaults to None.
protein_col (str, optional) – name of the column with protein accessions. Defaults to “accession”.
random_state (int, optional) – random seed to use in the dataset. Defaults to None

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a small dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], random_state=None, prep=None, n_jobs=1, chunk_size=None)

Create a dataset for testing purposes from the given data frame.

Parameters:

df (pd.DataFrame) – data frame containing the dataset
name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
prep (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

debug(): Run the test without collecting errors in a TestResult

defaultTestResult()

classmethod doClassCleanups(): Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups(): Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm): Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None): Fail immediately, with the given message.

failureException: alias of AssertionError

classmethod getAllDescriptors() → list[qsprpred.data.descriptors.sets.DescriptorSet]

Return a list of all available molecule descriptor sets.

Returns:: list of DescriptorSet objects
Return type:: list

classmethod getAllProteinDescriptors() → list[qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet]

Return a list of all available protein descriptor sets.

Returns:: list of ProteinDescriptorSet objects
Return type:: list

getBigDF()

Get a large data frame for testing purposes.

Returns:: a pandas.DataFrame containing the dataset
Return type:: pd.DataFrame

classmethod getDataPrepGrid()

Return a list of many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. Again, this is not exhaustive, but should cover a lot of cases.

Returns:: a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)
Return type:: grid
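One plausible way to produce such a grid is a generator over the Cartesian product of the option lists. The sketch below assumes that approach and uses placeholder strings; the real method yields calculator, splitter, standardizer and filter objects, and its option lists are not shown in this reference.

```python
from itertools import product

# Placeholder option lists (assumptions for illustration only).
descriptor_calculators = ["calc_a", "calc_b"]
splits = [None, "random_split"]
feature_standardizers = [None, "standard_scaler"]
feature_filters = [None, "high_corr_filter"]
data_filters = [None, "category_filter"]


def prep_grid():
    """Yield (descriptor_calculator, split, feature_standardizer,
    feature_filters, data_filters) tuples."""
    yield from product(
        descriptor_calculators,
        splits,
        feature_standardizers,
        feature_filters,
        data_filters,
    )


combos = list(prep_grid())
```

With two choices in each of the five lists, the generator yields 2**5 = 32 tuples, which shows how quickly such grids grow.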

classmethod getDefaultCalculatorCombo(): Return the default descriptor calculator combo.

static getDefaultPrep(): Return a dictionary with default preparation settings.

classmethod getMSAProvider(out_dir: str)

getPCMDF() → DataFrame

Return a test dataframe with PCM data.

Returns:: dataframe with PCM data
Return type:: pd.DataFrame

getPCMSeqProvider() → Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

Return a function that provides sequences for given accessions.

Returns:: function that provides sequences for given accessions
Return type:: Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]
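To illustrate the return type above, here is a hedged sketch of a callable with the same signature. The accessions, the sequences, the helper `make_seq_provider` and the contents of the second dictionary are all assumptions made for illustration; the source only specifies the types.

```python
from typing import Callable

SeqProvider = Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

# Hypothetical accessions and sequences, for illustration only.
_SEQUENCES = {"ACC1": "MKTAYIAK", "ACC2": "MSLLNQ"}


def make_seq_provider(sequences: dict[str, str]) -> SeqProvider:
    """Build a provider backed by an in-memory accession -> sequence map."""

    def provider(accessions: list[str]) -> tuple[dict[str, str], dict[str, dict]]:
        # First dict: accession -> sequence, skipping unknown accessions.
        seqs = {acc: sequences[acc] for acc in accessions if acc in sequences}
        # Second dict: accession -> extra information (assumed content).
        info = {acc: {"length": len(seq)} for acc, seq in seqs.items()}
        return seqs, info

    return provider


provider = make_seq_provider(_SEQUENCES)
seqs, info = provider(["ACC1", "ACC3"])
print(seqs)
print(info)
```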

getPCMTargetsDF() → DataFrame

Return a test dataframe with PCM targets and their sequences.

Returns:: dataframe with PCM targets and their sequences
Return type:: pd.DataFrame

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:: list of `list`s of all possible combinations of preparation
Return type:: list

getSmallDF()

Get a small data frame for testing purposes.

Returns:: a pandas.DataFrame containing the dataset
Return type:: pd.DataFrame

id()

longMessage = True

maxDiff = 640

run(result=None)

setUp()[source]: Hook method for setting up the test fixture before exercising it.

classmethod setUpClass(): Hook method for setting up class fixture before running tests in the class.

setUpPaths(): Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason): Skip this test.

subTest(msg=<object object>, **params): Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown(): Remove all files and directories that are used for testing.

classmethod tearDownClass(): Hook method for deconstructing the class fixture after running all tests in the class.

testExtendedValenceSignature()[source]: Test the SMILES-based signature descriptor calculator.

testMordred()[source]: Test the Mordred descriptor calculator.

testPaDELDescriptors()[source]: Test the PaDEL descriptor calculator.

testPaDELFingerprints = None

testPaDELFingerprints_0(**kw)

testPaDELFingerprints_1(**kw)

testPaDELFingerprints_2(**kw)

testPaDELFingerprints_3(**kw)

testPaDELFingerprints_4(**kw)

testPaDELFingerprints_5(**kw)

testPaDELFingerprints_6(**kw)

testPaDELFingerprints_7(**kw)

testPaDELFingerprints_8(**kw)

validate_split(dataset): Check if the split has the data it should have after splitting.

class qsprpred.extra.data.descriptors.tests.TestDescriptorsExtra(methodName='runTest')[source]

Bases: DataSetsMixInExtras, DescriptorInDataCheckMixIn, QSPRTestCase

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs): Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:

typeobj – The data type to call this function on when both values are of the same type in assertEqual().
function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
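The difference between `places` and `delta` can be sketched as follows; the numeric values are arbitrary examples.

```python
import unittest

tc = unittest.TestCase()

# places: the difference, rounded to that many decimal places, must be 0.
tc.assertAlmostEqual(1.00000001, 1.0)        # default places=7
tc.assertAlmostEqual(0.12, 0.123, places=2)  # a difference of 0.003 rounds to 0

# delta: the absolute difference must not exceed delta.
tc.assertAlmostEqual(100.0, 104.0, delta=5)

# places and delta cannot be combined.
both_error = None
try:
    tc.assertAlmostEqual(1.0, 1.0001, places=3, delta=0.1)
except TypeError as exc:
    both_error = str(exc)

print(both_error)
```

Use `delta` when the tolerance is naturally expressed in the units of the value, and `places` when a fixed number of decimals is what matters.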

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)),
                 Counter(list(second)))

Example:

[0, 1, 1] and [1, 0, 1] compare equal.

[0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)

assertEqual(first, second, msg=None): Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None): Check that the expression is false.

assertGreater(a, b, msg=None): Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None): Just like self.assertTrue(a >= b), but with a nicer default message.

assertIn(member, container, msg=None): Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None): Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None): Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None): Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None): Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None): Included for symmetry with assertIsNone.

assertLess(a, b, msg=None): Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None): Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:

list1 – The first list to compare.
list2 – The second list to compare.
msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None): Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEqual(first, second, msg=None): Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotIn(member, container, msg=None): Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None): Included for symmetry with assertIsInstance.

assertNotRegex(text, unexpected_regex, msg=None): Fail the test if the text matches the regular expression.

assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)

assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:

expected_exception – Exception class expected to be raised.
expected_regex – Regex (re.Pattern object or string) expected to be found in error message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
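Both calling styles can be sketched with a small, hypothetical function `parse_threshold` that raises a ValueError:

```python
import unittest


def parse_threshold(value):
    """Hypothetical helper that rejects negative thresholds."""
    if value < 0:
        raise ValueError(f"threshold must be non-negative, got {value}")
    return value


tc = unittest.TestCase()

# Callable form: the function and its arguments follow the regex.
tc.assertRaisesRegex(ValueError, r"non-negative", parse_threshold, -1)

# Context manager form: the raised exception is kept for inspection.
with tc.assertRaisesRegex(ValueError, r"got -\d+") as cm:
    parse_threshold(-5)

message = str(cm.exception)
print(message)
```

The regex is applied with a search, so it only needs to match part of the message.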

assertRegex(text, expected_regex, msg=None): Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:

seq1 – The first sequence to compare.
seq2 – The second sequence to compare.
seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.
msg – Optional message to use on failure instead of a list of differences.
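A brief sketch of the effect of `seq_type`, using arbitrary values:

```python
import unittest

tc = unittest.TestCase()

# Without seq_type, a list and a tuple with equal items pass.
tc.assertSequenceEqual([1, 2, 3], (1, 2, 3))

# With seq_type, both arguments must be instances of that type.
try:
    tc.assertSequenceEqual([1, 2, 3], (1, 2, 3), seq_type=list)
    type_enforced = False
except AssertionError:
    type_enforced = True

print(type_enforced)
```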

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:

set1 – The first set to compare.
set2 – The second set to compare.
msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses duck typing to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertTrue(expr, msg=None): Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:

tuple1 – The first tuple to compare.
tuple2 – The second tuple to compare.
msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:

expected_warning – Warning class expected to be triggered.
expected_regex – Regex (re.Pattern object or string) expected to be found in error message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
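A minimal sketch of the context manager form, assuming a hypothetical function `old_api` that emits a DeprecationWarning:

```python
import unittest
import warnings


def old_api():
    """Hypothetical deprecated function used only for this example."""
    warnings.warn("old_api() is deprecated; use new_api()", DeprecationWarning)
    return 42


tc = unittest.TestCase()

# Only warnings of the given class whose message matches the regex count.
with tc.assertWarnsRegex(DeprecationWarning, r"deprecated") as cm:
    value = old_api()

warning_text = str(cm.warning)
print(value, warning_text)
```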

checkDataSetContainsDescriptorSet(dataset, desc_set, prep_combo, target_props): Check if a descriptor set is in a data set.

checkDescriptors(dataset: QSPRDataset, target_props: list[dict | qsprpred.tasks.TargetProperty])

Check if information about descriptors is consistent in the data set. Checks if calculators are consistent with the descriptors contained in the data set. This is tested also before and after serialization.

Parameters:

dataset (QSPRDataset) – The data set to check.
target_props (List of dicts or TargetProperty) – list of target properties

Raises:

AssertionError – If the consistency check fails.

checkFeatures(ds: QSPRDataset, expected_length: int)

Check if the feature names and the feature matrix of a data set are consistent with the expected number of variables.

Parameters:

ds (QSPRDataset) – The data set to check.
expected_length (int) – The expected number of features.

Raises:

AssertionError – If the feature names or the feature matrix is not consistent

clearGenerated(): Remove the directories that are used for testing.

countTestCases()

createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': <TargetTasks.MULTICLASS: 'MULTICLASS'>, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
preparation_settings (dict) – dictionary containing preparation settings
random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42, n_jobs=1, chunk_size=None)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createPCMDataSet(name: str = 'QSPRDataset_test_pcm', target_props: list[qsprpred.tasks.TargetProperty] | list[dict] = [{'name': 'pchembl_value_Median', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings: dict | None = None, protein_col: str = 'accession', random_state: int | None = None)

Create a small dataset for testing purposes.

Parameters:

name (str, optional) – name of the dataset. Defaults to “QSPRDataset_test_pcm”.
target_props (list[TargetProperty] | list[dict], optional) – target properties.
preparation_settings (dict | None, optional) – preparation settings. Defaults to None.
protein_col (str, optional) – name of the column with protein accessions. Defaults to “accession”.
random_state (int, optional) – random seed to use in the dataset. Defaults to None

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a small dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], random_state=None, prep=None, n_jobs=1, chunk_size=None)

Create a dataset for testing purposes from the given data frame.

Parameters:

df (pd.DataFrame) – data frame containing the dataset
name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
prep (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

debug(): Run the test without collecting errors in a TestResult

defaultTestResult()

classmethod doClassCleanups(): Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups(): Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm): Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None): Fail immediately, with the given message.

failureException: alias of AssertionError

classmethod getAllDescriptors() → list[qsprpred.data.descriptors.sets.DescriptorSet]

Return a list of all available molecule descriptor sets.

Returns:: list of MoleculeDescriptorSet objects
Return type:: list

classmethod getAllProteinDescriptors() → list[qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet]

Return a list of all available protein descriptor sets.

Returns:: list of ProteinDescriptorSet objects
Return type:: list

getBigDF()

Get a large data frame for testing purposes.

Returns:: a pandas.DataFrame containing the dataset
Return type:: pd.DataFrame

static getDatSetName(desc_set, target_props): Get a unique name for a data set.

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.

Returns:: a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)
Return type:: grid

classmethod getDefaultCalculatorCombo(): Return the default descriptor calculator combo.

static getDefaultPrep(): Return a dictionary with default preparation settings.

classmethod getMSAProvider(out_dir: str)

getPCMDF() → DataFrame

Return a test dataframe with PCM data.

Returns:: dataframe with PCM data
Return type:: pd.DataFrame

getPCMSeqProvider() → Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

Return a function that provides sequences for given accessions.

Returns:: function that provides sequences for given accessions
Return type:: Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

getPCMTargetsDF() → DataFrame

Return a test dataframe with PCM targets and their sequences.

Returns:: dataframe with PCM targets and their sequences
Return type:: pd.DataFrame

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:: list of lists, one per named preparation combination
Return type:: list
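One way to picture named combinations is to prepend a label derived from the members of each combination. The sketch below assumes a simple naming scheme (string forms joined with underscores) and hypothetical option names; the scheme QSPRpred actually uses may differ.

```python
from itertools import product

# Hypothetical option names standing in for the grid entries.
splits = [None, "RandomSplit"]
standardizers = [None, "StandardScaler"]


def named_combos():
    """Return [name, split, standardizer] lists, one per combination."""
    combos = []
    for combo in product(splits, standardizers):
        # Assumed naming scheme: join the string form of each member.
        name = "_".join(str(part) for part in combo)
        combos.append([name, *combo])
    return combos


for name, split, standardizer in named_combos():
    print(name, split, standardizer)
```

A list of this shape can be passed to a test parameterization helper, with the first element serving as the readable suffix of each generated test name.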

getSmallDF()

Get a small data frame for testing purposes.

Returns:: a pandas.DataFrame containing the dataset
Return type:: pd.DataFrame

id()

longMessage = True

maxDiff = 640

run(result=None)

setUp()[source]: Hook method for setting up the test fixture before exercising it.

classmethod setUpClass(): Hook method for setting up class fixture before running tests in the class.

setUpPaths(): Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason): Skip this test.

subTest(msg=<object object>, **params): Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown(): Remove all files and directories that are used for testing.

classmethod tearDownClass(): Hook method for deconstructing the class fixture after running all tests in the class.

testDescriptorsExtraAll = None

testDescriptorsExtraAll_00_Mordred(**kw): Test the calculation of extra descriptors with data preparation [with _=’Mordred’, desc_set=<qsprpred.extra.data.descriptors…ordred object at 0x7efff578c770>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

testDescriptorsExtraAll_01_CDKFP(**kw): Test the calculation of extra descriptors with data preparation [with _=’CDKFP’, desc_set=<qsprpred.extra.data.descriptors….CDKFP object at 0x7efff5b2f560>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

testDescriptorsExtraAll_02_CDKExtendedFP(**kw): Test the calculation of extra descriptors with data preparation [with _=’CDKExtendedFP’, desc_set=<qsprpred.extra.data.descriptors…ndedFP object at 0x7efff6b6be60>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

testDescriptorsExtraAll_03_CDKEStateFP(**kw): Test the calculation of extra descriptors with data preparation [with _=’CDKEStateFP’, desc_set=<qsprpred.extra.data.descriptors…tateFP object at 0x7efff6166090>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

testDescriptorsExtraAll_04_CDKGraphOnlyFP(**kw): Test the calculation of extra descriptors with data preparation [with _=’CDKGraphOnlyFP’, desc_set=<qsprpred.extra.data.descriptors…OnlyFP object at 0x7efff6af2420>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

testDescriptorsExtraAll_05_CDKMACCSFP(**kw): Test the calculation of extra descriptors with data preparation [with _=’CDKMACCSFP’, desc_set=<qsprpred.extra.data.descriptors…ACCSFP object at 0x7efff5d5f680>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

testDescriptorsExtraAll_06_CDKPubchemFP(**kw): Test the calculation of extra descriptors with data preparation [with _=’CDKPubchemFP’, desc_set=<qsprpred.extra.data.descriptors…chemFP object at 0x7efff5882780>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

testDescriptorsExtraAll_07_CDKSubstructureFP(**kw): Test the calculation of extra descriptors with data preparation [with _=’CDKSubstructureFP’, desc_set=<qsprpred.extra.data.descriptors…tureFP object at 0x7efff5c229c0>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

testDescriptorsExtraAll_08_CDKKlekotaRothFPCount(**kw): Test the calculation of extra descriptors with data preparation [with _=’CDKKlekotaRothFPCount’, desc_set=<qsprpred.extra.data.descriptors…RothFP object at 0x7efff56c1af0>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

testDescriptorsExtraAll_09_CDKAtomPairs2DFP(**kw): Test the calculation of extra descriptors with data preparation [with _=’CDKAtomPairs2DFP’, desc_set=<qsprpred.extra.data.descriptors…rs2DFP object at 0x7efff56c2690>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

testDescriptorsExtraAll_10_CDKSubstructureFPCount(**kw): Test the calculation of extra descriptors with data preparation [with _=’CDKSubstructureFPCount’, desc_set=<qsprpred.extra.data.descriptors…tureFP object at 0x7efff56c16a0>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

testDescriptorsExtraAll_11_CDKKlekotaRothFP(**kw): Test the calculation of extra descriptors with data preparation [with _=’CDKKlekotaRothFP’, desc_set=<qsprpred.extra.data.descriptors…RothFP object at 0x7efff56c2510>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

testDescriptorsExtraAll_12_CDKAtomPairs2DFPCount(**kw): Test the calculation of extra descriptors with data preparation [with _=’CDKAtomPairs2DFPCount’, desc_set=<qsprpred.extra.data.descriptors…rs2DFP object at 0x7efff56c1d00>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

testDescriptorsExtraAll_13_PaDEL(**kw): Test the calculation of extra descriptors with data preparation [with _=’PaDEL’, desc_set=<qsprpred.extra.data.descriptors….PaDEL object at 0x7efff5504ce0>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

testDescriptorsExtraAll_14_ExtendedValenceSignature(**kw): Test the calculation of extra descriptors with data preparation [with _=’ExtendedValenceSignature’, desc_set=<qsprpred.extra.data.descriptors…nature object at 0x7efff55576e0>, target_props=[{‘name’: ‘CL’, ‘task’: <TargetTasks.REGRESSION: ‘REGRESSION’>}]].

validate_split(dataset): Check if the split has the data it should have after splitting.

class qsprpred.extra.data.descriptors.tests.TestDescriptorsPCM(methodName='runTest')[source]

Bases: DataSetsMixInExtras, DescriptorInDataCheckMixIn, TestCase

Test the calculation of PCM descriptors with data preparation.

Variables:: defaultMSA (MSAProvider) – Default MSA provider.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs): Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:

typeobj – The data type to call this function on when both values are of the same type in assertEqual().
function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)),
                 Counter(list(second)))

Example:

[0, 1, 1] and [1, 0, 1] compare equal.

[0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)

assertEqual(first, second, msg=None): Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None): Check that the expression is false.

assertGreater(a, b, msg=None): Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None): Just like self.assertTrue(a >= b), but with a nicer default message.

assertIn(member, container, msg=None): Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None): Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None): Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None): Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None): Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None): Included for symmetry with assertIsNone.

assertLess(a, b, msg=None): Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None): Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:

list1 – The first list to compare.
list2 – The second list to compare.
msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None): Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEqual(first, second, msg=None): Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotIn(member, container, msg=None): Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None): Included for symmetry with assertIsInstance.

assertNotRegex(text, unexpected_regex, msg=None): Fail the test if the text matches the regular expression.

assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)

assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:

expected_exception – Exception class expected to be raised.
expected_regex – Regex (re.Pattern object or string) expected to be found in error message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
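A minimal sketch of both calling styles follows. The helper parse_threshold is hypothetical and exists only to raise a predictable error.

```python
import unittest


def parse_threshold(value):
    """Hypothetical helper used only for this illustration."""
    if value < 0:
        raise ValueError(f"threshold must be non-negative, got {value}")
    return float(value)


class RaisesRegexExample(unittest.TestCase):
    def test_callable_form(self):
        # Arguments after the regex are the callable and its arguments.
        self.assertRaisesRegex(ValueError, r"non-negative", parse_threshold, -1)

    def test_context_manager_form(self):
        # msg is only accepted in the context-manager form.
        with self.assertRaisesRegex(
            ValueError, r"got -\d+", msg="negative input was not rejected"
        ) as cm:
            parse_threshold(-5)
        # The caught exception stays available for further checks.
        self.assertIn("-5", str(cm.exception))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RaisesRegexExample)
result = unittest.TestResult()
suite.run(result)
```

The regex is searched for anywhere in the string form of the exception, so it does not need to match the whole message.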

assertRegex(text, expected_regex, msg=None): Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:

seq1 – The first sequence to compare.
seq2 – The second sequence to compare.
seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.
msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:

set1 – The first set to compare.
set2 – The second set to compare.
msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertTrue(expr, msg=None): Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:

tuple1 – The first tuple to compare.
tuple2 – The second tuple to compare.
msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:

expected_warning – Warning class expected to be triggered.
expected_regex – Regex (re.Pattern object or string) expected to be found in warning message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
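As a short illustration, the sketch below checks both the class and the message of a warning. The function old_api is a made-up stand-in for any code that emits a warning.

```python
import unittest
import warnings


def old_api():
    """Hypothetical function that emits a deprecation warning."""
    warnings.warn("old_api is deprecated; use new_api", DeprecationWarning)
    return 42


class WarnsRegexExample(unittest.TestCase):
    def test_warning_message(self):
        with self.assertWarnsRegex(DeprecationWarning, r"deprecated") as cm:
            value = old_api()
        # The block still runs to completion and its result is usable.
        self.assertEqual(value, 42)
        # The first matching warning instance is kept on the context object.
        self.assertIsInstance(cm.warning, DeprecationWarning)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(WarnsRegexExample)
result = unittest.TestResult()
suite.run(result)
```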

checkDataSetContainsDescriptorSet(dataset, desc_set, prep_combo, target_props): Check if a descriptor set is in a data set.

checkDescriptors(dataset: QSPRDataset, target_props: list[dict | qsprpred.tasks.TargetProperty])

Check if information about descriptors is consistent in the data set. Checks if calculators are consistent with the descriptors contained in the data set. This is tested also before and after serialization.

Parameters:

dataset (QSPRDataset) – The data set to check.
target_props (List of dicts or TargetProperty) – list of target properties

Raises:

AssertionError – If the consistency check fails.

checkFeatures(ds: QSPRDataset, expected_length: int)

Check if the feature names and the feature matrix of a data set are consistent with the expected number of variables.

Parameters:

ds (QSPRDataset) – The data set to check.
expected_length (int) – The expected number of features.

Raises:

AssertionError – If the feature names or the feature matrix is not consistent

clearGenerated(): Remove the directories that are used for testing.

countTestCases()

createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': <TargetTasks.MULTICLASS: 'MULTICLASS'>, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
preparation_settings (dict) – dictionary containing preparation settings
random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42, n_jobs=1, chunk_size=None)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createPCMDataSet(name: str = 'QSPRDataset_test_pcm', target_props: list[qsprpred.tasks.TargetProperty] | list[dict] = [{'name': 'pchembl_value_Median', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings: dict | None = None, protein_col: str = 'accession', random_state: int | None = None)

Create a small dataset for testing purposes.

Parameters:

name (str, optional) – name of the dataset. Defaults to “QSPRDataset_test_pcm”.
target_props (list[TargetProperty] | list[dict], optional) – target properties.
preparation_settings (dict | None, optional) – preparation settings. Defaults to None.
protein_col (str, optional) – name of the column with protein accessions. Defaults to “accession”.
random_state (int, optional) – random seed to use in the dataset. Defaults to None

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a small dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], random_state=None, prep=None, n_jobs=1, chunk_size=None)

Create a dataset for testing purposes from the given data frame.

Parameters:

df (pd.DataFrame) – data frame containing the dataset
name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
prep (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

debug(): Run the test without collecting errors in a TestResult

defaultTestResult()

classmethod doClassCleanups(): Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups(): Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm): Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None): Fail immediately, with the given message.

failureException: alias of AssertionError

classmethod getAllDescriptors() → list[qsprpred.data.descriptors.sets.DescriptorSet]

Return a list of all available molecule descriptor sets.

Returns:: list of MoleculeDescriptorSet objects
Return type:: list

classmethod getAllProteinDescriptors() → list[qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet]

Return a list of all available protein descriptor sets.

Returns:: list of ProteinDescriptorSet objects
Return type:: list

getBigDF()

Get a large data frame for testing purposes.

Returns:: a pandas.DataFrame containing the dataset
Return type:: pd.DataFrame

static getDatSetName(desc_set, target_props): Get a unique name for a data set.

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.

Returns:: a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)
Return type:: grid

classmethod getDefaultCalculatorCombo(): Return the default descriptor calculator combo.

static getDefaultPrep(): Return a dictionary with default preparation settings.

classmethod getMSAProvider(out_dir: str)

getPCMDF() → DataFrame

Return a test dataframe with PCM data.

Returns:: dataframe with PCM data
Return type:: pd.DataFrame

getPCMSeqProvider() → Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

Return a function that provides sequences for given accessions.

Returns:: function that provides sequences for given accessions
Return type:: Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]
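To make the return type concrete, here is a toy function with the same signature. It is only a sketch: the accession strings, the sequences, and the reading of the two dictionaries (accession to sequence, and accession to extra metadata) are assumptions for illustration, not the provider that getPCMSeqProvider actually returns.

```python
from typing import Callable

# Made-up lookup table; neither the accessions nor the sequences are real.
_TOY_SEQUENCES = {"ACC_A": "MKTAYIAK", "ACC_B": "MLSRAVCG"}


def toy_seq_provider(accessions: list[str]) -> tuple[dict[str, str], dict[str, dict]]:
    """Return (accession -> sequence, accession -> metadata) for known accessions."""
    sequences = {acc: _TOY_SEQUENCES[acc] for acc in accessions if acc in _TOY_SEQUENCES}
    metadata = {acc: {"length": len(seq)} for acc, seq in sequences.items()}
    return sequences, metadata


# The toy function satisfies the documented callable type.
provider: Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]] = toy_seq_provider
seqs, info = provider(["ACC_A", "ACC_UNKNOWN"])
```

Unknown accessions are silently skipped in this sketch; a real provider may handle them differently.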

getPCMTargetsDF() → DataFrame

Return a test dataframe with PCM targets and their sequences.

Returns:: dataframe with PCM targets and their sequences
Return type:: pd.DataFrame

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:: list of `list`s, one per named combination of preparation settings
Return type:: list
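The general pattern behind a preparation grid and its named combinations can be sketched with itertools.product. Everything below is a stand-in: the option lists are plain strings rather than QSPRpred objects, and the layout of each named entry (name first, then the five components) is an assumption, not necessarily what getPrepCombos produces.

```python
import itertools

# Placeholder options; the real grid is built from QSPRpred objects.
descriptor_calculators = ["calc_a", "calc_b"]
splits = [None, "random_split"]
feature_standardizers = [None, "standard_scaler"]
feature_filters = [None, "low_variance"]
data_filters = [None]


def data_prep_grid():
    """Lazily yield (descriptor_calculator, split, feature_standardizer,
    feature_filters, data_filters) tuples."""
    return itertools.product(
        descriptor_calculators,
        splits,
        feature_standardizers,
        feature_filters,
        data_filters,
    )


def named_combos():
    """Prefix every combination with a readable name for parameterized tests."""
    return [
        ["_".join(str(part) for part in combo), *combo]
        for combo in data_prep_grid()
    ]


combos = named_combos()
```

Keeping the grid lazy and materializing it only when names are attached means the full product is built once, at the point where a test parameterizer needs a concrete list.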

getSmallDF()

Get a small data frame for testing purposes.

Returns:: a pandas.DataFrame containing the dataset
Return type:: pd.DataFrame

id()

longMessage = True

maxDiff = 640

run(result=None)

setUp()[source]: Hook method for setting up the test fixture before exercising it.

classmethod setUpClass(): Hook method for setting up class fixture before running tests in the class.

setUpPaths(): Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.
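A quick sketch of the default behaviour, using a made-up test class:

```python
import unittest


class DescribedTest(unittest.TestCase):
    def test_with_docstring(self):
        """Check the documented behaviour.

        Further detail that is not part of the short description.
        """

    def test_without_docstring(self):
        pass


# Instantiate individual test methods by name and ask for their description.
with_doc = DescribedTest("test_with_docstring").shortDescription()
without_doc = DescribedTest("test_without_docstring").shortDescription()
```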

skipTest(reason): Skip this test.

subTest(msg=<object object>, **params): Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown(): Remove all files and directories that are used for testing.

classmethod tearDownClass(): Hook method for deconstructing the class fixture after running all tests in the class.

testDescriptorsPCMAll = None

testDescriptorsPCMAll_0_ProDec_Zscale_Hellberg_MULTICLASS(**kw)

Tests all available descriptor sets with data set preparation [with _=’ProDec_Zscale Hellberg_MULTICLASS’, desc_set=<qsprpred.extra.data.descriptors…ProDec object at 0x7efff5557980>, target_props=[{‘name’: ‘pchembl_value_Median’…>, ‘th’: [2.0, 5.5, 6.5, 12.0]}]].

Note that they are not checked with all possible settings and all possible preparations, but only with the default settings provided by DataSetsPathMixIn.getDefaultPrep(). The list itself is defined and configured by DataSetsPathMixIn.getAllDescriptors(), so if you need a specific descriptor tested, add it there.

testDescriptorsPCMAll_1_ProDec_Sneath_MULTICLASS(**kw)

Tests all available descriptor sets with data set preparation [with _=’ProDec_Sneath_MULTICLASS’, desc_set=<qsprpred.extra.data.descriptors…ProDec object at 0x7efff5a043b0>, target_props=[{‘name’: ‘pchembl_value_Median’…>, ‘th’: [2.0, 5.5, 6.5, 12.0]}]].

Note that they are not checked with all possible settings and all possible preparations, but only with the default settings provided by DataSetsPathMixIn.getDefaultPrep(). The list itself is defined and configured by DataSetsPathMixIn.getAllDescriptors(), so if you need a specific descriptor tested, add it there.

testDescriptorsPCMAll_2_ProDec_Zscale_Hellberg_REGRESSION(**kw)

Tests all available descriptor sets with data set preparation [with _=’ProDec_Zscale Hellberg_REGRESSION’, desc_set=<qsprpred.extra.data.descriptors…ProDec object at 0x7efff567c950>, target_props=[{‘name’: ‘pchembl_value_Median’…asks.REGRESSION: ‘REGRESSION’>}]].

Note that they are not checked with all possible settings and all possible preparations, but only with the default settings provided by DataSetsPathMixIn.getDefaultPrep(). The list itself is defined and configured by DataSetsPathMixIn.getAllDescriptors(), so if you need a specific descriptor tested, add it there.

testDescriptorsPCMAll_3_ProDec_Sneath_REGRESSION(**kw)

Tests all available descriptor sets with data set preparation [with _=’ProDec_Sneath_REGRESSION’, desc_set=<qsprpred.extra.data.descriptors…ProDec object at 0x7efff5443470>, target_props=[{‘name’: ‘pchembl_value_Median’…asks.REGRESSION: ‘REGRESSION’>}]].

Note that they are not checked with all possible settings and all possible preparations, but only with the default settings provided by DataSetsPathMixIn.getDefaultPrep(). The list itself is defined and configured by DataSetsPathMixIn.getAllDescriptors(), so if you need a specific descriptor tested, add it there.

testDescriptorsPCMAll_4_ProDec_Zscale_Hellberg_Multitask(**kw)

Tests all available descriptor sets with data set preparation [with _=’ProDec_Zscale Hellberg_Multitask’, desc_set=<qsprpred.extra.data.descriptors…ProDec object at 0x7efff54e84a0>, target_props=[{‘name’: ‘pchembl_value_Median’…S: ‘SINGLECLASS’>, ‘th’: [6.5]}]].

Note that they are not checked with all possible settings and all possible preparations, but only with the default settings provided by DataSetsPathMixIn.getDefaultPrep(). The list itself is defined and configured by DataSetsPathMixIn.getAllDescriptors(), so if you need a specific descriptor tested, add it there.

testDescriptorsPCMAll_5_ProDec_Sneath_Multitask(**kw)

Tests all available descriptor sets with data set preparation [with _=’ProDec_Sneath_Multitask’, desc_set=<qsprpred.extra.data.descriptors…ProDec object at 0x7efff5361490>, target_props=[{‘name’: ‘pchembl_value_Median’…S: ‘SINGLECLASS’>, ‘th’: [6.5]}]].

Note that they are not checked with all possible settings and all possible preparations, but only with the default settings provided by DataSetsPathMixIn.getDefaultPrep(). The list itself is defined and configured by DataSetsPathMixIn.getAllDescriptors(), so if you need a specific descriptor tested, add it there.

validate_split(dataset): Check if the split has the data it should have after splitting.

class qsprpred.extra.data.descriptors.tests.TestPCMDataSet(methodName='runTest')[source]

Bases: DataSetsMixInExtras, TestCase

Test the PCM data set features.

Variables:

dataset (QSPRDataset) – dataset for testing
sampleDescSet (DescriptorSet) – descriptor set for testing
defaultMSA (BioPythonMSA) – MSA provider for testing

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs): Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:

typeobj – The data type to call this function on when both values are of the same type in assertEqual().
function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.
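As an illustration, the sketch below registers a custom comparison for a small dataclass. Point and assertPointEqual are made-up names; the only unittest API used is addTypeEqualityFunc and the standard assertion methods.

```python
import unittest
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


class TypeEqualityExample(unittest.TestCase):
    def setUp(self):
        # assertEqual dispatches here when both arguments are exactly Point.
        self.addTypeEqualityFunc(Point, self.assertPointEqual)

    def assertPointEqual(self, first, second, msg=None):
        if first != second:
            detail = f"Points differ: dx={second.x - first.x}, dy={second.y - first.y}"
            self.fail(msg or detail)

    def test_points(self):
        with self.assertRaises(self.failureException) as cm:
            self.assertEqual(Point(0, 0), Point(1, 2))
        # The custom message is what the failure reports.
        self.assertIn("dx=1", str(cm.exception))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TypeEqualityExample)
result = unittest.TestResult()
suite.run(result)
```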

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
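The distinction between decimal places and significant digits is easiest to see with large numbers. The sketch below uses a made-up test class and only standard unittest calls.

```python
import unittest


class AlmostEqualExample(unittest.TestCase):
    def test_default_places(self):
        # 0.1 + 0.2 != 0.3 in floating point, but the difference
        # rounds to zero at the default 7 decimal places.
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_places_are_not_significant_digits(self):
        # The values agree to 7 significant digits, yet differ by about 0.4,
        # which does not round to zero at 1 decimal place.
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(1_000_000.0, 1_000_000.4, places=1)
        # An absolute tolerance expresses the intent directly.
        self.assertAlmostEqual(1_000_000.0, 1_000_000.4, delta=0.5)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AlmostEqualExample)
result = unittest.TestResult()
suite.run(result)
```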

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)),
                 Counter(list(second)))

Example:

[0, 1, 1] and [1, 0, 1] compare equal.

[0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)

assertEqual(first, second, msg=None): Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None): Check that the expression is false.

assertGreater(a, b, msg=None): Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None): Just like self.assertTrue(a >= b), but with a nicer default message.

assertIn(member, container, msg=None): Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None): Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None): Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None): Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None): Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None): Included for symmetry with assertIsNone.

assertLess(a, b, msg=None): Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None): Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:

list1 – The first list to compare.
list2 – The second list to compare.
msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of the given level or higher is emitted on logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None): Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of the given level or higher are emitted on logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEqual(first, second, msg=None): Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotIn(member, container, msg=None): Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None): Included for symmetry with assertIsInstance.

assertNotRegex(text, unexpected_regex, msg=None): Fail the test if the text matches the regular expression.

assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)

assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:

expected_exception – Exception class expected to be raised.
expected_regex – Regex (re.Pattern object or string) expected to be found in error message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.

assertRegex(text, expected_regex, msg=None): Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:

seq1 – The first sequence to compare.
seq2 – The second sequence to compare.
seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.
msg – Optional message to use on failure instead of a list of differences.
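The effect of seq_type can be shown with a list and a tuple holding the same items. The class name below is illustrative.

```python
import unittest


class SequenceEqualExample(unittest.TestCase):
    def test_mixed_types(self):
        # With seq_type=None, a list and a tuple with equal items pass.
        self.assertSequenceEqual([1, 2, 3], (1, 2, 3))

    def test_enforced_type(self):
        # With seq_type=list, the tuple is rejected for its type alone.
        with self.assertRaises(AssertionError):
            self.assertSequenceEqual([1, 2, 3], (1, 2, 3), seq_type=list)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SequenceEqualExample)
result = unittest.TestResult()
suite.run(result)
```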

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:

set1 – The first set to compare.
set2 – The second set to compare.
msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertTrue(expr, msg=None): Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:

tuple1 – The first tuple to compare.
tuple2 – The second tuple to compare.
msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:

expected_warning – Warning class expected to be triggered.
expected_regex – Regex (re.Pattern object or string) expected to be found in warning message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.

clearGenerated(): Remove the directories that are used for testing.

countTestCases()

createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': <TargetTasks.MULTICLASS: 'MULTICLASS'>, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
preparation_settings (dict) – dictionary containing preparation settings
random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42, n_jobs=1, chunk_size=None)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createPCMDataSet(name: str = 'QSPRDataset_test_pcm', target_props: list[qsprpred.tasks.TargetProperty] | list[dict] = [{'name': 'pchembl_value_Median', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings: dict | None = None, protein_col: str = 'accession', random_state: int | None = None)

Create a small dataset for testing purposes.

Parameters:

name (str, optional) – name of the dataset. Defaults to “QSPRDataset_test_pcm”.
target_props (list[TargetProperty] | list[dict], optional) – target properties.
preparation_settings (dict | None, optional) – preparation settings. Defaults to None.
protein_col (str, optional) – name of the column with protein accessions. Defaults to “accession”.
random_state (int, optional) – random seed to use in the dataset. Defaults to None

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a small dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], random_state=None, prep=None, n_jobs=1, chunk_size=None)

Create a dataset for testing purposes from the given data frame.

Parameters:

df (pd.DataFrame) – data frame containing the dataset
name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
prep (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

debug(): Run the test without collecting errors in a TestResult

defaultTestResult()

classmethod doClassCleanups(): Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups(): Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm): Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None): Fail immediately, with the given message.

failureException: alias of AssertionError

classmethod getAllDescriptors() → list[qsprpred.data.descriptors.sets.DescriptorSet]

Return a list of all available molecule descriptor sets.

Returns:: list of MoleculeDescriptorSet objects
Return type:: list

classmethod getAllProteinDescriptors() → list[qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet]

Return a list of all available protein descriptor sets.

Returns:: list of ProteinDescriptorSet objects
Return type:: list

getBigDF()

Get a large data frame for testing purposes.

Returns:: a pandas.DataFrame containing the dataset
Return type:: pd.DataFrame

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.

Returns:: a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)
Return type:: grid

classmethod getDefaultCalculatorCombo(): Return the default descriptor calculator combo.

static getDefaultPrep(): Return a dictionary with default preparation settings.

classmethod getMSAProvider(out_dir: str)

getPCMDF() → DataFrame

Return a test dataframe with PCM data.

Returns:: dataframe with PCM data
Return type:: pd.DataFrame

getPCMSeqProvider() → Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

Return a function that provides sequences for given accessions.

Returns:: function that provides sequences for given accessions
Return type:: Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]
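
A callable with this signature could look like the sketch below. The accession table, the make_seq_provider helper and the contents of the second dictionary are all assumptions made for illustration; the documentation above specifies only the types.

```python
from typing import Callable

# Type alias matching the documented return type.
SeqProvider = Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

# Hypothetical accession-to-sequence table (made-up data).
_SEQUENCES = {"P00001": "MKTAYIAK", "P00002": "MSDNGELE"}


def make_seq_provider(table: dict[str, str]) -> SeqProvider:
    """Build a provider that looks accessions up in an in-memory table."""

    def provider(accessions: list[str]) -> tuple[dict[str, str], dict[str, dict]]:
        # First dict: accession -> sequence, skipping unknown accessions.
        sequences = {acc: table[acc] for acc in accessions if acc in table}
        # Second dict: accession -> extra keyword information.
        # Storing the sequence length here is an assumption.
        metadata = {acc: {"length": len(seq)} for acc, seq in sequences.items()}
        return sequences, metadata

    return provider


provider = make_seq_provider(_SEQUENCES)
seqs, meta = provider(["P00001", "P99999"])
```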

getPCMTargetsDF() → DataFrame

Return a test dataframe with PCM targets and their sequences.

Returns:: dataframe with PCM targets and their sequences
Return type:: pd.DataFrame

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:: list of lists, one per preparation combination, each holding a combination together with its name
Return type:: list
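
One way to derive names for such combinations is sketched below. The naming scheme (class names joined by underscores, with the name placed first in each inner list) and the placeholder classes RandomSplit and StandardScaler are assumptions for illustration, not the scheme qsprpred necessarily uses.

```python
class RandomSplit:
    """Hypothetical stand-in for a data split."""


class StandardScaler:
    """Hypothetical stand-in for a feature standardizer."""


# Hypothetical combinations of (split, feature_standardizer).
combos = [
    (RandomSplit(), StandardScaler()),
    (RandomSplit(), None),
    (None, None),
]


def combo_name(combo):
    """Join the class name of each element, using 'None' for skipped steps."""
    return "_".join("None" if x is None else type(x).__name__ for x in combo)


# Each inner list starts with the name, followed by the combination itself.
named_combos = [[combo_name(c), *c] for c in combos]
names = [row[0] for row in named_combos]
```

A list in this shape can be handed to a test parameterization helper so that each generated test carries a readable name.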

getSmallDF()

Get a small data frame for testing purposes.

Returns:: a pandas.DataFrame containing the dataset
Return type:: pd.DataFrame

id()

longMessage = True

maxDiff = 640

run(result=None)

setUp()[source]: Set up the test Dataframe.

classmethod setUpClass(): Hook method for setting up class fixture before running tests in the class.

setUpPaths(): Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason): Skip this test.

subTest(msg=<object object>, **params): Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown(): Remove all files and directories that are used for testing.

classmethod tearDownClass(): Hook method for deconstructing the class fixture after running all tests in the class.

testProDec = None

testProDec_0_MAFFT(**kw)

testProDec_1_ClustalMSA(**kw)

testSerialization = None

testSerialization_0_MAFFT(**kw)

Test the serialization of a dataset with a data split [with _='MAFFT', msa_provider_cls=<class 'qsprpred.extra.data.utils.msa_calculator.MAFFT'>].

Parameters:: msa_provider_cls (BioPythonMSA) – MSA provider class

testSerialization_1_ClustalMSA(**kw)

Test the serialization of a dataset with a data split [with _='ClustalMSA', msa_provider_cls=<class 'qsprpred.extra.data.utils.msa_calculator.ClustalMSA'>].

Parameters:: msa_provider_cls (BioPythonMSA) – MSA provider class

testSwitching()[source]: Test if the feature calculator can be switched to a new dataset.

testWithMolDescriptors()[source]: Test the calculation of protein and molecule descriptors.

validate_split(dataset): Check if the split has the data it should have after splitting.
