qsprpred.data.chem package

Subpackages

qsprpred.data.chem.standardizers package

Submodules

qsprpred.data.chem.clustering module

class qsprpred.data.chem.clustering.FPSimilarityClusters(fp_calculator: ~qsprpred.data.descriptors.fingerprints.Fingerprint = <qsprpred.data.descriptors.fingerprints.MorganFP object>, id_prop: str | None = None)[source]

Bases: MoleculeClusters

Abstract base class for clustering molecules based on molecular fingerprints.

Variables:: fp_calculator (Fingerprint) – fingerprint calculator

Initialize the FPSimilarityClusters

Parameters:

fp_calculator (Fingerprint) – fingerprint calculator
id_prop (str) – name of the property to be used as ID

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getClusters(smiles_list: list[str]) → dict[int, list[int]][source]

Cluster a list of SMILES strings based on molecular dissimilarity.

Parameters:: smiles_list (list[str]) – list of SMILES strings to be clustered
Returns:: dictionary of clusters, where keys are cluster indices and values are indices of molecules
Return type:: clusters (dict[int, list[int]])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:

mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.
props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

supportsParallel() → bool: Whether the processor supports parallel processing.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

class qsprpred.data.chem.clustering.FPSimilarityLeaderPickerClusters(similarity_threshold: float = 0.7, fp_calculator: ~qsprpred.data.descriptors.fingerprints.Fingerprint = <qsprpred.data.descriptors.fingerprints.MorganFP object>, id_prop: str | None = None)[source]

Bases: FPSimilarityClusters

Cluster molecules based on molecular fingerprints with the LeaderPicker algorithm.

Variables:

fp_calculator (Fingerprint) – fingerprint calculator
similarity_threshold (float) – similarity threshold

Initialize the FPSimilarityLeaderPickerClusters

Parameters:

similarity_threshold (float) – similarity threshold
fp_calculator (Fingerprint) – fingerprint calculator
id_prop (str) – name of the property to be used as ID
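
The role of similarity_threshold can be illustrated with a minimal leader-style clustering sketch. This is not the implementation used by this class: fingerprints are represented as plain sets of on-bit indices, and the helper names tanimoto and leader_clusters are hypothetical. A molecule becomes a new leader only if its similarity to every existing leader is below the threshold; each molecule is then assigned to its most similar leader.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = a | b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(union)


def leader_clusters(fps, similarity_threshold=0.7):
    """Hypothetical leader-style clustering (assumption, not the library code)."""
    # Pass 1: pick leaders that are mutually less similar than the threshold.
    leaders = []
    for idx, fp in enumerate(fps):
        if all(tanimoto(fp, fps[lead]) < similarity_threshold for lead in leaders):
            leaders.append(idx)
    # Pass 2: assign every molecule to the cluster of its most similar leader.
    clusters = {c: [] for c in range(len(leaders))}
    for idx, fp in enumerate(fps):
        best = max(range(len(leaders)), key=lambda c: tanimoto(fp, fps[leaders[c]]))
        clusters[best].append(idx)
    return clusters


# Toy fingerprints: two pairs of near-identical bit sets.
toy_fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 12, 13}),
]
clusters = leader_clusters(toy_fps, similarity_threshold=0.5)
print(clusters)
```

The returned dictionary has the same shape as the documented return value of getClusters: cluster indices as keys and lists of molecule indices as values.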

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getClusters(smiles_list: list[str]) → dict[int, list[int]]

Cluster a list of SMILES strings based on molecular dissimilarity.

Parameters:: smiles_list (list[str]) – list of SMILES strings to be clustered
Returns:: dictionary of clusters, where keys are cluster indices and values are indices of molecules
Return type:: clusters (dict[int, list[int]])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:

mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.
props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

supportsParallel() → bool: Whether the processor supports parallel processing.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

class qsprpred.data.chem.clustering.FPSimilarityMaxMinClusters(n_clusters: int | None = None, seed: int | None = None, initial_centroids: list[str] | None = None, fp_calculator: ~qsprpred.data.descriptors.fingerprints.Fingerprint = <qsprpred.data.descriptors.fingerprints.MorganFP object>, id_prop: str | None = None)[source]

Bases: FPSimilarityClusters

Cluster molecules based on molecular fingerprints with the MaxMin algorithm.

Variables:

nClusters (int) – number of clusters
seed (int) – random seed
initialCentroids (list) – list of indices of initial cluster centroids

Initialize the FPSimilarityMaxMinClusters

Parameters:

n_clusters (int) – number of clusters
seed (int) – random seed
initial_centroids (list) – list of indices of initial cluster centroids
fp_calculator (Fingerprint) – fingerprint calculator
id_prop (str) – name of the property to be used as ID
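
As a rough illustration of how n_clusters, seed and initial_centroids interact, the sketch below implements a plain MaxMin selection on toy fingerprints. It is an assumption-laden stand-in, not this class's code: fingerprints are sets of on-bit indices, initial centroids are given as list indices, and the function names are hypothetical. Each new centroid is the molecule whose minimum distance to the already chosen centroids is largest.

```python
import random


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = a | b
    similarity = len(a & b) / len(union) if union else 1.0
    return 1.0 - similarity


def maxmin_clusters(fps, n_clusters, seed=None, initial_centroids=None):
    """Hypothetical MaxMin clustering sketch (assumption, not the library code)."""
    rng = random.Random(seed)
    # Start from the given centroids, or from one random molecule.
    picked = list(initial_centroids) if initial_centroids else [rng.randrange(len(fps))]
    n_clusters = min(n_clusters, len(fps))
    while len(picked) < n_clusters:
        candidates = [i for i in range(len(fps)) if i not in picked]
        # Choose the candidate that is farthest from its nearest centroid.
        next_pick = max(
            candidates,
            key=lambda i: min(tanimoto_distance(fps[i], fps[p]) for p in picked),
        )
        picked.append(next_pick)
    # Assign every molecule to its nearest centroid.
    clusters = {c: [] for c in range(len(picked))}
    for i, fp in enumerate(fps):
        nearest = min(range(len(picked)), key=lambda c: tanimoto_distance(fp, fps[picked[c]]))
        clusters[nearest].append(i)
    return clusters


toy_fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 12, 13}),
]
maxmin_result = maxmin_clusters(toy_fps, n_clusters=2, initial_centroids=[0])
print(maxmin_result)
```

Passing initial_centroids makes the result independent of seed; without it, the seed only determines the first centroid in this sketch.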

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getClusters(smiles_list: list[str]) → dict[int, list[int]]

Cluster a list of SMILES strings based on molecular dissimilarity.

Parameters:: smiles_list (list[str]) – list of SMILES strings to be clustered
Returns:: dictionary of clusters, where keys are cluster indices and values are indices of molecules
Return type:: clusters (dict[int, list[int]])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:

mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.
props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

supportsParallel() → bool: Whether the processor supports parallel processing.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

class qsprpred.data.chem.clustering.MoleculeClusters(id_prop: str | None = 'ID')[source]

Bases: MolProcessorWithID, JSONSerializable, ABC

Abstract base class for clustering molecules.

Variables:: nClusters (int) – number of clusters

Initialize the processor with the name of the property that contains the molecule’s unique identifier.

Parameters:: id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “ID”.

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

abstract getClusters(smiles_list: list[str]) → dict[source]

Cluster molecules.

Parameters:: smiles_list (list) – list of molecules to be clustered
Returns:: dictionary of clusters, where keys are cluster indices and values are indices of molecules
Return type:: clusters (dict)

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:

mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.
props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

supportsParallel() → bool[source]: Whether the processor supports parallel processing.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)
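
The toJSON/fromJSON and toFile/fromFile pairs describe a round trip. The sketch below mimics that contract with a hypothetical ToyClusters class built on the standard json module; it does not use JSONSerializable itself, and the stored fields are assumptions chosen for illustration.

```python
import json
import os
import tempfile


class ToyClusters:
    """Hypothetical stand-in for a JSON-serializable clustering object."""

    def __init__(self, seed=42, n_clusters=None):
        self.seed = seed
        self.n_clusters = n_clusters

    def toJSON(self) -> str:
        # Store everything needed to rebuild the object.
        return json.dumps({"seed": self.seed, "n_clusters": self.n_clusters})

    @classmethod
    def fromJSON(cls, json_str: str) -> "ToyClusters":
        return cls(**json.loads(json_str))

    def toFile(self, filename: str) -> str:
        with open(filename, "w", encoding="utf-8") as handle:
            handle.write(self.toJSON())
        # Mirror the documented contract: return the absolute path.
        return os.path.abspath(filename)

    @classmethod
    def fromFile(cls, filename: str) -> "ToyClusters":
        with open(filename, encoding="utf-8") as handle:
            return cls.fromJSON(handle.read())


with tempfile.TemporaryDirectory() as tmp_dir:
    path = ToyClusters(seed=7, n_clusters=3).toFile(os.path.join(tmp_dir, "clusters.json"))
    restored = ToyClusters.fromFile(path)

print(restored.seed, restored.n_clusters)
```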

class qsprpred.data.chem.clustering.RandomClusters(seed: int = 42, n_clusters: int | None = None, id_prop: str | None = None)[source]

Bases: MoleculeClusters

Randomly cluster molecules.

Variables:

seed (int) – random seed
nClusters (int) – number of clusters

Initialize the RandomClusters

Parameters:

seed (int) – random seed
n_clusters (int) – number of clusters
id_prop (str) – name of the property to be used as ID
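
A seeded random assignment can be sketched as follows. This is only an illustration of what seed and n_clusters could control, under the assumption that molecules are shuffled and dealt into clusters in turn; the function random_clusters is hypothetical and integer keys are used for simplicity.

```python
import random


def random_clusters(n_items: int, n_clusters: int, seed: int = 42) -> dict:
    """Hypothetical sketch: shuffle indices, then deal them round-robin."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    indices = list(range(n_items))
    rng.shuffle(indices)
    clusters = {c: [] for c in range(n_clusters)}
    for position, idx in enumerate(indices):
        clusters[position % n_clusters].append(idx)
    return clusters


assignment = random_clusters(10, 3, seed=42)
print(assignment)
```

With a fixed seed the assignment is reproducible, and cluster sizes differ by at most one in this sketch.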

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getClusters(smiles_list: list[str]) → dict[str, list[int]][source]

Cluster molecules.

Parameters:: smiles_list (list) – list of molecules to be clustered
Returns:: dictionary of clusters, where keys are cluster indices and values are indices of molecules
Return type:: (dict[str, list[int]])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:

mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.
props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

supportsParallel() → bool: Whether the processor supports parallel processing.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

class qsprpred.data.chem.clustering.ScaffoldClusters(scaffold: ~qsprpred.data.chem.scaffolds.Scaffold = <qsprpred.data.chem.scaffolds.BemisMurckoRDKit object>, id_prop: str | None = None)[source]

Bases: MoleculeClusters

Cluster molecules based on scaffolds.

Variables:: scaffold (Scaffold) – scaffold generator

Initialize the ScaffoldClusters

Parameters:

scaffold (Scaffold) – scaffold generator
id_prop (str) – name of the property to be used as ID
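
Scaffold-based clustering amounts to grouping molecules that share a scaffold. The sketch below assumes the scaffold SMILES have already been computed (here they are hard-coded toy strings) and only shows the grouping step; scaffold_clusters is a hypothetical helper, not part of this module.

```python
def scaffold_clusters(scaffold_smiles: list) -> dict:
    """Group molecule indices by identical scaffold string (illustrative only)."""
    scaffold_to_cluster = {}
    clusters = {}
    for idx, scaffold in enumerate(scaffold_smiles):
        # A scaffold seen for the first time opens a new cluster.
        cluster_id = scaffold_to_cluster.setdefault(scaffold, len(scaffold_to_cluster))
        clusters.setdefault(cluster_id, []).append(idx)
    return clusters


# Assumed, precomputed scaffolds for five molecules ("" = no ring system).
scaffolds = ["c1ccccc1", "c1ccncc1", "c1ccccc1", "", "c1ccncc1"]
grouped = scaffold_clusters(scaffolds)
print(grouped)
```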

classmethod fromFile(filename: str) → Any

Initialize a new instance from a JSON file.

Parameters:: filename (str) – path to the JSON file
Returns:: new instance of the class
Return type:: instance (object)

classmethod fromJSON(json: str) → Any

Reconstruct object from a JSON string.

Parameters:: json (str) – JSON string of the object
Returns:: reconstructed object
Return type:: obj (object)

getClusters(smiles_list: list[str]) → dict[int, list[int]][source]

Cluster molecules.

Parameters:: smiles_list (list) – list of molecules to be clustered
Returns:: dictionary of clusters, where keys are cluster indices and values are indices of molecules
Return type:: clusters (dict[int, list[int]])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:

mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.
props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

supportsParallel() → bool: Whether the processor supports parallel processing.

toFile(filename: str) → str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:: filename (str) – filename to save object to
Returns:: absolute path to the saved JSON file of the object
Return type:: filename (str)

toJSON() → str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:: JSON string of the object
Return type:: json (str)

qsprpred.data.chem.identifiers module

class qsprpred.data.chem.identifiers.ChemIdentifier[source]

Bases: ABC

Interface for identifiers of molecules. This should be a simple callable that given a SMILES string returns a unique identifier.

class qsprpred.data.chem.identifiers.Identifiable[source]

Bases: ABC

Interface for objects that use a ChemIdentifier to identify duplicate molecules.

abstract applyIdentifier(identifier: ChemIdentifier)[source]

Apply an identifier to the SMILES in this instance (i.e. remove duplicates).

Parameters:: identifier (ChemIdentifier) – The identifier to apply.

abstract property identifier: ChemIdentifier

Get the identifier used by this instance.

Returns:: The identifier used by this instance.
Return type:: ChemIdentifier

class qsprpred.data.chem.identifiers.InchiIdentifier[source]

Bases: ChemIdentifier

Class for InChI identifiers of molecules.

class qsprpred.data.chem.identifiers.IndexIdentifier(zfill: int = 5)[source]

Bases: ChemIdentifier

Implementation of a ChemIdentifier that returns an index as the identifier.

Variables:

index (int) – The current index.
zfill (int) – The number of digits to zero-fill the index

Initialize the index identifier.

Parameters:: zfill (int) – The number of digits to zero-fill the index
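
The effect of zfill can be shown with a small stand-in. ToyIndexIdentifier is hypothetical; whether the real class starts counting at zero or adds a prefix is not stated here, so both are assumptions of this sketch.

```python
class ToyIndexIdentifier:
    """Hypothetical callable that returns a zero-filled running index."""

    def __init__(self, zfill: int = 5):
        self.index = 0  # assumption: counting starts at zero
        self.zfill = zfill

    def __call__(self, smiles: str) -> str:
        # The SMILES is ignored; the identifier depends only on call order.
        identifier = str(self.index).zfill(self.zfill)
        self.index += 1
        return identifier


make_id = ToyIndexIdentifier(zfill=5)
ids = [make_id(smi) for smi in ["CCO", "c1ccccc1", "CCO"]]
print(ids)
```

Note that an index-based identifier of this kind gives the duplicated SMILES two different IDs, unlike a structure-based identifier such as InChI.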

qsprpred.data.chem.matching module

class qsprpred.data.chem.matching.SMARTSMatchProcessor(id_prop: str | None = 'ID')[source]

Bases: MolProcessorWithID

Processor that checks if molecules match a SMARTS pattern.

Initialize the processor with the name of the property that contains the molecule’s unique identifier.

Parameters:: id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “ID”.

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:

mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.
props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool: Check if the processor supports parallel processing.

qsprpred.data.chem.matching.match_mol_to_smarts(mol: Mol | str, smarts: list[str], operator: Literal['or', 'and'] = 'or', use_chirality: bool = False) → bool[source]

Check if a molecule matches one or more SMARTS patterns.

Parameters:

mol (Chem.Mol or str) – Molecule to check.
smarts (list[str]) – List of SMARTS patterns to check.
operator (literal["or", "and"], optional) – Whether to use an “or” or “and” operator on patterns. Defaults to “or”.
use_chirality (bool, optional) – Whether to use chirality in the search. Defaults to False.

Returns:

True if the molecule matches the pattern, False otherwise.

Return type:

(bool)
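
The operator argument decides how per-pattern results are combined. The sketch below shows only that combination logic; it deliberately replaces real substructure matching with a naive substring test so that it runs without a cheminformatics toolkit. Both combine_matches and substring_matcher are hypothetical names, and substring matching is not chemically meaningful.

```python
from typing import Callable, List, Literal


def combine_matches(
    mol: str,
    patterns: List[str],
    matcher: Callable[[str, str], bool],
    operator: Literal["or", "and"] = "or",
) -> bool:
    """Combine per-pattern match results with 'or' (any) or 'and' (all)."""
    results = (matcher(mol, pattern) for pattern in patterns)
    if operator == "or":
        return any(results)
    if operator == "and":
        return all(results)
    raise ValueError(f"Unsupported operator: {operator!r}")


def substring_matcher(mol: str, pattern: str) -> bool:
    # Stand-in for a substructure search; NOT real SMARTS matching.
    return pattern in mol


acetic_acid = "CC(=O)O"
matches_any = combine_matches(acetic_acid, ["C(=O)O", "N"], substring_matcher, "or")
matches_all = combine_matches(acetic_acid, ["C(=O)O", "N"], substring_matcher, "and")
print(matches_any, matches_all)
```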

qsprpred.data.chem.scaffolds module

class qsprpred.data.chem.scaffolds.BemisMurcko(real_bemismurcko: bool = True, use_csk: bool = False, id_prop: str | None = None)[source]

Bases: Scaffold

Extension of RDKit’s BM-like scaffold to make it more true to the paper. In BM’s paper, exo bonds on linkers or on rings get cut off, but two electrons remain.

In the RDKit implementation, both atoms in the exo bond are included. This means that C1CCC1=N and C1CCC1=O are the same scaffold under the BM definition, but different scaffolds in RDKit.

When flattening the BM scaffold using MakeScaffoldGeneric() this leads to distinct scaffolds, as C1CCC1=O is flattened to C1CCC1C and not C1CCC1.

In this approach, the two electrons are represented as SMILES “=*”. This is to make sure the automatic oxidation state assignment of sulfur does not flatten C1CS1(=*)(=*) into C1CS1 when explicit hydrogen count is provided.

Ref.:

Bemis, G. W., & Murcko, M. A. (1996). “The properties of known drugs. 1. Molecular frameworks.” Journal of medicinal chemistry, 39(15), 2887-2893.

Related RDKit issue: https://github.com/rdkit/rdkit/discussions/6844

Credit: Original code provided by Wim Dehaen (@dehaenw)

Variables:

realBemisMurcko (bool) – Use the guidelines from the Bemis-Murcko paper; otherwise, use the native RDKit implementation.
useCSK (bool) – Make scaffold generic (convert all bonds to single and all atoms to carbon). If realBemisMurcko is on, also remove all flattened exo bonds.

Initialize the scaffold generator.

Parameters:

real_bemismurcko (bool) – Use the guidelines from the Bemis-Murcko paper; otherwise, use the native RDKit implementation.
use_csk (bool) – Make scaffold generic (convert all bonds to single and all atoms to carbon).
id_prop (str) – Name of the property to use as the index.

static findTerminalAtoms(mol) → list[Atom][source]

Find terminal atoms in a molecule.

Parameters:: mol (Mol) – RDKit molecule.
Returns:: List of terminal atoms.
Return type:: list[Atom]

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:

mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.
props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Check if the processor supports parallel processing.

Returns:: True if the processor supports parallel processing, False otherwise.
Return type:: bool

class qsprpred.data.chem.scaffolds.BemisMurckoRDKit(id_prop: str | None = 'ID')[source]

Bases: Scaffold

Class for calculating Murcko scaffolds of a given molecule using the default implementation in RDKit. If you want an implementation closer to the original paper, see the BemisMurcko class.

Initialize the processor with the name of the property that contains the molecule’s unique identifier.

Parameters:: id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “ID”.

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:

mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.
props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Check if the processor supports parallel processing.

Returns:: True if the processor supports parallel processing, False otherwise.
Return type:: bool

class qsprpred.data.chem.scaffolds.Scaffold(id_prop: str | None = 'ID')[source]

Bases: MolProcessorWithID, ABC

Abstract base class for calculating molecular scaffolds of different kinds.

Initialize the processor with the name of the property that contains the molecule’s unique identifier.

Parameters:: id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “ID”.

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:

mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.
props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

property requiredProps: list[str]: The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Check if the processor supports parallel processing.

Returns:: True if the processor supports parallel processing, False otherwise.
Return type:: bool

qsprpred.data.chem.tests module

class qsprpred.data.chem.tests.TestClusters(methodName='runTest')[source]

Bases: DataSetsPathMixIn, QSPRTestCase

Test calculation of clusters.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs): Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:

typeobj – The data type to call this function on when both values are of the same type in assertEqual().
function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
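
The difference between places and delta can be seen in a small, self-contained test case. The class name and the numbers are chosen for illustration only.

```python
import unittest


class AlmostEqualDemo(unittest.TestCase):
    def test_default_places(self):
        # The floating-point error of 0.1 + 0.2 vanishes at 7 decimal places.
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_delta(self):
        # With delta, the absolute difference must not exceed the given value.
        self.assertAlmostEqual(100.0, 100.4, delta=0.5)

    def test_places_too_strict(self):
        # A difference of 0.001 does not round to zero at 5 decimal places.
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(1.0, 1.001, places=5)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AlmostEqualDemo)
almost_equal_result = unittest.TextTestRunner(verbosity=0).run(suite)
```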

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)),
                 Counter(list(second)))

Example:

[0, 1, 1] and [1, 0, 1] compare equal.

[0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)

assertEndsWith(s, suffix, msg=None)

assertEqual(first, second, msg=None): Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None): Check that the expression is false.

assertGreater(a, b, msg=None): Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None): Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)

assertIn(member, container, msg=None): Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None): Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None): Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None): Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None): Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None): Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)

assertLess(a, b, msg=None): Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None): Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:

list1 – The first list to compare.
list2 – The second list to compare.
msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger_name or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None): Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger_name or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)

assertNotEqual(first, second, msg=None): Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)

assertNotIn(member, container, msg=None): Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None): Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)

assertNotRegex(text, unexpected_regex, msg=None): Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)

assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)

assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:

expected_exception – Exception class expected to be raised.
expected_regex – Regex (re.Pattern object or string) expected to be found in error message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.

assertRegex(text, expected_regex, msg=None): Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:

seq1 – The first sequence to compare.
seq2 – The second sequence to compare.
seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.
msg – Optional message to use on failure instead of a list of differences.
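The effect of seq_type can be illustrated with a short sketch. `SequenceTest` is an illustrative name only: without seq_type a list and a tuple holding the same elements pass, while seq_type=list rejects the tuple.

```python
import unittest


class SequenceTest(unittest.TestCase):
    def test_elements_only(self):
        # No seq_type given: only length and elements are compared.
        self.assertSequenceEqual([1, 2, 3], (1, 2, 3))

    def test_seq_type_is_enforced(self):
        # With seq_type=list the tuple argument makes the assertion fail.
        with self.assertRaises(AssertionError):
            self.assertSequenceEqual([1, 2, 3], (1, 2, 3), seq_type=list)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SequenceTest)
result = unittest.TestResult()
suite.run(result)
```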

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:

set1 – The first set to compare.
set2 – The second set to compare.
msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertStartsWith(s, prefix, msg=None)

assertTrue(expr, msg=None): Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:

tuple1 – The first tuple to compare.
tuple2 – The second tuple to compare.
msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:

expected_warning – Warning class expected to be triggered.
expected_regex – Regex (re.Pattern object or string) expected to be found in the warning message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
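A small sketch of assertWarnsRegex used as a context manager follows. The function `old_api` and the class `WarnTest` are hypothetical and exist only for this illustration.

```python
import unittest
import warnings


def old_api():
    """Hypothetical function that emits a DeprecationWarning when called."""
    warnings.warn("old_api() is deprecated; use new_api()", DeprecationWarning)
    return 42


class WarnTest(unittest.TestCase):
    def test_warning_message(self):
        # Both the warning class and the message regex have to match.
        with self.assertWarnsRegex(DeprecationWarning, r"use new_api") as cm:
            value = old_api()
        self.assertEqual(value, 42)
        # The matching warning instance is kept on the context object.
        self.assertIsInstance(cm.warning, DeprecationWarning)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(WarnTest)
result = unittest.TestResult()
suite.run(result)
```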

clearGenerated(): Remove the directories that are used for testing.

countTestCases()

createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:

df (pd.DataFrame) – data frame containing the dataset
name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
prep (dict) – dictionary containing preparation settings
n_jobs (int) – number of jobs to use for parallel processing
chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug(): Run the test without collecting errors in a TestResult

defaultTestResult()

classmethod doClassCleanups(): Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups(): Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm): Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None): Fail immediately, with the given message.

failureException: alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now they need to be added manually to the list below.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:: list of DescriptorCalculator objects
Return type:: list

getBigDF()

Get a large data frame for testing purposes.

Returns:: a pandas.DataFrame containing the dataset
Return type:: pd.DataFrame

classmethod getDataPrepGrid()

Yield many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.

Returns:: a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)
Return type:: grid
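The idea of such a grid can be sketched with itertools.product. This is not QSPRpred's implementation: the option lists below are placeholder strings standing in for the real calculator, split, standardizer and filter objects, and `data_prep_grid` is a hypothetical name.

```python
import itertools

# Placeholder options (assumptions); the real grid holds QSPRpred objects.
descriptor_calculators = ["fingerprints", "physchem"]
splits = [None, "random_split"]
feature_standardizers = [None, "standard_scaler"]
feature_filters = [None, "low_variance"]
data_filters = [None, "duplicates"]


def data_prep_grid():
    """Yield (calculator, split, standardizer, feature_filter, data_filter) tuples."""
    yield from itertools.product(
        descriptor_calculators,
        splits,
        feature_standardizers,
        feature_filters,
        data_filters,
    )


grid = list(data_prep_grid())
```

With two choices per slot, the sketch yields 2**5 = 32 tuples, which shows how quickly such a grid grows as options are added.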

classmethod getDefaultCalculatorCombo()

Makes a list of default descriptor calculators that can be used in tests.

It creates one calculator with only Morgan fingerprints, one with only RDKit descriptors, and one that combines both to test behaviour with multiple descriptor sets. Override this method if you want to test with other descriptor sets and calculator combinations.

Returns:: list of created DescriptorCalculator objects
Return type:: list

static getDefaultPrep(add_imputer=None): Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:: list of `list`s of all possible combinations of preparation
Return type:: list

getSmallDF()

Get a small data frame for testing purposes.

Returns:: a pandas.DataFrame containing the dataset
Return type:: pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)

id()

longMessage = True

maxDiff = 640

run(result=None)

setUp()[source]: Create a test dataset.

classmethod setUpClass(): Hook method for setting up class fixture before running tests in the class.

setUpPaths(): Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason): Skip this test.

subTest(msg=<object object>, **params): Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown(): Remove all files and directories that are used for testing.

classmethod tearDownClass(): Hook method for deconstructing the class fixture after running all tests in the class.

testClusterAdd = None

testClusterAdd_0_Random(**kw): Test the adding and getting of clusters [with _=’Random’, cluster=<qsprpred.data.chem.clustering.R…usters object at 0x7f9566733620>].

testClusterAdd_1_FPSimilarityMaxMin(**kw): Test the adding and getting of clusters [with _=’FPSimilarityMaxMin’, cluster=<qsprpred.data.chem.clustering.F…usters object at 0x7f9566358910>].

testClusterAdd_2_FPSimilarityLeaderPicker(**kw): Test the adding and getting of clusters [with _=’FPSimilarityLeaderPicker’, cluster=<qsprpred.data.chem.clustering.F…usters object at 0x7f9566733230>].

testClusterAdd_3_Scaffold(**kw): Test the adding and getting of clusters [with _=’Scaffold’, cluster=<qsprpred.data.chem.clustering.S…usters object at 0x7f9566733380>].

class qsprpred.data.chem.tests.TestScaffolds(methodName='runTest')[source]

Bases: DataSetsPathMixIn, QSPRTestCase

Test calculation of scaffolds.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs): Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:

typeobj – The data type to call this function on when both values are of the same type in assertEqual().
function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.
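As a hedged sketch, a type equality function might be registered as shown below. The `Point` dataclass and the `assertPointEqual` helper are hypothetical and serve only to show the mechanism.

```python
import unittest
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


class PointTest(unittest.TestCase):
    def setUp(self):
        # assertEqual dispatches here when both arguments are exactly Point.
        self.addTypeEqualityFunc(Point, self.assertPointEqual)

    def assertPointEqual(self, first, second, msg=None):
        if first != second:
            detail = f"Points differ: dx={second.x - first.x}, dy={second.y - first.y}"
            raise self.failureException(msg or detail)

    def test_custom_message(self):
        with self.assertRaises(AssertionError) as cm:
            self.assertEqual(Point(0, 0), Point(1, 2))
        # The failure message comes from the registered function.
        self.assertIn("dx=1", str(cm.exception))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PointTest)
result = unittest.TestResult()
suite.run(result)
```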

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
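A brief sketch of the two comparison modes follows; `FloatTest` is an illustrative name. It also shows that places and delta cannot be combined in one call.

```python
import unittest


class FloatTest(unittest.TestCase):
    def test_places(self):
        # Exact comparison fails because of floating-point rounding ...
        self.assertNotEqual(0.1 + 0.2, 0.3)
        # ... but the difference rounds to zero at the default 7 places.
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_delta(self):
        # Passes because the absolute difference is within delta.
        self.assertAlmostEqual(100.0, 100.4, delta=0.5)

    def test_places_and_delta_are_exclusive(self):
        # Supplying both places and delta raises TypeError.
        with self.assertRaises(TypeError):
            self.assertAlmostEqual(1.0, 1.0001, places=2, delta=0.1)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(FloatTest)
result = unittest.TestResult()
suite.run(result)
```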

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)),
                 Counter(list(second)))

Example:

[0, 1, 1] and [1, 0, 1] compare equal.

[0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)

assertEndsWith(s, suffix, msg=None)

assertEqual(first, second, msg=None): Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None): Check that the expression is false.

assertGreater(a, b, msg=None): Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None): Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)

assertIn(member, container, msg=None): Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None): Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None): Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None): Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None): Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None): Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)

assertLess(a, b, msg=None): Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None): Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:

list1 – The first list to compare.
list2 – The second list to compare.
msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of the given level or higher is emitted on logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None): Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of the given level or higher are emitted on logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)

assertNotEqual(first, second, msg=None): Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)

assertNotIn(member, container, msg=None): Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None): Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)

assertNotRegex(text, unexpected_regex, msg=None): Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)

assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)

assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:

expected_exception – Exception class expected to be raised.
expected_regex – Regex (re.Pattern object or string) expected to be found in error message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.

assertRegex(text, expected_regex, msg=None): Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:

seq1 – The first sequence to compare.
seq2 – The second sequence to compare.
seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.
msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:

set1 – The first set to compare.
set2 – The second set to compare.
msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertStartsWith(s, prefix, msg=None)

assertTrue(expr, msg=None): Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:

tuple1 – The first tuple to compare.
tuple2 – The second tuple to compare.
msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:

expected_warning – Warning class expected to be triggered.
expected_regex – Regex (re.Pattern object or string) expected to be found in the warning message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.

clearGenerated(): Remove the directories that are used for testing.

countTestCases()

createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:

df (pd.DataFrame) – data frame containing the dataset
name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
prep (dict) – dictionary containing preparation settings
n_jobs (int) – number of jobs to use for parallel processing
chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug(): Run the test without collecting errors in a TestResult

defaultTestResult()

classmethod doClassCleanups(): Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups(): Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm): Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None): Fail immediately, with the given message.

failureException: alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now they need to be added manually to the list below.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:: list of DescriptorCalculator objects
Return type:: list

getBigDF()

Get a large data frame for testing purposes.

Returns:: a pandas.DataFrame containing the dataset
Return type:: pd.DataFrame

classmethod getDataPrepGrid()

Yield many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.

Returns:: a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)
Return type:: grid

classmethod getDefaultCalculatorCombo()

Makes a list of default descriptor calculators that can be used in tests.

It creates one calculator with only Morgan fingerprints, one with only RDKit descriptors, and one that combines both to test behaviour with multiple descriptor sets. Override this method if you want to test with other descriptor sets and calculator combinations.

Returns:: list of created DescriptorCalculator objects
Return type:: list

static getDefaultPrep(add_imputer=None): Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:: list of `list`s of all possible combinations of preparation
Return type:: list
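One way to derive a readable name for each combination is to join the class names of its components. The sketch below is an assumption about how such names could be built, not a copy of the QSPRpred code. `RandomSplit` and `StandardScaler` are empty stand-in classes defined here, and `combo_name` is a hypothetical helper.

```python
import itertools


class RandomSplit:
    """Stand-in for a split object (assumption, not the QSPRpred class)."""


class StandardScaler:
    """Stand-in for a feature standardizer (assumption)."""


def combo_name(combo):
    """Join component class names with underscores; None stays 'None'."""
    return "_".join(
        "None" if part is None else type(part).__name__ for part in combo
    )


# A tiny two-slot grid: (split, standardizer).
grid = itertools.product([None, RandomSplit()], [None, StandardScaler()])

# Each entry is a list: the generated name followed by the components.
named = [[combo_name(combo), *combo] for combo in grid]
names = [row[0] for row in named]
```

Lists of this shape, with the name first and the arguments after it, are convenient input for test parameterization helpers that derive test method names from the first element.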

getSmallDF()

Get a small data frame for testing purposes.

Returns:: a pandas.DataFrame containing the dataset
Return type:: pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)

id()

longMessage = True

maxDiff = 640

run(result=None)

setUp()[source]: Create a small dataset.

classmethod setUpClass(): Hook method for setting up class fixture before running tests in the class.

setUpPaths(): Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason): Skip this test.

subTest(msg=<object object>, **params): Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown(): Remove all files and directories that are used for testing.

classmethod tearDownClass(): Hook method for deconstructing the class fixture after running all tests in the class.

testScaffoldAdd = None

testScaffoldAdd_0_Murcko(**kw): Test the adding and getting of scaffolds [with _=’Murcko’, scaffold=<qsprpred.data.chem.scaffolds.Be…oRDKit object at 0x7f9566358550>].

testScaffoldAdd_1_BemisMurcko(**kw): Test the adding and getting of scaffolds [with _=’BemisMurcko’, scaffold=<qsprpred.data.chem.scaffolds.Be…Murcko object at 0x7f95667334d0>].

testScaffoldAdd_2_BemisMurckoCSK(**kw): Test the adding and getting of scaffolds [with _=’BemisMurckoCSK’, scaffold=<qsprpred.data.chem.scaffolds.Be…Murcko object at 0x7f9566358690>].

testScaffoldAdd_3_BemisMurckoJustCSK(**kw): Test the adding and getting of scaffolds [with _=’BemisMurckoJustCSK’, scaffold=<qsprpred.data.chem.scaffolds.Be…Murcko object at 0x7f95663587d0>].

testScaffoldAdd_4_BemisMurckoOff(**kw): Test the adding and getting of scaffolds [with _=’BemisMurckoOff’, scaffold=<qsprpred.data.chem.scaffolds.Be…Murcko object at 0x7f9566350050>].

class qsprpred.data.chem.tests.TestStandardizers(methodName='runTest')[source]

Bases: DataSetsPathMixIn, QSPRTestCase

Test the standardizers.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs): Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:

typeobj – The data type to call this function on when both values are of the same type in assertEqual().
function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)),
                 Counter(list(second)))

Example:

[0, 1, 1] and [1, 0, 1] compare equal.

[0, 0, 1] and [0, 1] compare unequal.
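The two cases above can be written as a runnable sketch. `CountTest` is an illustrative name; the last test additionally shows that unhashable elements such as lists are accepted.

```python
import unittest


class CountTest(unittest.TestCase):
    def test_order_is_ignored(self):
        self.assertCountEqual([0, 1, 1], [1, 0, 1])

    def test_multiplicity_matters(self):
        # 0 occurs twice on the left but only once on the right.
        with self.assertRaises(AssertionError):
            self.assertCountEqual([0, 0, 1], [0, 1])

    def test_unhashable_elements(self):
        # Lists cannot be counted with Counter, but the assertion still works.
        self.assertCountEqual([[1], [2]], [[2], [1]])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountTest)
result = unittest.TestResult()
suite.run(result)
```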

assertDictEqual(d1, d2, msg=None)

assertEndsWith(s, suffix, msg=None)

assertEqual(first, second, msg=None): Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None): Check that the expression is false.

assertGreater(a, b, msg=None): Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None): Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)

assertIn(member, container, msg=None): Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None): Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None): Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None): Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None): Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None): Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)

assertLess(a, b, msg=None): Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None): Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:

list1 – The first list to compare.
list2 – The second list to compare.
msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of the given level or higher is emitted on logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None): Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of the given level or higher are emitted on logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.
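The difference between the places and delta modes can be sketched as follows. The test class name and the numeric values are illustrative only:

```python
import unittest


class NotAlmostEqualTest(unittest.TestCase):
    def test_places(self):
        # With the default of 7 places, the difference of about 0.001
        # does not round to zero, so the values count as different.
        self.assertNotAlmostEqual(1.0, 1.001)
        # With places=2 the difference rounds to 0.0, so the assertion fails.
        with self.assertRaises(AssertionError):
            self.assertNotAlmostEqual(1.0, 1.001, places=2)

    def test_delta(self):
        # |100 - 103| = 3 exceeds delta=2, so the values count as different.
        self.assertNotAlmostEqual(100, 103, delta=2)
        # |100 - 101| = 1 is within delta=2, so the assertion fails.
        with self.assertRaises(AssertionError):
            self.assertNotAlmostEqual(100, 101, delta=2)


# Run the test case programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NotAlmostEqualTest)
result = unittest.TestResult()
suite.run(result)
```

Passing both places and delta in the same call is not supported and raises a TypeError.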

assertNotEndsWith(s, suffix, msg=None)

assertNotEqual(first, second, msg=None): Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)

assertNotIn(member, container, msg=None): Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None): Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)

assertNotRegex(text, unexpected_regex, msg=None): Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)

assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)

assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:

expected_exception – Exception class expected to be raised.
expected_regex – Regex (re.Pattern object or string) expected to be found in error message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
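Both calling styles can be sketched as below. The parse_port helper is hypothetical and exists only to have something that raises:

```python
import unittest


def parse_port(text):
    """Hypothetical helper used only for this illustration."""
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


class RaisesRegexTest(unittest.TestCase):
    def test_context_manager_form(self):
        # The regex is searched for in str(exception).
        with self.assertRaisesRegex(ValueError, r"out of range: \d+") as cm:
            parse_port("70000")
        # The raised exception stays available for further checks.
        self.assertIn("70000", str(cm.exception))

    def test_callable_form(self):
        # The callable and its positional arguments follow the regex.
        self.assertRaisesRegex(ValueError, "invalid literal", parse_port, "abc")


# Run the test case programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RaisesRegexTest)
result = unittest.TestResult()
suite.run(result)
```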

assertRegex(text, expected_regex, msg=None): Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:

seq1 – The first sequence to compare.
seq2 – The second sequence to compare.
seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.
msg – Optional message to use on failure instead of a list of differences.
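A short sketch of how seq_type changes the outcome. The class name and values are illustrative:

```python
import unittest


class SequenceEqualTest(unittest.TestCase):
    def test_mixed_types(self):
        # Without seq_type, a list and a tuple with the same items
        # in the same order are accepted as equal sequences.
        self.assertSequenceEqual([1, 2, 3], (1, 2, 3))

    def test_enforced_type(self):
        # With seq_type=list, passing a tuple fails even though
        # the items match.
        with self.assertRaises(AssertionError):
            self.assertSequenceEqual([1, 2, 3], (1, 2, 3), seq_type=list)


# Run the test case programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SequenceEqualTest)
result = unittest.TestResult()
suite.run(result)
```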

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:

set1 – The first set to compare.
set2 – The second set to compare.
msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertStartsWith(s, prefix, msg=None)

assertTrue(expr, msg=None): Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:

tuple1 – The first tuple to compare.
tuple2 – The second tuple to compare.
msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:

expected_warning – Warning class expected to be triggered.
expected_regex – Regex (re.Pattern object or string) expected to be found in warning message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
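A minimal sketch of the context-manager form. The old_api function is hypothetical and only serves to emit a warning:

```python
import unittest
import warnings


def old_api():
    """Hypothetical function that emits a deprecation warning."""
    warnings.warn("old_api() is deprecated; use new_api()", DeprecationWarning)
    return 42


class WarnsRegexTest(unittest.TestCase):
    def test_deprecation_message(self):
        # Both the warning class and the message pattern must match.
        with self.assertWarnsRegex(DeprecationWarning, r"use new_api\(\)") as cm:
            value = old_api()
        self.assertEqual(value, 42)
        # The matching warning instance is kept on the context object.
        self.assertIsInstance(cm.warning, DeprecationWarning)


# Run the test case programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(WarnsRegexTest)
result = unittest.TestResult()
suite.run(result)
```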

clearGenerated(): Remove the directories that are used for testing.

countTestCases()

createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:

df (pd.DataFrame) – data frame containing the dataset
name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
prep (dict) – dictionary containing preparation settings
n_jobs (int) – number of jobs to use for parallel processing
chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug(): Run the test without collecting errors in a TestResult

defaultTestResult()

classmethod doClassCleanups(): Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups(): Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm): Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None): Fail immediately, with the given message.

failureException: alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now they need to be added manually to the list below.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:: list of DescriptorCalculator objects
Return type:: list

getBigDF()

Get a large data frame for testing purposes.

Returns:: a pandas.DataFrame containing the dataset
Return type:: pd.DataFrame

classmethod getDataPrepGrid()

Return a list of many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. Again, this is not exhaustive, but should cover a lot of cases.

Returns:: a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)
Return type:: grid
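The shape of such a grid can be illustrated with itertools.product. This is only a sketch under stated assumptions: the option lists below are placeholder strings, not the QSPRpred objects the method actually combines, and make_grid is a hypothetical name rather than part of the library:

```python
from itertools import product

# Placeholder option lists (assumptions for illustration); the real
# method combines QSPRpred calculators, splits, standardizers and filters.
descriptor_calculators = ["morgan_fp", "rdkit_descs"]
splits = [None, "random_split"]
feature_standardizers = [None, "standard_scaler"]
feature_filters = [None, "low_variance_filter"]
data_filters = [None, "category_filter"]


def make_grid():
    """Lazily yield one tuple per combination, in the documented order."""
    return product(
        descriptor_calculators,
        splits,
        feature_standardizers,
        feature_filters,
        data_filters,
    )


grid = list(make_grid())
```

With two options per slot, this sketch yields 2 × 2 × 2 × 2 × 2 = 32 tuples, each of the form (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters).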

classmethod getDefaultCalculatorCombo()

Makes a list of default descriptor calculators that can be used in tests.

It creates one calculator with only Morgan fingerprints, one with only RDKit descriptors, and one with both to test behaviour with multiple descriptor sets. Override this method if you want to test with other descriptor sets and calculator combinations.

Returns:: list of created DescriptorCalculator objects
Return type:: list

static getDefaultPrep(add_imputer=None): Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:: list of `list`s of all possible combinations of preparation
Return type:: list

getSmallDF()

Get a small data frame for testing purposes.

Returns:: a pandas.DataFrame containing the dataset
Return type:: pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)

id()

longMessage = True

maxDiff = 640

run(result=None)

setUp()[source]: Hook method for setting up the test fixture before exercising it.

classmethod setUpClass(): Hook method for setting up class fixture before running tests in the class.

setUpPaths(): Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason): Skip this test.

subTest(msg=<object object>, **params): Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown(): Remove all files and directories that are used for testing.

classmethod tearDownClass(): Hook method for deconstructing the class fixture after running all tests in the class.

testInvalidFilter()[source]: Test the invalid filter.
