qsprpred.data.sampling package

Submodules

qsprpred.data.sampling.splits module

Different splitters to create train and test sets for evaluating QSPR model performance.

To add a new data splitter:
  • Add a DataSplit subclass for your new splitter
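As a rough illustration of that step, the sketch below defines a stand-in abstract base class with the same split(X, y) contract that is documented for DataSplit, plus one subclass. The names SplitterSketch and EveryNthSplit are hypothetical and not part of qsprpred; in real code the subclass would derive from qsprpred.data.sampling.splits.DataSplit instead.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, Sequence, Tuple


class SplitterSketch(ABC):
    """Stand-in for DataSplit (hypothetical, not the library class)."""

    @abstractmethod
    def split(self, X: Sequence, y: Sequence) -> Iterator[Tuple[List[int], List[int]]]:
        """Yield (train_indices, test_indices) tuples of integer row positions."""


class EveryNthSplit(SplitterSketch):
    """Hypothetical splitter: every n-th row goes to the test set."""

    def __init__(self, n: int = 5):
        if n < 2:
            raise ValueError("n must be at least 2")
        self.n = n

    def split(self, X, y):
        rows = range(len(X))
        test = [i for i in rows if i % self.n == self.n - 1]
        train = [i for i in rows if i % self.n != self.n - 1]
        yield train, test


splitter = EveryNthSplit(n=5)
train_idx, test_idx = next(iter(splitter.split(list("abcdefghij"), list(range(10)))))
```

The only required piece is a split method that yields index tuples; everything else in the subclass is up to the splitting strategy.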

class qsprpred.data.sampling.splits.BootstrapSplit(split: DataSplit, n_bootstraps=5, seed=None, dataset=None)[source]

Bases: DataSplit, Randomized, DataSetDependent

Splits dataset in random train and test subsets (bootstraps). Unlike cross-validation, bootstrapping allows for repeated samples in the test set.

Variables:
  • nBootstraps (int) – number of bootstraps to perform

  • seed (int) – Random state to use for shuffling and other random operations.

Initialize a BootstrapSplit object.

Parameters:
  • split (DataSplit) – the splitter to use for the bootstraps

  • n_bootstraps (int) – number of bootstraps to perform

  • seed (int) – random seed to use for random operations

  • dataset (QSPRDataSet) – dataset for the underlying splitter if it is DataSetDependent

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDataSet() QSPRDataSet

Get the data set attached to this object.

Returns:

The data set attached to this object

Return type:

QSPRDataSet

Raises:

ValueError – If no data set is attached to this object.

property hasDataSet: bool

Indicates if this object has a data set attached to it.

property randomState: int

Get the random state for the object.

setDataSet(dataset)[source]

Set the dataset for the underlying splitter.

split(X: ndarray | DataFrame, y: ndarray | DataFrame | Series) Iterable[tuple[list[int], list[int]]][source]

Split the given data into nBootstraps training and test sets.

Parameters:
  • X (np.ndarray | pd.DataFrame) – the input data matrix

  • y (np.ndarray | pd.DataFrame | pd.Series) – the target variable(s)

Returns:

a generator over nBootstraps tuples generated by the underlying splitter
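The idea of repeating an underlying splitter can be sketched in plain Python as follows. The helper names random_split and bootstrap_splits are hypothetical, and deriving each bootstrap's seed as seed + i is an assumption made for this illustration; the library may derive its seeds differently.

```python
import random
from typing import Iterator, List, Tuple


def random_split(n_rows: int, test_fraction: float, seed: int) -> Tuple[List[int], List[int]]:
    """One seeded random train/test split over row positions 0..n_rows-1."""
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(n_rows * test_fraction))
    return sorted(rows[n_test:]), sorted(rows[:n_test])


def bootstrap_splits(n_rows: int, n_bootstraps: int = 5, seed: int = 42,
                     test_fraction: float = 0.2) -> Iterator[Tuple[List[int], List[int]]]:
    """Repeat the underlying split n_bootstraps times with a different seed each time.

    Using seed + i per bootstrap is an assumption of this sketch.
    """
    for i in range(n_bootstraps):
        yield random_split(n_rows, test_fraction, seed + i)


splits = list(bootstrap_splits(n_rows=20, n_bootstraps=5))
```

Because every bootstrap is drawn independently, a row can land in the test set of several bootstraps, which is the difference from k-fold cross-validation noted in the class description.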

splitDataset(dataset: QSPRDataSet)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

class qsprpred.data.sampling.splits.ClusterSplit(test_fraction: float = 0.1, n_folds: int = 1, custom_test_list: list[str] | None = None, seed: int | None = None, clustering: MoleculeClusters | None = None, data_set: QSPRDataSet | None = None, **split_kwargs)[source]

Bases: GBMTDataSplit, Randomized

Splits dataset into balanced train and test subsets based on clusters of similar molecules.

Variables:
  • testFraction (float) – fraction of the total dataset to assign to the test set

  • customTestList (list) – list of molecule indexes to force in test set

  • seed (int) – Random state to use for shuffling and other random operations.

  • split_kwargs (dict) – additional arguments to be passed to the GloballyBalancedSplit

Initialize a GBMTDataSplit object.
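The core constraint of a cluster-based split is that all molecules of one cluster end up on the same side. A minimal sketch of that constraint, assuming cluster labels have already been computed, is shown below. The function cluster_holdout is hypothetical and fills the test set greedily; it does not reproduce the balancing that the GBMT algorithm performs.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cluster_holdout(cluster_ids: Sequence[int], test_fraction: float = 0.1,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Move whole clusters to the test set until it holds about test_fraction of the rows."""
    members: Dict[int, List[int]] = defaultdict(list)
    for row, cluster in enumerate(cluster_ids):
        members[cluster].append(row)

    clusters = sorted(members)
    random.Random(seed).shuffle(clusters)  # seeded, so the split is reproducible

    target = test_fraction * len(cluster_ids)
    test: List[int] = []
    for cluster in clusters:
        if len(test) >= target:
            break
        test.extend(members[cluster])

    test_set = set(test)
    train = [row for row in range(len(cluster_ids)) if row not in test_set]
    return train, sorted(test)


cluster_ids = [0, 0, 1, 1, 1, 2, 3, 3, 4, 4]
train_idx, test_idx = cluster_holdout(cluster_ids, test_fraction=0.3, seed=1)
```

Since clusters are moved as a unit, the achieved test fraction can overshoot the requested one when clusters are large.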

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDataSet() QSPRDataSet

Get the data set attached to this object.

Returns:

The data set attached to this object

Return type:

QSPRDataSet

Raises:

ValueError – If no data set is attached to this object.

property hasDataSet: bool

Indicates if this object has a data set attached to it.

property randomState: int

Get the random state for the object.

setDataSet(dataset: QSPRDataSet | None) None

Set the data set for this object.

split(X: ndarray | DataFrame, y: ndarray | DataFrame | Series) Iterable[tuple[list[int], list[int]]]

Split dataset into balanced train and test subsets based on an initial clustering algorithm.

Parameters:
  • X (np.ndarray | pd.DataFrame) – the input data matrix

  • y (np.ndarray | pd.DataFrame | pd.Series) – the target variable(s)

Returns:

a generator over the generated subsets represented as a tuple of (train_indices, test_indices) where the indices are the row indices of the input data matrix

splitDataset(dataset: QSPRDataSet)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

class qsprpred.data.sampling.splits.DataSplit[source]

Bases: JSONSerializable, ABC

Defines a function to split a dataframe into train and test sets.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

abstract split(X: ndarray | DataFrame, y: ndarray | DataFrame | Series) Iterable[tuple[list[int], list[int]]][source]

Split the given data into one or multiple train/test subsets.

These classes handle partitioning of a feature matrix by returning a generator of train and test indices. This is compatible with the approach taken in the sklearn package (see sklearn.model_selection._BaseKFold) and can be used for either cross-validation or a one-time train/test split.

Parameters:
  • X (np.ndarray | pd.DataFrame) – the input data matrix

  • y (np.ndarray | pd.DataFrame | pd.Series) – the target variable(s)

Returns:

a generator over the generated subsets, each represented as a tuple of (train_indices, test_indices), where the indices are the row indices of the input data matrix X (note that these are integer indices, rather than a pandas index!)
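To make the index convention concrete, here is a small stand-alone sketch of a generator that follows the same contract and of a loop that consumes it. The function k_fold_indices and the row labels are hypothetical; the labels stand in for a pandas index that the splitter never sees.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_rows: int, n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) with contiguous test folds."""
    fold_sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, test
        start += size


# Row labels stand in for a pandas index; the splitter only works with positions.
labels = ["mol_a", "mol_b", "mol_c", "mol_d", "mol_e"]
X = [[0.1], [0.2], [0.3], [0.4], [0.5]]

folds = []
for train_idx, test_idx in k_fold_indices(len(X), n_folds=2):
    # Indices are positions, so they are used for positional lookup.
    folds.append(([labels[i] for i in train_idx], [labels[i] for i in test_idx]))
```

With a pandas DataFrame the equivalent positional lookup is X.iloc[train_idx], not X.loc[train_idx].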

splitDataset(dataset: QSPRDataSet)[source]

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)
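The round trip described by toJSON, fromJSON, toFile and fromFile can be sketched with the standard json module. The class SettingsSketch below is hypothetical and only mirrors the documented contract (a string or file that holds everything needed to rebuild the object, and an absolute path returned when saving); it is not how JSONSerializable is implemented.

```python
import json
import os
import tempfile


class SettingsSketch:
    """Hypothetical JSON-serializable object (not the library's JSONSerializable)."""

    def __init__(self, test_fraction: float = 0.1, seed: int = 42):
        self.test_fraction = test_fraction
        self.seed = seed

    def to_json(self) -> str:
        # Store everything needed to rebuild the object.
        return json.dumps({"test_fraction": self.test_fraction, "seed": self.seed})

    @classmethod
    def from_json(cls, text: str) -> "SettingsSketch":
        return cls(**json.loads(text))

    def to_file(self, filename: str) -> str:
        with open(filename, "w", encoding="utf-8") as handle:
            handle.write(self.to_json())
        return os.path.abspath(filename)  # absolute path of the saved file

    @classmethod
    def from_file(cls, filename: str) -> "SettingsSketch":
        with open(filename, encoding="utf-8") as handle:
            return cls.from_json(handle.read())


with tempfile.TemporaryDirectory() as tmp:
    path = SettingsSketch(test_fraction=0.2, seed=7).to_file(os.path.join(tmp, "split.json"))
    restored = SettingsSketch.from_file(path)
```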

class qsprpred.data.sampling.splits.GBMTDataSplit(clustering: MoleculeClusters = <qsprpred.data.chem.clustering.FPSimilarityMaxMinClusters object>, test_fraction: float = 0.1, n_folds: int = 1, custom_test_list: list[str] | None = None, data_set: QSPRDataSet | None = None, **split_kwargs)[source]

Bases: DataSplit, DataSetDependent

Splits dataset into balanced train and test subsets based on an initial clustering algorithm. If nFolds is specified, the determined clusters will be split into nFolds groups of approximately equal size, and the splits will be generated by leaving out one group at a time.

More information on the GBMT algorithm can be found at:

https://github.com/CDDLeiden/gbmt-splits

Variables:
  • clustering (MoleculeClusters) – clustering algorithm to use

  • testFraction (float) – fraction of the total dataset to assign to the test set, ignored if nFolds > 1

  • nFolds (int) – number of folds to split the dataset into (this overrides testFraction and customTestList)

  • customTestList (list) – list of molecule indexes to force in test set, ignored if nFolds > 1

  • splitKwargs (dict) – additional arguments to be passed to the GloballyBalancedSplit

Initialize a GBMTDataSplit object.
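The fold behaviour described above (clusters packed into nFolds groups of similar size, then one group left out per split) can be illustrated with a greedy stand-in. The function cluster_folds is hypothetical and uses simple largest-first packing on precomputed cluster labels; the actual GBMT algorithm balances the groups differently, see the linked repository.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def cluster_folds(cluster_ids: Sequence[int],
                  n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Pack clusters into n_folds groups of similar size, then leave one group out per split."""
    members: Dict[int, List[int]] = defaultdict(list)
    for row, cluster in enumerate(cluster_ids):
        members[cluster].append(row)

    # Greedy packing: largest cluster first, always into the currently smallest group.
    groups: List[List[int]] = [[] for _ in range(n_folds)]
    for cluster in sorted(members, key=lambda c: (-len(members[c]), c)):
        min(groups, key=len).extend(members[cluster])

    for held_out in groups:
        held = set(held_out)
        train = [row for row in range(len(cluster_ids)) if row not in held]
        yield train, sorted(held_out)


cluster_ids = [0, 0, 0, 1, 1, 2, 2, 3, 4]
folds = list(cluster_folds(cluster_ids, n_folds=3))
```

Each row appears in exactly one test fold, and no cluster is divided between the train and test side of any split.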

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDataSet() QSPRDataSet

Get the data set attached to this object.

Returns:

The data set attached to this object

Return type:

QSPRDataSet

Raises:

ValueError – If no data set is attached to this object.

property hasDataSet: bool

Indicates if this object has a data set attached to it.

setDataSet(dataset: QSPRDataSet | None) None

Set the data set for this object.

split(X: ndarray | DataFrame, y: ndarray | DataFrame | Series) Iterable[tuple[list[int], list[int]]][source]

Split dataset into balanced train and test subsets based on an initial clustering algorithm.

Parameters:
  • X (np.ndarray | pd.DataFrame) – the input data matrix

  • y (np.ndarray | pd.DataFrame | pd.Series) – the target variable(s)

Returns:

a generator over the generated subsets represented as a tuple of (train_indices, test_indices) where the indices are the row indices of the input data matrix

splitDataset(dataset: QSPRDataSet)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

class qsprpred.data.sampling.splits.GBMTRandomSplit(test_fraction: float = 0.1, n_folds: int = 1, seed: int | None = None, n_initial_clusters: int | None = None, custom_test_list: list[str] | None = None, data_set: QSPRDataSet | None = None, **split_kwargs)[source]

Bases: GBMTDataSplit, Randomized

Splits dataset into balanced random train and test subsets.

Variables:
  • testFraction (float) – fraction of the total dataset to assign to the test set

  • customTestList (list) – list of molecule indexes to force in test set

  • split_kwargs (dict) – additional arguments to be passed to the GloballyBalancedSplit

Initialize a GBMTDataSplit object.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDataSet() QSPRDataSet

Get the data set attached to this object.

Returns:

The data set attached to this object

Return type:

QSPRDataSet

Raises:

ValueError – If no data set is attached to this object.

property hasDataSet: bool

Indicates if this object has a data set attached to it.

property randomState: int

Get the random state for the object.

setDataSet(dataset: QSPRDataSet | None) None

Set the data set for this object.

split(X: ndarray | DataFrame, y: ndarray | DataFrame | Series) Iterable[tuple[list[int], list[int]]]

Split dataset into balanced train and test subsets based on an initial clustering algorithm.

Parameters:
  • X (np.ndarray | pd.DataFrame) – the input data matrix

  • y (np.ndarray | pd.DataFrame | pd.Series) – the target variable(s)

Returns:

a generator over the generated subsets represented as a tuple of (train_indices, test_indices) where the indices are the row indices of the input data matrix

splitDataset(dataset: QSPRDataSet)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

class qsprpred.data.sampling.splits.ManualSplit(splitprop: str | list, trainval: str, testval: str, data_set: QSPRDataSet | None = None)[source]

Bases: DataSplit, DataSetDependent

Splits dataset in train and test subsets based on a column in the dataframe.

Variables:
  • splitProp (str | list) – name(s) of the column(s) in the dataset that contains the split

  • trainVal (str) – value in the splitProp column(s) that will be used for training

  • testVal (str) – value in the splitProp column(s) that will be used for testing

Raises:

ValueError – if a splitprop column contains values other than trainval and testval

Initialize the ManualSplit object with the splitprop, trainval and testval attributes.

One or more columns can be provided in splitprop to generate multiple splits, e.g. like cross-validation.

Parameters:
  • splitprop (str | list) – name(s) of the column(s) in the dataset that contain(s) the split

  • trainval (str) – value in splitprop that will be used for training

  • testval (str) – value in splitprop that will be used for testing

  • data_set (QSPRDataSet) – dataset that this splitter will be acting on
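The behaviour described by these parameters can be sketched on plain lists. The function manual_splits is hypothetical and works on in-memory columns rather than on a QSPRDataSet; it yields one split per column and rejects values other than trainval and testval, in line with the ValueError documented above.

```python
from typing import Iterator, List, Sequence, Tuple


def manual_splits(columns: Sequence[Sequence[str]], trainval: str,
                  testval: str) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield one (train_indices, test_indices) tuple per split column."""
    for column in columns:
        unexpected = set(column) - {trainval, testval}
        if unexpected:
            raise ValueError(f"unexpected split values: {sorted(unexpected)}")
        train = [i for i, value in enumerate(column) if value == trainval]
        test = [i for i, value in enumerate(column) if value == testval]
        yield train, test


# Two hypothetical split columns, giving two splits (cross-validation style).
split_a = ["train", "train", "test", "train"]
split_b = ["test", "train", "train", "train"]
splits = list(manual_splits([split_a, split_b], trainval="train", testval="test"))
```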

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDataSet() QSPRDataSet

Get the data set attached to this object.

Returns:

The data set attached to this object

Return type:

QSPRDataSet

Raises:

ValueError – If no data set is attached to this object.

property hasDataSet: bool

Indicates if this object has a data set attached to it.

setDataSet(dataset: QSPRDataSet | None) None

Set the data set for this object.

split(X, y)[source]

Split the given data into one or multiple train/test subsets based on the predefined splitprop(s).

Parameters:
  • X (np.ndarray | pd.DataFrame) – the input data matrix

  • y (np.ndarray | pd.DataFrame | pd.Series) – the target variable(s)

Returns:

a generator over the generated subsets represented as a tuple of (train_indices, test_indices) where the indices are the row indices of the input data matrix

splitDataset(dataset: QSPRDataSet)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

class qsprpred.data.sampling.splits.RandomSplit(test_fraction=0.1, seed: int | None = None)[source]

Bases: DataSplit, Randomized

Splits dataset in random train and test subsets.

Variables:
  • testFraction (float) – fraction of the total dataset to assign to the test set

  • seed (int) – Random state to use for shuffling and other random operations.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

property randomState: int

Get the random state for the object.

split(X, y)[source]

Split the given data into one or multiple train/test subsets.

These classes handle partitioning of a feature matrix by returning a generator of train and test indices. This is compatible with the approach taken in the sklearn package (see sklearn.model_selection._BaseKFold) and can be used for either cross-validation or a one-time train/test split.

Parameters:
  • X (np.ndarray | pd.DataFrame) – the input data matrix

  • y (np.ndarray | pd.DataFrame | pd.Series) – the target variable(s)

Returns:

a generator over the generated subsets, each represented as a tuple of (train_indices, test_indices), where the indices are the row indices of the input data matrix X (note that these are integer indices, rather than a pandas index!)

splitDataset(dataset: QSPRDataSet)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

class qsprpred.data.sampling.splits.ScaffoldSplit(scaffold: Scaffold = <qsprpred.data.chem.scaffolds.BemisMurckoRDKit object>, test_fraction: float = 0.1, n_folds: int = 1, custom_test_list: list | None = None, data_set: QSPRDataSet | None = None, **split_kwargs)[source]

Bases: GBMTDataSplit

Splits dataset into balanced train and test subsets based on molecular scaffolds.

Variables:
  • testFraction (float) – fraction of the total dataset to assign to the test set

  • customTestList (list) – list of molecule indexes to force in test set

  • split_kwargs (dict) – additional arguments to be passed to the GloballyBalancedSplit

Initialize a GBMTDataSplit object.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDataSet() QSPRDataSet

Get the data set attached to this object.

Returns:

The data set attached to this object

Return type:

QSPRDataSet

Raises:

ValueError – If no data set is attached to this object.

property hasDataSet: bool

Indicates if this object has a data set attached to it.

setDataSet(dataset: QSPRDataSet | None) None

Set the data set for this object.

split(X: ndarray | DataFrame, y: ndarray | DataFrame | Series) Iterable[tuple[list[int], list[int]]]

Split dataset into balanced train and test subsets based on an initial clustering algorithm.

Parameters:
  • X (np.ndarray | pd.DataFrame) – the input data matrix

  • y (np.ndarray | pd.DataFrame | pd.Series) – the target variable(s)

Returns:

a generator over the generated subsets represented as a tuple of (train_indices, test_indices) where the indices are the row indices of the input data matrix

splitDataset(dataset: QSPRDataSet)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

class qsprpred.data.sampling.splits.TemporalSplit(timesplit: float | list[float], timeprop: str, data_set: QSPRDataSet | None = None)[source]

Bases: DataSplit, DataSetDependent

Splits dataset into train and test subsets based on a threshold in time.

Variables:
  • timeSplit (float) – time point after which samples are assigned to the test set

  • timeCol (str) – name of the column within the dataframe with timepoints

Initialize a TemporalSplit object.

Parameters:
  • timesplit (float | list[float]) – time point after which sample is moved to test set. If a list is provided, the splitter will split the dataset into multiple subsets based on the timepoints in the list.

  • timeprop (str) – name of the column within the dataset with timepoints

  • data_set (QSPRDataSet) – dataset that this splitter will be acting on
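A stand-alone sketch of the threshold logic, covering both a single time point and a list of time points, might look like this. The function temporal_splits is hypothetical, and treating "after" as strictly greater than the threshold is an assumption of the sketch.

```python
from typing import Iterator, List, Sequence, Tuple, Union


def temporal_splits(times: Sequence[float],
                    timesplit: Union[float, Sequence[float]]
                    ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield one split per threshold; rows with time > threshold form the test set."""
    thresholds = [timesplit] if isinstance(timesplit, (int, float)) else list(timesplit)
    for threshold in thresholds:
        # Assumption: samples exactly at the threshold stay in the training set.
        train = [i for i, t in enumerate(times) if t <= threshold]
        test = [i for i, t in enumerate(times) if t > threshold]
        yield train, test


years = [2008, 2012, 2015, 2019, 2021]
single = list(temporal_splits(years, 2015))
multiple = list(temporal_splits(years, [2010, 2018]))
```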

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDataSet() QSPRDataSet

Get the data set attached to this object.

Returns:

The data set attached to this object

Return type:

QSPRDataSet

Raises:

ValueError – If no data set is attached to this object.

property hasDataSet: bool

Indicates if this object has a data set attached to it.

setDataSet(dataset: QSPRDataSet | None) None

Set the data set for this object.

split(X, y)[source]

Split single-task dataset based on a time threshold.

Parameters:
  • X (np.ndarray | pd.DataFrame) – the input data matrix

  • y (np.ndarray | pd.DataFrame | pd.Series) – the target variable(s)

Returns:

a generator over the generated subsets represented as a tuple of (train_indices, test_indices) where the indices are the row indices of the input data matrix

splitDataset(dataset: QSPRDataSet)

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

qsprpred.data.sampling.tests module

class qsprpred.data.sampling.tests.TestDataSplitters(methodName='runTest')[source]

Bases: DataSetsPathMixIn, QSPRTestCase, DataPrepCheckMixIn

Small tests to only check if the data splitters work on their own.

The tests here should be used to check for all their specific parameters and edge cases.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
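The difference between the places and delta arguments is easiest to see in a short test case. The class name and the numbers below are made up for illustration; the delta test imagines checking that an achieved split fraction is close to the requested one.

```python
import unittest


class AlmostEqualExample(unittest.TestCase):
    def test_places(self):
        # Default places=7: the difference rounds to zero at 7 decimal places.
        self.assertAlmostEqual(1.00000001, 1.0)
        # With places=3 a difference of about 0.001 does not round to zero.
        self.assertNotAlmostEqual(1.001, 1.0, places=3)

    def test_delta(self):
        # Hypothetical use: an achieved fraction within 0.05 of the requested 0.80.
        self.assertAlmostEqual(0.83, 0.80, delta=0.05)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AlmostEqualExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Only one of places and delta may be given in a single call.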

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)),
                 Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.
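The two examples above can be run as an actual test case. The class name is hypothetical; the second test wraps the failing comparison in assertRaises so that the case as a whole passes.

```python
import unittest


class CountEqualExample(unittest.TestCase):
    def test_same_elements_any_order(self):
        # Same elements with the same multiplicities, different order.
        self.assertCountEqual([0, 1, 1], [1, 0, 1])

    def test_different_multiplicity_fails(self):
        # 0 appears twice on the left but once on the right.
        with self.assertRaises(AssertionError):
            self.assertCountEqual([0, 0, 1], [0, 1])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountEqualExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```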

assertDictEqual(d1, d2, msg=None)

assertEndsWith(s, suffix, msg=None)

assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)

assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)

assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger_name or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger_name or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)

assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)

assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)

assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)

assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)

assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertStartsWith(s, prefix, msg=None)

assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class warnClass is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in the warning message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
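As a minimal sketch of the context-manager form, the example below uses only the standard library. The function `deprecated_split` and its warning text are hypothetical, invented for illustration, and are not part of qsprpred.

```python
import unittest
import warnings


def deprecated_split(fraction):
    """Hypothetical function that emits a DeprecationWarning (illustration only)."""
    warnings.warn("test_fraction is deprecated, use fraction", DeprecationWarning)
    return fraction


class WarnsRegexExample(unittest.TestCase):
    def test_message_matches(self):
        # Only warnings of the given class whose message matches the regex count.
        with self.assertWarnsRegex(DeprecationWarning, r"deprecated, use \w+") as cm:
            value = deprecated_split(0.2)
        self.assertEqual(value, 0.2)
        # The first matching warning is kept on the context object.
        self.assertIn("test_fraction", str(cm.warning))


# Run the example programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(WarnsRegexExample)
warns_result = unittest.TestResult()
suite.run(warns_result)
```

The regex is searched for within the warning message, so it does not need to match the whole text.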

checkDescriptors(dataset: QSPRDataSet, target_props: list[dict | TargetSpec])

Check if information about descriptors is consistent in the data set. Checks if calculators are consistent with the descriptors contained in the data set. This is tested also before and after serialization.

Parameters:
  • dataset (QSPRDataSet) – The data set to check.

  • target_props (list of dicts or TargetSpec) – list of target properties

Raises:

AssertionError – If the consistency check fails.

checkFeatures(X_train, y_train, X_test=None, y_test=None)

Check if features matrices are the correct type and shape and if the indices are consistent between features and targets. Also check if there is no overlap between the train and test indices if both are provided.
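The index checks described above can be sketched roughly as follows. This is not the qsprpred implementation: `check_split_indices` is a hypothetical helper that works on plain sequences of index labels and covers only the alignment and overlap checks, not the type and shape checks.

```python
def check_split_indices(x_train_idx, y_train_idx, x_test_idx=None, y_test_idx=None):
    """Hypothetical stand-in for the index checks (illustration only)."""
    # Features and targets must refer to the same samples, in the same order.
    assert list(x_train_idx) == list(y_train_idx), "train features/targets misaligned"
    if x_test_idx is not None and y_test_idx is not None:
        assert list(x_test_idx) == list(y_test_idx), "test features/targets misaligned"
        # No sample may appear in both the training and the test set.
        overlap = set(x_train_idx) & set(x_test_idx)
        assert not overlap, f"train/test overlap: {sorted(overlap)}"
    return True


# A consistent split passes the check.
consistent = check_split_indices(["m1", "m2", "m3"], ["m1", "m2", "m3"], ["m4"], ["m4"])

# A split that leaks "m3" into the test set is rejected.
try:
    check_split_indices(["m1", "m2", "m3"], ["m1", "m2", "m3"], ["m3"], ["m3"])
    leak_detected = False
except AssertionError:
    leak_detected = True
```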

checkPrep(dataset: QSPRDataSet, pipeline: DatasetPipeline, split: DataSplit | None = None)

Check if the data preparation is consistent before and after reloading.

checkSplit(dataset: QSPRDataSet, name: str)

Check if the split has the data it should have after splitting.

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now they need to be added manually to the list below.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of DescriptorCalculator objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid

classmethod getDefaultCalculatorCombo()

Makes a list of default descriptor calculators that can be used in tests.

It creates one calculator with only Morgan fingerprints, one with only RDKit descriptors, and one with both, to test behaviour with multiple descriptor sets. Override this method if you want to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists of all possible preparation combinations

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()[source]

Hook method for setting up the test fixture before exercising it.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Remove all files and directories that are used for testing.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testClusterSplit = None
testClusterSplit_0(**kw)

Test the cluster split function [with multitask=False, clustering_algorithm=<qsprpred.data.chem.clustering.F…usters object at 0x7f9561e32350>, custom_test_list=None].

testClusterSplit_1(**kw)

Test the cluster split function [with multitask=False, clustering_algorithm=<qsprpred.data.chem.clustering.F…usters object at 0x7f9561e32490>, custom_test_list=[‘ClusterSplit_storage_library_0…usterSplit_storage_library_001’]].

testClusterSplit_2(**kw)

Test the cluster split function [with multitask=True, clustering_algorithm=<qsprpred.data.chem.clustering.F…usters object at 0x7f9562fafa80>, custom_test_list=None].

testClusterSplit_3(**kw)

Test the cluster split function [with multitask=True, clustering_algorithm=<qsprpred.data.chem.clustering.F…usters object at 0x7f9561e31e50>, custom_test_list=[‘ClusterSplit_storage_library_0…usterSplit_storage_library_001’]].

testManualSplit()[source]

Test the manual split function, where the split is done manually.

testRandomSplit = None
testRandomSplit_0(**kw)

Test the random split function [with multitask=False].

testRandomSplit_1(**kw)

Test the random split function [with multitask=True].

testScaffoldSplit = None
testScaffoldSplit_0(**kw)

Test the scaffold split function [with multitask=False, scaffold=<qsprpred.data.chem.scaffolds.Be…oRDKit object at 0x7f9562faf490>, custom_test_list=None].

testScaffoldSplit_1(**kw)

Test the scaffold split function [with multitask=False, scaffold=<qsprpred.data.chem.scaffolds.Be…Murcko object at 0x7f9562faf5c0>, custom_test_list=[‘ScaffoldSplit_storage_library_…ffoldSplit_storage_library_001’]].

testScaffoldSplit_2(**kw)

Test the scaffold split function [with multitask=True, scaffold=<qsprpred.data.chem.scaffolds.Be…oRDKit object at 0x7f95611d45f0>, custom_test_list=None].

testSerialization()[source]

Test the serialization of dataset with datasplit.

testTemporalSplit = None
testTemporalSplit_0(**kw)

Test the temporal split function, where the split is done based on a time property [with multitask=False].

testTemporalSplit_1(**kw)

Test the temporal split function, where the split is done based on a time property [with multitask=True].
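The general idea of splitting on a time property can be sketched as follows. This is not the qsprpred splitter: `temporal_split`, the `year` key, and the convention that samples after the threshold form the test set are all assumptions made for illustration.

```python
def temporal_split(records, time_key, threshold):
    """Hypothetical sketch: return (train_indices, test_indices) by time threshold."""
    # Assumed convention: samples up to and including the threshold are used for
    # training, later samples are held out for testing.
    train = [i for i, rec in enumerate(records) if rec[time_key] <= threshold]
    test = [i for i, rec in enumerate(records) if rec[time_key] > threshold]
    return train, test


# Hypothetical records with a publication year as the time property.
records = [
    {"name": "mol_a", "year": 2005},
    {"name": "mol_b", "year": 2012},
    {"name": "mol_c", "year": 2009},
    {"name": "mol_d", "year": 2015},
]
train_idx, test_idx = temporal_split(records, "year", threshold=2010)
```

Here `train_idx` holds the positions of the 2005 and 2009 records and `test_idx` those of the 2012 and 2015 records.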

class qsprpred.data.sampling.tests.TestFoldSplitters(methodName='runTest')[source]

Bases: DataSetsPathMixIn, QSPRTestCase

Small tests to only check if the fold splitters work on their own.

The tests here should be used to check for all their specific parameters and edge cases.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.
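As a sketch of how such a function might be registered, the example below defines a hypothetical `Fold` container and an order-insensitive comparison for it. Neither is part of qsprpred; only `addTypeEqualityFunc`, `assertEqual` and `failureException` come from unittest.

```python
import unittest
from dataclasses import dataclass


@dataclass
class Fold:
    """Hypothetical container for one train/test fold (illustration only)."""
    train: list
    test: list


class FoldEqualityExample(unittest.TestCase):
    def setUp(self):
        # Used by assertEqual when both arguments are exactly of type Fold.
        self.addTypeEqualityFunc(Fold, self.assertFoldEqual)

    def assertFoldEqual(self, first, second, msg=None):
        # Compare the index lists without regard to their order.
        same_train = sorted(first.train) == sorted(second.train)
        same_test = sorted(first.test) == sorted(second.test)
        if not (same_train and same_test):
            raise self.failureException(msg or f"folds differ: {first} != {second}")

    def test_order_is_ignored(self):
        self.assertEqual(Fold([1, 2], [3]), Fold([2, 1], [3]))

    def test_different_folds_fail(self):
        with self.assertRaises(AssertionError):
            self.assertEqual(Fold([1, 2], [3]), Fold([1, 3], [2]))


# Run the example programmatically and collect the outcome.
equality_result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(FoldEqualityExample).run(equality_result)
```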

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
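The difference between places and delta can be illustrated with a few standard-library calls. The numbers are arbitrary and chosen only for illustration.

```python
import unittest

tc = unittest.TestCase()

# Default: the difference rounded to 7 decimal places must be zero.
tc.assertAlmostEqual(0.1 + 0.2, 0.3)

# places counts decimal places, not significant digits: these values differ
# in the 4th decimal place and therefore agree when rounded to 3 places.
tc.assertAlmostEqual(1234.5678, 1234.5679, places=3)

# delta replaces the rounding rule with an absolute tolerance.
tc.assertAlmostEqual(100.0, 100.4, delta=0.5)

# A difference of about 0.001 does not round to zero at 3 decimal places.
try:
    tc.assertAlmostEqual(1.0, 1.001, places=3)
    fails_at_three_places = False
except AssertionError:
    fails_at_three_places = True

# places and delta cannot be combined.
try:
    tc.assertAlmostEqual(1.0, 1.0001, places=3, delta=0.1)
    both_rejected = False
except TypeError:
    both_rejected = True
```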

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

Equivalent to:

self.assertEqual(Counter(list(first)), Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)
assertEndsWith(s, suffix, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)
assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)
assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)
assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)

assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
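Both calling conventions can be sketched with the standard library alone. The helper `parse_fraction` and its error message are hypothetical and serve only as something that raises.

```python
import unittest


def parse_fraction(text):
    """Hypothetical helper (illustration only): parse a test fraction in (0, 1)."""
    value = float(text)
    if not 0 < value < 1:
        raise ValueError(f"test fraction must be between 0 and 1, got {value}")
    return value


class RaisesRegexExample(unittest.TestCase):
    def test_context_manager_form(self):
        with self.assertRaisesRegex(ValueError, r"between 0 and 1, got \d+\.\d+") as cm:
            parse_fraction("1.5")
        # The caught exception is available for further inspection.
        self.assertIn("1.5", str(cm.exception))

    def test_callable_form(self):
        # The callable and its positional arguments follow the regex.
        self.assertRaisesRegex(ValueError, "between 0 and 1", parse_fraction, "2")


# Run the example programmatically and collect the outcome.
raises_result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(RaisesRegexExample).run(raises_result)
```

The msg argument is only accepted in the context-manager form, because in the callable form every extra keyword argument is forwarded to the callable.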

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertStartsWith(s, prefix, msg=None)
assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in the warning message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now they need to be added manually to the list below.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of DescriptorCalculator objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid

classmethod getDefaultCalculatorCombo()

Makes a list of default descriptor calculators that can be used in tests.

It creates one calculator with only Morgan fingerprints, one with only RDKit descriptors, and one with both, to test behaviour with multiple descriptor sets. Override this method if you want to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists of all possible preparation combinations

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()[source]

Hook method for setting up the test fixture before exercising it.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Remove all files and directories that are used for testing.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testBootstrappedFold()[source]
testStandardFolds()[source]

Test the default fold generator, which is a 5-fold cross validation.

validateFolds(folds, more=None)[source]

Check if the folds have the data they should have after splitting.
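The kind of invariants a fold check might enforce can be sketched in plain Python. This is not the qsprpred implementation: `make_folds` and `validate_folds` are hypothetical helpers, and the contiguous, unshuffled 5-fold scheme is an assumption chosen to keep the example short.

```python
def make_folds(n_samples, n_folds=5):
    """Yield (train_indices, test_indices) for contiguous k-fold cross-validation."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def validate_folds(folds, n_samples):
    """Check basic fold invariants and return the number of folds seen."""
    seen_in_test = []
    count = 0
    for train, test in folds:
        assert not set(train) & set(test), "train and test overlap"
        assert len(train) + len(test) == n_samples, "fold does not cover the data"
        seen_in_test.extend(test)
        count += 1
    # Across all folds, every sample is tested exactly once.
    assert sorted(seen_in_test) == list(range(n_samples))
    return count


n_checked = validate_folds(make_folds(23, n_folds=5), n_samples=23)
```

With 23 samples and 5 folds, the test sets have sizes 5, 5, 5, 4 and 4, and `n_checked` is 5.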

Module contents