qsprpred.extra.data.utils.testing package

Submodules

qsprpred.extra.data.utils.testing.path_mixins module

class qsprpred.extra.data.utils.testing.path_mixins.DataSetsMixInExtras

Bases: DataSetsPathMixIn

Mixin class that provides data sets for tests in qsprpred.extra.

clearGenerated()

Remove the directories that are used for testing.

createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': <TargetTasks.MULTICLASS: 'MULTICLASS'>, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a large multitask dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • preparation_settings (dict) – dictionary containing preparation settings

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset
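The default HBD target property above is a multiclass task with thresholds th = [-1, 1, 2, 100]. A minimal sketch of how such thresholds can be read as bin edges, assuming right-closed bins with the lowest edge included (the convention of pandas.cut with include_lowest=True); the helper name threshold_class is hypothetical and not part of qsprpred:

```python
from bisect import bisect_left
from typing import Optional


def threshold_class(value: float, th: list[float]) -> Optional[int]:
    """Map a value to a class index using the bin edges in ``th``.

    Assumption: bins are right-closed, (th[i], th[i + 1]], and the lowest
    edge belongs to the first bin, as with pandas.cut(include_lowest=True).
    """
    if value < th[0] or value > th[-1]:
        return None  # value lies outside all bins
    if value == th[0]:
        return 0  # lowest edge is included in the first bin
    return bisect_left(th, value) - 1


# Thresholds of the default HBD target property: 4 edges give 3 classes
th = [-1, 1, 2, 100]
print([threshold_class(v, th) for v in [0, 1, 2, 3, 8]])  # [0, 0, 1, 2, 2]
```

Under this reading, molecules with 0 or 1 hydrogen bond donors fall into class 0, those with exactly 2 into class 1, and those with more into class 2.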

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42, n_jobs=1, chunk_size=None)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • preparation_settings (dict) – dictionary containing preparation settings

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createPCMDataSet(name: str = 'QSPRDataset_test_pcm', target_props: list[qsprpred.tasks.TargetProperty] | list[dict] = [{'name': 'pchembl_value_Median', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings: dict | None = None, protein_col: str = 'accession', random_state: int | None = None)

Create a small PCM (proteochemometric) dataset for testing purposes.

Parameters:
  • name (str, optional) – name of the dataset. Defaults to “QSPRDataset_test_pcm”.

  • target_props (list[TargetProperty] | list[dict], optional) – target properties. Defaults to a single regression target, “pchembl_value_Median”.

  • preparation_settings (dict | None, optional) – preparation settings. Defaults to None.

  • protein_col (str, optional) – name of the column with protein accessions. Defaults to “accession”.

  • random_state (int, optional) – random seed to use in the dataset. Defaults to None.

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • preparation_settings (dict) – dictionary containing preparation settings

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], random_state=None, prep=None, n_jobs=1, chunk_size=None)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

classmethod getAllDescriptors() list[qsprpred.data.descriptors.sets.DescriptorSet]

Return a list of all available molecule descriptor sets.

Returns:

list of DescriptorSet objects

Return type:

list

classmethod getAllProteinDescriptors() list[qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet]

Return a list of all available protein descriptor sets.

Returns:

list of ProteinDescriptorSet objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but it should cover a wide range of cases.

Returns:

a generator that yields one tuple per combination. Each tuple has the form (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters).

Return type:

grid
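A minimal sketch of a generator with the documented tuple layout, assuming the grid is built as a Cartesian product of option lists. The real method is described as non-exhaustive, so it may select only a subset of combinations; the function name data_prep_grid and the placeholder options are hypothetical stand-ins for qsprpred objects:

```python
from itertools import product
from typing import Iterator


def data_prep_grid(
    descriptor_calculators: list,
    splits: list,
    feature_standardizers: list,
    feature_filters: list,
    data_filters: list,
) -> Iterator[tuple]:
    """Yield one tuple per combination, in the order documented above:
    (descriptor_calculator, split, feature_standardizer,
     feature_filters, data_filters).
    """
    yield from product(
        descriptor_calculators, splits, feature_standardizers,
        feature_filters, data_filters,
    )


# Hypothetical placeholder options; the mixin itself uses qsprpred objects
grid = data_prep_grid(
    ["calcA", "calcB"], ["randomSplit"], [None, "scaler"], [None], [None]
)
combos = list(grid)
print(len(combos))  # 2 * 1 * 2 * 1 * 1 = 4
```

Returning a generator keeps memory use low when the grid is large, but it can be consumed only once; wrap it in list() if the combinations are needed more than once.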

classmethod getDefaultCalculatorCombo()

Return the default descriptor calculator combo.

static getDefaultPrep()

Return a dictionary with default preparation settings.

classmethod getMSAProvider(out_dir: str)

getPCMDF() DataFrame

Return a test dataframe with PCM data.

Returns:

dataframe with PCM data

Return type:

pd.DataFrame

getPCMSeqProvider() Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

Return a function that provides sequences for given accessions.

Returns:

function that provides sequences for given accessions

Return type:

Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]
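A minimal sketch of a callable with the documented signature, built from an in-memory accession-to-sequence mapping. It assumes the second dictionary holds per-accession metadata and leaves it empty; make_seq_provider, the accessions and the sequences are toy values for illustration only, not qsprpred data:

```python
from typing import Callable

SeqProvider = Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]


def make_seq_provider(sequences: dict[str, str]) -> SeqProvider:
    """Build a provider matching the getPCMSeqProvider return type.

    Assumption: the second dict maps each accession to a metadata dict;
    here every metadata dict is empty.
    """

    def provider(accessions: list[str]) -> tuple[dict[str, str], dict[str, dict]]:
        # Raises KeyError for accessions that have no known sequence
        seqs = {acc: sequences[acc] for acc in accessions}
        info = {acc: {} for acc in accessions}
        return seqs, info

    return provider


# Toy accessions and sequences
provider = make_seq_provider({"P00001": "MKTAYIAK", "P00002": "GSHMLEDP"})
seqs, info = provider(["P00001"])
print(seqs)  # {'P00001': 'MKTAYIAK'}
```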

getPCMTargetsDF() DataFrame

Return a test dataframe with PCM targets and their sequences.

Returns:

dataframe with PCM targets and their sequences

Return type:

pd.DataFrame

classmethod getPrepCombos()

Return a list of all preparation combinations generated by getDataPrepGrid, each paired with a name. The list can be used to parameterize tests with these named combinations.

Returns:

list of lists, one per named preparation combination

Return type:

list
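Named combinations like these are typically fed to a parameterization helper (for example, the third-party parameterized package). A minimal standard-library sketch using unittest's subTest instead, assuming each entry is a name followed by the five parts of a grid tuple; prep_combos, its naming scheme, and the placeholder grid are hypothetical:

```python
import unittest
from itertools import product


def prep_combos(grid) -> list[list]:
    """Prefix each combination with a name.

    Hypothetical naming scheme: string forms of the parts joined by "_".
    """
    return [["_".join(str(part) for part in combo), *combo] for combo in grid]


# Placeholder grid with the documented 5-part tuple layout
GRID = list(product(["calcA", "calcB"], ["randomSplit"], [None], [None], [None]))


class TestPrepCombos(unittest.TestCase):
    def test_each_combo(self):
        for name, calc, split, std, feat_filters, data_filters in prep_combos(GRID):
            # subTest reports each named combination separately on failure
            with self.subTest(name=name):
                self.assertIn(calc, ("calcA", "calcB"))
                self.assertEqual(split, "randomSplit")
```

Run it with `python -m unittest`; a failing combination is reported under its name.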

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

setUpPaths()

Create the directories that are used for testing.

tearDown()

Remove all files and directories that are used for testing.
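A minimal sketch of the directory lifecycle described by setUpPaths, clearGenerated and tearDown, mixed into a unittest.TestCase. PathsMixIn is a hypothetical stand-in, not the qsprpred class, and the attribute names generatedPath and generatedDataPath are assumptions:

```python
import os
import shutil
import tempfile
import unittest


class PathsMixIn:
    """Hypothetical stand-in for the path handling of DataSetsMixInExtras."""

    def setUpPaths(self):
        # A fresh temporary root per test keeps generated files isolated
        self.generatedPath = tempfile.mkdtemp(prefix="qsprpred_test_")
        self.generatedDataPath = os.path.join(self.generatedPath, "datasets")
        os.makedirs(self.generatedDataPath, exist_ok=True)

    def clearGenerated(self):
        # Remove everything created under the temporary root
        shutil.rmtree(self.generatedPath, ignore_errors=True)

    def tearDown(self):
        self.clearGenerated()
        super().tearDown()  # continue along the MRO (here: TestCase)


class ExampleTest(PathsMixIn, unittest.TestCase):
    def setUp(self):
        super().setUp()
        self.setUpPaths()

    def test_writes_into_generated_dir(self):
        path = os.path.join(self.generatedDataPath, "out.txt")
        with open(path, "w") as fh:
            fh.write("ok")
        self.assertTrue(os.path.exists(path))
```

Listing the mixin before TestCase in the bases lets its tearDown run first and then hand off to TestCase through super().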

validate_split(dataset)

Check that the dataset contains the expected data after splitting.
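The exact checks performed by validate_split are not listed here. A minimal sketch of typical split checks, assuming a split is described by sample IDs rather than a QSPRDataset; the helper validate_split_indices is hypothetical:

```python
def validate_split_indices(all_ids, train_ids, test_ids) -> None:
    """Hypothetical consistency checks for a train/test split."""
    train, test = set(train_ids), set(test_ids)
    assert train, "training set is empty"
    assert test, "test set is empty"
    assert not train & test, "train and test sets overlap"
    assert train | test == set(all_ids), "split does not cover all samples"


validate_split_indices(range(10), range(8), range(8, 10))  # passes silently
```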

Module contents