qsprpred.extra.data.utils.testing package

Submodules

qsprpred.extra.data.utils.testing.path_mixins module

class qsprpred.extra.data.utils.testing.path_mixins.DataSetsMixInExtras

Bases: DataSetsPathMixIn

MixIn class for testing data sets in extras.

clearGenerated(): Remove the directories that are used for testing.

createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': <TargetTasks.MULTICLASS: 'MULTICLASS'>, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
preparation_settings (dict) – dictionary containing preparation settings
random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42, n_jobs=1, chunk_size=None)

Create a large dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createPCMDataSet(name: str = 'QSPRDataset_test_pcm', target_props: list[qsprpred.tasks.TargetProperty] | list[dict] = [{'name': 'pchembl_value_Median', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings: dict | None = None, protein_col: str = 'accession', random_state: int | None = None)

Create a small proteochemometric (PCM) dataset for testing purposes.

Parameters:

name (str, optional) – name of the dataset. Defaults to “QSPRDataset_test_pcm”.
target_props (list[TargetProperty] | list[dict], optional) – target properties.
preparation_settings (dict | None, optional) – preparation settings. Defaults to None.
protein_col (str, optional) – name of the column with protein accessions. Defaults to “accession”.
random_state (int, optional) – random seed to use in the dataset. Defaults to None.

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)

Create a small dataset for testing purposes.

Parameters:

name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
preparation_settings (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], random_state=None, prep=None, n_jobs=1, chunk_size=None)

Create a dataset for testing purposes from the given data frame.

Parameters:

df (pd.DataFrame) – data frame containing the dataset
name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
prep (dict) – dictionary containing preparation settings

Returns:

a QSPRDataset object

Return type:

QSPRDataset

classmethod getAllDescriptors() → list[qsprpred.data.descriptors.sets.DescriptorSet]

Return a list of all available molecule descriptor sets.

Returns: list of MoleculeDescriptorSet objects
Return type: list

classmethod getAllProteinDescriptors() → list[qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet]

Return a list of all available protein descriptor sets.

Returns: list of ProteinDescriptorSet objects
Return type: list

getBigDF()

Get a large data frame for testing purposes.

Returns: a pandas.DataFrame containing the dataset
Return type: pd.DataFrame

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The set is not exhaustive, but it should cover a wide range of cases.

Returns: a generator that yields one tuple per combination, each defined as (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)
Return type: grid
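
The shape of such a grid can be sketched with `itertools.product`. This is only an illustration: the option lists below are placeholder strings rather than the objects QSPRpred actually uses, and `data_prep_grid` is a hypothetical stand-in for the classmethod, not its implementation.

```python
from itertools import product

# Placeholder option lists (assumptions for illustration only).
descriptor_calculators = ["calc_a", "calc_b"]
splits = [None, "random_split"]
feature_standardizers = [None, "standard_scaler"]
feature_filters = [None, ["low_variance"]]
data_filters = [None, ["category_filter"]]


def data_prep_grid():
    """Lazily yield one 5-tuple per combination of preparation options."""
    yield from product(
        descriptor_calculators,
        splits,
        feature_standardizers,
        feature_filters,
        data_filters,
    )


grid = list(data_prep_grid())
first = grid[0]
```

Each yielded tuple follows the order given above: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters).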

classmethod getDefaultCalculatorCombo(): Return the default descriptor calculator combo.

static getDefaultPrep(): Return a dictionary with default preparation settings.

classmethod getMSAProvider(out_dir: str)

getPCMDF() → DataFrame

Return a test dataframe with PCM data.

Returns: dataframe with PCM data
Return type: pd.DataFrame

getPCMSeqProvider() → Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

Return a function that provides sequences for given accessions.

Returns: function that provides sequences for given accessions
Return type: Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]
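
A callable matching this type signature might look like the sketch below. The accessions, sequences and the helper `make_seq_provider` are hypothetical. The role of the second dictionary is not documented here, so the sketch assumes it holds per-accession metadata and leaves it empty.

```python
from typing import Callable

SeqProvider = Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]

# Hypothetical accession -> sequence table, for illustration only.
_SEQUENCES = {
    "ACC1": "MKTAYIAK",
    "ACC2": "MLSRAVCG",
}


def make_seq_provider(table: dict[str, str]) -> SeqProvider:
    """Build a provider that looks up sequences in an in-memory table."""

    def provider(accessions: list[str]) -> tuple[dict[str, str], dict[str, dict]]:
        # Keep only the accessions that the table knows about.
        sequences = {acc: table[acc] for acc in accessions if acc in table}
        # Assumed meaning: extra per-accession information (empty in this sketch).
        info = {acc: {} for acc in sequences}
        return sequences, info

    return provider


provider = make_seq_provider(_SEQUENCES)
sequences, info = provider(["ACC1", "ACC3"])
```

Unknown accessions are silently skipped in this sketch; a real provider may handle them differently.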

getPCMTargetsDF() → DataFrame

Return a test dataframe with PCM targets and their sequences.

Returns: dataframe with PCM targets and their sequences
Return type: pd.DataFrame

classmethod getPrepCombos()

Return a list of all preparation combinations generated by getDataPrepGrid, together with their names. The list can be used to parameterize tests with the named combinations.

Returns: list of lists, one per named preparation combination
Return type: list
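
One way to drive tests from such a list of named combinations is `unittest`'s `subTest`, sketched below. The function `prep_combos`, its placeholder options and the naming scheme are assumptions for illustration; they do not reproduce what getPrepCombos returns.

```python
import io
import unittest
from itertools import product


def prep_combos():
    """Hypothetical stand-in: return [name, split, standardizer] lists."""
    splits = [None, "random_split"]
    standardizers = [None, "standard_scaler"]
    return [
        [f"{split}_{standardizer}", split, standardizer]
        for split, standardizer in product(splits, standardizers)
    ]


class PrepComboTest(unittest.TestCase):
    def test_all_combos(self):
        for name, split, standardizer in prep_combos():
            # Each named combination is reported as its own sub-test.
            with self.subTest(name=name):
                self.assertIsInstance(name, str)
                self.assertIn(split, (None, "random_split"))
                self.assertIn(standardizer, (None, "standard_scaler"))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PrepComboTest)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

Putting the name first in each list means a failing combination can be identified directly in the test report.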

getSmallDF()

Get a small data frame for testing purposes.

Returns: a pandas.DataFrame containing the dataset
Return type: pd.DataFrame

setUpPaths(): Create the directories that are used for testing.

tearDown(): Remove all files and directories that are used for testing.

validate_split(dataset): Check if the split has the data it should have after splitting.
