qsprpred.extra.data.tables package
Submodules
qsprpred.extra.data.tables.pcm module
- class qsprpred.extra.data.tables.pcm.PCMDataSet(name: str, protein_col: str, target_props: list[qsprpred.tasks.TargetProperty | dict], df: DataFrame | None = None, smiles_col: str = 'SMILES', protein_seq_provider: Callable | None = None, add_rdkit: bool = False, store_dir: str = '.', overwrite: bool = False, n_jobs: int | None = 1, chunk_size: int | None = None, drop_invalids: bool = True, drop_empty: bool = True, index_cols: list[str] | None = None, autoindex_name: str = 'QSPRID', random_state: int | None = None, store_format: str = 'pkl')[source]
Bases: QSPRDataset
Extension of QSPRDataset for PCM modelling. It allows specification of a column with protein identifiers and the calculation of protein descriptors.
- Variables:
proteinCol (str) – name of column in df containing the protein target identifier (usually a UniProt ID) to use for protein descriptors for PCM modelling and other protein related tasks.
proteinSeqProvider (Callable) – function that takes a list of protein identifiers and returns a dict mapping those identifiers to their sequences. Defaults to None.
Construct a data set to handle PCM data.
- Parameters:
name (str) – data name, used in saving the data
protein_col (str) – name of column in df containing the protein target identifier (usually a UniProt ID) to use for protein descriptors for PCM modelling and other protein related tasks.
protein_seq_provider (Callable, optional) – function that takes a list of protein identifiers and returns a dict mapping those identifiers to their sequences. Defaults to None.
target_props (list[TargetProperty | dict]) – target properties; names should correspond to the target column names in df.
df (pd.DataFrame, optional) – input dataframe containing SMILES and target properties. Defaults to None.
smiles_col (str, optional) – name of column in df containing SMILES. Defaults to “SMILES”.
add_rdkit (bool, optional) – if True, a column with RDKit molecules will be added to df. Defaults to False.
store_dir (str, optional) – directory for saving the output data. Defaults to ‘.’.
overwrite (bool, optional) – if True, existing data will be overwritten. Defaults to False.
n_jobs (int, optional) – number of parallel jobs. If <= 0, all available cores will be used. Defaults to 1.
chunk_size (int, optional) – chunk size for parallel processing. Defaults to 50.
drop_invalids (bool, optional) – if True, invalid SMILES will be dropped. Defaults to True.
drop_empty (bool, optional) – if True, rows with empty SMILES will be dropped. Defaults to True.
index_cols (List[str], optional) – columns to be used as index in the dataframe. Defaults to None, in which case a custom ID will be generated.
autoindex_name (str, optional) – column name to use for automatically generated IDs. Defaults to ‘QSPRID’.
random_state (int, optional) – random state for reproducibility. Defaults to None.
store_format (str, optional) – format to use for storing the data (‘pkl’ or ‘csv’). Defaults to ‘pkl’.
- Raises:
ValueError – Raised if threshold given with non-classification task.
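The protein_seq_provider argument is described only as a callable that takes a list of protein identifiers and returns a dict of sequences. A minimal sketch of such a callable is given below. The lookup table, the identifiers, the sequences, and the function name are hypothetical placeholders, and the library may pass extra arguments or expect a different return shape than assumed here.

```python
# Hypothetical in-memory lookup; a real provider might read a FASTA file
# or query a sequence database instead.
KNOWN_SEQUENCES = {
    "PROT_A": "MKTAYIAKQR",  # made-up placeholder sequence
    "PROT_B": "MSDNGELEDK",  # made-up placeholder sequence
}


def example_seq_provider(protein_ids: list[str]) -> dict[str, str]:
    """Map each known protein identifier to its sequence.

    Identifiers without a known sequence are skipped rather than guessed.
    """
    return {
        pid: KNOWN_SEQUENCES[pid]
        for pid in protein_ids
        if pid in KNOWN_SEQUENCES
    }


sequences = example_seq_provider(["PROT_A", "PROT_X"])
```

Skipping unknown identifiers is a design choice of this sketch; raising an error instead may be preferable when every protein must have a sequence.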
- addClusters(clusters: list['MoleculeClusters'], recalculate: bool = False)
Add clusters to the data frame.
A new column is created that contains the identifier of the corresponding cluster calculator.
- Parameters:
clusters (list) – list of MoleculeClusters calculators.
recalculate (bool) – Whether to recalculate clusters even if they are already present in the data frame.
- addDescriptors(descriptors: list[qsprpred.data.descriptors.sets.DescriptorSet | qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet], recalculate: bool = False, featurize: bool = True, *args, **kwargs)[source]
Add descriptors to the data set.
If descriptors are already present, they will be recalculated if recalculate is True. Featurization will be performed after adding descriptors if featurize is True. Featurization converts current data matrices to pure numeric matrices of selected descriptors (features).
- Parameters:
descriptors (list[DescriptorSet]) – list of descriptor sets to add
recalculate (bool, optional) – whether to recalculate descriptors if they are already present. Defaults to False.
featurize (bool, optional) – whether to featurize the data set splits after adding descriptors. Defaults to True.
*args – additional positional arguments to pass to each descriptor set
**kwargs – additional keyword arguments to pass to each descriptor set
- addFeatures(feature_calculators: list[qsprpred.data.descriptors.sets.DescriptorSet], recalculate: bool = False)
Add features to the data set.
- Parameters:
feature_calculators (list[DescriptorSet]) – list of feature calculators to add. Defaults to None.
recalculate (bool) – if True, recalculate features even if they are already present in the data set. Defaults to False.
- addScaffolds(scaffolds: list[qsprpred.data.chem.scaffolds.Scaffold], add_rdkit_scaffold: bool = False, recalculate: bool = False)
Add scaffolds to the data frame.
A new column is created that contains the SMILES of the corresponding scaffold. If add_rdkit_scaffold is set to True, a new column is created that contains the RDKit scaffold of the corresponding molecule.
- apply(func: Callable[[dict[str, list[Any]] | DataFrame, ...], Any], func_args: tuple[Any] | None = None, func_kwargs: dict[str, Any] | None = None, on_props: list[str] | None = None, as_df: bool = False, chunk_size: int | None = None, n_jobs: int | None = None) Generator
Apply a function to the data frame. The properties of the data set are passed as the first positional argument to the function. This will be a dictionary of the form {'prop1': [...], 'prop2': [...], ...}. If as_df is True, the properties will be passed as a data frame instead.
Any additional arguments specified in func_args and func_kwargs will be passed to the function after the properties as positional and keyword arguments, respectively.
If on_props is specified, only the properties in this list will be passed to the function. If on_props is None, all properties will be passed to the function.
- Parameters:
func (Callable) – Function to apply to the data frame.
func_args (list) – Positional arguments to pass to the function.
func_kwargs (dict) – Keyword arguments to pass to the function.
on_props (list[str]) – list of properties to send to the function as arguments
as_df (bool) – If True, the function is applied to chunks represented as data frames.
chunk_size (int) – Size of chunks to use per job in parallel processing. If None, the chunk size will be set to self.chunkSize. The chunk size will always be set to the number of rows in the data frame if n_jobs or self.nJobs is 1.
n_jobs (int) – Number of jobs to use for parallel processing. If None, self.nJobs is used.
- Returns:
Generator that yields the results of the function applied to each chunk of the data frame as determined by chunk_size and n_jobs. Each item in the generator will be the result of the function applied to one chunk of the data set.
- Return type:
Generator
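The calling convention described for apply (properties as a dict of lists, extra arguments after it, one result per chunk) can be illustrated with a dependency-free sketch. This is not the library's implementation: apply_in_chunks and count_long_smiles are hypothetical names, the processing is sequential rather than parallel, and the example data are made up.

```python
from typing import Any, Callable, Generator, Optional


def apply_in_chunks(
    props: dict[str, list[Any]],
    func: Callable[..., Any],
    func_args: tuple = (),
    func_kwargs: Optional[dict[str, Any]] = None,
    on_props: Optional[list[str]] = None,
    chunk_size: int = 2,
) -> Generator[Any, None, None]:
    """Yield func(chunk, *func_args, **func_kwargs) for each chunk of rows."""
    func_kwargs = func_kwargs or {}
    # Restrict to the requested properties, or keep all of them.
    selected = {k: v for k, v in props.items() if on_props is None or k in on_props}
    n_rows = len(next(iter(selected.values()), []))
    for start in range(0, n_rows, chunk_size):
        # Each chunk has the same {'prop': [...]} shape as the full data.
        chunk = {k: v[start:start + chunk_size] for k, v in selected.items()}
        yield func(chunk, *func_args, **func_kwargs)


def count_long_smiles(chunk: dict[str, list[Any]], min_len: int = 3) -> int:
    """Count SMILES strings in the chunk with at least min_len characters."""
    return sum(len(s) >= min_len for s in chunk["SMILES"])


data = {"SMILES": ["CCO", "C", "c1ccccc1"], "pchembl": [6.1, 5.0, 7.3]}
results = list(apply_in_chunks(data, count_long_smiles, on_props=["SMILES"]))
```

With a chunk size of 2, the three rows are processed as two chunks, so the generator yields two results.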
- attachDescriptors(calculator: DescriptorSet, descriptors: DataFrame, index_cols: list)
Attach descriptors to the data frame.
- Parameters:
calculator (DescriptorSet) – DescriptorSet object to use for descriptor calculation.
descriptors (pd.DataFrame) – DataFrame containing the descriptors to attach.
index_cols (list) – List of column names to use as index.
- checkFeatures()
Check consistency of features and descriptors.
- checkMols(throw: bool = True)
Returns a boolean array indicating whether each molecule is valid or not. If throw is True, an exception is thrown if any molecule is invalid.
- Parameters:
throw (bool) – Whether to throw an exception if any molecule is invalid.
- Returns:
Boolean series indicating whether each molecule is valid.
- Return type:
mask (pd.Series)
- clearFiles()
Remove all files associated with this data set from disk.
- createScaffoldGroups(mols_per_group: int = 10)
Create scaffold groups.
A scaffold group is a list of molecules that share the same scaffold. New columns are created that contain the scaffold group ID and the scaffold group size.
- Parameters:
mols_per_group (int) – number of molecules per scaffold group.
- property descriptorSets
Get the descriptor calculators for this table.
- dropDescriptorSets(descriptors: list[qsprpred.data.descriptors.sets.DescriptorSet | str], full_removal: bool = False)
Drop descriptors from the given sets from the data frame.
- Parameters:
descriptors (list[DescriptorSet | str]) – List of DescriptorSet objects or their names. Name of a descriptor set corresponds to the result returned by its __str__ method.
full_removal (bool) – Whether to remove the descriptor data (will perform full removal). By default, a soft removal is performed by just rendering the descriptors inactive. A full removal will remove the descriptorSet from the dataset, including the saved files. It is not possible to restore a descriptorSet after a full removal.
- dropDescriptors(descriptors: list[str])
Drop descriptors by name. Performs a simple feature selection by removing the given descriptor names from the data set.
- dropEmptyProperties(names: list[str])
Drop rows with empty target property value from the data set.
- dropEmptySmiles()
Drop rows with empty SMILES from the data set.
- dropInvalids()
Drops invalid molecules from the data set.
- Returns:
Boolean mask of invalid molecules in the original data set.
- Return type:
mask (pd.Series)
- dropOutliers()
Drop outliers from the test set based on the applicability domain.
- featurize(update_splits=True)
- featurizeSplits(shuffle: bool = True, random_state: int | None = None)
If the data set has descriptors, load them into the train and test splits.
If no descriptors are available, remove all features from the splits. They will become zero length along the feature axis (columns), but will retain their original length along the sample axis (rows). This is useful for the case where the data set has no descriptors, but the user wants to retain train and test splits.
- Parameters:
shuffle (bool) – whether to shuffle the training and test sets
random_state (int) – random state for shuffling
- fillMissing(fill_value: float, columns: list[str] | None = None)
Fill missing values in the data set with a given value.
- filter(table_filters: list[Callable])
Filter the data set using the given filters.
- Parameters:
table_filters (list[Callable]) – list of filters to apply
- filterFeatures(feature_filters: list[Callable])
Filter features in the data set.
- Parameters:
feature_filters (list[Callable]) – list of feature filter functions that take X feature matrix and y target vector as arguments
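A feature filter is described here only as a callable that receives the feature matrix X and the target vector y. The sketch below shows one possible filter with that calling convention. It is an assumption-laden illustration: the dict-of-columns representation of X, the function name, and the return value are hypothetical, and the library's own filters may work on data frames and return something different.

```python
def drop_constant_features(
    X: dict[str, list[float]], y: list[float]
) -> dict[str, list[float]]:
    """Keep only the feature columns that take more than one distinct value.

    y is accepted to match the (X, y) calling convention but is not used here.
    """
    return {name: values for name, values in X.items() if len(set(values)) > 1}


X = {"logP": [1.2, 3.4, 2.2], "const": [0.0, 0.0, 0.0]}
y = [5.1, 6.3, 7.0]
filtered = drop_constant_features(X, y)
```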
- classmethod fromFile(filename: str) PandasDataTable
Load a StoredTable object from a file.
- Parameters:
filename (str) – The name of the file to load the object from.
- Returns:
The StoredTable object itself.
- static fromMolTable(mol_table: MoleculeTable, protein_col: str, target_props: list[qsprpred.tasks.TargetProperty | dict] | None = None, name: str | None = None, **kwargs) PCMDataSet [source]
Construct a data set to handle PCM data from a MoleculeTable.
- Parameters:
mol_table (MoleculeTable) – MoleculeTable instance containing the PCM data.
protein_col (str) – name of column in df containing the protein target identifier (usually a UniProt ID) to use for protein descriptors for PCM modelling and other protein related tasks.
target_props (list[TargetProperty | dict], optional) – target properties; names should correspond to the target column names in df.
name (str, optional) – data name, used in saving the data. Defaults to None.
**kwargs – keyword arguments to be passed to the PCMDataSet constructor.
- Returns:
PCMDataSet instance containing the PCM data.
- Return type:
PCMDataSet
- static fromSDF(name, filename, smiles_prop, *args, **kwargs)[source]
Create QSPRDataset from SDF file.
It is currently not implemented for QSPRDataset, but you can convert from ‘MoleculeTable’ with the ‘fromMolTable’ method.
- static fromSMILES(name: str, smiles: list, *args, **kwargs)
Create a MoleculeTable instance from a list of SMILES sequences.
- Parameters:
name (str) – Name of the data set.
smiles (list) – list of SMILES sequences.
*args – Additional arguments to pass to the MoleculeTable constructor.
**kwargs – Additional keyword arguments to pass to the MoleculeTable constructor.
- static fromTableFile(name: str, filename: str, sep: str = '\t', *args, **kwargs)
Create QSPRDataset from table file (i.e. CSV or TSV).
- Parameters:
- Returns:
QSPRDataset object
- Return type:
- generateDescriptorDataSetName(ds_set: str | DescriptorSet)
Generate a descriptor set name from a descriptor set.
- generateIndex(name: str | None = None, prefix: str | None = None)
Generate a custom index for the data frame automatically.
- getApplicability()
Get applicability predictions for the test set.
- getClusterNames(clusters: list['MoleculeClusters'] | None = None)
Get the names of the clusters in the data frame.
- Returns:
List of cluster names.
- Return type:
- getClusters(clusters: list['MoleculeClusters'] | None = None)
Get the subset of the data frame that contains only clusters.
- Returns:
Data frame containing only clusters.
- Return type:
pd.DataFrame
- getDF()
Get the data frame this instance manages.
- Returns:
The data frame this instance manages.
- Return type:
pd.DataFrame
- getDescriptorNames()
Get the names of the descriptors present for molecules in this data set.
- Returns:
list of descriptor names.
- Return type:
- getDescriptors(active_only=False)
Get the calculated descriptors as a pandas data frame.
- Returns:
Data frame containing only descriptors.
- Return type:
pd.DataFrame
- getFeatures(inplace: bool = False, concat: bool = False, raw: bool = False, ordered: bool = False, refit_standardizer: bool = True)
Get the current feature sets (training and test) from the dataset.
This method also applies any feature standardizers that have been set on the dataset during preparation. Outliers are dropped from the test set if they are present, unless concat is True.
- Parameters:
inplace (bool) – If True, the created feature matrices will be saved to the dataset object itself as ‘X’ and ‘X_ind’ attributes. Note that this will overwrite any existing feature matrices and if the data preparation workflow changes, these are not kept up to date. Therefore, it is recommended to generate new feature sets after any data set changes.
concat (bool) – If True, the training and test feature matrices will be concatenated into a single matrix. This is useful for training models that do not require separate training and test sets (i.e. the final optimized models).
raw (bool) – If True, the raw feature matrices will be returned without any standardization applied.
ordered (bool) – If True, the returned feature matrices will be ordered according to the original order of the data set. This is only relevant if concat is True.
refit_standardizer (bool) – If True, the feature standardizer will be refit on the training set upon this call. If False, the previously fitted standardizer will be used. Defaults to True. Use False if this dataset is used for prediction only and the standardizer has been initialized already.
- getProperties() list[str]
Get names of all properties/variables saved in the data frame (all columns).
- Returns:
list of property names.
- Return type:
list[str]
- getProperty(name: str) Series
Get property values from the data set.
- Parameters:
name (str) – Name of the property to get.
- Returns:
List of values for the property.
- Return type:
pd.Series
- getProteinKeys() list[str] [source]
Return a list of keys identifying the proteins in the data frame.
- Returns:
List of protein keys.
- Return type:
keys (list)
- getProteinSequences() dict[str, str] [source]
Return a dictionary of protein sequences for the proteins in the data frame.
- Returns:
Dictionary of protein sequences.
- Return type:
sequences (dict)
- getScaffoldGroups(scaffold_name: str, mol_per_group: int = 10)
Get the scaffold groups for a given combination of scaffold and number of molecules per scaffold group.
- getScaffoldNames(scaffolds: list[qsprpred.data.chem.scaffolds.Scaffold] | None = None, include_mols: bool = False)
Get the names of the scaffolds in the data frame.
- getScaffolds(scaffolds: list[qsprpred.data.chem.scaffolds.Scaffold] | None = None, include_mols: bool = False)
Get the subset of the data frame that contains only scaffolds.
- Parameters:
include_mols (bool) – Whether to include the RDKit scaffold columns as well.
- Returns:
Data frame containing only scaffolds.
- Return type:
pd.DataFrame
- getSubset(prefix: str)
Get a subset of the data set by providing a prefix for the column names or a column name directly.
- Parameters:
prefix (str) – Prefix of the column names to select.
- getSummary()
Make a summary with some statistics about the molecules in this table. The summary contains the number of molecules per target and the number of unique molecules per target.
Requires this data set to be imported from Papyrus for now.
- Returns:
A dataframe with the summary statistics.
- Return type:
(pd.DataFrame)
- getTargetProperties(names: list) list[qsprpred.tasks.TargetProperty]
Get the target properties with the given names.
- Parameters:
- Returns:
list of target properties
- Return type:
list[TargetProperty]
- getTargetPropertiesValues(concat: bool = False, ordered: bool = False)
Get the response values (training and test) for the set target property.
- Parameters:
- Returns:
tuple of (train_responses, test_responses) or pandas.DataFrame of all target property values
- property hasClusters
Check whether the data frame contains clusters.
- Returns:
Whether the data frame contains clusters.
- Return type:
- hasDescriptors(descriptors: list[qsprpred.data.descriptors.sets.DescriptorSet | str] | None = None) bool | list[bool]
Check whether the data frame contains given descriptors.
- Parameters:
descriptors (list) – list of DescriptorSet objects or prefixes of descriptors to check for. If None, all descriptors are checked for and a single boolean is returned if any descriptors are found.
- Returns:
list of booleans indicating whether each descriptor is present or not.
- Return type:
bool | list[bool]
- property hasFeatures
Check whether the currently selected set of features is not empty.
- hasProperty(name)
Check whether a property is present in the data frame.
- property hasScaffoldGroups
Check whether the data frame contains scaffold groups.
- Returns:
Whether the data frame contains scaffold groups.
- Return type:
- property hasScaffolds
Check whether the data frame contains scaffolds.
- Returns:
Whether the data frame contains scaffolds.
- Return type:
- imputeProperties(names: list[str], imputer: Callable)
Impute missing property values.
- Parameters:
names (list) – List of property names to impute.
imputer (Callable) – imputer object implementing the fit_transform method from scikit-learn API.
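The only stated requirement on the imputer is a scikit-learn style fit_transform method. The following sketch shows the shape of such an object without depending on scikit-learn. MeanImputer is a hypothetical stand-in; it represents missing values as None (an assumption, real data would typically use NaN) and expects at least one present value per column.

```python
class MeanImputer:
    """Hypothetical stand-in exposing a scikit-learn style fit_transform."""

    def fit_transform(self, rows: list[list]) -> list[list]:
        """Replace None entries with the mean of their column."""
        n_cols = len(rows[0])
        means = []
        for col in range(n_cols):
            # Column mean over the values that are present.
            present = [row[col] for row in rows if row[col] is not None]
            means.append(sum(present) / len(present))
        return [
            [means[col] if row[col] is None else row[col] for col in range(n_cols)]
            for row in rows
        ]


values = [[6.0], [None], [8.0]]
imputed = MeanImputer().fit_transform(values)
```

In practice an existing scikit-learn imputer would likely be passed instead of a hand-written class.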
- property isMultiTask
Check if the dataset contains multiple target properties.
- iterChunks(include_props: list[str] | None = None, as_dict: bool = False, chunk_size: int | None = None) Generator[DataFrame | dict, None, None]
Batch a data frame into chunks of the given size.
- Parameters:
- Returns:
Generator that yields batches of the data frame as smaller data frames.
- Return type:
Generator[pd.DataFrame, None, None]
- iterFolds(split: DataSplit, concat: bool = False) Generator[tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame | pandas.core.series.Series, pandas.core.frame.DataFrame | pandas.core.series.Series, list[int], list[int]], None, None]
Iterate over the folds of the dataset.
- loadDescriptorsToSplits(shuffle: bool = True, random_state: int | None = None)
Load all available descriptors into the train and test splits.
If no descriptors are available, an exception will be raised.
- Parameters:
- Raises:
ValueError – if no descriptors are available
- makeClassification(target_property: str, th: list[float] | None = None)
Switch to classification task using the given threshold values.
- makeRegression(target_property: str)
Switch to regression task using the given target property.
- Parameters:
target_property (str) – name of the target property to use for regression
- property metaFile
The path to the meta file of this data set.
- property nJobs
- property nTargetProperties
Get the number of target properties in the dataset.
- prepareDataset(smiles_standardizer: str | typing.Callable | None = 'chembl', data_filters: list | None = (<qsprpred.data.processing.data_filters.RepeatsFilter object>, ), split=None, feature_calculators: list[qsprpred.data.descriptors.sets.DescriptorSet] | None = None, feature_filters: list | None = None, feature_standardizer: qsprpred.data.processing.feature_standardizers.SKLearnStandardizer | None = None, feature_fill_value: float = nan, applicability_domain: qsprpred.data.processing.applicability_domain.ApplicabilityDomain | mlchemad.base.ApplicabilityDomain | None = None, drop_outliers: bool = False, recalculate_features: bool = False, shuffle: bool = True, random_state: int | None = None)
Prepare the dataset for use in QSPR model.
- Parameters:
smiles_standardizer (str | Callable) – either chembl, old, or a partial function that reads and standardizes smiles. If None, no standardization will be performed. Defaults to chembl.
data_filters (list of datafilter obj) – filters that remove rows from the dataset
split (datasplitter obj) – splits the dataset into train and test set
feature_calculators (list[DescriptorSet]) – descriptor sets to add to the data set
feature_filters (list of feature filter objs) – filters features
feature_standardizer (SKLearnStandardizer or sklearn.base.BaseEstimator) – standardizes and/or scales features
feature_fill_value (float) – value to fill missing values with. Defaults to numpy.nan
applicability_domain (applicabilityDomain obj) – attaches an applicability domain calculator to the dataset and fits it on the training set
drop_outliers (bool) – whether to drop samples that are outside the applicability domain from the test set, if one is attached.
recalculate_features (bool) – recalculate features even if they are already present in the file
shuffle (bool) – whether to shuffle the created training and test sets
random_state (int) – random state for shuffling
- processMols(processor: MolProcessor, proc_args: tuple[Any] | None = None, proc_kwargs: dict[str, Any] | None = None, add_props: list[str] | None = None, as_rdkit: bool = False, chunk_size: int | None = None, n_jobs: int | None = None) Generator
Apply a function to the molecules in the data frame. The SMILES or an RDKit molecule will be supplied as the first positional argument to the function. Additional properties to provide from the data set can be specified with ‘add_props’, which will be a dictionary supplied as an additional positional argument to the function.
IMPORTANT: For successful parallel processing, the processor must be picklable. Also note that the returned generator will produce results as soon as they are ready, which means that the chunks of data will not be in the same order as the original data frame. However, you can pass the value of idProp in add_props to identify the processed molecules. See CheckSmilesValid for an example.
- Parameters:
processor (MolProcessor) – MolProcessor object to use for processing.
proc_args (list, optional) – Any additional positional arguments to pass to the processor.
proc_kwargs (dict, optional) – Any additional keyword arguments to pass to the processor.
add_props (list, optional) – List of data set properties to send to the processor. If None, all properties will be sent.
as_rdkit (bool, optional) – Whether to convert the molecules to RDKit molecules before applying the processor.
chunk_size (int, optional) – Size of chunks to use per job in parallel. If not specified, self.chunkSize is used.
n_jobs (int, optional) – Number of jobs to use for parallel processing. If not specified, self.nJobs is used.
- Returns:
A generator that yields the results of the supplied processor on the chunked molecules from the data set.
- Return type:
Generator
- reload()
Reload the data table from disk.
- removeProperty(name)
Remove a property from the data frame.
- Parameters:
name (str) – Name of the property to delete.
- reset()
Reset the data set. Splits will be removed and all descriptors will be moved to the training data. Molecule standardization and molecule filtering are not affected.
- resetTargetProperty(prop: TargetProperty | str)
Reset target property to its original value.
- Parameters:
prop (TargetProperty | str) – target property to reset
- restoreDescriptorSets(descriptors: list[qsprpred.data.descriptors.sets.DescriptorSet | str])
Restore descriptors that were previously removed.
- Parameters:
descriptors (list[DescriptorSet | str]) – List of DescriptorSet objects or their names. Name of a descriptor set corresponds to the result returned by its __str__ method.
- restoreTrainingData()
Restore training data from the data frame.
If the data frame contains a column ‘Split_IsTrain’, the data will be split into training and independent sets. Otherwise, the independent set will be empty. If descriptors are available, the resulting training matrices will be featurized.
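The splitting rule described above can be sketched in plain Python. This is an illustration of the rule, not the library's code: rows are represented as a list of dicts instead of a data frame, and split_on_flag is a hypothetical helper.

```python
def split_on_flag(rows: list[dict], flag: str = "Split_IsTrain"):
    """Return (train, independent) lists based on a boolean flag column."""
    if not rows or flag not in rows[0]:
        # No split information: everything is training data.
        return list(rows), []
    train = [row for row in rows if row[flag]]
    independent = [row for row in rows if not row[flag]]
    return train, independent


with_flag = [
    {"SMILES": "CCO", "Split_IsTrain": True},
    {"SMILES": "CCN", "Split_IsTrain": False},
    {"SMILES": "CCC", "Split_IsTrain": True},
]
train, independent = split_on_flag(with_flag)
all_train, empty = split_on_flag([{"SMILES": "CCO"}])
```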
- classmethod runMolProcess(props: dict[str, list] | DataFrame, func: MolProcessor, add_rdkit: bool, smiles_col: str, *args, **kwargs)
A helper method to run a MolProcessor on a list of molecules via apply. It converts the SMILES to RDKit molecules if required and then applies the function to the MolProcessor object.
- Parameters:
props (dict) – Dictionary of properties that will be passed in addition to the molecule structure.
func (MolProcessor) – MolProcessor object to use for processing.
add_rdkit (bool) – Whether to convert the SMILES to RDKit molecules before applying the function.
smiles_col (str) – Name of the property containing the SMILES sequences.
*args – Additional positional arguments to pass to the function.
**kwargs – Additional keyword arguments to pass to the function.
- sample(n: int, name: str | None = None, random_state: int | None = None) MoleculeTable
Sample n molecules from the table.
- Parameters:
- Returns:
A dataframe with the sampled molecules.
- Return type:
MoleculeTable
- save(save_split: bool = True)
Save the data set to file and serialize metadata.
- Parameters:
save_split (bool) – whether to save split data to the managed data frame.
- saveSplit()
Save split data to the managed data frame.
- searchOnProperty(prop_name: str, values: list[str], name: str | None = None, exact=False) MoleculeTable
Search in this table using a property name and a list of values. It is assumed that the property is searchable with string matching. Either an exact match or a partial match can be used. If ‘exact’ is False, the search will be performed with partial matching, i.e. all molecules that contain any of the given values in the property will be returned. If ‘exact’ is True, only molecules that have the exact property value for any of the given values will be returned.
- Parameters:
prop_name (str) – Name of the property to search on.
values (list[str]) – List of values to search for. If any of the values is found in the property, the molecule will be considered a match.
name (str | None, optional) – Name of the new table. Defaults to the name of the old table, plus the _searched suffix.
exact (bool, optional) – Whether to use exact matching, i.e. whether to search for exact strings or just substrings. Defaults to False.
- Returns:
A new table with the molecules from the old table with the given property values.
- Return type:
MoleculeTable
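The difference between exact and partial matching in searchOnProperty can be made concrete with a small sketch. The matches helper and the identifiers are hypothetical; the sketch mirrors only the documented matching rule, not how the table performs the search internally.

```python
def matches(value: str, queries: list[str], exact: bool = False) -> bool:
    """Return True if value matches any of the queries."""
    if exact:
        # Exact mode: the whole property value must equal a query.
        return value in queries
    # Partial mode: any query occurring as a substring counts as a match.
    return any(query in value for query in queries)


accessions = ["PROT_A", "PROT_AB", "PROT_C"]
partial_hits = [a for a in accessions if matches(a, ["PROT_A"])]
exact_hits = [a for a in accessions if matches(a, ["PROT_A"], exact=True)]
```

Partial matching also returns PROT_AB here, which is why exact matching is the safer choice when identifiers share prefixes.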
- searchWithIndex(index: Index, name: str | None = None) MoleculeTable [source]
Search in this table using a pandas index. The return value is a new table with the molecules from the old table with the given indices.
- Parameters:
index (pd.Index) – Indices to search for in this table.
name (str) – Name of the new table. Defaults to the name of the old table, plus the _searched suffix.
- Returns:
A new table with the molecules from the old table with the given indices.
- Return type:
MoleculeTable
- searchWithSMARTS(patterns: list[str], operator: typing.Literal['or', 'and'] = 'or', use_chirality: bool = False, name: str | None = None, match_function: typing.Callable = <function match_mol_to_smarts>) MoleculeTable
Search the molecules in the table with a SMARTS pattern.
- Parameters:
patterns – List of SMARTS patterns to search with.
operator (object) – Whether to use an “or” or “and” operator on patterns. Defaults to “or”.
use_chirality – Whether to use chirality in the search.
name – Name of the new table. Defaults to the name of the old table, plus the smarts_searched suffix.
match_function – Function to use for matching the molecules to the SMARTS patterns. Defaults to match_mol_to_smarts.
- Returns:
A dataframe with the molecules that match the pattern.
- Return type:
(MolTable)
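How the “or” and “and” operators combine several patterns can be sketched without RDKit. Note the loud assumption: toy_match below is a plain substring test used as a stand-in for real SMARTS matching, and combine_matches is a hypothetical helper, so only the combination logic is illustrated.

```python
from typing import Callable


def toy_match(mol: str, pattern: str) -> bool:
    # Stand-in only: plain substring test, NOT real SMARTS matching.
    return pattern in mol


def combine_matches(
    mol: str,
    patterns: list[str],
    operator: str = "or",
    match_function: Callable[[str, str], bool] = toy_match,
) -> bool:
    """Combine per-pattern match results with 'or' (any) or 'and' (all)."""
    hits = [match_function(mol, pattern) for pattern in patterns]
    if operator == "or":
        return any(hits)
    if operator == "and":
        return all(hits)
    raise ValueError(f"unknown operator: {operator!r}")


smiles = ["CCO", "CCN", "NCCO"]
either = [s for s in smiles if combine_matches(s, ["O", "N"], "or")]
both = [s for s in smiles if combine_matches(s, ["O", "N"], "and")]
```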
- setApplicabilityDomain(applicability_domain: qsprpred.data.processing.applicability_domain.ApplicabilityDomain | mlchemad.base.ApplicabilityDomain)
Set the applicability domain calculator.
- Parameters:
applicability_domain (ApplicabilityDomain | MLChemADApplicabilityDomain) – applicability domain calculator instance
- setFeatureStandardizer(feature_standardizer)
Set feature standardizer.
- Parameters:
feature_standardizer (SKLearnStandardizer | BaseEstimator) – feature standardizer
- setIndex(cols: list[str])
Create an index column from several columns of the data set. This also resets the idProp attribute to be the name of the index columns joined by a ‘~’ character. The values of the columns are also joined in the same way to create the index. Thus, make sure the values of the columns are unique together and can be joined to a string.
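The joining rule for the index name and the index values can be sketched as follows. join_index is a hypothetical helper working on a list of dicts rather than a data frame, and the uniqueness check that raises an error is an assumption of this sketch, not documented library behaviour.

```python
def join_index(rows: list[dict], cols: list[str], sep: str = "~"):
    """Build an index name and index values by joining columns with sep."""
    name = sep.join(cols)
    values = [sep.join(str(row[col]) for col in cols) for row in rows]
    if len(set(values)) != len(values):
        # Assumption of this sketch: duplicates are reported as an error.
        raise ValueError("index columns do not identify rows uniquely")
    return name, values


rows = [
    {"accession": "PROT_A", "compound": "c1"},
    {"accession": "PROT_A", "compound": "c2"},
]
index_name, index_values = join_index(rows, ["accession", "compound"])
```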
- setRandomState(random_state: int)
Set the random state for this instance.
- Parameters:
random_state (int) – Random state to use for shuffling and other random operations.
- setTargetProperties(target_props: list[qsprpred.tasks.TargetProperty | dict], drop_empty: bool = True)
Set list of target properties and apply transformations if specified.
- Parameters:
target_props (list[TargetProperty]) – list of target properties
drop_empty (bool, optional) – whether to drop rows with empty target property values. Defaults to True.
- setTargetProperty(prop: TargetProperty | dict, drop_empty: bool = True)
Add a target property to the dataset.
- Parameters:
prop (TargetProperty) – name of the target property to add
drop_empty (bool) – whether to drop rows with empty target property values. Defaults to True.
- property smiles: Generator[str, None, None]
Get the SMILES strings of the molecules in the data frame.
- Returns:
Generator of SMILES strings.
- Return type:
Generator[str, None, None]
- split(split: DataSplit, featurize: bool = False)
Split dataset into train and test set.
You can either split the data frame itself or you can set featurize to True if you want to use feature matrices instead of the raw data frame.
- standardizeSmiles(smiles_standardizer, drop_invalid=True)
Apply smiles_standardizer to the compounds in parallel
- Parameters:
smiles_standardizer – either None to skip the standardization, chembl, old, or a partial function that reads and standardizes smiles.
drop_invalid (bool) – whether to drop invalid SMILES from the data set. Defaults to True. If False, invalid SMILES will be retained in their original form. If self.invalidsRemoved is True, there will be no effect even if drop_invalid is True. Set self.invalidsRemoved to False on this instance to force the removal of invalid SMILES.
- Raises:
ValueError – when smiles_standardizer is not a callable or one of the predefined strings.
- property storeDir
The data set folder containing the data set files after saving.
- property storePath
The path to the main data set file.
- property storePrefix
The prefix of the data set files.
- property targetPropertyNames
Get the names of the target properties.
- toFile(filename: str)
Save the metafile and all associated files to a custom location.
- Parameters:
filename (str) – absolute path to the saved metafile.
- toJSON() str
Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.
- Returns:
JSON string of the object
- Return type:
json (str)
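The contract stated for toJSON (the string must hold everything needed to rebuild the object) can be illustrated with a small stand-alone round trip. This is only a sketch: ExampleSettings and its fromJSON counterpart are hypothetical names that are not part of qsprpred, and the real classes may serialize considerably more state.

```python
import json


class ExampleSettings:
    """Hypothetical stand-in for a serializable object (not a qsprpred class)."""

    def __init__(self, name, random_state=None):
        self.name = name
        self.random_state = random_state

    def toJSON(self) -> str:
        # Store every attribute needed to rebuild the object later.
        return json.dumps({"name": self.name, "random_state": self.random_state})

    @classmethod
    def fromJSON(cls, text: str) -> "ExampleSettings":
        # Hypothetical inverse of toJSON: rebuild the object from its JSON string.
        return cls(**json.loads(text))


original = ExampleSettings("demo", random_state=42)
restored = ExampleSettings.fromJSON(original.toJSON())
```

If any attribute is left out of the JSON string, the restored object silently differs from the original, which is why the description insists on completeness.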
- transformProperties(targets: list[str], transformer: Callable)
Transform the target properties using the given transformer.
- Parameters:
- unsetTargetProperty(name: str | TargetProperty)
Unset the target property. It will not remove it from the data set, but will make it unavailable for training.
- Parameters:
name (str | TargetProperty) – name of the target property to drop or the property itself
qsprpred.extra.data.tables.tests module
- class qsprpred.extra.data.tables.tests.TestPCMDataSetPreparation(methodName='runTest')[source]
Bases:
DataSetsMixInExtras, DataPrepCheckMixIn, TestCase
Test the preparation of the PCMDataSet.
Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.
- classmethod addClassCleanup(function, /, *args, **kwargs)
Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).
- addCleanup(function, /, *args, **kwargs)
Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.
Cleanup items are called even if setUp fails (unlike tearDown).
- addTypeEqualityFunc(typeobj, function)
Add a type specific assertEqual style function to compare a type.
This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.
- Parameters:
typeobj – The data type to call this function on when both values are of the same type in assertEqual().
function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.
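A minimal sketch of registering a type-specific comparison is given below. Point and assertPointEqual are hypothetical names chosen for the illustration; the registered function is only consulted when both arguments of assertEqual have exactly the registered type.

```python
import io
import unittest
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


class PointEqualityExample(unittest.TestCase):
    def setUp(self):
        # Route assertEqual(Point, Point) through the custom comparison below.
        self.addTypeEqualityFunc(Point, self.assertPointEqual)

    def assertPointEqual(self, first, second, msg=None):
        # Raise failureException with a readable message when the points differ.
        if (first.x, first.y) != (second.x, second.y):
            default = f"Point({first.x}, {first.y}) != Point({second.x}, {second.y})"
            raise self.failureException(msg or default)

    def test_equal_points(self):
        self.assertEqual(Point(1.0, 2.0), Point(1.0, 2.0))

    def test_unequal_points_use_custom_message(self):
        with self.assertRaises(self.failureException) as cm:
            self.assertEqual(Point(1.0, 2.0), Point(1.0, 3.0))
        self.assertEqual(str(cm.exception), "Point(1.0, 2.0) != Point(1.0, 3.0)")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PointEqualityExample)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```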
- assertAlmostEqual(first, second, places=None, msg=None, delta=None)
Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.
Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).
If the two objects compare equal then they will automatically compare almost equal.
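The difference between the places and delta modes is easiest to see side by side. The following sketch uses illustrative values only; note that passing both places and delta raises a TypeError.

```python
import io
import unittest


class AlmostEqualExample(unittest.TestCase):
    def test_default_places(self):
        # The difference is rounded to 7 decimal places and compared to zero.
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_delta(self):
        # With delta, the absolute difference must not exceed the given value.
        self.assertAlmostEqual(100.0, 100.4, delta=0.5)

    def test_difference_too_large(self):
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(1.0, 1.1, places=2)

    def test_places_and_delta_are_exclusive(self):
        with self.assertRaises(TypeError):
            self.assertAlmostEqual(1.0, 1.0001, places=3, delta=0.1)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AlmostEqualExample)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```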
- assertCountEqual(first, second, msg=None)
Asserts that two iterables have the same elements, the same number of times, without regard to order.
Equivalent to: self.assertEqual(Counter(list(first)), Counter(list(second)))
- Example:
[0, 1, 1] and [1, 0, 1] compare equal.
[0, 0, 1] and [0, 1] compare unequal.
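The two cases listed above can be checked directly. The sketch below reuses the same lists; the class name CountEqualExample is illustrative.

```python
import io
import unittest


class CountEqualExample(unittest.TestCase):
    def test_same_elements_in_different_order(self):
        # Order is ignored, but each element must occur the same number of times.
        self.assertCountEqual([0, 1, 1], [1, 0, 1])

    def test_different_multiplicity_fails(self):
        # 0 occurs twice on the left and once on the right.
        with self.assertRaises(AssertionError):
            self.assertCountEqual([0, 0, 1], [0, 1])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountEqualExample)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```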
- assertDictEqual(d1, d2, msg=None)
- assertEqual(first, second, msg=None)
Fail if the two objects are unequal as determined by the ‘==’ operator.
- assertFalse(expr, msg=None)
Check that the expression is false.
- assertGreater(a, b, msg=None)
Just like self.assertTrue(a > b), but with a nicer default message.
- assertGreaterEqual(a, b, msg=None)
Just like self.assertTrue(a >= b), but with a nicer default message.
- assertIn(member, container, msg=None)
Just like self.assertTrue(a in b), but with a nicer default message.
- assertIs(expr1, expr2, msg=None)
Just like self.assertTrue(a is b), but with a nicer default message.
- assertIsInstance(obj, cls, msg=None)
Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.
- assertIsNone(obj, msg=None)
Same as self.assertTrue(obj is None), with a nicer default message.
- assertIsNot(expr1, expr2, msg=None)
Just like self.assertTrue(a is not b), but with a nicer default message.
- assertIsNotNone(obj, msg=None)
Included for symmetry with assertIsNone.
- assertLess(a, b, msg=None)
Just like self.assertTrue(a < b), but with a nicer default message.
- assertLessEqual(a, b, msg=None)
Just like self.assertTrue(a <= b), but with a nicer default message.
- assertListEqual(list1, list2, msg=None)
A list-specific equality assertion.
- Parameters:
list1 – The first list to compare.
list2 – The second list to compare.
msg – Optional message to use on failure instead of a list of differences.
- assertLogs(logger=None, level=None)
Fail unless a log message of level level or higher is emitted on logger_name or its children. If omitted, level defaults to INFO and logger defaults to the root logger.
This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.
Example:
    with self.assertLogs('foo', level='INFO') as cm:
        logging.getLogger('foo').info('first message')
        logging.getLogger('foo.bar').error('second message')
    self.assertEqual(cm.output, ['INFO:foo:first message',
                                 'ERROR:foo.bar:second message'])
- assertMultiLineEqual(first, second, msg=None)
Assert that two multi-line strings are equal.
- assertNoLogs(logger=None, level=None)
Fail unless no log messages of level level or higher are emitted on logger_name or its children.
This method must be used as a context manager.
- assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)
Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.
Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).
Objects that are equal automatically fail.
- assertNotEqual(first, second, msg=None)
Fail if the two objects are equal as determined by the ‘!=’ operator.
- assertNotIn(member, container, msg=None)
Just like self.assertTrue(a not in b), but with a nicer default message.
- assertNotIsInstance(obj, cls, msg=None)
Included for symmetry with assertIsInstance.
- assertNotRegex(text, unexpected_regex, msg=None)
Fail the test if the text matches the regular expression.
- assertRaises(expected_exception, *args, **kwargs)
Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.
If called with the callable and arguments omitted, will return a context object used like this:
    with self.assertRaises(SomeException):
        do_something()
An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.
The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:
    with self.assertRaises(SomeException) as cm:
        do_something()
    the_exception = cm.exception
    self.assertEqual(the_exception.error_code, 3)
- assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)
Asserts that the message in a raised exception matches a regex.
- Parameters:
expected_exception – Exception class expected to be raised.
expected_regex – Regex (re.Pattern object or string) expected to be found in error message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
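Both calling styles, with the callable passed directly and as a context manager, are sketched below. The helper parse_positive is a hypothetical function written for this example; the regex is matched with a search, so it does not need to cover the whole message.

```python
import io
import unittest


def parse_positive(text):
    """Hypothetical helper: parse text as an integer and reject values <= 0."""
    value = int(text)
    if value <= 0:
        raise ValueError(f"expected a positive integer, got {value}")
    return value


class RaisesRegexExample(unittest.TestCase):
    def test_callable_form(self):
        # The callable and its arguments follow the exception class and the regex.
        self.assertRaisesRegex(ValueError, r"positive integer", parse_positive, "-3")

    def test_context_manager_form(self):
        with self.assertRaisesRegex(ValueError, r"got -?\d+$") as cm:
            parse_positive("0")
        # The caught exception stays available for further checks.
        self.assertEqual(str(cm.exception), "expected a positive integer, got 0")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RaisesRegexExample)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```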
- assertRegex(text, expected_regex, msg=None)
Fail the test unless the text matches the regular expression.
- assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)
An equality assertion for ordered sequences (like lists and tuples).
For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.
- Parameters:
seq1 – The first sequence to compare.
seq2 – The second sequence to compare.
seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.
msg – Optional message to use on failure instead of a list of differences.
- assertSetEqual(set1, set2, msg=None)
A set-specific equality assertion.
- Parameters:
set1 – The first set to compare.
set2 – The second set to compare.
msg – Optional message to use on failure instead of a list of differences.
assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).
- assertTrue(expr, msg=None)
Check that the expression is true.
- assertTupleEqual(tuple1, tuple2, msg=None)
A tuple-specific equality assertion.
- Parameters:
tuple1 – The first tuple to compare.
tuple2 – The second tuple to compare.
msg – Optional message to use on failure instead of a list of differences.
- assertWarns(expected_warning, *args, **kwargs)
Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.
If called with the callable and arguments omitted, will return a context object used like this:
    with self.assertWarns(SomeWarning):
        do_something()
An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.
The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:
    with self.assertWarns(SomeWarning) as cm:
        do_something()
    the_warning = cm.warning
    self.assertEqual(the_warning.some_attribute, 147)
- assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)
Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.
- Parameters:
expected_warning – Warning class expected to be triggered.
expected_regex – Regex (re.Pattern object or string) expected to be found in error message.
args – Function to be called and extra positional args.
kwargs – Extra kwargs.
msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
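A short sketch of the context-manager form follows. The function old_api is hypothetical and exists only to emit a DeprecationWarning; the warning object is inspected afterwards through the context manager.

```python
import io
import unittest
import warnings


def old_api():
    """Hypothetical deprecated function used only for this illustration."""
    warnings.warn("old_api() is deprecated; use new_api()", DeprecationWarning, stacklevel=2)
    return 42


class WarnsRegexExample(unittest.TestCase):
    def test_deprecation_message(self):
        # Only warnings of the given class whose message matches the regex count.
        with self.assertWarnsRegex(DeprecationWarning, r"deprecated") as cm:
            value = old_api()
        self.assertEqual(value, 42)
        self.assertIn("new_api", str(cm.warning))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(WarnsRegexExample)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```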
- checkDescriptors(dataset: QSPRDataset, target_props: list[dict | qsprpred.tasks.TargetProperty])
Check if information about descriptors is consistent in the data set. Checks if calculators are consistent with the descriptors contained in the data set. This is tested also before and after serialization.
- Parameters:
dataset (QSPRDataset) – The data set to check.
target_props (List of dicts or TargetProperty) – list of target properties
- Raises:
AssertionError – If the consistency check fails.
- checkFeatures(ds: QSPRDataset, expected_length: int)
Check if the feature names and the feature matrix of a data set is consistent with expected number of variables.
- Parameters:
ds (QSPRDataset) – The data set to check.
expected_length (int) – The expected number of features.
- Raises:
AssertionError – If the feature names or the feature matrix is not consistent
- checkPrep(dataset, feature_calculators, split, feature_standardizer, feature_filter, data_filter, applicability_domain, expected_target_props)
Check the consistency of the dataset after preparation.
- clearGenerated()
Remove the directories that are used for testing.
- countTestCases()
- createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': <TargetTasks.MULTICLASS: 'MULTICLASS'>, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)
Create a large dataset for testing purposes.
- Parameters:
name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
preparation_settings (dict) – dictionary containing preparation settings
random_state (int) – random state to use for splitting and shuffling
- Returns:
a QSPRDataset object
- Return type:
QSPRDataset
- createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42, n_jobs=1, chunk_size=None)
Create a large dataset for testing purposes.
- Parameters:
name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
preparation_settings (dict) – dictionary containing preparation settings
- Returns:
a QSPRDataset object
- Return type:
QSPRDataset
- createPCMDataSet(name: str = 'QSPRDataset_test_pcm', target_props: list[qsprpred.tasks.TargetProperty] | list[dict] = [{'name': 'pchembl_value_Median', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings: dict | None = None, protein_col: str = 'accession', random_state: int | None = None)
Create a small dataset for testing purposes.
- Parameters:
name (str, optional) – name of the dataset. Defaults to “QSPRDataset_test_pcm”.
target_props (list[TargetProperty] | list[dict], optional) – target properties.
preparation_settings (dict | None, optional) – preparation settings. Defaults to None.
protein_col (str, optional) – name of the column with protein accessions. Defaults to “accession”.
random_state (int, optional) – random seed to use in the dataset. Defaults to None.
- Returns:
a QSPRDataset object
- Return type:
QSPRDataset
- createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], preparation_settings=None, random_state=42)
Create a small dataset for testing purposes.
- Parameters:
name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
preparation_settings (dict) – dictionary containing preparation settings
- Returns:
a QSPRDataset object
- Return type:
QSPRDataset
- createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': <TargetTasks.REGRESSION: 'REGRESSION'>}], random_state=None, prep=None, n_jobs=1, chunk_size=None)
Create a dataset for testing purposes from the given data frame.
- Parameters:
df (pd.DataFrame) – data frame containing the dataset
name (str) – name of the dataset
target_props (List of dicts or TargetProperty) – list of target properties
random_state (int) – random state to use for splitting and shuffling
prep (dict) – dictionary containing preparation settings
- Returns:
a QSPRDataset object
- Return type:
QSPRDataset
- debug()
Run the test without collecting errors in a TestResult
- defaultTestResult()
- classmethod doClassCleanups()
Execute all class cleanup functions. Normally called for you after tearDownClass.
- doCleanups()
Execute all cleanup functions. Normally called for you after tearDown.
- classmethod enterClassContext(cm)
Same as enterContext, but class-wide.
- enterContext(cm)
Enters the supplied context manager.
If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.
- fail(msg=None)
Fail immediately, with the given message.
- failureException
alias of
AssertionError
- fetchDataset(name: str) PCMDataSet [source]
Create a quick dataset with the given name.
- Parameters:
name (str) – Name of the dataset.
- Returns:
The dataset.
- Return type:
PCMDataSet
- classmethod getAllDescriptors() list[qsprpred.data.descriptors.sets.DescriptorSet]
Return a list of all available molecule descriptor sets.
- Returns:
list of MoleculeDescriptorSet objects
- Return type:
list[qsprpred.data.descriptors.sets.DescriptorSet]
- classmethod getAllProteinDescriptors() list[qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet]
Return a list of all available protein descriptor sets.
- Returns:
list of ProteinDescriptorSet objects
- Return type:
list[qsprpred.extra.data.descriptors.sets.ProteinDescriptorSet]
- getBigDF()
Get a large data frame for testing purposes.
- Returns:
a pandas.DataFrame containing the dataset
- Return type:
pd.DataFrame
- classmethod getDataPrepGrid()
Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.
- Returns:
a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)
- Return type:
grid
- classmethod getDefaultCalculatorCombo()
Return the default descriptor calculator combo.
- static getDefaultPrep()
Return a dictionary with default preparation settings.
- getPCMDF() DataFrame
Return a test dataframe with PCM data.
- Returns:
dataframe with PCM data
- Return type:
pd.DataFrame
- getPCMSeqProvider() Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]
Return a function that provides sequences for given accessions.
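The return annotation above describes a callable that takes a list of accessions and returns a pair of dictionaries. A minimal sketch of a callable with that shape is given below. Everything in it is an assumption made for illustration: the accession keys and sequences are made up, make_seq_provider is a hypothetical helper, and the second dictionary is assumed to hold per-accession metadata, which the documentation does not specify.

```python
from typing import Callable

# Made-up accessions and sequence fragments standing in for a real sequence source.
SEQUENCES = {"ACC_A": "MKTAYIAK", "ACC_B": "MSDNGELEDKPP"}

SeqProvider = Callable[[list[str]], tuple[dict[str, str], dict[str, dict]]]


def make_seq_provider(table: dict[str, str]) -> SeqProvider:
    """Hypothetical factory: build a provider backed by an in-memory table."""

    def provider(accessions: list[str]) -> tuple[dict[str, str], dict[str, dict]]:
        # Map each known accession to its sequence; unknown accessions are skipped.
        seqs = {acc: table[acc] for acc in accessions if acc in table}
        # Assumed metadata layout: one small dict per accession.
        meta = {acc: {"length": len(seq)} for acc, seq in seqs.items()}
        return seqs, meta

    return provider


provider = make_seq_provider(SEQUENCES)
seqs, meta = provider(["ACC_A", "ACC_MISSING"])
```

Skipping unknown accessions is a design choice of this sketch; a real provider might raise an error or fetch the missing sequences instead.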
- getPCMTargetsDF() DataFrame
Return a test dataframe with PCM targets and their sequences.
- Returns:
dataframe with PCM targets and their sequences
- Return type:
pd.DataFrame
- classmethod getPrepCombos()
Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.
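How such named combinations might be assembled can be sketched with itertools.product. This is not the qsprpred implementation: the classes below are empty placeholders that merely share their names with the components appearing in the generated test names, and label and named_combos are hypothetical helpers.

```python
from itertools import product


# Empty placeholders, not the real qsprpred, scikit-learn or mlchemad classes.
class MorganFP: pass
class ProDec: pass
class StandardScaler: pass
class HighCorrelationFilter: pass
class RepeatsFilter: pass
class TopKatApplicabilityDomain: pass


def label(obj):
    """Return the class name of obj, or the string 'None' for a missing step."""
    return "None" if obj is None else type(obj).__name__


def named_combos(calculators, splits, standardizers, feature_filters, data_filters, domains):
    """Pair every combination of preparation steps with a descriptive name."""
    combos = []
    grid = product(splits, standardizers, feature_filters, data_filters, domains)
    for steps in grid:
        name = "_".join([label(c) for c in calculators] + [label(s) for s in steps])
        combos.append((name, calculators, *steps))
    return combos


combos = named_combos(
    calculators=(MorganFP(), ProDec()),
    splits=[None],
    standardizers=[None, StandardScaler()],
    feature_filters=[None, HighCorrelationFilter()],
    data_filters=[None, RepeatsFilter()],
    domains=[None, TopKatApplicabilityDomain()],
)
names = [combo[0] for combo in combos]
```

Because product varies its last argument fastest, the resulting names follow the same ordering pattern as the numbered testPrepCombinations_NN_... methods in this class.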
- getSmallDF()
Get a small data frame for testing purposes.
- Returns:
a pandas.DataFrame containing the dataset
- Return type:
pd.DataFrame
- id()
- longMessage = True
- maxDiff = 640
- run(result=None)
- classmethod setUpClass()
Hook method for setting up class fixture before running tests in the class.
- setUpPaths()
Create the directories that are used for testing.
- shortDescription()
Returns a one-line description of the test, or None if no description has been provided.
The default implementation of this method returns the first line of the specified test method’s docstring.
- skipTest(reason)
Skip this test.
- subTest(msg=<object object>, **params)
Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.
- tearDown()
Remove all files and directories that are used for testing.
- classmethod tearDownClass()
Hook method for deconstructing the class fixture after running all tests in the class.
- testPrepCombinations = None
- testPrepCombinations_00_MorganFP_ProDec_None_None_None_None_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_None_None_None_None’, name=’MorganFP_ProDec_None_None_None_None_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff3c4d7f0>), split=None, feature_standardizer=None, feature_filter=None, data_filter=None, applicability_domain=None].
Use different combinations of feature calculators, feature standardizers, feature filters and data filters.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_01_MorganFP_ProDec_None_None_None_None_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_None_None_None_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_None_None_None_None_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff3c4d7c0>), split=None, feature_standardizer=None, feature_filter=None, data_filter=None, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff3b476b0>].
Use different combinations of feature calculators, feature standardizers, feature filters and data filters.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_02_MorganFP_ProDec_None_None_None_RepeatsFilter_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_None_None_RepeatsFilter_None’, name=’MorganFP_ProDec_None_None_None_RepeatsFilter_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff3b477d0>), split=None, feature_standardizer=None, feature_filter=None, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff3bf8860>, applicability_domain=None].
Use different combinations of feature calculators, feature standardizers, feature filters and data filters.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_03_MorganFP_ProDec_None_None_None_RepeatsFilter_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_None_None_…ilter_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_None_None_None_…ilter_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff3bf88f0>), split=None, feature_standardizer=None, feature_filter=None, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff3b477a0>, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff3d534d0>].
Use different combinations of feature calculators, feature standardizers, feature filters and data filters.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_04_MorganFP_ProDec_None_None_HighCorrelationFilter_None_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_None_HighCorrelationFilter_None_None’, name=’MorganFP_ProDec_None_None_HighCorrelationFilter_None_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff3bf8950>), split=None, feature_standardizer=None, feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff472c350>, data_filter=None, applicability_domain=None].
Use different combinations of feature calculators, feature standardizers, feature filters and data filters.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_05_MorganFP_ProDec_None_None_HighCorrelationFilter_None_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_None_HighC…_None_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_None_None_HighC…_None_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff3ebf8f0>), split=None, feature_standardizer=None, feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff474f2c0>, data_filter=None, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff395b770>].
Use different combinations of feature calculators, feature standardizers, feature filters and data filters.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_06_MorganFP_ProDec_None_None_HighCorrelationFilter_RepeatsFilter_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_None_HighC…lationFilter_RepeatsFilter_None’, name=’MorganFP_ProDec_None_None_HighC…lationFilter_RepeatsFilter_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff3ada780>), split=None, feature_standardizer=None, feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff48ff680>, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff3810830>, applicability_domain=None].
Use different combinations of feature calculators, feature standardizers, feature filters and data filters.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_07_MorganFP_ProDec_None_None_HighCorrelationFilter_RepeatsFilter_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_None_HighC…ilter_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_None_None_HighC…ilter_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff395b800>), split=None, feature_standardizer=None, feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff3bf89e0>, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff38858b0>, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff3885880>].
Use different combinations of feature calculators, feature standardizers, feature filters and data filters.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_08_MorganFP_ProDec_None_StandardScaler_None_None_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_StandardScaler_None_None_None’, name=’MorganFP_ProDec_None_StandardScaler_None_None_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff3810920>), split=None, feature_standardizer=StandardScaler(), feature_filter=None, data_filter=None, applicability_domain=None].
Use different combinations of feature calculators, feature standardizers, feature filters and data filters.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_09_MorganFP_ProDec_None_StandardScaler_None_None_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_StandardSc…_None_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_None_StandardSc…_None_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff3ada7e0>), split=None, feature_standardizer=StandardScaler(), feature_filter=None, data_filter=None, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff37739e0>].
Use different combinations of feature calculators, feature standardizers, feature filters and data filters.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_10_MorganFP_ProDec_None_StandardScaler_None_RepeatsFilter_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_StandardScaler_None_RepeatsFilter_None’, name=’MorganFP_ProDec_None_StandardScaler_None_RepeatsFilter_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff38859d0>), split=None, feature_standardizer=StandardScaler(), feature_filter=None, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff3628ad0>, applicability_domain=None].
Use different combinations of feature calculators, feature standardizers, feature filters and data filters.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_11_MorganFP_ProDec_None_StandardScaler_None_RepeatsFilter_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_StandardSc…ilter_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_None_StandardSc…ilter_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff3773ad0>), split=None, feature_standardizer=StandardScaler(), feature_filter=None, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff36a1b80>, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff36a1be0>].
Use different combinations of feature calculators, feature standardizers, feature filters and data filters.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_12_MorganFP_ProDec_None_StandardScaler_HighCorrelationFilter_None_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_StandardSc…HighCorrelationFilter_None_None’, name=’MorganFP_ProDec_None_StandardSc…HighCorrelationFilter_None_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff3628c20>), split=None, feature_standardizer=StandardScaler(), feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff351ec60>, data_filter=None, applicability_domain=None].
Use different combinations of feature calculators, feature standardizers, feature filters and data filters.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_13_MorganFP_ProDec_None_StandardScaler_HighCorrelationFilter_None_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_StandardSc…_None_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_None_StandardSc…_None_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff38fea80>), split=None, feature_standardizer=StandardScaler(), feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff358bd10>, data_filter=None, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff358bd70>].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_14_MorganFP_ProDec_None_StandardScaler_HighCorrelationFilter_RepeatsFilter_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_StandardSc…lationFilter_RepeatsFilter_None’, name=’MorganFP_ProDec_None_StandardSc…lationFilter_RepeatsFilter_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff351edb0>), split=None, feature_standardizer=StandardScaler(), feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff3444e30>, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff3444e90>, applicability_domain=None].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_15_MorganFP_ProDec_None_StandardScaler_HighCorrelationFilter_RepeatsFilter_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_None_StandardSc…ilter_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_None_StandardSc…ilter_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff358be90>), split=None, feature_standardizer=StandardScaler(), feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff34b5f10>, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff34b5f70>, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff34b5fa0>].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_16_MorganFP_ProDec_RandomSplit_None_None_None_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_None_None_None_None’, name=’MorganFP_ProDec_RandomSplit_None_None_None_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff36a1d00>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff358be30>, feature_standardizer=None, feature_filter=None, data_filter=None, applicability_domain=None].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_17_MorganFP_ProDec_RandomSplit_None_None_None_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_Non…_None_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_RandomSplit_Non…_None_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff3444f50>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff36a1ca0>, feature_standardizer=None, feature_filter=None, data_filter=None, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff33e00e0>].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_18_MorganFP_ProDec_RandomSplit_None_None_RepeatsFilter_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_None_None_RepeatsFilter_None’, name=’MorganFP_ProDec_RandomSplit_None_None_RepeatsFilter_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff34b60c0>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff48ff230>, feature_standardizer=None, feature_filter=None, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff3251160>, applicability_domain=None].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_19_MorganFP_ProDec_RandomSplit_None_None_RepeatsFilter_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_Non…ilter_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_RandomSplit_Non…ilter_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff33e0170>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff34b6060>, feature_standardizer=None, feature_filter=None, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff32ca1e0>, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff32ca1b0>].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_20_MorganFP_ProDec_RandomSplit_None_HighCorrelationFilter_None_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_Non…HighCorrelationFilter_None_None’, name=’MorganFP_ProDec_RandomSplit_Non…HighCorrelationFilter_None_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff3251250>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff3444fb0>, feature_standardizer=None, feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff314f290>, data_filter=None, applicability_domain=None].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_21_MorganFP_ProDec_RandomSplit_None_HighCorrelationFilter_None_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_Non…_None_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_RandomSplit_Non…_None_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff332f110>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff32511f0>, feature_standardizer=None, feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff31f8350>, data_filter=None, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff31f8320>].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_22_MorganFP_ProDec_RandomSplit_None_HighCorrelationFilter_RepeatsFilter_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_Non…lationFilter_RepeatsFilter_None’, name=’MorganFP_ProDec_RandomSplit_Non…lationFilter_RepeatsFilter_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff314f380>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff332f0b0>, feature_standardizer=None, feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff306d400>, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff306d3d0>, applicability_domain=None].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_23_MorganFP_ProDec_RandomSplit_None_HighCorrelationFilter_RepeatsFilter_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_Non…ilter_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_RandomSplit_Non…ilter_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff31f8470>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff314f320>, feature_standardizer=None, feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff30ea4b0>, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff30ea480>, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff30ea4e0>].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_24_MorganFP_ProDec_RandomSplit_StandardScaler_None_None_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_StandardScaler_None_None_None’, name=’MorganFP_ProDec_RandomSplit_StandardScaler_None_None_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff32ca300>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff31f8410>, feature_standardizer=StandardScaler(), feature_filter=None, data_filter=None, applicability_domain=None].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_25_MorganFP_ProDec_RandomSplit_StandardScaler_None_None_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_Sta…_None_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_RandomSplit_Sta…_None_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff30ea5a0>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff306d520>, feature_standardizer=StandardScaler(), feature_filter=None, data_filter=None, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff2e0c6b0>].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_26_MorganFP_ProDec_RandomSplit_StandardScaler_None_RepeatsFilter_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_Sta…dScaler_None_RepeatsFilter_None’, name=’MorganFP_ProDec_RandomSplit_Sta…dScaler_None_RepeatsFilter_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff2f636b0>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff33e01d0>, feature_standardizer=StandardScaler(), feature_filter=None, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff2e85790>, applicability_domain=None].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_27_MorganFP_ProDec_RandomSplit_StandardScaler_None_RepeatsFilter_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_Sta…ilter_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_RandomSplit_Sta…ilter_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff2e0c7d0>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff2f63650>, feature_standardizer=StandardScaler(), feature_filter=None, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff2efe870>, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff2efe8a0>].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_28_MorganFP_ProDec_RandomSplit_StandardScaler_HighCorrelationFilter_None_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_Sta…HighCorrelationFilter_None_None’, name=’MorganFP_ProDec_RandomSplit_Sta…HighCorrelationFilter_None_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff30ea600>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff2e0c770>, feature_standardizer=StandardScaler(), feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff2d6f980>, data_filter=None, applicability_domain=None].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_29_MorganFP_ProDec_RandomSplit_StandardScaler_HighCorrelationFilter_None_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_Sta…_None_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_RandomSplit_Sta…_None_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff2efe9c0>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff2e858b0>, feature_standardizer=StandardScaler(), feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff2c24aa0>, data_filter=None, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff2c24ad0>].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_30_MorganFP_ProDec_RandomSplit_StandardScaler_HighCorrelationFilter_RepeatsFilter_None(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_Sta…lationFilter_RepeatsFilter_None’, name=’MorganFP_ProDec_RandomSplit_Sta…lationFilter_RepeatsFilter_None’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff3773b30>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff2efe960>, feature_standardizer=StandardScaler(), feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff2ca1bb0>, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff2ca1be0>, applicability_domain=None].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
- testPrepCombinations_31_MorganFP_ProDec_RandomSplit_StandardScaler_HighCorrelationFilter_RepeatsFilter_TopKatApplicabilityDomain(**kw)
Test the preparation of the dataset [with _=’MorganFP_ProDec_RandomSplit_Sta…ilter_TopKatApplicabilityDomain’, name=’MorganFP_ProDec_RandomSplit_Sta…ilter_TopKatApplicabilityDomain’, feature_calculators=(<qsprpred.data.descriptors.fing…roDec object at 0x7efff306d4c0>), split=<qsprpred.data.sampling.splits.R…mSplit object at 0x7efff2d6faa0>, feature_standardizer=StandardScaler(), feature_filter=<qsprpred.data.processing.featur…Filter object at 0x7efff2b16cc0>, data_filter=<qsprpred.data.processing.data_f…Filter object at 0x7efff2b16cf0>, applicability_domain=<mlchemad.applicability_domains….Domain object at 0x7efff2b16d20>].
Runs dataset preparation with one combination of feature calculators, data split, feature standardizer, feature filter, data filter, and applicability domain.
- Parameters:
name (str) – Name of the dataset.
feature_calculators (list[DescriptorsCalculator]) – List of feature calculators.
split (DataSplit) – Splitting strategy.
feature_standardizer (SKLearnStandardizer) – Feature standardizer.
feature_filter (Callable) – Feature filter.
data_filter (Callable) – Data filter.
applicability_domain (Callable) – Applicability domain.
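The numeric suffixes and names of the testPrepCombinations cases are consistent with enumerating the options as a Cartesian product in which the last option varies fastest. The following is a minimal sketch of that enumeration, not the test suite's own code: it assumes each option is either None or a single class, uses plain strings in place of the real objects, and introduces a hypothetical helper, combination_names.

```python
from itertools import product

# Option labels as they appear in the test names. Plain strings stand in for
# the real objects (assumption: names are built from class names and None).
FEATURE_CALCULATORS = ["MorganFP_ProDec"]
SPLITS = [None, "RandomSplit"]
STANDARDIZERS = [None, "StandardScaler"]
FEATURE_FILTERS = [None, "HighCorrelationFilter"]
DATA_FILTERS = [None, "RepeatsFilter"]
APPLICABILITY_DOMAINS = [None, "TopKatApplicabilityDomain"]


def combination_names():
    """Hypothetical helper: one underscore-joined name per combination."""
    combos = product(
        FEATURE_CALCULATORS,
        SPLITS,
        STANDARDIZERS,
        FEATURE_FILTERS,
        DATA_FILTERS,
        APPLICABILITY_DOMAINS,
    )
    # str(None) yields "None", matching the placeholders in the test names.
    return ["_".join(str(part) for part in combo) for combo in combos]


names = combination_names()
for index in (11, 16, 31):
    print(index, names[index])
```

Under these assumptions the list has 32 entries, and the entry at a given index carries the same name as the test case with that numeric suffix.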
- validate_split(dataset)
Check if the split has the data it should have after splitting.
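The docstring does not say which checks are performed. As an illustration only, a check of this kind might confirm that both subsets are non-empty and that features and targets line up. The sketch below is not the QSPRpred implementation: SplitData and check_split are hypothetical, and the attribute names X, y, X_ind and y_ind are assumed for the training and test subsets.

```python
from dataclasses import dataclass, field


@dataclass
class SplitData:
    """Hypothetical stand-in for a dataset after splitting."""

    X: list = field(default_factory=list)      # training features (assumed name)
    y: list = field(default_factory=list)      # training targets (assumed name)
    X_ind: list = field(default_factory=list)  # test features (assumed name)
    y_ind: list = field(default_factory=list)  # test targets (assumed name)


def check_split(data: SplitData) -> None:
    """Assert that both subsets exist and features match targets in length."""
    assert len(data.X) > 0, "training set is empty"
    assert len(data.X_ind) > 0, "test set is empty"
    assert len(data.X) == len(data.y), "training features and targets differ in length"
    assert len(data.X_ind) == len(data.y_ind), "test features and targets differ in length"


split = SplitData(X=[[0.1], [0.2], [0.3]], y=[1, 0, 1], X_ind=[[0.4]], y_ind=[0])
check_split(split)  # passes silently for a consistent split
```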