qsprpred.data.tables.interfaces package
Submodules
qsprpred.data.tables.interfaces.data_set_dependent module
- class qsprpred.data.tables.interfaces.data_set_dependent.DataSetDependent(dataset: QSPRDataSet | None = None, **kwargs: Any)[source]
Bases:
JSONSerializable
Classes that need an attached QSPRDataSet should inherit from this class, and it will be supplied to them via this API.
- Variables:
dataSet (QSPRDataSet) – The data set attached to this object.
Initialize the object with a data set.
- Parameters:
dataset (QSPRDataSet, optional) – The data set to attach to this object. Defaults to None.
- getDataSet() QSPRDataSet[source]
Get the data set attached to this object.
- Returns:
The data set attached to this object
- Return type:
QSPRDataSet
- Raises:
ValueError – If no data set is attached to this object.
- setDataSet(dataset: QSPRDataSet | None) None[source]
Set the data set for this object.
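The attach and retrieve contract described above can be sketched with a small stand-in class. This is not the library's implementation: DataSetDependentSketch is a hypothetical name, and a plain dict stands in for a QSPRDataSet. Only the documented behaviour is mirrored, namely that getDataSet raises ValueError while nothing is attached.

```python
from typing import Any, Optional


class DataSetDependentSketch:
    """Hypothetical stand-in that mirrors the documented behaviour only."""

    def __init__(self, dataset: Optional[Any] = None) -> None:
        self.dataSet = dataset

    def setDataSet(self, dataset: Optional[Any]) -> None:
        # Attach a data set, or detach the current one by passing None.
        self.dataSet = dataset

    def getDataSet(self) -> Any:
        # Documented contract: raise ValueError when no data set is attached.
        if self.dataSet is None:
            raise ValueError("No data set is attached to this object.")
        return self.dataSet


component = DataSetDependentSketch()
try:
    component.getDataSet()
    raised = False
except ValueError:
    raised = True

# Any object stands in for a QSPRDataSet in this sketch.
component.setDataSet({"name": "toy"})
attached = component.getDataSet()
```

Callers that cannot guarantee a data set was attached would typically wrap getDataSet in a try/except block, as shown.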
qsprpred.data.tables.interfaces.molecule_data_set module
- class qsprpred.data.tables.interfaces.molecule_data_set.MoleculeDataSet[source]
Bases:
PropertyStorage, DescriptorProvider, MolProcessable, SMARTSSearchable, Summarizable, Randomized, Identifiable, Standardizable, ABC
Interface for storing and managing chemical data sets for machine learning.
- abstract addDescriptors(descriptors: DescriptorSet, *args, **kwargs)
Add descriptors to the dataset.
- Parameters:
descriptors (list[DescriptorSet]) – The descriptors to add.
args – Additional positional arguments to be passed to each descriptor set.
kwargs – Additional keyword arguments to be passed to each descriptor set.
- abstract addEntries(ids: list[str], props: dict[str, list], raise_on_existing: bool = True)
Add entries to the storage.
- abstract addProperty(name: str, data: Sized, ids: list[str] | None = None)
Add a property to the dataset. The supplied data should be a sized list of values of the same length as the number of entries in the storage.
- abstract apply(func: callable, func_args: list | None = None, func_kwargs: dict | None = None, on_props: tuple[str, ...] | None = None, as_df: bool = False) Generator[Iterable[Any], None, None]
Apply a function on all or selected properties of the chunks of data. The properties are supplied as the first positional argument to the function. The format of the properties is up to the downstream implementation, but it should always be a single object supplied as the first parameter.
- Parameters:
func (callable) – The function to apply.
func_args (list, optional) – The positional arguments of the function.
func_kwargs (dict, optional) – The keyword arguments of the function.
on_props (list, optional) – The properties to apply the function on.
as_df (bool, optional) – Provide properties as a DataFrame to the function.
- Returns:
A generator that yields the results of the function applied to each chunk.
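The calling convention of apply can be illustrated without the library. The sketch below assumes a dict of lists as the storage and passes each chunk to the function as a single dict. The helper apply_in_chunks and its chunk_size parameter are hypothetical, and a concrete implementation may hand over a different object, such as a DataFrame when as_df is set.

```python
from typing import Any, Callable, Generator, Optional


def apply_in_chunks(
    props: dict[str, list],
    func: Callable[..., Any],
    func_args: Optional[list] = None,
    func_kwargs: Optional[dict] = None,
    on_props: Optional[tuple[str, ...]] = None,
    chunk_size: int = 2,  # hypothetical parameter, not part of the interface
) -> Generator[Any, None, None]:
    """Yield func(chunk, *func_args, **func_kwargs) for each chunk of rows."""
    selected = list(on_props) if on_props is not None else list(props)
    n_rows = len(next(iter(props.values()))) if props else 0
    for start in range(0, n_rows, chunk_size):
        # The selected properties go in as ONE object, the first positional argument.
        chunk = {name: props[name][start:start + chunk_size] for name in selected}
        yield func(chunk, *(func_args or []), **(func_kwargs or {}))


def scaled_mean(chunk: dict[str, list], prop: str, scale: float = 1.0) -> float:
    values = chunk[prop]
    return scale * sum(values) / len(values)


store = {"SMILES": ["CCO", "CCN", "c1ccccc1"], "pIC50": [5.0, 7.0, 6.0]}
results = list(
    apply_in_chunks(
        store,
        scaled_mean,
        func_args=["pIC50"],
        func_kwargs={"scale": 2.0},
        on_props=("pIC50",),
    )
)
```

Because the result is a generator, nothing is computed until it is consumed, here by list().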
- abstract applyIdentifier(identifier: ChemIdentifier)
Apply an identifier to the SMILES in this instance (i.e. remove duplicates).
- Parameters:
identifier (ChemIdentifier) – The identifier to apply.
- abstract applyStandardizer(standardizer: ChemStandardizer)
Apply a standardizer to the SMILES in the store.
- Parameters:
standardizer (ChemStandardizer) – The standardizer to apply
- abstract clear()
Delete entries in the persistent storage.
- abstract property descriptorSets: list[DescriptorSet]
Get the descriptor sets that are currently in the storage.
- Returns:
a list of descriptor sets
- abstract dropDescriptorSets(descriptors: list[DescriptorSet | str])
Drop descriptor sets from the storage.
- Parameters:
descriptors – The descriptor sets to drop.
- abstract dropEntries(ids: Iterable[str])
Drop entries from the storage.
- Parameters:
ids (list) – The IDs of the entries to drop.
- abstract getDF() DataFrame
Get the stored properties as a pandas DataFrame.
- Returns:
The data as a pandas DataFrame.
- Return type:
pd.DataFrame
- abstract getDescriptorNames() list[str]
Get the names of the descriptors that are currently in the storage.
- Returns:
a list of descriptor names
- abstract getDescriptors() DataFrame
Get the table of descriptors that are currently in the storage.
- Returns:
a pd.DataFrame with the descriptors
- abstract getProperty(name: str, ids: tuple[str] | None = None) Iterable[Any]
Get values of a given property.
- abstract getSubset(subset: Iterable[str], ids: Iterable[str] | None = None) PropertyStorage
Get a subset of the storage for the given properties.
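- Parameters:
subset (Iterable[str]) – The properties to include in the subset.
ids (Iterable[str], optional) – The IDs of the entries to include in the subset.
- Returns:
The subset of the storage.
- Return type:
PropertyStorage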
- abstract getSummary() DataFrame
Make a summary with some statistics about this object or action.
- Returns:
A dataframe with the summary statistics.
- Return type:
(pd.DataFrame)
- abstract hasDescriptors()
Indicates if the storage has descriptors.
- abstract property identifier: ChemIdentifier
Get the identifier used by this instance.
- Returns:
The identifier used by this instance.
- Return type:
ChemIdentifier
- abstract iterChunks(size: int | None = None, on_props: list | None = None) Generator[list[Any], None, None]
Iterate over chunks of molecules across the store.
- Returns:
an iterable of lists of stored molecules
- abstract property metaFile: str
Get the absolute path to the metadata file that describes how the persisted data can be accessed. This can be used to load the object back from storage using the fromFile class method.
- Returns:
The absolute path to the metadata file.
- Return type:
str
- abstract processMols(processor: MolProcessor, proc_args: tuple[Any, ...] | None = None, proc_kwargs: dict[str, Any] | None = None, mol_type: Literal['smiles', 'mol', 'rdkit'] = 'mol', add_props: Iterable[str] | None = None) Generator[Any, None, None]
Process the molecules in this instance with a given MolProcessor.
- Parameters:
processor (MolProcessor) – The processor to use.
proc_args (tuple, optional) – Additional arguments to pass to the processor.
proc_kwargs (dict, optional) – Additional keyword arguments to pass to the processor.
mol_type (str, optional) – The type of molecule to process.
add_props (list, optional) – Additional properties to add to the dataset.
- Returns:
A generator that yields the processed molecules.
- Return type:
Generator
- abstract reload()
Reset the current state by reloading from storage.
- abstract removeProperty(name: str)
Remove a property from the dataset.
- Parameters:
name (str) – The name of the property.
- abstract save() str
Save current state to storage and return the path to the serialized file.
- Returns:
The path to the serialized file.
- Return type:
str
- abstract searchOnProperty(prop_name: str, values: list[float | int | str], exact=False) PropSearchable
Search the molecules within this MoleculeDataSet on a property value.
- Parameters:
prop_name – Name of the column to search on.
values – Values to search for.
exact – Whether to search for exact matches or not.
- Returns:
Another instance that can be filtered further.
- Return type:
PropSearchable
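Since the search returns another instance that can be filtered further, calls can be chained. The sketch below uses a list of dicts in place of a data set. The interface does not say what a non-exact search means, so substring matching on the string form of the value is an assumption here, and search_on_property is a hypothetical helper.

```python
from typing import Any, Union

Value = Union[float, int, str]
Row = dict[str, Any]


def search_on_property(
    rows: list[Row], prop_name: str, values: list[Value], exact: bool = False
) -> list[Row]:
    """Return the rows whose property matches any of the given values."""
    if exact:
        return [row for row in rows if row[prop_name] in values]
    # Assumed non-exact behaviour: substring match on the string form.
    return [
        row for row in rows
        if any(str(v) in str(row[prop_name]) for v in values)
    ]


rows = [
    {"ID": "m1", "source": "ChEMBL_33", "year": 2020},
    {"ID": "m2", "source": "ChEMBL_34", "year": 2021},
    {"ID": "m3", "source": "PubChem", "year": 2021},
]
partial = search_on_property(rows, "source", ["ChEMBL"])
# The result has the same shape as the input, so it can be searched again.
chained = search_on_property(partial, "year", [2021], exact=True)
exact_none = search_on_property(rows, "source", ["ChEMBL"], exact=True)
```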
- abstract searchWithSMARTS(patterns: list[str]) SMARTSSearchable
Search the molecules within this instance with SMARTS patterns.
- Parameters:
patterns – List of SMARTS patterns to search with.
- Returns:
Another instance that can be filtered further.
- Return type:
SMARTSSearchable
- abstract property smiles: Generator[str, None, None]
Get the SMILES strings of the molecules in this instance.
- Returns:
Generator of SMILES strings.
- Return type:
Generator[str, None, None]
- abstract property standardizer: ChemStandardizer
Get the standardizer used by the store.
- Returns:
The standardizer used by the store.
- Return type:
ChemStandardizer
qsprpred.data.tables.interfaces.qspr_data_set module
- class qsprpred.data.tables.interfaces.qspr_data_set.QSPRDataSet[source]
Bases:
MoleculeDataSet, ABC
Interface for storing and managing QSPR-specific data sets.
- abstract addDescriptors(descriptors: DescriptorSet, *args, **kwargs)
Add descriptors to the dataset.
- Parameters:
descriptors (list[DescriptorSet]) – The descriptors to add.
args – Additional positional arguments to be passed to each descriptor set.
kwargs – Additional keyword arguments to be passed to each descriptor set.
- abstract addEntries(ids: list[str], props: dict[str, list], raise_on_existing: bool = True)
Add entries to the storage.
- abstract addProperty(name: str, data: Sized, ids: list[str] | None = None)
Add a property to the dataset. The supplied data should be a sized list of values of the same length as the number of entries in the storage.
- abstract addTargetProperty(prop: TargetSpec | dict, drop_empty: bool = True)[source]
Add a target property to the dataset.
- Parameters:
prop (TargetSpec | dict) – The target property to add.
drop_empty (bool) – Whether to drop rows with empty target property values. Defaults to True.
- abstract apply(func: callable, func_args: list | None = None, func_kwargs: dict | None = None, on_props: tuple[str, ...] | None = None, as_df: bool = False) Generator[Iterable[Any], None, None]
Apply a function on all or selected properties of the chunks of data. The properties are supplied as the first positional argument to the function. The format of the properties is up to the downstream implementation, but it should always be a single object supplied as the first parameter.
- Parameters:
func (callable) – The function to apply.
func_args (list, optional) – The positional arguments of the function.
func_kwargs (dict, optional) – The keyword arguments of the function.
on_props (list, optional) – The properties to apply the function on.
as_df (bool, optional) – Provide properties as a DataFrame to the function.
- Returns:
A generator that yields the results of the function applied to each chunk.
- abstract applyIdentifier(identifier: ChemIdentifier)
Apply an identifier to the SMILES in this instance (i.e. remove duplicates).
- Parameters:
identifier (ChemIdentifier) – The identifier to apply.
- abstract applyStandardizer(standardizer: ChemStandardizer)
Apply a standardizer to the SMILES in the store.
- Parameters:
standardizer (ChemStandardizer) – The standardizer to apply
- abstract clear()
Delete entries in the persistent storage.
- abstract property descriptorSets: list[DescriptorSet]
Get the descriptor sets that are currently in the storage.
- Returns:
a list of descriptor sets
- abstract dropDescriptorSets(descriptors: list[DescriptorSet | str])
Drop descriptor sets from the storage.
- Parameters:
descriptors – The descriptor sets to drop.
- abstract dropEntries(ids: Iterable[str])
Drop entries from the storage.
- Parameters:
ids (list) – The IDs of the entries to drop.
- abstract getDF() DataFrame
Get the stored properties as a pandas DataFrame.
- Returns:
The data as a pandas DataFrame.
- Return type:
pd.DataFrame
- abstract getDescriptorNames() list[str]
Get the names of the descriptors that are currently in the storage.
- Returns:
a list of descriptor names
- abstract getDescriptors() DataFrame
Get the table of descriptors that are currently in the storage.
- Returns:
a pd.DataFrame with the descriptors
- abstract getProperty(name: str, ids: tuple[str] | None = None) Iterable[Any]
Get values of a given property.
- abstract getSubset(subset: Iterable[str], ids: Iterable[str] | None = None) PropertyStorage
Get a subset of the storage for the given properties.
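- Parameters:
subset (Iterable[str]) – The properties to include in the subset.
ids (Iterable[str], optional) – The IDs of the entries to include in the subset.
- Returns:
The subset of the storage.
- Return type:
PropertyStorage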
- abstract getSummary() DataFrame
Make a summary with some statistics about this object or action.
- Returns:
A dataframe with the summary statistics.
- Return type:
(pd.DataFrame)
- getTargetPropertiesNames() list[str][source]
Get the names of the target properties.
- Returns:
list of target property names
- Return type:
(list[str])
- abstract hasDescriptors()
Indicates if the storage has descriptors.
- abstract property identifier: ChemIdentifier
Get the identifier used by this instance.
- Returns:
The identifier used by this instance.
- Return type:
ChemIdentifier
- abstract iterChunks(size: int | None = None, on_props: list | None = None) Generator[list[Any], None, None]
Iterate over chunks of molecules across the store.
- Returns:
an iterable of lists of stored molecules
- abstract makeClassification(target_property: str, threshold: float | list[float])[source]
Make this a classification dataset for the given target property.
- abstract makeRegression(target_property: str)[source]
Make this a regression dataset for the given target property. This is only possible if the target property was previously converted to classification.
- Parameters:
target_property (str) – The name of the target property.
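makeClassification accepts either a single threshold or a list of thresholds, but the interface does not state how values are mapped to class labels. The sketch below shows one plausible reading and is an assumption throughout: a single float yields binary labels, and a list is treated as sorted bin borders. The helper to_classes is hypothetical and is not part of the library.

```python
from bisect import bisect_right
from typing import Union


def to_classes(values: list[float], threshold: Union[float, list[float]]) -> list[int]:
    """Map continuous values to class labels (assumed semantics, see lead-in)."""
    if isinstance(threshold, (int, float)):
        # Single threshold: binary labels, 1 when the value exceeds it.
        return [int(v > threshold) for v in values]
    borders = sorted(threshold)
    # Several thresholds: the label is the index of the bin the value falls in.
    return [bisect_right(borders, v) for v in values]


original = [4.2, 6.5, 7.9]
binary = to_classes(original, 6.0)
multi = to_classes(original, [5.0, 7.0])
```

In this sketch the continuous values in original are left untouched, which is the kind of bookkeeping an implementation needs if the property is later to be turned back into a regression target.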
- abstract property metaFile: str
Get the absolute path to the metadata file that describes how the persisted data can be accessed. This can be used to load the object back from storage using the fromFile class method.
- Returns:
The absolute path to the metadata file.
- Return type:
str
- abstract processMols(processor: MolProcessor, proc_args: tuple[Any, ...] | None = None, proc_kwargs: dict[str, Any] | None = None, mol_type: Literal['smiles', 'mol', 'rdkit'] = 'mol', add_props: Iterable[str] | None = None) Generator[Any, None, None]
Process the molecules in this instance with a given MolProcessor.
- Parameters:
processor (MolProcessor) – The processor to use.
proc_args (tuple, optional) – Additional arguments to pass to the processor.
proc_kwargs (dict, optional) – Additional keyword arguments to pass to the processor.
mol_type (str, optional) – The type of molecule to process.
add_props (list, optional) – Additional properties to add to the dataset.
- Returns:
A generator that yields the processed molecules.
- Return type:
Generator
- abstract reload()
Reset the current state by reloading from storage.
- abstract removeProperty(name: str)
Remove a property from the dataset.
- Parameters:
name (str) – The name of the property.
- abstract restoreTargetProperty(prop: TargetSpec | str)[source]
Restore a target property to the original state.
- Parameters:
prop (TargetSpec | str) – The target property to restore.
- abstract save() str
Save current state to storage and return the path to the serialized file.
- Returns:
The path to the serialized file.
- Return type:
str
- abstract searchOnProperty(prop_name: str, values: list[float | int | str], exact=False) PropSearchable
Search the molecules within this MoleculeDataSet on a property value.
- Parameters:
prop_name – Name of the column to search on.
values – Values to search for.
exact – Whether to search for exact matches or not.
- Returns:
Another instance that can be filtered further.
- Return type:
PropSearchable
- abstract searchWithSMARTS(patterns: list[str]) SMARTSSearchable
Search the molecules within this instance with SMARTS patterns.
- Parameters:
patterns – List of SMARTS patterns to search with.
- Returns:
Another instance that can be filtered further.
- Return type:
SMARTSSearchable
- abstract setTargetProperties(target_props: list[TargetSpec | dict], drop_empty: bool = True)[source]
Set the target properties for the dataset.
- Parameters:
target_props (list[TargetSpec | dict]) – The target properties to add.
drop_empty (bool) – If True, drop rows with missing target properties.
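The effect of drop_empty can be sketched on a list of row dicts. Two assumptions are made that the interface does not confirm: a value counts as empty when it is None or NaN, and a row is dropped when any of the listed target properties is empty. The helper names are hypothetical.

```python
import math
from typing import Any


def is_empty(value: Any) -> bool:
    # Assumption: "empty" means None or a float NaN.
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_empty_rows(
    rows: list[dict[str, Any]], target_names: list[str]
) -> list[dict[str, Any]]:
    """Keep only rows where every listed target property has a value."""
    return [
        row for row in rows
        if not any(is_empty(row.get(name)) for name in target_names)
    ]


rows = [
    {"ID": "m1", "pIC50": 6.1, "logS": -2.0},
    {"ID": "m2", "pIC50": None, "logS": -1.5},
    {"ID": "m3", "pIC50": 7.3, "logS": float("nan")},
]
kept_one = [row["ID"] for row in drop_empty_rows(rows, ["pIC50"])]
kept_both = [row["ID"] for row in drop_empty_rows(rows, ["pIC50", "logS"])]
```

With several target properties, the number of rows that survive depends on this any-versus-all choice, so it is worth checking how a concrete implementation handles it.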
- abstract property smiles: Generator[str, None, None]
Get the SMILES strings of the molecules in this instance.
- Returns:
Generator of SMILES strings.
- Return type:
Generator[str, None, None]
- abstract property standardizer: ChemStandardizer
Get the standardizer used by the store.
- Returns:
The standardizer used by the store.
- Return type:
ChemStandardizer
- abstract property targetProperties: list[TargetSpec]
Get the target properties of the dataset.
- Returns:
list of target properties
- Return type:
(list)
- toFile(filename: str) str
Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.
- toJSON() str
Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.
- Returns:
JSON string of the object
- Return type:
json (str)
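The requirement that the JSON string contain everything needed to reconstruct the object amounts to a round trip. The sketch below uses only the standard json module. JSONSketch and its fromJSON class method are hypothetical and do not reproduce the library's JSONSerializable.

```python
import json


class JSONSketch:
    """Hypothetical class used only to illustrate a JSON round trip."""

    def __init__(self, name: str, target_props: list[str]) -> None:
        self.name = name
        self.target_props = target_props

    def toJSON(self) -> str:
        # Store every attribute that is needed to rebuild the object.
        return json.dumps({"name": self.name, "target_props": self.target_props})

    @classmethod
    def fromJSON(cls, text: str) -> "JSONSketch":
        # Rebuild the object from the attributes stored by toJSON.
        return cls(**json.loads(text))


original = JSONSketch("toy", ["pIC50"])
restored = JSONSketch.fromJSON(original.toJSON())
```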
- abstract unsetTargetProperty(name: str | TargetSpec)[source]
Unset the target property with the given name.
- Parameters:
name (str | TargetSpec) – name of the target property to unset