qsprpred.data.descriptors package

Submodules

qsprpred.data.descriptors.fingerprints module

Fingerprint classes.

class qsprpred.data.descriptors.fingerprints.AtomPairFP(nBits=2048, **kwargs)[source]

Bases: Fingerprint

Atom pair fingerprint.

Variables:
  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to RDKit’s GetHashedAtomPairFingerprintAsBitVect function

Initialize the atom pair fingerprint.

Parameters:
  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to RDKit’s GetHashedAtomPairFingerprintAsBitVect

property descriptors: list[str]

list of descriptors.

property dtype

Data type of the descriptor.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Calculate the atom pair fingerprints for the input molecules.

Parameters:
  • mols (list) – list of RDKit molecules

  • props (dict) – dictionary of properties (not used)

  • *args – positional arguments (not used)

  • **kwargs – keyword arguments (not used, set in the constructor)

Returns:

descriptor values of shape (n_mols, n_descriptors)

Return type:

(np.ndarray)

property isFP

Whether the descriptor is a fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])
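
The generator-or-list behaviour controlled by to_list can be illustrated with a minimal sketch. It is not QSPRpred's implementation: iter_items and fake_parse are hypothetical names, and a plain tuple stands in for an RDKit Mol so that the example runs without RDKit installed.

```python
from typing import Callable, Generator, Iterable, Union


def iter_items(
    items: Iterable[Union[str, object]],
    parse: Callable[[str], object],
    to_list: bool = False,
):
    """Return parsed objects lazily (generator) or eagerly (list).

    Hypothetical stand-in: in the documented method the strings are SMILES
    and the parsed objects are RDKit Mol instances.
    """

    def _generate() -> Generator[object, None, None]:
        for item in items:
            # Strings are parsed; anything else is passed through unchanged.
            yield parse(item) if isinstance(item, str) else item

    gen = _generate()
    return list(gen) if to_list else gen


def fake_parse(smiles: str) -> tuple:
    # Stand-in parser: wraps the string instead of building a molecule.
    return ("mol", smiles)


mixed_input = ["CCO", ("mol", "c1ccccc1")]
lazy = iter_items(mixed_input, fake_parse)
eager = iter_items(mixed_input, fake_parse, to_list=True)
```

The lazy form avoids holding every parsed molecule in memory at once; the list form is convenient when the result has to be indexed or traversed more than once.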

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules by adding hydrogens.

Parameters:

mols (list[str | Mol]) – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)
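
As a rough illustration of what replacing infinite values by NaNs means, the sketch below applies the same idea to a list of rows. This is an assumption-laden stand-in: treat_infs is a hypothetical helper, and plain Python lists replace the pandas DataFrame that the documented method operates on.

```python
import math


def treat_infs(rows: list[list[float]]) -> list[list[float]]:
    """Return a copy of `rows` with +inf and -inf replaced by NaN."""
    return [
        [math.nan if math.isinf(value) else value for value in row]
        for row in rows
    ]


table = [[1.0, math.inf], [-math.inf, 0.5]]
cleaned = treat_infs(table)  # `table` itself is left unchanged
```

Turning infinities into NaNs lets downstream code handle both kinds of invalid value with a single missing-value strategy.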

property usedBits: list[int] | None

List of bits of the fingerprint currently being used.

class qsprpred.data.descriptors.fingerprints.AvalonFP(nBits=1024, **kwargs)[source]

Bases: Fingerprint

Avalon fingerprint.

Variables:
  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to Avalon’s GetAvalonFP

Initialize the Avalon fingerprint.

Parameters:
  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to Avalon’s GetAvalonFP

property descriptors: list[str]

list of descriptors.

property dtype

Data type of the descriptor.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Calculate the Avalon fingerprints for the input molecules.

Parameters:
  • mols (list) – list of RDKit molecules

  • props (dict) – dictionary of properties (not used)

  • *args – positional arguments (not used)

  • **kwargs – keyword arguments (not used, set in the constructor)

Returns:

descriptor values of shape (n_mols, n_descriptors)

Return type:

(np.ndarray)

property isFP

Whether the descriptor is a fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules by adding hydrogens.

Parameters:

mols (list[str | Mol]) – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)
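
The contract between toJSON and fromJSON is a round trip: the string must carry enough state to rebuild an equivalent object. The sketch below shows that contract with a hypothetical ToyFingerprint class and the standard json module; the attribute names and the JSON layout are assumptions and do not reflect the format QSPRpred actually writes.

```python
import json


class ToyFingerprint:
    """Hypothetical stand-in for a serializable descriptor set."""

    def __init__(self, n_bits: int = 1024, used_bits=None):
        self.n_bits = n_bits
        self.used_bits = used_bits

    def to_json(self) -> str:
        # Store every attribute needed to rebuild the object.
        return json.dumps({"n_bits": self.n_bits, "used_bits": self.used_bits})

    @classmethod
    def from_json(cls, text: str) -> "ToyFingerprint":
        # Rebuild the object by feeding the stored state to the constructor.
        return cls(**json.loads(text))


original = ToyFingerprint(n_bits=512, used_bits=[0, 3, 7])
restored = ToyFingerprint.from_json(original.to_json())
```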

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)

property usedBits: list[int] | None

List of bits of the fingerprint currently being used.

class qsprpred.data.descriptors.fingerprints.Fingerprint(used_bits: list[int] | None = None)[source]

Bases: DescriptorSet, ABC

Base class for calculation of binary fingerprints.

Variables:
  • usedBits (list) – list of bits of the fingerprint currently being used

  • descriptors (list) – list of descriptors

  • isFP (bool) – Whether the descriptor is a fingerprint

  • dtype (type) – Data type of the descriptor

Initialize the fingerprint.

Parameters:

used_bits (list) – list of bits of the fingerprint currently being used

property descriptors: list[str]

list of descriptors.

property dtype

Data type of the descriptor.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

abstract getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray

Method to calculate descriptors for a list of molecules.

This method should use molecules as they are without any preparation. Any preparation steps should be defined in the DescriptorSet.prepMols method, which is picked up by the main DescriptorSet.__call__.

Parameters:
  • mols (list) – list of SMILES or RDKit molecules

  • props (dict) – dictionary of properties

  • *args – positional arguments

  • **kwargs – keyword arguments

Returns:

descriptor values of shape (n_mols, n_descriptors)

Return type:

(np.ndarray)

property isFP

Whether the descriptor is a fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol][source]

Prepare the molecules by adding hydrogens.

Parameters:

mols (list[str | Mol]) – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)

property usedBits: list[int] | None

List of bits of the fingerprint currently being used.
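
One plausible reading of usedBits is a column filter over the full bit vector, with None meaning that every bit is kept. The sketch below illustrates that reading only; select_bits is a hypothetical helper, and the way QSPRpred applies usedBits internally is not stated in this reference.

```python
from typing import Optional


def select_bits(
    fingerprints: list[list[int]], used_bits: Optional[list[int]]
) -> list[list[int]]:
    """Keep only the columns listed in `used_bits`; keep all when it is None."""
    if used_bits is None:
        return [list(row) for row in fingerprints]
    return [[row[i] for i in used_bits] for row in fingerprints]


# Two toy fingerprints with five bits each.
full = [[1, 0, 0, 1, 1], [0, 1, 0, 0, 1]]
subset = select_bits(full, used_bits=[0, 3, 4])
unfiltered = select_bits(full, used_bits=None)
```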

class qsprpred.data.descriptors.fingerprints.LayeredFP(minPath=1, maxPath=7, nBits=2048, **kwargs)[source]

Bases: Fingerprint

Layered fingerprint.

Variables:
  • minPath (int) – minimum path length

  • maxPath (int) – maximum path length

  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to RDKit’s LayeredFingerprint

Initialize the layered fingerprint.

Parameters:
  • minPath (int) – minimum path length

  • maxPath (int) – maximum path length

  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to RDKit’s LayeredFingerprint

property descriptors: list[str]

list of descriptors.

property dtype

Data type of the descriptor.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Calculate the layered fingerprints for the input molecules.

Parameters:
  • mols (list) – list of RDKit molecules

  • props (dict) – dictionary of properties (not used)

  • *args – positional arguments (not used)

  • **kwargs – keyword arguments (not used, set in the constructor)

Returns:

descriptor values of shape (n_mols, n_descriptors)

Return type:

(np.ndarray)

property isFP

Whether the descriptor is a fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules by adding hydrogens.

Parameters:

mols (list[str | Mol]) – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])
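
Prefixing descriptor names with the descriptor set name keeps features from different sets distinguishable once they are combined in a single table. A minimal sketch of the idea follows; to_feature_names, the underscore separator, and the example bit names are all assumptions, since the exact naming scheme is not given in this reference.

```python
def to_feature_names(
    set_name: str, descriptors: list[str], sep: str = "_"
) -> list[str]:
    """Prefix each descriptor name with the name of its descriptor set."""
    return [f"{set_name}{sep}{name}" for name in descriptors]


# Hypothetical descriptor names for the first three bits of a fingerprint.
names = to_feature_names("LayeredFP", ["0", "1", "2"])
```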

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)

property usedBits: list[int] | None

List of bits of the fingerprint currently being used.

class qsprpred.data.descriptors.fingerprints.MaccsFP(nBits=167, **kwargs)[source]

Bases: Fingerprint

MACCS keys fingerprint.

Variables:
  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to RDKit’s GetMACCSKeysFingerprint function

Initialize the MACCS keys fingerprint.

Parameters:
  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to RDKit’s GetMACCSKeysFingerprint function

property descriptors: list[str]

list of descriptors.

property dtype

Data type of the descriptor.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Calculate the MACCS keys fingerprints for the input molecules.

Parameters:
  • mols (list) – list of RDKit molecules

  • props (dict) – dictionary of properties (not used)

  • *args – positional arguments (not used)

  • **kwargs – keyword arguments (not used, set in the constructor)

Returns:

descriptor values of shape (n_mols, n_descriptors)

Return type:

(np.ndarray)

property isFP

Whether the descriptor is a fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules by adding hydrogens.

Parameters:

mols (list[str | Mol]) – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)

property usedBits: list[int] | None

List of bits of the fingerprint currently being used.

class qsprpred.data.descriptors.fingerprints.MorganFP(radius=2, nBits=2048, **kwargs)[source]

Bases: Fingerprint

Morgan fingerprint.

Variables:
  • radius (int) – radius of the fingerprint

  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to RDKit’s GetMorganFingerprintAsBitVect function

Initialize the Morgan fingerprint.

Parameters:
  • radius (int) – radius of the fingerprint

  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to RDKit’s GetMorganFingerprintAsBitVect function

property descriptors: list[str]

list of descriptors.

property dtype

Data type of the descriptor.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Calculate the Morgan fingerprints for the input molecules.

Parameters:
  • mols (list) – list of RDKit molecules

  • props (dict) – dictionary of properties

  • *args – positional arguments (not used)

  • **kwargs – keyword arguments (not used, set in the constructor)

Returns:

descriptor values of shape (n_mols, n_descriptors)

Return type:

(np.ndarray)

property isFP

Whether the descriptor is a fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules by adding hydrogens.

Parameters:

mols (list[str | Mol]) – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)

property usedBits: list[int] | None

List of bits of the fingerprint currently being used.

class qsprpred.data.descriptors.fingerprints.PatternFP(nBits=2048, **kwargs)[source]

Bases: Fingerprint

Pattern fingerprint.

Variables:
  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to RDKit’s PatternFingerprint

Initialize the pattern fingerprint.

Parameters:
  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to RDKit’s PatternFingerprint

property descriptors: list[str]

list of descriptors.

property dtype

Data type of the descriptor.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Calculate the pattern fingerprints for the input molecules.

Parameters:
  • mols (list) – list of RDKit molecules

  • props (dict) – dictionary of properties (not used)

  • *args – positional arguments (not used)

  • **kwargs – keyword arguments (not used, set in the constructor)

Returns:

descriptor values of shape (n_mols, n_descriptors)

Return type:

(np.ndarray)

property isFP

Whether the descriptor is a fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules by adding hydrogens.

Parameters:

mols (list[str | Mol]) – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)

property usedBits: list[int] | None

List of bits of the fingerprint currently being used.

class qsprpred.data.descriptors.fingerprints.RDKitFP(minPath=1, maxPath=7, nBits=2048, **kwargs)[source]

Bases: Fingerprint

RDKit fingerprint.

This is a wrapper around RDKit’s RDKFingerprint function.

Variables:
  • minPath (int) – minimum path length

  • maxPath (int) – maximum path length

  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to RDKit’s RDKFingerprint

Initialize the RDKit fingerprint.

Parameters:
  • minPath (int) – minimum path length

  • maxPath (int) – maximum path length

  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to RDKit’s RDKFingerprint

property descriptors: list[str]

list of descriptors.

property dtype

Data type of the descriptor.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Calculate the RDKit fingerprints for the input molecules.

Parameters:
  • mols (list) – list of RDKit molecules

  • props (dict) – dictionary of properties (not used)

  • *args – positional arguments (not used)

  • **kwargs – keyword arguments (not used, set in the constructor)

Returns:

descriptor values of shape (n_mols, n_descriptors)

Return type:

(np.ndarray)

property isFP

Whether the descriptor is a fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules by adding hydrogens.

Parameters:

mols (list[str | Mol]) – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)
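The toJSON/fromJSON pair is meant to round-trip an object through a string. The following is only a minimal sketch of that idea using the standard json module. SketchFP is a hypothetical stand-in class, the useChirality keyword is an illustrative assumption, and the library's own serialization mechanism may work differently.

```python
import json


class SketchFP:
    """Hypothetical stand-in for a fingerprint class (not the real RDKitFP)."""

    def __init__(self, nBits=2048, **kwargs):
        self.nBits = nBits
        self.kwargs = kwargs

    def toJSON(self) -> str:
        # Store everything needed to rebuild the object.
        return json.dumps({"nBits": self.nBits, "kwargs": self.kwargs})

    @classmethod
    def fromJSON(cls, json_str: str):
        # Rebuild the object from the stored constructor arguments.
        state = json.loads(json_str)
        return cls(nBits=state["nBits"], **state["kwargs"])


original = SketchFP(nBits=1024, useChirality=True)  # keyword is illustrative only
restored = SketchFP.fromJSON(original.toJSON())
```

The restored object carries the same nBits and kwargs as the original, which is the property the docstring asks of the JSON string.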

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)

property usedBits: list[int] | None

List of bits of the fingerprint currently being used.

class qsprpred.data.descriptors.fingerprints.RDKitMACCSFP(used_bits: list[int] | None = None)[source]

Bases: Fingerprint

RDKit’s implementation of the MACCS keys fingerprint.

Initialize the fingerprint.

Parameters:

used_bits (list) – list of bits of the fingerprint currently being used

property descriptors: list[str]

list of descriptors.

property dtype

Data type of the descriptor.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Calculate the MACCS keys fingerprints for the input molecules.

Parameters:
  • mols (list) – list of RDKit molecules

  • props (dict) – dictionary of properties (not used)

  • *args – positional arguments (not used)

  • **kwargs – keyword arguments (not used)

Returns:

np.ndarray of shape (n_mols, n_descriptors)

property isFP

Whether the descriptor is a fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules by adding hydrogens.

Parameters:

mols (list[str | Mol]) – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)

property usedBits: list[int] | None

List of bits of the fingerprint currently being used.
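As a rough illustration of what restricting a fingerprint to its used bits could look like, the sketch below selects columns from plain Python lists. The helper select_bits is hypothetical, and treating None as "use all bits" is an assumption rather than documented behaviour.

```python
def select_bits(fingerprints, used_bits=None):
    """Keep only the bit positions listed in used_bits for each fingerprint."""
    if used_bits is None:
        # Assumption: no selection means every bit is kept.
        return [list(fp) for fp in fingerprints]
    return [[fp[i] for i in used_bits] for fp in fingerprints]


fps = [[1, 0, 0, 1, 1], [0, 1, 0, 0, 1]]
subset = select_bits(fps, used_bits=[0, 3, 4])
# subset == [[1, 1, 1], [0, 0, 1]]
```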

class qsprpred.data.descriptors.fingerprints.TopologicalFP(nBits=2048, **kwargs)[source]

Bases: Fingerprint

Topological torsion fingerprint.

Variables:
  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to RDKit’s GetHashedTopologicalTorsionFingerprintAsBitVect function

Initialize the topological torsion fingerprint.

Parameters:
  • nBits (int) – number of bits in the fingerprint

  • kwargs (dict) – additional keyword arguments to pass to RDKit’s GetHashedTopologicalTorsionFingerprintAsBitVect function

property descriptors: list[str]

list of descriptors.

property dtype

Data type of the descriptor.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Calculate the topological torsion fingerprints for the input molecules.

Parameters:
  • mols (list) – list of RDKit molecules

  • props (dict) – dictionary of properties (not used)

  • *args – positional arguments (not used)

  • **kwargs – keyword arguments (not used)

Returns:

descriptor values of shape (n_mols, n_descriptors)

Return type:

(np.ndarray)

property isFP

Whether the descriptor is a fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules by adding hydrogens.

Parameters:

mols (list[str | Mol]) – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)

property usedBits: list[int] | None

List of bits of the fingerprint currently being used.

qsprpred.data.descriptors.sets module

Descriptorset: a collection of descriptors that can be calculated for a molecule.

To add a new descriptor or fingerprint calculator:
  • Add a descriptor subclass for your descriptor calculator

  • Add a function to retrieve your descriptor by name to the descriptor retriever class
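The subclassing step can be sketched without the library installed. The base class below is a hypothetical stand-in that mirrors only the two abstract members documented for DescriptorSet (descriptors and getDescriptors). The toy SmilesStats set works on SMILES strings and returns nested lists instead of an np.ndarray, purely to keep the example self-contained.

```python
from abc import ABC, abstractmethod


class SketchDescriptorSet(ABC):
    """Hypothetical stand-in for the DescriptorSet interface (not the real class)."""

    @property
    @abstractmethod
    def descriptors(self) -> list[str]:
        """Names of the descriptors this set produces."""

    @abstractmethod
    def getDescriptors(self, mols, props, *args, **kwargs):
        """Return one row of descriptor values per molecule."""


class SmilesStats(SketchDescriptorSet):
    """Toy descriptor set computed directly from SMILES strings."""

    @property
    def descriptors(self) -> list[str]:
        return ["length", "n_c_chars"]

    def getDescriptors(self, mols, props, *args, **kwargs):
        # One row per molecule, one column per descriptor name.
        return [[len(smi), smi.upper().count("C")] for smi in mols]


calc = SmilesStats()
values = calc.getDescriptors(["CCO", "c1ccccc1"], props={})
# values == [[3, 2], [8, 6]]
```

A real subclass would derive from qsprpred.data.descriptors.sets.DescriptorSet and return an array of shape (n_mols, n_descriptors), as documented for getDescriptors.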

class qsprpred.data.descriptors.sets.DataFrameDescriptorSet(df: DataFrame, joining_cols: list[str] | None = None, suffix: str = '', source_is_multi_index: bool = False)[source]

Bases: DescriptorSet

DescriptorSet that uses a pandas.DataFrame of precalculated descriptors.

Variables:
  • requiredProps (list) – list of required properties

  • descriptors (list) – list of descriptor names

  • _df (pd.DataFrame) – dataframe of descriptors

  • _cols (list) – list of columns to use as the new multi-index

  • _descriptors (list) – list of descriptor names

Initialize the descriptor set with a dataframe of descriptors.

Parameters:
  • df – dataframe of descriptors

  • joining_cols – list of columns to use as joining index, properties of the same name must exist in the data set this descriptor is added to

  • suffix – suffix to add to the descriptor name

  • source_is_multi_index – assume that a multi-index is already present in the supplied dataframe. If True, the joining_cols argument must also be specified to indicate which properties should be used to create the multi-index in the destination.

property descriptors: list[str]

Return the descriptor names.

property dtype

Return the data type of the descriptor values.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDF()[source]

Return the dataframe of descriptors.

getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Return the descriptors for the input molecules. It simply searches for descriptor values in the data frame using the idProp as index.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties

  • *args – positional arguments

  • **kwargs – keyword arguments

Returns:

descriptor values of shape (n_mols, n_descriptors)

Return type:

(np.ndarray)
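The lookup described here can be pictured as a keyed table search. The sketch below is an assumption-laden illustration using a plain dictionary instead of a pandas DataFrame: the helper lookup_descriptors, the "ID" and "batch" property names, and the values are all hypothetical.

```python
# Precalculated descriptors keyed by the values of the joining columns.
table = {
    ("mol_1", "A"): [0.12, 3.4],
    ("mol_2", "A"): [0.56, 7.8],
}
joining_cols = ["ID", "batch"]  # hypothetical property names


def lookup_descriptors(props, joining_cols, table):
    """Return one descriptor row per molecule, in the order given by props."""
    n_mols = len(props[joining_cols[0]])
    rows = []
    for i in range(n_mols):
        # Build the multi-column key for molecule i from its properties.
        key = tuple(props[col][i] for col in joining_cols)
        rows.append(table[key])
    return rows


props = {"ID": ["mol_2", "mol_1"], "batch": ["A", "A"]}
values = lookup_descriptors(props, joining_cols, table)
# values == [[0.56, 7.8], [0.12, 3.4]]
```

This also shows why properties named after the joining columns must exist in the data set: without them no key can be built for a molecule.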

getIndex()[source]

Return the index of the dataframe.

getIndexCols()[source]

Return the index columns of the dataframe.

property isFP

Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules for descriptor calculation.

Parameters:

mols – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

Return the required properties for the dataframe.

static setIndex(df: DataFrame, cols: list[str]) DataFrame[source]

Create a multi-index from several columns of the data set.

Parameters:
  • df (pd.DataFrame) – DataFrame to set index for.

  • cols (list[str]) – List of columns to use as the new multi-index.

Returns:

DataFrame with the new multi-index set.

Return type:

pd.DataFrame

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)

class qsprpred.data.descriptors.sets.DescriptorSet(id_prop: str | None = 'ID')[source]

Bases: JSONSerializable, MolProcessorWithID, ABC

MolProcessorWithID that calculates descriptors for a molecule.

Variables:
  • descriptors (list) – list of descriptor names

  • isFP (bool) – whether the descriptor set is a binary fingerprint

  • supportsParallel (bool) – whether the descriptor set supports parallel calculation

  • dtype – data type of the descriptor values

Initialize the processor with the name of the property that contains the molecule’s unique identifier.

Parameters:

id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “ID”.

abstract property descriptors: list[str]

Return a list of current descriptor names.

property dtype

Return the data type of the descriptor values.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

abstract getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Method to calculate descriptors for a list of molecules.

This method should use molecules as they are without any preparation. Any preparation steps should be defined in the DescriptorSet.prepMols method, which is picked up by the main DescriptorSet.__call__.

Parameters:
  • mols (list) – list of SMILES or RDKit molecules

  • props (dict) – dictionary of properties

  • *args – positional arguments

  • **kwargs – keyword arguments

Returns:

descriptor values of shape (n_mols, n_descriptors)

Return type:

(np.ndarray)

property isFP

Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None][source]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]][source]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol][source]

Prepare the molecules for descriptor calculation.

Parameters:

mols – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str][source]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])
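A minimal sketch of the prefixing idea follows. The helper to_feature_names is hypothetical, and the underscore separator is an assumption; the actual separator used by the library is not stated here.

```python
def to_feature_names(set_name: str, descriptors: list[str], sep: str = "_") -> list[str]:
    """Prefix each descriptor name with the name of its descriptor set."""
    return [f"{set_name}{sep}{name}" for name in descriptors]


names = to_feature_names("RDKitDescs", ["MolWt", "TPSA"])
# names == ["RDKitDescs_MolWt", "RDKitDescs_TPSA"]
```

Prefixing keeps feature names unique when several descriptor sets that share descriptor names are combined in one data set.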

static treatInfs(df: DataFrame) DataFrame[source]

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)
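The effect of treatInfs can be illustrated on plain nested lists. This is a standard-library sketch of the same idea and not the pandas-based method itself; the helper treat_infs is hypothetical.

```python
import math


def treat_infs(rows):
    """Replace positive and negative infinity with NaN in a table of floats."""
    return [[math.nan if math.isinf(v) else v for v in row] for row in rows]


cleaned = treat_infs([[1.0, math.inf], [-math.inf, 2.5]])
# Finite values are kept; both infinities become NaN.
```

Since NaN does not compare equal to itself, results are best checked with math.isnan rather than with ==.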

class qsprpred.data.descriptors.sets.DrugExPhyschem(physchem_props: list[str] | None = None)[source]

Bases: DescriptorSet

Various properties used for scoring in DrugEx.

Variables:
  • props (list) – list of properties to calculate

  • descriptors (list) – list of descriptor names

  • _prop_dict (dict) – dictionary of physchem property names and their corresponding functions

Initialize the descriptor set with a list of properties to calculate, selecting a subset of the available properties.

Parameters:

physchem_props – list of properties to calculate

property descriptors: list[str]

Return the list of properties to calculate.

Returns:

list of property names

Return type:

list[str]

property dtype

Return the data type of the descriptor values.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols, props, *args, **kwargs)[source]

Calculate the DrugEx properties for a molecule.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

  • args – positional arguments

  • kwargs – keyword arguments

Returns:

array of descriptor values of shape (n_mols, n_descriptors)

Return type:

np.ndarray

static getPropDict() dict[str, callable][source]

Return a dictionary of DrugEx properties and their corresponding functions.

property isFP

Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules for descriptor calculation.

Parameters:

mols – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)

class qsprpred.data.descriptors.sets.PredictorDesc(model: Type[ForwardRef('QSPRModel')] | str)[source]

Bases: DescriptorSet

DescriptorSet that uses a QSPRModel (predictor) object to calculate descriptors from a molecule.

Variables:
  • model (QSPRModel) – a fitted model instance

  • descriptors (list) – list of descriptor names

  • _descriptors (list) – list of descriptors

Initialize the descriptor set with a QSPRModel object.

Parameters:

model (QSPRModel) – a fitted model instance or a path to the model’s meta file

property descriptors

Return the descriptor names.

property dtype

Return the data type of the descriptor values.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols: list[str | Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Calculate the descriptor for a list of molecules.

Parameters:
  • mols (list) – list of SMILES or RDKit molecules

  • props (dict) – dictionary of properties for the passed molecules

  • args – positional arguments

  • kwargs – keyword arguments

Returns:

array of descriptor values of shape (n_mols, n_descriptors)

Return type:

np.ndarray

property isFP

Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules for descriptor calculation.

Parameters:

mols – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)
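The toJSON/fromJSON and toFile/fromFile pairs describe a round-trip contract: whatever is written must be enough to rebuild an equivalent object. The following is a minimal standard-library sketch of that contract. The ToyDescriptorSet class, its fields, and its method names are hypothetical and do not reproduce QSPRpred's actual serialization format.

```python
import json
import os
import tempfile


class ToyDescriptorSet:
    """Hypothetical stand-in used for illustration; not QSPRpred code."""

    def __init__(self, n_bits=2048, radius=2):
        self.n_bits = n_bits
        self.radius = radius

    def to_json(self) -> str:
        # Store every constructor argument needed to rebuild the object.
        return json.dumps({"n_bits": self.n_bits, "radius": self.radius})

    @classmethod
    def from_json(cls, text: str) -> "ToyDescriptorSet":
        return cls(**json.loads(text))

    def to_file(self, filename: str) -> str:
        with open(filename, "w", encoding="utf-8") as handle:
            handle.write(self.to_json())
        # Mirror the documented contract: return the absolute path.
        return os.path.abspath(filename)

    @classmethod
    def from_file(cls, filename: str) -> "ToyDescriptorSet":
        with open(filename, encoding="utf-8") as handle:
            return cls.from_json(handle.read())


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = ToyDescriptorSet(n_bits=1024).to_file(
        os.path.join(tmp_dir, "toy.json")
    )
    restored = ToyDescriptorSet.from_file(saved_path)
```

After the round trip, restored carries the same settings as the object that was saved.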

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])
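As a rough illustration of the prefixing described for transformToFeatureNames, the sketch below joins a set name and each descriptor name. The helper name, the example set name, and the "_" separator are assumptions made for this sketch; the separator QSPRpred actually uses is not stated here.

```python
def to_feature_names(set_name: str, descriptors: list[str], sep: str = "_") -> list[str]:
    # Prefix each descriptor with the name of the set it came from.
    # The separator is an assumption, not taken from the library.
    return [f"{set_name}{sep}{name}" for name in descriptors]


feature_names = to_feature_names("RDKitDescs", ["MolWt", "TPSA"])
```

Prefixing keeps feature names unique when several descriptor sets expose descriptors with the same name.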

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)
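The behaviour described for treatInfs can be sketched without pandas on a plain list of rows. The replace_infs helper is hypothetical and only mirrors the idea. On a real DataFrame the same effect is commonly obtained with DataFrame.replace, although whether QSPRpred does it that way is not stated here.

```python
import math


def replace_infs(rows):
    """Return a copy of rows with +inf and -inf replaced by NaN."""
    return [
        [math.nan if isinstance(v, float) and math.isinf(v) else v for v in row]
        for row in rows
    ]


cleaned = replace_infs([[1.0, math.inf], [-math.inf, 2.5]])
```

Finite values pass through unchanged, so downstream imputation only has to deal with NaN.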

class qsprpred.data.descriptors.sets.RDKitDescs(rdkit_descriptors: list[str] | None = None, include_3d: bool = False)[source]

Bases: DescriptorSet

Calculate RDKit descriptors.

Variables:
  • descriptors (list) – list of RDKit descriptors to calculate

  • include3D (bool) – include 3D descriptors

Initialize the descriptorset with a list of RDKit descriptors to calculate.

Parameters:
  • rdkit_descriptors (list[str]) – list of descriptors to calculate; if None, all 2D RDKit descriptors will be calculated

  • include_3d – if True, 3D descriptors will be calculated

property descriptors: list[str]

Return the list of RDKit descriptors to calculate.

property dtype

Return the data type of the descriptor values.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Calculate the RDKit descriptors for the input molecules.

Parameters:
  • mols (list[Mol]) – list of RDKit molecules

  • props (dict[str, list[Any]]) – dictionary of properties for the passed molecules

  • args – positional arguments

  • kwargs – keyword arguments

Returns:

array of descriptor values of shape (n_mols, n_descriptors)

Return type:

np.ndarray

property isFP

Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules for descriptor calculation.

Parameters:

mols – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)

class qsprpred.data.descriptors.sets.RandomDescs(n: int = 10, missing: float | int | None = None, seed: int | None = None)[source]

Bases: DescriptorSet, Randomized

Descriptor set that generates random numbers as descriptors.

Note that when randomState is set, the seed of the random number generator is derived from the idProp of the dataset. If the IDs of the molecules change, the results will differ; if only the order of the molecules is shuffled, the descriptors per molecule remain the same.

Variables:
  • n (int) – number of random descriptors to generate

  • missing (float | int | None) – fraction of missing values or number of missing values per descriptor

  • randomState (int | None) – random state to use for the random number generator

Initialize the descriptorset with a number of random descriptors.

Parameters:
  • n (int) – number of random descriptors to generate

  • missing (float | int | None) – fraction of missing values or number of missing values per descriptor
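The note on randomState says that values depend on each molecule's ID and not on its position. One way to get that property is to derive a per-molecule seed from the ID and the global random state, as in the sketch below. The hashing scheme and the random_descriptors helper are assumptions for illustration, not QSPRpred's implementation.

```python
import hashlib
import random


def random_descriptors(mol_id: str, n: int, random_state: int) -> list[float]:
    # Derive a per-molecule seed from the ID and the global random state,
    # so the values do not depend on where the molecule sits in the list.
    digest = hashlib.sha256(f"{random_state}:{mol_id}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.random() for _ in range(n)]


ids = ["mol_1", "mol_2", "mol_3"]
original = {i: random_descriptors(i, n=3, random_state=42) for i in ids}
shuffled = {i: random_descriptors(i, n=3, random_state=42) for i in reversed(ids)}
```

Both dictionaries hold identical values per ID even though the molecules were processed in a different order. Changing an ID or the random state changes the values.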

property descriptors: list[str]

Return the descriptor names.

property dtype

Return the data type of the descriptor values.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols: list[str | Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Calculate the descriptor for a list of molecules.

Parameters:
  • mols (list) – list of SMILES or RDKit molecules

  • props (dict) – dictionary of properties for the passed molecules

  • args – positional arguments

  • kwargs – keyword arguments

Returns:

array of descriptor values of shape (n_mols, n)

Return type:

np.ndarray

property isFP

Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules for descriptor calculation.

Parameters:

mols – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property randomState: int | None

Get the random state for the object.

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)

class qsprpred.data.descriptors.sets.SmilesDesc(id_prop: str | None = 'ID')[source]

Bases: DescriptorSet

Descriptorset that calculates descriptors from a SMILES sequence.

Variables:

descriptors (list) – list of descriptor names

Initialize the processor with the name of the property that contains the molecule’s unique identifier.

Parameters:

id_prop (str) – Name of the property that contains the molecule’s unique identifier. Defaults to “ID”.

property descriptors: list[str]

Return the descriptor names.

property dtype

Return the data type of the descriptor values.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Return SMILES as descriptors.

Parameters:
  • mols (list) – list of SMILES or RDKit molecules

  • props (dict) – dictionary of properties for the passed molecules

  • args – positional arguments

  • kwargs – keyword arguments

Returns:

array of descriptor values of shape (n_mols, n_descriptors)

Return type:

(np.ndarray)

property isFP

Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules for descriptor calculation.

Parameters:

mols – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame[source]

Handle infinite values in the dataframe.

Just return the dataframe as is since SMILES are strings.

Parameters:

df (pd.DataFrame) – dataframe to treat

class qsprpred.data.descriptors.sets.TanimotoDistances(list_of_smiles: list[str], fingerprint_type: Type[ForwardRef('Fingerprint')], *args, **kwargs)[source]

Bases: DescriptorSet

Calculate Tanimoto distances to a list of SMILES sequences.

Variables:
  • fingerprintType (Fingerprint) – fingerprint type to use.

  • descriptors (list) – list of descriptor names

  • _descriptors (list) – list of SMILES sequences to calculate the distances.

  • _args – fingerprint arguments

  • _kwargs – fingerprint keyword arguments, should contain fingerprint_type

Initialize the descriptorset with a list of SMILES sequences and a fingerprint type.

Parameters:
  • list_of_smiles (list of strings) – list of SMILES to calculate the distances.

  • fingerprint_type (Fingerprint) – fingerprint type to use.

  • *args – fingerprint arguments

  • **kwargs – fingerprint keyword arguments, should contain fingerprint_type

calculate_fingerprints(list_of_smiles: list[str]) list[ExplicitBitVect][source]

Calculate the fingerprints for the list of SMILES sequences.

Parameters:

list_of_smiles (list[str]) – list of SMILES sequences

Returns:

list of fingerprints

Return type:

list[DataStructs.ExplicitBitVect]

property descriptors: list[str]

Return the list of SMILES sequences to calculate the distances.

property dtype

Return the data type of the descriptor values.

classmethod fromFile(filename: str) Any

Initialize a new instance from a JSON file.

Parameters:

filename (str) – path to the JSON file

Returns:

new instance of the class

Return type:

instance (object)

classmethod fromJSON(json: str) Any

Reconstruct object from a JSON string.

Parameters:

json (str) – JSON string of the object

Returns:

reconstructed object

Return type:

obj (object)

getDescriptors(mols: list[Mol], props: dict[str, list[Any]], *args, **kwargs) ndarray[source]

Calculate the Tanimoto distances to the list of SMILES sequences.

Parameters:
  • mols (List[str] or List[rdkit.Chem.rdchem.Mol]) – SMILES sequences or RDKit molecules to calculate the distances.

  • props (dict) – dictionary of properties for the passed molecules

  • args – positional arguments for the fingerprint calculator

  • kwargs – keyword arguments for the fingerprint calculator
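The quantity computed here is the Tanimoto distance, one minus the Tanimoto similarity of two fingerprints. The sketch below shows the arithmetic on plain Python sets of "on" bit indices. It does not use RDKit, and the layout of one row per input molecule and one column per reference SMILES is an assumption about the output shape.

```python
def tanimoto_distance(bits_a: set[int], bits_b: set[int]) -> float:
    """Return 1 - |A & B| / |A | B| for two sets of 'on' bit indices."""
    union = bits_a | bits_b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 0.0
    return 1.0 - len(bits_a & bits_b) / len(union)


# Toy fingerprints standing in for the reference SMILES and the input molecules.
references = [{1, 2, 3}, {7, 8}]
queries = [{1, 2, 3}, {2, 3, 4}, {9}]

# One row per query molecule, one column per reference (assumed layout).
distances = [[tanimoto_distance(q, r) for r in references] for q in queries]
```

A distance of 0.0 means identical bit sets and 1.0 means no bits in common. The second query shares two of four bits with the first reference, giving 0.5.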

property isFP

Return True if descriptor set is a binary fingerprint.

static iterMols(mols: list[str | Mol], to_list=False) list[Mol] | Generator[Mol, None, None]

Create a molecule generator or list from RDKit molecules or SMILES.

Parameters:
  • mols – list of molecules (SMILES str or RDKit Mol)

  • to_list – if True, return a list instead of a generator

Returns:

generator or list of RDKit molecules

Return type:

(list[Mol] | Generator[Mol, None, None])

iterMolsAndIDs(mols, props: dict[str, list] | None)

Iterate over molecules and their corresponding IDs regardless of the input molecule format. This is just a helper function that will detect the input and yield the molecule and its ID.

Parameters:
  • mols (list[str | Mol | StoredMol]) – A list of SMILES or RDKit molecules to process.

  • props (dict) – An optional dictionary of properties related to the molecules to process.

Returns:

A tuple of the molecules and their IDs.

Return type:

tuple[Mol, str]

parsePropsAndMols(mols: list[str | Mol], props: dict[str, list[Any]] | None) tuple[list[Mol], dict[str, list[Any]]]

Parse the properties and molecules passed to the descriptor set.

Parameters:
  • mols – list of SMILES or RDKit molecules

  • props – dictionary of properties for the passed molecules

Returns:

list of RDKit molecules and dictionary of properties

Return type:

(list[Mol], dict[str, list])

Raises:

AssertionError – if the properties are not provided for a StoredMol instance

prepMols(mols: list[str | Mol]) list[Mol]

Prepare the molecules for descriptor calculation.

Parameters:

mols – list of SMILES or RDKit molecules

Returns:

list of RDKit molecules

Return type:

(list[Mol])

property requiredProps: list[str]

The properties required by the processor. This is to inform the caller that the processor requires certain properties to be passed to the __call__ method or via the props attribute of StoredMol instances.

property supportsParallel: bool

Return True if the descriptor set supports parallel calculation.

toFile(filename: str) str

Serialize object to a JSON file. This JSON file should contain all data necessary to reconstruct the object.

Parameters:

filename (str) – filename to save object to

Returns:

absolute path to the saved JSON file of the object

Return type:

filename (str)

toJSON() str

Serialize object to a JSON string. This JSON string should contain all data necessary to reconstruct the object.

Returns:

JSON string of the object

Return type:

json (str)

transformToFeatureNames() list[str]

Transform the descriptor names to feature names by adding the descriptor set name as a prefix.

Returns:

list of feature names

Return type:

(list[str])

static treatInfs(df: DataFrame) DataFrame

Replace infinite values by NaNs.

Parameters:

df (pd.DataFrame) – dataframe to treat

Returns:

dataframe with infinite values replaced by NaNs

Return type:

(pd.DataFrame)

qsprpred.data.descriptors.tests module

class qsprpred.data.descriptors.tests.TestDescriptorCalculation(methodName='runTest')[source]

Bases: DataSetsPathMixIn, QSPRTestCase

Test the calculation of descriptors.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
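A short sketch of the two comparison modes of assertAlmostEqual follows: rounding to a number of decimal places, or an absolute tolerance via delta. The test class and the values are illustrative only.

```python
import io
import unittest


class AlmostEqualTest(unittest.TestCase):
    def test_places(self):
        # The difference is about 1e-8, which rounds to 0 at 7 decimal places.
        self.assertAlmostEqual(1.00000001, 1.0)

    def test_delta(self):
        # The absolute difference of about 0.4 is within the 0.5 tolerance.
        self.assertAlmostEqual(10.4, 10.0, delta=0.5)

    def test_places_and_delta_conflict(self):
        # Passing both places and delta is rejected with a TypeError.
        with self.assertRaises(TypeError):
            self.assertAlmostEqual(1.0, 1.1, places=2, delta=0.5)


almost_suite = unittest.defaultTestLoader.loadTestsFromTestCase(AlmostEqualTest)
almost_result = unittest.TextTestRunner(stream=io.StringIO()).run(almost_suite)
```

Use delta when the tolerance is naturally expressed in the units of the quantity, and places when agreement to a fixed number of decimals is what matters.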

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)), Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.

assertDictEqual(d1, d2, msg=None)
assertEndsWith(s, suffix, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)
assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)
assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger_name or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])

assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of level level or higher are emitted on logger_name or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)
assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)

assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertStartsWith(s, prefix, msg=None)
assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class warnClass is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)

assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now they need to be added manually to the list below.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of DescriptorCalculator objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid
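A minimal sketch of how such a grid can be produced lazily, assuming it is the Cartesian product of the five option lists. The option values below are placeholder strings, not real qsprpred objects, and data_prep_grid is a hypothetical stand-in for the classmethod.

```python
import itertools

# Placeholder option lists; every value here is illustrative only.
calculators = ["calc_A", "calc_B"]
splits = [None, "random_split"]
standardizers = [None, "standard_scaler"]
feature_filters = [None, ["low_variance"]]
data_filters = [None]


def data_prep_grid():
    """Lazily yield (descriptor_calculator, split, feature_standardizer,
    feature_filters, data_filters) tuples."""
    yield from itertools.product(
        calculators, splits, standardizers, feature_filters, data_filters
    )


grid = list(data_prep_grid())
print(len(grid))  # 2 * 2 * 2 * 2 * 1 = 16 combinations
print(grid[0])
```

Because the grid is a generator, it has to be materialized (as with list() above) before it can be iterated more than once.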

classmethod getDefaultCalculatorCombo()

Makes a list of default descriptor calculators that can be used in tests.

It creates calculators with only Morgan fingerprints or only RDKit descriptors, and also one with both, to test behaviour with multiple descriptor sets. Override this method if you want to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

static getDescList()[source]
classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists, each holding one named preparation combination

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()[source]

Set up the test DataFrame.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Remove all files and directories that are used for testing.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testDropping()[source]

Test dropping of descriptors from data sets.

testSwitching = None
testSwitching_0(**kw)

Test if the feature calculator can be switched to a new dataset [with n_cpu=1, chunk_size=None].

testSwitching_1(**kw)

Test if the feature calculator can be switched to a new dataset [with n_cpu=2, chunk_size=None].

testSwitching_2(**kw)

Test if the feature calculator can be switched to a new dataset [with n_cpu=1, chunk_size=17].

testSwitching_3(**kw)

Test if the feature calculator can be switched to a new dataset [with n_cpu=2, chunk_size=17].
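The four numbered variants above correspond to every pairing of n_cpu in (1, 2) with chunk_size in (None, 17). The sketch below only reproduces that enumeration and its ordering; how the test suite actually generates the variants is not shown in this reference, so the loop is an assumption for illustration.

```python
import itertools

# chunk_size varies in the outer loop, n_cpu in the inner loop,
# which matches the order of the numbered variants listed above.
variants = [
    (n_cpu, chunk_size)
    for chunk_size, n_cpu in itertools.product((None, 17), (1, 2))
]

for index, (n_cpu, chunk_size) in enumerate(variants):
    print(f"testSwitching_{index}: n_cpu={n_cpu}, chunk_size={chunk_size}")
```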

class qsprpred.data.descriptors.tests.TestDescriptorSets(methodName='runTest')[source]

Bases: DataSetsPathMixIn, QSPRTestCase

Test the descriptor sets.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

classmethod addClassCleanup(function, /, *args, **kwargs)

Same as addCleanup, except the cleanup items are called even if setUpClass fails (unlike tearDownClass).

addCleanup(function, /, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().

  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.

assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.
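The difference between places and delta is easiest to see side by side. The values below are arbitrary examples.

```python
import unittest


class AlmostEqualExample(unittest.TestCase):
    def test_places_and_delta(self):
        # Default places=7: the rounded difference is zero, so this passes.
        self.assertAlmostEqual(0.1 + 0.2, 0.3)
        # A difference of about 0.4 is within delta=0.5 ...
        self.assertAlmostEqual(100.0, 100.4, delta=0.5)
        # ... but does not round to zero at one decimal place.
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(100.0, 100.4, places=1)
        # Supplying both places and delta is rejected with a TypeError.
        with self.assertRaises(TypeError):
            self.assertAlmostEqual(1.0, 1.05, places=2, delta=0.1)


result = unittest.TestResult()
AlmostEqualExample("test_places_and_delta").run(result)
print(result.wasSuccessful())
```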

assertCountEqual(first, second, msg=None)

Asserts that two iterables have the same elements, the same number of times, without regard to order.

self.assertEqual(Counter(list(first)), Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.

  • [0, 0, 1] and [0, 1] compare unequal.
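The two cases listed above, written as a runnable check. The last assertion uses unhashable elements, which assertCountEqual also accepts even though they could not be placed in a Counter.

```python
import unittest


class CountEqualExample(unittest.TestCase):
    def test_counts(self):
        # Same elements, same multiplicities, different order.
        self.assertCountEqual([0, 1, 1], [1, 0, 1])
        # Same distinct elements, but 0 appears a different number of times.
        with self.assertRaises(AssertionError):
            self.assertCountEqual([0, 0, 1], [0, 1])
        # Unhashable elements (lists) are compared as well.
        self.assertCountEqual([[1], [2]], [[2], [1]])


result = unittest.TestResult()
CountEqualExample("test_counts").run(result)
print(result.wasSuccessful())
```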

assertDictEqual(d1, d2, msg=None)
assertEndsWith(s, suffix, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertHasAttr(obj, name, msg=None)
assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertIsSubclass(cls, superclass, msg=None)
assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.

  • list2 – The second list to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertLogs(logger=None, level=None)

Fail unless a log message of the given level or higher is emitted on the given logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNoLogs(logger=None, level=None)

Fail unless no log messages of the given level or higher are emitted on the given logger or its children.

This method must be used as a context manager.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotEndsWith(s, suffix, msg=None)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotHasAttr(obj, name, msg=None)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotIsSubclass(cls, superclass, msg=None)
assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotStartsWith(s, prefix, msg=None)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in error message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
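Both calling styles, sketched with a hypothetical helper, parse_port, that exists only for this example.

```python
import unittest


def parse_port(text):
    """Hypothetical helper: parse a TCP port number from a string."""
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


class RaisesRegexExample(unittest.TestCase):
    def test_callable_form(self):
        # The callable and its arguments follow the regex.
        self.assertRaisesRegex(ValueError, r"out of range", parse_port, "70000")

    def test_context_manager_form(self):
        # msg is only accepted in the context manager form.
        with self.assertRaisesRegex(
            ValueError, r"out of range: \d+", msg="port 0 must be rejected"
        ) as cm:
            parse_port("0")
        self.assertIn("0", str(cm.exception))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RaisesRegexExample)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful())
```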

assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.

  • seq2 – The second sequence to compare.

  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.

  • msg – Optional message to use on failure instead of a list of differences.
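The effect of seq_type can be seen by comparing a list with a tuple that holds the same items. This is an illustrative check using only the standard library.

```python
import unittest


class SequenceEqualExample(unittest.TestCase):
    def test_sequences(self):
        # Without seq_type, only the items and their order matter.
        self.assertSequenceEqual([1, 2, 3], (1, 2, 3))
        # With seq_type=list, the tuple is rejected despite equal items.
        with self.assertRaises(AssertionError):
            self.assertSequenceEqual([1, 2, 3], (1, 2, 3), seq_type=list)
        # Differing items always fail.
        with self.assertRaises(AssertionError):
            self.assertSequenceEqual([1, 2, 3], [1, 2, 4])


result = unittest.TestResult()
SequenceEqualExample("test_sequences").run(result)
print(result.wasSuccessful())
```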

assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.

  • set2 – The second set to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertStartsWith(s, prefix, msg=None)
assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.

  • tuple2 – The second tuple to compare.

  • msg – Optional message to use on failure instead of a list of differences.

assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.

  • expected_regex – Regex (re.Pattern object or string) expected to be found in warning message.

  • args – Function to be called and extra positional args.

  • kwargs – Extra kwargs.

  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
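A context manager example, assuming a hypothetical function old_api that emits a DeprecationWarning.

```python
import unittest
import warnings


def old_api():
    """Hypothetical deprecated function used only for this example."""
    warnings.warn(
        "old_api() is deprecated; use new_api()",
        DeprecationWarning,
        stacklevel=2,
    )
    return 42


class WarnsRegexExample(unittest.TestCase):
    def test_deprecation_message(self):
        with self.assertWarnsRegex(DeprecationWarning, r"deprecated") as cm:
            value = old_api()
        # The call still returns normally; only the warning is checked.
        self.assertEqual(value, 42)
        self.assertIn("new_api", str(cm.warning))


result = unittest.TestResult()
WarnsRegexExample("test_deprecation_message").run(result)
print(result.wasSuccessful())
```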

clearGenerated()

Remove the directories that are used for testing.

countTestCases()
createLargeMultitaskDataSet(name='QSPRDataset_multi_test', target_props=[{'name': 'HBD', 'task': TargetTasks.MULTICLASS, 'th': [-1, 1, 2, 100]}, {'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createLargeTestDataSet(name='QSPRDataset_test_large', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a large dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createSmallTestDataSet(name='QSPRDataset_test_small', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=42, drop_empty_target_props=True)

Create a small dataset for testing purposes.

Parameters:
  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

createTestDataSetFromFrame(df, name='QSPRDataset_test', target_props=[{'name': 'CL', 'task': TargetTasks.REGRESSION}], random_state=None, n_jobs=1, chunk_size=None, drop_empty_target_props=True)

Create a dataset for testing purposes from the given data frame.

Parameters:
  • df (pd.DataFrame) – data frame containing the dataset

  • name (str) – name of the dataset

  • target_props (List of dicts or TargetProperty) – list of target properties

  • random_state (int) – random state to use for splitting and shuffling

  • prep (dict) – dictionary containing preparation settings

  • n_jobs (int) – number of jobs to use for parallel processing

  • chunk_size (int) – size of chunks to use per job in parallel processing

Returns:

a QSPRDataSet object

Return type:

QSPRDataSet

debug()

Run the test without collecting errors in a TestResult.

defaultTestResult()
classmethod doClassCleanups()

Execute all class cleanup functions. Normally called for you after tearDownClass.

doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

classmethod enterClassContext(cm)

Same as enterContext, but class-wide.

enterContext(cm)

Enters the supplied context manager.

If successful, also adds its __exit__ method as a cleanup function and returns the result of the __enter__ method.

fail(msg=None)

Fail immediately, with the given message.

failureException

alias of AssertionError

classmethod getAllDescriptorSets()

Return a list of (ideally) all available descriptor sets. For now they need to be added manually to the list below.

TODO: would be nice to create the list automatically by implementing a descriptor set registry that would hold all installed descriptor sets.

Returns:

list of DescriptorCalculator objects

Return type:

list

getBigDF()

Get a large data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

classmethod getDataPrepGrid()

Return many possible combinations of descriptor calculators, splits, feature standardizers, feature filters and data filters. The grid is not exhaustive, but should cover a lot of cases.

Returns:

a generator that yields tuples of all possible combinations as stated above, each tuple is defined as: (descriptor_calculator, split, feature_standardizer, feature_filters, data_filters)

Return type:

grid

classmethod getDefaultCalculatorCombo()

Makes a list of default descriptor calculators that can be used in tests.

It creates calculators with only Morgan fingerprints or only RDKit descriptors, and also one with both, to test behaviour with multiple descriptor sets. Override this method if you want to test with other descriptor sets and calculator combinations.

Returns:

list of created DescriptorCalculator objects

Return type:

list
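One way to build such single-set and combined calculator inputs is to enumerate combinations of increasing size. This is a sketch under that assumption; the strings below are placeholders standing in for descriptor set objects, not real qsprpred classes.

```python
import itertools

# Placeholder names for two descriptor sets (illustrative only).
descriptor_sets = ["morgan_fingerprints", "rdkit_descriptors"]

# Every single set first, then every pair.
combos = [
    list(combo)
    for size in (1, 2)
    for combo in itertools.combinations(descriptor_sets, size)
]

for combo in combos:
    print(combo)
```

An overriding method could swap in a different descriptor_sets list and keep the same enumeration.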

static getDefaultPrep(add_imputer=None)

Return a dictionary with default preparation settings.

classmethod getPrepCombos()

Return a list of all possible preparation combinations as generated by getDataPrepGrid as well as their names. The generated list can be used to parameterize tests with the given named combinations.

Returns:

list of lists, each holding one named preparation combination

Return type:

list

getSmallDF()

Get a small data frame for testing purposes.

Returns:

a pandas.DataFrame containing the dataset

Return type:

pd.DataFrame

getStorage(df, name, n_jobs=1, chunk_size=None)
id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()[source]

Create the test DataFrame.

classmethod setUpClass()

Hook method for setting up class fixture before running tests in the class.

setUpPaths()

Create the directories that are used for testing.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.
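A quick check of the default behaviour, using a throwaway test class whose names are illustrative.

```python
import unittest


class DescriptionExample(unittest.TestCase):
    def test_documented(self):
        """Check that the thing works.

        Further detail that is not part of the short description.
        """

    def test_undocumented(self):
        pass


documented = DescriptionExample("test_documented").shortDescription()
undocumented = DescriptionExample("test_undocumented").shortDescription()
print(documented)
print(undocumented)
```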

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Remove all files and directories that are used for testing.

classmethod tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

testDrugExPhyschem()[source]

Test the DrugExPhyschem descriptor calculator.

testFingerprintSet()[source]

Test the fingerprint set descriptor calculator.

testPredictorDescriptor()[source]

Test the PredictorDesc descriptor set.

testRDKitDescs()[source]

Test the RDKit descriptors calculator.

testRandomDescs()[source]

Test the random descriptors calculator.

testSmilesDesc()[source]

Test the SMILES descriptors calculator.

testTanimotoDistances()[source]

Test the Tanimoto distances descriptor calculator, which calculates the Tanimoto distances between the molecules in a list of SMILES.
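For reference, the Tanimoto distance between two fingerprints is one minus the ratio of shared on-bits to total on-bits. The sketch below computes it in plain Python on sets of bit indices. It does not use RDKit or the descriptor calculator under test, and the fingerprints are made-up examples.

```python
def tanimoto_distance(bits_a, bits_b):
    """Return 1 - |A & B| / |A | B| for two sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 0.0
    return 1.0 - len(bits_a & bits_b) / len(union)


def distance_matrix(fingerprints):
    """Pairwise Tanimoto distances as a list of rows."""
    return [
        [tanimoto_distance(a, b) for b in fingerprints]
        for a in fingerprints
    ]


# Made-up fingerprints given as sets of on-bit indices.
fingerprints = [{1, 2, 3}, {2, 3, 4}, {7}]
matrix = distance_matrix(fingerprints)
for row in matrix:
    print(row)
```

The resulting matrix is symmetric with zeros on the diagonal, which is a convenient property to assert in a test of any distance-based descriptor.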

Module contents