drugex.utils package

Submodules

drugex.utils.download module

Utility functions to download files.

drugex.utils.download.check_sha256sum(filename, sha256)

Check that the SHA256 checksum of a file matches the one given.

Parameters:
  • filename (str) – Path of the file to check

  • sha256 (str) – Expected SHA256 checksum to compare against that of the file

Returns:

True if the file’s SHA256 checksum and the one provided match

drugex.utils.download.download_file(url, out_path, extract_out_path=None, byte_size=None, sha256sum=None, progress=True, callback=None) → None

Download a file and extract its content if it is a ZIP or TAR.GZ file.

Parameters:
  • url (str) – URL of the file to be downloaded.

  • out_path (str) – Path the file should be written to

  • extract_out_path (str) – Path to extract the content of the ZIP/TAR.GZ file into

  • byte_size (int) – Size in bytes of the file to be downloaded; ignored if None

  • sha256sum (str) – Expected SHA256 checksum to compare against that of the downloaded file

  • progress (bool) – Whether to show download progress

  • callback (Callable[[], Any]) – callback function to be called after each chunk of the file is downloaded

drugex.utils.download.sha256sum(filename, blocksize=None)

Calculate the SHA256 checksum of a file.

Parameters:
  • filename – path of the file

  • blocksize – size of iterative chunks for calculation (default: 64 KiB)

Returns:

SHA256 checksum in hexadecimal
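
The chunked checksum calculation and the comparison described above can be sketched with the standard library alone. This is a minimal illustration, not the package's own implementation: the names sha256_of_file and matches_sha256 are hypothetical, and the case-insensitive comparison is an assumption.

```python
import hashlib


def sha256_of_file(filename, blocksize=None):
    """Return the hexadecimal SHA256 digest of a file, read in chunks."""
    if blocksize is None:
        blocksize = 65536  # 64 KiB, the default stated in the documentation
    digest = hashlib.sha256()
    with open(filename, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: handle.read(blocksize), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(filename, expected):
    """Return True if the file's digest equals the expected hex string.

    Assumption: surrounding whitespace and letter case are ignored.
    """
    return sha256_of_file(filename) == expected.strip().lower()
```

Reading in fixed-size blocks keeps memory use constant regardless of file size, which matters for large model or data archives.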

drugex.utils.fingerprints module

drugex.utils.fingerprints.get_fingerprint(mol: Mol, fp_type: str)

drugex.utils.gcmol module

gcmol

Created by: Martin Sicho On: 10.06.22, 16:50

drugex.utils.gcmol.canonicalize(smiles: str, include_stereocenters=True) → str | None

Canonicalize a SMILES string with RDKit. The algorithm is detailed at https://pubs.acs.org/doi/full/10.1021/acs.jcim.5b00543

Parameters:
  • smiles (str) – SMILES string to canonicalize

  • include_stereocenters – whether to keep the stereochemical information in the canonical SMILES string

Returns:

Canonicalized SMILES string, None if the molecule is invalid.

drugex.utils.gcmol.canonicalize_list(smiles_list: Iterable[str], include_stereocenters=True) → List[str]

Canonicalize a list of SMILES strings. Filters out repetitions and removes corrupted molecules.

Parameters:
  • smiles_list (Iterable[str]) – molecules as SMILES strings

  • include_stereocenters – whether to keep the stereochemical information in the canonical SMILES strings

Returns:

The canonicalized and filtered input SMILES strings.

drugex.utils.gcmol.remove_duplicates(list_with_duplicates)

Removes the duplicates and keeps the ordering of the original list. For duplicates, the first occurrence is kept and the later occurrences are ignored.

Parameters:

list_with_duplicates – list that possibly contains duplicates

Returns:

A list with no duplicates.
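
The order-preserving behaviour described above (first occurrence kept, later ones dropped) can be sketched as follows. The function name remove_duplicates_sketch is hypothetical, and the sketch assumes the items are hashable, as SMILES strings are; it is not necessarily how the package implements it.

```python
def remove_duplicates_sketch(items):
    """Return the items in their original order, without repeats."""
    seen = set()  # assumption: items are hashable (e.g. SMILES strings)
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


print(remove_duplicates_sketch(["CCO", "c1ccccc1", "CCO", "CC"]))
# prints ['CCO', 'c1ccccc1', 'CC']
```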

drugex.utils.optim module

class drugex.utils.optim.ScheduledOptim(optimizer, init_lr, d_model, n_warmup_steps=4000)

Bases: object

A simple wrapper class for learning rate scheduling

step()

Step with the inner optimizer

zero_grad()

Zero out the gradients with the inner optimizer
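
The documentation does not state which schedule the wrapper applies. The sketch below shows one plausible reading, assuming the inverse square root warmup schedule popularised by the Transformer architecture, which the d_model and n_warmup_steps parameters suggest. The class names WarmupScheduleSketch and DummyOptimizer and the formula itself are assumptions, not the package's confirmed behaviour.

```python
class WarmupScheduleSketch:
    """Wrap an optimizer and set its learning rate before every step."""

    def __init__(self, optimizer, init_lr, d_model, n_warmup_steps=4000):
        self.optimizer = optimizer
        self.init_lr = init_lr
        self.d_model = d_model
        self.n_warmup_steps = n_warmup_steps
        self.n_steps = 0

    def _current_lr(self):
        # Assumed schedule: linear warmup, then decay with 1/sqrt(step)
        scale = self.d_model ** -0.5
        warmup = min(self.n_steps ** -0.5,
                     self.n_steps * self.n_warmup_steps ** -1.5)
        return self.init_lr * scale * warmup

    def step(self):
        """Update the learning rate, then step the inner optimizer."""
        self.n_steps += 1
        lr = self._current_lr()
        for group in self.optimizer.param_groups:
            group["lr"] = lr
        self.optimizer.step()

    def zero_grad(self):
        """Zero out the gradients with the inner optimizer."""
        self.optimizer.zero_grad()


class DummyOptimizer:
    """Stand-in exposing only what the wrapper relies on."""

    def __init__(self):
        self.param_groups = [{"lr": 0.0}]
        self.steps = 0

    def step(self):
        self.steps += 1

    def zero_grad(self):
        pass
```

Under this assumed schedule the learning rate peaks at step n_warmup_steps and decreases afterwards. Any optimizer that exposes param_groups, step() and zero_grad(), as PyTorch optimizers do, could take the place of the dummy.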

drugex.utils.pareto module

drugex.utils.pareto.get_Pareto_fronts(scores)

Identify the Pareto fronts from a given set of scores.

Parameters:

scores (numpy.ndarray) – An (n_points, n_scores) array of scores.

Returns:

A list containing the indices of points belonging to each Pareto front.

Return type:

list of numpy.ndarray
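
To illustrate what grouping points into successive Pareto fronts means, here is a small pure-Python sketch of non-dominated sorting. It is not the package's implementation: the helper names are hypothetical, it works on plain sequences rather than numpy arrays, and it assumes that higher scores are better.

```python
def dominates(a, b):
    """True if a is at least as good as b on every score and better on one.

    Assumption: higher scores are better.
    """
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_fronts_sketch(scores):
    """Return lists of point indices, one list per successive front."""
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        # A point is on the current front if no remaining point dominates it
        front = [i for i in sorted(remaining)
                 if not any(dominates(scores[j], scores[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining -= set(front)
    return fronts


scores = [(1.0, 1.0), (0.9, 0.2), (0.2, 0.9), (0.1, 0.1)]
print(pareto_fronts_sketch(scores))
# prints [[0], [1, 2], [3]]
```

The first front holds the points that nothing dominates; removing them and repeating yields the later fronts. This naive version compares all pairs at every pass, so it is only suitable for small inputs.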

Module contents