qsprpred.extra.data.utils package
Subpackages
- qsprpred.extra.data.utils.testing package
- Submodules
- qsprpred.extra.data.utils.testing.path_mixins module
- DataSetsMixInExtras
- DataSetsMixInExtras.clearGenerated()
- DataSetsMixInExtras.createLargeMultitaskDataSet()
- DataSetsMixInExtras.createLargeTestDataSet()
- DataSetsMixInExtras.createPCMDataSet()
- DataSetsMixInExtras.createSmallTestDataSet()
- DataSetsMixInExtras.createTestDataSetFromFrame()
- DataSetsMixInExtras.getAllDescriptors()
- DataSetsMixInExtras.getAllProteinDescriptors()
- DataSetsMixInExtras.getBigDF()
- DataSetsMixInExtras.getDataPrepGrid()
- DataSetsMixInExtras.getDefaultCalculatorCombo()
- DataSetsMixInExtras.getDefaultPrep()
- DataSetsMixInExtras.getMSAProvider()
- DataSetsMixInExtras.getPCMDF()
- DataSetsMixInExtras.getPCMSeqProvider()
- DataSetsMixInExtras.getPCMTargetsDF()
- DataSetsMixInExtras.getPrepCombos()
- DataSetsMixInExtras.getSmallDF()
- DataSetsMixInExtras.setUpPaths()
- DataSetsMixInExtras.tearDown()
- DataSetsMixInExtras.validate_split()
- Module contents
Submodules
qsprpred.extra.data.utils.msa_calculator module
Various implementations of multiple sequence alignment (MSA).
The MSA providers are used to align sequences for protein descriptor calculation. This
is required for the calculation of descriptors that are based on sequence alignments,
such as ProDec.
- class qsprpred.extra.data.utils.msa_calculator.BioPythonMSA(out_dir: str = '.', fname: str = 'alignment.aln-fasta.fasta')
Bases: MSAProvider, JSONSerializable, ABC
Common functionality for MSA providers using BioPython command line wrappers.
- Variables:
outDir – directory to save the alignment to
fname – file name of the alignment file
cache – cache of alignments performed so far by the provider
Initializes the MSA provider.
- Parameters:
out_dir (str) – directory to save the alignment to
fname (str) – file name of the alignment file
- abstract property cmd: str
The command that runs the alignment algorithm.
- Returns:
the command to run the alignment algorithm
- Return type:
str
- property current
The current alignment.
Returns the current alignment as a dictionary where keys are sequence IDs as str and values are aligned sequences as str. The values are of the same length and contain gaps (“-”) where necessary. If the alignment is not yet calculated, None is returned.
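The shape of the mapping described above can be checked with a few lines of plain Python. This is only an illustrative sketch: the helper name is_consistent_alignment and the example sequences are hypothetical and not part of qsprpred.

```python
from typing import Optional


def is_consistent_alignment(alignment: Optional[dict[str, str]]) -> bool:
    """Check that an alignment mapping has the shape described above.

    Hypothetical helper for illustration only; not part of qsprpred.
    """
    if alignment is None:
        # No alignment has been calculated yet.
        return False
    # All aligned sequences must share one length (gap characters included).
    lengths = {len(sequence) for sequence in alignment.values()}
    return len(lengths) <= 1


# Made-up example: two sequences aligned to the same length with "-" gaps.
example = {
    "P1": "MKT-AYIAK",
    "P2": "MKTQAY--K",
}
print(is_consistent_alignment(example))
```

A mapping whose values differ in length, or a value of None, fails the check.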
- getFromCache(target_ids: list[str]) → dict[str, str] | None
Gets the alignment from the cache if it exists for a list of sequence IDs.
- Parameters:
target_ids (list[str]) – list of sequence IDs to get the alignment for
- parseAlignment(sequences: dict[str, str]) → dict[str, str]
Parse the alignment from the output file of the alignment algorithm.
- Parameters:
sequences – the original dictionary of sequences that were aligned
- Returns:
the aligned sequences mapped to their IDs
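As a rough illustration of this parsing step, the sketch below reads a FASTA-formatted alignment file into a dictionary using only the standard library. It assumes the alignment is written in FASTA format (as the default file name suggests) and is not the library's implementation; read_fasta_alignment is a hypothetical name and the file contents are made up.

```python
import os
import tempfile


def read_fasta_alignment(path: str) -> dict[str, str]:
    """Read a FASTA alignment file into an {id: aligned_sequence} mapping.

    Hypothetical helper for illustration only; not part of qsprpred.
    """
    alignment: dict[str, str] = {}
    current_id = None
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Header line: the ID is the first token after ">".
                current_id = line[1:].split()[0]
                alignment[current_id] = ""
            elif current_id is not None:
                # Sequence lines may be wrapped, so concatenate them.
                alignment[current_id] += line
    return alignment


# Usage with a small hand-written alignment file.
with tempfile.TemporaryDirectory() as tmp_dir:
    fasta_path = os.path.join(tmp_dir, "alignment.aln-fasta.fasta")
    with open(fasta_path, "w", encoding="utf-8") as handle:
        handle.write(">P1 first\nMKT-AY\nIAK\n>P2\nMKTQAY\n--K\n")
    parsed = read_fasta_alignment(fasta_path)
```

Wrapped sequence lines are joined, so each ID maps to one gap-padded string.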
- parseSequences(sequences: dict[str, str], **kwargs) → tuple[str, int]
Create object with sequences and the passed metadata.
Saves the sequences to a file that will serve as input to the command line tools.
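A minimal sketch of writing sequences to an input file for a command line aligner is shown below. It assumes FASTA as the input format and assumes that the returned tuple holds the file path and the number of sequences written; the documentation above only states the return type tuple[str, int], so both the meaning of the tuple and the name write_fasta are hypothetical.

```python
import os
import tempfile


def write_fasta(sequences: dict[str, str], path: str) -> tuple[str, int]:
    """Write sequences to a FASTA file and return (path, number of sequences).

    Hypothetical helper; the meaning of the returned tuple is an assumption.
    """
    with open(path, "w", encoding="utf-8") as handle:
        for seq_id, sequence in sequences.items():
            # One header line per sequence, followed by the sequence itself.
            handle.write(f">{seq_id}\n{sequence}\n")
    return path, len(sequences)


# Usage with two made-up sequences.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path, n_sequences = write_fasta(
        {"P1": "MKTAYIAK", "P2": "MKTQAYK"},
        os.path.join(tmp_dir, "sequences.fasta"),
    )
    with open(out_path, encoding="utf-8") as handle:
        written = handle.read()
```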
- saveToCache(target_ids: list[str], alignment: dict[str, str])
Saves the alignment to the cache for a list of sequence IDs.
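The pairing of getFromCache and saveToCache can be pictured as a small in-memory cache keyed by the set of sequence IDs. The sketch below is an assumption about how such a cache could work, not a description of the provider's internals: the class AlignmentCache is hypothetical, and the choice of a sorted tuple as key (making lookups independent of ID order) is made purely for illustration.

```python
from typing import Optional


class AlignmentCache:
    """Hypothetical in-memory cache of alignments keyed by sequence IDs."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, ...], dict[str, str]] = {}

    @staticmethod
    def _key(target_ids: list[str]) -> tuple[str, ...]:
        # Assumption: the order of the IDs should not matter for a lookup.
        return tuple(sorted(target_ids))

    def save(self, target_ids: list[str], alignment: dict[str, str]) -> None:
        # Store a copy so later changes by the caller do not alter the cache.
        self._store[self._key(target_ids)] = dict(alignment)

    def get(self, target_ids: list[str]) -> Optional[dict[str, str]]:
        # Returns None when no alignment was saved for these IDs.
        return self._store.get(self._key(target_ids))


cache = AlignmentCache()
cache.save(["P2", "P1"], {"P1": "MKT-AYIAK", "P2": "MKTQAY--K"})
hit = cache.get(["P1", "P2"])
miss = cache.get(["P1"])
```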
- class qsprpred.extra.data.utils.msa_calculator.ClustalMSA(out_dir: str = '.', fname: str = 'alignment.aln-fasta.fasta')
Bases: BioPythonMSA
Multiple sequence alignment provider using the Clustal Omega Linux program - http://www.clustal.org/omega/
Uses the BioPython wrapper for Clustal Omega - https://biopython.org/docs/1.76/api/Bio.Align.Applications.html#Bio.Align.Applications.ClustalOmegaCommandline
Initializes the MSA provider.
- Parameters:
out_dir (str) – directory to save the alignment to
fname (str) – file name of the alignment file
- property cmd: str
The command that runs the alignment algorithm.
- Returns:
the command to run the alignment algorithm
- Return type:
str
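To give an idea of what such a command might look like, the sketch below assembles a Clustal Omega invocation as a string using only the standard library. The flags shown (-i, -o, --outfmt, --force, --threads) are regular clustalo options, but which options this class actually passes is not stated here, so the selection is an assumption; clustalo_command and the input file name are hypothetical.

```python
import shlex


def clustalo_command(infile: str, outfile: str, threads: int = 1) -> str:
    """Build a Clustal Omega command line as a shell-safe string.

    Hypothetical helper; the chosen options are an assumption.
    """
    args = [
        "clustalo",
        "-i", infile,            # input sequences (FASTA)
        "-o", outfile,           # where to write the alignment
        "--outfmt=fasta",        # write the alignment in FASTA format
        "--force",               # overwrite an existing output file
        f"--threads={threads}",  # number of worker threads
    ]
    # shlex.join quotes arguments where needed.
    return shlex.join(args)


command = clustalo_command("sequences.fasta", "alignment.aln-fasta.fasta")
print(command)
```

Running the resulting command requires the clustalo executable to be installed and on the PATH.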
- property current
The current alignment.
Returns the current alignment as a dictionary where keys are sequence IDs as str and values are aligned sequences as str. The values are of the same length and contain gaps (“-”) where necessary. If the alignment is not yet calculated, None is returned.
- getFromCache(target_ids: list[str]) → dict[str, str] | None
Gets the alignment from the cache if it exists for a list of sequence IDs.
- Parameters:
target_ids (list[str]) – list of sequence IDs to get the alignment for
- parseAlignment(sequences: dict[str, str]) → dict[str, str]
Parse the alignment from the output file of the alignment algorithm.
- Parameters:
sequences – the original dictionary of sequences that were aligned
- Returns:
the aligned sequences mapped to their IDs
- parseSequences(sequences: dict[str, str], **kwargs) → tuple[str, int]
Create object with sequences and the passed metadata.
Saves the sequences to a file that will serve as input to the command line tools.
- saveToCache(target_ids: list[str], alignment: dict[str, str])
Saves the alignment to the cache for a list of sequence IDs.
- class qsprpred.extra.data.utils.msa_calculator.MAFFT(out_dir: str = '.', fname: str = 'alignment.aln-fasta.fasta')
Bases: BioPythonMSA
Multiple sequence alignment provider using the MAFFT cross-platform program - https://mafft.cbrc.jp/alignment/software/
Uses the BioPython wrapper for MAFFT - https://biopython.org/docs/1.76/api/Bio.Align.Applications.html#Bio.Align.Applications.MafftCommandline
Initializes the MSA provider.
- Parameters:
out_dir (str) – directory to save the alignment to
fname (str) – file name of the alignment file
- property cmd: str
The command that runs the alignment algorithm.
- Returns:
the command to run the alignment algorithm
- Return type:
str
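MAFFT writes the alignment to standard output, so a caller typically redirects that stream to a file. The sketch below shows one way to do this with the standard library; it is not how this class builds or runs its command. The options --auto and --thread are regular MAFFT options, while mafft_args, run_mafft and the file names are hypothetical.

```python
import subprocess


def mafft_args(infile: str, threads: int = 1) -> list[str]:
    """Build the argument list for a MAFFT run (hypothetical helper)."""
    # --auto lets MAFFT pick a strategy; --thread sets the thread count.
    return ["mafft", "--auto", "--thread", str(threads), infile]


def run_mafft(infile: str, outfile: str, threads: int = 1) -> None:
    """Run MAFFT and save its standard output as the alignment file.

    Requires the mafft executable on the PATH; not called in this sketch.
    """
    with open(outfile, "w", encoding="utf-8") as handle:
        subprocess.run(mafft_args(infile, threads), stdout=handle, check=True)


args = mafft_args("sequences.fasta", threads=2)
print(args)
```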
- property current
The current alignment.
Returns the current alignment as a dictionary where keys are sequence IDs as str and values are aligned sequences as str. The values are of the same length and contain gaps (“-”) where necessary. If the alignment is not yet calculated, None is returned.
- getFromCache(target_ids: list[str]) → dict[str, str] | None
Gets the alignment from the cache if it exists for a list of sequence IDs.
- Parameters:
target_ids (list[str]) – list of sequence IDs to get the alignment for
- parseAlignment(sequences: dict[str, str]) → dict[str, str]
Parse the alignment from the output file of the alignment algorithm.
- Parameters:
sequences – the original dictionary of sequences that were aligned
- Returns:
the aligned sequences mapped to their IDs
- parseSequences(sequences: dict[str, str], **kwargs) → tuple[str, int]
Create object with sequences and the passed metadata.
Saves the sequences to a file that will serve as input to the command line tools.
- saveToCache(target_ids: list[str], alignment: dict[str, str])
Saves the alignment to the cache for a list of sequence IDs.
- class qsprpred.extra.data.utils.msa_calculator.MSAProvider
Bases: FileSerializable, ABC
Interface for multiple sequence alignment providers.
This interface defines how calculation and storage of multiple sequence alignments (MSAs) is handled.
- abstract property current: dict[str, str] | None
The current alignment.
Returns the current alignment as a dictionary where keys are sequence IDs as str and values are aligned sequences as str. The values are of the same length and contain gaps (“-”) where necessary. If the alignment is not yet calculated, None is returned.
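The abstract-property pattern used by this interface can be sketched in isolation as follows. The classes below are hypothetical stand-ins and do not inherit from the real MSAProvider. The toy subclass only pads sequences with gaps on the right, which is not a real alignment algorithm; it exists solely to show how current moves from None to a populated mapping.

```python
from abc import ABC, abstractmethod
from typing import Optional


class SketchMSAProvider(ABC):
    """Hypothetical stand-in for an MSA provider interface."""

    @property
    @abstractmethod
    def current(self) -> Optional[dict[str, str]]:
        """The current alignment, or None if none was calculated yet."""


class PaddingProvider(SketchMSAProvider):
    """Toy provider: right-pads sequences with gaps (not a real alignment)."""

    def __init__(self) -> None:
        self._current: Optional[dict[str, str]] = None

    @property
    def current(self) -> Optional[dict[str, str]]:
        return self._current

    def __call__(self, sequences: dict[str, str]) -> dict[str, str]:
        # Pad every sequence to the length of the longest one.
        width = max(len(sequence) for sequence in sequences.values())
        self._current = {
            seq_id: sequence.ljust(width, "-")
            for seq_id, sequence in sequences.items()
        }
        return self._current


provider = PaddingProvider()
before = provider.current
after = provider({"A": "MKT", "B": "MKTAY"})
```

Because current is abstract, instantiating SketchMSAProvider directly raises a TypeError; only subclasses that define the property can be created.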