qsprpred.extra.data.storage.protein.interfaces package
Submodules
qsprpred.extra.data.storage.protein.interfaces.protein_storage module
- class qsprpred.extra.data.storage.protein.interfaces.protein_storage.ProteinStorage
Bases:
PropertyStorage, ABC
Storage for proteins.
- Variables:
sequenceProp (str) – name of the property that contains all protein sequences
proteins (Iterable[StoredProtein]) – all proteins in the store
- abstract addEntries(ids: list[str], props: dict[str, list], raise_on_existing: bool = True)
Add entries to the storage.
- abstract addProperty(name: str, data: Sized, ids: list[str] | None = None)
Add a property to the dataset. The supplied data should be a sized list of values of the same length as the number of entries in the storage.
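The length requirement above can be illustrated with a minimal sketch. It does not use QSPRpred itself: `add_property`, the `table` dictionary, and the explicit `n_entries` argument are hypothetical stand-ins, and the optional `ids` argument of the real method is left out because its semantics are not described here.

```python
from collections.abc import Collection


def add_property(table: dict[str, list], n_entries: int, name: str, data: Collection) -> None:
    """Attach one value per entry under `name`, rejecting length mismatches."""
    if len(data) != n_entries:
        raise ValueError(
            f"Property {name!r} has {len(data)} values, expected {n_entries}."
        )
    table[name] = list(data)


# Hypothetical in-memory table with three entries.
table = {"ID": ["P1", "P2", "P3"]}
add_property(table, 3, "Sequence", ["MKT", "MKTAY", "MK"])

try:
    add_property(table, 3, "Length", [3, 5])  # one value short
except ValueError as err:
    mismatch_message = str(err)
```

A concrete storage may handle mismatches differently; raising early is only one reasonable choice.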
- abstract add_protein(protein: StoredProtein, raise_on_existing=True) → StoredProtein
Add a protein to the store.
- Parameters:
protein (StoredProtein) – the protein to add
raise_on_existing (bool) – raise an exception if the protein already exists in the store
- Returns:
instance of the added protein
- Return type:
StoredProtein
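As a rough illustration of the `raise_on_existing` flag, here is a toy in-memory store. `ToyProtein` and `ToyProteinStore` are hypothetical names, not QSPRpred classes. The sketch also assumes that the already stored instance is returned when the flag is `False`, and that the exception type is `ValueError`. The interface above fixes neither of these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyProtein:
    """Hypothetical stand-in for a stored protein."""
    id: str
    sequence: str


class ToyProteinStore:
    """Hypothetical dict-backed store keyed by protein ID."""

    def __init__(self):
        self._proteins = {}

    def add_protein(self, protein, raise_on_existing=True):
        existing = self._proteins.get(protein.id)
        if existing is not None:
            if raise_on_existing:
                raise ValueError(f"Protein {protein.id!r} already exists in the store.")
            # Assumption: hand back the stored instance instead of overwriting it.
            return existing
        self._proteins[protein.id] = protein
        return protein


store = ToyProteinStore()
first = store.add_protein(ToyProtein("P1", "MKTAYIAK"))
again = store.add_protein(ToyProtein("P1", "MKTAYIAK"), raise_on_existing=False)
```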
- abstract apply(func: callable, func_args: list | None = None, func_kwargs: dict | None = None, on_props: tuple[str, ...] | None = None, as_df: bool = False) → Generator[Iterable[Any], None, None]
Apply a function on all or selected properties of the chunks of data. The properties are supplied as the first positional argument to the function. The format of the properties is up to the downstream implementation, but it should always be a single object supplied as the first parameter.
- Parameters:
func (callable) – The function to apply.
func_args (list, optional) – The positional arguments of the function.
func_kwargs (dict, optional) – The keyword arguments of the function.
on_props (tuple[str, ...], optional) – The properties to apply the function on.
as_df (bool, optional) – Provide properties as a DataFrame to the function.
- Returns:
A generator that yields the results of the function applied to each chunk.
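The calling convention described above (selected properties passed as a single first argument, one result yielded per chunk) can be sketched without QSPRpred. Everything in this block is an assumption made for illustration: the `apply_in_chunks` helper, the dict-of-lists chunk format, and the `chunk_size` parameter are not part of the documented interface, and the `as_df` option is not modelled.

```python
from typing import Any, Callable, Generator, Optional


def apply_in_chunks(
    props: dict[str, list],
    func: Callable[..., Any],
    func_args: Optional[list] = None,
    func_kwargs: Optional[dict] = None,
    on_props: Optional[tuple[str, ...]] = None,
    chunk_size: int = 2,
) -> Generator[Any, None, None]:
    """Yield func(chunk, *func_args, **func_kwargs) for each chunk of the data."""
    func_args = func_args or []
    func_kwargs = func_kwargs or {}
    selected = on_props if on_props is not None else tuple(props)
    n_entries = len(next(iter(props.values()))) if props else 0
    for start in range(0, n_entries, chunk_size):
        # The chunk is one object (here a dict of lists) supplied as the
        # first positional argument of the function.
        chunk = {name: props[name][start:start + chunk_size] for name in selected}
        yield func(chunk, *func_args, **func_kwargs)


def sequence_lengths(chunk, offset=0):
    """Example function: length of every sequence in the chunk."""
    return [len(seq) + offset for seq in chunk["Sequence"]]


data = {
    "ID": ["P1", "P2", "P3"],
    "Sequence": ["MKT", "MKTAY", "MK"],
}
results = list(apply_in_chunks(data, sequence_lengths, on_props=("Sequence",)))
```

Because the result is a generator, nothing is computed until it is consumed, which is why the sketch wraps the call in `list(...)`.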
- abstract clear()
Delete entries in the persistent storage.
- abstract dropEntries(ids: Iterable[str])
Drop entries from the storage.
- Parameters:
ids (Iterable[str]) – The IDs of the entries to drop.
- abstract getDF() → DataFrame
Get the stored properties as a pandas DataFrame.
- Returns:
The data as a pandas DataFrame.
- Return type:
pd.DataFrame
- abstract getPCMInfo() → tuple[dict[str, str], dict]
Return a dictionary mapping protein IDs to their sequences and a dictionary with metadata for each protein. This is mainly for compatibility with QSPRpred’s PCM modelling API.
- Returns:
sequences (dict): Dictionary of protein sequences. metadata (dict): Dictionary of metadata for each protein.
- Return type:
tuple[dict[str, str], dict]
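One way to picture the returned pair is the sketch below. It is not the QSPRpred implementation: `PcmProtein`, its `metadata` attribute, and the `get_pcm_info` function are hypothetical, and keying the metadata dictionary by protein ID is an assumption about its layout.

```python
from dataclasses import dataclass, field


@dataclass
class PcmProtein:
    """Hypothetical protein record with free-form metadata."""
    id: str
    sequence: str
    metadata: dict = field(default_factory=dict)


def get_pcm_info(proteins):
    """Build (sequences, metadata), both keyed by protein ID."""
    sequences = {p.id: p.sequence for p in proteins}
    # Copy each metadata dict so callers cannot mutate the stored records.
    metadata = {p.id: dict(p.metadata) for p in proteins}
    return sequences, metadata


proteins = [
    PcmProtein("P1", "MKTAYIAK", {"organism": "human"}),
    PcmProtein("P2", "MSLLTEV"),
]
sequences, metadata = get_pcm_info(proteins)
```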
- abstract getProperty(name: str, ids: tuple[str] | None = None) → Iterable[Any]
Get values of a given property.
- abstract getProtein(protein_id: str) → StoredProtein
Get a protein from the store using its name.
- Parameters:
protein_id (str) – name of the protein to search for
- Returns:
instance of Protein
- Return type:
StoredProtein
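- abstract getSubset(subset: Iterable[str], ids: Iterable[str] | None = None) → PropertyStorage
Get a subset of the storage for the given properties.
- Parameters:
subset (Iterable[str]) – names of the properties to include in the subset
ids (Iterable[str], optional) – IDs of the entries to include in the subset
- Returns:
The subset of the storage.
- Return type:
PropertyStorage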
- abstract iterChunks(size: int | None = None, on_props: list | None = None) → Generator[list[Any], None, None]
Iterate over chunks of molecules across the store.
- Returns:
an iterable of lists of stored molecules
- abstract property metaFile: str
Get the absolute path to the metadata file that describes how the persisted data can be accessed. This can be used to load the object back from storage using the fromFile class method.
- Returns:
The absolute path to the metadata file.
- Return type:
str
- abstract property proteins: Iterable[StoredProtein]
Get all proteins in the store.
- Returns:
iterable of Protein instances
- Return type:
Iterable[StoredProtein]
- abstract reload()
Reset the current state by reloading from storage.
- abstract removeProperty(name: str)
Remove a property from the dataset.
- Parameters:
name (str) – The name of the property.
- abstract save() → str
Save current state to storage and return the path to the serialized file.
- Returns:
The path to the serialized file.
- Return type:
str
- abstract searchOnProperty(prop_name: str, values: list[float | int | str], exact=False) → PropSearchable
Search the molecules within this MoleculeDataSet on a property value.
- Parameters:
prop_name – Name of the column to search on.
values – Values to search for.
exact – Whether to search for exact matches or not.
- Returns:
Another instance that can be filtered further.
- Return type:
PropSearchable
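The difference between exact and non-exact search can be sketched on plain Python rows. This is only an illustration: `search_on_property` and the row dictionaries are hypothetical, and treating a non-exact search as a substring match on the string form of the value is an assumption that the interface above does not confirm.

```python
def search_on_property(rows, prop_name, values, exact=False):
    """Return the rows whose `prop_name` value matches any of `values`."""
    targets = [str(v) for v in values]

    def matches(cell):
        cell = str(cell)
        if exact:
            return cell in targets
        # Assumption: a non-exact search is a substring match.
        return any(target in cell for target in targets)

    return [row for row in rows if matches(row[prop_name])]


rows = [
    {"ID": "P1", "Family": "kinase"},
    {"ID": "P2", "Family": "tyrosine kinase"},
    {"ID": "P3", "Family": "GPCR"},
]
exact_hits = search_on_property(rows, "Family", ["kinase"], exact=True)
partial_hits = search_on_property(rows, "Family", ["kinase"])
# The result has the same shape as the input, so it can be searched again.
narrowed = search_on_property(partial_hits, "ID", ["P2"], exact=True)
```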
- abstract property sequenceProp: str
Get the name of the property that contains all protein sequences.
qsprpred.extra.data.storage.protein.interfaces.storedprotein module
- class qsprpred.extra.data.storage.protein.interfaces.storedprotein.StoredProtein
Bases:
ABC
A protein object.
- Variables:
id (str) – id of the protein
sequence (str) – sequence of the protein
representations (Iterable[StoredProtein]) – representations of the protein
- abstract property parent: StoredProtein
Get the parent protein.
- abstract property representations: Iterable[StoredProtein]
Get all representations of the protein.
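A concrete subclass has to provide both abstract properties. The sketch below uses a local stand-in ABC, `StoredProteinSketch`, so that it runs without QSPRpred; in real code the base class would be `StoredProtein`. Two assumptions are made that the documentation does not state: a protein without a parent returns `None`, and a representation registers itself with its parent on creation.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Optional


class StoredProteinSketch(ABC):
    """Stand-in mirroring the two documented abstract properties."""

    @property
    @abstractmethod
    def parent(self) -> Optional["StoredProteinSketch"]:
        ...

    @property
    @abstractmethod
    def representations(self) -> Iterable["StoredProteinSketch"]:
        ...


class SimpleProtein(StoredProteinSketch):
    """Hypothetical in-memory protein with an optional parent."""

    def __init__(self, id, sequence, parent=None):
        self.id = id
        self.sequence = sequence
        self._parent = parent
        self._representations = []
        if parent is not None:
            # Assumption: a representation registers itself with its parent.
            parent._representations.append(self)

    @property
    def parent(self):
        return self._parent

    @property
    def representations(self):
        # Return a tuple so callers cannot modify the internal list.
        return tuple(self._representations)


wild_type = SimpleProtein("P1", "MKTAYIAK")
variant = SimpleProtein("P1_v1", "MKTAYIAR", parent=wild_type)
```

Instantiating `StoredProteinSketch` directly raises `TypeError`, which is the usual way an ABC enforces that both properties are implemented.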