Overview of available features
Data Sources
DataSource: Base class for data sources.
Data sources are used to load data from a source programmatically.
Papyrus: Papyrus (See data collection with Papyrus tutorial.)
Storage
ChemStore: Interface for storing and managing chemical data.
Data storage classes are used to store and manage chemical data.
More information can be found in the basic data representation tutorial
and the advanced data representation tutorial.
PandasChemStore: PandasChemStore
SMILES Standardizers
ChemStandardizer: Base class for SMILES standardizers.
Standardizers to convert SMILES to a standardized form.
ChemblStandardizer: ChemblStandardizer
PapyrusStandardizer: PapyrusStandardizer
NaiveStandardizer: NaiveStandardizer
Data Filters
DataFilter: Base class for data filters.
Data filters are used to filter data based on some criteria. Examples can be found in the data preparation tutorial.
CategoryFilter: CategoryFilter
RepeatsFilter: RepeatsFilter
Descriptor Sets
DescriptorSet: Base class for descriptor sets.
Descriptor sets are used to calculate molecular descriptors for a set of molecules. Examples can be found in the descriptor calculation tutorial.
DrugExPhyschem: DrugExPhyschem
PredictorDesc: PredictorDesc
RDKitDescs: RDKitDescs
SmilesDescs: SmilesDescs
TanimotoDistances: TanimotoDistances
DataFrameDescriptorSet: DataFrameDescriptorSet
RandomDescs: RandomDescs
Fingerprint: Fingerprint
AtomPairFP: AtomPairFP
AvalonFP: AvalonFP
LayeredFP: LayeredFP
MaccsFP: MaccsFP
MorganFP: MorganFP
PatternFP: PatternFP
RDKitFP: RDKitFP
RDKitMACCSFP: RDKitMACCSFP
TopologicalFP: TopologicalFP
ExtendedValenceSignature: ExtendedValenceSignature
Mold2: Mold2
Mordred: Mordred
PaDEL: PaDEL
ProteinDescriptorSet: ProteinDescriptorSet
ProDec: ProDec
Fingerprint: Fingerprint
CDKAtomPairs2DFP: CDKAtomPairs2DFP
CDKEStateFP: CDKEStateFP
CDKExtendedFP: CDKExtendedFP
CDKFP: CDKFP
CDKGraphOnlyFP: CDKGraphOnlyFP
CDKKlekotaRothFP: CDKKlekotaRothFP
CDKMACCSFP: CDKMACCSFP
CDKPubchemFP: CDKPubchemFP
CDKSubstructureFP: CDKSubstructureFP
Data Splitters
DataSplit: Base class for data splitters.
Data splitters are used to split data into training and test sets. Examples can be found in the data splitting tutorial.
RandomSplit: RandomSplit
ScaffoldSplit: ScaffoldSplit
TemporalSplit: TemporalSplit
ManualSplit: ManualSplit
BootstrapSplit: BootstrapSplit
GBMTDataSplit: GBMTDataSplit
GBMTRandomSplit: GBMTRandomSplit
ClusterSplit: ClusterSplit
LeaveTargetsOut: LeaveTargetsOut
PCMSplit: PCMSplit
TemporalPerTarget: TemporalPerTarget
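To make the idea of a data split concrete, here is a minimal sketch in plain Python. It does not use the package's splitter classes; the function names `random_split` and `temporal_split`, the record layout, and the `year` field are all assumptions made for illustration.

```python
import random

def random_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of the items and cut off a test fraction (illustrative helper)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # Everything after the first n_test items is training data.
    return shuffled[n_test:], shuffled[:n_test]

def temporal_split(records, cutoff_year):
    """Records dated after the cutoff go to the test set (illustrative helper)."""
    train = [r for r in records if r["year"] <= cutoff_year]
    test = [r for r in records if r["year"] > cutoff_year]
    return train, test

# Hypothetical records; the "year" field is an assumption of this sketch.
records = [
    {"smiles": "CCO", "year": 2010},
    {"smiles": "c1ccccc1", "year": 2015},
    {"smiles": "CC(=O)O", "year": 2018},
    {"smiles": "CCN", "year": 2021},
    {"smiles": "CCC", "year": 2012},
]

train, test = random_split(records, test_fraction=0.2)
older, newer = temporal_split(records, cutoff_year=2015)
```

A random split estimates performance on data similar to the training set, while a temporal split mimics predicting on compounds measured later than anything the model has seen.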
Data Filters
DataFilter: Base class for data filters.
Data filters are used to filter rows from a dataframe. Examples can be found in the data preparation tutorial.
CategoryFilter: CategoryFilter
RepeatsFilter: RepeatsFilter
NaNFilter: NaNFilter
OutlierFilter: OutlierFilter
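Row filtering can be sketched without any library, as below. This is not the package's implementation: the helper names `category_filter` and `nan_filter`, their parameters, and the row layout are assumptions chosen only to show what "filtering rows" means.

```python
import math

def category_filter(rows, column, values, keep=False):
    """Drop (or, with keep=True, retain) rows whose column value is in `values`."""
    if keep:
        return [r for r in rows if r[column] in values]
    return [r for r in rows if r[column] not in values]

def nan_filter(rows, columns):
    """Drop rows that have a missing value (None or NaN) in any of the given columns."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))
    return [r for r in rows if not any(is_missing(r.get(c)) for c in columns)]

# Hypothetical rows; column names are assumptions of this sketch.
rows = [
    {"smiles": "CCO", "source": "assay_A", "pchembl": 6.2},
    {"smiles": "CCN", "source": "assay_B", "pchembl": float("nan")},
    {"smiles": "CCC", "source": "assay_B", "pchembl": 7.1},
    {"smiles": "CCCl", "source": "assay_C", "pchembl": None},
]

without_c = category_filter(rows, "source", {"assay_C"})
only_b = category_filter(rows, "source", {"assay_B"}, keep=True)
complete = nan_filter(rows, ["pchembl"])
```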
Feature Filters
FeatureFilter: Base class for feature filters.
Feature filters are used to filter features based on some criteria. Examples can be found in the data preparation tutorial.
HighCorrelationFilter: HighCorrelationFilter
LowVarianceFilter: LowVarianceFilter
BorutaFilter: BorutaFilter
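The two simplest criteria named above, low variance and high correlation, can be illustrated with a short standard-library sketch. The functions `low_variance_filter` and `high_correlation_filter`, the thresholds, and the greedy keep-the-first strategy are assumptions of this example, not a description of how the package's filters work.

```python
import math
import statistics

def low_variance_filter(features, threshold=0.0):
    """Keep only features whose population variance exceeds the threshold."""
    return {name: col for name, col in features.items()
            if statistics.pvariance(col) > threshold}

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either column is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / denom if denom else 0.0

def high_correlation_filter(features, threshold=0.95):
    """Greedily keep a feature unless it correlates strongly with one already kept."""
    kept = {}
    for name, col in features.items():
        if all(abs(pearson(col, other)) < threshold for other in kept.values()):
            kept[name] = col
    return kept

# Toy feature matrix stored column-wise; names are hypothetical.
features = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with "a"
    "c": [1.0, 1.0, 1.0, 1.0],   # constant, zero variance
    "d": [4.0, 1.0, 3.0, 2.0],
}

filtered = high_correlation_filter(low_variance_filter(features))
```

Applying the variance filter first removes constant columns, for which a correlation coefficient is undefined.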
Feature Transformers
FeatureTransformer: Base class for feature transformers.
Feature transformers are used for feature standardization and transformation. Examples can be found in the data preparation tutorial.
SklearnStep: SklearnStep
Target Transformers
TargetTransformer: Base class for target transformers.
Target transformers are used for target discretization and transformation.
Discretizer: Discretizer
SimpleTargetTransformer: SimpleTargetTransformer
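Target discretization turns a continuous property into class labels by comparing it against one or more thresholds. The sketch below shows the idea in plain Python; the function name `discretize`, the example thresholds, and the convention that a value equal to a threshold falls into the higher class are assumptions of this example and may differ from the package's behaviour.

```python
from bisect import bisect_right

def discretize(values, thresholds):
    """Map each value to the index of the interval it falls into.

    `thresholds` must be sorted in ascending order. A value equal to a
    threshold is assigned to the higher class (an assumption of this sketch).
    """
    return [bisect_right(thresholds, v) for v in values]

# Binary case: a single hypothetical activity threshold.
binary = discretize([5.0, 6.4, 6.5, 8.2], thresholds=[6.5])

# Multi-class case: two thresholds give three classes.
multi = discretize([4.0, 5.0, 6.9, 7.0], thresholds=[5.0, 7.0])
```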
Imputers
Imputer: Base class for imputers.
Imputers are used for filling missing values in the descriptor or target values. Examples can be found in the data preparation tutorial and the multi task modelling tutorial.
TargetImputer: TargetImputer
FeatureImputer: FeatureImputer
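As a minimal illustration of imputation, the following sketch replaces missing entries in a single column with the median of the observed values. The helper name `impute_median` and the choice of the median as the fill strategy are assumptions of this example; the package's imputers may use other strategies.

```python
import math
import statistics

def impute_median(values):
    """Replace None/NaN entries with the median of the observed values."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    observed = [v for v in values if not is_missing(v)]
    fill = statistics.median(observed)
    return [fill if is_missing(v) else v for v in values]

# Hypothetical target column with missing measurements.
column = [1.0, None, 3.0, 8.0, float("nan")]
imputed = impute_median(column)
```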
Models
QSPRModel: Base class for models.
Models are used to predict properties of molecules. A general example can be found in the quick start tutorial. More detailed information can be found throughout the basic and advanced modelling tutorials.
SklearnModel: SklearnModel
PCMModel: PCMModel (See PCM tutorial.)
More information can be found in the deep learning tutorial.
DNNModel: DNNModel
ChempropModel: ChempropModel (See Chemprop tutorial.)
PyBoostModel: PyBoostModel
Metrics
Metric: Base class for metrics.
Metrics are used to evaluate the performance of models. More information can be found in the model assessment tutorial.
SklearnMetrics: SklearnMetrics
MaskedMetric: MaskedMetric
CalibrationError: CalibrationError
BEDROC: BEDROC
EnrichmentFactor: EnrichmentFactor
RobustInitialEnhancement: RobustInitialEnhancement
Prevalence: Prevalence
Sensitivity: Sensitivity
Specificity: Specificity
PositivePredictivity: PositivePredictivity
NegativePredictivity: NegativePredictivity
CohenKappa: CohenKappa
BalancedPositivePredictivity: BalancedPositivePredictivity
BalancedNegativePredictivity: BalancedNegativePredictivity
BalancedMatthewsCorrcoeff: BalancedMatthewsCorrcoeff
BalancedCohenKappa: BalancedCohenKappa
KSlope: KSlope
R20: R20
KPrimeSlope: KPrimeSlope
RPrime20: RPrime20
Pearson: Pearson
Spearman: Spearman
Kendall: Kendall
AverageFoldError: AverageFoldError
AbsoluteAverageFoldError: AbsoluteAverageFoldError
PercentageWithinFoldError: PercentageWithinFoldError
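Several of the classification metrics listed above derive from the binary confusion matrix. The sketch below computes a few of them using their textbook definitions in plain Python; the function names are hypothetical, the code assumes both classes occur in the labels and predictions, and it is not the package's implementation.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def binary_metrics(y_true, y_pred):
    """Textbook definitions; assumes both classes are present in the data."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    n = tp + fp + tn + fn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal class frequencies.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return {
        "prevalence": (tp + fn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictivity": tp / (tp + fp),
        "negative_predictivity": tn / (tn + fn),
        "cohen_kappa": (observed - expected) / (1 - expected),
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```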
Model Assessors
ModelAssessor: Base class for model assessors.
Model assessors are used to assess the performance of models. More information can be found in the model assessment tutorial.
Assessor: Assessor
Hyperparameter Optimizers
HyperparameterOptimization: Base class for hyperparameter optimizers.
Hyperparameter optimizers are used to optimize the hyperparameters of models. More information can be found in the hyperparameter optimization tutorial.
GridSearchOptimization: GridSearchOptimization
OptunaOptimization: OptunaOptimization
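The principle behind a grid search is to evaluate every combination of candidate values and keep the best one. The sketch below shows this with the standard library only; `grid_search`, `toy_score`, and the parameter names are hypothetical, and in practice the score would come from a model assessment rather than a closed-form function.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Exhaustively evaluate all parameter combinations; higher score is better."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    """Stand-in for a cross-validated score; peaks at n_estimators=50, max_depth=4."""
    return -abs(params["n_estimators"] - 50) - abs(params["max_depth"] - 4)

param_grid = {"n_estimators": [10, 50, 100], "max_depth": [2, 4]}
best_params, best_score = grid_search(param_grid, toy_score)
```

The cost grows with the product of the grid sizes (here 3 × 2 = 6 evaluations), which is why sampling-based optimizers are often preferred for larger search spaces.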
Model Plots
ModelPlot: Base class for model plots.
Model plots are used to visualize the performance of models. Examples can be found throughout the basic and advanced modelling tutorials.
RegressionPlot: RegressionPlot
CorrelationPlot: CorrelationPlot
WilliamsPlot: WilliamsPlot
ClassifierPlot: ClassifierPlot
ROCPlot: ROCPlot
PRCPlot: PRCPlot
CalibrationPlot: CalibrationPlot
MetricsPlot: MetricsPlot
ConfusionMatrixPlot: ConfusionMatrixPlot
Monitors
FitMonitor: Base class for monitoring model fitting.
AssessorMonitor: Base class for monitoring model assessment (subclass of FitMonitor).
HyperparameterOptimizationMonitor: Base class for monitoring hyperparameter optimization (subclass of AssessorMonitor).
Monitors are used to monitor the training of models. More information can be found in the model monitoring tutorial.
NullMonitor: NullMonitor
ListMonitor: ListMonitor
BaseMonitor: BaseMonitor
FileMonitor: FileMonitor
WandBMonitor: WandBMonitor
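The layered monitor hierarchy described above follows a common callback pattern: each level adds hooks for the process it observes. The sketch below illustrates that pattern only. The `Toy*` class names, all method names, and the `toy_cross_validate` driver are hypothetical and do not reflect the package's actual interfaces.

```python
class ToyFitMonitor:
    """Hypothetical base: receives events while a single model is fitted."""
    def on_fit_start(self, run_name):
        pass

    def on_epoch_end(self, epoch, loss):
        pass

class ToyAssessorMonitor(ToyFitMonitor):
    """Hypothetical extension: additionally receives per-fold assessment events."""
    def on_fold_end(self, fold, score):
        pass

class ToyListMonitor(ToyAssessorMonitor):
    """Records every event in a list so it can be inspected afterwards."""
    def __init__(self):
        self.events = []

    def on_fit_start(self, run_name):
        self.events.append(("fit_start", run_name))

    def on_epoch_end(self, epoch, loss):
        self.events.append(("epoch_end", epoch, loss))

    def on_fold_end(self, fold, score):
        self.events.append(("fold_end", fold, score))

def toy_cross_validate(monitor, fold_losses):
    """Stand-in for an assessment loop that reports progress to a monitor."""
    for fold, losses in enumerate(fold_losses):
        monitor.on_fit_start(f"fold-{fold}")
        for epoch, loss in enumerate(losses):
            monitor.on_epoch_end(epoch, loss)
        monitor.on_fold_end(fold, min(losses))

monitor = ToyListMonitor()
toy_cross_validate(monitor, [[0.9, 0.5], [0.8, 0.6, 0.4]])
```

Because the driver only calls hook methods, swapping in a monitor that does nothing, writes to a file, or logs to an external service requires no change to the training code.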
Scaffolds
Scaffold: Base class for scaffolds.
Classes for calculating molecular scaffolds of different kinds.
Murcko: Murcko
BemisMurcko: BemisMurcko
Clustering
MoleculeClusters: Base class for clustering molecules.
Classes for clustering molecules.
RandomClusters: RandomClusters
ScaffoldClusters: ScaffoldClusters
FPSimilarityClusters: FPSimilarityClusters
FPSimilarityMaxMinClusters: FPSimilarityMaxMinClusters
FPSimilarityLeaderPickerClusters: FPSimilarityLeaderPickerClusters
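Fingerprint-similarity clustering can be sketched as follows: pick a diverse set of centroids with a MaxMin strategy, then assign every molecule to its nearest centroid by Tanimoto distance. This is a simplified illustration in plain Python. Fingerprints are represented as sets of on-bit indices, all function names are hypothetical, and the package's clustering classes may work differently.

```python
def tanimoto_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for fingerprints given as sets of on-bits."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)

def maxmin_pick(fps, n_picks, first=0):
    """Iteratively pick the item whose minimum distance to the picked set is largest."""
    picked = [first]
    while len(picked) < n_picks:
        candidates = [i for i in range(len(fps)) if i not in picked]
        best = max(
            candidates,
            key=lambda i: min(tanimoto_distance(fps[i], fps[j]) for j in picked),
        )
        picked.append(best)
    return picked

def assign_to_centroids(fps, centroids):
    """Group item indices under the index of their nearest centroid."""
    clusters = {c: [] for c in centroids}
    for i, fp in enumerate(fps):
        nearest = min(centroids, key=lambda c: tanimoto_distance(fp, fps[c]))
        clusters[nearest].append(i)
    return clusters

# Toy fingerprints: two pairs of similar molecules and one outlier.
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}, {20, 21}]

centroids = maxmin_pick(fps, n_picks=3)
clusters = assign_to_centroids(fps, centroids)
```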