Steps in QSAR Model Generation:
- Preparation of input data (structures, known biological activities)
- 3D Geometry optimization (conformation generation, alignment)
- Calculation of descriptors
- Statistical Analysis (Feature selection, regression)
- QSAR model building
- Interpretation, validation and prediction
Structure optimization
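- Up to ~100,000 atoms: Molecular mechanics - uses empirically derived potential functions (force fields)
- Up to ~1,000 atoms: Semi-empirical quantum mechanics - solves an approximate, empirically parameterized form of the Schrödinger equation
- Up to ~100 atoms: Ab initio quantum mechanics - solves the Schrödinger equation from first principles, without empirical parameters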
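To make the idea of an empirically derived potential function concrete, here is a minimal sketch of a single molecular-mechanics term: a harmonic bond stretch minimized by steepest descent. The force constant `k`, the reference length `r0`, the step size, and all function names are illustrative assumptions, not values from any particular force field.

```python
def bond_energy(r, k=300.0, r0=1.53):
    """Harmonic bond-stretch energy E = 0.5 * k * (r - r0)**2 (toy parameters)."""
    return 0.5 * k * (r - r0) ** 2

def bond_gradient(r, k=300.0, r0=1.53):
    """Derivative dE/dr of the harmonic term."""
    return k * (r - r0)

def minimize_bond(r_start, step=0.001, tol=1e-8, max_iter=10000):
    """Steepest descent on the bond length until the gradient is below tol."""
    r = r_start
    for _ in range(max_iter):
        g = bond_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g  # move downhill along the energy gradient
    return r

# Start from a stretched bond and relax it toward the reference length.
r_opt = minimize_bond(1.80)
print(round(r_opt, 4), bond_energy(r_opt))
```

A real force field sums many such terms (angles, torsions, non-bonded interactions) over all atoms; the optimization principle is the same.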
Descriptor Calculation and Statistical Software for QSAR:
- ADAPT
- TSAR
- SciQSAR
- Cerius2
Statistical Packages
- SAS
- SPSS
- Minitab
- STATISTICA
- SYSTAT
- StatView
- WinNN (Neural Networks)
Conformational Search:
- Grid Search
- Random Search
- Boltzmann Search
- Systematic Search
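The difference between a grid (systematic) search and a random search can be sketched on a single rotatable bond. The torsional potential below is a toy function with assumed barrier heights `v1` and `v3`, chosen only so that the global minimum lies at 180 degrees; a real search would evaluate a force-field or quantum-mechanical energy over several torsions.

```python
import math
import random

def torsion_energy(phi_deg, v1=1.0, v3=3.0):
    """Toy torsional potential (arbitrary units); both terms vanish at 180 degrees."""
    phi = math.radians(phi_deg)
    return 0.5 * v1 * (1.0 + math.cos(phi)) + 0.5 * v3 * (1.0 + math.cos(3.0 * phi))

def grid_search(step_deg=10):
    """Evaluate the torsion at fixed increments and return the lowest (angle, energy)."""
    angles = range(0, 360, step_deg)
    best = min(angles, key=torsion_energy)
    return best, torsion_energy(best)

def random_search(n_samples=500, seed=0):
    """Evaluate randomly drawn torsion angles and return the lowest (angle, energy)."""
    rng = random.Random(seed)
    angles = [rng.uniform(0.0, 360.0) for _ in range(n_samples)]
    best = min(angles, key=torsion_energy)
    return best, torsion_energy(best)

grid_angle, grid_energy = grid_search()
rand_angle, rand_energy = random_search()
print("grid:  ", grid_angle, grid_energy)
print("random:", rand_angle, rand_energy)
```

The grid search is exhaustive at the chosen resolution, but its cost grows exponentially with the number of rotatable bonds; random sampling trades that guarantee for a fixed evaluation budget.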
Alignment of Molecules
- RMS atom alignment - pairwise alignment of models by superimposing matched atoms
- Moments alignment - using electrostatic moments or principal moments of inertia
- Field alignment - maximizing the overlap between the steric and electrostatic fields calculated with a probe potential
Selection of Descriptors
- A QSAR model should be reduced to a set of descriptors that is as information-rich but as small as possible
- Rule of thumb: at least 5-6 compounds (data points) per descriptor
- Objective selection
  - Correlations
  - Pairwise selections
  - Identical tests
  - Vector space descriptor analysis
- Subjective selection
  - Descriptor selection considering biological activity
  - Genetic algorithm based feature selection
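The objective selection steps above (identical tests and pairwise correlations) and the compounds-per-descriptor rule of thumb can be sketched as follows. The thresholds (`identical_fraction=0.9`, `r_max=0.9`), the descriptor names, and the values are all made up for illustration; they are assumptions, not recommended settings.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def objective_filter(descriptors, identical_fraction=0.9, r_max=0.9):
    """Drop near-constant descriptors, then descriptors highly correlated with a kept one."""
    kept = {}
    for name, values in descriptors.items():
        # Identical test: most compounds share the same value, so little information.
        most_common = max(values.count(v) for v in set(values))
        if most_common / len(values) >= identical_fraction:
            continue
        # Pairwise correlation test against descriptors already retained.
        if any(abs(pearson(values, other)) > r_max for other in kept.values()):
            continue
        kept[name] = values
    return kept

def compounds_per_descriptor(n_compounds, n_descriptors):
    """Ratio used by the rule of thumb (aim for roughly 5-6 or more)."""
    return n_compounds / n_descriptors

# Hypothetical descriptor table for six compounds.
descriptors = {
    "logP": [1.2, 2.5, 3.1, 0.8, 2.0, 3.6],
    "MW": [150, 210, 260, 130, 190, 300],  # strongly correlated with logP here
    "n_rings": [1, 1, 1, 1, 1, 1],         # constant, fails the identical test
    "HBD": [2, 1, 0, 1, 3, 1],
}
kept = objective_filter(descriptors)
ratio = compounds_per_descriptor(6, len(kept))
print(list(kept), ratio)
```

In this toy table only two descriptors survive, giving three compounds per descriptor, which is still below the rule of thumb; either more compounds or fewer descriptors would be needed.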
Statistics in QSAR
- Multiple linear regression:
  - Least-squares error minimization
  - Reported statistics: N, ANOVA, R, F test, p value
  - Examine multicollinearity: tolerance = 1 - R², VIF = 1/(1 - R²), where R² comes from regressing one descriptor on the remaining descriptors
  - Yields linear models whose coefficients can be interpreted for relative importance
- Stepwise multiple linear regression
  - Forward, backward, stepwise
- Principal components regression analysis
- Partial least squares analysis
- Artificial Neural Network method
- Genetic function approximation
- Principal component analysis
- Factor analysis
- Discriminant analysis
- Cluster analysis
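As an illustration of the multiple linear regression item above, the sketch below fits a least-squares model through the normal equations and computes tolerance and VIF for one descriptor. It uses only the standard library; the function names and the noise-free toy data are assumptions made for this example, and a statistics package would normally be used in practice.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.
    No handling of singular matrices (perfectly collinear descriptors)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_mlr(X, y):
    """Least-squares fit; returns (coefficients, R^2), coefficients[0] is the intercept."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    pred = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot

def tolerance_and_vif(X, j):
    """Regress descriptor j on the other descriptors; return (1 - R^2, 1 / (1 - R^2))."""
    target = [r[j] for r in X]
    others = [[v for k, v in enumerate(r) if k != j] for r in X]
    _, r2 = fit_mlr(others, target)
    tol = 1.0 - r2
    return tol, 1.0 / tol

# Toy data: activity = 1 + 2*x1 - 1*x2 exactly (no noise), six compounds.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [1 + 2 * x1 - x2 for x1, x2 in X]
coef, r2 = fit_mlr(X, y)
tol, vif = tolerance_and_vif(X, 0)
print(coef, r2, tol, vif)
```

The recovered coefficients can be read directly as the contribution of each descriptor, while the VIF indicates how much of a descriptor is already explained by the others.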
How should an outlier be defined in a QSAR study so that the results can be published?