Data mining algorithms : explained using R /
"This book narrows down the scope of data mining by adopting a heavily modeling-oriented perspective"--
Main author: | |
---|---|
Format: | eBook |
Language: | English |
Published: | Chichester, West Sussex ; Malden, MA : John Wiley & Sons Inc., 2015. |
Subjects: | |
Available online: | Full Text via HEAL-Link |
Table of contents:
- Machine generated contents note: pt. I Preliminaries
- 1. Tasks
- 1.1. Introduction
- 1.1.1. Knowledge
- 1.1.2. Inference
- 1.2. Inductive learning tasks
- 1.2.1. Domain
- 1.2.2. Instances
- 1.2.3. Attributes
- 1.2.4. Target attribute
- 1.2.5. Input attributes
- 1.2.6. Training set
- 1.2.7. Model
- 1.2.8. Performance
- 1.2.9. Generalization
- 1.2.10. Overfitting
- 1.2.11. Algorithms
- 1.2.12. Inductive learning as search
- 1.3. Classification
- 1.3.1. Concept
- 1.3.2. Training set
- 1.3.3. Model
- 1.3.4. Performance
- 1.3.5. Generalization
- 1.3.6. Overfitting
- 1.3.7. Algorithms
- 1.4. Regression
- 1.4.1. Target function
- 1.4.2. Training set
- 1.4.3. Model
- 1.4.4. Performance
- 1.4.5. Generalization
- 1.4.6. Overfitting
- 1.4.7. Algorithms
- 1.5. Clustering
- 1.5.1. Motivation
- 1.5.2. Training set
- 1.5.3. Model
- 1.5.4. Crisp vs. soft clustering
- 1.5.5. Hierarchical clustering
- 1.5.6. Performance
- 1.5.7. Generalization
- 1.5.8. Algorithms
- 1.5.9. Descriptive vs. predictive clustering
- 1.6. Practical issues
- 1.6.1. Incomplete data
- 1.6.2. Noisy data
- 1.7. Conclusion
- 1.8. Further readings
- References
- 2. Basic statistics
- 2.1. Introduction
- 2.2. Notational conventions
- 2.3. Basic statistics as modeling
- 2.4. Distribution description
- 2.4.1. Continuous attributes
- 2.4.2. Discrete attributes
- 2.4.3. Confidence intervals
- 2.4.4. m-Estimation
- 2.5. Relationship detection
- 2.5.1. Significance tests
- 2.5.2. Continuous attributes
- 2.5.3. Discrete attributes
- 2.5.4. Mixed attributes
- 2.5.5. Relationship detection caveats
- 2.6. Visualization
- 2.6.1. Boxplot
- 2.6.2. Histogram
- 2.6.3. Barplot
- 2.7. Conclusion
- 2.8. Further readings
- References
- pt. II Classification
- 3. Decision trees
- 3.1. Introduction
- 3.2. Decision tree model
- 3.2.1. Nodes and branches
- 3.2.2. Leaves
- 3.2.3. Split types
- 3.3. Growing
- 3.3.1. Algorithm outline
- 3.3.2. Class distribution calculation
- 3.3.3. Class label assignment
- 3.3.4. Stop criteria
- 3.3.5. Split selection
- 3.3.6. Split application
- 3.3.7. Complete process
- 3.4. Pruning
- 3.4.1. Pruning operators
- 3.4.2. Pruning criterion
- 3.4.3. Pruning control strategy
- 3.4.4. Conversion to rule sets
- 3.5. Prediction
- 3.5.1. Class label prediction
- 3.5.2. Class probability prediction
- 3.6. Weighted instances
- 3.7. Missing value handling
- 3.7.1. Fractional instances
- 3.7.2. Surrogate splits
- 3.8. Conclusion
- 3.9. Further readings
- References
- 4. Naive Bayes classifier
- 4.1. Introduction
- 4.2. Bayes rule
- 4.3. Classification by Bayesian inference
- 4.3.1. Conditional class probability
- 4.3.2. Prior class probability
- 4.3.3. Independence assumption
- 4.3.4. Conditional attribute value probabilities
- 4.3.5. Model construction
- 4.3.6. Prediction
- 4.4. Practical issues
- 4.4.1. Zero and small probabilities
- 4.4.2. Linear classification
- 4.4.3. Continuous attributes
- 4.4.4. Missing attribute values
- 4.4.5. Reducing naivety
- 4.5. Conclusion
- 4.6. Further readings
- References
- 5. Linear classification
- 5.1. Introduction
- 5.2. Linear representation
- 5.2.1. Inner representation function
- 5.2.2. Outer representation function
- 5.2.3. Threshold representation
- 5.2.4. Logit representation
- 5.3. Parameter estimation
- 5.3.1. Delta rule
- 5.3.2. Gradient descent
- 5.3.3. Distance to decision boundary
- 5.3.4. Least squares
- 5.4. Discrete attributes
- 5.5. Conclusion
- 5.6. Further readings
- References
- 6. Misclassification costs
- 6.1. Introduction
- 6.2. Cost representation
- 6.2.1. Cost matrix
- 6.2.2. Per-class cost vector
- 6.2.3. Instance-specific costs
- 6.3. Incorporating misclassification costs
- 6.3.1. Instance weighting
- 6.3.2. Instance resampling
- 6.3.3. Minimum-cost rule
- 6.3.4. Instance relabeling
- 6.4. Effects of cost incorporation
- 6.5. Experimental procedure
- 6.6. Conclusion
- 6.7. Further readings
- References
- 7. Classification model evaluation
- 7.1. Introduction
- 7.1.1. Dataset performance
- 7.1.2. Training performance
- 7.1.3. True performance
- 7.2. Performance measures
- 7.2.1. Misclassification error
- 7.2.2. Weighted misclassification error
- 7.2.3. Mean misclassification cost
- 7.2.4. Confusion matrix
- 7.2.5. ROC analysis
- 7.2.6. Probabilistic performance measures
- 7.3. Evaluation procedures
- 7.3.1. Model evaluation vs. modeling procedure evaluation
- 7.3.2. Evaluation caveats
- 7.3.3. Hold-out
- 7.3.4. Cross-validation
- 7.3.5. Leave-one-out
- 7.3.6. Bootstrapping
- 7.3.7. Choosing the right procedure
- 7.3.8. Evaluation procedures for temporal data
- 7.4. Conclusion
- 7.5. Further readings
- References
- pt. III Regression
- 8. Linear regression
- 8.1. Introduction
- 8.2. Linear representation
- 8.2.1. Parametric representation
- 8.2.2. Linear representation function
- 8.2.3. Nonlinear representation functions
- 8.3. Parameter estimation
- 8.3.1. Mean square error minimization
- 8.3.2. Delta rule
- 8.3.3. Gradient descent
- 8.3.4. Least squares
- 8.4. Discrete attributes
- 8.5. Advantages of linear models
- 8.6. Beyond linearity
- 8.6.1. Generalized linear representation
- 8.6.2. Enhanced representation
- 8.6.3. Polynomial regression
- 8.6.4. Piecewise-linear regression
- 8.7. Conclusion
- 8.8. Further readings
- References
- 9. Regression trees
- 9.1. Introduction
- 9.2. Regression tree model
- 9.2.1. Nodes and branches
- 9.2.2. Leaves
- 9.2.3. Split types
- 9.2.4. Piecewise-constant regression
- 9.3. Growing
- 9.3.1. Algorithm outline
- 9.3.2. Target function summary statistics
- 9.3.3. Target value assignment
- 9.3.4. Stop criteria
- 9.3.5. Split selection
- 9.3.6. Split application
- 9.3.7. Complete process
- 9.4. Pruning
- 9.4.1. Pruning operators
- 9.4.2. Pruning criterion
- 9.4.3. Pruning control strategy
- 9.5. Prediction
- 9.6. Weighted instances
- 9.7. Missing value handling
- 9.7.1. Fractional instances
- 9.7.2. Surrogate splits
- 9.8. Piecewise linear regression
- 9.8.1. Growing
- 9.8.2. Pruning
- 9.8.3. Prediction
- 9.9. Conclusion
- 9.10. Further readings
- References
- 10. Regression model evaluation
- 10.1. Introduction
- 10.1.1. Dataset performance
- 10.1.2. Training performance
- 10.1.3. True performance
- 10.2. Performance measures
- 10.2.1. Residuals
- 10.2.2. Mean absolute error
- 10.2.3. Mean square error
- 10.2.4. Root mean square error
- 10.2.5. Relative absolute error
- 10.2.6. Coefficient of determination
- 10.2.7. Correlation
- 10.2.8. Weighted performance measures
- 10.2.9. Loss functions
- 10.3. Evaluation procedures
- 10.3.1. Hold-out
- 10.3.2. Cross-validation
- 10.3.3. Leave-one-out
- 10.3.4. Bootstrapping
- 10.3.5. Choosing the right procedure
- 10.4. Conclusion
- 10.5. Further readings
- References
- pt. IV Clustering
- 11. (Dis)similarity measures
- 11.1. Introduction
- 11.2. Measuring dissimilarity and similarity
- 11.3. Difference-based dissimilarity
- 11.3.1. Euclidean distance
- 11.3.2. Minkowski distance
- 11.3.3. Manhattan distance
- 11.3.4. Canberra distance
- 11.3.5. Chebyshev distance
- 11.3.6. Hamming distance
- 11.3.7. Gower's coefficient
- 11.3.8. Attribute weighting
- 11.3.9. Attribute transformation
- 11.4. Correlation-based similarity
- 11.4.1. Discrete attributes
- 11.4.2. Pearson's correlation similarity
- 11.4.3. Spearman's correlation similarity
- 11.4.4. Cosine similarity
- 11.5. Missing attribute values
- 11.6. Conclusion
- 11.7. Further readings
- References
- 12. k-Centers clustering
- 12.1. Introduction
- 12.1.1. Basic principle
- 12.1.2. (Dis)similarity measures
- 12.2. Algorithm scheme
- 12.2.1. Initialization
- 12.2.2. Stop criteria
- 12.2.3. Cluster formation
- 12.2.4. Implicit cluster modeling
- 12.2.5. Instantiations
- 12.3. k-Means
- 12.3.1. Center adjustment
- 12.3.2. Minimizing dissimilarity to centers
- 12.4. Beyond means
- 12.4.1. k-Medians
- 12.4.2. k-Medoids
- 12.5. Beyond (fixed) k
- 12.5.1. Multiple runs
- 12.5.2. Adaptive k-centers
- 12.6. Explicit cluster modeling
- 12.7. Conclusion
- 12.8. Further readings
- References
- 13. Hierarchical clustering
- 13.1. Introduction
- 13.1.1. Basic approaches
- 13.1.2. (Dis)similarity measures
- 13.2. Cluster hierarchies
- 13.2.1. Motivation
- 13.2.2. Model representation
- 13.3. Agglomerative clustering
- 13.3.1. Algorithm scheme
- 13.3.2. Cluster linkage
- 13.4. Divisive clustering
- 13.4.1. Algorithm scheme
- 13.4.2. Wrapping a flat clustering algorithm
- 13.4.3. Stop criteria
- 13.5. Hierarchical clustering visualization
- 13.6. Hierarchical clustering prediction
- 13.6.1. Cutting cluster hierarchies
- 13.6.2. Cluster membership assignment
- 13.7. Conclusion
- 13.8. Further readings
- References
- 14. Clustering model evaluation
- 14.1. Introduction
- 14.1.1. Dataset performance
- 14.1.2. Training performance
- 14.1.3. True performance
- 14.2. Per-cluster quality measures
- 14.2.1. Diameter
- 14.2.2. Separation
- 14.2.3. Isolation
- 14.2.4. Silhouette width
- 14.2.5. Davies-Bouldin Index
- 14.3. Overall quality measures
- 14.3.1. Dunn Index
- 14.3.2. Average Davies-Bouldin Index
- 14.3.3. C Index
- 14.3.4. Average silhouette width
- 14.3.5. Loglikelihood
- 14.4. External quality measures
- 14.4.1. Misclassification error
- 14.4.2. Rand Index
- 14.4.3. General relationship detection measures
- 14.5. Using quality measures
- 14.6. Conclusion
- 14.7. Further readings
- References
- pt. V Getting Better Models
- 15. Model ensembles
- 15.1. Introduction
- 15.2. Model committees
- 15.3. Base models
- 15.3.1. Different training sets
- 15.3.2. Different algorithms
- 15.3.3. Different parameter setups
- 15.3.4. Algorithm randomization
- 15.3.5. Base model diversity
- 15.4. Model aggregation
- 15.4.1. Voting/Averaging
- 15.4.2. Probability averaging
- 15.4.3. Weighted voting/averaging
- 15.4.4. Using as attributes
- 15.5. Specific ensemble modeling algorithms
- 15.5.1. Bagging
- 15.5.2. Stacking
- 15.5.3. Boosting
- 15.5.4. Random forest
- 15.5.5. Random Naive Bayes
- 15.6. Quality of ensemble predictions
- 15.7. Conclusion
- 15.8. Further readings
- References
- 16. Kernel methods
- 16.1. Introduction
- 16.2. Support vector machines
- 16.2.1. Classification margin
- 16.2.2. Maximum-margin hyperplane
- 16.2.3. Primal form
- 16.2.4. Dual form
- 16.2.5. Soft margin
- 16.3. Support vector regression
- 16.3.1. Regression tube
- 16.3.2. Primal form
- 16.3.3. Dual form
- 16.4. Kernel trick
- 16.5. Kernel functions
- 16.5.1. Linear kernel
- 16.5.2. Polynomial kernel
- 16.5.3. Radial kernel
- 16.5.4. Sigmoid kernel
- 16.6. Kernel prediction
- 16.7. Kernel-based algorithms
- 16.7.1. Kernel-based SVM
- 16.7.2. Kernel-based SVR
- 16.8. Conclusion
- 16.9. Further readings
- References
- 17. Attribute transformation
- 17.1. Introduction
- 17.2. Attribute transformation task
- 17.2.1. Target task
- 17.2.2. Target attribute
- 17.2.3. Transformed attribute
- 17.2.4. Training set
- 17.2.5. Modeling transformations
- 17.2.6. Nonmodeling transformations
- 17.3. Simple transformations
- 17.3.1. Standardization
- 17.3.2. Normalization
- 17.3.3. Aggregation
- 17.3.4. Imputation
- 17.3.5. Binary encoding
- 17.4. Multiclass encoding
- 17.4.1. Encoding and decoding functions
- 17.4.2. 1-of-k encoding
- 17.4.3. Error-correcting encoding
- 17.4.4. Effects of multiclass encoding
- 17.5. Conclusion
- 17.6. Further readings
- References
- 18. Discretization
- 18.1. Introduction
- 18.2. Discretization task
- 18.2.1. Motivation
- 18.2.2. Task definition
- 18.2.3. Discretization as modeling
- 18.2.4. Discretization quality
- 18.3. Unsupervised discretization
- 18.3.1. Equal-width intervals
- 18.3.2. Equal-frequency intervals
- 18.3.3. Nonmodeling discretization
- 18.4. Supervised discretization
- 18.4.1. Pure-class discretization
- 18.4.2. Bottom-up discretization
- 18.4.3. Top-down discretization
- 18.5. Effects of discretization
- 18.6. Conclusion
- 18.7. Further readings
- References
- 19. Attribute selection
- 19.1. Introduction
- 19.2. Attribute selection task
- 19.2.1. Motivation
- 19.2.2. Task definition
- 19.2.3. Algorithms
- 19.3. Attribute subset search
- 19.3.1. Search task
- 19.3.2. Initial state
- 19.3.3. Search operators
- 19.3.4. State selection
- 19.3.5. Stop criteria
- 19.4. Attribute selection filters
- 19.4.1. Simple statistical filters
- 19.4.2. Correlation-based filters
- 19.4.3. Consistency-based filters
- 19.4.4. RELIEF
- 19.4.5. Random forest
- 19.4.6. Cutoff criteria
- 19.4.7. Filter-driven search
- 19.5. Attribute selection wrappers
- 19.5.1. Subset evaluation
- 19.5.2. Wrapper attribute selection
- 19.6. Effects of attribute selection
- 19.7. Conclusion
- 19.8. Further readings
- References
- 20. Case studies
- 20.1. Introduction
- 20.1.1. Datasets
- 20.1.2. Packages
- 20.1.3. Auxiliary functions
- 20.2. Census income
- 20.2.1. Data loading and preprocessing
- 20.2.2. Default model
- 20.2.3. Incorporating misclassification costs
- 20.2.4. Pruning
- 20.2.5. Attribute selection
- 20.2.6. Final models
- 20.3. Communities and crime
- 20.3.1. Data loading
- 20.3.2. Data quality
- 20.3.3. Regression trees
- 20.3.4. Linear models
- 20.3.5. Attribute selection
- 20.3.6. Piecewise-linear models
- 20.4. Cover type
- 20.4.1. Data loading and preprocessing
- 20.4.2. Class imbalance
- 20.4.3. Decision trees
- 20.4.4. Class rebalancing
- 20.4.5. Multiclass encoding
- 20.4.6. Final classification models
- 20.4.7. Clustering
- 20.5. Conclusion
- 20.6. Further readings
- References
- Closing
- A. Notation
- A.1. Attribute values
- A.2. Data subsets
- A.3. Probabilities
- B. R packages
- B.1. CRAN packages
- B.2. DMR packages
- B.3. Installing packages
- References
- C. Datasets