Data mining and statistics for decision making

"This practical guide to understanding and implementing data mining techniques discusses traditional methods--cluster analysis, factor analysis, linear regression, PLS regression, and generalized linear models--and recent methods--bagging and boosting, decision trees, neural networks, support v...


Bibliographic Details
Main Author: Tuffery, Stephane
Format: eBook
Language: English
Published: Chichester, West Sussex ; Hoboken, NJ. : Wiley, 2011.
Series: Wiley series in computational statistics.
Subjects:
Available Online: Full Text via HEAL-Link
Table of Contents:
  • Front Matter
  • Overview of Data Mining
  • The Development of a Data Mining Study
  • Data Exploration and Preparation
  • Using Commercial Data
  • Statistical and Data Mining Software
  • An Outline of Data Mining Methods
  • Factor Analysis
  • Neural Networks
  • Cluster Analysis
  • Association Analysis
  • Classification and Prediction Methods
  • An Application of Data Mining: Scoring
  • Factors for Success in a Data Mining Project
  • Text Mining
  • Web Mining
  • Appendix A: Elements of Statistics
  • Appendix B: Further Reading
  • Index.
  • Machine generated contents note:
  • Preface
  • Foreword
  • Contents
  • Overview of data mining
  • 1.1. What is data mining?
  • 1.2. What is data mining used for?
  • 1.3. Data mining and statistics
  • 1.4. Data mining and information technology
  • 1.5. Data mining and protection of personal data
  • 1.6. Implementation of data mining
  • The development of a data mining study
  • 2.1. Defining the aims
  • 2.2. Listing the existing data
  • 2.3. Collecting the data
  • 2.4. Exploring and preparing the data
  • 2.5. Population segmentation
  • 2.6. Drawing up and validating predictive models
  • 2.7. Synthesizing predictive models of different segments
  • 2.8. Iteration of the preceding steps
  • 2.9. Deploying the models
  • 2.10. Training the model users
  • 2.11. Monitoring the models
  • 2.12. Enriching the models
  • 2.13. Remarks
  • 2.14. Life cycle of a model
  • 2.15. Costs of a pilot project
  • Data exploration and preparation
  • 3.1. The different types of data
  • 3.2. Examining the distribution of variables
  • 3.3. Detection of rare or missing values
  • 3.4. Detection of aberrant values
  • 3.5. Detection of extreme values
  • 3.6. Tests of normality
  • 3.7. Homoscedasticity and heteroscedasticity
  • 3.8. Detection of the most discriminating variables
  • 3.9. Transformation of variables
  • 3.10. Choosing ranges of values of continuous variables
  • 3.11. Creating new variables
  • 3.12. Detecting interactions
  • 3.13. Automatic variable selection
  • 3.14. Detection of collinearity
  • 3.15. Sampling
  • Using commercial data
  • 4.1. Data used in commercial applications
  • 4.2. Special data
  • 4.3. Data used by business sector
  • Statistical and data mining software
  • 5.1. Types of data mining and statistical software
  • 5.2. Essential characteristics of the software
  • 5.3. The main software packages
  • 5.4. Comparison of R, SAS and IBM SPSS
  • 5.5. How to reduce processing time
  • An outline of data mining methods
  • 6.1. A note on terminology
  • 6.2. Classification of the methods
  • 6.3. Comparison of the methods
  • 6.4. Using these methods in the business world
  • Factor analysis
  • 7.1. Principal component analysis
  • 7.2. Variants of principal component analysis
  • 7.3. Correspondence analysis
  • 7.4. Multiple correspondence analysis
  • Neural networks
  • 8.1. General information on neural networks
  • 8.2. Structure of a neural network
  • 8.3. Choosing the training sample
  • 8.4. Some empirical rules for network design
  • 8.5. Data normalization
  • 8.6. Learning algorithms
  • 8.7. The main neural networks
  • Automatic clustering methods
  • 9.1. Definition of clustering
  • 9.2. Applications of clustering
  • 9.3. Complexity of clustering
  • 9.4. Clustering structures
  • 9.5. Some methodological considerations
  • 9.6. Comparison of factor analysis and clustering
  • 9.7. Intra-class and inter-class inertias
  • 9.8. Measurements of clustering quality
  • 9.9. Partitioning methods
  • 9.10. Hierarchical ascending clustering
  • 9.11. Hybrid clustering methods
  • 9.12. Neural clustering
  • 9.13. Clustering by aggregation of similarities
  • 9.14. Clustering of numeric variables
  • 9.15. Overview of clustering methods
  • Finding associations
  • 10.1. Principles
  • 10.2. Using taxonomy
  • 10.3. Using supplementary variables
  • 10.4. Applications
  • 10.5. Example of use
  • Classification and prediction methods
  • 11.1. Introduction
  • 11.2. Inductive and transductive methods
  • 11.3. Overview of classification and prediction methods
  • 11.4. Classification by decision tree
  • 11.5. Prediction by decision tree
  • 11.6. Classification by discriminant analysis
  • 11.7. Prediction by linear regression
  • 11.8. Classification by logistic regression
  • 11.9. Developments in logistic regression
  • 11.10. Bayesian methods
  • 11.11. Classification and prediction by neural networks
  • 11.12. Classification by support vector machines (SVMs)
  • 11.13. Prediction by genetic algorithms
  • 11.14. Improving the performance of a predictive model
  • 11.15. Bootstrapping and aggregation of models
  • 11.16. Using classification and prediction methods
  • An application of data mining: scoring
  • 12.1. The different types of score
  • 12.2. Using propensity scores and risk scores
  • 12.3. Methodology
  • 12.4. Implementing a strategic score
  • 12.5. Implementing an operational score
  • 12.6. The kinds of scoring solutions used in a business
  • 12.7. An example of credit scoring (data preparation)
  • 12.8. An example of credit scoring (modelling by logistic regression)
  • 12.9. An example of credit scoring (modelling by DISQUAL discriminant analysis)
  • 12.10. A brief history of credit scoring
  • Factors for success in a data mining project
  • 13.1. The subject
  • 13.2. The people
  • 13.3. The data
  • 13.4. The IT systems
  • 13.5. The business culture
  • 13.6. Data mining: eight common misconceptions
  • 13.7. Return on investment
  • Text mining
  • 14.1. Definition of text mining
  • 14.2. Text sources used
  • 14.3. Using text mining
  • 14.4. Information retrieval
  • 14.5. Information extraction
  • 14.6. Multi-type data mining
  • Web mining
  • 15.1. The aims of web mining
  • 15.2. Global analyses
  • 15.3. Individual analyses
  • 15.4. Personal analyses
  • Appendix: Elements of statistics
  • 16.1. A brief history
  • 16.2. Elements of statistics
  • 16.3. Statistical tables
  • Further reading
  • 17.1. Statistics and data analysis
  • 17.2. Data mining and statistical learning
  • 17.3. Text mining
  • 17.4. Web mining
  • 17.5. R software
  • 17.6. SAS software
  • 17.7. IBM SPSS software
  • 17.8. Websites
  • Index.