Reading sample

Cluster Analysis

eBook - Wiley Series in Probability and Statistics

Everitt, Brian S/Leese, Morven/Stahl, Daniel et al

WILEY

Mathematics / Probability theory, Stochastics, Mathematical statistics

Published 13 December 2010, 5th edition 2010

€65.99 (incl. VAT)


Bibliographic data

ISBN/EAN: 9780470977804

Language: English

Extent: 352 pp., 4.33 MB

E-Book
Format: PDF
DRM: Adobe DRM

Description

Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics.
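As a small illustration of what "classifying multivariate data into subgroups" can mean in practice, the sketch below implements k-means (Lloyd's algorithm), one of the optimization methods named in the contents (sections 5.3.1 and 5.4.2). It seeks a partition that minimizes trace(W), the total within-cluster sum of squares. This is a minimal, assumption-laden sketch in plain Python, not code from the book; the function names `kmeans` and `within_ss` and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, n_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm (hypothetical helper): alternate assignment and update.

    Neither step can increase the within-cluster sum of squares, so the
    procedure converges, but only to a local minimum that depends on the
    randomly chosen starting centroids.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the partition has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def within_ss(points: List[Point], labels: List[int],
              centroids: List[Point]) -> float:
    """Total within-cluster sum of squares, i.e. trace(W) for the partition."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
```

For example, `kmeans([(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)], k=2)` should separate the three points near (1, 1) from the three near (8, 8). On less clearly separated data, running from several random starts and keeping the solution with the smallest `within_ss` is a common safeguard against poor local minima.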

This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data.

Real-life examples are used throughout to demonstrate the application of the theory, and figures are used extensively to illustrate graphical techniques. The book is comprehensive yet relatively non-mathematical, focusing on the practical aspects of cluster analysis.

Key Features:

Presents a comprehensive guide to clustering techniques, with a focus on the practical aspects of cluster analysis.

Provides a thorough revision of the fourth edition, including new developments in clustering longitudinal data and examples from bioinformatics and gene studies.

Updates the chapter on mixture models to include recent developments and presents a new chapter on mixture modelling for structured data.

Practitioners and researchers working in cluster analysis and data analysis will benefit from this book.

About the authors

Brian S. Everitt, Head of the Biostatistics and Computing Department and Professor of Behavioural Statistics, King's College London. He has authored or co-authored over 50 books on statistics and approximately 100 papers and other articles, and is also joint editor of Statistical Methods in Medical Research.

Dr Sabine Landau, Head of Department of Biostatistics, Institute of Psychiatry, King's College London.

Dr Morven Leese, Health Service and Population Research, Institute of Psychiatry, King's College London.

Dr Daniel Stahl, Department of Biostatistics & Computing, Institute of Psychiatry, King's College London.

Contents

Preface.

Acknowledgement.

1 An Introduction to classification and clustering.

1.1 Introduction.

1.2 Reasons for classifying.

1.3 Numerical methods of classification: cluster analysis.

1.4 What is a cluster?

1.5 Examples of the use of clustering.

1.5.1 Market research.

1.5.2 Astronomy.

1.5.3 Psychiatry.

1.5.4 Weather classification.

1.5.5 Archaeology.

1.5.6 Bioinformatics and genetics.

1.6 Summary.

2 Detecting clusters graphically.

2.1 Introduction.

2.2 Detecting clusters with univariate and bivariate plots of data.

2.2.1 Histograms.

2.2.2 Scatterplots.

2.2.3 Density estimation.

2.2.4 Scatterplot matrices.

2.3 Using lower-dimensional projections of multivariate data for graphical representations.

2.3.1 Principal components analysis of multivariate data.

2.3.2 Exploratory projection pursuit.

2.3.3 Multidimensional scaling.

2.4 Three-dimensional plots and trellis graphics.

2.5 Summary.

3 Measurement of proximity.

3.1 Introduction.

3.2 Similarity measures for categorical data.

3.2.1 Similarity measures for binary data.

3.2.2 Similarity measures for categorical data with more than two levels.

3.3 Dissimilarity and distance measures for continuous data.

3.4 Similarity measures for data containing both continuous and categorical variables.

3.5 Proximity measures for structured data.

3.6 Inter-group proximity measures.

3.6.1 Inter-group proximity derived from the proximity matrix.

3.6.2 Inter-group proximity based on group summaries for continuous data.

3.6.3 Inter-group proximity based on group summaries for categorical data.

3.7 Weighting variables.

3.8 Standardization.

3.9 Choice of proximity measure.

3.10 Summary.

4 Hierarchical clustering.

4.1 Introduction.

4.2 Agglomerative methods.

4.2.1 Illustrative examples of agglomerative methods.

4.2.2 The standard agglomerative methods.

4.2.3 Recurrence formula for agglomerative methods.

4.2.4 Problems of agglomerative hierarchical methods.

4.2.5 Empirical studies of hierarchical agglomerative methods.

4.3 Divisive methods.

4.3.1 Monothetic divisive methods.

4.3.2 Polythetic divisive methods.

4.4 Applying the hierarchical clustering process.

4.4.1 Dendrograms and other tree representations.

4.4.2 Comparing dendrograms and measuring their distortion.

4.4.3 Mathematical properties of hierarchical methods.

4.4.4 Choice of partition: the problem of the number of groups.

4.4.5 Hierarchical algorithms.

4.4.6 Methods for large data sets.

4.5 Applications of hierarchical methods.

4.5.1 Dolphin whistles: agglomerative clustering.

4.5.2 Needs of psychiatric patients: monothetic divisive clustering.

4.5.3 Globalization of cities: polythetic divisive method.

4.5.4 Women's life histories: divisive clustering of sequence data.

4.5.5 Composition of mammals' milk: exemplars, dendrogram seriation and choice of partition.

4.6 Summary.

5 Optimization clustering techniques.

5.1 Introduction.

5.2 Clustering criteria derived from the dissimilarity matrix.

5.3 Clustering criteria derived from continuous data.

5.3.1 Minimization of trace(W).

5.3.2 Minimization of det(W).

5.3.3 Maximization of trace(BW⁻¹).

5.3.4 Properties of the clustering criteria.

5.3.5 Alternative criteria for clusters of different shapes and sizes.

5.4 Optimization algorithms.

5.4.1 Numerical example.

5.4.2 More on k-means.

5.4.3 Software implementations of optimization clustering.

5.5 Choosing the number of clusters.

5.6 Applications of optimization methods.

5.6.1 Survey of student attitudes towards video games.

5.6.2 Air pollution indicators for US cities.

5.6.3 Aesthetic judgement of painters.

5.6.4 Classification of nonspecific back pain.

5.7 Summary.

6 Finite mixture densities as models for cluster analysis.

6.1 Introduction.

6.2 Finite mixture densities.

6.2.1 Maximum likelihood estimation.

6.2.2 Maximum likelihood estimation of mixtures of multivariate normal densities.

6.2.3 Problems with maximum likelihood estimation of finite mixture models using the EM algorithm.

6.3 Other finite mixture densities.

6.3.1 Mixtures of multivariate t-distributions.

6.3.2 Mixtures for categorical data: latent class analysis.

6.3.3 Mixture models for mixed-mode data.

6.4 Bayesian analysis of mixtures.

6.4.1 Choosing a prior distribution.

6.4.2 Label switching.

6.4.3 Markov chain Monte Carlo samplers.

6.5 Inference for mixture models with unknown number of components and model structure.

6.5.1 Log-likelihood ratio test statistics.

6.5.2 Information criteria.

6.5.3 Bayes factors.

6.5.4 Markov chain Monte Carlo methods.

6.6 Dimension reduction: variable selection in finite mixture modelling.

6.7 Finite regression mixtures.

6.8 Software for finite mixture modelling.

6.9 Some examples of the application of finite mixture densities.

6.9.1 Finite mixture densities with univariate Gaussian components.

6.9.2 Finite mixture densities with multivariate Gaussian components.

6.9.3 Applications of latent class analysis.

6.9.4 Application of a mixture model with different component densities.

6.10 Summary.

7 Model-based cluster analysis for structured data.

7.1 Introduction.

7.2 Finite mixture models for structured data.

7.3 Finite mixtures of factor models.

7.4 Finite mixtures of longitudinal models.

7.5 Applications of finite mixture models for structured data.

7.5.1 Application of finite mixture factor analysis to the categorical versus dimensional representation debate.

7.5.2 Application of finite mixture confirmatory factor analysis to cluster genes using replicated microarray experiments.

7.5.3 Application of finite mixture exploratory factor analysis to cluster Italian wines.

7.5.4 Application of growth mixture modelling to identify distinct developmental trajectories.

7.5.5 Application of growth mixture modelling to identify trajectories of perinatal depressive symptomatology.

7.6 Summary.

8 Miscellaneous clustering methods.

8.1 Introduction.

8.2 Density search clustering techniques.

8.2.1 Mode analysis.

8.2.2 Nearest-neighbour clustering procedures.

8.3 Density-based spatial clustering of applications with noise.

8.4 Techniques which allow overlapping clusters.

8.4.1 Clumping and related techniques.

8.4.2 Additive clustering.

8.4.3 Application of MAPCLUS to data on social relations in a monastery.

8.4.4 Pyramids.

8.4.5 Application of pyramid clustering to gene sequences of yeasts.

8.5 Simultaneous clustering of objects and variables.

8.5.1 Hierarchical classes.

8.5.2 Application of hierarchical classes to psychiatric symptoms.

8.5.3 The error variance technique.

8.5.4 Application of the error variance technique to appropriateness of behaviour data.

8.6 Clustering with constraints.

8.6.1 Contiguity constraints.

8.6.2 Application of contiguity-constrained clustering.

8.7 Fuzzy clustering.

8.7.1 Methods for fuzzy cluster analysis.

8.7.2 The assessment of fuzzy clustering.

8.7.3 Application of fuzzy cluster analysis to Roman glass composition.

8.8 Clustering and artificial neural networks.

8.8.1 Components of a neural network.

8.8.2 The Kohonen self-organizing map.

8.8.3 Application of neural nets to brainstorming sessions.

8.9 Summary.

9 Some final comments and guidelines.

9.1 Introduction.

9.2 Using clustering techniques in practice.

9.3 Testing for absence of structure.

9.4 Methods for comparing cluster solutions.

9.4.1 Comparing partitions.

9.4.2 Comparing dendrograms.

9.4.3 Comparing proximity matrices.

9.5 Internal cluster quality, influence and robustness.

9.5.1 Internal cluster quality.

9.5.2 Robustness: split-sample validation and consensus trees.

9.5.3 Influence of individual points.

9.6 Displaying cluster solutions graphically.

9.7 Illustrative examples.

9.7.1 Indo-European languages: a consensus tree in linguistics.

9.7.2 Scotch whisky tasting: cophenetic matrices for comparing clusterings.

9.7.3 Chemical compounds in the pharmaceutical industry.

9.7.4 Evaluating clustering algorithms for gene expression data.

9.8 Summary.

Bibliography.

Index.

About e-books

"E-book" stands for digital book. To read this type of book, you need either special software for computers, tablets and smartphones, or an e-book reader. Because many different e-book formats (file types) exist, there are a few things to bear in mind.
We deliver digital books in three formats: EPUB with DRM (Digital Rights Management), EPUB without DRM, and PDF. For PDF and EPUB without DRM, you only need to check that your e-book reader is compatible. If a format with DRM is used, you additionally need a free Adobe® Digital Editions account. When you download an e-book that requires Adobe® Digital Editions, you receive an ACSM file, which must be added to Digital Editions and linked to your account. Some e-book readers (for example PocketBook Touch) also let you enter your Adobe account login details directly, so these ACSM files can be copied straight onto the device.
Because e-books can be downloaded only for a limited time (usually 6 months), you should always keep a backup copy on permanent storage (hard drive, USB stick or CD). The number of downloads is also limited to a maximum of 5.