Cluster Analysis I
Presidency University
September, 2016

Classification

• The aim of any multivariate study is to learn about the nature of some object.
• One way of learning about its nature is to find out which class the object belongs to.
• For example, a candidate is asked many questions in an examination to classify him as either pass or fail.
• Hence, classifying an object is one of the most important problems in multivariate statistics.

Types of Classification

Classification
• Supervised
• Unsupervised

Supervised vs Unsupervised Learning

• A kid is trying to learn the alphabet.
• The English letters are only pictures to him.
• The teacher tells him the name of each letter.
• This is supervised classification.

Supervised vs Unsupervised Learning

• Suppose the same kid grows up and visits China for the first time.
• At some place he sees a signboard written in Chinese characters.
• Even now, he cannot read the alphabet.
• But he can at least identify that some characters are repeated.
• He can pick out the unique ones.
• This is unsupervised classification.

Supervised vs Unsupervised Learning

• Given data x_1, x_2, ..., x_n with labels (i.e., group tags) y_1, y_2, ..., y_n, we learn to predict the label associated with a new input x_new.

[Figure: labelled example pictures ("Chair", "Duck") and a new unlabelled picture asking "What is this?"]

• Given only x_1, x_2, ..., x_n, we try to infer some underlying structure (i.e., find similarities and identify groups).

[Figure: a collection of assorted items with the caption "Group these items according to common features".]

Supervised vs Unsupervised Learning

• Discriminant Analysis is supervised classification.
• Clustering is unsupervised classification.

What is clustering? And why?

• Clustering is the task of dividing up data into groups (clusters), so that points in any one group are more similar to each other than to points outside the group.

Two main uses are:

• Summary: deriving a reduced representation of the full data set.
• Discovery: looking for new insights into the structure of the data, e.g., finding groups of students that commit similar mistakes, or groups of 80's songs that sound alike.

Example

• Let's have a look at a satellite image.
• The image shows a blue ocean and two land masses with green vegetation.
• The human eye can detect large regions of similar colour in an image.
• Our brain is performing a simple clustering: it groups the pixels into two clusters, land and sea.

Proximity Measures

• Cluster Analysis is the art of grouping similar items.
• How do we define similarity of items?
• Usually we define a dissimilarity measure d_ij or a similarity measure s_ij between x_i and x_j.
• Generally d_ij is taken to be any distance measure, e.g. d_ij = ||x_i − x_j||^2.
• This holds good if we are working with numerical variables.
• For categorical data, we will discuss the proximity measures later.
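As a quick illustration of this choice of d_ij, here is a minimal sketch (the toy data and the use of NumPy are my own assumptions, not part of the slides) that computes the full matrix of squared Euclidean distances:

```python
import numpy as np

# Toy data: five points in two dimensions (illustrative values only).
X = np.array([[0.0, 0.0],
              [1.0, 0.5],
              [0.9, 0.4],
              [5.0, 5.0],
              [5.2, 4.8]])

# d_ij = ||x_i - x_j||^2, computed for every pair (i, j).
diff = X[:, None, :] - X[None, :, :]      # shape (n, n, p)
D = np.sum(diff ** 2, axis=-1)            # shape (n, n) matrix of squared distances

print(np.round(D, 2))
```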

Types of Clustering

Clustering
• Hierarchical Clustering
  • Agglomerative
  • Divisive
• Non-Hierarchical Clustering

Hierarchical Clustering

• Consider four shapes (a sphere, a cone, a small cube and a large cube). Suppose we want to group the similar objects together.
• Possibly put the two cubes in one cluster, and the sphere and cone in the other, as both have round surfaces.
• So the two clusters are now (sphere, cone) and (small cube, large cube).
• Suppose we further split these clusters. Then we shall split the (sphere, cone) cluster into the (sphere) and (cone) clusters.
• If we are to further increase the number of clusters, we will split the second cluster as well.
• This step-by-step clustering process can be expressed as a tree diagram.

Agglomerative vs Divisive

Agglomerative (i.e., bottom-up):

• Start with all points in their own group.
• Until there is only one cluster, repeatedly merge the two groups that have the smallest dissimilarity.

Divisive (i.e., top-down):

• Start with all points in one cluster.
• Until all points are in their own cluster, repeatedly split a group into the two parts with the biggest dissimilarity between them.

Simple Example

Step 1: {1}, {2}, {3}, {4}, {5}, {6}, {7}
Step 2: {1}, {2, 3}, {4}, {5}, {6}, {7}
Step 3: {1, 7}, {2, 3}, {4}, {5}, {6}
Step 4: {1, 7}, {2, 3}, {4, 5}, {6}
Step 5: {1, 7}, {2, 3, 6}, {4, 5}
Step 6: {1, 7}, {2, 3, 4, 5, 6}
Step 7: {1, 2, 3, 4, 5, 6, 7}

We can simply represent this sequence of clustering assignments in a tree called a dendrogram.

What's a dendrogram?

• A dendrogram is a convenient graphic to display a hierarchical sequence of clustering assignments. It is simply a tree where:
• Each node represents a group.
• Each leaf node is a singleton (i.e., a group containing a single data point).
• The root node is the group containing the whole data set.
• Each internal node has two daughter nodes (children), representing the groups that were merged to form it.
• If we fix the leaf nodes at height zero, then each internal node is drawn at a height proportional to the dissimilarity between its two daughter nodes.
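As an illustration, a dendrogram like this can be built and drawn with SciPy's hierarchical clustering routines; the sketch below uses assumed toy data and single linkage purely as an example, not as the slides' own code.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Toy data: seven 2-D points, loosely mimicking the seven-point example above.
X = np.array([[0.0, 0.0], [2.0, 2.1], [2.1, 2.0], [4.0, 0.5],
              [4.1, 0.6], [2.5, 1.8], [0.2, 0.1]])

# Build the merge sequence (here with single linkage on Euclidean distances).
Z = linkage(X, method="single", metric="euclidean")

# Plot the dendrogram: leaves at height 0, internal nodes at their merge dissimilarity.
dendrogram(Z, labels=[str(i + 1) for i in range(len(X))])
plt.ylabel("dissimilarity")
plt.show()
```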

Cutting a dendrogram

• A dendrogram represents the clustering at all possible levels, allowing all possible numbers of clusters.
• In order to use it, we have to cut it at a particular height.
• The height at which we cut the dendrogram represents the distance beyond which we choose to call points "dissimilar"; in practice it depends on the number of clusters we want.
• Depending on the linkage we use, this cut has its own interpretation, which we will study.
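Continuing the same assumed toy setup, cutting the tree at a chosen height (or asking for a fixed number of clusters) can be done with SciPy's fcluster; the data and threshold below are illustrative choices only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [2.0, 2.1], [2.1, 2.0], [4.0, 0.5],
              [4.1, 0.6], [2.5, 1.8], [0.2, 0.1]])
Z = linkage(X, method="single")

# Cut the dendrogram at height 0.9: clusters are the pieces merged below that height.
labels_by_height = fcluster(Z, t=0.9, criterion="distance")

# Alternatively, cut so that exactly 3 clusters remain.
labels_by_count = fcluster(Z, t=3, criterion="maxclust")

print(labels_by_height, labels_by_count)
```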

Linkage

• We have studied distances between two data points x_i and x_j.
• While performing hierarchical clustering we need to merge the clusters which are most similar.
• This requires an idea of distance between clusters, which is known as linkage.

Linkage

• We are given the data points x_1, x_2, ..., x_n and the distance between any pair x_i and x_j as d_ij.
• At any level, the clustering assignments can be expressed by sets G = {i_1, i_2, ..., i_{n_G}} giving the indices i_r of the points belonging to that group.
• At the bottom level the groups are singletons, i.e. G = {i}.
• At the topmost level there is only one group, G = {1, 2, 3, ..., n}.
• Linkage is a function d(G, H) that takes two groups G, H and returns a dissimilarity score between them.
• Remember: the choice of linkage determines how we measure dissimilarity between groups of points.

Agglomerative Clustering

Given the linkage, the agglomerative clustering algorithm consists of the following steps:

• Start with all points in their own group.
• Until there is only one cluster, repeatedly merge the two groups G, H such that d(G, H) is smallest.
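These two steps translate almost directly into code. The following sketch is a naive implementation working on a precomputed dissimilarity matrix, with the linkage passed in as a function; the data and names are illustrative assumptions, not the slides' own code.

```python
import numpy as np

def agglomerate(D, linkage_fn):
    """Naive agglomerative clustering on an n x n dissimilarity matrix D.

    linkage_fn(D, G, H) must return the dissimilarity between index sets G and H.
    Returns the list of merges as (G, H, score) triples, in the order performed.
    """
    groups = [[i] for i in range(len(D))]   # start: every point is its own group
    merges = []
    while len(groups) > 1:
        # Find the pair of groups with the smallest linkage score.
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                score = linkage_fn(D, groups[a], groups[b])
                if best is None or score < best[2]:
                    best = (a, b, score)
        a, b, score = best
        merges.append((groups[a], groups[b], score))
        # Merge the two chosen groups and continue.
        groups = [g for k, g in enumerate(groups) if k not in (a, b)] + [groups[a] + groups[b]]
    return merges

# Example with single linkage on a toy one-dimensional data set.
X = np.array([[0.0], [0.2], [1.0], [5.0], [5.1]])
D = np.abs(X - X.T)                                   # pairwise |x_i - x_j|
single = lambda D, G, H: min(D[i, j] for i in G for j in H)
for G, H, s in agglomerate(D, single):
    print(G, "+", H, "at", round(float(s), 2))
```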

Single Linkage

• In single linkage (i.e., nearest-neighbour linkage), the dissimilarity between G, H is the smallest dissimilarity between two points in opposite groups:

d_single(G, H) = min_{i ∈ G, j ∈ H} d_ij

• The single linkage score d_single(G, H) is the distance of the closest pair.
• Cut interpretation: if we cut the dendrogram at height 0.9, then for each point x_i there is at least one other point x_j in its cluster with d_ij ≤ 0.9.
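As a minimal sketch (the two toy groups and the Euclidean distance are assumptions), the single linkage score is just the smallest cross-group distance:

```python
import numpy as np

xG = np.array([[0.0, 0.0], [0.5, 0.2]])      # points with indices in G
xH = np.array([[3.0, 3.0], [2.5, 2.8]])      # points with indices in H

# d_single(G, H) = min over i in G, j in H of ||x_i - x_j||
d_single = min(np.linalg.norm(xi - xj) for xi in xG for xj in xH)
print(d_single)
```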

Complete Linkage

• In complete linkage (i.e., farthest-neighbour linkage), the dissimilarity between G, H is the largest dissimilarity between two points in opposite groups:

d_complete(G, H) = max_{i ∈ G, j ∈ H} d_ij

• The complete linkage score d_complete(G, H) is the distance of the farthest pair.
• Cut interpretation: if we cut the dendrogram at height 0.9, then for each point x_i, every other point x_j in its cluster satisfies d_ij ≤ 0.9.
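Under the same assumed toy setup, complete linkage simply replaces the minimum by the maximum over cross-group pairs:

```python
import numpy as np
from scipy.spatial.distance import cdist

G = np.array([[0.0, 0.0], [0.5, 0.2]])
H = np.array([[3.0, 3.0], [2.5, 2.8]])

# d_complete(G, H): largest pairwise distance across the two groups.
d_complete = cdist(G, H).max()
print(d_complete)
```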

Average Linkage

• In average linkage, the dissimilarity between G, H is the average dissimilarity over all pairs of points in opposite groups:

d_average(G, H) = (1 / (n_G n_H)) Σ_{i ∈ G, j ∈ H} d_ij

• The average linkage score d_average(G, H) is the average distance across all pairs.
• There is not really any good cut interpretation.
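Again under the same assumed toy setup, average linkage averages over all n_G × n_H cross-group pairs:

```python
import numpy as np
from scipy.spatial.distance import cdist

G = np.array([[0.0, 0.0], [0.5, 0.2]])
H = np.array([[3.0, 3.0], [2.5, 2.8]])

# d_average(G, H): mean of all pairwise distances between the two groups.
d_average = cdist(G, H).mean()
print(d_average)
```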

Common Properties

Single, complete and average linkage share the following properties:

• These linkages operate on the dissimilarities d_ij, and don't need the points x_1, x_2, ..., x_n to lie in Euclidean space.
• Running agglomerative clustering with any of these linkages produces a dendrogram with no inversions.
• The second property can be stated as: the dissimilarity scores between merged clusters only increase as we run the algorithm.
• This means that we can draw a proper dendrogram, where the height of a parent node is always higher than the heights of its daughters.

Shortcomings of Single and Complete Linkage

Single and complete linkage can have some practical problems:

• Single linkage suffers from chaining. In order to merge two groups, we only need one pair of points to be close, irrespective of all the others. Therefore clusters can be too spread out, and not compact enough.
• Complete linkage avoids chaining, but suffers from crowding. Because its score is based on the worst-case dissimilarity between pairs, a point can end up closer to points in other clusters than to points in its own cluster. Clusters are compact, but not far enough apart.
• Average linkage tries to strike a balance. It uses the average pairwise dissimilarity, so clusters tend to be relatively compact and relatively far apart.

Shortcomings of Average Linkage

Average linkage isn't perfect; it has its own problems:

• It is not clear what properties the resulting clusters have when we cut an average linkage tree at a given height h. Single and complete linkage trees each had a simple cut interpretation.
• The results of average linkage clustering can change under a monotone increasing transformation of the dissimilarities: if g is such that g(x) ≤ g(y) whenever x ≤ y, and we use the dissimilarities g(d_ij) instead of d_ij, then we can get different answers. (Single and complete linkage depend only on the ordering of the d_ij, so they are unaffected.)
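A tiny constructed example (the numbers are mine, chosen only to illustrate the point) shows how squaring the dissimilarities, a monotone increasing transformation, can change which merge average linkage prefers, while single and complete linkage are unaffected:

```python
# Dissimilarities from a point g to the two points of a group H, and to a single point k.
d_g_h = [1.0, 5.0]          # average 3.0  -> average linkage merges {g} with H first
d_g_k = [3.5]               # average 3.5

avg = lambda ds: sum(ds) / len(ds)
print(avg(d_g_h), avg(d_g_k))              # 3.0 < 3.5

# Apply the monotone increasing transformation g(x) = x**2 to every dissimilarity.
sq = lambda ds: [x ** 2 for x in ds]
print(avg(sq(d_g_h)), avg(sq(d_g_k)))      # 13.0 > 12.25 -> now {g} merges with {k} first

# Single and complete linkage compare only minima or maxima, whose ordering is
# preserved by any monotone increasing transformation, so their choice is unchanged.
```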

Centroid Linkage

• Centroid linkage is also commonly used. Let x̄_G, x̄_H denote the group averages (centroids) for G, H. Then:

d_centroid(G, H) = d(x̄_G, x̄_H)

• The centroid linkage score d_centroid(G, H) is the distance between the group centroids (i.e., the group averages).
• There is not really any good cut interpretation.
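A minimal sketch of the centroid score (with assumed toy groups), compared against the average-linkage score to emphasise that the distance of the means is not the mean of the distances:

```python
import numpy as np
from scipy.spatial.distance import cdist

G = np.array([[0.0, 0.0], [0.5, 0.2]])
H = np.array([[3.0, 3.0], [2.5, 2.8]])

# d_centroid(G, H): distance between the group averages (centroids).
d_centroid = np.linalg.norm(G.mean(axis=0) - H.mean(axis=0))

# For comparison: average linkage is the mean of the pairwise distances instead.
d_average = cdist(G, H).mean()
print(d_centroid, d_average)
```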

Shortcomings of Centroid Linkage

• Centroid linkage can produce dendrograms with inversions, which really messes up the visualization.
• Even if we were lucky enough to have no inversions, there is still no interpretation for the clusters resulting from cutting the tree.
• The answers change with a monotone transformation of the dissimilarity measure.

Linkages in a nutshell

Linkage   | No inversions? | Unchanged with monotone transformations? | Cut interpretation? | Note
----------|----------------|------------------------------------------|---------------------|----------
Single    | Yes            | Yes                                      | Yes                 | Chaining
Complete  | Yes            | Yes                                      | Yes                 | Crowding
Average   | Yes            | No                                       | No                  |
Centroid  | No             | No                                       | No                  | Simple

• This doesn't tell us which linkage is best.
• Choosing a linkage can be very situation dependent.
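To tie the table together, the sketch below runs SciPy's agglomerative clustering with each of the four linkages on the same assumed toy data and cuts every tree into two clusters, so their behaviour can be compared side by side; the data, the cluster count and the use of SciPy are all my own choices, not part of the slides.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two loose blobs plus a stray point, just to give the linkages something to disagree on.
X = np.vstack([rng.normal(0, 0.5, size=(10, 2)),
               rng.normal(4, 0.5, size=(10, 2)),
               [[2.0, 2.0]]])

for method in ["single", "complete", "average", "centroid"]:
    Z = linkage(X, method=method)            # centroid linkage requires Euclidean distances
    labels = fcluster(Z, t=2, criterion="maxclust")
    print(f"{method:9s}", labels)
```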
