Clustering and Association

Clustering

Clustering is the process of grouping data points into clusters based on their similarity, so that points in the same cluster are more alike than points in different clusters. This technique is useful for identifying patterns and relationships in data without the need for labeled examples. Cluster analysis finds the commonalities between data objects and groups them according to whether those commonalities are present or absent.



Here are some clustering algorithms:

·       K-Means Clustering algorithm

·       Mean-shift algorithm

·       DBSCAN Algorithm

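Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are often mentioned alongside these because they are also unsupervised, but they are not clustering algorithms: PCA reduces dimensionality and ICA separates a signal into independent components.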
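To make the idea of grouping by similarity concrete, here is a minimal sketch of K-Means in plain Python. The function names `squared_distance` and `kmeans`, the sample points, and the hand-picked starting centroids are all assumptions made for illustration. Real implementations usually choose the starting centroids randomly or with a smarter seeding scheme.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Basic K-Means: alternate between assigning points and moving centroids."""
    centroids = list(centroids)
    k = len(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Made-up 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
# Assumption: start from one point of each group to keep the run deterministic.
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

No labels are supplied anywhere: the two groups come purely from the distances between points.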

 

Association

Association rule learning is a machine learning method for discovering interesting relations, called “association rules”, between variables in large databases using some measures of “interestingness”.

It finds sets of items that occur together in the dataset. Retailers use these rules to make marketing decisions, such as which products to place or promote together.

For example, people who buy item X (say, bread) also tend to buy item Y (butter or jam). A typical application of association rules is Market Basket Analysis.

 

Example

Consider a supermarket chain. The management of the chain wants to know whether customers' purchases follow patterns such as the following:

“If a customer buys onions and potatoes together, then he/she is likely to also buy a burger.”

From the standpoint of customer behaviour, this defines an association between the set of products {onion, potato} and the set {burger}. This association is represented in the form of a rule as follows:

{onion, potato} => {burger}

 

The likelihood that a customer who has bought onion and potato also buys a burger is given by the conditional probability

P({burger} | {onion, potato}).

This quantity is called the confidence of the rule.

If this conditional probability is 0.8, then the rule may be stated more precisely as follows: “80% of customers who buy onion and potato also buy burger.”
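The conditional probability can be estimated directly from purchase records by counting. The sketch below is illustrative only: the helper names `support` and `confidence` and the eight baskets are made up, with the baskets chosen so that the rule comes out at 0.8.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from transaction counts.

    Assumes the antecedent occurs at least once; otherwise this divides by zero.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical baskets: 5 contain onion and potato, and 4 of those also contain burger.
transactions = [
    {"onion", "potato", "burger"},
    {"onion", "potato", "burger", "milk"},
    {"onion", "potato", "burger"},
    {"onion", "potato", "burger", "bread"},
    {"onion", "potato"},
    {"milk", "bread"},
    {"burger", "bread"},
    {"onion", "milk"},
]

rule_confidence = confidence(transactions, {"onion", "potato"}, {"burger"})
print(round(rule_confidence, 2))
```

Note that the baskets containing a burger but no onion and potato do not affect the result, because only baskets that contain the left-hand side of the rule are counted in the denominator.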

 

Algorithms

There are several algorithms for generating association rules. Some of the well-known algorithms are listed below:

a)      Apriori algorithm

b)      Eclat algorithm

c)      FP-Growth Algorithm (FP stands for Frequent Pattern)
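As a rough illustration of how the first of these works, here is a simplified sketch of the frequent-itemset stage of Apriori. It builds larger itemsets only from smaller ones that are already frequent. The function name `apriori`, the baskets, and the 0.4 support threshold are assumptions chosen for the example, and the code favours readability over the optimizations a production implementation would use.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def frequent(candidates):
        # Keep only the candidates that appear in enough transactions.
        kept = {}
        for candidate in candidates:
            s = sum(candidate <= t for t in transactions) / n
            if s >= min_support:
                kept[candidate] = s
        return kept

    items = {item for t in transactions for item in t}
    level = frequent({frozenset([item]) for item in items})
    all_frequent = dict(level)
    k = 2
    while level:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent


# Hypothetical baskets for illustration.
baskets = [
    {"onion", "potato", "burger"},
    {"onion", "potato", "burger"},
    {"onion", "potato"},
    {"milk", "bread"},
    {"burger", "bread"},
]

frequent_itemsets = apriori(baskets, min_support=0.4)
for itemset, s in sorted(
    frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
):
    print(sorted(itemset), s)
```

Rules such as {onion, potato} => {burger} are then generated from the frequent itemsets by checking the confidence of each possible split.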



 
