An Overview of Market Basket Analysis
Table of Contents
1. What's a Market Basket Analysis (MBA)?
2. What's an Association Rule?
3. Most Common MBA Algorithms
4. Apriori Algorithm
4.1. Example
4.2. Apriori Algorithms: Sample Code
4.3. Apriori Alternative Algorithms
5. FP-Growth
6. Apriori vs. FP-Growth
7. What's Lift in MBA?
7.1. Lift Calculation in Python
8. Market Basket Post Analysis
8.1. Product Affinity Index
8.2. Leverage Ratio
8.3. Conviction Analysis
9. Market Basket Analysis: Applications
9.1. MBA in Product Assortment
1. What's a Market Basket Analysis (MBA)?
Market Basket Analysis (MBA) is a data mining technique used to discover associations or relationships between items that are frequently purchased together. It analyzes customer transactions or purchase data to uncover patterns and trends, which can be used to inform marketing strategies, product placement, promotions, or cross-selling opportunities.
The main idea behind Market Basket Analysis is that if a customer buys a certain set of items, they are more likely to buy another set of related items. For example, if a customer buys bread and butter, they may also buy milk and eggs.
One of the most popular algorithms used in Market Basket Analysis is the Apriori algorithm, which generates association rules based on the support, confidence, and lift measures.
Market Basket Analysis has various applications in retail, e-commerce, and other industries. Some common use cases include:
1. Product placement: Identifying items that are frequently bought together can help with organizing store layouts or online catalogs to facilitate cross-selling and upselling.
2. Promotions: By understanding the relationships between products, businesses can create targeted promotions or bundle deals to encourage additional purchases.
3. Inventory management: Market Basket Analysis can provide insights into which items should be stocked together or reordered simultaneously to optimize inventory management.
4. Personalized recommendations: By analyzing customer purchase history, businesses can generate personalized recommendations to improve customer experience and increase sales.
2. What's an Association Rule?
An association rule is a rule that identifies relationships or patterns between items in a dataset, usually in the context of transactional data, such as purchase histories in retail or e-commerce settings.
Association rules are a key component of Market Basket Analysis. An association rule is typically represented in the form X => Y, where X is the antecedent (the set of items on the left side of the rule) and Y is the consequent (the set of items on the right side of the rule). The rule suggests that when the items in X are purchased, the items in Y are also likely to be purchased.
For example, an association rule could be:
Bread => Butter
This rule implies that when customers buy bread, they are also likely to buy butter.
Association rules are often evaluated using three measures: support, confidence, and lift. These measures help determine the strength and significance of the associations.
1. Support:
(a) The proportion of transactions that contain both the antecedent (X) and the consequent (Y).
(b) It indicates the popularity or frequency of the rule in the dataset.
2. Confidence:
(a) The proportion of transactions with the antecedent (X) that also contain the consequent (Y).
(b) It measures the likelihood of the consequent being purchased when the antecedent is purchased.
3. Lift:
(a) The ratio of the observed support of both antecedent and consequent to the expected support if the antecedent and consequent were independent.
(b) Lift indicates the strength of the relationship between the antecedent and consequent, with a value greater than 1 suggesting a positive association and a value less than 1 suggesting a negative association.
3. Most Common MBA Algorithms
The two most common algorithms for Market Basket Analysis are the Apriori algorithm and the FP-Growth (Frequent Pattern Growth) algorithm. Both algorithms are used to identify frequent itemsets and generate association rules. Here's a comparison between the two algorithms in terms of scalability:
1. Apriori Algorithm:
The Apriori algorithm is based on the Apriori principle, which states that if an itemset is frequent, then all its subsets must also be frequent. This principle allows the algorithm to prune the search space by removing candidate itemsets that have infrequent subsets.
The Apriori algorithm works in an iterative, level-wise manner.
* It starts by finding frequent 1-itemsets, then extends the frequent itemsets to 2-itemsets, and so on. At each step, the algorithm generates candidate itemsets and prunes them based on the Apriori principle.
Scalability:
* The main drawback of the Apriori algorithm is that it generates a large number of candidate itemsets and requires one pass over the entire dataset per level to calculate the support for each candidate. This can result in poor scalability, especially for large datasets or when dealing with a high number of distinct items.
2. FP-Growth Algorithm:
The FP-Growth algorithm is an improvement over the Apriori algorithm in terms of scalability. It avoids the generation of a large number of candidate itemsets by using a compact data structure called the FP-Tree (Frequent Pattern Tree) to store the transaction data.
The FP-Growth algorithm works in two main steps:
1. It constructs the FP-Tree from the input dataset.
2. It extracts frequent itemsets from the FP-Tree using a divide-and-conquer approach.
Scalability:
* The FP-Growth algorithm is more scalable than the Apriori algorithm: it needs only two passes over the dataset (one to count items, one to build the tree) and doesn't generate a large number of candidate itemsets.
* The FP-Tree allows the algorithm to explore the search space more efficiently, which results in better performance for large datasets and high numbers of distinct items.
In summary, the FP-Growth algorithm is generally considered more scalable and efficient than the Apriori algorithm for Market Basket Analysis, especially for large datasets and when dealing with a high number of distinct items.
However, both algorithms have their pros and cons, and the choice of algorithm depends on the specific requirements and constraints of the problem being solved.
4. Apriori Algorithm
The Apriori algorithm is a popular data mining technique used for market basket analysis. It aims to uncover frequent itemsets in transactional databases (such as retail store purchase data) and derive association rules between those items.
The Apriori algorithm works based on the principle that every subset of a frequent itemset must also be frequent. The algorithm can be broken down into the following steps:
1. Set a minimum support threshold: The minimum support threshold is a user-defined parameter that determines the minimum frequency an itemset must have to be considered significant. It is typically expressed as a percentage or proportion of the total number of transactions.
2. Generate candidate itemsets: Initially, each item in the dataset is considered as a single-item candidate itemset (also known as a 1-itemset).
3. Calculate the support of candidate itemsets: Support is a measure of how frequently an itemset appears in the dataset. The support of an itemset X is defined as:
Support(X) = (Number of transactions containing X) / (Total number of transactions)
4. Retain only frequent itemsets: An itemset is considered frequent if its support is greater than or equal to the minimum support threshold.
5. Generate higher-order candidate itemsets: To generate higher-order itemsets (k-itemsets), we join frequent (k-1)-itemsets with themselves. Any candidate that contains an infrequent (k-1)-subset is discarded without counting, because by the Apriori property it cannot be frequent.
6. Repeat steps 3-5: The process of generating candidate itemsets, calculating their support, and retaining frequent itemsets is repeated until no new frequent itemsets are found.
7. Derive association rules: Association rules are created from the frequent itemsets.
The rules must satisfy a user-defined minimum confidence threshold. Confidence measures the likelihood that a transaction containing the antecedent (X) also contains the consequent (Y) of a rule. It is calculated as:
Confidence(X => Y) = Support(X, Y) / Support(X)
4.1. Example
Consider a dataset with five transactions:
* Bread, Milk
* Bread, Diapers, Beer, Eggs
* Milk, Diapers, Beer, Coke
* Bread, Milk, Diapers, Beer
* Bread, Milk, Diapers, Coke
Assume a minimum support threshold of 40% and a minimum confidence threshold of 60%.
Steps 1-4: Find frequent 1-itemsets:
* Bread: 4/5 = 80%
* Milk: 4/5 = 80%
* Diapers: 4/5 = 80%
* Beer: 3/5 = 60%
* Eggs: 1/5 = 20%
* Coke: 2/5 = 40%
Only Eggs does not meet the support threshold, so we exclude it from further analysis.
Steps 5-6: Find frequent 2-itemsets:
* Bread, Milk: 3/5 = 60%
* Bread, Diapers: 3/5 = 60%
* Bread, Beer: 2/5 = 40%
* Bread, Coke: 1/5 = 20%
* Milk, Diapers: 3/5 = 60%
* Milk, Beer: 2/5 = 40%
* Milk, Coke: 2/5 = 40%
* Diapers, Beer: 3/5 = 60%
* Diapers, Coke: 2/5 = 40%
* Beer, Coke: 1/5 = 20%
The pairs (Bread, Coke) and (Beer, Coke) do not meet the support threshold.
4.2. Apriori Algorithms: Sample Code
Here's sample Python code implementing the Apriori algorithm using the "mlxtend" library:
# Install the library first: pip install mlxtend
# (in a notebook cell: !pip install mlxtend)

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Sample transaction dataset
transactions = [
    ['A', 'B', 'D', 'E'],
    ['B', 'C', 'E'],
    ['A', 'B', 'D', 'E'],
    ['A', 'B', 'C', 'E'],
    ['A', 'B', 'C', 'D', 'E'],
    ['B', 'C', 'D'],
]

# Transform the dataset to a one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Define minimum support threshold (proportion of transactions)
min_support = 0.5

# Generate frequent itemsets using Apriori
frequent_itemsets = apriori(df, min_support=min_support, use_colnames=True)

# Define minimum confidence threshold
min_confidence = 0.7

# Generate association rules from frequent itemsets
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)

# Print frequent itemsets and association rules
print("Frequent Itemsets:")
print(frequent_itemsets)
print("\nAssociation Rules:")
print(rules)
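To make the level-wise procedure from Section 4 concrete, the following is a minimal from-scratch sketch applied to the five transactions of the example in Section 4.1. It is not how mlxtend implements Apriori; the function names `support_count` and `apriori_sketch` are illustrative, and the threshold is given as an absolute count (2 of 5 transactions, i.e. 40%) as a simplifying assumption.

```python
from itertools import combinations

# Transactions from the worked example in Section 4.1
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori_sketch(transactions, min_count):
    """Return {frozenset: count} for every itemset with count >= min_count."""
    items = {item for t in transactions for item in t}

    # Level 1: frequent single items
    current = {}
    for item in items:
        candidate = frozenset([item])
        n = support_count(candidate, transactions)
        if n >= min_count:
            current[candidate] = n
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: keep only candidates that reach the threshold
        current = {}
        for c in candidates:
            n = support_count(c, transactions)
            if n >= min_count:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


# 2 of 5 transactions corresponds to the 40% support threshold
frequent = apriori_sketch(transactions, min_count=2)
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{count}/{len(transactions)}")
```

The prune step is where the Apriori property pays off: a candidate such as {Milk, Beer, Coke} is dropped without scanning the data because its subset {Beer, Coke} is already known to be infrequent.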
4.3. Apriori Alternative Algorithms
Another popular library for implementing the Apriori algorithm is "efficient_apriori". It provides an easy-to-use, pure-Python implementation that takes the transactions directly as a list of tuples, so no one-hot encoding step is needed, and it returns the frequent itemsets and the rules in a single call. The library is designed with efficiency in mind, but whether it outperforms other libraries depends on the dataset, so it is worth benchmarking on your own data.
Below is sample Python code implementing the Apriori algorithm using the "efficient_apriori" library:
# Install the library first: pip install efficient_apriori
# (in a notebook cell: !pip install efficient_apriori)

from efficient_apriori import apriori

# Sample transaction dataset
transactions = [
    ('A', 'B', 'D', 'E'),
    ('B', 'C', 'E'),
    ('A', 'B', 'D', 'E'),
    ('A', 'B', 'C', 'E'),
    ('A', 'B', 'C', 'D', 'E'),
    ('B', 'C', 'D'),
]

# Define minimum support and confidence thresholds
min_support = 0.5
min_confidence = 0.7

# Generate frequent itemsets and association rules using Apriori
itemsets, rules = apriori(transactions, min_support=min_support, min_confidence=min_confidence)

# Print frequent itemsets and association rules
print("Frequent Itemsets:")
print(itemsets)
print("\nAssociation Rules:")
print(rules)
5. FP-Growth
The FP-Growth (Frequent Pattern Growth) algorithm is a frequent itemset mining technique designed to overcome the limitations of the Apriori algorithm, particularly its scalability issues. The FP-Growth algorithm uses a compact data structure called FP-Tree (Frequent Pattern Tree) to store the transaction data and efficiently mine frequent itemsets. Here's a step-by-step explanation of the FP-Growth algorithm:
1. Scan the dataset to find the support count of individual items. Remove items that don't meet the minimum support threshold.
2. Order the remaining items in descending order based on their support counts. This order is called the F-list.
3. Create the FP-Tree:
* Initialize an empty FP-Tree with a root node called "null".
* For each transaction in the dataset, drop the infrequent items, sort the remaining items in F-list order, and insert them into the FP-Tree as a path starting at the root.
· Where the path follows nodes that already exist (a shared prefix), increment the support count of those nodes.
· Where the path leaves the existing tree, create new nodes for the remaining items and set the support count of each new node to 1.
* Link nodes containing the same item across different branches using node-link pointers.
· This helps in efficient traversal during the mining process.
4. Mine the FP-Tree to extract frequent itemsets:
* Start with the lowest support item in the F-list.
* Follow the node-link pointers to find all occurrences of the item in the FP-Tree.
* For each occurrence, trace the path from the item node to the root node. The set of these prefix paths, each weighted by the count of the item node, is the item's conditional pattern base: a sub-dataset of the items that co-occur with it.
* Create a conditional FP-Tree for each item using the conditional pattern bases. This tree represents the frequent co-occurrences of the item with other items.
* Recursively mine the conditional FP-Tree to find frequent itemsets that include the item being considered.
* Continue the process for all items in the F-list.
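The tree-building steps above can be sketched in plain Python. This is a minimal illustration rather than a full FP-Growth implementation: it builds the FP-Tree and extracts one conditional pattern base, but does not perform the recursive mining. The names `FPNode`, `build_fp_tree`, and `conditional_pattern_base` are hypothetical, a dictionary of node lists stands in for the node-link pointers, and ties in the F-list are broken alphabetically as an assumption to keep the result deterministic.

```python
from collections import Counter


class FPNode:
    """One FP-Tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Build an FP-Tree; return (root, header), header maps item -> its nodes."""
    # Pass 1: count items and keep only the frequent ones
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}
    # F-list rank: descending support, ties broken alphabetically (assumption)
    f_list = sorted(frequent, key=lambda it: (-frequent[it], it))
    rank = {item: i for i, item in enumerate(f_list)}

    root = FPNode(None)  # the "null" root
    header = {item: [] for item in f_list}  # stands in for node-link pointers
    # Pass 2: insert each transaction's frequent items in F-list order
    for t in transactions:
        ordered = sorted((it for it in set(t) if it in rank), key=rank.get)
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1  # shared prefixes accumulate counts
            node = child
    return root, header


def conditional_pattern_base(item, header):
    """Prefix paths leading to `item`, each paired with that node's count."""
    base = []
    for node in header[item]:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        base.append((list(reversed(path)), node.count))
    return base


transactions = [
    ['A', 'B', 'D', 'E'],
    ['B', 'C', 'E'],
    ['A', 'B', 'D', 'E'],
    ['A', 'B', 'C', 'E'],
    ['A', 'B', 'C', 'D', 'E'],
    ['B', 'C', 'D'],
]
root, header = build_fp_tree(transactions, min_count=3)
for path, count in conditional_pattern_base('D', header):
    print(path, count)
```

Because every transaction contains B, all six transactions share a single B node under the root, which is the compression the FP-Tree is designed to achieve.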
Here's a sample Python code implementing the FP-Growth algorithm using the "pyfpgrowth" library:
# Install the library first: pip install pyfpgrowth
# (in a notebook cell: !pip install pyfpgrowth)

import pyfpgrowth

# Sample transaction dataset
transactions = [
    ['A', 'B', 'D', 'E'],
    ['B', 'C', 'E'],
    ['A', 'B', 'D', 'E'],
    ['A', 'B', 'C', 'E'],
    ['A', 'B', 'C', 'D', 'E'],
    ['B', 'C', 'D'],
]

# Define minimum support threshold
# (an absolute transaction count here, not a proportion)
min_support = 3

# Generate frequent itemsets using FP-Growth
patterns = pyfpgrowth.find_frequent_patterns(transactions, min_support)

# Define minimum confidence threshold
min_confidence = 0.7

# Generate association rules from frequent itemsets
rules = pyfpgrowth.generate_association_rules(patterns, min_confidence)

# Print frequent itemsets and association rules
print("Frequent Itemsets:")
print(patterns)
print("\nAssociation Rules:")
print(rules)
6. Apriori vs. FP-Growth
The key differences between the FP-Growth and Apriori methods are:
Data structure
FP-Growth uses a compact data structure called the FP-Tree (Frequent Pattern Tree) to represent the transaction database.
* This tree structure allows FP-Growth to efficiently traverse and mine the frequent itemsets.
In contrast, Apriori uses a more straightforward candidate generation and pruning approach, iterating through itemsets without a tree structure.
Candidate generation
The Apriori algorithm generates candidate itemsets in multiple iterations by joining the frequent (k-1)-itemsets with themselves and then prunes the candidates that do not meet the minimum support threshold.
* This process can lead to a large number of candidate itemsets, especially when the dataset has a high number of distinct items, which can negatively impact the algorithm's performance.
On the other hand, the FP-Growth algorithm does not generate candidate itemsets explicitly.
* Instead, it constructs the FP-Tree and mines frequent itemsets directly from the tree by recursively traversing and dividing it into smaller conditional FP-Trees.
* This approach eliminates the need for candidate generation and pruning, resulting in a more efficient mining process.
Scalability and performance
The Apriori algorithm can be slow and memory-intensive when handling large datasets or low minimum support thresholds, primarily due to the multiple iterations, candidate generation, and pruning steps. The FP-Growth algorithm, with its compact FP-Tree representation and elimination of candidate generation, tends to be more scalable and efficient, especially in large-scale datasets or when the minimum support threshold is low.
Complexity
While the FP-Growth algorithm is more efficient, it is also more complex to understand and implement compared to the Apriori algorithm.
The Apriori algorithm's simplicity makes it easier to understand and implement, but it may not be the best choice for large-scale datasets or low support thresholds due to its performance limitations.
In summary, the FP-Growth method is an alternative to the Apriori method for discovering frequent itemsets in transactional databases. It employs a different data structure (FP-Tree) and does not rely on explicit candidate generation, resulting in better scalability and performance, especially for large datasets or low support thresholds. However, FP-Growth is more complex to understand and implement compared to the Apriori method.
7. What's Lift in MBA?
Lift is a metric used in market basket analysis to measure the strength of association between items in an association rule. It provides insights into how much more likely an item (or set of items) is to be purchased when another item (or set of items) is already in the basket, compared to its probability of being purchased independently.
Lift is calculated using the following formula:
Lift(X => Y) = Confidence(X => Y) / Support(Y)
where:
* X is the antecedent (the item or set of items already in the basket).
* Y is the consequent (the item or set of items we want to assess the association with).
* Confidence(X => Y) is the probability of Y being in the basket, given X is already there.
* Support(Y) is the probability of Y being in the basket independently (without considering X).
If Lift(X => Y) > 1, the presence of X in the basket increases the likelihood of Y also being in the basket. Lift(X => Y) = 1 indicates that items X and Y are purchased independently, and there is no association between them. If Lift(X => Y) < 1, the presence of X in the basket decreases the likelihood of Y being in the basket.
Example: Using the previous example with the following transactions:
* Bread, Milk
* Bread, Diapers, Beer, Eggs
* Milk, Diapers, Beer, Coke
* Bread, Milk, Diapers, Beer
* Bread, Milk, Diapers, Coke
Let's calculate the lift for the rule Bread => Milk.
* Support(Bread) = 4/5 = 80%
* Support(Milk) = 4/5 = 80%
* Support(Bread, Milk) = 3/5 = 60%
* Confidence(Bread => Milk) = Support(Bread, Milk) / Support(Bread) = 0.60 / 0.80 = 0.75
Now, we can calculate the lift:
Lift(Bread => Milk) = Confidence(Bread => Milk) / Support(Milk) = 0.75 / 0.80 = 0.9375
Since the lift is less than 1, the presence of Bread in the basket slightly decreases the likelihood of Milk being in the basket.
7.1. Lift Calculation in Python
Now, let's demonstrate how to calculate the lift using Python:
transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diapers', 'Beer', 'Eggs'],
    ['Milk', 'Diapers', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Diapers', 'Coke'],
]

def support(item, transactions):
    # Fraction of transactions that contain the item
    count = 0
    for transaction in transactions:
        if item in transaction:
            count += 1
    return count / len(transactions)

def joint_support(item1, item2, transactions):
    # Fraction of transactions that contain both items
    count = 0
    for transaction in transactions:
        if item1 in transaction and item2 in transaction:
            count += 1
    return count / len(transactions)

def confidence(item1, item2, transactions):
    # Support(item1, item2) / Support(item1)
    p_item1 = support(item1, transactions)
    p_item1_item2 = joint_support(item1, item2, transactions)
    if p_item1 == 0:
        return 0
    return p_item1_item2 / p_item1

def lift(item1, item2, transactions):
    # Confidence(item1 => item2) / Support(item2)
    p_item2 = support(item2, transactions)
    conf_item1_item2 = confidence(item1, item2, transactions)
    if p_item2 == 0:
        return 0
    return conf_item1_item2 / p_item2

antecedent = 'Bread'
consequent = 'Milk'

lift_value = lift(antecedent, consequent, transactions)
print(f"Lift({antecedent} -> {consequent}) = {lift_value}")
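A short derivation may help show why lift does not depend on the direction of the rule. Substituting the definition of confidence into the lift formula gives an expression that is symmetric in X and Y; the second line applies it to the Bread and Milk figures from the example. The notation Support(X, Y) for the joint support follows the worked example above.

```latex
\begin{align*}
\mathrm{Lift}(X \Rightarrow Y)
  &= \frac{\mathrm{Confidence}(X \Rightarrow Y)}{\mathrm{Support}(Y)}
   = \frac{\mathrm{Support}(X, Y)}{\mathrm{Support}(X)\,\mathrm{Support}(Y)}
   = \mathrm{Lift}(Y \Rightarrow X) \\[4pt]
\mathrm{Lift}(\text{Bread} \Rightarrow \text{Milk})
  &= \frac{0.6}{0.8 \times 0.8} = 0.9375
   = \mathrm{Lift}(\text{Milk} \Rightarrow \text{Bread})
\end{align*}
```

Confidence, by contrast, is directional in general, so two rules built from the same itemset can share a lift while having different confidences.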
8. Market Basket Post Analysis
The results of an Apriori market basket analysis can serve as a foundation for further analyses and investigations into customer behavior, product performance, and business optimization. Here are some additional analyses you can perform based on the results:
* Lift analysis: Calculate the lift for each association rule to understand the strength of the relationship between items. This can help you identify the most significant product associations and prioritize marketing efforts, promotions, and product placements accordingly.
* Conviction analysis: Conviction is another measure of the association rule's strength, which can be calculated to identify important associations. Conviction measures how strongly the consequent (Y) depends on the antecedent (X) and can reveal associations that might not be evident from the lift or confidence measures alone.
* Leverage and leverage ratio: Calculate the leverage and leverage ratio for each association rule to assess the difference between the observed support of the itemset and the support that would be expected if the items were independently distributed. These measures can help you identify associations that have a higher-than-expected frequency.
* Temporal analysis: Analyze the results over time to identify trends, seasonality, or changes in customer behavior. This can help you adapt your product assortment, promotions, and marketing efforts to evolving customer needs and preferences.
* Clustering analysis: Use clustering algorithms to group customers based on their purchase patterns, as revealed by the Apriori analysis. This can help you create more targeted marketing campaigns, improve customer segmentation, and personalize the shopping experience.
* Comparative analysis: Compare the results of market basket analysis across different locations, customer segments, or time periods to identify variations in product associations and customer behavior.
This can help you tailor your strategies to cater to specific markets or customer groups.
* Profitability analysis: Incorporate profit margins and costs associated with each product in the analysis to determine the most profitable product associations. This can help you prioritize marketing efforts, product placements, and promotions that maximize revenue and profitability.
* Product affinity index: Calculate the product affinity index for each association rule, which measures the likelihood of items being purchased together compared to being purchased separately. This can help you identify the most attractive product pairs or sets to promote.
* Itemset visualization: Create visual representations of the frequent itemsets and association rules using graphs, heatmaps, or other visualization techniques. This can help you better understand the relationships between items and communicate the results to stakeholders.
* Text mining: If your dataset includes textual data such as product reviews or customer feedback, you can perform text mining to analyze the sentiment, topics, or themes associated with specific products or product associations. This can provide additional insights into customer preferences and potential areas for improvement.
8.1. Product Affinity Index
The Product Affinity Index (PAI) is a metric used to measure the strength of the relationship between two products. It compares the likelihood of two products being purchased together with the likelihood expected if they were purchased independently. A PAI above 1 indicates a positive association, a PAI of 1 indicates independence, and a PAI below 1 indicates a negative association. The higher the PAI, the stronger the association between the products.
The formula for calculating the Product Affinity Index is as follows:
PAI(X, Y) = P(X, Y) / (P(X) * P(Y))
Here:
* X and Y are two products.
* P(X, Y) is the joint probability of X and Y being purchased together, which is the same as Support(X, Y) in the Apriori algorithm.
* P(X) and P(Y) are the individual probabilities of X and Y being purchased, which are the same as Support(X) and Support(Y) in the Apriori algorithm.
To calculate the PAI, follow these steps:
1. Calculate the support of individual items (X and Y) by counting the number of transactions containing each item and dividing by the total number of transactions.
2. Calculate the support of the itemset (X, Y) by counting the number of transactions containing both X and Y, and dividing by the total number of transactions.
3. Apply the PAI formula to determine the Product Affinity Index.
Example: Consider a dataset with the following transactions:
* Bread, Milk
* Bread, Diapers, Beer, Eggs
* Milk, Diapers, Beer, Coke
* Bread, Milk, Diapers, Beer
* Bread, Milk, Diapers, Coke
Let's calculate the PAI for the products Bread and Milk.
Step 1:
* P(Bread) = Support(Bread) = 4/5 = 0.8
* P(Milk) = Support(Milk) = 4/5 = 0.8
Step 2:
* P(Bread, Milk) = Support(Bread, Milk) = 3/5 = 0.6
Step 3:
* PAI(Bread, Milk) = P(Bread, Milk) / (P(Bread) * P(Milk)) = 0.6 / (0.8 * 0.8) = 0.9375
* A PAI of 0.9375 is slightly below 1, so Bread and Milk appear together marginally less often than would be expected if they were purchased independently. This matches the lift of 0.9375 computed in Section 7, because the PAI formula is algebraically the same as the lift formula.
Here's a Python code snippet that demonstrates how to calculate the Product Affinity Index (PAI) for a given dataset of transactions:
transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diapers', 'Beer', 'Eggs'],
    ['Milk', 'Diapers', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Diapers', 'Coke'],
]

def support(item, transactions):
    # Fraction of transactions that contain the item
    count = 0
    for transaction in transactions:
        if item in transaction:
            count += 1
    return count / len(transactions)

def joint_support(item1, item2, transactions):
    # Fraction of transactions that contain both items
    count = 0
    for transaction in transactions:
        if item1 in transaction and item2 in transaction:
            count += 1
    return count / len(transactions)

def product_affinity_index(item1, item2, transactions):
    # P(item1, item2) / (P(item1) * P(item2))
    p_item1 = support(item1, transactions)
    p_item2 = support(item2, transactions)
    p_item1_item2 = joint_support(item1, item2, transactions)
    if p_item1 * p_item2 == 0:
        return 0
    return p_item1_item2 / (p_item1 * p_item2)

item1 = 'Bread'
item2 = 'Milk'

pai = product_affinity_index(item1, item2, transactions)
print(f"PAI({item1}, {item2}) = {pai}")
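In practice the index is usually wanted for every pair at once, so the pairs can be ranked. The sketch below computes it for all item pairs in the same dataset. The function name `pair_affinities` and the `min_count` filter are illustrative assumptions; the filter is there because a pair seen in only one transaction can show a high index on very little evidence.

```python
from itertools import combinations

transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diapers', 'Beer', 'Eggs'],
    ['Milk', 'Diapers', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Diapers', 'Coke'],
]


def pair_affinities(transactions, min_count=2):
    """Return {(item_a, item_b): PAI} for pairs bought together >= min_count times."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    items = sorted({item for b in baskets for item in b})
    single = {item: sum(item in b for b in baskets) for item in items}

    result = {}
    for a, b in combinations(items, 2):
        both = sum(1 for basket in baskets if a in basket and b in basket)
        if both < min_count:
            continue  # too little evidence for a stable estimate
        # P(a, b) / (P(a) * P(b)), written with counts to limit rounding error
        result[(a, b)] = both * n / (single[a] * single[b])
    return result


pai = pair_affinities(transactions)
for pair, value in sorted(pai.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, round(value, 4))
```

Pairs are keyed in alphabetical order, so the Bread and Milk value is found under `('Bread', 'Milk')`.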
8.2. Leverage Ratio
The leverage ratio, also known as the Piatetsky-Shapiro measure, is a metric used in association rule mining to measure the strength of the association between two items. It compares the observed frequency of the two items being purchased together with the expected frequency if the items were purchased independently. A value of 0 indicates that the items are independent, while positive values indicate a positive association and negative values indicate a negative association. Because it is a difference between two probabilities of this form, the leverage ratio always lies between -0.25 and 0.25.
The formula for calculating the leverage ratio is as follows:
Leverage(X, Y) = P(X, Y) - P(X) * P(Y)
Here:
* X and Y are two items.
* P(X, Y) is the joint probability of X and Y being purchased together, which is the same as Support(X, Y) in the Apriori algorithm.
* P(X) and P(Y) are the individual probabilities of X and Y being purchased, which are the same as Support(X) and Support(Y) in the Apriori algorithm.
Now, let's demonstrate how to calculate the leverage ratio using Python:
transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diapers', 'Beer', 'Eggs'],
    ['Milk', 'Diapers', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Diapers', 'Coke'],
]

def support(item, transactions):
    # Fraction of transactions that contain the item
    count = 0
    for transaction in transactions:
        if item in transaction:
            count += 1
    return count / len(transactions)

def joint_support(item1, item2, transactions):
    # Fraction of transactions that contain both items
    count = 0
    for transaction in transactions:
        if item1 in transaction and item2 in transaction:
            count += 1
    return count / len(transactions)

def leverage_ratio(item1, item2, transactions):
    # P(item1, item2) - P(item1) * P(item2)
    p_item1 = support(item1, transactions)
    p_item2 = support(item2, transactions)
    p_item1_item2 = joint_support(item1, item2, transactions)
    return p_item1_item2 - (p_item1 * p_item2)

item1 = 'Bread'
item2 = 'Milk'

leverage = leverage_ratio(item1, item2, transactions)
print(f"Leverage({item1}, {item2}) = {leverage}")
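As a check on the code, the same value can be worked out by hand from the supports used throughout this article (0.8 for Bread, 0.8 for Milk, 0.6 for the pair). The second line is a rearrangement, not a new definition: it shows how leverage relates to lift, using the lift of 0.9375 for this pair.

```latex
\begin{align*}
\mathrm{Leverage}(\text{Bread}, \text{Milk})
  &= P(\text{Bread}, \text{Milk}) - P(\text{Bread})\,P(\text{Milk})
   = 0.6 - 0.8 \times 0.8 = -0.04 \\[4pt]
\mathrm{Leverage}(X, Y)
  &= P(X)\,P(Y)\,\bigl(\mathrm{Lift}(X \Rightarrow Y) - 1\bigr)
   = 0.64 \times (0.9375 - 1) = -0.04
\end{align*}
```

The Python function may print a value that differs from -0.04 in the last decimal places because of floating-point rounding.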
8.3. Conviction Analysis
Conviction is a metric used in association rule mining to measure the strength of an association rule. It measures how strongly the consequent (Y) depends on the antecedent (X) by comparing the expected frequency of X occurring without Y, if the two were independent, with the observed frequency of X occurring without Y. A conviction of 1 indicates independence, values above 1 indicate that the rule fails less often than chance would predict, and a rule that never fails (confidence of 1) has infinite conviction.
The formula for calculating the conviction of an association rule (X => Y) is as follows:
Conviction(X => Y) = (1 - P(Y)) / (1 - Confidence(X => Y))
Here:
* X is the antecedent, and Y is the consequent of the association rule.
* P(Y) is the probability of Y being purchased, which is the same as Support(Y) in the Apriori algorithm.
* Confidence(X => Y) is the confidence of the association rule, which can be calculated as P(X, Y) / P(X).
Now, let's demonstrate how to calculate the conviction using Python:
transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diapers', 'Beer', 'Eggs'],
    ['Milk', 'Diapers', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Diapers', 'Coke'],
]

def support(item, transactions):
    # Fraction of transactions that contain the item
    count = 0
    for transaction in transactions:
        if item in transaction:
            count += 1
    return count / len(transactions)

def joint_support(item1, item2, transactions):
    # Fraction of transactions that contain both items
    count = 0
    for transaction in transactions:
        if item1 in transaction and item2 in transaction:
            count += 1
    return count / len(transactions)

def confidence(item1, item2, transactions):
    # Support(item1, item2) / Support(item1)
    p_item1 = support(item1, transactions)
    p_item1_item2 = joint_support(item1, item2, transactions)
    if p_item1 == 0:
        return 0
    return p_item1_item2 / p_item1

def conviction(item1, item2, transactions):
    p_item2 = support(item2, transactions)
    conf_item1_item2 = confidence(item1, item2, transactions)
    # A confidence of 1 means the rule never fails: conviction is infinite.
    # Checking the denominator here also avoids a division by zero.
    if 1 - conf_item1_item2 == 0:
        return float('inf')
    return (1 - p_item2) / (1 - conf_item1_item2)

antecedent = 'Bread'
consequent = 'Milk'

conv = conviction(antecedent, consequent, transactions)
print(f"Conviction({antecedent} -> {consequent}) = {conv}")
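The snippets in this section work on single items, whereas association rules generally have itemsets on both sides. As a minimal sketch of how the same formulas extend to that case, the hypothetical helper `rule_metrics` below takes an antecedent and a consequent as collections of items and returns all five measures discussed in this article. It works from transaction counts, and raising an error for an antecedent or consequent that never occurs is a design assumption, not something prescribed by the formulas.

```python
import math

transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diapers', 'Beer', 'Eggs'],
    ['Milk', 'Diapers', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Diapers', 'Coke'],
]


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence, lift, leverage and conviction for X => Y."""
    x, y = frozenset(antecedent), frozenset(consequent)
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    n_x = sum(1 for b in baskets if x <= b)         # baskets containing all of X
    n_y = sum(1 for b in baskets if y <= b)         # baskets containing all of Y
    n_xy = sum(1 for b in baskets if (x | y) <= b)  # baskets containing both
    if n_x == 0 or n_y == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    support = n_xy / n
    confidence = n_xy / n_x
    lift = confidence / (n_y / n)
    leverage = support - (n_x / n) * (n_y / n)
    # A rule that never fails (confidence of 1) has infinite conviction
    conviction = math.inf if n_xy == n_x else (1 - n_y / n) / (1 - confidence)
    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


metrics = rule_metrics({'Diapers', 'Beer'}, {'Milk'}, transactions)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Calling `rule_metrics({'Bread'}, {'Milk'}, transactions)` reproduces the single-item figures used earlier, including a conviction of (1 - 0.8) / (1 - 0.75) = 0.8.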
9. Market Basket Analysis: Applications
The results of an Apriori market basket analysis can provide valuable insights into customer behavior and product associations, which can be used to make data-driven decisions to improve various aspects of your business. Here are some ideas on how to use the results:
* Cross-selling and up-selling: Leverage the discovered product associations to create targeted cross-selling and up-selling strategies, such as offering discounts on related products, bundling products together, or recommending additional items based on the customer's current shopping cart.
* Product placement: Use the insights from the analysis to optimize your store layout or online platform by placing associated products near each other or creating dedicated sections for complementary items. This can encourage customers to purchase related products together, increasing the average transaction value.
* Targeted promotions: Design promotions and marketing campaigns based on the discovered item associations. For example, offer a special promotion on a product when customers purchase another frequently associated product or create personalized offers for specific customer segments based on their purchase history.
* Inventory management: Use the results of the analysis to inform your inventory management strategies. Ensure that products with strong associations are stocked together or nearby in the warehouse or storage area, making it easier to restock shelves and minimizing the risk of out-of-stock situations.
* Customer segmentation: Analyze the purchase patterns of different customer segments to understand their preferences and needs better. Tailor your product assortment, pricing, promotions, and marketing efforts to cater to each segment's specific requirements.
* New product introduction: When considering adding new products to your inventory, use the market basket analysis results to assess how well they might fit with your existing product assortment.
This can help predict potential sales and cross-selling opportunities, ensuring that the new product has a higher chance of success.
* Seasonal or regional trends: Identify seasonal or regional patterns in product associations to tailor your assortment, promotions, and marketing efforts accordingly. This can help you capitalize on specific trends and optimize your offerings for different times of the year or various locations.
* Pricing strategy: Utilize the insights from the analysis to develop pricing strategies that incentivize customers to purchase associated products together. For example, you could offer tiered pricing or quantity discounts on related items.
* Customer support: Enhance customer support by using the insights from the analysis to anticipate customers' needs and preferences. This can help customer support agents offer more relevant product recommendations and solutions during interactions with customers.
* Improve customer experience: Use the results to create a more intuitive and personalized shopping experience for your customers, both online and offline. This can lead to higher customer satisfaction, increased loyalty, and ultimately, higher revenue.
9.1. MBA in Product Assortment
Market basket analysis can be an invaluable tool for optimizing product assortment in a grocery store. By analyzing the relationships and associations between products, you can make data-driven decisions to enhance the customer experience, increase sales, and optimize inventory management. Here's how you can use market basket analysis for product assortment decisions:
* Identify frequently purchased items: By determining the most frequently purchased items (high-support itemsets), you can prioritize these products in your assortment to ensure their availability and visibility. This may include adjusting inventory levels and prominent shelf placements.
* Discover product associations: Analyze the association rules generated by market basket analysis to understand which products are frequently purchased together. Use this information to make strategic decisions, such as:
· Cross-selling: Promote complementary products together (e.g., offering discounts or bundling them) to encourage customers to buy related items, increasing the average transaction value.
· Product placement: Place associated products near each other on store shelves to make it convenient for customers to find and purchase them together. This can improve the shopping experience and potentially increase sales.
· Inventory management: Stock products with strong associations together in the warehouse or storage area to streamline the restocking process and minimize out-of-stock occurrences.
* Optimize promotions: By understanding which products are frequently bought together, you can create targeted promotions that appeal to specific customer segments. For example, if you find that customers who purchase diapers also frequently buy baby food, you can offer a special promotion on baby food when diapers are purchased.
* Analyze product categories: Market basket analysis can help you understand the relationships between product categories, allowing you to optimize the overall product mix. For example, if certain product categories are consistently purchased together, consider allocating more space to those categories in your store layout or online platform.
* Improve customer segmentation: Analyze the product associations for different customer segments to understand their unique preferences and shopping habits. Tailor your product assortment, promotions, and marketing efforts to cater to each segment's specific needs.
* Evaluate new products: When considering the introduction of new products, use market basket analysis to evaluate their potential fit within your existing product assortment.
Analyze the associations between the new product and your current products to predict the impact on sales and the potential for cross-selling opportunities.
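As one possible starting point for the promotion and placement ideas above, the sketch below lists the products most often bought alongside a given item, ranked by the confidence of the rule item => other product. The function name `top_companions` is hypothetical, the dataset is the five-transaction example used throughout this article, and a real analysis would also apply a minimum support filter and check lift before acting on the ranking.

```python
transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diapers', 'Beer', 'Eggs'],
    ['Milk', 'Diapers', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Diapers', 'Coke'],
]


def top_companions(item, transactions, k=3):
    """Return up to k (product, confidence) pairs for rules item => product."""
    # Keep only the baskets that contain the item of interest
    baskets = [set(t) for t in transactions if item in t]
    if not baskets:
        return []
    # Count how often every other product shares a basket with the item
    counts = {}
    for basket in baskets:
        for other in basket - {item}:
            counts[other] = counts.get(other, 0) + 1
    # Highest co-occurrence first; ties are broken alphabetically
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(other, count / len(baskets)) for other, count in ranked[:k]]


for product, conf in top_companions('Diapers', transactions):
    print(f"Diapers => {product}: confidence {conf:.2f}")
```

High confidence alone can be misleading for products that sell well in every basket, which is why the lift and leverage measures from Sections 7 and 8 are worth checking before choosing which pairs to promote.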