Apriori and Eclat Frequent Itemset Mining
Definition
Frequent itemset mining finds groups of items that appear together in many baskets.
An itemset $X$ is frequent if:
\[\text{support}(X) \ge s_{min}\]where $s_{min}$ is a minimum support threshold.
Support Count
For itemset $X$:
\[\text{support_count}(X) = \sum_i 1\{X \subseteq I_i\}\]where $I_i$ is the set of items in basket $i$.
Support as a probability is:
\[\text{support}(X) = \frac{\text{support_count}(X)}{n}\]Association Rule
A rule has the form:
\[A \to y\]where $A$ is an antecedent itemset and $y$ is a candidate item.
The confidence is:
\[P(y \mid A) = \frac{P(A \cup \{y\})}{P(A)}\]This is directly useful for basket completion.
Apriori
Apriori uses the anti-monotonicity property of support:
If an itemset is frequent, then all of its subsets must also be frequent.
Equivalently, if an itemset is not frequent, no larger itemset containing it can be frequent.
This lets Apriori prune the search space.
Eclat
Eclat mines frequent itemsets using transaction-id sets.
For each item, it stores the set of baskets containing that item.
The support of an itemset is found by intersecting transaction-id sets:
\[T(A \cup B) = T(A) \cap T(B)\]Then:
\[\text{support_count}(A \cup B) = |T(A) \cap T(B)|\]Apriori vs Eclat
Apriori is often described as candidate-generation over itemsets.
Eclat is often efficient for sparse transaction data because it works through set intersections.
Both are ways to find frequent itemsets; association rules are then derived from those itemsets.
Basket Completion
For basket completion, the useful rule form is:
\[A \to y\]where:
- $A$ = items already observed in the basket
- $y$ = possible remaining item
The estimated completion probability is:
\[\hat P(y \in I \mid A \subseteq I)\]