Loading graph…

Apriori and Eclat Frequent Itemset Mining

Definition

Frequent itemset mining finds groups of items that appear together in many baskets.

An itemset $X$ is frequent if:

\[\text{support}(X) \ge s_{min}\]

where $s_{min}$ is a minimum support threshold.

Support Count

For itemset $X$:

\[\text{support_count}(X) = \sum_i 1\{X \subseteq I_i\}\]

where $I_i$ is the set of items in basket $i$.

Support as a probability is:

\[\text{support}(X) = \frac{\text{support_count}(X)}{n}\]

Association Rule

A rule has the form:

\[A \to y\]

where $A$ is an antecedent itemset and $y$ is a candidate item.

The confidence is:

\[P(y \mid A) = \frac{P(A \cup \{y\})}{P(A)}\]

This is directly useful for basket completion.

Apriori

Apriori uses the anti-monotonicity property of support:

If an itemset is frequent, then all of its subsets must also be frequent.

Equivalently, if an itemset is not frequent, no larger itemset containing it can be frequent.

This lets Apriori prune the search space.

Eclat

Eclat mines frequent itemsets using transaction-id sets.

For each item, it stores the set of baskets containing that item.

The support of an itemset is found by intersecting transaction-id sets:

\[T(A \cup B) = T(A) \cap T(B)\]

Then:

\[\text{support_count}(A \cup B) = |T(A) \cap T(B)|\]

Apriori vs Eclat

Apriori is often described as candidate-generation over itemsets.

Eclat is often efficient for sparse transaction data because it works through set intersections.

Both are ways to find frequent itemsets; association rules are then derived from those itemsets.

Basket Completion

For basket completion, the useful rule form is:

\[A \to y\]

where:

  • $A$ = items already observed in the basket
  • $y$ = possible remaining item

The estimated completion probability is:

\[\hat P(y \in I \mid A \subseteq I)\]

See

25

25
Ready to start
Apriori and Eclat Frequent Itemset Mining
Session: 1 | Break: Short
Today: 0 sessions
Total: 0 sessions