Apriori and Eclat Frequent Itemset Mining

Definition

Frequent itemset mining finds groups of items that appear together in many baskets.

An itemset $X$ is frequent if:

\[\text{support}(X) \ge s_{min}\]

where $s_{min}$ is a minimum support threshold.

Support Count

For itemset $X$:

\[\text{support_count}(X) = \sum_i 1\{X \subseteq I_i\}\]

where $I_i$ is the set of items in basket $i$.

Support as a probability is:

\[\text{support}(X) = \frac{\text{support_count}(X)}{n}\]

Association Rule

A rule has the form:

\[A \to y\]

where $A$ is an antecedent itemset and $y$ is a candidate item.

The confidence is:

\[P(y \mid A) = \frac{P(A \cup \{y\})}{P(A)}\]

This is directly useful for basket completion.

Apriori

Apriori uses the anti-monotonicity property of support:

If an itemset is frequent, then all of its subsets must also be frequent.

Equivalently, if an itemset is not frequent, no larger itemset containing it can be frequent.

This lets Apriori prune the search space.

Eclat

Eclat mines frequent itemsets using transaction-id sets.

For each item, it stores the set of baskets containing that item.

The support of an itemset is found by intersecting transaction-id sets:

\[T(A \cup B) = T(A) \cap T(B)\]

Then:

\[\text{support_count}(A \cup B) = |T(A) \cap T(B)|\]

Apriori vs Eclat

Apriori is often described as candidate-generation over itemsets.

Eclat is often efficient for sparse transaction data because it works through set intersections.

Both are ways to find frequent itemsets; association rules are then derived from those itemsets.

Basket Completion

For basket completion, the useful rule form is:

\[A \to y\]

where:

$A$ = items already observed in the basket
$y$ = possible remaining item

The estimated completion probability is:

\[\hat P(y \in I \mid A \subseteq I)\]

Apriori and Eclat Frequent Itemset Mining

Definition

Support Count

Association Rule

Apriori

Eclat

Apriori vs Eclat

Basket Completion

See

Conditional Probability

Statistics Review

Data Mining

Apriori and Eclat Frequent Itemset Mining

Definition

Support Count

Association Rule

Apriori

Eclat

Apriori vs Eclat

Basket Completion

See

Conditional Probability

Statistics Review

Data Mining

Sessions by Day

Productivity by Hour

Session Completion Rate

Time Spent by Task

Sessions by Day of Week

Session Duration Distribution