Customer Behavior Feature Engineering
Definition
Customer behavior feature engineering converts raw transaction histories into customer-level variables.
The unit of analysis changes from a transaction line to a customer: the output has one row per customer.
Customer Order Table
First, transaction lines are aggregated to invoice-level orders.
For customer $i$ and order $j$, summing over the transaction lines $l$ of that order:
$$ \text{OrderRevenue}_{ij} = \sum_l \text{Quantity}_{ijl} \times \text{UnitPrice}_{ijl} $$
$$ \text{OrderItemQuantity}_{ij} = \sum_l \text{Quantity}_{ijl} $$
$$ \text{OrderDistinctItems}_{ij} = \left|\{\text{StockCode}_{ijl}\}\right| $$
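A minimal sketch of the order-table step in plain Python, assuming each transaction line carries a customer ID, an invoice number, an invoice date, a stock code, a quantity and a unit price. The `TransactionLine` record, its field names, and `build_order_table` are hypothetical illustrations, not part of any particular dataset schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TransactionLine:
    # Hypothetical record layout: one row per invoice line.
    customer_id: str
    invoice_no: str
    invoice_date: date
    stock_code: str
    quantity: int
    unit_price: float


def build_order_table(lines):
    """Aggregate transaction lines into one record per (customer, invoice)."""
    acc = {}
    for ln in lines:
        key = (ln.customer_id, ln.invoice_no)
        if key not in acc:
            # Assumes every line of an invoice carries the same date.
            acc[key] = {"order_date": ln.invoice_date, "revenue": 0.0,
                        "item_quantity": 0, "stock_codes": set()}
        o = acc[key]
        o["revenue"] += ln.quantity * ln.unit_price  # OrderRevenue_ij
        o["item_quantity"] += ln.quantity            # OrderItemQuantity_ij
        o["stock_codes"].add(ln.stock_code)          # distinct StockCode values
    # Replace the set of stock codes with its size: OrderDistinctItems_ij.
    return {
        key: {"order_date": o["order_date"], "revenue": o["revenue"],
              "item_quantity": o["item_quantity"],
              "distinct_items": len(o["stock_codes"])}
        for key, o in acc.items()
    }


lines = [
    TransactionLine("C1", "INV1", date(2011, 1, 5), "A", 2, 1.5),
    TransactionLine("C1", "INV1", date(2011, 1, 5), "B", 1, 4.0),
    TransactionLine("C1", "INV1", date(2011, 1, 5), "A", 3, 1.5),
    TransactionLine("C2", "INV2", date(2011, 2, 1), "C", 5, 2.0),
]
orders = build_order_table(lines)
print(orders[("C1", "INV1")])
# {'order_date': datetime.date(2011, 1, 5), 'revenue': 11.5, 'item_quantity': 6, 'distinct_items': 2}
```

Stock code A appears on two lines of INV1, so the invoice has 6 items in total but only 2 distinct items; this is why item quantity and distinct items are kept as separate order-level measures.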
Common Customer Features
For a cutoff date $t_0$, useful customer features include:
- order count
- total revenue
- average order value
- average items per order
- recency in days
- orders per month
- repeat gap in days
- seasonality index
- one-time buyer flag
Recency
Recency measures how long it has been since the last observed order:
$$ \text{Recency}_i = t_0 - \max_j(\text{OrderDate}_{ij}) $$
Smaller recency means the customer bought more recently.
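A sketch of the recency calculation using Python's `datetime.date`. The helper name `recency_days` is hypothetical, and it assumes (this is not stated above) that orders dated after the cutoff $t_0$ are dropped, so the feature uses only information available at $t_0$ and can never be negative.

```python
from datetime import date


def recency_days(order_dates, cutoff):
    """Recency_i = t0 - latest order date, in days."""
    # Assumption: orders after the cutoff t0 are excluded so the feature
    # uses only information available at t0.
    past = [d for d in order_dates if d <= cutoff]
    if not past:
        raise ValueError("no orders on or before the cutoff date")
    return (cutoff - max(past)).days


dates = [date(2011, 3, 1), date(2011, 11, 20), date(2011, 6, 15)]
print(recency_days(dates, date(2011, 12, 1)))  # 11
```

Moving the cutoff to 2011-11-01 excludes the November order, and recency becomes 139 days, measured from the June order.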
Frequency / Order Rate
A customer order rate can be estimated as:
$$ \text{OrdersPerMonth}_i = \frac{\text{OrderCount}_i}{\text{CustomerAgeDays}_i / 30.4375} $$
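Here 30.4375 is the average month length in days (365.25 / 12), and $\text{CustomerAgeDays}_i$ is how long the customer has been observed, in days. The result is a rate, not a probability: a customer who orders every week has a rate above 4.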
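A sketch of the order-rate formula. The text does not pin down $\text{CustomerAgeDays}_i$; this example assumes it is the number of days from the customer's first order to the cutoff $t_0$, and the helper `orders_per_month` is hypothetical.

```python
from datetime import date

DAYS_PER_MONTH = 30.4375  # 365.25 / 12, average month length in days


def orders_per_month(order_dates, cutoff):
    """OrdersPerMonth_i = OrderCount_i / (CustomerAgeDays_i / 30.4375)."""
    past = [d for d in order_dates if d <= cutoff]
    if not past:
        return None
    # Assumption: CustomerAgeDays_i runs from the first order to the cutoff t0.
    age_days = (cutoff - min(past)).days
    if age_days == 0:
        return None  # first order on the cutoff date: rate undefined
    return len(past) / (age_days / DAYS_PER_MONTH)


dates = [date(2011, 10, 1), date(2011, 10, 20), date(2011, 11, 15)]
rate = orders_per_month(dates, date(2011, 12, 1))
print(round(rate, 2))  # 1.5
```

Three orders over 61 days give roughly 1.5 orders per month, a value above 1 that is clearly not a probability. Returning `None` for a zero-day tenure keeps brand-new customers distinguishable instead of dividing by zero; flooring the tenure at some minimum is an alternative, but then the rate for new customers depends on that arbitrary floor.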
Average Order Value
Average order value is:
$$ \text{AOV}_i = \frac{1}{n_i}\sum_{j=1}^{n_i}\text{OrderRevenue}_{ij} $$
where $n_i$ is the number of historical orders for customer $i$, i.e. $\text{OrderCount}_i$.
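A short sketch of AOV with a hypothetical helper `average_order_value`. The detail that matters is that the mean is taken over orders, after the invoice-level aggregation, not over transaction lines.

```python
def average_order_value(order_revenues):
    """AOV_i: mean of OrderRevenue_ij over the customer's n_i orders."""
    if not order_revenues:
        raise ValueError("AOV is undefined for a customer with no orders")
    return sum(order_revenues) / len(order_revenues)


# Hypothetical customer: INV1 has lines worth 3.0 and 4.0, INV2 one line worth 5.0.
line_revenues = {"INV1": [3.0, 4.0], "INV2": [5.0]}
order_revenues = [sum(v) for v in line_revenues.values()]  # [7.0, 5.0]
print(average_order_value(order_revenues))  # 6.0
```

Averaging the three line revenues directly would give 4.0, which understates what the customer spends per order.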
Repeat Gap
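For repeat customers ($n_i \ge 2$), the average repeat gap is the mean time between consecutive orders, where $t_{ij}$ is the date of customer $i$'s $j$-th order after sorting the orders chronologically:
$$ \text{RepeatGap}_i = \frac{1}{n_i - 1}\sum_{j=2}^{n_i}(t_{ij} - t_{i,j-1}) $$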
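For one-time customers ($n_i = 1$) the sum is empty and the denominator $n_i - 1$ is zero, so the gap is undefined rather than zero. It must be handled explicitly, for example by leaving it missing and relying on the one-time buyer flag; filling it with 0 would wrongly suggest very frequent purchasing.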
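A sketch of the repeat-gap feature with a hypothetical helper `repeat_gap_days`. Because the sum telescopes, $\sum_{j=2}^{n_i}(t_{ij} - t_{i,j-1}) = t_{i,n_i} - t_{i1}$, so the average gap equals the span between the first and last order divided by $n_i - 1$; the code still computes the consecutive gaps explicitly so the intermediate values remain visible. One-time buyers get `None`, following the handling described above.

```python
from datetime import date


def repeat_gap_days(order_dates):
    """RepeatGap_i: mean days between consecutive orders, or None if n_i < 2."""
    ts = sorted(order_dates)  # t_i1 <= t_i2 <= ... <= t_in
    n = len(ts)
    if n < 2:
        return None  # one-time buyer: no gap observed
    gaps = [(ts[j] - ts[j - 1]).days for j in range(1, n)]
    return sum(gaps) / (n - 1)


dates = [date(2011, 1, 10), date(2011, 3, 1), date(2011, 2, 1)]
print(repeat_gap_days(dates))  # 25.0
```

The sorted gaps are 22 days (Jan 10 to Feb 1) and 28 days (Feb 1 to Mar 1), averaging 25 days, which matches the 50-day span divided by $n_i - 1 = 2$.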
Seasonality Index
A simple seasonality index is:
$$ \text{SeasonalityIndex}_i = \frac{\max_m \text{Orders}_{i,m}}{\text{OrderCount}_i} $$
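Here $\text{Orders}_{i,m}$ is the number of orders customer $i$ placed in month $m$. The index equals 1 when all of a customer's orders fall in one month, is close to 1 when most of them do, and is lower when orders are spread across months. A one-time buyer always scores 1, so the index is informative only for repeat customers.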
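A sketch of the seasonality index with a hypothetical helper `seasonality_index`. It assumes $m$ is the calendar month (1 to 12) pooled across years, which fits the idea of seasonality; if $m$ is meant as a specific year-month bucket, key the counter on `(d.year, d.month)` instead. With calendar months pooled, the index cannot fall below 1/12, since the busiest of 12 months holds at least a twelfth of the orders.

```python
from collections import Counter
from datetime import date


def seasonality_index(order_dates):
    """Share of a customer's orders placed in their busiest month."""
    if not order_dates:
        raise ValueError("index is undefined for a customer with no orders")
    # Assumption: m is the calendar month (1-12), pooled across years.
    per_month = Counter(d.month for d in order_dates)
    return max(per_month.values()) / len(order_dates)


holiday_buyer = [date(2010, 12, 3), date(2011, 12, 5), date(2011, 12, 20), date(2011, 6, 1)]
steady_buyer = [date(2011, 3, 1), date(2011, 6, 1), date(2011, 9, 1), date(2011, 12, 1)]
print(seasonality_index(holiday_buyer))  # 0.75
print(seasonality_index(steady_buyer))   # 0.25
```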