Random Forest | Data Engineering | Scikit-Learn

Luxury Timepiece Valuation

A comparative machine learning pipeline analyzing 45,000+ secondary market watches to predict pricing in a highly subjective, high-variance market.

The Objective

The secondary market for luxury watches is dictated by subjective brand premiums and non-linear pricing structures (e.g., the price jump from a steel Rolex to a platinum Rolex is not simple addition). The goal was to build an ML pipeline capable of accurately predicting current market values based purely on physical characteristics, brand heritage, and condition.

Technical Implementation

Target Normalization

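Initial exploratory data analysis (EDA) revealed a severe right skew in prices, caused by ultra-luxury outliers. A logarithmic transformation was applied to the price target to compress the long tail into a more symmetric, approximately normal distribution, so that a handful of extreme prices would not dominate the squared-error loss during training.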
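The effect of the transformation can be illustrated with a small standard-library sketch. The price list and the `skewness` helper are hypothetical, chosen only to mimic a market with one ultra-luxury outlier; they are not the project's data or code.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Hypothetical prices in USD: mostly mid-range, plus one ultra-luxury outlier.
prices = [2500, 3200, 4100, 5000, 6500, 8000, 9500, 12000, 15000, 250000]

# Natural log of each price; this is the target the models would train on.
log_prices = [math.log(p) for p in prices]

raw_skew = skewness(prices)
log_skew = skewness(log_prices)
print(f"skew before: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

On this toy sample the skew drops but does not vanish, which is why "approximately normal" is the most that should be claimed for real data.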

Cardinality Reduction

To limit the number of columns produced by One-Hot Encoding, and with it the "curse of dimensionality", only the top 30 brands and top 10 case materials were retained. All remaining values were aggregated into an "Other" category.
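A minimal sketch of this kind of bucketing follows, assuming "top" means most frequent. The `keep_top_n` helper and the sample brand list are hypothetical, and `n=2` stands in for the 30 and 10 used in the project.

```python
from collections import Counter


def keep_top_n(values, n, other_label="Other"):
    """Replace every category outside the n most frequent with other_label."""
    top = {category for category, _ in Counter(values).most_common(n)}
    return [v if v in top else other_label for v in values]


# Hypothetical brand column; n=2 keeps the example short.
brands = ["Rolex", "Omega", "Rolex", "Seiko", "Omega", "Rolex", "Zenith", "Oris"]
reduced = keep_top_n(brands, n=2)
print(reduced)
```

After bucketing, one-hot encoding this column yields three indicator columns (Rolex, Omega, Other) instead of five. In a real pipeline the set of retained categories should be learned from the training split only and then reused on the test split.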

Comparative Modeling

Two distinct model families were evaluated: a parametric Linear Regression baseline and a non-parametric Random Forest Regressor (100 decision trees), chosen to handle complex hierarchical categorical splits.
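The comparison can be sketched with scikit-learn on synthetic data. Everything below except the two estimator classes and the 100-tree setting is an assumption made for illustration: the two binary features, the coefficients, the 80/20 split, and the seeds are not taken from the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for one-hot encoded features (NOT the project's data):
# column 0 = "premium brand" flag, column 1 = "precious metal" flag.
X = rng.integers(0, 2, size=(2000, 2)).astype(float)

# Log-price with an interaction term: a premium brand in a precious metal
# is worth more than the sum of the two individual effects.
y = (
    8.0
    + 0.5 * X[:, 0]
    + 0.3 * X[:, 1]
    + 1.5 * X[:, 0] * X[:, 1]
    + rng.normal(0.0, 0.1, size=2000)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the same split and score it on the held-out set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

print(scores)
```

The linear model can only add the two effects together, so it misses the interaction. The forest can split on one flag and then the other, which is the "hierarchical split" behavior described above.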

Inverse Transformation

Because the models were trained on log-transformed prices, predictions were mapped back to the original scale with the exponential function so that Mean Absolute Error (MAE) could be interpreted in actual USD.
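The difference between the two scales can be shown with a short sketch. The prices, predictions, and the `mae` helper are hypothetical, and the natural log is assumed as the forward transform, with `math.exp` as its inverse.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values for illustration only.
actual_usd = [5000.0, 12000.0, 40000.0]
# Pretend model outputs, expressed on the log scale the model was trained on.
predicted_log = [math.log(5500.0), math.log(11000.0), math.log(46000.0)]

# Inverse transform: exp undoes the natural log applied to the target.
predicted_usd = [math.exp(p) for p in predicted_log]

mae_log = mae([math.log(a) for a in actual_usd], predicted_log)
mae_usd = mae(actual_usd, predicted_usd)

print(f"MAE in log units: {mae_log:.3f}")
print(f"MAE in USD: {mae_usd:,.0f}")
```

The log-scale error is a unitless number that is hard to communicate, while the back-transformed error here is about $2,500, the average of the $500, $1,000, and $6,000 misses.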

The Outcome

As hypothesized, the non-parametric Random Forest substantially outperformed the linear baseline, raising the R-Squared score to 0.7262 and reducing the MAE to $6,682. The gain reflects its ability to learn complex, non-additive pricing rules.