Luxury Timepiece Valuation
A comparative machine learning pipeline analyzing 45,000+ secondary market watches to predict pricing in a highly subjective, high-variance market.
The Objective
The secondary market for luxury watches is shaped by subjective brand premiums and non-linear pricing structures (e.g., the price jump from a steel Rolex to a platinum Rolex is not a fixed additive premium). The goal was to build an ML pipeline that predicts current market value from physical characteristics, brand heritage, and condition alone.
Technical Implementation
Target Normalization
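Initial EDA revealed a severe right skew in the price distribution, caused by ultra-luxury outliers. A logarithmic transformation was applied to the target to compress that tail and bring the distribution closer to a symmetric, bell-shaped curve, so that a few very expensive watches do not dominate the squared-error fit.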
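The effect of the transformation can be illustrated with a minimal sketch. The prices below are invented for illustration, and the `skewness` helper is a hypothetical name, not part of the original pipeline.

```python
import numpy as np

# Hypothetical prices in USD; the last two stand in for ultra-luxury outliers.
prices = np.array(
    [1_200, 2_500, 3_800, 5_000, 7_500, 12_000, 250_000, 900_000],
    dtype=float,
)

def skewness(x: np.ndarray) -> float:
    """Skewness as the third standardized moment (population form)."""
    centered = x - x.mean()
    return float((centered ** 3).mean() / (centered ** 2).mean() ** 1.5)

# Natural log of the target; requires strictly positive prices.
log_prices = np.log(prices)

print(f"skew before log: {skewness(prices):.2f}")
print(f"skew after log:  {skewness(log_prices):.2f}")
```

A skewness closer to zero after the transform indicates a more symmetric target. If the data could contain zero prices, `np.log1p` (paired with `np.expm1` for the inverse) would be a safer choice.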
Cardinality Reduction
To keep One-Hot Encoding from producing a wide, sparse feature matrix (the "curse of dimensionality"), only the top 30 brands and top 10 case materials were retained. All remaining values were aggregated into an "Other" category.
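One way to sketch this step with pandas is shown below. It assumes "top" means most frequent, which the write-up does not state. The `keep_top_n` helper, the column names, and the toy rows are hypothetical, and the cutoffs are shrunk from 30 and 10 to fit the tiny example.

```python
import pandas as pd

def keep_top_n(series: pd.Series, n: int, other_label: str = "Other") -> pd.Series:
    """Keep the n most frequent categories and relabel the rest."""
    top = series.value_counts().index[:n]  # value_counts sorts by frequency
    return series.where(series.isin(top), other_label)

# Toy data; the real pipeline would use n=30 for brands and n=10 for materials.
df = pd.DataFrame({
    "brand": ["Rolex", "Rolex", "Omega", "Omega", "Seiko", "Tudor"],
    "case_material": ["Steel", "Steel", "Steel", "Gold", "Titanium", "Platinum"],
})
df["brand"] = keep_top_n(df["brand"], n=2)
df["case_material"] = keep_top_n(df["case_material"], n=1)

# One-hot encode the reduced categories.
encoded = pd.get_dummies(df, columns=["brand", "case_material"], dtype=float)
print(encoded.columns.tolist())
```

Without the reduction, the toy frame would need eight indicator columns (four brands plus four materials). With it, the encoded frame has five.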
Comparative Modeling
Two distinct model families were evaluated: a parametric Linear Regression baseline and a non-parametric Random Forest Regressor (100 decision trees), chosen to handle complex hierarchical splits on categorical features.
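The comparison could look roughly like the following scikit-learn sketch. The data is synthetic: the column names, the price rule, and the Rolex-plus-platinum premium are invented stand-ins for the real dataset. Only the two model choices and the 100-tree setting come from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in for the watch data (all values are assumptions).
df = pd.DataFrame({
    "brand": rng.choice(["Rolex", "Omega", "Other"], size=n),
    "case_material": rng.choice(["Steel", "Gold", "Platinum"], size=n),
    "condition": rng.integers(1, 6, size=n),  # 1 = worn, 5 = unworn
})
brand_mult = df["brand"].map({"Rolex": 3.0, "Omega": 1.5, "Other": 1.0})
material_mult = df["case_material"].map({"Steel": 1.0, "Gold": 2.5, "Platinum": 4.0})
# Extra premium for one brand/material pair: an interaction effect.
is_premium = (df["brand"] == "Rolex") & (df["case_material"] == "Platinum")
interaction = np.where(is_premium, 2.0, 1.0)
price = 2_000 * brand_mult * material_mult * interaction * (0.6 + 0.1 * df["condition"])
price = price * rng.lognormal(mean=0.0, sigma=0.2, size=n)  # multiplicative noise

X = pd.get_dummies(df, columns=["brand", "case_material"], dtype=float)
y = np.log(price)  # models are trained on the log-transformed target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 on log price = {scores[name]:.3f}")
```

Both models see the same train/test split, so the scores are directly comparable. The numbers from this toy setup say nothing about the results on the real data.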
Inverse Transformation
Because the models were trained on a log-transformed target, their predictions were mapped back to the original scale with the exponential function, so that Mean Absolute Error (MAE) could be interpreted in actual USD.
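A minimal sketch of this step, assuming the natural log was used for the forward transform. The `mae_in_usd` helper and the sample prices are hypothetical.

```python
import numpy as np

def mae_in_usd(y_true_log: np.ndarray, y_pred_log: np.ndarray) -> float:
    """Mean absolute error after mapping log-space values back to USD."""
    y_true_usd = np.exp(y_true_log)  # inverse of np.log
    y_pred_usd = np.exp(y_pred_log)
    return float(np.mean(np.abs(y_true_usd - y_pred_usd)))

# Invented true and predicted prices, expressed in log space as a model would see them.
true_prices = np.array([5_000.0, 12_000.0, 80_000.0])
pred_prices = np.array([5_500.0, 10_000.0, 95_000.0])

mae = mae_in_usd(np.log(true_prices), np.log(pred_prices))
print(f"MAE: ${mae:,.2f}")
```

Computing the error in log space instead would give a unitless figure that reflects relative rather than dollar error, which is why the inverse transform comes first.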
The Outcome
As hypothesized, the non-parametric Random Forest outperformed the linear baseline, raising the R-Squared score to 0.7262 and lowering the MAE to $6,682. The gain is consistent with the forest learning non-linear pricing rules that an additive linear model cannot represent.