Overview
The Model Retraining API allows authorized users to trigger retraining of the demand prediction model with updated transaction data. This endpoint fetches the latest purchase and sales data, trains a new model, and deploys it for predictions.

Base URL
Retrain Model
Request Body
start_date - Start date for training data (ISO 8601: YYYY-MM-DD). Optional; defaults to the earliest available data.
end_date - End date for training data (ISO 8601: YYYY-MM-DD). Optional; defaults to the most recent data.
If both start_date and end_date are omitted, the model trains on all available historical data.

Response
Version identifier of the newly trained model
Model performance metrics on test set
Mean Absolute Error
Root Mean Square Error
Mean Absolute Percentage Error
R-squared (coefficient of determination)
Training completion timestamp (ISO 8601)
Dataset split information
Number of training samples
Number of test samples
Total number of samples
Status Codes
- 200 OK - Model retrained successfully
- 400 Bad Request - Invalid date range or parameters
- 401 Unauthorized - Missing or invalid authentication
- 403 Forbidden - Insufficient permissions
- 500 Internal Server Error - Training failed
Example
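The original example payloads are not preserved here; the request and response below are illustrative sketches. The endpoint path (/retrain) and the JSON field names (model_version, metrics, trained_at, dataset) are assumptions based on the fields documented above; the metric values echo the examples given later in this document.

Request:

```json
{
  "start_date": "2025-01-01",
  "end_date": "2025-12-31"
}
```

Response:

```json
{
  "model_version": "v1.3.0-20260306",
  "metrics": {
    "mae": 1.32,
    "rmse": 1.87,
    "mape": 0.14,
    "r2": 0.87
  },
  "trained_at": "2026-03-06T10:15:00Z",
  "dataset": {
    "train_samples": 960,
    "test_samples": 240,
    "total_samples": 1200
  }
}
```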
Retraining Process
The retraining pipeline follows these steps:

1. Data Extraction
- Fetches purchase and sales transactions from the sgivu-purchase-sale service
- Filters by date range if specified
- Aggregates sales by vehicle segment (brand, model, line) and month
2. Feature Engineering
- Creates time series features (month, year, seasonality)
- Computes segment-level statistics
- Generates lag features and rolling aggregates
3. Model Training
- Splits data into training (80%) and test (20%) sets
- Trains time series forecasting model (typically ARIMA, Prophet, or ML-based)
- Validates on test set
4. Model Evaluation
- Calculates performance metrics (MAE, RMSE, MAPE, R²)
- Compares against previous model if available
5. Model Deployment
- Persists model artifacts using joblib
- Optionally saves to PostgreSQL
- Updates model metadata
- Activates as the current production model
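Step 3 above splits the data 80/20 before training. Because this is time series data, the split must be chronological rather than random, so that the test set contains only periods later than the training set. A minimal sketch (the function name and sample data are illustrative, not part of the service's code):

```python
# Illustrative sketch of the 80/20 chronological split used in step 3.
# A random split would leak future information into training, so the
# most recent 20% of samples is held out as the test set.
def time_series_split(samples, test_fraction=0.2):
    """Split chronologically ordered samples into train and test sets."""
    cut = int(len(samples) * (1 - test_fraction))
    return samples[:cut], samples[cut:]

# Twelve months of (month, units sold) pairs for one vehicle segment.
monthly_sales = [(f"2025-{m:02d}", 10 + m) for m in range(1, 13)]
train_set, test_set = time_series_split(monthly_sales)
# train_set covers Jan-Sep; test_set covers Oct-Dec.
```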
Model Versioning
Each trained model is assigned a version identifier:

Format: v{major}.{minor}.{patch}-{timestamp}
Example: v1.3.0-20260306
- major.minor.patch - Semantic version based on algorithm changes
- timestamp - Training date in YYYYMMDD format
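The version format above can be built mechanically from the semantic version parts and the training date. A small sketch (the function name is illustrative):

```python
from datetime import date

def model_version(major: int, minor: int, patch: int, trained_on: date) -> str:
    """Build a version identifier in the documented
    v{major}.{minor}.{patch}-{timestamp} format."""
    return f"v{major}.{minor}.{patch}-{trained_on:%Y%m%d}"

print(model_version(1, 3, 0, date(2026, 3, 6)))  # v1.3.0-20260306
```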
Version Management
- New models automatically become the active model for predictions
- Previous model versions are retained for rollback if needed
- Model artifacts can be stored in both filesystem and database
Training Data Requirements
Minimum Data Volume
For reliable models, ensure:

- Time range: At least 12 months of historical data
- Segment coverage: Multiple vehicle segments with sufficient transactions
- Sample size: Minimum 100 transactions per segment recommended
Data Quality
- Missing or null values are handled automatically
- Outliers are detected and may be excluded
- Segments with insufficient data are skipped
Performance Metrics Explained
Mean Absolute Error (MAE)
Average absolute difference between predicted and actual sales.

- Units: Same as target variable (number of vehicles)
- Lower is better
- Example: MAE of 1.32 means predictions are off by about 1.3 vehicles on average
Root Mean Square Error (RMSE)
Square root of the average squared error. Penalizes large errors more heavily.

- Units: Same as target variable
- Lower is better
- Typical range: Slightly higher than MAE (RMSE is always at least as large as MAE)
Mean Absolute Percentage Error (MAPE)
Average percentage error relative to actual values.

- Units: Fraction (0-1; multiply by 100 for a percentage)
- Lower is better
- Example: MAPE of 0.14 means 14% average error
R-squared (R²)
Proportion of variance explained by the model.

- Range: 0-1 (can be negative for poor models)
- Higher is better
- Example: R² of 0.87 means model explains 87% of variance
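The four metrics above can be computed directly from paired actual/predicted values. A self-contained sketch (the function name is illustrative, not the service's code; MAPE here assumes no actual value is zero):

```python
import math

def regression_metrics(actual, predicted):
    """Compute MAE, RMSE, MAPE, and R² for paired actual/predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE as a fraction (0-1); assumes no zero actuals.
    mape = sum(abs(e) / abs(a) for a, e in zip(actual, errors)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "mape": mape, "r2": r2}
```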
Interpreting Metrics
Good Model:
- MAE < 2.0
- RMSE < 3.0
- MAPE < 0.20 (20%)
- R² > 0.75

Excellent Model:
- MAE < 1.0
- RMSE < 1.5
- MAPE < 0.10 (10%)
- R² > 0.90
Best Practices
Retraining Frequency
Recommended schedule:

- Weekly: For high-volume dealerships with frequent transactions
- Monthly: For moderate-volume operations
- Quarterly: For low-volume or stable markets
When to Retrain
Trigger retraining when:

- New transaction data is available
- Model performance degrades (predictions become less accurate)
- Significant market changes occur (seasonality, economic shifts)
- New vehicle models are introduced
Monitoring Model Performance
Track these indicators:

- Prediction accuracy on recent data
- Residual analysis (actual vs. predicted)
- Metric trends over time
- Coverage (percentage of segments with predictions)
Retraining Schedule
You can automate retraining using cron jobs or scheduled tasks.

Example: Weekly Cron Job
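The original cron snippet is not preserved; a representative crontab entry might look like the following. The endpoint URL and API_TOKEN value are placeholders, not values from this deployment.

```
# m h dom mon dow  command  -- runs every Monday at 02:00
API_TOKEN=replace-with-your-token
0 2 * * 1 curl -s -X POST -H "Authorization: Bearer $API_TOKEN" https://api.example.com/retrain
```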
Example: Python Scheduled Task
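The original snippet is not preserved; below is a minimal stdlib-only sketch using sched and urllib.request. The URL, token handling, and function names are assumptions; a production deployment would more likely rely on cron or a workflow scheduler.

```python
import json
import sched
import time
import urllib.request

# Placeholder; substitute your deployment's base URL.
RETRAIN_URL = "https://api.example.com/retrain"

def build_retrain_request(url, token, body=None):
    """Construct the authenticated POST request that triggers retraining."""
    data = json.dumps(body or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

def retrain_weekly(token, scheduler):
    """Fire the retraining request, then re-schedule itself one week out."""
    with urllib.request.urlopen(build_retrain_request(RETRAIN_URL, token)) as resp:
        print("retraining triggered, status:", resp.status)
    scheduler.enter(7 * 24 * 3600, 1, retrain_weekly, (token, scheduler))

if __name__ == "__main__":
    s = sched.scheduler(time.time, time.sleep)
    s.enter(0, 1, retrain_weekly, ("YOUR_TOKEN", s))
    s.run()  # blocks; each run queues the next one a week later
```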
Error Responses
400 Bad Request - Invalid Date Range
400 Bad Request - Insufficient Data
403 Forbidden
500 Internal Server Error - Training Failed
Storage and Persistence
Model Artifacts
Trained models are persisted using joblib:

- Location: Configured via the MODEL_DIR environment variable
- Format: Pickled scikit-learn/XGBoost models
- Naming: model_{version}.pkl
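The persistence convention above can be sketched as follows. The helper names are illustrative, not the service's actual code; only the model_{version}.pkl naming and the use of joblib come from this document.

```python
import os
import joblib  # third-party; typically installed alongside scikit-learn

def save_model(model, version, model_dir):
    """Persist a trained model as model_{version}.pkl under the model directory."""
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, f"model_{version}.pkl")
    joblib.dump(model, path)
    return path

def load_model(version, model_dir):
    """Load a previously persisted model artifact by version."""
    return joblib.load(os.path.join(model_dir, f"model_{version}.pkl"))
```

In practice model_dir would come from the MODEL_DIR environment variable (os.environ["MODEL_DIR"]).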
Database Persistence
Optionally, model metadata and artifacts can be stored in PostgreSQL:

- Table: ml_model_artifacts
- Fields: version, metrics, trained_at, artifact_data (binary)
Feature Snapshots
Training features can be saved for reproducibility:

- Table: ml_training_features
- Purpose: Audit, debugging, retraining experiments
Rollback and Model Management
If a newly trained model performs poorly:

- Manual Rollback: Replace the current model file with the previous version
- Database Rollback: Update the active model pointer in the database
- Verify: Test predictions with the /predict endpoint
Future versions will include an API endpoint for automated model rollback.
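Until that endpoint exists, a manual filesystem rollback can be sketched as below. The active-model filename (model_current.pkl) is an assumption for illustration; only the model_{version}.pkl artifact naming comes from this document.

```python
import os
import shutil

def rollback_model(model_dir, previous_version, active_name="model_current.pkl"):
    """Restore a previous model artifact as the active model file.

    The active-model filename is an assumption; adapt it to your deployment.
    """
    src = os.path.join(model_dir, f"model_{previous_version}.pkl")
    dst = os.path.join(model_dir, active_name)
    shutil.copyfile(src, dst)  # overwrite the active model with the old artifact
    return dst
```

After rolling back, verify behavior with a few requests to the /predict endpoint.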