{% extends "layout.html" %} {% block content %} Decision Tree Regression - Interactive Flow Visualization

Decision Tree Regression (DTR) Visualization

Explore how DTR predicts continuous values with interactive examples.

What is Decision Tree Regression?

Decision Tree Regression is a supervised learning algorithm that predicts numeric values, such as house prices or temperatures, by splitting data into smaller groups based on feature values. Picture a tree in which each internal node asks a question about a feature and each leaf gives the final prediction.

How Does It Work?

The tree splits the data recursively, choosing at each step the feature and threshold that make the resulting groups as homogeneous as possible. Splitting stops when a group becomes too small or the tree reaches its maximum depth.

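Variance Reduction: Each split aims to reduce the spread of target values within the resulting groups.
Mean Squared Error (MSE): The algorithm picks the split that minimizes the squared prediction error. Because each group is predicted by its mean, this is equivalent to minimizing the size-weighted variance within the groups.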
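
As a rough illustration of variance reduction, the sketch below compares two candidate splits of the same six target values. The data and the helper name `variance_reduction` are made up for this example; the calculation is the parent variance minus the size-weighted variance of the two child groups.

```python
from statistics import pvariance

def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the two children."""
    n = len(parent)
    weighted_child_var = (
        (len(left) / n) * pvariance(left) + (len(right) / n) * pvariance(right)
    )
    return pvariance(parent) - weighted_child_var

# Hypothetical target values (y), already ordered by the feature being split on.
y = [10.0, 12.0, 11.0, 30.0, 32.0, 31.0]

good = variance_reduction(y, y[:3], y[3:])  # separates the low and high groups
poor = variance_reduction(y, y[:1], y[1:])  # isolates a single point

print(f"good split reduces variance by {good:.2f}")
print(f"poor split reduces variance by {poor:.2f}")
```

The split that separates the low values from the high values removes almost all of the variance, so it is the one a regression tree would prefer.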

Making Predictions

To predict a new value, the data point travels down the tree following the split rules until it reaches a leaf. The prediction is the average of all training points in that leaf.
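
The traversal can be sketched with a small hand-built tree. Everything here is illustrative: the `Node` class, the feature meanings, the thresholds and the leaf values are assumptions for the example, and sending a point left when its value is less than or equal to the threshold is one common convention rather than a requirement.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Internal nodes carry a feature index and threshold; leaves carry only a value.
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[float] = None  # mean target of the training points in a leaf

def predict(node, x):
    """Follow the split rules from the root until a leaf is reached."""
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value

# Hypothetical tree: feature 0 is floor area, feature 1 is building age.
tree = Node(
    feature=0, threshold=80.0,
    left=Node(value=150_000.0),
    right=Node(
        feature=1, threshold=20.0,
        left=Node(value=320_000.0),
        right=Node(value=260_000.0),
    ),
)

print(predict(tree, [65.0, 5.0]))    # small home: stops at the first leaf
print(predict(tree, [120.0, 35.0]))  # large, older home: two comparisons
```

Every point that lands in the same leaf receives the same prediction, which is why a regression tree produces a step-shaped curve rather than a smooth one.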

Key Hyperparameters

max_depth: Limits tree height to avoid overfitting.
min_samples_split: Minimum number of data points a node must contain before it can be split.
min_samples_leaf: Minimum number of data points required in each leaf.

Comparison with Other Models

Decision Tree vs. Linear Regression: DTR can model non-linear relationships, whereas Linear Regression assumes a linear relationship. DTR is generally more flexible but also more prone to overfitting.
Decision Tree vs. SVR (Support Vector Regression): SVR fits a line or hyperplane (or a non-linear surface when a kernel is used) while ignoring errors smaller than a chosen margin. SVR can be very effective but is often harder to tune than DTR.
Decision Tree vs. Random Forest: Random Forest is an ensemble of Decision Trees. It builds multiple trees and averages their predictions. This significantly reduces variance and improves stability, so it usually generalizes better than a single DTR, at the cost of some interpretability.

Key Hyperparameters (Detailed)

max_depth: How deep the tree can grow.
👉 Bigger depth = tree keeps splitting → very detailed → more overfitting.
👉 Smaller depth = tree stops early → simpler → less overfitting.
min_samples_split: Minimum samples needed to split a node.
👉 Smaller value (like 2) = splits happen easily → more overfitting.
👉 Larger value (like 10) = splits happen only with many samples → less overfitting.
Example: With min_samples_split=2, a node holding only 2 points can still be split, so the tree memorizes tiny patterns. With min_samples_split=10, a node needs at least 10 points before it can be split, so the tree generalizes better.
min_samples_leaf: Minimum samples in a leaf node.
👉 Smaller value (like 1) = tiny leaves → more overfitting.
👉 Larger value (like 5 or 10) = bigger leaves → less overfitting.
Example: With min_samples_leaf=1, each data point might get its own leaf. With min_samples_leaf=10, each leaf must cover at least 10 points, so the tree generalizes better.
max_features: Number of features considered at each split.
👉 Smaller value = fewer features per split → adds randomness → can reduce overfitting (especially in Random Forests).
👉 Larger value = more features per split (up to all of them) → less randomness → higher risk of overfitting.

✅ Summary Memory Trick:
Big numbers (min_samples_split ↑, min_samples_leaf ↑) + small max_depth ↓ → simpler tree → less overfitting.
Small numbers (min_samples_split ↓, min_samples_leaf ↓) + big max_depth ↑ → complex tree → more overfitting.
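
The stopping rules above can be read as two simple checks. The sketch below is not any library's implementation; `may_split` and `split_allowed` are hypothetical helper names, and only the parameter names follow the hyperparameters described in this section.

```python
def may_split(n_samples, depth, max_depth, min_samples_split):
    """A node is eligible for splitting only if it is shallow and large enough."""
    return depth < max_depth and n_samples >= min_samples_split

def split_allowed(n_left, n_right, min_samples_leaf):
    """A candidate split is kept only if both children are large enough."""
    return n_left >= min_samples_leaf and n_right >= min_samples_leaf

# A node holding 8 training points at depth 2.
print(may_split(8, depth=2, max_depth=5, min_samples_split=2))   # loose setting
print(may_split(8, depth=2, max_depth=5, min_samples_split=10))  # strict setting

# A candidate split that would leave 7 points on one side and 1 on the other.
print(split_allowed(7, 1, min_samples_leaf=1))
print(split_allowed(7, 1, min_samples_leaf=5))
```

With the loose settings both checks pass and the tree keeps growing; with the strict settings the node becomes a leaf, which is what keeps the tree simpler.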

Why Use DTR?

Easy to understand and visualize.
Captures complex, non-linear relationships.
No need to scale features.

Limitations

Can overfit if not controlled.
Small data changes can cause big model changes.
Less stable than ensemble methods like Random Forest.

Evaluation Metrics

MSE: Average squared error.
RMSE: Square root of MSE, same units as target.
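R² Score: Share of the variance in the actual values that the predictions explain (1 = perfect, 0 = no better than always predicting the mean; it can be negative for a worse model).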
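
For reference, the three metrics can be computed by hand as in the sketch below. The function names and the four sample values are made up for illustration.

```python
from math import sqrt

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of MSE, expressed in the same units as the target."""
    return sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    """1 minus the ratio of residual error to the total spread around the mean."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print(f"MSE:  {mse(y_true, y_pred):.3f}")
print(f"RMSE: {rmse(y_true, y_pred):.3f}")
print(f"R2:   {r2(y_true, y_pred):.3f}")
```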

Applications

Predicting house prices.
Estimating medical costs.
Forecasting sales.
Predicting energy consumption.

Try It Yourself

Decision Tree Regression: Step-by-Step Flow

Input Data

Find Split

Split Data

Build Tree

Predict

1. Input Data Points

The algorithm starts with your input data points (X, y). These points represent the relationship we want to model.

2. Find Best Split

The algorithm tries different thresholds on the feature (X) to split the data into two groups, aiming to reduce variance in each group.
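
One simple way to search for a threshold on a single feature is sketched below. It assumes candidate thresholds are the midpoints between adjacent sorted X values and scores each one by the total squared error of the two groups around their own means. The data and function names are illustrative.

```python
def sse(values):
    """Sum of squared errors around the mean (variance times group size)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_threshold(xs, ys):
    """Try the midpoint between each pair of adjacent distinct x values."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best[1]:
            best = (threshold, cost)
    return best

# Hypothetical data with a jump in y between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.0, 12.0, 11.0, 30.0, 32.0, 31.0]

threshold, cost = best_threshold(xs, ys)
print(f"best threshold: {threshold}, total squared error: {cost:.2f}")
```

The search picks the threshold that falls in the gap where y jumps, because that split leaves the least variation inside each group.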

3. Split Data

Data is split into left and right groups based on the threshold. Each group is more homogeneous (less variance).

4. Recursively Build Subtrees

The algorithm repeats the splitting process on each group, building a tree structure, until the maximum depth is reached, a group has too few points to split, or the variance within a group is already low enough.
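
The recursion can be sketched for a single feature as follows. This is a minimal illustration, not a production implementation: the tree is stored as nested tuples, a leaf is just the mean of its targets, and the function names, data and default parameter values are assumptions for the example.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build(points, depth=0, max_depth=2, min_samples_split=2):
    """points is a list of (x, y) pairs sorted by x; returns a nested tuple tree."""
    ys = [y for _, y in points]
    # Stop when the node is too deep, too small, or already has zero variance.
    if depth >= max_depth or len(points) < min_samples_split or sse(ys) == 0:
        return mean(ys)  # a leaf stores the average target value
    best_i, best_cost = None, sse(ys)
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # equal x values cannot be separated
        cost = sse(ys[:i]) + sse(ys[i:])
        if cost < best_cost:
            best_i, best_cost = i, cost
    if best_i is None:
        return mean(ys)  # no split improves on the parent
    threshold = (points[best_i - 1][0] + points[best_i][0]) / 2
    left = build(points[:best_i], depth + 1, max_depth, min_samples_split)
    right = build(points[best_i:], depth + 1, max_depth, min_samples_split)
    return (threshold, left, right)

def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])

# Hypothetical data, sorted by x.
data = sorted(zip([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                  [10.0, 12.0, 11.0, 30.0, 32.0, 31.0]))

shallow = build(data, max_depth=1)
deep = build(data, max_depth=3)
print("shallow tree:", shallow, "leaves:", count_leaves(shallow))
print("deep tree leaves:", count_leaves(deep))
```

On this small dataset the depth-1 tree keeps two leaves, while the depth-3 tree ends up with one leaf per training point, which is the overfitting behaviour that the depth limit is meant to prevent.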

5. Make Predictions

To predict a new point, the tree is traversed from root to leaf by comparing the input to thresholds, returning the leaf's average value.

{% endblock %}