Understanding Decision Trees
How a Decision Tree Classifies Your Data
1. Input labeled data: features and classes.
2. Build the tree: recursive splitting.
3. Traverse the tree: follow the learned rules for new data.
4. Reach a leaf node: it contains the class prediction.
5. Final classification: based on the leaf's majority class.
A Decision Tree learns a series of if-then-else rules from your data, forming a tree structure. When new data comes in, it simply follows these rules down the tree to arrive at a classification.
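As a minimal sketch of that workflow, assuming scikit-learn is available, the following fits a classifier on the built-in iris dataset and then classifies a new, unlabeled point:

```python
# Minimal sketch: fit a tree on labeled data, then follow its rules for a new point.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)            # labeled data: features X, classes y

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)                                # builds the tree via recursive splitting

new_sample = [[5.1, 3.5, 1.4, 0.2]]          # a new, unlabeled data point
print(clf.predict(new_sample))               # traverses the tree to a leaf -> class label
```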
Understanding Decision Trees: A Deeper Dive
A Decision Tree is a powerful, intuitive, and versatile supervised learning algorithm that models decisions in a tree-like structure. It provides a clear, flowchart-like representation of the choices and their potential outcomes, making it highly interpretable. By traversing its "branches," one can easily compare different paths and understand the reasoning behind a particular classification or prediction.
Types of Decision Trees:
- Classification Trees: These are used when the target variable is categorical. For instance, classifying an email as 'spam' or 'not spam', or predicting if a customer will 'churn' or 'stay'. The tree partitions the data into regions, and each region is assigned a class label based on the majority class of data points falling into it.
- Regression Trees: Employed when the target variable is continuous. Examples include predicting house prices, stock values, or a patient's recovery time. Instead of assigning categories, leaf nodes in regression trees hold a numerical value (e.g., the average of the target variable for data points in that region).
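The contrast between the two types can be illustrated with scikit-learn's DecisionTreeClassifier and DecisionTreeRegressor; the tiny datasets below are made up purely for illustration:

```python
# Sketch contrasting classification and regression trees (scikit-learn assumed).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: categorical target (e.g., spam = 1, not spam = 0)
X_cls = np.array([[0.1], [0.4], [0.35], [0.8], [0.9]])
y_cls = np.array([0, 0, 0, 1, 1])
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[0.7]]))        # leaf holds a majority class label

# Regression tree: continuous target (e.g., house price in $1000s)
X_reg = np.array([[50], [60], [80], [100], [120]])   # square meters
y_reg = np.array([150.0, 180.0, 240.0, 310.0, 360.0])
reg = DecisionTreeRegressor(max_depth=2).fit(X_reg, y_reg)
print(reg.predict([[90]]))         # leaf holds the mean target value of its region
```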
Key Components of a Decision Tree:
- Root Node: The starting point of the tree, representing the entire dataset. It's the first decision node from which all other branches originate.
- Decision Node (Internal Node): A node that represents a test on a specific feature (attribute). Based on the outcome of this test, the data is split into subsets, leading to new branches.
- Branch: Represents the outcome of a decision node's test. It connects a parent node to a child node (either another decision node or a leaf node).
- Leaf Node (Terminal Node): A node that does not split further. It represents the final decision or prediction (a class label for classification or a numerical value for regression).
- Max Depth: A crucial hyperparameter that limits the maximum number of levels or splits from the root to the deepest leaf. It's a primary control for preventing overfitting.
Figure 1: A simplified representation of a Decision Tree's basic structure, showing a root node, branches, and leaf nodes.
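For readers using scikit-learn, a fitted tree exposes these components programmatically through its tree_ attribute. The sketch below (trained on the iris dataset for convenience) prints each node as either a decision node or a leaf:

```python
# Sketch inspecting the components named above on a fitted scikit-learn tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

tree = clf.tree_
print("total nodes:", tree.node_count)            # root + decision nodes + leaves
for node in range(tree.node_count):
    is_leaf = tree.children_left[node] == -1      # leaf nodes have no children
    if is_leaf:
        print(f"node {node}: leaf, prediction = class {tree.value[node].argmax()}")
    else:
        print(f"node {node}: split on feature {tree.feature[node]} "
              f"at threshold {tree.threshold[node]:.2f}")   # decision node with a branch per outcome
```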
How Decision Trees Work (The Learning Process):
The Decision Tree algorithm builds its structure by recursively partitioning the feature space into distinct, often rectangular, regions.
- 1. Start at the Root: The entire training dataset begins at the root node. The tree considers all features to find the optimal initial split.
- 2. Find the Best Split: At each node, the algorithm evaluates various possible splits for all available features. The goal is to find the split that best separates the data into purer subsets (meaning subsets where data points predominantly belong to a single class). This evaluation is based on a specific "splitting criterion." For 2D data, these splits result in axis-aligned (horizontal or vertical) lines.
- 3. Branching: Based on the chosen best split, the data is divided into two (or more) subsets, and corresponding branches are created, leading to new child nodes.
- 4. Continue Partitioning: Steps 2 and 3 are recursively applied to each new child node. This process continues until a stopping condition is met, such as:
- All data points in a node belong to the same class.
- The predefined `max_depth` limit is reached.
- The number of data points in a node falls below a minimum threshold.
- No further informative splits can be made.
- 5. Form Leaf Nodes: Once a stopping condition is met for a particular branch, that node becomes a leaf node. It's then assigned the class label (for classification) or numerical value (for regression) that is most representative of the data points within that final region.
When a new, unlabeled data point needs classification, it traverses the tree from the root. At each decision node, it follows the path corresponding to its feature values, finally arriving at a leaf node which provides the model's prediction.
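A short sketch of this process, assuming scikit-learn: export_text prints the learned if-then-else rules as axis-aligned feature tests, and apply shows which leaf a new point lands in:

```python
# Sketch of the learning + traversal process on the iris dataset.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(data.data, data.target)               # recursive partitioning of the feature space

# The tree as human-readable rules: one feature test per decision node.
print(export_text(clf, feature_names=list(data.feature_names)))

new_point = [[6.0, 2.9, 4.5, 1.5]]
leaf_id = clf.apply(new_point)[0]             # index of the leaf the point lands in
print("leaf node:", leaf_id,
      "-> predicted class:", data.target_names[clf.predict(new_point)[0]])
```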
Splitting Criteria in Decision Trees:
The effectiveness of a Decision Tree heavily relies on its ability to find the best feature and split point at each node. This is determined by mathematical metrics called splitting criteria:
- Gini Impurity: This criterion measures the probability of incorrectly classifying a randomly chosen element from a subset if it were randomly labeled according to the distribution of labels in that subset. A Gini Impurity of 0 means the node is "pure" (all elements belong to the same class). Decision Trees aim to minimize Gini Impurity at each split (see the worked sketch after this list).
$$ G = 1 - \sum_{i=1}^{C} (p_i)^2 $$ Where $p_i$ is the probability of an element belonging to class $i$, and $C$ is the total number of classes.
- Information Gain (Entropy): Based on the concept of entropy from information theory, Information Gain measures the reduction in uncertainty or "randomness" after a split. The algorithm seeks splits that provide the maximum information gain.
$$ \text{Entropy}(S) = - \sum_{i=1}^{C} p_i \log_2(p_i) $$ $$ \text{Information Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \text{Entropy}(S_v) $$ Where $S$ is the set of examples, $A$ is an attribute (feature), $\text{Values}(A)$ are the possible values for attribute $A$, $S_v$ is the subset of $S$ for which attribute $A$ has value $v$, and $p_i$ is the proportion of class $i$ in $S$.
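Both criteria are straightforward to compute by hand; the sketch below implements the formulas above directly with NumPy on a toy set of class labels:

```python
# Hand-rolled sketch of both splitting criteria, following the formulas above.
import numpy as np

def gini(labels):
    """G = 1 - sum_i p_i^2 over the class proportions in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Entropy(parent) minus the size-weighted entropy of the two child subsets."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = np.array([0, 0, 0, 1, 1, 1])          # perfectly mixed node
left, right = parent[:3], parent[3:]           # a split that separates the classes
print(gini(parent))                            # 0.5 (impure)
print(gini(left), gini(right))                 # 0.0, 0.0 (pure children)
print(information_gain(parent, left, right))   # 1.0 bit of information gained
```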
Advantages of Decision Trees:
- Interpretability: Easy to understand and visualize, often referred to as "white box" models.
- Minimal Data Preprocessing: Can handle both numerical and categorical data, and often don't require feature scaling or normalization.
- Versatility: Can be used for both classification and regression tasks.
- Non-linear Relationships: Capable of capturing non-linear relationships between features and target.
Disadvantages and Challenges:
- Overfitting: Can easily overfit noisy data, leading to trees that are too complex and don't generalize well.
- Instability: Small variations in the data can lead to a completely different tree structure.
- Bias with Imbalanced Data: Can be biased towards dominant classes if the dataset is imbalanced.
- Local Optima: The greedy approach of finding the best split at each step doesn't guarantee a globally optimal tree.
Mitigating Overfitting (Pruning):
To combat overfitting, various techniques are employed, most notably "pruning." Pruning involves removing branches that have little predictive power, simplifying the tree.
- Pre-pruning (Early Stopping): Stopping the tree construction early based on thresholds like `max_depth`, `min_samples_leaf` (minimum number of samples required to be at a leaf node), or `min_impurity_decrease`.
- Post-pruning: Growing the full tree first, then removing branches that provide little value using metrics like cross-validation error or statistical tests.
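In scikit-learn terms, a rough sketch of both strategies might look like the following; the breast-cancer dataset and the particular choice of ccp_alpha are illustrative only, and in practice the alpha would be selected by cross-validation:

```python
# Sketch of pre-pruning (early stopping) and post-pruning (cost-complexity).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: cap depth, leaf size, and minimum impurity decrease while growing.
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5,
                             min_impurity_decrease=0.001, random_state=0)
pre.fit(X_tr, y_tr)

# Post-pruning: grow the full tree, then prune with cost-complexity (ccp_alpha).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # one candidate alpha; pick via CV in practice
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)

print("pre-pruned test accuracy :", pre.score(X_te, y_te))
print("post-pruned test accuracy:", post.score(X_te, y_te))
```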
Ensemble Methods (Beyond Single Trees):
Despite their challenges, Decision Trees form the building blocks for more powerful algorithms, especially ensemble methods:
- Random Forests: Builds multiple Decision Trees during training and outputs the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. This reduces overfitting and improves accuracy.
- Gradient Boosting (e.g., XGBoost, LightGBM): Builds trees sequentially, where each new tree tries to correct the errors of the previous ones. Highly powerful and widely used.
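A brief sketch comparing a single tree with these ensembles, using scikit-learn's built-in implementations (XGBoost and LightGBM expose similar estimator-style APIs):

```python
# Sketch: single tree vs. tree ensembles, compared by cross-validated accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),  # vote of many trees
    "gradient boosting": GradientBoostingClassifier(random_state=0),            # sequential error correction
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```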
By understanding the fundamentals of Decision Trees, you gain a solid foundation for comprehending these more advanced and robust machine learning models.