Supervised learning uses labeled data to train models that make predictions. The sections below cover common algorithms, typical interview questions, and mini case studies showing how each works in practice.
## Linear Regression

- Predicts continuous values based on linear relationships between features.
- Uses least squares to minimize error between predicted and actual values.

**Interview questions:**
- What is the cost function used in Linear Regression?
- How do you interpret the coefficients?
- What are the assumptions of Linear Regression?

**Case study:**
Problem: Predict house prices based on features like square footage, number of bedrooms, and location.
Model: A linear regression model can be trained on historical housing data to estimate price as a continuous variable.
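The case study above can be sketched in a few lines with scikit-learn. The numbers below are made up for illustration (price is exactly linear in square footage), not real housing data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: [square footage] -> price, constructed as price = 100 * sqft + 50000
X = np.array([[1000], [1500], [2000], [2500]])
y = np.array([150000, 200000, 250000, 300000])

model = LinearRegression().fit(X, y)

# Coefficient interpretation: dollars of price per extra square foot (~100 here)
print(model.coef_[0], model.intercept_)
print(model.predict([[1800]])[0])  # ~230000 for an 1800 sqft house
```

Because the toy data is perfectly linear, least squares recovers the slope and intercept exactly; real data would leave residual error.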
## Logistic Regression

- Used for binary classification problems.
- Outputs probabilities using the sigmoid function.

**Interview questions:**
- How is Logistic Regression different from Linear Regression?
- What is the role of the sigmoid function?

**Case study:**
Problem: Classify whether a given email is spam or not.
Model: Logistic regression assigns a probability to the email being spam based on features like keyword frequency and sender reputation.
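A minimal sketch of the spam case study with scikit-learn. The two features (spam-keyword count, sender reputation score) and their values are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features: [spam-keyword count, sender reputation in 0..1]
X = np.array([[8, 0.1], [6, 0.2], [1, 0.9], [0, 0.8]])
y = np.array([1, 1, 0, 0])  # 1 = spam, 0 = not spam

clf = LogisticRegression().fit(X, y)

# predict_proba runs the linear score through the sigmoid to get P(spam)
proba = clf.predict_proba([[7, 0.15]])[0, 1]
print(proba)  # high probability: looks like the spam examples
```

Unlike linear regression, the output is squashed into (0, 1) by the sigmoid, so it can be read directly as a probability.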
## Decision Trees

- Non-parametric models that split data based on feature thresholds.
- Easy to visualize and interpret.

**Interview questions:**
- How does a decision tree decide which feature to split on?
- What are entropy and Gini impurity?

**Case study:**
Problem: Approve or reject a bank loan application.
Model: A decision tree learns rules based on income, credit score, and employment status to classify applicants.
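The loan case study can be sketched as follows; the applicant records and approval labels are fabricated toy data, and only two of the three features mentioned above are used to keep it short:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical applicants: [annual income in k$, credit score]
X = np.array([[30, 580], [40, 600], [80, 720], [95, 750]])
y = np.array([0, 0, 1, 1])  # 1 = approve, 0 = reject

# The tree picks the split (by Gini impurity, the default) that best
# separates approved from rejected applicants.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
pred = tree.predict([[85, 700]])[0]
print(pred)  # falls on the "approve" side of the learned threshold
```

One advantage over linear models: the learned thresholds can be printed as human-readable rules via `sklearn.tree.export_text`.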
## Random Forest

- Ensemble method using multiple decision trees (bagging).
- Reduces overfitting and improves accuracy.

**Interview questions:**
- How does Random Forest improve over Decision Trees?
- What is bagging?

**Case study:**
Problem: Predict customer churn in a telecom company.
Model: A random forest model combines trees trained on different subsets of customers to predict churn with high accuracy.
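A sketch of the churn case study on synthetic data. The two features and the churn rule (customers churn when both monthly charges and support calls are high) are assumptions made purely to generate a toy dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic customers: [normalized monthly charges, normalized support calls]
X = rng.uniform(0, 1, size=(200, 2))
# Assumed churn rule for the toy data: churn only if both features are high
y = ((X[:, 0] > 0.5) & (X[:, 1] > 0.5)).astype(int)

# Each of the 100 trees sees a bootstrap sample of customers (bagging);
# the forest averages their votes.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
preds = forest.predict([[0.9, 0.9], [0.1, 0.1]])
print(preds)  # high-charge/high-calls customer flagged as churn
```

The interaction between the two features is exactly the kind of non-linear pattern a single linear model misses but bagged trees capture well.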
## Support Vector Machines (SVM)

- Finds the optimal hyperplane that maximally separates classes.
- Uses kernel tricks for non-linear classification.

**Interview questions:**
- What is the margin in SVM?
- How do kernels work?

**Case study:**
Problem: Classify handwritten digits (e.g., MNIST dataset).
Model: An SVM with an RBF kernel can learn to distinguish the digits 0–9 based on pixel intensity values.
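This case study can be run end to end with the small 8×8 digits dataset bundled with scikit-learn (a lighter stand-in for MNIST). The `gamma` value here is a reasonable choice for this dataset, not a universally tuned setting:

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()  # 1797 images of 8x8 pixel intensities

X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# RBF kernel lets the SVM separate classes that are not linearly separable
# in pixel space; gamma controls how local each support vector's influence is.
clf = svm.SVC(kernel="rbf", gamma=0.001).fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(acc)  # typically well above 0.9 on held-out digits
```

In practice `gamma` and `C` would be tuned with cross-validation (e.g. `GridSearchCV`), which is also where the "hard to tune" weakness in the table below comes from.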
## K-Nearest Neighbors (KNN)

- Instance-based learning.
- Classifies a new point based on the majority class of its k-nearest neighbors.

**Interview questions:**
- How do you choose the value of K?
- What are the downsides of KNN?

**Case study:**
Problem: Recommend a movie based on user preferences.
Model: KNN finds similar users and recommends movies liked by those with similar taste.
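A simplified sketch of the idea: instead of a full recommender, this classifies a new user's taste from the majority vote of their nearest neighbors. The genre-rating features and user profiles are invented for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical average ratings per genre: [action, romance], 0-5 scale
X = np.array([[5, 1], [4, 0], [1, 5], [0, 4]])
y = np.array(["action fan", "action fan", "romance fan", "romance fan"])

# No real training happens at fit time: KNN just stores the instances,
# which is why prediction (a distance search) is the slow part.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
label = knn.predict([[5, 0]])[0]
print(label)  # majority vote of the 3 closest users
```

A real recommender would then surface movies liked by those nearest users; the neighbor search itself is the piece KNN provides.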
## Naive Bayes

- Probabilistic classifier based on Bayes’ theorem.
- Assumes features are conditionally independent.

**Interview questions:**
- Why is it called “naive”?
- When does Naive Bayes perform well?

**Case study:**
Problem: Classify customer reviews as positive or negative.
Model: Naive Bayes uses word frequencies to assign sentiment labels efficiently, especially useful in large text datasets.
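The sentiment case study maps directly onto scikit-learn's multinomial Naive Bayes over word counts. The four-review corpus is a made-up miniature, just enough to show the pipeline:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical mini review corpus
reviews = [
    "great product love it",
    "excellent quality great",
    "terrible waste of money",
    "awful do not buy",
]
labels = ["positive", "positive", "negative", "negative"]

# CountVectorizer turns each review into word frequencies;
# MultinomialNB treats each word as conditionally independent given the label.
clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(reviews, labels)
sentiment = clf.predict(["great quality love"])[0]
print(sentiment)
```

The independence assumption is clearly false for language (word order is ignored), yet word frequencies alone carry enough signal that this remains a strong, fast baseline for large text datasets.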
## Algorithm Comparison

| Algorithm | Type | Strengths | Weaknesses |
|---|---|---|---|
| Linear Regression | Regression | Simple, interpretable | Assumes linearity |
| Logistic Regression | Classification | Probabilistic output | Struggles with non-linear data |
| Decision Tree | Both | Easy to interpret, fast | Prone to overfitting |
| Random Forest | Both | Robust, handles non-linearity | Slower, less interpretable |
| SVM | Classification | Works in high dimensions | Hard to tune, less scalable |
| KNN | Both | No training phase | Slow at prediction time |
| Naive Bayes | Classification | Fast, works well with text | Assumes feature independence |
## Next Steps

Proceed to `unsupervised_learning.md` to explore clustering and dimensionality reduction techniques.