
Evaluating an NBA Player's Team Impact via Regression

Beginner Guided Project

Learn how to apply Random Forest and Kernel Ridge Regression by analyzing an NBA player's real game performance. In this guided project, you’ll engineer features, tune model hyperparameters, and analyze feature importance to understand what drives a player’s performance on the court. You'll use pandas for data manipulation, scikit-learn for model training and evaluation, and matplotlib for visualization. By the end, you’ll be able to build, configure, and interpret industry-standard regression models for a wide range of analytical use cases.

Language

  • English

Topic

  • Machine Learning

Skills You Will Learn

  • Regression, Scikit-learn, Feature Engineering, Pandas, Random Forest, Machine Learning

Offered By

  • IBMSkillsNetwork

Estimated Effort

  • 30 minutes

Platform

  • SkillsNetwork

Last Update

  • August 19, 2025
About this Guided Project
Regression modelling is a cornerstone of data science, used to understand relationships between variables and make predictions across different domains. In this project, you’ll use NBA game data as a simple case study to apply and compare two powerful regression techniques: Random Forest and Kernel Ridge Regression. You'll learn how to extract insights from complex datasets, engineer your own features, evaluate model performance, and draw meaningful conclusions. These skills translate to virtually any data-driven field. Whether you're building predictive systems or uncovering what drives key outcomes, the techniques you learn here will be widely applicable.

A Look at the Project Ahead

In this guided project, you will:
  • Ingest and Explore Data: Load game logs, inspect distributions, and spot trends in a player's performance.
  • Engineer Predictive Features: Create new metrics (e.g. rolling averages, usage rates) that capture hidden aspects of a player's game.
  • Train and Tune Models: Build Random Forest and Kernel Ridge Regression pipelines in scikit‑learn and optimize hyperparameters.
  • Analyze Results: Use feature‑importance scores to interpret which factors drive a player's performance.
  • Visualize Results: Plot your findings with matplotlib.
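
The steps above can be sketched in a few lines. The following is a minimal illustration only, not the project's actual notebook: the game log is synthetic, and the column names (`minutes`, `fga`, `rest_days`, `points`, `points_avg_5`) are hypothetical stand-ins for whatever the real dataset provides.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for a player's game log (hypothetical columns, not the project's data).
rng = np.random.default_rng(0)
n_games = 200
games = pd.DataFrame({
    "minutes": rng.uniform(20, 40, n_games),
    "fga": rng.integers(8, 25, n_games),       # field goal attempts
    "rest_days": rng.integers(0, 4, n_games),
})
games["points"] = 0.3 * games["minutes"] + 0.9 * games["fga"] + rng.normal(0, 3, n_games)

# Engineered feature: average points over the previous 5 games.
# shift(1) keeps the current game's result out of its own feature.
games["points_avg_5"] = games["points"].shift(1).rolling(5).mean()
games = games.dropna().reset_index(drop=True)

# Chronological split: train on earlier games, evaluate on later ones.
features = ["minutes", "fga", "rest_days", "points_avg_5"]
split = int(len(games) * 0.8)
train, test = games.iloc[:split], games.iloc[split:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(train[features], train["points"])

mae = mean_absolute_error(test["points"], model.predict(test[features]))
importances = pd.Series(model.feature_importances_, index=features).sort_values(ascending=False)

print(f"Test MAE: {mae:.2f}")
print(importances)
```

With matplotlib installed, `importances.plot.barh()` would turn the importance scores into a simple bar chart. The chronological split is a design choice for time-ordered game logs; a random split could let later games inform predictions about earlier ones.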

Learning Objectives

  • Feature Engineering & Pipelines: 
    • Design and integrate custom processes and pipelines to prepare real sports data for regression.
  • Model Training, Tuning & Interpretation:
    • Configure, evaluate, and interpret both tree‑based and kernel‑based regression models to extract actionable insights. 
    • Learn how each model trains and how hyperparameters affect the training process.
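
As a rough illustration of how hyperparameters shape a kernel-based model, the sketch below tunes Kernel Ridge Regression inside a scikit-learn pipeline. The data is synthetic and the grid values are arbitrary assumptions chosen for the example, not the settings used in the project.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic nonlinear regression problem (assumed for illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(150, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 150)

# Scaling matters for kernel methods because the RBF kernel depends on distances.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("krr", KernelRidge(kernel="rbf")),
])

# alpha controls regularization strength; gamma controls the RBF kernel width.
param_grid = {
    "krr__alpha": [0.01, 0.1, 1.0],
    "krr__gamma": [0.01, 0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="neg_mean_absolute_error")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Cross-validated MAE:", -search.best_score_)
```

Larger `alpha` values shrink the fit toward a smoother function, while larger `gamma` values make the kernel more local and the model more flexible, so the two are usually tuned together.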

What You'll Need

  • Familiarity with Python programming
  • Understanding of fundamental machine learning concepts (e.g. regression, overfitting, cross‑validation)
  • Basic familiarity with pandas DataFrames and their methods
  • A modern browser (Chrome, Edge, Firefox, Safari)

Instructors

Joshua Zhou

Data Scientist

I like building fun and practical things.
