Build a Phishing Filter with Finetuned BERT and Granite

Intermediate Guided Project

Build an interactive Gradio web app by fine-tuning BERT, DistilBERT, and the IBM Granite LLM to detect phishing emails and strengthen AI-driven cybersecurity defenses. Use Hugging Face pre-trained models to fine-tune your own BERT-based phishing email classifier, from data loading to model delivery. Integrate the IBM Granite API to design an LLM assistant that highlights malicious patterns, detects suspicious phrases, and reveals evolving attack strategies. By the end, you will have created a shareable Gradio web app that deploys your fine-tuned model and LLM assistant to protect users worldwide.

Language

  • English

Topic

  • Artificial Intelligence

Skills You Will Learn

  • Generative AI, LLM, Python, Artificial Intelligence, Cybersecurity, Machine Learning

Offered By

  • IBMSkillsNetwork

Estimated Effort

  • 45 minutes

Platform

  • SkillsNetwork

Last Update

  • October 15, 2025

About this Guided Project

Have phishing emails ever annoyed or alarmed you? Phishing is one of the most common and dangerous cyberattacks, designed to trick users into giving up sensitive information or clicking malicious links. In this project, you’ll build an AI-powered phishing detection system that goes beyond traditional spam filters. By fine-tuning transformer models like DistilBERT and integrating IBM’s Granite as an explanation layer, you’ll create a two-step defense that not only classifies emails as legitimate or suspicious but also identifies and explains the exact words and phrases that make them dangerous. Along the way, you’ll gain hands-on experience with Hugging Face fine-tuning, explainable AI, and deploying interactive web apps with Gradio, all skills that apply directly to real-world cybersecurity challenges.

What You’ll Learn

By the end of this project, you will be able to: 
  • Fine-tune BERT-based models for phishing detection: Train and evaluate lightweight transformers to classify emails as legitimate or phishing. 
  • Integrate LLMs for explainability: Use IBM Granite as a second defense layer to extract and explain suspicious phrases that indicate phishing attempts. 
  • Build an interactive web interface: Develop a Gradio-powered demo where users can paste emails, get predictions, view confidence scores, and read explanations. 
  • Apply explainable AI in cybersecurity: Learn how combining classifiers with reasoning-capable LLMs improves both accuracy and transparency.
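
The two-step flow described above (classify first, then explain) can be sketched in plain Python. This is only an illustration of the control flow, not the project's implementation: `classify` and `explain` are hypothetical stand-ins for the fine-tuned classifier and the Granite assistant, the phrase list and logits are made up, and running the explanation step only on flagged emails is an assumption.

```python
import math
from dataclasses import dataclass

# Hypothetical phrase list, used only so this sketch runs without any model.
SUSPICIOUS_PHRASES = ("verify your account", "urgent", "click here", "password")


@dataclass
class Verdict:
    label: str
    confidence: float
    flagged_phrases: list


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(email):
    """Stand-in for the fine-tuned classifier: returns (label, confidence)."""
    hits = sum(phrase in email.lower() for phrase in SUSPICIOUS_PHRASES)
    # Made-up logits in the order [legitimate, phishing].
    probs = softmax([1.0, float(hits)])
    index = probs.index(max(probs))
    return ("legitimate", "phishing")[index], probs[index]


def explain(email):
    """Stand-in for the LLM explanation layer: lists the matched phrases."""
    lowered = email.lower()
    return [phrase for phrase in SUSPICIOUS_PHRASES if phrase in lowered]


def analyze(email):
    """Step 1 classifies; step 2 explains only emails flagged as phishing."""
    label, confidence = classify(email)
    phrases = explain(email) if label == "phishing" else []
    return Verdict(label, confidence, phrases)


if __name__ == "__main__":
    print(analyze("URGENT: click here to verify your account"))
    print(analyze("Lunch at noon tomorrow?"))
```

In a real build, `classify` would wrap the fine-tuned model's prediction and `explain` would call the LLM; a function shaped like `analyze` is the kind of single entry point a Gradio interface could expose.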

Who Should Enroll

  • Early-career data scientists or ML engineers who want to apply NLP to cybersecurity problems.
  • Cybersecurity enthusiasts and professionals looking to understand how AI can enhance phishing detection. 
  • Students and researchers interested in explainable AI and real-world model deployment.

Why Enroll

This project bridges machine learning and cybersecurity by showing how explainable AI can strengthen defenses against phishing. Instead of just flagging emails as "safe" or "suspicious", your system will provide transparent reasoning that helps users understand and trust the model’s predictions. By the end, you’ll have a working phishing detection assistant, practical experience with Hugging Face fine-tuning, and a deeper understanding of how LLMs can be used as a second layer of defense in AI security systems.

What You’ll Need

To get the most out of this project, you should have:
  • Basic Python programming knowledge. 
  • Some familiarity with NLP or Hugging Face models (helpful but not required). 
  • Interest in practical AI applications.
All dependencies are pre-configured in the environment, and the project runs best on current versions of Chrome, Edge, Firefox, or Safari.

Instructors

Jianping Ye

Data Scientist Intern at IBM

I'm Jianping Ye, currently a Data Scientist Intern at IBM and a PhD candidate at the University of Maryland. I specialize in designing AI solutions that bridge the gap between research and real-world application. With hands-on experience in developing and deploying machine learning models, I also enjoy mentoring and teaching others to unlock the full potential of AI in their work.

Contributors

Wojciech "Victor" Fulmyk

Data Scientist at IBM

I am a data scientist and economist with a strong background in econometrics, time series analysis, causal inference, and statistics. I stand out for my ability to combine technical expertise with clear communication, turning complex data findings into practical insights for stakeholders at every level. Follow my projects to learn about data science principles, machine learning algorithms, and artificial intelligence agents.

Joseph Santarcangelo

Senior Data Scientist at IBM

Joseph has a Ph.D. in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his Ph.D.

Matthew Wu

Marketer at IBM

Supporting Cognitive Class and IBM through digital marketing and tailored content creation.
