Easy Speech-to-Text with Python

Intermediate Guided Project

This project introduces multilingual automatic speech recognition (ASR) and the signal processing architecture behind it, using Python. Today, ASR systems are available from multiple sources, including IBM Watson® Speech to Text and publicly available systems from OpenAI.

3.9 (35 Reviews)

Language

  • English

Topic

  • Artificial Intelligence

Enrollment Count

  • 578

Skills You Will Learn

  • Data Analysis, Python, Data Science, Machine Learning, Embeddable AI, PyTorch

Offered By

  • IBM

Platform

  • SkillsNetwork

Last Update

  • May 19, 2024
About This Guided Project

Why should you do this guided project?

Let’s say you are a podcast creator and you want to transcribe your podcast so that it can be translated into multiple languages or read by hearing-impaired people. Let’s also say you want to improve the discoverability of your podcast through search engine optimization (SEO). Transcribing your podcast enables search engines to index the text, making the podcast easier to find.

The purpose of this guided project is to introduce you to automatic speech recognition (ASR) systems and to help you understand how the underlying signal processing works. The project also covers the architecture of the transformer model behind ASR and includes examples of how to recognize, transcribe, and translate audio and video files using a publicly available ASR tool.


A look at the project ahead

After completing this project, you will be able to:
  • Understand how signal processing works.
  • Load an audio file and detect the spoken language.
  • Transcribe and translate an audio file or a YouTube video.
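
As a rough illustration of the first objective, the sketch below shows one common signal processing step in speech front ends: cutting a waveform into short overlapping frames and looking at the frequency content of each frame. It is not taken from the project material. The function names, the 8 kHz sample rate, and the synthetic 440 Hz test tone are all assumptions chosen for the example, and the transform is a plain (slow) discrete Fourier transform written with the standard library only.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed value for this toy example)


def sine_wave(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Generate a synthetic tone that stands in for recorded audio."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


def frame_signal(samples, frame_len, hop_len):
    """Split the waveform into overlapping frames of equal length."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop_len)]


def magnitude_spectrum(frame):
    """Apply a Hann window, then compute DFT magnitudes up to Nyquist."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        spectrum.append(abs(acc))
    return spectrum


def dominant_frequency(frame, sample_rate=SAMPLE_RATE):
    """Return the frequency (in Hz) of the strongest spectral bin."""
    spectrum = magnitude_spectrum(frame)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * sample_rate / len(frame)


# 0.1 s of a 440 Hz tone, cut into 25 ms frames with a 10 ms hop.
signal = sine_wave(440.0, 0.1)
frames = frame_signal(signal, frame_len=200, hop_len=80)

for index, frame in enumerate(frames):
    print(index, dominant_frequency(frame))
```

Frames of about 25 ms with a 10 ms hop are a common choice in speech processing; a real ASR system would typically use an FFT library and convert these spectra into mel-scaled features before passing them to the model.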
 

Prerequisites 

You just need a web browser! Basic Python programming knowledge is recommended but not required.
Everything else is provided to you via the IBM Skills Network Labs environment, where you will have access to the Cloud IDE and Python runtimes. The environment comes with many things pre-installed (e.g., Docker) to save you the hassle of setting everything up. Also, note that this platform works best with current versions of Chrome, Edge, Firefox, Internet Explorer, or Safari.

Instructors

Svitlana Kramar

Data Scientist

I’m a passionate data science educator whose goal is to learn by teaching innovative data science tools that can improve our day-to-day tasks and our quality of life. My interests are in Natural Language Processing: text classification, summarization, and generation. Research can take a long time because there are a lot of resources and new opinions posted every day. Having tools to summarize and extract the information can save a lot of time. I hope we can all learn, approve, and apply data science tools to cut down on repetitive and tedious tasks, to make more informed decisions in life, to differentiate fake from real, and to open communication spaces to language-diverse or hearing-impaired audiences. The applications are limitless! My personality: I am a foodie and I love cooking and learning different cuisines. I also love travelling and connecting with people by learning a little bit of their language and about their food and music. I hold a master’s degree in Data Science and Analytics, specializing in Machine Learning, from the University of Calgary.

Contributors

Joseph Santarcangelo

Senior Data Scientist at IBM

Joseph has a Ph.D. in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his Ph.D.

Roxanne Li

Data Scientist at IBM

I am an aspiring Data Scientist at IBM with extensive academic, research, and work experience in different areas of Machine Learning, including Classification, Clustering, Computer Vision, NLP, and Generative AI. I've used Machine Learning to build data products for the P&C insurance industry in the past. I also recently became an instructor of the Unsupervised Machine Learning course by IBM on Coursera!
