Posts

Showing posts with the label Whisper

2023-11-13: Transcribing Audio using SeamlessM4T

Introduction

Speech-to-text has many applications. Online meeting tools like Microsoft Teams use it to transcribe meetings, including live ones, to automate note taking. Video streaming websites and applications transcribe audio to support closed captioning (CC). Music files are transcribed to provide lyrics for your favorite karaoke night. Podcasts, audiograms, and other video and audio files posted to social media may also be transcribed as part of web scraping.

Many Python libraries are available for those who wish to incorporate speech-to-text capabilities into their own applications and research. Whisper, developed by OpenAI, is one such library; it can not only transcribe audio but also translate speech from many languages into English. Other major companies like Google and IBM have released their own libraries that also provide these capabil...