Our client is a specialized provider of medical billing, coding, and business analytics services for healthcare providers throughout the United States. Their Quality Management System (QMS) holds a large volume of recorded calls that must be audited for a variety of purposes.
Reviewing these recordings manually is inefficient and time-consuming, making substantial automation essential.
Objectives
Build and integrate an audio transcription service with Privacy, Security, Speed & Diarization
In collaboration with the client, we began defining the project objectives, establishing clear end goals and success criteria while accounting for their domain regulations, budget constraints, and business plans.
Privacy and Security of Audio Data
Ensuring the privacy and security of audio data is critical given the client’s operation in the healthcare sector.
Preventing data from leaving their private network was also critical.
Integration of Audio Transcription with Existing QMS
The auditing process is conducted through their in-house Quality Management System. Ensuring that the audio is transcribed and integrated with their QMS was essential.
Transcripts needed to be saved when generated and periodically deleted to optimize storage usage and system performance.
Speaker Diarization
Incorporating diarization (identifying who spoke at specific times) into the final transcription results was important for quality checks, since differentiating between speakers improves the efficiency of the auditing process.
Rapid Audio Transcription Results
The end user required transcription results as quickly as possible.
The system filters and transcribes audio data every hour and saves the results promptly.
Challenges
Thousands of hours of audio content, fault tolerance & multi-cloud infrastructure
Volume of audio content
Since we were dealing with thousands of hours of audio recordings, building a system that could scale quickly and process audio cost-effectively was critical.
Fault tolerance and recovery
When a transcription fails for any reason, the ability to automatically identify, retry, and recover from the failure was essential for the smooth functioning of the transcription service.
Multi-cloud infrastructure
The client follows a multi-cloud strategy (Azure and AWS), which complicated the overall design because the audio files and the Quality Management System resided with different cloud providers.
Maintaining robust security measures throughout the transcription process was also crucial.
Approach
Explore, Evaluate & Verify
Speech Recognition Models
Considering the constraints and requirements gathered from the client, we leaned towards an AI-powered solution for audio transcription, given the advanced capabilities and rapid progress of modern speech recognition models in this domain. We began exploring options that could deliver the accuracy and speed the project demanded.
Some of the top contenders we explored:
Wav2Vec 2.0 by Meta
Whisper by OpenAI
DeepSpeech by Mozilla
Kaldi by Johns Hopkins University
System Architecture
We also began exploring architectures that could scale seamlessly with sudden bursts in demand while keeping costs to a minimum.
Seamless integration with existing systems
Since the audio transcription service needed to read from the existing storage system and make its results available in the current Quality Management System, we analyzed their existing system architecture and built proof-of-concept (POC) validations to identify potential problems.
Solution
OpenAI Whisper & Azure Batch Transcription Service
Among all the Automatic Speech Recognition (ASR) models we evaluated, we found that OpenAI's Whisper model provided exceptional transcription performance with speaker diarization.
Deploying the Whisper model ourselves would have required a dedicated GPU/CPU instance, incurring significant costs. Given the client's preference for a pay-as-you-go model, we chose Azure Batch Transcription, which supports the Whisper model.
Azure Batch Transcription
We utilized Azure Batch Transcription for its robust features, including:
Customization Options: Tailoring the process to our needs with features like diarization and selecting the Whisper model for US English (see the request sketch after this list).
Bring Your Own Storage (BYOS): Keeping audio data in our existing storage infrastructure (AWS S3) while using Azure for transcription, enhancing data security.
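For illustration, the following is a minimal sketch of how such a batch transcription job could be created with diarization enabled, assuming the Azure Speech-to-Text batch transcription REST API. The key, region, content URLs, and the Whisper model reference are placeholders rather than the exact production values, and the property names should be checked against the current Azure documentation.

# Minimal sketch (not the exact production request) of creating an Azure batch
# transcription job with diarization enabled via the Speech-to-Text batch
# transcription REST API. Key, region, content URLs, and the Whisper model
# reference are placeholders.
import os

import requests

SPEECH_KEY = os.environ["SPEECH_KEY"]        # Azure Speech resource key (placeholder)
SPEECH_REGION = os.environ["SPEECH_REGION"]  # e.g. "eastus" (placeholder)

endpoint = f"https://{SPEECH_REGION}.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions"

payload = {
    "displayName": "qms-hourly-batch",  # illustrative job name
    "locale": "en-US",
    # SAS URLs of the recordings copied into the Azure storage container
    "contentUrls": [
        "https://<storage-account>.blob.core.windows.net/recordings/call-001.wav?<sas-token>",
    ],
    "properties": {
        "diarizationEnabled": True,          # label who spoke when
        "wordLevelTimestampsEnabled": True,
        "timeToLive": "PT12H",               # let finished jobs expire automatically
    },
    # Assumption: the Whisper base model is selected by referencing its model URL,
    # discoverable via a GET on the /models endpoint, e.g.:
    # "model": {"self": f"https://{SPEECH_REGION}.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/<whisper-model-id>"},
}

response = requests.post(
    endpoint,
    json=payload,
    headers={"Ocp-Apim-Subscription-Key": SPEECH_KEY},
    timeout=30,
)
response.raise_for_status()
job = response.json()
print("Created transcription job:", job["self"])  # job URL, polled later for status and results

The returned job URL is then polled until the job completes, after which the transcript files are collected and written back to S3 as described in the workflow below.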
Serverless Architecture
We built an efficient workflow using serverless technologies:
Data Transfer: Recordings are filtered and copied from an S3 bucket to an Azure storage container for transcription every hour (sketched after this list).
Batch Processing: A batch transcription job is triggered using the list of recording URLs copied to the Azure storage container.
Result Storage: Results are stored back in an S3 bucket, and metadata is saved in the QMS Application DB for future reference.
Orchestration: All logic is written in Python and deployed on AWS Lambda. The entire workflow is orchestrated by AWS Step Functions.
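To make the data-transfer step concrete, here is a simplified sketch of what the hourly Lambda could look like, assuming boto3 and azure-storage-blob. The bucket, container, environment variable names, and event shape are illustrative; the filtering rules, retry handling, and QMS metadata updates of the real workflow are omitted.

# Simplified sketch of the hourly data-transfer Lambda, assuming boto3 and
# azure-storage-blob. Bucket, container, environment variable names, and the
# event shape are illustrative placeholders.
import os

import boto3
from azure.storage.blob import ContainerClient

S3_BUCKET = os.environ["RECORDINGS_BUCKET"]                      # source bucket name (placeholder)
AZURE_CONTAINER_SAS_URL = os.environ["AZURE_CONTAINER_SAS_URL"]  # destination container SAS URL (placeholder)

s3 = boto3.client("s3")
container = ContainerClient.from_container_url(AZURE_CONTAINER_SAS_URL)


def handler(event, context):
    """Copy the last hour's filtered recordings from S3 to the Azure container."""
    copied_urls = []
    # 'event["keys"]' is assumed to be the filtered list of S3 object keys
    # produced by an earlier Step Functions state.
    for key in event.get("keys", []):
        body = s3.get_object(Bucket=S3_BUCKET, Key=key)["Body"]
        blob_client = container.get_blob_client(key.rsplit("/", 1)[-1])
        blob_client.upload_blob(body, overwrite=True)
        copied_urls.append(blob_client.url)  # SAS-authenticated blob URL
    # The next state passes these URLs as 'contentUrls' to the batch transcription job.
    return {"contentUrls": copied_urls}

In the deployed workflow, this step runs on an hourly schedule inside the Step Functions state machine, with the fault-tolerance and retry behaviour described earlier layered on top.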
Conclusion
A scalable and fault-tolerant audio transcription service that reduced auditing effort by over 50%
The system we built and deployed transcribes numerous recordings concurrently, dramatically reducing audit time compared to the previous manual process.
This efficiency allowed the client to complete audits and extract valuable insights from the audio data much faster.
With the advanced transcription system in place, auditing time has dropped by an impressive 50%, significantly enhancing the efficiency of revenue cycle management. The integration of this technology not only streamlines operations but also frees up valuable resources, enabling the client to focus on strategic initiatives and drive further growth.