Bluetick's Sales Call Evaluator

Akash Kumar

Engineering Lead

6 min read

Introduction

In the fast-paced world of sales, every conversation is a potential turning point. Sales executives spend hours analyzing sales calls, extracting insights, and summarizing their content to improve their strategies. Manual call evaluation is both time-consuming and subject to human error, making it a prime candidate for automation through artificial intelligence. In this blog, we'll explore the need for AI in sales call evaluation and how technologies like OpenAI Whisper, Pyannote-Audio, and Pretrained Speaker Embeddings, such as SpeechBrain/spkrec-ecapa-voxceleb, are transforming the sales landscape.

Source code

Demo

Sales Call Evaluation

*Please note that this application currently runs on a CPU server for demonstration purposes, so processing may take a minute or two.

** To use a preloaded audio file, simply choose one of the available examples.

The Need for AI in Sales Call Evaluation

Sales call evaluation is a critical part of a sales team's success. It helps identify what went right and what could be improved, ultimately leading to better sales strategies and more successful interactions with customers. However, manually listening to and summarizing sales calls is a time-consuming process that can be significantly optimized by leveraging generative AI tools.

Consider a scenario where a sales executive spends an average of 10 minutes manually listening to a sales call and writing a summary of the key points. In a busy workday, this can add up to hours of valuable time that could be better utilized for client interactions, strategizing, and other essential tasks. This is where AI-driven tools like the Bluetick Sales Call Evaluator come into play, streamlining the process and making it more efficient.

Key Components Used

1. OpenAI Whisper

OpenAI's Whisper is a state-of-the-art automatic speech recognition (ASR) system that has opened up new horizons in the field of natural language processing. Whisper excels at transcribing spoken language into written text, making it a valuable asset for sales call evaluation. By transcribing sales calls accurately and efficiently, Whisper lays the foundation for more advanced AI-driven analysis.

Sales teams can use Whisper to automatically generate text transcripts of sales calls, enabling them to quickly search for specific keywords or phrases, identify important moments, and gain a better understanding of the conversation's context. This not only saves time but also ensures that no crucial details are missed.
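For instance, once a call has been transcribed (as in the walkthrough below), Whisper's output is a list of timestamped segments, so keyword search takes only a few lines of Python. Here is a minimal sketch; the keyword set is purely illustrative:

 # Minimal sketch: scan Whisper's timestamped segments for keywords.
 # `segments` is the list returned in result["segments"] (see below);
 # the keyword set here is a hypothetical example.
 KEYWORDS = {"pricing", "discount", "competitor"}

 def find_keyword_hits(segments, keywords=KEYWORDS):
     hits = []
     for seg in segments:
         if any(kw in seg["text"].lower() for kw in keywords):
             hits.append((seg["start"], seg["text"].strip()))
     return hits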

2. Pyannote-Audio for Speaker Diarization

Pyannote-Audio is an open-source toolkit designed for the analysis of audio data. One of its standout features is speaker diarization, a process that divides a conversation into segments, each associated with a specific speaker. In sales call evaluation, this technology allows sales teams to identify who said what during a call, which is crucial for understanding the dynamics of the conversation.

Speaker diarization provided by Pyannote-Audio is a game-changer for sales executives. It allows them to attribute specific statements to either the salesperson or the customer, aiding in the identification of effective communication strategies, clarifying misunderstandings, and enhancing customer relationships.
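For reference, pyannote.audio also ships a ready-made diarization pipeline. A minimal sketch, assuming you have a Hugging Face access token and have accepted the model's terms, looks like the following; note that the walkthrough below takes a different route, clustering per-segment speaker embeddings instead:

 # Sketch: pyannote.audio's off-the-shelf diarization pipeline.
 # Assumes a valid Hugging Face token; not the approach used below.
 from pyannote.audio import Pipeline

 pipeline = Pipeline.from_pretrained(
     "pyannote/speaker-diarization", use_auth_token="HF_TOKEN")
 diarization = pipeline("audio.wav")
 for turn, _, speaker in diarization.itertracks(yield_label=True):
     print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")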

3. Pretrained Speaker Embeddings - SpeechBrain/spkrec-ecapa-voxceleb

Pretrained speaker embeddings, such as those provided by SpeechBrain/spkrec-ecapa-voxceleb, are the icing on the cake when it comes to sales call evaluation. These embeddings are trained on vast amounts of data, allowing the AI model to understand and recognize distinct speaker characteristics.

Sales teams can use pretrained speaker embeddings to identify speakers, track customer interactions over time, and assess the effectiveness of different sales representatives. This insight can lead to more personalized sales strategies, better-targeted customer engagement, and ultimately, higher conversion rates.
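To make this concrete: the ECAPA model maps each audio clip to a 192-dimensional vector, and the cosine similarity between two vectors indicates whether the clips come from the same speaker. A minimal sketch follows; the threshold is an illustrative placeholder, not a tuned value:

 # Sketch: compare two 192-dim speaker embeddings with cosine similarity.
 import numpy as np

 def same_speaker(emb_a, emb_b, threshold=0.7):  # hypothetical threshold
     a, b = np.ravel(emb_a), np.ravel(emb_b)
     cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
     return cos_sim >= threshold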

Let's dive right into the code

Install all the dependencies

 # Note: ffmpeg must also be installed on the system; Whisper uses it to decode audio
 pip install git+https://github.com/openai/whisper.git -q
 pip install -q git+https://github.com/pyannote/pyannote-audio
 pip install --quiet openai
 pip install textstat -q

Download and load the Whisper model

 from whisper import _download, _MODELS

 # Download the medium checkpoint (~1.42 GB) into /models/
 _download(_MODELS["medium"], "/models/", False)

 100%|██████████████████████████████████████| 1.42G/1.42G [00:13<00:00, 113MiB/s]
 '/models/medium.pt'

 import whisper

 model = whisper.load_model('/models/medium.pt')

Convert the MP3 audio to .wav


                            
 import wave
 import subprocess

 audio = "/content/customer_support.mp3"
 if audio[-3:] != 'wav':
     # Convert to WAV with ffmpeg (the -y flag overwrites any existing file)
     audio_file_name = audio.split("/")[-1]
     audio_file_name = audio_file_name.split(".")[0] + ".wav"
     subprocess.call(['ffmpeg', '-y', '-i', audio, audio_file_name])
     path = audio_file_name
 else:
     path = audio

Transcribe the audio

                            
 result = model.transcribe(path)
 segments = result["segments"]

 print(segments)

[{'id': 0, 'seek': 0, 'start': 0.0, 'end': 5.48, 'text': " Thank you for calling Martha's Flowers Towne SST. Hello I'd like to order flowers"},

{'id': 1, 'seek': 0, 'start': 5.48, 'end': 9.040000000000001, 'text': " and I think you have what I'm looking for. I'd be happy to take care of your"} …
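One practical note: on a CPU-only machine (like the demo server mentioned earlier), Whisper warns that FP16 is not supported and falls back to FP32; passing fp16=False silences the warning:

 # On CPU, request FP32 inference explicitly to avoid the FP16 warning
 result = model.transcribe(path, fp16=False)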

Load the embedding model and compute segment embeddings

                            
 import contextlib

 # Get the audio duration in seconds
 with contextlib.closing(wave.open(path, 'r')) as f:
     frames = f.getnframes()
     rate = f.getframerate()
     duration = frames / float(rate)

 from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
 import torch

 # Fall back to CPU when no GPU is available
 embedding_model = PretrainedSpeakerEmbedding(
     "speechbrain/spkrec-ecapa-voxceleb",
     device=torch.device("cuda" if torch.cuda.is_available() else "cpu"))
                            
 from pyannote.audio import Audio
 from pyannote.core import Segment
 import numpy as np

 audio = Audio()  # pyannote audio loader (reuses `path` from the conversion step)

 def segment_embedding(segment):
     start = segment["start"]
     # Whisper overshoots the end timestamp in the last segment
     end = min(duration, segment["end"])
     clip = Segment(start, end)
     waveform, sample_rate = audio.crop(path, clip)
     # Downmix to mono and add a batch dimension for the embedding model
     waveform = waveform.mean(dim=0, keepdim=True)
     return embedding_model(waveform.unsqueeze(0))

 # ECAPA-TDNN embeddings are 192-dimensional
 embeddings = np.zeros(shape=(len(segments), 192))
 for i, segment in enumerate(segments):
     embeddings[i] = segment_embedding(segment)

 # Replace any NaNs (e.g. from zero-length clips) with zeros
 embeddings = np.nan_to_num(embeddings)

Perform diarization

                            
 from sklearn.cluster import AgglomerativeClustering

 num_speakers = 2  # Set the number of speakers manually

 clustering = AgglomerativeClustering(num_speakers).fit(embeddings)
 labels = clustering.labels_
 for i in range(len(segments)):
     segments[i]["speaker"] = 'SPEAKER ' + str(labels[i] + 1)
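Here the number of speakers is known to be two. When it isn't known in advance, one option is to try a few cluster counts and keep the one with the best silhouette score. This heuristic is our own sketch, not part of the original pipeline:

 # Sketch: estimate the speaker count via the silhouette score (a heuristic)
 from sklearn.metrics import silhouette_score

 def estimate_num_speakers(embeddings, max_speakers=5):
     best_k, best_score = 2, -1.0
     for k in range(2, min(max_speakers, len(embeddings) - 1) + 1):
         labels = AgglomerativeClustering(k).fit_predict(embeddings)
         score = silhouette_score(embeddings, labels)
         if score > best_score:
             best_k, best_score = k, score
     return best_k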
                            
 import datetime

 def time(secs):
     return datetime.timedelta(seconds=round(secs))

 # Stitch the labelled segments back into a speaker-tagged transcript
 conversation = ""
 for (i, segment) in enumerate(segments):
     if i == 0 or segments[i - 1]["speaker"] != segment["speaker"]:
         conversation += "\n" + segment["speaker"] + ' ' + str(time(segment["start"])) + '\n'
     conversation += segment["text"][1:] + ' '

 print(conversation)
  • SPEAKER 1 0:00:00 Thank you for calling Martha's Flowers Towne SST. Hello I'd like to order flowers
  • SPEAKER 2 0:00:05 and I think you have what I'm looking for. I'd be happy to take care of your
  • SPEAKER 1 0:00:09 order may have your name please. Randall Thomas. Randall Thomas can you spell that for me?
  • SPEAKER 2 0:00:14 Randall R-A-N-B-A-L-L Thomas T-H-O-M-A-S.
    …

Perform Analysis with AI

Perform Sentiment Analysis

                        
 import nltk
 from nltk.sentiment.vader import SentimentIntensityAnalyzer

 nltk.download('vader_lexicon')

 # Perform sentiment analysis with VADER
 sid = SentimentIntensityAnalyzer()
 sentiment_scores = sid.polarity_scores(conversation)

 print(f"Sentiment Analysis:\nPositive: {sentiment_scores['pos']} | Negative: {sentiment_scores['neg']} | Neutral: {sentiment_scores['neu']}\n\n")
  • Sentiment Analysis: Positive: 0.183 | Negative: 0.037 | Neutral: 0.78
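The score above covers the whole call. A natural extension, sketched below using the diarized segments from earlier, is to score each speaker separately so the salesperson's tone can be compared with the customer's:

 # Sketch: per-speaker sentiment, reusing the diarized `segments` above
 from collections import defaultdict

 speaker_text = defaultdict(str)
 for seg in segments:
     speaker_text[seg["speaker"]] += seg["text"] + " "

 for speaker, text in speaker_text.items():
     scores = sid.polarity_scores(text)
     print(f"{speaker}: pos={scores['pos']} | neg={scores['neg']} | neu={scores['neu']}")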

Calculate the Flesch-Kincaid readability score

                        
 import textstat

 # Calculate the Flesch-Kincaid readability score
 readability_score = textstat.flesch_reading_ease(conversation)

 print(f"Readability Score (Flesch-Kincaid): {readability_score}\n\n")
                    
  • Readability Score (Flesch-Kincaid): 88.33
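For context, the Flesch Reading Ease scale runs from 0 to 100, with higher scores being easier to read; a score in the 80-90 range corresponds to plain, conversational English, which is exactly what you would expect from a phone call transcript.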

Generate call summary

                        
 import openai

 openai.api_key = ""  # Set your OpenAI API key

 messages = [
     {
         "role": "system",
         "content": """
         You will be provided with a conversation. Your task is to give
         a summary and mention all the main details in bullet points.
         Replace speaker 1 and speaker 2 with the sales executive or
         company name and the customer name if available.
         """
     },
     {
         "role": "user",
         "content": conversation
     }
 ]

 response = openai.ChatCompletion.create(
     model="gpt-3.5-turbo",
     messages=messages,
     temperature=0,
     max_tokens=256,
     top_p=1,
     frequency_penalty=0,
     presence_penalty=0
 )

 print(response["choices"][0]["message"]["content"])
                    
  • Customer wants to order flowers from Martha's Flowers Towne SST
  • Customer's name is Randall Thomas
  • Customer's phone number is 409-866-5088
  • Customer's email address is Randall.Thomas@gmail.com
  • Customer's shipping address is 6800 Gladys Avenue, Beaumont Texas, zip code 77706
  • Customer wants to purchase one dozen long stem red roses
  • Total amount of the order is $40
  • Customer wants the roses to be delivered within 24 hours
  • Customer doesn't need any further assistance
  • Customer thanks Martha's Flowers and ends the call
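A quick note on versions: the snippet above uses the pre-1.0 openai Python SDK that was current when this was written. With openai>=1.0, the equivalent call looks like this:

 # Equivalent call with the openai>=1.0 client interface
 from openai import OpenAI

 client = OpenAI()  # reads OPENAI_API_KEY from the environment
 response = client.chat.completions.create(
     model="gpt-3.5-turbo",
     messages=messages,
     temperature=0,
     max_tokens=256,
 )
 print(response.choices[0].message.content)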

Conclusion

The integration of generative AI technologies like OpenAI Whisper, Pyannote-Audio, and Pretrained Speaker Embeddings is revolutionizing the way sales calls are evaluated. By automating the transcription, diarization, and speaker recognition processes, these tools save sales executives valuable time and provide deeper insights into the dynamics of their sales calls. In a highly competitive sales environment, where every interaction counts, embracing AI-driven solutions can be a game-changer. As technology continues to advance, we can expect even more innovative AI solutions to shape the future of sales call evaluation and, ultimately, sales success.

- Making Every Conversation Count.
