Predicting COVID-19 using cough sounds classification
Author
Siria Sadeddin
Title
Predicting COVID-19 using cough sounds classification
Description
Using cough sounds recordings for training a machine learning model that can detect COVID-19
Category
Essays, Posts & Presentations
Keywords
Recurrent Neural Networks, sound classification, COVID-19
URL
http://www.notebookarchive.org/2021-01-8cf8ksr/
DOI
https://notebookarchive.org/2021-01-8cf8ksr
Date Added
2021-01-18
Date Last Modified
2021-01-18
File Size
0.74 megabytes
Supplements
Rights
Redistribution rights reserved
Predicting COVID-19 using cough sounds classification
Siria Sadeddin, Wolfram Research Inc.
Sound classification can be a hard task, especially when sound samples have only small variations that can be imperceptible to the human ear. The use of machines, and more recently machine learning models, has proven to be an effective approach to classifying sounds. In medicine, these applications can help improve diagnoses and have been a topic of research in areas such as cardiology and pulmonology. Recent innovations, such as a convolutional neural network identifying COVID-19 coughs and the MIT AI model detecting asymptomatic COVID-19 infections from cough recordings, show that it is possible to identify COVID-19 patients just by the sound of their coughs. We want to reproduce their results using a labeled, open-source COVID-19 cough sound dataset. Our approach was to construct a recurrent neural network and feed it preprocessed audio signals using “MFCC” feature extraction. This approach gave us an accuracy of around 96%, which is similar to the results obtained in other published studies, even though our data was limited to 121 samples.
Exploring the cough sound data
The data set, which can be found HERE, consists of 121 segmented samples of cough sounds in .mp3 format; the samples for each class are stored in a separate folder (“pos” and “neg”).
This data set has a small class imbalance; the two classes are distributed as follows:
◼
pos class: 48 samples positive for COVID-19 disease.
◼
neg class: 73 samples negative for COVID-19 disease.
Define data directory:
dir="covid-master\\data\\segmented";
Define a function that creates rules mapping each file to its corresponding label (the name of its parent folder):
In[]:=
loadFiles[dir_]:=Map[File[#]->FileNameTake[#,{-2}]&,FileNames["*.mp3",dir,Infinity]];
Load the positive and negative data from the folder and then join them to make one dataset with all the data:
posData=loadFiles[FileNameJoin[{dir,"pos"}]];negData=loadFiles[FileNameJoin[{dir,"neg"}]];
It is better to make sure that the data fed to the model is randomized; that way each batch in the training process will have a similar class distribution. We shuffle the samples using the RandomSample function:
data=Join[negData,posData]//RandomSample;
Examine the first three elements of the data set:
In[]:=
data[[1;;3]]
Out[]=
{File["C:\Users\ECF0124A\Downloads\covid-master\data\segmented\neg\neg-0422-098-cough-f-24-1.mp3"] -> "neg",
 File["C:\Users\ECF0124A\Downloads\covid-master\data\segmented\neg\neg-0421-090-cough-f-17-6.mp3"] -> "neg",
 File["C:\Users\ECF0124A\Downloads\covid-master\data\segmented\neg\neg-0422-095-cough-m-53-3.mp3"] -> "neg"}
As a last check, we can listen to the actual sound of the data by randomly selecting one audio sample from each of the pos and neg classes.
Play the audio for a positive data point:
In[]:=
Audio[Keys[RandomChoice[posData]]]
Out[]=
Play the audio for a negative data point:
In[]:=
Audio[Keys[RandomChoice[negData]]]
Out[]=
Class distribution
We can make a BarChart and observe the distribution of the two classes under study. There is an imbalance in sample sizes, but the difference is small enough that the model should still be effective.
In[]:=
Counts[Values[data]]
Out[]=
<|"neg" -> 73, "pos" -> 48|>
In[]:=
BarChart[Counts[Values[data]]]
Out[]=
Split the data
We use the TrainTestSplit resource function to create the train and test sets; by default it splits the data into 80% train and 20% test.
Apply the TrainTestSplit function to the data set, obtaining two separate data sets: the first for training with 80% of the data samples and the second for testing with 20% of the data samples:
In[]:=
TrainTestSplit=ResourceFunction["TrainTestSplit"];splitData=TrainTestSplit[data];
Get the train and test sets from the data splitting we just did:
train=splitData[[1]];test=splitData[[2]];train[[1;;5]]
Check the train and test distribution of samples:
In[]:=
BarChart[{Length[train],Length[test]}]
Out[]=
Create Audio MFCC Encoding
Audio encoding is an important step for audio classification. Any sound generated by humans is determined by the shape of their vocal tract (including tongue, teeth, etc.); if this shape can be characterized correctly, any sound produced can be accurately represented. The same happens with musical instruments: even when two different instruments generate the same frequency, they sound different because of the physical characteristics of the instrument (piano, guitar, flute, etc.). The envelope of the time power spectrum of the speech signal is representative of the vocal tract, and this is what MFCC captures. Some conditions, such as pulmonary disease, can affect the way air travels through the respiratory system, so they may cause a difference in sound between a healthy patient and a sick patient.
MFCC, or mel-frequency cepstral coefficients, builds on the cepstrum, which was first introduced to characterize the seismic echoes resulting from earthquakes. To obtain the coefficients, we apply the Fourier transform to the original time-domain sound wave, map the resulting spectrum onto the mel scale, take the logarithm of the magnitudes, and finally apply a cosine transformation. The resulting representation, called the cepstrum, lives in the quefrency domain, which is neither the frequency domain nor the time domain.
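As a rough illustration of this pipeline, below is a minimal sketch of a cepstrum-like computation on a single 1024-sample frame. It omits the mel filter bank that full MFCC adds, and it assumes the first recording in posData has at least 1024 samples; the variable names are illustrative only.
frame=AudioData[Audio[Keys[First[posData]]]][[1,1;;1024]]; (*one 1024-sample frame of channel 1*)
logSpectrum=Log[Abs[Fourier[frame]]+10.^-10]; (*log-magnitude spectrum; the small offset avoids Log[0]*)
cepstrum=FourierDCT[logSpectrum,2]; (*cosine transform into the quefrency domain*)
ListLinePlot[cepstrum[[1;;40]]] (*first 40 cepstral coefficients*)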
We will use the “AudioMFCC” NetEncoder to make this whole process automatic. We can also choose the number of coefficients in the result with the “NumberOfCoefficients” option.
Build the NetEncoder for an AudioMFCC feature extraction:
In[]:=
encoder=NetEncoder[{"AudioMFCC","TargetLength"->All,"NumberOfCoefficients"->40,"SampleRate"->16000,"WindowSize"->1024,"Offset"->571,"Normalization"->"Max"}]
Out[]=
(NetEncoder object summary box for "AudioMFCC")
Check the result of the AudioMFCC NetEncoder applied to a random audio sample. The output of the encoder is a rank-2 tensor of dimensions {n,nc}, where n is the number of partitions after preprocessing and nc is the number of coefficients used in the computation. We will show this with a MatrixPlot:
In[]:=
encoder[Audio[RandomChoice[Keys[train]]]]//MatrixPlot
Out[]=
We can see how the audio has been converted into a matrix that represents the cepstral features of the audio; this will be the input of our model.
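For instance, a quick check of the encoded dimensions {n,nc} for a few training samples (a small sketch; the exact frame counts n depend on each recording's length, while nc is fixed at 40):
Dimensions[encoder[Audio[#]]]&/@Keys[train[[1;;3]]] (*each element has the form {n,40}; n varies per recording*)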
Build and train the model
Recurrent Neural Networks (RNN)
A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to model temporal dynamic behavior, making RNNs applicable to tasks such as handwriting recognition, video and movement classification or speech recognition.
We will build a recurrent neural network using the NetBidirectionalOperator and GatedRecurrentLayer, as well as LinearLayer, DropoutLayer and Ramp to add complexity to the network and avoid overfitting.
◼
GatedRecurrentLayer: gives the neural net the capacity to train over a sequence of coefficients instead of learning one input at a time; this is usually done when the data has a sequential nature, such as audio, video (sequences of images) and natural language (sequences of words). See the small illustration after this list.
◼
LinearLayer: adds neurons and complexity to the neural network.
◼
DropoutLayer: prevents overfitting by randomly turning off some of the neurons in the previous layer.
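To make the role of the recurrent part concrete, here is a tiny, self-contained sketch with a hypothetical random input (not our encoded audio): a GatedRecurrentLayer maps a variable-length sequence of 40-dimensional vectors to a sequence of 12-dimensional states, and SequenceLastLayer keeps only the final state.
tiny=NetInitialize[NetChain[{GatedRecurrentLayer[12],SequenceLastLayer[]},"Input"->{"Varying",40}]];
Dimensions[tiny[RandomReal[1,{25,40}]]] (*a 25-step sequence of 40-vectors collapses to a single 12-vector: {12}*)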
Build the custom RNN
We build a custom RNN whose hyperparameters have been tuned by hand, iterating over a tune-train-evaluate process:
1. Choose a set of hyperparameters.
2. Train the model.
3. Evaluate the model.
4. Repeat steps 1, 2 and 3.
We repeat this process until the model shows low overfitting and high evaluation metrics; the result is the RNN defined below.
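For illustration only, such a tune-train-evaluate loop could be sketched as follows; the candidate hidden sizes, dropout rates and training-round limit are hypothetical, not the values actually searched.
candidates=Tuples[{{8,12,16},{0.3,0.5}}]; (*hypothetical hidden sizes and dropout rates*)
results=Table[Module[{net,trained},
   net=NetChain[{GatedRecurrentLayer[c[[1]]],SequenceLastLayer[],LinearLayer[2],DropoutLayer[c[[2]]],Ramp,LinearLayer[],SoftmaxLayer[]},"Input"->encoder,"Output"->NetDecoder[{"Class",{"pos","neg"}}]];
   trained=NetTrain[net,train,ValidationSet->test,MaxTrainingRounds->20]; (*short training run per candidate*)
   {c[[1]],c[[2]],NetMeasurements[trained,test,"Accuracy"]}],{c,candidates}];
TableForm[results,TableHeadings->{None,{"units","dropout","test accuracy"}}]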
Create a recurrent neural network with a 12-feature GatedRecurrentLayer, joined with a 2-neuron linear layer and a dropout of 0.5 (for overfitting prevention). The output will be “pos” for COVID-19 disease and “neg” for the absence of COVID-19 disease:
In[]:=
rnn=NetChain[{GatedRecurrentLayer[12,"Dropout"->{"VariationalInput"->0.2}],SequenceLastLayer[],LinearLayer[2],DropoutLayer[.5],Ramp,LinearLayer[],SoftmaxLayer[]},"Input"->encoder,"Output"->NetDecoder[{"Class",{"pos","neg"}}]]
Out[]=
(uninitialized NetChain summary box)
Train the RNN
Train the recurrent neural network on the train set and validate on the test set; this allows us to observe the training process and tune the hyperparameters of the network (number of neurons in the LinearLayer, the DropoutLayer rate and the number of features in the GatedRecurrentLayer sequence).
In[]:=
resultObjectRNN=NetTrain[rnn,train,All,ValidationSet->test]
Out[]=
(NetTrain results object panel)
Net evaluation
After the training we will evaluate the model, applying it to the previously unseen test data and measuring its performance. For that, we will use several different metrics:
◼
Accuracy: It is the ratio of correctly predicted observations to the total observations.
◼
F1 score: the harmonic mean of precision and recall.
◼
Precision and recall: precision is the ratio of correctly predicted positive observations to the total predicted positive observations, while recall is the ratio of correctly predicted positive observations to all observations in the actual positive class (see the example in the figure below).
◼
Confusion matrix plot: allows us to see the true positive, true negative, false positive and false negative predicted values.
◼
ROC curve: it tells how well the model can distinguish between classes (see the figure below); the larger the overlap between the negative-class and positive-class curves, the worse the ROC curve will be. An optimal ROC curve is one with an area under the curve (AUC) equal to 1.
(Figure: example confusion matrix)
(Figure: example ROC curve)
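All of these measurements can also be requested in a single NetMeasurements call; the following is only a convenience sketch, while the subsections below compute them one at a time.
NetMeasurements[resultObjectRNN["TrainedNet"],test,{"Accuracy","F1Score","Precision","Recall","ConfusionMatrixPlot","ROCCurvePlot"}]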
Accuracy
Compute the accuracy of the model applied on the test set:
In[]:=
NetMeasurements[resultObjectRNN["TrainedNet"],test,"Accuracy"]
Out[]=
0.958333
F1 Score
Compute the F1 score of the model applied on the test set:
In[]:=
NetMeasurements[resultObjectRNN["TrainedNet"],test,"F1Score"]
Out[]=
<|"pos" -> 0.941176, "neg" -> 0.967742|>
Precision and recall
Compute the precision and recall of the model applied on the test set:
In[]:=
NetMeasurements[resultObjectRNN["TrainedNet"],test,{<|"Measurement"->"Precision","ClassAveraging"->"Macro"|>,<|"Measurement"->"Recall","ClassAveraging"->"Macro"|>}]
Out[]=
{0.944444,0.96875}
Confusion matrix plot
Plot the confusion matrix of the model applied on the test set:
In[]:=
NetMeasurements[resultObjectRNN["TrainedNet"],test,"ConfusionMatrixPlot"]
Out[]=
actual class | |
| predicted class |
ROC curve plot
Plot the ROC curve of the model applied on the test set:
In[]:=
NetMeasurements[resultObjectRNN["TrainedNet"],test,"ROCCurvePlot"]
Out[]=
recall | |
| false positive rate |
Overall we have obtained great performance across all the metrics we evaluated; they tell us that the model has the capacity to correctly identify or rule out the presence of COVID-19 from a patient's cough sounds.
Concluding remarks
We have constructed a model that classifies cough sounds with around 96% accuracy, detecting COVID-19. This shows the power of recurrent neural networks not only for solving sound classification tasks, but also for medical tasks such as diagnosing pulmonary disease. We were able to reproduce the results published by the MIT team [5] and the Manchester team [1]. Our dataset was small (121 samples), but the results are promising and open the possibility of future research.
Acknowledgement
I’d like to thank Mads Bahrami (WRI) and Laney Moy (WRI) for reviewing an initial draft of this essay and providing helpful comments to improve it.
Citation
Cite this as: Siria Sadeddin, "Predicting COVID-19 using cough sounds classification" from the Notebook Archive (2021), https://notebookarchive.org/2021-01-8cf8ksr