In the previous article we learned that sound data has features such as intensity, frequency, time period, phase angle, and angular velocity, and we saw how sound data is formed: sound travels as compressions of air molecules, and changes in the pressure of the medium cause changes in the sound produced.
Here we are going to classify different voices on the basis of their labels and sound data using a deep convolutional network.
Sound classification is used in a variety of applications, where sounds are grouped by category. I already have one article on voice data analysis, Link, where we touched on various basic parameters like the spectrogram, timbre, etc., but here we will discuss classifying our sound data.
These are the steps we follow for sound classification:
Loading sound data using the librosa library (a minimal sketch follows this list),
Converting sound data into numerical spectrogram vectors,
Building a deep neural network,
Predicting the label of the sound data.
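The code further down loads data with tensorflow_datasets, but since the first step names librosa, here is a minimal sketch of loading a clip with it; the file name voice_sample.wav is a hypothetical placeholder, not from the original post:

import librosa

# Load a clip resampled to 22,050 Hz, the rate used throughout this article.
# "voice_sample.wav" is a hypothetical file name.
samples, sampling_rate = librosa.load("voice_sample.wav", sr=22050)
print(samples.shape, sampling_rate)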
Sound classification works much like other classification tasks in machine learning, such as image classification or general feature classification; here, too, we build a CNN architecture.
In the above flow chart we see that when we get raw data in the form of an MP3 or another format, we use machine learning libraries to extract spectrograms from these voice data files, and then train a machine learning model on the spectrogram data.
Before proceeding further with the deep learning model we will import all the useful machine learning and deep learning libraries.
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.layers import Input, Lambda, Conv2D, BatchNormalization
from tensorflow.keras.layers import Activation, MaxPool2D, Flatten, Dropout, Dense
from IPython.display import Audio
from matplotlib import pyplot as plt
from tqdm import tqdm
After importing all the useful libraries we will load the complete dataset by pointing at the folder containing all the files.
# Note: tfds.load() normally takes a registered dataset name; the raw string
# (r"...") keeps the Windows backslashes from being read as escape sequences.
dataset = tfds.load(r"C:\Users\Shubham\Desktop\Topcoder challenges\torque prediction\training\training")
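The loader returns a dictionary of splits, and the training loop further down iterates over a variable named train. Here is a minimal sketch, assuming the dataset exposes a standard "train" split:

# Assumption: the loaded dataset has a "train" split; adjust the key if your
# dataset uses different split names.
train = dataset["train"]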
We have already seen the data visualization, so we will move directly to data processing. Here the data arrives as sound, which we have to convert into numerical form, because our computations are done on numbers. We should also scale the numerical features into a common range, for example with log scaling.
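The preprocess() function below calls a spectrogram() helper that the original post does not define. Here is a minimal sketch using tf.signal.stft; the frame parameters are assumptions chosen so the output matches the (129, 212) input shape the model expects later, not the author's original values:

def spectrogram(audio, frame_length=256, frame_step=521):
    # Assumed helper: short-time Fourier transform -> log-magnitude spectrogram.
    # frame_length=256 yields 129 frequency bins; frame_step=521 yields 212
    # frames for a 5-second chunk at 22,050 Hz (both values are assumptions).
    audio = tf.cast(audio, tf.float32)
    stft = tf.signal.stft(audio, frame_length=frame_length, frame_step=frame_step)
    magnitude = tf.abs(stft)             # shape: (frames, bins)
    magnitude = tf.transpose(magnitude)  # shape: (bins, frames) = (129, 212)
    return tf.math.log(magnitude + 1e-6) # log scaling, as discussed above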
sr = 22050   # sampling rate (samples per second)
chunk = 5    # chunk length in seconds

def preprocess(ex):
    audio = ex.get("audio")
    label = ex.get("label")
    x_batch, y_batch = None, None
    # Cut the clip into six 5-second chunks and stack their spectrograms
    # (and labels) into a single batch.
    for i in range(0, 6):
        start = i * chunk * sr
        end = (i + 1) * chunk * sr
        audio_chunk = audio[start:end]
        audio_spec = spectrogram(audio_chunk)
        audio_spec = tf.expand_dims(audio_spec, axis=0)
        current_label = tf.expand_dims(label, axis=0)
        x_batch = audio_spec if x_batch is None \
            else tf.concat([x_batch, audio_spec], axis=0)
        y_batch = current_label if y_batch is None \
            else tf.concat([y_batch, current_label], axis=0)
    return x_batch, y_batch
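A quick sanity check on a single example (the expected shapes assume the spectrogram parameters sketched above):

ex = next(iter(train))
x_batch, y_batch = preprocess(ex)
print(x_batch.shape, y_batch.shape)  # expected: (6, 129, 212) and (6,)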
After extracting numerical features from the data we will split it into training and validation sets and shape the batches so they fit our deep learning model.
x_train, y_train = None, None
# Accumulate the preprocessed chunks from every training example.
for ex in tqdm(iter(train)):
    x_batch, y_batch = preprocess(ex)
    x_train = x_batch if x_train is None \
        else tf.concat([x_train, x_batch], axis=0)
    y_train = y_batch if y_train is None \
        else tf.concat([y_train, y_batch], axis=0)

# Shuffle the 768 chunks, then hold out the first 300 for validation.
indices = tf.random.shuffle(list(range(0, 768)))
x_train = tf.gather(x_train, indices)
y_train = tf.gather(y_train, indices)
n_val = 300
x_valid = x_train[:n_val, ...]
y_valid = y_train[:n_val, ...]
x_train = x_train[n_val:, ...]
y_train = y_train[n_val:, ...]
Here we have completed the data processing and data splitting tasks. Now we will move on to model creation and build a deep convolutional network for the classification, as illustrated in the diagram below.
input_ = Input(shape=(129, 212))
# Add a channel dimension so Conv2D receives (129, 212, 1).
x = Lambda(lambda t: tf.expand_dims(t, axis=-1))(input_)
# Four convolution blocks with 32, 64, 128, and 256 filters.
for i in range(0, 4):
    num_filters = 2 ** (5 + i)
    x = Conv2D(num_filters, 3)(x)
    x = BatchNormalization()(x)
    x = Activation("tanh")(x)
    x = MaxPool2D(2)(x)
x = Flatten()(x)
x = Dropout(0.4)(x)
x = Dense(128, activation="relu")(x)
x = Dropout(0.4)(x)
x = Dense(1, activation="sigmoid")(x)
model = tf.keras.models.Model(input_, x)
Above we can see that each block starts with a convolution layer, followed by batch normalization, which stabilizes training and adds a mild regularizing effect that helps against overfitting. Next comes the activation function, which acts as a threshold on the features, and finally the max pooling layer scans the feature maps and downsizes them.
model.compile(
    loss="binary_crossentropy",
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-6),
    metrics=["accuracy"]
)
model.summary()
Now, after building and compiling the model, we will feed in our training features and labels and train for a number of epochs with a given batch size.
# Capturing the returned History object lets us plot the loss curves below.
history = model.fit(x_train, y_train, validation_data=(x_valid, y_valid),
                    batch_size=12, epochs=500, verbose=False)
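Since matplotlib was imported at the start but never used, here is a short sketch of plotting the training and validation loss from the History object captured above:

plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("binary cross-entropy")
plt.legend()
plt.show()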
After training our model we will test it on some unseen data points.
# "ex" is assumed to be an example drawn from a held-out test split.
x_test, y_test = preprocess(ex)
preds = model.predict(x_test)
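The sigmoid output is a probability per chunk. A minimal sketch of turning the predictions into class labels and checking them against the true labels (variable names as above):

# Threshold the probabilities at 0.5 to obtain binary class labels.
pred_labels = (preds.flatten() > 0.5).astype(int)
accuracy = (pred_labels == y_test.numpy().flatten()).mean()
print(f"Chunk-level accuracy: {accuracy:.3f}")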