Speech Signal Formant Estimation and Clustering
A MATLAB Program to Estimate Formants
A crude estimate of the formant locations can be obtained using the lpc function in MATLAB together with an evaluation of the pole frequencies in the z-plane. We approximate each formant frequency by the angle of the corresponding pole: an angle of θ radians per sample corresponds to θ·fs/(2π) Hz, where fs is the sampling rate. Run the program below to see a running estimate of the poles in a speech file.
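The pole-angle approximation described above can be sketched in a few lines. The example is in Python rather than MATLAB and uses only the standard library. The sampling rate, the pole radius, and the two test frequencies are assumptions chosen for illustration, and the poles are built by hand instead of being taken from an LPC analysis of cleanspeech.wav.

```python
import cmath
import math


def pole_to_frequency(pole, fs):
    """Map the angle of a z-plane pole (radians/sample) to a frequency in Hz."""
    return cmath.phase(pole) * fs / (2 * math.pi)


def formant_candidates(poles, fs):
    """Keep one pole from each complex-conjugate pair and sort by frequency."""
    upper_half = [p for p in poles if p.imag > 0]
    return sorted(pole_to_frequency(p, fs) for p in upper_half)


fs = 8000  # assumed sampling rate in Hz, not taken from the handout

# Hypothetical poles placed at known frequencies (700 Hz and 1200 Hz),
# standing in for the roots of an LPC polynomial.
poles = [cmath.rect(0.97, 2 * math.pi * f / fs) for f in (700.0, 1200.0)]
poles += [p.conjugate() for p in poles]

estimates = formant_candidates(poles, fs)
print([round(f, 1) for f in estimates])
```

In a real analysis the poles would be the roots of the LPC polynomial for each frame, and the two lowest candidates would be taken as rough F1 and F2 estimates.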
EXERCISE 1:
Use the program to generate the formants for, say, 200 frames of data.
Plot the first two formants across time (formants vs frame number).
Can you distinguish voiced vs unvoiced frames of speech from the signal signatures?
Create a table and provide the following information for frames 50-80.
EXERCISE 2:
Clustering of Formants
In the previous exercise, formant estimation from a speech signal was introduced. You built a formant detector that estimates F1 and F2 from an incoming speech signal. In this exercise, you will study how to cluster formants using a well-known clustering algorithm, the K-means algorithm [4]. Clustering is an example of an unsupervised machine learning algorithm. You will use J-DSP HTML5 to perform the speech formant clustering.
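To make the clustering step concrete, here is a minimal sketch of K-means (Lloyd's algorithm) on (F1, F2) pairs, written in plain Python. It is not the J-DSP block's implementation. The six data points and the two initial centroids are made-up values for illustration, not taken from either speech formant dataset. The function also returns the within-cluster sum of squares (WCSS), the quantity that the convergence curve tracks.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points]


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels, wcss)."""
    centroids = [tuple(float(v) for v in c) for c in init_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # New centroid is the mean of the points assigned to cluster k
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster where it was
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    labels = assign(points, centroids)
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss


# Made-up (F1, F2) pairs in Hz: three /i/-like and three /aa/-like frames
points = [(240, 2350), (260, 2450), (250, 2400),
          (740, 950), (760, 930), (750, 940)]
centroids, labels, wcss = kmeans(points, [(300, 2000), (700, 1000)])
print(centroids, labels, wcss)
```

The initial centroids are passed in explicitly so that the run is repeatable; with random initialization, K-means can converge to different centroids from one run to the next.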
Deliverables:
1. Write a brief report (3 pages max) summarizing both the content and results of the two parts. One report per group of two students.
PART 1: MATLAB Exercise on Formants
2. For the first MATLAB exercise, include the plots and the table.
3. Provide a paragraph that discusses the results and summarizes what you have learned from the exercise.
(Use the file provided on Blackboard named cleanspeech.wav.)
PART 2: J-DSP Exercise
You will run the simulations for two data sets, namely "speech formant dataset 1" and "speech formant dataset 2".
4. What are the values of the cluster centroids after convergence for dataset 1 and dataset 2? Report the centroid values (F1 and F2) for each of the vowels listed in item 7, for both datasets.
5. What is the value of the within-cluster sum of squares observed after convergence for dataset 1 and dataset 2? [This value is obtained in the J-DSP k-means block.]
6. Include screenshots of the clustering and of the convergence curve.
7. What deviation is observed for each vowel's formant frequencies after convergence for dataset 1 and dataset 2, given the following average formant values?
aa: F1 = 750 Hz F2 = 940 Hz
ae: F1 = 750 Hz F2 = 1610 Hz
u: F1 = 250 Hz F2 = 595 Hz
i: F1 = 240 Hz F2 = 2400 Hz
[Report the differences in F1 and F2 values for each vowel. For example, if the k-means block obtains the centroid for the vowel /u/ as 200 Hz (F1) and 550 Hz (F2), report the difference for F1 as 50 Hz (250 - 200) and the difference for F2 as 45 Hz (595 - 550).]
8. Discuss how well the formants are clustered for dataset 1 and dataset 2.
9. Briefly discuss what you have learned from Exercise 2.
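The deviation calculation requested in item 7 can be sketched as follows. The reference values are the averages listed in the handout, and the centroid passed in is the /u/ example from the bracketed note, not a measured result. The function name and the dictionary are hypothetical helpers introduced for illustration.

```python
# Average formant values from the handout, in Hz: vowel -> (F1, F2)
REFERENCE_FORMANTS = {
    "aa": (750, 940),
    "ae": (750, 1610),
    "u": (250, 595),
    "i": (240, 2400),
}


def formant_deviation(vowel, centroid):
    """Return (reference F1 - centroid F1, reference F2 - centroid F2) in Hz."""
    ref_f1, ref_f2 = REFERENCE_FORMANTS[vowel]
    f1, f2 = centroid
    return ref_f1 - f1, ref_f2 - f2


# Centroid for /u/ taken from the handout's worked example
d_f1, d_f2 = formant_deviation("u", (200, 550))
print(f"/u/: F1 deviation = {d_f1} Hz, F2 deviation = {d_f2} Hz")
```

The sign convention follows the handout's example (reference minus centroid), so a positive value means the centroid lies below the average formant value.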