Shazam and Match Filter Shazam is a mobile phone app that helps users determine a song that may be currently playing whereever they are. Lets try to understand the basic working principle of the application. To keep things simple suppose the following:
- there are only two possible songs and they are equally likely
• the application records a few seconds of the song, which we will assume, again for simplicty, correspond to the first few seconds of the song, giving a vector of length n samples. These are represented by two equal length vectors:
a = (a1,a2,....an) or b = (b1,b2,....bn)
- assume that the vectors have equal energy i.e., a
a^(2) = b^(2)
. and assume that they are roughly orthogonal
i.e., < a,b >= δ.
• if song a is playing the recording is modelled by a vector X corresponding to the samples a plus IID Gaussian noise with meanzero and variance σ ^(2)
, i.e.,
X = a+N where Xi = ai +Ni
and Ni ∼ N(0,σ^(2)). A similar model applies if song b is playing.
A basic strategy in determing which song is playing is suggested to you. Given you record the vector X = x determine if < x,a >≥< x,b >
if so it is song a otherwise it is b. This particular structure for deciding is called a matched filter as we match the received signal (x1, x2...xn) with each one of the possible songs by inner products and then select the one which gives the highest value.
1. Prove that this rule corresponds to the MAP decision rule.
2. In practice the snippet you record is not the first few seconds of the song, but could be any snippet of a given length. How would you address this problem?