Introduction
Hidden Markov Models (HMMs) are a widely used tool for solving all sorts of problems, from natural language processing to signal reconstruction. Here I give an overview of the fundamental concepts involved. Let’s go!
Markov Processes
HMMs make the assumption that whatever is going on is following a Markov process, so let’s start with that. This isn’t complicated at all: it just means that what happens at time $t+1$ depends only on what happens at time $t$, not on the rest of the history:

$$P(X_{t+1} \mid X_t, X_{t-1}, \dots, X_0) = P(X_{t+1} \mid X_t)$$
So we have a bunch of states $s_1, \dots, s_n$, with transition probabilities between them. We can represent this compactly as a transition diagram, with one node per state and an arrow for each transition probability. Remember that the indices here are for different states, not different time periods (make sure you understand this).
The transition probabilities can of course be encoded in a matrix $A$, where $A_{ij} = P(X_{t+1} = s_j \mid X_t = s_i)$. Note that each row of $A$ must sum to 1, since from any state we have to go somewhere.
And voilà, we have a Markov process!
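To make this concrete, here’s a minimal sketch in Python/NumPy of sampling a path from a Markov process. The 2-state transition matrix is invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0.7, 0.3],   # A[i, j] = P(X_{t+1} = j | X_t = i)
              [0.4, 0.6]])  # each row sums to 1

def simulate_chain(A, x0, T):
    """Sample a path x_0, ..., x_T of the Markov process."""
    path = [x0]
    for _ in range(T):
        path.append(rng.choice(len(A), p=A[path[-1]]))
    return path

print(simulate_chain(A, x0=0, T=10))
```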
HMMs
Hidden Markov Models assume there is an underlying Markov process going on. The word “hidden” comes from the fact that we do not observe the actual process, but rather a proxy for it. One can imagine all sorts of scenarios that fit this model: receiving a message that might be distorted, choosing a meal based on the previous day and then choosing a drink based on that…
Say the hidden states live in a set $S = \{s_1, \dots, s_n\}$: these follow the Markov process, but we never see them. Instead, at each time step the current hidden state emits an observation from a second set $O = \{o_1, \dots, o_m\}$, with emission probabilities $B_{ij} = P(Y_t = o_j \mid X_t = s_i)$.

Note that we may have different numbers of states in the two sets. Remember that it is the states in $S$ that follow the Markov process; each observation in $O$ depends only on the current hidden state.
Question: can you tell me what each row of the emission matrix $B$ must sum to, and why?
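Here’s the previous sketch extended into an HMM: the hidden chain evolves as before, but now each hidden state also emits an observation. Again, $A$ and the emission matrix $B$ are toy values, not anything canonical:

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0.7, 0.3],        # transitions between 2 hidden states
              [0.4, 0.6]])
B = np.array([[0.9, 0.1, 0.0],   # B[i, j] = P(Y_t = j | X_t = i)
              [0.2, 0.5, 0.3]])  # rows sum to 1, over 3 possible observations

def sample_hmm(A, B, x0, T):
    """Sample hidden states x_0..x_T and observations y_0..y_T."""
    xs, ys = [x0], [rng.choice(B.shape[1], p=B[x0])]
    for _ in range(T):
        xs.append(rng.choice(len(A), p=A[xs[-1]]))
        ys.append(rng.choice(B.shape[1], p=B[xs[-1]]))
    return xs, ys

hidden, observed = sample_hmm(A, B, x0=0, T=5)
print("hidden:  ", hidden)    # what the process actually did
print("observed:", observed)  # all we would get to see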
The Problem
Let’s be a bit more concrete about what we’re trying to solve. We assume a Markov process $X_0, X_1, \dots, X_T$ is going on, let’s say for times $t = 0, 1, \dots, T$, and that all we get to see are the emitted observations $y_0, y_1, \dots, y_T$.

Let $x_{0:T} = (x_0, \dots, x_T)$ denote a candidate hidden path. The Markov and emission assumptions let us factorise the joint probability of a path and the observations:

$$P(x_{0:T}, y_{0:T}) = P(x_0)\, B_{x_0 y_0} \prod_{t=1}^{T} A_{x_{t-1} x_t}\, B_{x_t y_t}$$

What we want is the most likely hidden path given what we saw:

$$x^*_{0:T} = \arg\max_{x_{0:T}} P(x_{0:T} \mid y_{0:T}) = \arg\max_{x_{0:T}} \frac{P(x_{0:T}, y_{0:T})}{P(y_{0:T})}$$

where $P(y_{0:T}) = \sum_{x_{0:T}} P(x_{0:T}, y_{0:T})$. We simply marginalised the joint over all hidden paths; since this denominator doesn’t depend on the path, it’s enough to maximise the joint probability in the numerator.
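As a sanity check on the factorisation above, here’s a hedged sketch that scores one particular hidden path against an observation sequence, reusing the toy $A$ and $B$ from before (the initial probabilities are also made up):

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1, 0.0], [0.2, 0.5, 0.3]])
pi = np.array([0.5, 0.5])  # pi[i] = P(X_0 = i), toy values

def joint_prob(xs, ys, A, B, pi):
    """P(x_{0:T}, y_{0:T}) for a hidden path xs and observations ys."""
    p = pi[xs[0]] * B[xs[0], ys[0]]
    for t in range(1, len(xs)):
        p *= A[xs[t - 1], xs[t]] * B[xs[t], ys[t]]
    return p

print(joint_prob(xs=[0, 0, 1], ys=[0, 1, 2], A=A, B=B, pi=pi))
```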
Some Approaches
Quick Step Back
Before we dive in, let’s make sure we understand what the “hard” part is. We’re looking to maximise over all the possible joint probabilities of hidden paths and observations, and with $n$ hidden states there are $n^{T+1}$ possible paths: brute force blows up exponentially in the length of the sequence.
This problem is begging for some dynamic programming, where we remember paths taken to avoid total recalculation at each time step, kind of like Hansel and Gretel.
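For contrast, here’s what the brute-force approach looks like: enumerate all $n^{T+1}$ hidden paths and score each one. It works for toy sizes like this one, and for nothing else:

```python
from itertools import product

import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1, 0.0], [0.2, 0.5, 0.3]])
pi = np.array([0.5, 0.5])
ys = [0, 1, 2, 1]  # a made-up observation sequence

def joint_prob(xs, ys, A, B, pi):  # same helper as in the previous sketch
    p = pi[xs[0]] * B[xs[0], ys[0]]
    for t in range(1, len(xs)):
        p *= A[xs[t - 1], xs[t]] * B[xs[t], ys[t]]
    return p

paths = product(range(len(A)), repeat=len(ys))  # all n^(T+1) hidden paths
best = max(paths, key=lambda xs: joint_prob(xs, ys, A, B, pi))
print(best, joint_prob(best, ys, A, B, pi))
```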
The Forward Algorithm
Let’s say we just want to estimate the probability of the observations, $P(y_{0:T})$. Define $\alpha_t(j) = P(y_{0:t}, X_t = s_j)$: the probability of seeing the observations up to time $t$ and ending up in hidden state $s_j$.
We’re looking to get something recursive. We can use the chain rule on the above (together with the HMM’s conditional independencies):

$$\alpha_{t+1}(j) = \sum_{i} P(y_{t+1} \mid X_{t+1} = s_j)\, P(X_{t+1} = s_j \mid X_t = s_i)\, P(y_{0:t}, X_t = s_i)$$
The first term in the sum can be simplified: it doesn’t depend on $i$, so it comes out of the sum, and it is just an entry of our emission matrix $B$; likewise the middle term is an entry of our transition matrix $A$, and the last term is $\alpha_t(i)$. We’re left with

$$\alpha_{t+1}(j) = B_{j\, y_{t+1}} \sum_{i} A_{ij}\, \alpha_t(i)$$
We have our recursion! This means that at each time step, we can do a quick calculation based on our transition probabilities in $A$, our emission probabilities in $B$, and the $\alpha$ values we already computed, instead of summing over every possible path from scratch. Starting from the base case $\alpha_0(i) = P(X_0 = s_i)\, B_{i\, y_0}$ and running the recursion up to time $T$, we get $P(y_{0:T}) = \sum_i \alpha_T(i)$.
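Here’s a minimal sketch of the forward pass, vectorised with NumPy; `alpha[t - 1] @ A` computes the sum over $i$ in one go (same toy parameters as before):

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1, 0.0], [0.2, 0.5, 0.3]])
pi = np.array([0.5, 0.5])
ys = [0, 1, 2, 1]

def forward(ys, A, B, pi):
    """alpha[t, j] = P(y_0..y_t, X_t = j)."""
    alpha = np.zeros((len(ys), len(pi)))
    alpha[0] = pi * B[:, ys[0]]                      # base case
    for t in range(1, len(ys)):
        alpha[t] = B[:, ys[t]] * (alpha[t - 1] @ A)  # the recursion above
    return alpha

alpha = forward(ys, A, B, pi)
print("P(y_0..y_T) =", alpha[-1].sum())  # marginalise over the final state
```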
The Viterbi Algorithm
Remember that we want to find the most likely hidden path, not just the probability of the observations. The Viterbi algorithm uses the same trick as the forward algorithm, but with a $\max$ in place of the sum. Define

$$\delta_t(j) = \max_{x_0, \dots, x_{t-1}} P(x_0, \dots, x_{t-1}, X_t = s_j, y_{0:t})$$
One thing we haven’t yet given a name, but that we’ll need, is an initial distribution for the states at time 0. Call this $\pi$, where $\pi_i = P(X_0 = s_i)$, so that the base case is $\delta_0(i) = \pi_i B_{i\, y_0}$.
That is the total “value” from getting to each state at time $t$. The recursion mirrors the forward one:

$$\delta_{t+1}(j) = B_{j\, y_{t+1}} \max_i A_{ij}\, \delta_t(i)$$

and at each step we also record which $i$ achieved the max, so that at time $T$ we can walk these backpointers from the best final state to recover the most likely path.
Recall there are $n$ hidden states: at each of the $T$ time steps we compute $n$ values of $\delta$, and each one takes a max over the $n$ values from the previous step.
Question: Can you figure out the big-O complexity of this algorithm in terms of $n$ and $T$?
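And here’s a sketch of Viterbi itself: the forward pass with $\max$ in place of the sum, plus a table of backpointers to reconstruct the winning path (toy parameters again):

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1, 0.0], [0.2, 0.5, 0.3]])
pi = np.array([0.5, 0.5])
ys = [0, 1, 2, 1]

def viterbi(ys, A, B, pi):
    T, n = len(ys), len(pi)
    delta = np.zeros((T, n))        # delta[t, j] = best score ending in j at t
    back = np.zeros((T, n), dtype=int)
    delta[0] = pi * B[:, ys[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j] = delta[t-1, i] * A[i, j]
        back[t] = scores.argmax(axis=0)      # best predecessor for each j
        delta[t] = B[:, ys[t]] * scores.max(axis=0)
    # walk the backpointers from the best final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi(ys, A, B, pi))  # most likely hidden path
```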
Conclusion & Extensions
Hidden Markov Models are extremely useful in practice. By making a few assumptions, they can provide a good balance of informational richness and tractability. HMMs are also a great stepping stone to more generalised probabilistic graphical models!
Now that we’ve explored the basics, can you think of some extensions to our model? Here are some that initially come to my mind:
- Consider different transition matrices $A$ and emission matrices $B$ for each time step
- Allow for multiple hidden layers
- Solve using Monte Carlo methods
- Extend the Markov assumption to 2, 3, or more time steps back!
I’m sure there are many more. Hope you had fun, and thanks for reading!