[1] A bag-of-words model contains only a dictionary of recognized words and their relative probabilities of occurrence in spam and in genuine messages.
Put another way, a bag-of-words filter discriminates on the relative probabilities of single words alone, regardless of phrase structure, whereas a Markovian word-based filter discriminates on the relative probabilities of word pairs or, more commonly, short word sequences.
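The contrast above can be sketched in a few lines. The probability tables below are hypothetical toy values chosen for illustration; a real filter estimates them from training corpora of spam and genuine mail.

```python
from math import log

# Hypothetical per-word probabilities: P(word | spam) and P(word | ham).
SPAM_WORDS = {"free": 0.05, "money": 0.04, "meeting": 0.001, "now": 0.02}
HAM_WORDS = {"free": 0.005, "money": 0.005, "meeting": 0.03, "now": 0.01}

# Hypothetical per-pair probabilities for the Markovian variant.
SPAM_PAIRS = {("free", "money"): 0.02, ("money", "now"): 0.015}
HAM_PAIRS = {("free", "money"): 0.0005, ("money", "now"): 0.0005}

FLOOR = 1e-4  # smoothing value for unseen tokens


def bag_of_words_score(tokens):
    """Sum log-likelihood ratios of single words, ignoring word order."""
    return sum(log(SPAM_WORDS.get(t, FLOOR) / HAM_WORDS.get(t, FLOOR))
               for t in tokens)


def markovian_score(tokens):
    """Score adjacent word pairs, so phrase structure now matters."""
    return sum(log(SPAM_PAIRS.get(p, FLOOR) / HAM_PAIRS.get(p, FLOOR))
               for p in zip(tokens, tokens[1:]))


msg = "free money now".split()
print(bag_of_words_score(msg) > 0)  # → True: single words lean spam
print(markovian_score(msg) > 0)     # → True: the word pairs lean spam too
```

A positive score indicates the evidence favors spam; the bag-of-words scorer would give the same result for any permutation of the words, while the pair-based scorer would not.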
Neither naive Bayes filters nor Markovian filters are limited to word-level tokenization of messages.
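For instance, a filter can tokenize below the word level, using overlapping character n-grams. The helper below is an illustrative sketch (the function name and window length are not from the source); the same probabilistic machinery then operates on these sub-word tokens.

```python
def char_ngrams(text, n=4):
    """Split a message into overlapping character n-grams instead of words."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Sub-word tokens still overlap heavily with the unobfuscated spelling,
# which makes the filter more robust to tricks like "m0ney" for "money".
print(char_ngrams("free m0ney", 4))
# → ['free', 'ree ', 'ee m', 'e m0', ' m0n', 'm0ne', '0ney']
```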
[3] A visible Markov model conditions each token only on the tokens immediately preceding it, while a hidden Markov model can also capture indirect relationships between tokens that are not adjacent. Since those more obscure conditional relationships are more typical of natural-language messages, both genuine and spam, hidden Markov models are generally preferred over visible Markov models for spam filtering.
Due to storage constraints, the most commonly employed model is a specific type of hidden Markov model known as a Markov random field, typically with a clique (or 'sliding window') size of four to six tokens.
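One way to picture the sliding window is as a feature generator: as the window advances over the token stream, each token within the window is combined with the window's anchor token, and each combination becomes a countable feature. The sketch below is illustrative only; the feature encoding (anchor word, gap, later word) and the window size of five are assumptions, not the encoding of any particular filter.

```python
def window_features(tokens, window=5):
    """Slide a fixed-size window over the token stream and emit one
    feature per (anchor token, gap, later token) combination inside it.

    Keeping only these pairwise combinations, rather than every full
    token sequence, is what keeps the storage requirement manageable.
    """
    feats = []
    for i in range(len(tokens)):
        w = tokens[i:i + window]
        for gap in range(1, len(w)):
            feats.append((w[0], gap, w[gap]))
    return feats


print(window_features("buy cheap meds".split()))
# → [('buy', 1, 'cheap'), ('buy', 2, 'meds'), ('cheap', 1, 'meds')]
```

Each emitted tuple would then get its own spam/ham probability estimate, exactly as single words do in a bag-of-words model.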