Furnas et al. (1987) were perhaps the first to quantitatively study the vocabulary mismatch problem.
This research motivated the work on latent semantic indexing.
Zhao and Callan (2010)[2] were perhaps the first to quantitatively study the vocabulary mismatch problem in a retrieval setting.
They developed novel term weight prediction methods that can potentially yield 50-80% accuracy gains in retrieval over strong keyword retrieval models.
Further research along this line shows that expert users can use Boolean conjunctive normal form (CNF) expansion to improve retrieval performance by 50-300% over unexpanded keyword queries.
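
As a rough illustration of how a Boolean Conjunctive Normal Form query can bridge a vocabulary mismatch, the sketch below treats a query as an AND of OR-groups, where each group lists alternative terms for one concept. The function name `matches_cnf`, the example terms, and the whitespace tokenization are assumptions made for illustration; they are not taken from the cited work, and the sketch does not reproduce its ranking or term weighting methods.

```python
def matches_cnf(cnf_query, document_terms):
    """Return True if every OR-group (clause) has at least one term in the document.

    cnf_query: list of sets; the sets are ANDed, the terms inside each set are ORed.
    document_terms: iterable of tokens from the document.
    """
    doc = set(document_terms)
    return all(any(term in doc for term in clause) for clause in cnf_query)


# Unexpanded keyword query: each clause holds a single term.
keyword_query = [{"car"}, {"repair"}]

# Hypothetical expert expansion: each clause adds alternative terms for the concept.
expanded_query = [
    {"car", "automobile", "vehicle"},
    {"repair", "fix", "maintenance"},
]

# Toy document that uses different words for the same concepts.
document = "how to fix an automobile engine".split()

print(matches_cnf(keyword_query, document))
print(matches_cnf(expanded_query, document))
```

In this toy case the unexpanded query misses the document because neither "car" nor "repair" appears in it, while the expanded query matches through "automobile" and "fix".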