The first aims at finding speaker change points in an audio stream.
The second aims at grouping together speech segments on the basis of speaker characteristics.
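The two stages above can be sketched in miniature. The following is an illustrative simplification, not any particular system: it detects change points with a plain Euclidean distance between consecutive feature vectors and groups segments by greedy nearest-centroid clustering, standing in for the likelihood-ratio or embedding-similarity criteria used in practice. The function names and thresholds are hypothetical.

```python
import math

def segment(features, threshold=1.0):
    """Stage 1: mark a speaker change wherever consecutive feature
    vectors differ by more than `threshold` (Euclidean distance)."""
    changes = [0]
    for i in range(1, len(features)):
        if math.dist(features[i - 1], features[i]) > threshold:
            changes.append(i)
    return changes  # indices where a new segment starts

def cluster(features, changes, threshold=1.0):
    """Stage 2: greedily assign each segment to the first cluster whose
    centroid lies within `threshold` of the segment's mean vector,
    opening a new cluster otherwise."""
    bounds = changes + [len(features)]
    centroids, labels = [], []
    for start, end in zip(bounds, bounds[1:]):
        seg = features[start:end]
        mean = [sum(col) / len(seg) for col in zip(*seg)]
        for k, c in enumerate(centroids):
            if math.dist(mean, c) <= threshold:
                labels.append(k)
                break
        else:
            labels.append(len(centroids))
            centroids.append(mean)
    return labels  # one speaker label per segment
```

For example, nine frames of two-dimensional features drawn from two well-separated speakers, with the first speaker returning at the end, yield three segments labelled speaker 0, speaker 1, speaker 0.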
With the increasing volume of broadcasts, meeting recordings and voice mail collected every year, speaker diarisation has received much attention from the speech community, as is manifested by the specific evaluations devoted to it under the auspices of the National Institute of Standards and Technology for telephone speech, broadcast news and meetings.
A curated list of speaker diarisation research is maintained in Quan Wang's GitHub repository.[4]
More recently, speaker diarisation has increasingly been performed with neural networks, leveraging large-scale GPU computing and methodological developments in deep learning.