Since the alignment of the observed sequence with the target labels is unknown, CTC predicts a probability distribution over labels at each time step.[3] A CTC network has a continuous output (e.g. a softmax layer), which is fitted through training to model the probability of a label.
CTC does not attempt to learn segment boundaries or timings: label sequences are considered equivalent if they differ only in alignment, after collapsing repeated labels and removing blanks.
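This equivalence is defined by the CTC collapse mapping, often written B. The sketch below illustrates it on strings, assuming `-` denotes the blank symbol (the symbol choice and string representation are illustrative, not part of any particular library's API):

```python
def collapse(path, blank="-"):
    """CTC mapping B: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        # A symbol is emitted only when it differs from its predecessor
        # (repeat merging) and is not the blank.
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)

# Different alignments collapse to the same label sequence:
collapse("aa-ab-")   # "aab"
collapse("a-aab--")  # "aab"
```

Note that a blank between two identical labels keeps them distinct (`"a-a"` collapses to `"aa"`, while `"aa"` collapses to `"a"`), which is why the blank symbol is needed at all.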
A given label sequence can arise from many different alignments, which makes scoring a non-trivial task; however, there is an efficient forward–backward algorithm for computing these probabilities.
Alternative approaches to a CTC-fitted neural network include the hidden Markov model (HMM).