In computer science and graph theory, the Canadian traveller problem (CTP) is a generalization of the shortest path problem to graphs that are partially observable.
The name supposedly originates from conversations of the authors who learned of a difficulty Canadian drivers had: traveling a network of cities with snowfall randomly blocking roads.
The stochastic version, where each edge is associated with a probability of independently being in the graph, has been given considerable attention in operations research under the name "the Stochastic Shortest Path Problem with Recourse" (SSPPR).
For a given instance, there are a number of possibilities, or realizations, of how the hidden graph may look.
The CTP task is to compute the expected cost of the optimal policies.
To compute an actual description of an optimal policy may be a harder problem.
For a given instance and policy, each realization produces a walk in the graph. Note that this walk is not necessarily a path, since the best strategy may be to, e.g., visit every vertex of a cycle and return to the start.
This differs from the shortest path problem (with strictly positive weights), where repetitions in a walk imply that a better solution exists.
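As a minimal sketch (the weights and vertex names below are invented for illustration), consider an instance with a short route that may be blocked and a long route that is always open; a natural policy tries the short route first and, on discovering the blockage, backtracks through the start vertex:

```python
# Minimal illustration (hypothetical weights and vertex names): in the CTP the
# optimal strategy may produce a walk that repeats vertices, unlike in the
# ordinary shortest path problem.

WEIGHTS = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 5, ("b", "t"): 5}

def walk_cost(walk):
    """Sum the edge weights along a walk (edges are undirected)."""
    return sum(WEIGHTS.get((u, v), WEIGHTS.get((v, u))) for u, v in zip(walk, walk[1:]))

def try_short_route_first(a_t_blocked):
    """Try the short route s-a-t; the status of the edge (a, t) is discovered
    only on reaching a. If it is blocked, backtrack and take s-b-t instead."""
    if not a_t_blocked:
        return ["s", "a", "t"]
    return ["s", "a", "s", "b", "t"]  # the walk revisits the start vertex s

for blocked in (False, True):
    walk = try_short_route_first(blocked)
    print(f"(a, t) blocked={blocked}: walk={walk}, cost={walk_cost(walk)}")
```

In the blocked realization the produced walk revisits s, something that would never occur in an optimal solution to the ordinary shortest path problem.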
There are primarily five parameters distinguishing the variants of the Canadian traveller problem.
The first parameter is how to value the walk produced by a policy for a given instance and realization.
For the Canadian traveller problem, the task is to minimize the competitive ratio of the walk, i.e., the ratio of the length of the produced walk to the length of the shortest path in the realization.
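A hedged sketch in assumed notation: write c_pi(W) for the cost of the walk that policy pi produces in realization W, and d_W(s,t) for the shortest s-t distance in W (neither symbol is taken from the cited literature). The quantity being minimized for a single realization is then:

```latex
% Assumed notation: c_pi(W) = cost of the walk produced by policy pi in
% realization W; d_W(s,t) = shortest s-t distance in W.
\[
  \operatorname{ratio}(\pi, W) \;=\; \frac{c_\pi(W)}{d_W(s,t)} .
\]
```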
The second parameter is how to evaluate a policy with respect to different realizations consistent with the instance under consideration.
In the Canadian traveller problem one studies the worst case, while in SSPPR one studies the average case; for average-case analysis, one must furthermore specify an a priori distribution over the realizations.
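As a sketch under assumed notation, let val(pi, W) denote whichever per-realization value was chosen by the first parameter (for example, the cost c_pi(W) or the ratio above); the two evaluations differ only in how they aggregate over realizations:

```latex
% Assumed notation: val(pi, W) is the per-realization value of policy pi in
% realization W; the expectation is over the a priori distribution on W.
\[
  \text{worst case: } \max_{W} \operatorname{val}(\pi, W),
  \qquad
  \text{average case: } \mathbb{E}_{W}\!\bigl[\operatorname{val}(\pi, W)\bigr].
\]
```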
The third parameter, relevant only to the stochastic versions, is how the distribution over realizations is specified. In the Stochastic Canadian Traveller Problem and in the Edge-independent Stochastic Shortest Path Problem (i-SSPPR), each uncertain edge (or cost) has an associated probability of being in the realization, and the event that an edge is in the graph is independent of which other edges are in the realization.
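A minimal sketch of the edge-independent model (the graph, edge probabilities, and function names below are invented for illustration): each uncertain edge is included in a sampled realization independently with its own probability, so the probability of a whole realization factorizes over the edges.

```python
import random

# Hypothetical instance: each uncertain edge has an independent probability of
# being present in a realization.
EDGE_PRESENCE_PROB = {
    ("s", "a"): 0.9,
    ("a", "t"): 0.5,
    ("s", "b"): 1.0,
    ("b", "t"): 0.8,
}

def sample_realization(rng=random):
    """Draw one realization: include each edge independently with its probability."""
    return {edge for edge, p in EDGE_PRESENCE_PROB.items() if rng.random() < p}

def realization_probability(realization):
    """Probability of a realization under edge independence: a product over edges."""
    prob = 1.0
    for edge, p in EDGE_PRESENCE_PROB.items():
        prob *= p if edge in realization else (1.0 - p)
    return prob

example = sample_realization()
print(example, realization_probability(example))
```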
Another variant makes no assumption on the distribution but requires that each realization with non-zero probability be stated explicitly; this is called the Distribution Stochastic Shortest Path Problem (d-SSPPR or R-SSPPR) and is NP-complete.
The fourth parameter is the dynamics of the problem: in the Stochastic Shortest Path Problem with Recourse and Resets, or the Expected Shortest Path problem, a new realization is chosen from the distribution after each step taken by the policy, whereas in the other variants the realization is fixed but initially unknown.
The fifth parameter is how the agent learns about the realization. In traditional variants of CTP, the agent uncovers the exact weight (or status) of an edge upon reaching an adjacent vertex.
A more recently suggested variant also gives the agent the ability to perform remote sensing from any location in the realization.
In the variant studied in the original paper, the benchmark for a realization is its off-line optimum, the cost of the shortest path from s to t in that realization; this is called the off-line problem because an algorithm for it would have complete information of the graph. The policy itself is evaluated on-line, using only the edges it currently knows to be in the graph, and the walk it produces is compared to the off-line optimum; that is, the goal is to minimize the competitive ratio in the worst case.
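With the same assumed notation as in the earlier sketches, this worst-case objective can be summarized as:

```latex
% Assumed notation: c_pi(W) is the cost of the walk produced by policy pi in
% realization W, and d_W(s,t) is the off-line optimum (shortest s-t distance) in W.
\[
  \min_{\pi}\; \max_{W}\; \frac{c_\pi(W)}{d_W(s,t)} .
\]
```

The maximization over realizations is what gives the problem the game-like structure noted next.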
Papadimitriou and Yannakakis noted that this defines a two-player game, where the players compete over the cost of their respective paths and the edge set is chosen by the second player (nature).
The original paper analysed the complexity of the problem and reported it to be PSPACE-complete.
It was also shown that finding an optimal path in the case where each edge has an associated probability of being in the graph (i-SSPPR) is a PSPACE-easy but ♯P-hard problem.[1] It was an open problem to bridge this gap, but since then both the directed and undirected versions were shown to be PSPACE-hard.
The problem is said to have applications in operations research, transportation planning, artificial intelligence, machine learning, communication networks, and routing.
A variant of the problem has been studied for robot navigation with probabilistic landmark recognition.[3] Despite the age of the problem and its many potential applications, many natural questions still remain open.
A more fundamental question has also been left unanswered: is there a polynomial-size description of an optimal policy, setting aside for a moment the time necessary to compute the description?