Section 2 - Analyzing Dijkstra
10B.2.1 Does it work?
We can answer this question in stages:
- Will it terminate?
- If a vertex, W, is reachable from S, is it guaranteed to have a finite distance when the algorithm terminates?
- Will the final distance for each vertex be minimal?
Will it terminate?
The only way it could fail to terminate is if Dijkstra "pushes" more than he "pops." However, each vertex in the graph gets "pushed" at most once for each of its neighbors. Therefore, there is an upper limit of |E| (the number of edges) "pushes" in the course of this algorithm (for a directed graph; for an undirected graph it would be 2|E|, since each edge appears twice). Thus, we have a finite number of possible "pushes", so the while loop must eventually terminate.
Note: the next two paragraphs assume knowledge of a basic mathematical technique called "reductio ad absurdum," or "proof by contradiction." If you don't understand this concept, you won't be able to follow the logic, so you can either look it up or skip the paragraphs.
If a vertex is reachable from S, is it guaranteed to have a finite distance when the algorithm terminates?
This is the same question as asking whether every vertex (reachable from S) appears on the "partially processed vertices" queue at some point (because the first time it appears, it receives a finite value). To prove this, we assume it is false and show a contradiction. If W is reachable from S but is never placed on the "partially processed vertices" queue, then we look at the shortest path from S to W. Let's say that it is S → ... → V → W. We can assume that V, the immediate predecessor of W on the shortest path, did appear in the "partially processed vertices" at some point (if not, back up along that path toward S until you find the first vertex that never appears; its predecessor did appear, since S itself is "pushed" at the start, so use that vertex as your W instead). So W does not appear, but V, the next-to-last vertex on the shortest path from S to W, does. That means V was, at one time, "popped" from the "partially processed vertices" queue. When that happened, W's distance of infinity was compared with the distance in V (finite) plus the cost of the edge (V, W), also finite, resulting in W receiving a finite distance and being "pushed" onto the "partially processed vertices" queue. But this contradicts our hypothesis, proving the conjecture.
Will the final distance for each vertex be minimal?
This is demonstrated using exactly the same logic as above: proof by contradiction. I leave it as an exercise for the reader; the steps are practically identical.
10B.2.2 Changes and Choices in Dijkstra
Storing Pointers, Not Vertices
This choice was already mentioned, but it is worth repeating, because the implications are favorable for us. Specifically, if we change the distance of any vertex, W, when we see it needs improving, this change must happen in the original object (accessible through its pointer). That means that if two pointers representing the same vertex are in the collection of "partially processed vertices", they both point to the same vertex object. Therefore, we never have to worry about whether the pointer, V, we "pop" off this queue happens to have the most current and up-to-date distance for that V. It must, since any and all changes made to that V are stored in only one place, the vertexSet, and all of its pointers point to this same V.
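To make this concrete, here is a minimal sketch, assuming a stripped-down Vertex type with just a name and a dist member (the course's real vertex class has more to it). It shows that duplicate pointers in the container all see the same updated distance, because there is only one underlying object:

#include <iostream>
#include <queue>
#include <string>

// Hypothetical, simplified vertex type for illustration only.
struct Vertex
{
   std::string name;
   double dist;       // current best-known distance from the start vertex
};

int main()
{
   Vertex w{ "W", 1e9 };               // the one real object (it lives in the vertexSet)
   std::queue<Vertex *> partProcVerts;

   partProcVerts.push(&w);             // "push" a pointer, not a copy
   partProcVerts.push(&w);             // a duplicate pointer to the same object

   w.dist = 12.5;                      // improve the distance once, in one place

   // Every queued pointer now reports the improved distance.
   std::cout << partProcVerts.front()->dist << std::endl;   // prints 12.5
}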
Picking the Smallest V in the Outer Loop
I mentioned that this algorithm differs from the purest form. We explore that now. The way in which all Dijkstra-style algorithms differ is in the temporary data structure used to store the "partially processed vertices". To understand this, let's go back and modify one step in the algorithm to make it look like Dijkstra's original (and most often published) version. The step in question is the "pop" of V at the top of the outer loop:
- initialize all vertex distances to infinity, except for starting vertex, S, which is set to 0.
- "push" the one vertex S onto a partProcVerts queue.
- loop while partProcVerts is not empty
- "pop" vertex V off partProcVerts
- loop for every vertex W, in V's adjacency list
- if vertex W's distance can be shortened by going through V, do so and "push" W onto partProcVerts
Typically, this step is seen as
- "pop" the vertex V having the shortest distance so far off partProcVerts
This change apparently saves some time by producing the best results for V's neighbors (the Ws) immediately, necessitating fewer improvements in future loop passes. If there are two ways to get to a W (through V and V'), we would expect that the V with the smaller distance would get us to W faster. However, there are two problems with this choice:
- There is no guarantee that picking the shorter V will lead to a shorter path to W. The cost of edge (V, W) will help determine this, and that cost is not considered when we choose the V with the shortest distance so far.
- Getting the shortest distance requires extra time for every "push" or "pop". If a PriorityQueue (or any container kept ordered by distance) is used to store the "partially processed vertices", for example, then a constant-time "pop" is replaced by a logN-time "pop," where N is the average number of items in the "partially processed vertices" container.
This explains why, for some graphs, it is not faster or better to pull off the V having the shortest distance, even though this is how the original algorithm reads.
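In STL terms, the two choices look roughly like the sketch below. This is only one way to set it up (the comparator name is mine), but it shows where the logN cost comes from, and it hints at a further wrinkle: if a vertex's distance improves while its pointer is already sitting in the heap, the heap's ordering becomes stale.

#include <queue>
#include <vector>

struct Vertex { double dist; /* adjacency list, etc. */ };

// Choice 1: plain FIFO queue -- constant-time "push" and "pop",
// but the popped V is merely the oldest, not the closest.
std::queue<Vertex *> partProcVertsFifo;

// Choice 2: classic Dijkstra -- "pop" the V with the smallest distance so far.
// Each "push" and "pop" now costs about logN, where N is the container's size.
struct FartherFirst
{
   bool operator()(const Vertex *a, const Vertex *b) const
   {
      return a->dist > b->dist;      // smallest dist rises to top()
   }
};
std::priority_queue<Vertex *, std::vector<Vertex *>, FartherFirst> partProcVertsPQ;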
We have an opportunity to use our recently acquired knowledge of data structures to choose the container we want to use for our "partially processed vertices". I mentioned that we could use a set, stack, queue, or priority queue. How did I arrive at queue?
First, what will this container "partially processed vertices" actually contain?
ANSWER: VertPtrs. We will always use the graph's vertex pointers, not the vertex objects. The reasons for this have been presented.
Now that we know that we're talking about a container of pointers (not objects) we can go on to ask why we chose a queue. What are the alternatives?
Using a Set
Do we want to allow duplicates among the "partially processed vertices"? Ideally, no. There is no reason to "push" a vertex onto this container if one is already there. To see this, we first remind ourselves why we are "pushing" the Ws as we improve their distances to S. Dijkstra tells us to "push" them because, if we are able to make some vertex, W, smaller, putting it back on the "partially processed vertices" container allows this improvement to propagate to a better path for some of W's successors. But if W happened to be "pushed" by a prior loop pass, and it has not yet been "popped", there is no reason to add it again: the W that is already in the queue will work just fine when it is eventually "popped". All we care about is that we will have an opportunity to use W at a future time as the centerpiece of an outer-while-loop "pop".
The only two data structures in the STL that disallow duplicates are set and map. How do we compare the cost of allowing duplicates with that of using a set, say? It's not easy to give a quantitative answer in general, but you can go about it like this. "Pushing" an unneeded W (one that is already on the queue) actually takes less time than handing it to a set and having the insertion rejected, because the set class has to spend logN time searching the tree before it can reject the insertion of an existing W. And, when we do have to legitimately add a W to the container (because it was not there), "pushing" onto a queue/stack is faster than inserting into a set (tree), for the same reason. So there is time saved in the "pushes" of a queue/stack compared with a set. However, when we "pop" and process that extra W in a stack/queue regime, we take time to process its neighbors needlessly. These two competing facts need to be examined in both sparse and dense graphs. In my implementation, I chose to allow duplicates (queue), thereby skipping the certain, expensive logN search times and enduring the "popping" and processing of duplicate vertices.
You can do the math yourself by counting: tally the total "pushes", the duplicate "pushes", and the adjacency lists you end up re-processing, and weigh that against the logN searches a set would have charged you. How the analysis comes out depends on the kind of graph you have; a sketch of this sort of comparison follows.
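In the sketch below, every number is made up; you would replace them with counts taken from runs on your own graph.

#include <cmath>
#include <iostream>

int main()
{
   long   totalPushes      = 5000;   // every time some W's distance improves
   long   duplicatePushes  = 800;    // "pushes" of a W already waiting in the queue
   double avgDegree        = 4.0;    // average size of a popped vertex's adjacency list
   double avgContainerSize = 60.0;   // average N for the set's logN searches

   // queue/stack regime: the extra cost is re-processing the duplicate "pops"
   double queueCost = duplicatePushes * avgDegree;

   // set regime: every insertion attempt (duplicate or not) pays a logN tree search
   double setCost = totalPushes * std::log2(avgContainerSize);

   std::cout << "queue-with-duplicates extra cost ~ " << queueCost << "\n"
             << "set-without-duplicates extra cost ~ " << setCost  << "\n";
}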
Anyway, I opted against set.
Using a Queue vs. a Stack
Having decided to use a constant-time data structure for the "partially processed vertices", we have to choose the right one. Our algorithm does not depend on order; however, if we use a stack, we will be "popping" recently "pushed" vertices, whereas with a queue we "pop" the oldest vertices. A little thought, with a picture of a graph in front of you, will show that the breadth-first nature of the processing is partially compromised (time-wise) by using a stack, while a queue preserves it. If we "pop" a vertex, S, then "push" all of its neighbors, we would like to process those neighbor vertices next, before "popping" and processing some deeper vertices. Thus, we choose queue, not stack.
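Putting the pieces together, here is a minimal sketch of the queue-with-duplicates loop described in this section. The Vertex and Edge structs are simplified stand-ins for the course's real classes (which carry more, such as the vertex name and a previous-vertex pointer for path recovery):

#include <limits>
#include <queue>
#include <vector>

struct Vertex;                                  // forward declaration
struct Edge   { Vertex *dest; double cost; };   // one entry in an adjacency list
struct Vertex
{
   double dist = std::numeric_limits<double>::infinity();
   std::vector<Edge> adj;                       // this vertex's adjacency list
};

void dijkstraQueueVariant(Vertex *start)
{
   start->dist = 0;
   std::queue<Vertex *> partProcVerts;          // "partially processed vertices"
   partProcVerts.push(start);

   while (!partProcVerts.empty())
   {
      Vertex *v = partProcVerts.front();        // "pop" the oldest vertex (FIFO)
      partProcVerts.pop();

      for (const Edge &e : v->adj)              // every W in V's adjacency list
      {
         Vertex *w = e.dest;
         if (v->dist + e.cost < w->dist)        // can W's distance be shortened?
         {
            w->dist = v->dist + e.cost;         // improve it in the one real object
            partProcVerts.push(w);              // let the improvement propagate
         }
      }
   }
}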
10B.2.3 Get Optimal
Having completed this course (just about), you are not subject to the vagaries of an STL algorithm or data structure. With minor tweaks, you can customize one of your existing ADTs to have superior performance in the graph regime. For instance, you can get both the smallest-V "pops" of the original Dijkstra algorithm and the space savings of disallowing duplicates by building a priority queue that does not allow duplicates. This combines two of the improvements above, in exchange for accepting the logN search times, and you may decide that this works best for your particular data set.
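One way to realize this idea with the STL you already know is a std::set of vertex pointers ordered by distance (ties broken by address). This is just a sketch of the idea, not the course's official solution; it gives smallest-V "pops" with no duplicates, at logN per operation, and it relies on erasing a vertex before changing its distance so the set's ordering never goes stale:

#include <limits>
#include <set>
#include <vector>

struct Vertex;
struct Edge   { Vertex *dest; double cost; };
struct Vertex
{
   double dist = std::numeric_limits<double>::infinity();
   std::vector<Edge> adj;
};

// Order by distance, breaking ties by address so two different vertices
// with equal distances are not mistaken for duplicates.
struct ByDist
{
   bool operator()(const Vertex *a, const Vertex *b) const
   {
      return (a->dist != b->dist) ? (a->dist < b->dist) : (a < b);
   }
};

void dijkstraNoDuplicatePQ(Vertex *start)
{
   start->dist = 0;
   std::set<Vertex *, ByDist> partProcVerts;    // no duplicates, closest V first
   partProcVerts.insert(start);

   while (!partProcVerts.empty())
   {
      Vertex *v = *partProcVerts.begin();       // the smallest-distance V: logN
      partProcVerts.erase(partProcVerts.begin());

      for (const Edge &e : v->adj)
      {
         Vertex *w = e.dest;
         if (v->dist + e.cost < w->dist)
         {
            partProcVerts.erase(w);             // remove any stale copy first...
            w->dist = v->dist + e.cost;         // ...then improve the distance...
            partProcVerts.insert(w);            // ...and re-insert: logN each
         }
      }
   }
}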