Text Book - Algorithm Design by Jon Kleinberg and Eva Tardos
Chapter 3 - Graphs
Exercises
Q1. Consider the directed acyclic graph G in Figure 3.10. How many topological orderings does it have?
Q2. Give an algorithm to detect whether a given undirected graph contains a cycle. If the graph contains a cycle, then your algorithm should output one. (It should not output all cycles in the graph, just one of them.) The running time of your algorithm should be O(m + n) for a graph with n nodes and m edges.
Q3. The algorithm described for computing a topological ordering of a DAG repeatedly finds a node with no incoming edges and deletes it. This will eventually produce a topological ordering, provided that the input graph really is a DAG.
But suppose that we're given an arbitrary graph that may or may not be a DAG. Extend the topological ordering algorithm so that, given an input directed graph G, it outputs one of two things: (a) a topological ordering, thus establishing that G is a DAG; or (b) a cycle in G, thus establishing that G is not a DAG. The running time of your algorithm should be O(m + n) for a directed graph with n nodes and m edges.
Q4. Inspired by the example of that great Cornellian, Vladimir Nabokov, some of your friends have become amateur lepidopterists (they study butter- flies). Often when they return from a trip with specimens of butterflies, it is very difficult for them to tell how many distinct species they've caught-thanks to the fact that many species look very similar to one another.
One day they return with n butterflies, and they believe that each belongs to one of two different species, which we'll call A and B for purposes of this discussion. They'd like to divide the n specimens into two groups-those that belong to A and those that belong to B-but it's very hard for them to directly label any one specimen. So they decide to adopt the following approach.
For each pair of specimens i and j, they study them carefully side by side. If they're confident enough in their judgment, then they label the pair (i, j) either "same" (meaning they believe them both to come from the same species) or "different" (meaning they believe them to come from different species). They also have the option of rendering no judgment on a given pair, in which case we'll call the pair ambiguous.
So now they have the collection of n specimens, as well as a collection of m judgments (either "same" or "different") for the pairs that were not declared to be ambiguous. They'd like to know if this data is consistent with the idea that each butterfly is from one of species A or B. So more concretely, we'll declare the m judgments to be consistent if it is possible to label each specimen either A or B in such a way that for each pair (i, j) labeled "same," it is the case that i and j have the same label; and for each pair (i, j) labeled "different," it is the case that i and j have different labels. They're in the middle of tediously working out whether their judgments are consistent, when one of them realizes that you probably have an algorithm that would answer this question right away.
Give an algorithm with running time O(m + n) that determines whether the m judgments are consistent.
Q5. A binary tree is a rooted tree in which each node has atmost two children. Show by induction that in any binary tree the number of nodes with two children is exactly one less than the number of leaves.
Q6. We have a connected graph G = (V, E), and a specific vertex u ∈ V. Suppose we compute a depth-first search tree rooted at u, and obtain a tree T that includes all nodes of G. Suppose we then compute a breadth-first search tree rooted at u, and obtain the same tree T. Prove that G = T. (In other words, if T is both a depth-first search tree and a breadth-first search tree rooted at u, then G cannot contain any edges that do not belong to T.)
7. Some friends of yours work on wireless networks, and they're currently studying the properties of a network of n mobile devices. As the devices move around (actually, as their human owners move around), they define a graph at any point in time as follows: there is a node representing each of the n devices, and there is an edge between device i and device j if the physical locations of i and j are no more than 500 meters apart. (If so, we say that i and j are "in range" of each other.)
They'd like it to be the case that the network of devices is connected at all times, and so they've constrained the motion of the devices to satisfy the following property: at all times, each device i is within 500 meters of at least n/2 of the other devices. (We'll assume n is an even number.)
What they'd like to know is: Does this property by itself guarantee that the network will remain connected?
Here's a concrete way to formulate the question as a claim about graphs.
Claim: Let G be a graph on n nodes, where n is an even number. If every node of G has degree at least n/2, then G is connected.
Decide whether you think the claim is true or false, and give a proof of either the claim or its negation.
Q8. A number of stories in the press about the structure of the Internet and the Web have focused on some version of the following question: How far apart are typical nodes in these networks? If you read these stories carefully, you find that many of them are confused about the difference between the diameter of a network and the average distance in a network; they often jump back and forth between these concepts as though they're the same thing.
As in the text, we say that the distance between two nodes u and v in a graph G = (V, E) is the minimum number of edges in a path joining them; we'll denote this by dist(u, v). We say that the diameter of G is the maximum distance between any pair of nodes; and we'll denote this quantity by diam(G).
Let's define a related quantity, which we'll call the average pair wise distance in G (denoted apd(G)). We define apd(G) to be the average, over all sets of two distinct nodes u and v, of the distance between u and v. That is,
apd(G) = [∑{u,v}⊆V dist(u, v)]/.
Here's a simple example to convince yourself that there are graphs G for which diam(G) ≠ apd(G). Let G be a graph with three nodes u, v, w, and with the two edges {u, v} and {v, w}. Then
diam(G) = dist(u, w) = 2,
while
apd(G) = [dist(u, v) + dist(u, w) + dist(v, w)]/3= 4/3.
Of course, these two numbers aren't all that far apart in the case of this three-node graph, and so it's natural to ask whether there's always a close relation between them. Here's a claim that tries to make this precise.
Claim: There exists a positive natural number c so that for all connected graphs G, it is the case that
diam(G)/apd(G) ≤ c.
Decide whether you think the claim is true or false, and give a proof of either the claim or its negation.
Q9. There's a natural intuition that two nodes that are far apart in a communication network-separated by many hops-have a more tenuous connection than two nodes that are close together. There are a number of algorithmic results that are based to some extent on different ways of making this notion precise. Here's one that involves the susceptibility of paths to the deletion of nodes.
Suppose that an n-node undirected graph G = (V, E) contains two nodes s and t such that the distance between s and t is strictly greater than n/2. Show that there must exist some node v, not equal to either s or t, such that deleting v from G destroys all s-t paths. (In other words, the graph obtained from G by deleting v contains no path from s to t.) Give an algorithm with running time O(m + n) to find such a node v.
Q10. A number of art museums around the country have been featuring work by an artist named Mark Lombardi (1951-2000), consisting of a set of intricately rendered graphs. Building on a great deal of research, these graphs encode the relationships among people involved in major political scandals over the past several decades: the nodes correspond to participants, and each edge indicates some type of relationship between a pair of participants. And so, if you peer closely enough at the drawings, you can trace out ominous-looking paths from a high-ranking U.S. government official, to a former business partner, to a bank in Switzerland, to a shadowy arms dealer.
Such pictures form striking examples of social networks, which, as we discussed in Section 3.1, have nodes representing people and organizations, and edges representing relationships of various kinds. And the short paths that abound in these networks have attracted considerable attention recently, as people ponder what they mean. In the case of Mark Lombardi's graphs, they hint at the short set of steps that can carry you from the reputable to the disreputable.
Of course, a single, spurious short path between nodes v and w in such a network may be more coincidental than anything else; a large number of short paths between v and w can be much more convincing. So in addition to the problem of computing a single shortest v-w path in a graph G, social networks researchers have looked at the problem of determining the number of shortest v-w paths.
This turns out to be a problem that can be solved efficiently. Suppose we are given an undirected graph G = (V, E), and we identify two nodes v and w in G. Give an algorithm that computes the number of shortest v-w paths in G. (The algorithm should not list all the paths; just the number suffices.) The running time of your algorithm should be O(m + n) for a graph with n nodes and m edges.
Q11. You're helping some security analysts monitor a collection of networked computers, tracking the spread of an online virus. There are n computers in the system, labeled C1, C2, ... , Cn, and as input you're given a collection of trace data indicating the times at which pairs of computers communicated. Thus the data is a sequence of ordered triples (Ci, Cj, Ck); such a triple indicates that Ci and Cj exchanged bits at time tk. There are m triples total.
We'll assume that the triples are presented to you in sorted order of time. For purposes of simplicity, we'll assume that each pair of computers communicates at most once during the interval you're observing.
The security analysts you're working with would like to be able to answer questions of the following form: If the virus was inserted into computer Ca at time x, could it possibly have infected computer Cb by time y? The mechanics of infection are simple: if an infected computer Ci communicates with an uninfected computer Cj at time tk (in other words, if one of the triples (Ci , Cj, tk) or (Cj, Ci, tk) appears in the trace data), then computer Cj becomes infected as well, starting at time tk. Infection can thus spread from one machine to another across a sequence of communications, provided that no step in this sequence involves a move backward in time. Thus, for example, if Ci is infected by time tk, and the trace data contains triples (Ci, Cj, tk) and (Cj, Cq, tr), where tk ≤ tr, then Cq will become infected via Cj. (Note that it is okay for tk to be equal to tr; this would mean that Cj had open connections to both Ci and Cq at the same time, and so a virus could move from Ci to Cq.)
For example, suppose n = 4, the trace data consists of the triples (C1, C2, 4), (C2, C4, 8), (C3, C4, 8), (C1, C4, 12), and the virus was inserted into computer C1 at time 2. Then C3 would be infected at time 8 by a sequence of three steps: first C2 becomes infected at time 4, then C4 gets the virus from C2 at time 8, and then C3 gets the virus from C4 at time 8. On the other hand, if the trace data were (C2, C3, 8), (C1, C4, 12), (C1, C2, 14), and again the virus was inserted into computer C1 at time 2, then C3 would not become infected during the period of observation: although C2 becomes infected at time 14, we see that C3 only communicates with C2 before C2 was infected. There is no sequence of communications moving forward in time by which the virus could get from C1 to C3 in this second example.
Design an algorithm that answers questions of this type: given a collection of trace data, the algorithm should decide whether a virus introduced at computer Ca at time x could have infected computer Cb by time y. The algorithm should run in time O(m + n).
12. You're helping a group of ethnographers analyze some oral history data they've collected by interviewing members of a village to learn about the lives of people who've lived there over the past two hundred years.
From these interviews, they've learned about a set of n people (all of them now deceased), whom we'll denote P1, P2,..., Pn. They've also collected facts about when these people lived relative to one another. Each fact has one of the following two forms:
- For some i and j, person Pi died before person Pj was born; or
- For some i and j, the life spans of Pi and Pj overlapped at least partially.
Naturally, they're not sure that all these facts are correct; memories are not so good, and a lot of this was passed down by word of mouth. So what they'd like you to determine is whether the data they've collected is at least internally consistent, in the sense that there could have existed a set of people for which all the facts they've learned simultaneously hold.
Give an efficient algorithm to do this: either it should produce proposed dates of birth and death for each of the n people so that all the facts hold true, or it should report (correctly) that no such dates can exist-that is, the facts collected by the ethnographers are not internally consistent.