We can now define the diverse subset problem as follows


Text Book - Algorithm Design by Jon Kleinberg and Eva Tardos

Chapter 8 - NP and Computational Intractability

Exercises

Q1. For each of the two questions below, decide whether the answer is (i) "Yes," (ii) "No," or (iii) "Unknown, because it would resolve the question of whether P = NP." Give a brief explanation of your answer.

(a) Let's define the decision version of the Interval Scheduling Problem from Chapter 4 as follows: Given a collection of intervals on a time-line, and a bound k, does the collection contain a subset of non-overlapping intervals of size at least k?

Question: Is it the case that Interval Scheduling ≤P Vertex Cover?

(b) Question: Is it the case that Independent Set ≤P Interval Scheduling?

Q2. A store trying to analyze the behavior of its customers will often maintain a two-dimensional array A, where the rows correspond to its customers and the columns correspond to the products it sells. The entry A[i, j] specifies the quantity of product j that has been purchased by customer i.

Here's a tiny example of such an array A.

 

liquid detergent

beer

diapers

cat litter

Raj

0

6

0

3

Alanis

2

3

0

0

Chelsea

0

0

0

7

One thing that a store might want to do with this data is the following. Let us say that a subset S of the customers is diverse if no two of the of the customers in S have ever bought the same product (i.e., for each product, at most one of the customers in S has ever bought it). A diverse set of customers can be useful, for example, as a target pool for market research.

We can now define the Diverse Subset Problem as follows: Given an m × n array A as defined above, and a number k ≤ m, is there a subset of at least k of customers that is diverse?

Show that Diverse Subset is NP-complete.

Q3. Suppose you're helping to organize a summer sports camp, and the following problem comes up. The camp is supposed to have at least one counselor who's skilled at each of the n sports covered by the camp (baseball, volleyball, and so on). They have received job applications from m potential counselors. For each of the n sports, there is some subset of the m applicants qualified in that sport. The question is: For a given number k < m, is it possible to hire at most k of the counselors and have at least one counselor qualified in each of the n sports? We'll call this the Efficient Recruiting Problem.

Show that Efficient Recruiting is NP-complete.

Q4. Suppose you're consulting for a group that manages a high-performance real-time system in which asynchronous processes make use of shared resources. Thus the system has a set of n processes and a set of m resources. At any given point in time, each process specifies a set of resources that it requests to use. Each resource might be requested by many processes at once; but it can only be used by a single process at a time. Your job is to allocate resources to processes that request them. If a process is allocated all the resources it requests, then it is active; otherwise it is blocked. You want to perform the allocation so that as many processes as possible are active. Thus we phrase the Resource Reservation

Problem as follows: Given a set of processes and resources, the set of requested resources for each process, and a number k, is it possible to allocate resources to processes so that at least k processes will be active?

Consider the following list of problems, and for each problem either give a polynomial-time algorithm or prove that the problem is NP-complete.

(a) The general Resource Reservation Problem defined above.

(b) The special case of the problem when k = 2.

(c) The special case of the problem when there are two types of resources-say, people and equipment-and each process requires at most one resource of each type (In other words, each process requires one specific person and one specific piece of equipment.)

(d) The special case of the problem when each resource is requested by at most two processes.

Q5. Consider a set A = {a1,..., an} and a collection B1, B2,..., Bm of subsets of A (i.e., Bi ⊆ A for each i).

We say that a set H ⊆ A is a hitting set for the collection B1, B2,..., Bm if H contains at least one element from each Bi -that is, if H ∩ Bi is not empty for each i (so H "hits" all the sets Bi).

We now define the Hitting Set Problem as follows. We are given a set A ={a1,..., an}, a collection B1, B2,..., Bm of subsets of A, and a number k. We are asked: Is there a hitting set H ⊆ A for B1, B2,..., Bm so that the size of H is at most k?

Prove that Hitting Set is NP-complete.

Q6. Consider an instance of the Satisfiability Problem, specified by clauses C1,..., Ck over a set of Boolean variables x1,..., xn. We say that the instance is monotone if each term in each clause consists of a non-negated variable; that is, each term is equal to xi, for some i, rather than xi. Monotone instances of Satisfiability are very easy to solve: They are always satisfiable, by setting each variable equal to 1.

For example, suppose we have the three clauses

(x1 ∨ x2), (x1 ∨ x3), (x2 ∨ x3).

This is monotone, and indeed the assignment that sets all three variables to 1 satisfies all the clauses. But we can observe that this is not the only satisfying assignment; we could also have set x1 and x2 to 1, and x3 to 0. Indeed, for any monotone instance, it is natural to ask how few variables we need to set to 1 in order to satisfy it.

Given a monotone instance of Satisfiability, together with a number k, the problem of Monotone Satisfiability with Few True Variables asks: Is there a satisfying assignment for the instance in which atmost k variables are set to 1? Prove this problem is NP-complete.

Q7. Since the 3-Dimensional Matching Problem is NP-complete, it is natural to expect that the corresponding 4-Dimensional Matching Problem is at least as hard. Let us define 4-Dimensional Matching as follows. Given sets W, X, Y, and Z, each of size n, and a collection C of ordered 4-tuples of the form (wi, xj, yk, zl), do there exist n 4-tuples from C so that no two have an element in common?

Prove that 4-Dimensional Matching is NP-complete.

Q8. Your friends' preschool-age daughter Madison has recently learned to spell some simple words. To help encourage this, her parents got her a colorful set of refrigerator magnets featuring the letters of the alphabet (some number of copies of the letter A, some number of copies of the letter B, and so on), and the last time you saw her the two of you spent a while arranging the magnets to spell out words that she knows.

Somehow with you and Madison, things always end up getting more elaborate than originally planned, and soon the two of you were trying to spell out words so as to use up all the magnets in the full set-that is, picking words that she knows how to spell, so that once they were all spelled out, each magnet was participating in the spelling of exactly one of the words. (Multiple copies of words are okay here; so for example, if the set of refrigerator magnets includes two copies each of C, A, and T, it would be okay to spell out CAT twice.)

This turned out to be pretty difficult, and it was only later that you realized a plausible reason for this. Suppose we consider a general version of the problem of Using Up All the Refrigerator Magnets, where we replace the English alphabet by an arbitrary collection of symbols, and we model Madison's vocabulary as an arbitrary set of strings over this collection of symbols. The goal is the same as in the previous paragraph.

Prove that the problem of Using Up All the Refrigerator Magnets is NP-complete.

Q9. Consider the following problem. You are managing a communication network, modeled by a directed graph G = (V, E). There are c users who are interested in making use of this network. User i (for each i = 1,2,..., c) issues a request to reserve a specific path Pi in G on which to transmit data.

You are interested in accepting as many of these path requests as possible, subject to the following restriction: if you accept both Pi and Pj, then Pi and Pj cannot share any nodes.

Thus, the Path Selection Problem asks: Given a directed graph G = (V, E), a set of requests P1, P2,..., Pc-each of which must be a path in G-and a number k, is it possible to select at least k of the paths so that no two of the selected paths share any nodes?

Prove that Path Selection is NP-complete.

Q10. Your friends at Web Exodus have recently been doing some consulting work for companies that maintain large, publicly accessible Web sites-contractual issues prevent them from saying which ones-and they've come across the following Strategic Advertising Problem.

A company comes to them with the map of a Web site, which we'll model as a directed graph G = (V, E). The company also provides a set of t trails typically followed by users of the site; we'll model these trails as directed paths P1, P2,..., Pt in the graph G (i.e., each Pi is a path in G).

The company wants Web Exodus to answer the following question for them: Given G, the paths {Pi}, and a number k, is it possible to place advertisements on at most k of the nodes in G, so that each path Pi includes at least one node containing an advertisement? We'll call this the Strategic Advertising Problem, with input G, {Pi: i = 1,..., t}, and k.

Your friends figure that a good algorithm for this will make them all rich; unfortunately, things are never quite this simple.

(a) Prove that Strategic Advertising is NP-complete.

(b) Your friends at Web Exodus forge ahead and write a pretty fast algorithm S that produces yes/no answers to arbitrary instances of the Strategic Advertising Problem. You may assume that the algorithm S is always correct.

Using the algorithm S as a black box, design an algorithm that takes input G, {Pi}, and k as in part (a), and does one of the following two things:

- Outputs a set of at most k nodes in G so that each path Pi includes at least one of these nodes, or

- Outputs (correctly) that no such set of at most k nodes exists.

Your algorithm should use at most a polynomial number of steps, together with at most a polynomial number of calls to the algorithm S.

Q11. As some people remember, and many have been told, the idea of hyper-text predates the World Wide Web by decades. Even hypertext fiction is a relatively old idea: Rather than being constrained by the linearity of the printed page, you can plot a story that consists of a collection of interlocked virtual "places" joined by virtual "passages."4 So a piece of hypertext fiction is really riding on an underlying directed graph; to be concrete (though narrowing the full range of what the domain can do), we'll model this as follows.

Let's view the structure of a piece of hypertext fiction as a directed graph G = (V, E). Each node u ∈ V contains some text; when the reader is currently at u, he or she can choose to follow any edge out of u; and if the reader chooses e = (u, v), he or she arrives next at the node v. There is a start node s ∈ V where the reader begins, and an end node t ∈ V; when the reader first reaches t, the story ends. Thus any path from s to t is a valid plot of the story. Note that, unlike one's experience using a Web browser, there is not necessarily a way to go back; once you've gone from u to v, you might not be able to ever return to u.

In this way, the hypertext structure defines a huge number of different plots on the same underlying content; and the relationships among all these possibilities can grow very intricate. Here's a type of problem one encounters when reasoning about a structure like this. Consider a piece of hypertext fiction built on a graph G = (V, E) in which there are certain crucial thematic elements: love, death, war, an intense desire to major in computer science, and so forth. Each thematic element i is represented by a set Ti ⊆ V consisting of the nodes in G at which this theme appears. Now, given a particular set of thematic elements, we may ask: Is there a valid plot of the story in which each of these elements is encountered? More concretely, given a directed graph G, with start node s and end node t, and thematic elements represented by sets T1, T2,..., Tk, the Plot Fulfillment Problem asks: Is there a path from s to t that contains at least one node from each of the sets Ti?

Prove that Plot Fulfillment is NP-complete.

Q12. Some friends of yours maintain a popular news and discussion site on the Web, and the traffic has reached a level where they want to begin differentiating their visitors into paying and nonpaying customers. A standard way to do this is to make all the content on the site available to customers who pay a monthly subscription fee; meanwhile, visitors who don't subscribe can still view a subset of the pages (all the while being bombarded with ads asking them to become subscribers).

Here are two simple ways to control access for nonsubscribers: You could (1) designate a fixed subset of pages as viewable by nonsubscribers, or (2) allow any page in principle to be viewable, but specify a maximum number of pages that can be viewed by a nonsubscriber in a single session. (We'll assume the site is able to track the path followed by a visitor through the site.)

Your friends are experimenting with a way of restricting access that is different from and more subtle than either of these two options. They want nonsubscribers to be able to sample different sections of the Web site, so they designate certain subsets of the pages as constituting particular zones-for example, there can be a zone for pages on politics, a zone for pages on music, and so forth. It's possible for a page to belong to more than one zone. Now, as a non-subscribing user passes through the site, the access policy allows him or her to visit one page from each zone, but an attempt by the user to access a second page from the same zone later in the browsing session will be disallowed. (Instead, the user will be directed to an ad suggesting that he or she become a subscriber.)

More formally, we can model the site as a directed graph G = (V, E),in which the nodes represent Web pages and the edges represent directed hyperlinks. There is a distinguished entry node s ∈ V, and there are zones Z1,..., Zk ⊆ V. A path P taken by a nonsubscriber is restricted to include at most one node from each zone Zi. One issue with this more complicated access policy is that it gets difficult to answer even basic questions about reach ability, including: Is it possible for a nonsubscriber to visit a given node t? More precisely, we define the Evasive Path Problem as follows: Given G, Z1,..., Zk, s ∈ V, and a destination node t ∈ V, is there an s-t path in G that includes at most one node from each zone Zi? Prove that Evasive Path is NP-complete.

13. A combinatorial auction is a particular mechanism developed by economists for selling a collection of items to a collection of potential buyers. (The Federal Communications Commission has studied this type of auction for assigning stations on the radio spectrum to broadcasting companies.)

Here's a simple type of combinatorial auction. There are n items for sale, labeled I1,..., In. Each item is indivisible and can only be sold to one person. Now, m different people place bids: The ith bid specifies a subset Si of the items, and an offering price xi that the bidder is willing to pay for the items in the set Si, as a single unit. (We'll represent this bid as the pair (Si, xi).)

An auctioneer now looks at the set of all m bids; she chooses to accept some of these bids and to reject the others. Each person whose bid i is accepted gets to take all the items in the corresponding set Si.

Thus the rule is that no two accepted bids can specify sets that contain a common item, since this would involve giving the same item to two different people.

The auctioneer collects the sum of the offering prices of all accepted bids. (Note that this is a "one-shot" auction; there is no opportunity to place further bids.) The auctioneer's goal is to collect as much money as possible.

Thus, the problem of Winner Determination for Combinatorial Auctions asks: Given items I1,..., In, bids (S1, x1),..., (Sm, xm), and a bound B, is there a collection of bids that the auctioneer can accept so as to collect an amount of money that is at least B?

Q14. We've seen the Interval Scheduling Problem in Chapters 1 and 4. Here we consider a computationally much harder version of it that we'll call Multiple Interval Scheduling. As before, you have a processor that is available to run jobs over some period of time (e.g., 9 A.M. to 5 P.M).

People submit jobs to run on the processor; the processor can only work on one job at any single point in time. Jobs in this model, however, are more complicated than we've seen in the past: each job requires a set of intervals of time during which it needs to use the processor. Thus, for example, a single job could require the processor from 10 A.M. to 11 A.M., and again from 2 P.M. to 3 P.M.. If you accept this job, it ties up your processor during those two hours, but you could still accept jobs that need any other time periods (including the hours from 11 A.M. to 2 A.M.).

Now you're given a set of n jobs, each specified by a set of time intervals, and you want to answer the following question: For a given number k, is it possible to accept at least k of the jobs so that no two of the accepted jobs have any overlap in time?

Show that Multiple Interval Scheduling is NP-complete.

15. You're sitting at your desk one day when a FedEx package arrives for you. Inside is a cell phone that begins to ring, and you're not entirely surprised to discover that it's your friend Neo, whom you haven't heard from in quite a while. Conversations with Neo all seem to go the same way: He starts out with some big melodramatic justification for why he's calling, but in the end it always comes down to him trying to get you to volunteer your time to help with some problem he needs to solve.

This time, for reasons he can't go into (something having to do with protecting an underground city from killer robot probes), he and a few associates need to monitor radio signals at various points on the electromagnetic spectrum. Specifically, there are n different frequencies that need monitoring, and to do this they have available a collection of sensors.

There are two components to the monitoring problem.

A set L of m geographic locations at which sensors can be placed; and

A set S of b interference sources, each of which blocks certain frequencies at certain locations. Specifically, each interference source i is specified by a pair (Fi, Li), where Fi is a subset of the frequencies and Li is a subset of the locations; it signifies that (due to radio interference) a sensor placed at any location in the set Li will not be able to receive signals on any frequency in the set Fi.

We say that a subset L' ⊆ L of locations is sufficient if, for each of the n frequencies j, there is some location in L' where frequency j is not blocked by any interference source. Thus, by placing a sensor at each location in a sufficient set, you can successfully monitor each of the n frequencies.

They have k sensors, and hence they want to know whether there is a sufficient set of locations of size at most k. We'll call this an instance of the Nearby Electromagnetic Observation Problem: Given frequencies, locations, interference sources, and a parameter k, is there a sufficient set of size at most k?

Q16. Consider the problem of reasoning about the identity of a set from the size of its intersections with other sets. You are given a finite set U of size n, and a collection A1,..., Am of subsets of U. You are also given numbers c1,..., cm. The question is: Does there exist a set X ⊂ U so that for each i = 1,2,..., m, the cardinality of X ∩ Ai is equal to ci? We will call this an instance of the Intersection Inference Problem, with input U, {Ai}, and {ci}.

Prove that Intersection Inference is NP-complete.

Q17. You are given a directed graph G = (V, E) with weights we on its edges e ∈ E. The weights can be negative or positive. The Zero-Weight-Cycle Problem is to decide if there is a simple cycle in G so that the sum of the edge weights on this cycle is exactly 0. Prove that this problem is NP-complete.

Q18. You've been asked to help some organizational theorists analyze data on group decision-making. In particular, they've been looking at a dataset that consists of decisions made by a particular governmental policy committee, and they're trying to decide whether it's possible to identify a small set of influential members of the committee.

Here's how the committee works. It has a set M = {m1,..., mn} of n members, and over the past year it's voted on t different issues. On each issue, each member can vote either "Yes," "No," or "Abstain"; the overall effect is that the committee presents an affirmative decision on the issue if the number of "Yes" votes is strictly greater than the number of "No" votes (the "Abstain" votes don't count for either side), and it delivers a negative decision otherwise.

Now we have a big table consisting of the vote cast by each committee member on each issue, and we'd like to consider the following definition. We say that a subset of the members M' ⊆ M is decisive if, had we looked just at the votes cast by the members in M', the committee's decision on every issue would have been the same. (In other words, the overall outcome of the voting among the members in M' is the same on every issue as the overall outcome of the voting by the entire committee.) Such a subset can be viewed as a kind of "inner circle" that reflects the behavior of the committee as a whole.

Here's the question: Given the votes cast by each member on each issue, and given a parameter k, we want to know whether there is a decisive subset consisting of at most k members. We'll call this an instance of the Decisive Subset Problem.

Q19. Suppose you're acting as a consultant for the port authority of a small Pacific Rim nation. They're currently doing a multibillion-dollar business per year, and their revenue is constrained almost entirely by the rate at which they can unload ships that arrive in the port.

Handling hazardous materials adds additional complexity to what is, for them, an already complicated task. Suppose a convoy of ships arrives in the morning and delivers a total of n cannisters, each containing a different kind of hazardous material. Standing on the dock is a set of m trucks, each of which can hold up to k containers.

Here are two related problems, which arise from different types of constraints that might be placed on the handling of hazardous materials.  For each of the two problems, give one of the following two answers:

  • A polynomial-time algorithm to solve it; or
  • A proof that it is NP-complete.

(a) For each cannister, there is a specified subset of the trucks in which it may be safely carried. Is there a way to load all n cannisters into the m trucks so that no truck is overloaded, and each container goes in a truck that is allowed to carry it?

(b) In this different version of the problem, any cannister can be placed in any truck; however, there are certain pairs of cannisters that cannot be placed together in the same truck. (The chemicals they contain may react explosively if brought into contact.) Is there a way to load all n cannisters into the m trucks so that no truck is overloaded, and no two cannisters are placed in the same truck when they are not supposed to be?

Q20. There are many different ways to formalize the intuitive problem of clustering, where the goal is to divide up a collection of objects into groups that are "similar" to one another.

First, a natural way to express the input to a clustering problem is via a set of objects p1, p2,..., pn, with a numerical distance d(pi, pj) defined on each pair. (We require only that d(pi, pi) = 0; that d(pi, pj)> 0 for distinct pi and pj; and that distances are symmetric: d(pi, pj) = d(pj, pi).)

In Section 4.7, earlier in the book, we considered one reasonable formulation of the clustering problem: Divide the objects into k sets so as to maximize the minimum distance between any pair of objects in distinct clusters. This turns out to be solvable by a nice application of the Minimum Spanning Tree Problem.

A different but seemingly related way to formalize the clustering problem would be as follows: Divide the objects into k sets so as to minimize the maximum distance between any pair of objects in the same cluster. Note the change. Where the formulation in the previous paragraph sought clusters so that no two were "close together," this new formulation seeks clusters so that none of them is too "wide"-that is, no cluster contains two points at a large distance from each other.

Given the similarities, it's perhaps surprising that this new formulation is computationally hard to solve optimally. To be able to think about this in terms of NP-completeness, let's write it first as a yes/no decision problem. Given n objects p1, p2,..., pn with distances on them as above, and a bound B, we define the Low-Diameter Clustering Problem as follows: Can the objects be partitioned into k sets, so that no two points in the same set are at a distance greater than B from each other?

Prove that Low-Diameter Clustering is NP-complete.

21. After a few too many days immersed in the popular entrepreneurial self-help book Mine Your Own Business, you've come to the realization that you need to upgrade your office computing system. This, however, leads to some tricky problems.

In configuring your new system, there are k components that must be selected: the operating system, the text editing software, the e-mail program, and so forth; each is a separate component. For the jth  component of the system, you have a set Aj of options; and a configuration of the system consists of a selection of one element from each of the sets of options A1, A2,..., Ak.

Now the trouble arises because certain pairs of options from different sets may not be compatible. We say that option xi ∈ Ai and option xj ∈ Aj form an incompatible pair if a single system cannot contain them both. (For example, Linux (as an option for the operating system) and Microsoft Word (as an option for the text-editing software) form an incompatible pair.) We say that a configuration of the system is fully compatible if it consists of elements x1 ∈ A1, x2 ∈ A2,... xk ∈ Ak such that none of the pairs (xi, xj) is an incompatible pair.

We can now define the Fully Compatible Configuration (FCC) Problem. An instance of FCC consists of disjoint sets of options A1, A2,..., Ak, and a set P of incompatible pairs (x, y), where x and y are elements of different sets of options. The problem is to decide whether there exists a fully compatible configuration: a selection of an element from each option set so that no pair of selected elements belongs to the set P.

Q22. Suppose that someone gives you a black-box algorithm A that takes an undirected graph G = (V, E), and a number k, and behaves as follows.

  • If G is not connected, it simply returns "G is not connected."
  • If G is connected and has an independent set of size at least k, it returns "yes."
  • If G is connected and does not have an independent set of size at least k, it returns "no."

Suppose that the algorithm A runs in time polynomial in the size of G and k.

Show how, using calls to A, you could then solve the Independent Set Problem in polynomial time: Given an arbitrary undirected graph G, and a number k, does G contain an independent set of size at least k?

Q23. Given a set of finite binary strings S = {s1,...,sk}, we say that a string u is a concatenation over S if it is equal to si1 si2 ... sit for some indices i1,..., it ∈{1,..., k}.

A friend of yours is considering the following problem: Given two sets of finite binary strings, A = {a1,..., am} and B = {b1,..., bn}, does there exist any string u so that u is both a concatenation over A and a concatenation over B?

Your friend announces, "At least the problem is in NP, since I would just have to exhibit such a string u in order to prove the answer is yes." You point out (politely, of course) that this is a completely inadequate explanation; how do we know that the shortest such string u doesn't have length exponential in the size of the input, in which case it would not be a polynomial-size certificate?

However, it turns out that this claim can be turned into a proof of membership in NP. Specifically, prove the following statement.

If there is a string u that is a concatenation over both A and B, then there is such a string whose length is bounded by a polynomial in the sum of the lengths of the strings in A ∪ B.

Q24. Let G = (V, E) be a bipartite graph; suppose its nodes are partitioned into sets X and Y so that each edge has one end in X and the other in Y. We define an (a, b)-skeleton of G to be a set of edges E' ⊆ E so that at most a nodes in X are incident to an edge in E', and at least b nodes in Y are incident to an edge in E'.

Show that, given a bipartite graph G and numbers a and b, it is NP-complete to decide whether G has an (a, b)-skeleton.

Q25. For functions g1,..., gl, we define the function max(g1,..., gl) via [max(g1,..., gl)] (x) = max(g1(x),..., gl(x)).

Consider the following problem. You are given n piecewise linear, continuous functions f1,..., fn defined over the interval [0, t] for some integer t. You are also given an integer B. You want to decide: Do there exist k of the functions fi1,..., fik so that

0t [max(fi1,..., fik)] (x) dx ≥ B?

Prove that this problem is NP-complete.

Q26. You and a friend have been trekking through various far-off parts of the world and have accumulated a big pile of souvenirs. At the time you weren't really thinking about which of these you were planning to keep and which your friend was going to keep, but now the time has come to divide everything up.

Here's a way you could go about doing this. Suppose there are n objects, labeled 1,2,..., n, and object i has an agreed-upon value xi. (We could think of this, for example, as a monetary resale value; the case in which you and your friend don't agree on the value is something we won't pursue here.) One reasonable way to divide things would be to look for a partition of the objects into two sets, so that the total value of the objects in each set is the same.

This suggests solving the following Number Partitioning Problem. You are given positive integers x1,..., xn; you want to decide whether the numbers can be partitioned into two sets S1 and S2 with the same sum:

x_iS_1 xi = ∑x_jS_2xj.

Show that Number Partitioning is NP-complete.

Q27. Consider the following problem. You are given positive integers x1,..., xn, and numbers k and B. You want to know whether it is possible to partition the numbers {xi} into k sets S1,..., Sk so that the squared sums of the sets add up to at most B:

i=1k(∑x_jS_i xj)2 ≤ B.

Show that this problem is NP-complete.

Q28. The following is a version of the Independent Set Problem. You are given a graph G = (V, E) and an integer k. For this problem, we will call a set I ⊂ V strongly independent if, for any two nodes v, u ∈ I, the edge (v, u) does not belong to E, and there is also no path of two edges from u to v, that is, there is no node w such that both (u, w) ∈ E and (w, v) ∈ E. The Strongly Independent Set Problem is to decide whether G has a strongly independent set of size at least k.

Prove that the Strongly Independent Set Problem is NP-complete.

Q29. You're configuring a large network of workstations, which we'll model as an undirected graph G; the nodes of G represent individual workstations and the edges represent direct communication links. The workstations all need access to a common core database, which contains data necessary for basic operating system functions.

You could replicate this database on each workstation; this would make lookups very fast from any workstation, but you'd have to manage a huge number of copies. Alternately, you could keep a single copy of the database on one workstation and have the remaining workstations issue requests for data over the network G; but this could result in large delays for a workstation that's many hops away from the site of the database.

So you decide to look for the following compromise: You want to maintain a small number of copies, but place them so that any workstation either has a copy of the database or is connected by a direct link to a workstation that has a copy of the database. In graph terminology, such a set of locations is called a dominating set. Thus we phrase the Dominating Set Problem as follows. Given the network G, and a number k, is there a way to place k copies of the database at k different nodes so that every node either has a copy of the database or is connected by a direct link to a node that has a copy of the database?

Show that Dominating Set is NP-complete.

Q30. One thing that's not always apparent when thinking about traditional "continuous math" problems is the way discrete, combinatorial issues of the kind we're studying here can creep into what look like standard calculus questions.

Consider, for example, the traditional problem of minimizing a one-variable function like f (x) = 3+ x - 3x2 over an interval like x ∈ [0, 1]. The derivative has a zero at x = 1/6, but this in fact is a maximum of the function, not a minimum; to get the minimum, one has to heed the standard warning to check the values on the boundary of the interval as well. (The minimum is in fact achieved on the boundary, at x = 1.)

Checking the boundary isn't such a problem when you have a function in one variable; but suppose we're now dealing with the problem of minimizing a function in n variables x1, x2,..., xn over the unit cube, where each of x1, x2,..., xn ∈ [0, 1]. The minimum may be achieved on the interior of the cube, but it may be achieved on the boundary; and this latter prospect is rather daunting, since the boundary consists of 2n "corners" (where each xi is equal to either 0 or 1) as well as various pieces of other dimensions. Calculus books tend to get suspiciously vague around here, when trying to describe how to handle multivariable minimization problems in the face of this complexity.

It turns out there's a reason for this: Minimizing an n-variable function over the unit cube in n dimensions is as hard as an NP-complete problem. To make this concrete, let's consider the special case of polynomials with integer coefficients over n variables x1, x2,..., xn. To review some terminology, we say a monomial is a product of a real-number coefficient c and each variable xi raised to some nonnegative integer power ai; we can write this as cx1a1x2a2... xnan. (For example, 2x21x2x43 is a monomial.)

A polynomial is then a sum of a finite set of monomials. (For example, 2x21x2x43 + x1x3 - 6x22x23 is a polynomial.)

We define the Multivariable Polynomial Minimization Problem as follows: Given a polynomial in n variables with integer coefficients, and given an integer bound B, is there a choice of real numbers x1, x2,..., xn ∈ [0, 1] that causes the polynomial to achieve a value that is ≤ B?

Choose a problem Y from this chapter that is known to be NP-complete and show that Y ≤ P Multivariable Polynomial Minimization.

Q31. Given an undirected graph G = (V, E), a feedback set is a set X ⊆ V with the property that G - X has no cycles. The Undirected Feedback Set Problem asks: Given G and k, does G contain a feedback set of size at most k? Prove that Undirected Feedback Set is NP-complete.

Q32. The mapping of genomes involves a large array of difficult computational problems. At the most basic level, each of an organism's chromosomes can be viewed as an extremely long string (generally containing millions of symbols) over the four-letter alphabet {A, C, G, T}. One family of approaches to genome mapping is to generate a large number of short, overlapping snippets from a chromosome, and then to infer the full long string representing the chromosome from this set of overlapping sub-strings.

While we won't go into these string assembly problems in full detail, here's a simplified problem that suggests some of the computational difficulty one encounters in this area. Suppose we have a set S ={s1, s2,..., sn} of short DNA strings over a q-letter alphabet; and each string si has length 2l, for some number l ≥ 1. We also have a library of additional strings T = {t1, t2,..., tm} over the same alphabet; each of these also has length 2l. In trying to assess whether the string sb might come directly after the string sa in the chromosome, we will look to see whether the library T contains a string tk so that the first l symbols in tk are equal to the last l symbols in sa, and the last l symbols in tk are equal to the first l symbols in sb. If this is possible, we will say that tk corroborates the pair (sa, sb). (In other words, tk could be a snippet of DNA that straddled the region in which sb directly followed sa.)

Now we'd like to concatenate all the strings in S in some order, one after the other with no overlaps, so that each consecutive pair is corroborated by some string in the library T. That is, we'd like to order the strings in S as si1, si2,..., sin, where i1, i2,..., in is a permutation of {1,2,..., n}, so that for each j = 1,2,..., n - 1, there is a string tk that corroborates the pair (sij, sij+1). (The same string tk can be used for more than one consecutive pair in the concatenation.) If this is possible, we will say that the set S has a perfect assembly.

Given sets S and T, the Perfect Assembly Problem asks: Does S have a perfect assembly with respect to T? Prove that Perfect Assembly is NP-complete.

Q33. In a barter economy, people trade goods and services directly, without money as an intermediate step in the process. Trades happen when each party views the set of goods they're getting as more valuable than the set of goods they're giving in return. Historically, societies tend to move from barter-based to money-based economies; thus various online systems that have been experimenting with barter can be viewed as intentional attempts to regress to this earlier form of economic interaction. In doing this, they've rediscovered some of the inherent difficulties with barter relative to money-based systems. One such difficulty is the complexity of identifying opportunities for trading, even when these opportunities exist.

To model this complexity, we need a notion that each person assigns a value to each object in the world, indicating how much this object would be worth to them. Thus we assume there is a set of n people p1,..., pn, and a set of m distinct objects a1,..., am. Each object is owned by one of the people. Now each person pi has a valuation function vi, defined so that vi (aj) is a nonnegative number that specifies how much object aj is worth to pi -the larger the number, the more valuable the object is to the person. Note that everyone assigns a valuation to each object, including the ones they don't currently possess, and different people can assign very different valuations to the same object.

A two-person trade is possible in a system like this when there are people pi and pj, and subsets of objects Ai and Aj possessed by pi and pj, respectively, so that each person would prefer the objects in the subset they don't currently have. More precisely,

  • pi's total valuation for the objects in Aj exceeds his or her total valuation for the objects in Ai, and
  • pj's total valuation for the objects in Ai exceeds his or her total valuation for the objects in Aj.

(Note that Ai doesn't have to be all the objects possessed by pi (and likewise for Aj); Ai and Aj can be arbitrary subsets of their possessions that meet these criteria.)

Suppose you are given an instance of a barter economy, specified by the above data on people's valuations for objects. (To prevent problems with representing real numbers, we'll assume that each person's valuation for each object is a natural number.) Prove that the problem of determining whether a two-person trade is possible is NP-complete.

Q34. In the 1970s, researchers including Mark Granovetter and Thomas Schelling in the mathematical social sciences began trying to develop models of certain kinds of collective human behaviors: Why do particular fads catch on while others die out? Why do particular technological innovations achieve widespread adoption, while others remain focused on a small group of users? What are the dynamics by which rioting and looting behavior sometimes (but only rarely) emerges from a crowd of angry people? They proposed that these are all examples of cascade processes, in which an individual's behavior is highly influenced by the behaviors of his or her friends, and so if a few individuals instigate the process, it can spread to more and more people and eventually have a very wide impact. We can think of this process as being like the spread of an illness, or a rumor, jumping from person to person.

The most basic version of their models is the following. There is some underlying behavior (e.g., playing ice hockey, owning a cell phone, taking part in a riot), and at any point in time each person is either an adopter of the behavior or a non adopter. We represent the population by a directed graph G = (V, E) in which the nodes correspond to people and there is an edge (v, w) if person v has influence over the behavior of person w: If person v adopts the behavior, then this helps induce person w to adopt it as well. Each person w also has a given threshold θ(w) ∈ [0, 1] , and this has the following meaning: At any time when at least a θ(w) fraction of the nodes with edges to w are adopters of the behavior, the node w will become an adopter as well.

Note that nodes with lower thresholds are more easily convinced to adopt the behavior, while nodes with higher thresholds are more conservative. A node w with threshold θ(w) = 0 will adopt the behavior immediately, with no inducement from friends. Finally, we need a convention about nodes with no incoming edges: We will say that they become adopters if θ(w) = 0, and cannot become adopters if they have any larger threshold.

Given an instance of this model, we can simulate the spread of the behavior as follows.

Initially, set all nodes w with θ(w) = 0 to be adopters

(All other nodes start out as non-adopters)

Until there is no change in the set of adopters:

For each non-adopter w simultaneously:

If at least a θ(w) fraction of nodes with edges to w are adopters then

w becomes an adopter

Endif

Endfor

End

Output the final set of adopters

Note that this process terminates, since there are only n individuals total, and at least one new person becomes an adopter in each iteration.

Now, in the last few years, researchers in marketing and data mining have looked at how a model like this could be used to investigate "word-of-mouth" effects in the success of new products (the so-called viral marketing phenomenon). The idea here is that the behavior we're concerned with is the use of a new product; we may be able to convince a few key people in the population to try out this product, and hope to trigger as large a cascade as possible.

Concretely, suppose we choose a set of nodes S ⊆ V and we reset the threshold of each node in S to 0. (By convincing them to try the product, we've ensured that they're adopters.) We then run the process described above, and see how large the final set of adopters is. Let's denote the size of this final set of adopters by f (S) (note that we write it as a function of S, since it naturally depends on our choice of S). We could think of f (S) as the influence of the set S, since it captures how widely the behavior spreads when "seeded" at S.

The goal, if we're marketing a product, is to find a small set S whose influence f (S) is as large as possible. We thus define the Influence Maximization Problem as follows: Given a directed graph G = (V, E), with a threshold value at each node, and parameters k and b, is there a set S of at most k nodes for which f (S) ≥ b?

Prove that Influence Maximization is NP-complete.

Q35. Three of your friends work for a large computer-game company, and they've been working hard for several months now to get their proposal for a new game, Droid Trader! , approved by higher management. In the process, they've had to endure all sorts of discouraging comments, ranging from "You're really going to have to work with Marketing on the name" to "Why don't you emphasize the parts where people get to kick each other in the head?"

At this point, though, it's all but certain that the game is really heading into production, and your friends come to you with one final issue that's been worrying them: What if the overall premise of the game is too simple, so that players get really good at it and become bored too quickly?

It takes you awhile, listening to their detailed description of the game, to figure out what's going on; but once you strip away the space battles, kick-boxing interludes, and Stars-Wars-inspired pseudo-mysticism, the basic idea is as follows. A player in the game controls a spaceship and is trying to make money buying and selling droids on different planets.

There are n different types of droids and k different planets. Each planet p has the following properties: there are s(j , p) ≥ 0 droids of type j available for sale, at a fixed price of x(j , p) ≥ 0 each, for j = 1,2,..., n; and there is a demand for d(j , p) ≥ 0 droids of type j, at a fixed price of y(j , p) ≥ 0 each. (We will assume that a planet does not simultaneously have both a positive supply and a positive demand for a single type of droid; so for each j, at least one of s(j , p) or d(j , p) is equal to 0.)

The player begins on planet s with z units of money and must end at planet t; there is a directed acyclic graph G on the set of planets, such that s-t paths in G correspond to valid routes by the player. (G is chosen to be acyclic to prevent arbitrarily long games.) For a given s-t path P in G, the player can engage in transactions as follows. Whenever the player arrives at a planet p on the path P, she can buy up to s(j , p) droids of type j for x(j , p) units of money each (provided she has sufficient money on hand) and/or sell up to d(j , p) droids of type j for y(j , p) units of money each (for j = 1,2,..., n). The player's final score is the total amount of money she has on hand when she arrives at planet t. (There are also bonus points based on space battles and kick-boxing, which we'll ignore for the purposes of formulating this question.)

So basically, the underlying problem is to achieve a high score. In other words, given an instance of this game, with a directed acyclic graph G on a set of planets, all the other parameters described above, and also a target bound B, is there a path P in G and a sequence of transactions on P so that the player ends with a final score that is at least B? We'll call this an instance of the High-Score-on-Droid-Trader! Problem, or HSoDT! for short.

Prove that HSoDT! is NP-complete, thereby guaranteeing (assuming P = NP) that there isn't a simple strategy for racking up high scores on your friends' game.

Q36. Sometimes you can know people for years and never really understand them. Take your friends Raj and Alanis, for example. Neither of them is a morning person, but now they're getting up at 6 AM every day to visit local farmers' markets, gathering fresh fruits and vegetables for the new health-food restaurant they've opened, Chez Alanisse.

In the course of trying to save money on ingredients, they've come across the following thorny problem. There is a large set of n possible raw ingredients they could buy, I1, I2,..., In (e.g., bundles of dandelion greens, jugs of rice vinegar, and so forth). Ingredient Ij must be purchased in units of size s(j) grams (any purchase must be for a whole number of units), and it costs c(j) dollars per unit. Also, it remains safe to use for t(j) days from the date of purchase.

Now, over the next k days, they want to make a set of k different daily specials, one each day. (The order in which they schedule the specials is up to them.) The ith daily special uses a subset Si ⊆ {I1, I2,..., In} of the raw ingredients. Specifically, it requires a(i, j) grams of ingredient Ij. And there's a final constraint: The restaurant's rabidly loyal customer base only remains rabidly loyal if they're being served the freshest meals available; so for each daily special, the ingredients Si are partitioned into two subsets: those that must be purchased on the very day when the daily special is being offered, and those that can be used any day while they're still safe. (For example, the mesclun-basil salad special needs to be made with basil that has been purchased that day; but the arugula-basil pesto with Cornell dairy goat cheese special can use basil that is several days old, as long as it is still safe.)

This is where the opportunity to save money on ingredients comes up. Often, when they buy a unit of a certain ingredient Ij, they don't need the whole thing for the special they're making that day. Thus, if they can follow up quickly with another special that uses Ij but doesn't require it to be fresh that day, then they can save money by not having to purchase Ij again. Of course, scheduling the basil recipes close together may make it harder to schedule the goat cheese recipes close together, and so forth- this is where the complexity comes in.

So we define the Daily Special Scheduling Problem as follows: Given data on ingredients and recipes as above, and a budget x, is there a way to schedule the k daily specials so that the total money spent on ingredients over the course of all k days is at most x?

Prove that Daily Special Scheduling is NP-complete.

Q37. There are those who insist that the initial working title for Episode XXVII of the StarWars series was "P = NP"-but this is surely apocryphal. In any case, if you're so inclined, it's easy to find NP-complete problems lurking just below the surface of the original Star Wars movies.

Consider the problem faced by Luke, Leia, and friends as they tried to make their way from the Death Star back to the hidden Rebel base. We can view the galaxy as an undirected graph G = (V, E), where each node is a star system and an edge (u, v) indicates that one can travel directly from u to v. The Death Star is represented by a node s, the hidden Rebel base by a node t. Certain edges in this graph represent longer distances than others; thus each edge e has an integer length le ≥ 0. Also, certain edges represent routes that are more heavily patrolled by evil Imperial spacecraft; so each edge e also has an integer risk re ≥ 0, indicating the expected amount of damage incurred from special-effects-intensive space battles if one traverses this edge.

It would be safest to travel through the outer rim of the galaxy, from one quiet upstate star system to another; but then one's ship would run out of fuel long before getting to its destination. Alternately, it would be quickest to plunge through the cosmopolitan core of the galaxy; but then there would be far too many Imperial spacecraft to deal with. In general, for any path P from s to t, we can define its total length to be the sum of the lengths of all its edges; and we can define its total risk to be the sum of the risks of all its edges.

So Luke, Leia, and company are looking at a complex type of shortest-path problem in this graph: they need to get from s to t along a path whose total length and total risk are both reasonably small. In concrete terms, we can phrase the Galactic Shortest-Path Problem as follows: Given a setup as above, and integer bounds L and R, is there a path from s to t whose total length is at most L, and whose total risk is at most R?

Prove that Galactic Shortest Path is NP-complete.

38. Consider the following version of the Steiner Tree Problem, which we'll refer to as Graphical Steiner Tree. You are given an undirected graph G = (V, E), a set X ⊆ V of vertices, and a number k. You want to decide whether there is a set F ⊆ E of at most k edges so that in the graph (V, F), X belongs to a single connected component.

Show that Graphical Steiner Tree is NP-complete.

Q39. The Directed Disjoint Paths Problem is defined as follows. We are given a directed graph G and k pairs of nodes (s1, t1), (s2, t2),..., (sk, tk). The problem is to decide whether there exist node-disjoint paths P1, P2,..., Pk so that Pi goes from si to ti.

Show that Directed Disjoint Paths is NP-complete.

Q40. Consider the following problem that arises in the design of broadcasting schemes for networks. We are given a directed graph G = (V, E), with a designated node r ∈ V and a designated set of "target nodes" T ⊆ V -{r}. Each node v has a switching time sv, which is a positive integer.

At time 0, the node r generates a message that it would like every node in T to receive. To accomplish this, we want to find a scheme whereby r tells some of its neighbors (in sequence), who in turn tell some of their neighbors, and so on, until every node in T has received the message. More formally, a broadcast scheme is defined as follows. Node r may send a copy of the message to one of its neighbors at time 0; this neighbor will receive the message at time 1. In general, at time t ≥ 0, any node v that has already received the message may send a copy of the message to one of its neighbors, provided it has not sent a copy of the message in any of the time steps t - sv + 1, t - sv + 2,..., t - 1. (This reflects the role of the switching time; v needs a pause of sv - 1 steps between successive sending's of the message. Note that if sv = 1, then no restriction is imposed by this.)

The completion time of the broadcast scheme is the minimum time t by which all nodes in T have received the message. The Broadcast Time Problem is the following: Given the input described above, and a bound b, is there a broadcast scheme with completion time at most b?

Prove that Broadcast Time is NP-complete.

Q41. Given a directed graph G, a cycle cover is a set of node-disjoint cycles so that each node of G belongs to a cycle. The Cycle Cover Problem asks whether a given directed graph has a cycle cover.

(a) Show that the Cycle Cover Problem can be solved in polynomial time. (Hint: Use Bipartite Matching.)

(b) Suppose we require each cycle to have at most three edges. Show that determining whether a graph G has such a cycle cover is NP-complete.

Q42. Suppose you're consulting for a company in northern New Jersey that designs communication networks, and they come to you with the following problem. They're studying a specific n-node communication network, modeled as a directed graph G = (V, E). For reasons of fault tolerance, they want to divide up G into as many virtual "domains" as possible: A domain in G is a set X of nodes, of size at least 2, so that for each pair of nodes u, v ∈ X there are directed paths from u to v and v to u that are contained entirely in X.

Show that the following Domain Decomposition Problem is NP-complete. Given a directed graph G = (V, E) and a number k, can V be partitioned into at least k sets, each of which is a domain?

Request for Solution File

Ask an Expert for Answer!!
Data Structure & Algorithms: We can now define the diverse subset problem as follows
Reference No:- TGS01631191

Expected delivery within 24 Hours