# Python Algorithms - C4 Induction and Recursion and Reduction

## Hujiawei Bujidao

Python Algorithm Design Series (4), Chapter 4: Induction and Recursion and Reduction

> You must never think of the whole street at once, understand? You must only concentrate on the next step, the next breath, the next stroke of the broom, and the next, and the next. Nothing else.
>
> ——Beppo Roadsweeper, in Momo by Michael Ende

• Reduction means transforming one problem to another. We normally reduce an unknown problem to one we know how to solve. The reduction may involve transforming both the input (so it works with the new problem) and the output (so it’s valid for the original problem).

Reduction means transforming one problem into another, for example turning a problem we don't know how to solve into one we do; the transformation may involve converting both the input and the output. [Reduction comes up often when proving that a problem is NP-complete: if we can reduce a known NP-complete problem to our problem, then our problem is at least as hard, i.e., NP-hard as well.]

• Induction (or, mathematical induction) is used to show that a statement is true for a large class of objects (often the natural numbers). We do this by first showing it to be true for a base case (such as the number 1) and then showing that it “carries over” from one object to the next (if it’s true for n –1, then it’s true for n).

Induction here is mathematical induction, used to prove that some statement holds. First we show that it holds in a base case (say, k = 1), and then we show that it carries over from one case to the next (typically: assume it holds for k = n - 1 and prove that it then holds for k = n).

• Recursion is what happens when a function calls itself. Here we need to make sure the function works correctly for a (nonrecursive) base case and that it combines results from the recursive calls into a valid solution.

Recursion occurs when a function calls itself. A recursive function is easy to describe but not always easy to implement correctly: we have to make sure the base (non-recursive) case works, and that the results of the recursive calls are combined into a valid solution for the recursive case.

There are two major variations of reduction: reducing to a different problem, and reducing to a shrunken version of the same problem.

[The author's own words on how the three relate: Induction and recursion are, in a sense, mirror images of one another, and both can be seen as examples of reduction. To use induction (or recursion), the reduction must (generally) be between instances of the same problem of different sizes.]

[Reading the original book, you will find the author's way of presenting algorithms quite distinctive; he mentions where the inspiration came from: In fact, much of the material was inspired by Udi Manber’s wonderful paper “Using induction to design algorithms” from 1988 and his book from the following year, Introduction to Algorithms: A Creative Approach.]

[Induction is what you use to show that recursion is correct, and recursion is a very direct way of implementing most inductive algorithm ideas. However, rewriting the algorithm to be iterative can avoid the overhead and limitations of recursive functions in most (nonfunctional) programming languages. ]

```python
def trav(seq, i=0):
    if i == len(seq): return
    # print(seq[i])        # Process the current element here
    trav(seq, i + 1)

trav(range(1000))  # RecursionError: maximum recursion depth exceeded
```
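If the recursion depth is the only obstacle, one stopgap is to raise the interpreter's limit with `sys.setrecursionlimit` (a sketch; the safe maximum depends on your platform's stack size, and an iterative rewrite is usually the better fix):

```python
import sys

sys.setrecursionlimit(5000)  # The default is roughly 1000 frames

def trav(seq, i=0):
    if i == len(seq): return
    trav(seq, i + 1)

trav(range(2000))  # Deeper than the default limit allows, but fine now
```

Each recursive call still costs a stack frame, so this only postpones the problem; the iterative versions below avoid it entirely.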


```python
def ins_sort_rec(seq, i):
    if i == 0: return                        # Base case -- do nothing
    ins_sort_rec(seq, i - 1)                 # Sort 0..i-1
    j = i                                    # Start "walking" down
    while j > 0 and seq[j - 1] > seq[j]:     # Look for OK spot
        seq[j - 1], seq[j] = seq[j], seq[j - 1]  # Keep moving seq[j] down
        j -= 1                               # Decrement j

from random import randrange
seq = [randrange(1000) for i in range(100)]
ins_sort_rec(seq, len(seq) - 1)
```


```python
def ins_sort(seq):
    for i in range(1, len(seq)):             # 0..i-1 sorted so far
        j = i                                # Start "walking" down
        while j > 0 and seq[j - 1] > seq[j]: # Look for OK spot
            seq[j - 1], seq[j] = seq[j], seq[j - 1]  # Keep moving seq[j] down
            j -= 1                           # Decrement j

seq2 = [randrange(1000) for i in range(100)]
ins_sort(seq2)
```


```python
def sel_sort_rec(seq, i):
    if i == 0: return                        # Base case -- do nothing
    max_j = i                                # Idx. of largest value so far
    for j in range(i):                       # Look for a larger value
        if seq[j] > seq[max_j]: max_j = j    # Found one? Update max_j
    seq[i], seq[max_j] = seq[max_j], seq[i]  # Switch largest into place
    sel_sort_rec(seq, i - 1)                 # Sort 0..i-1

seq = [randrange(1000) for i in range(100)]
sel_sort_rec(seq, len(seq) - 1)
```

```python
def sel_sort(seq):
    for i in range(len(seq) - 1, 0, -1):     # n..i+1 sorted so far
        max_j = i                            # Idx. of largest value so far
        for j in range(i):                   # Look for a larger value
            if seq[j] > seq[max_j]: max_j = j  # Found one? Update max_j
        seq[i], seq[max_j] = seq[max_j], seq[i]  # Switch largest into place

seq2 = [randrange(1000) for i in range(100)]
sel_sort(seq2)
```


[A variant of this problem is to find, in a collection of elements with dependencies among them, the element at the very start of the dependency chain (for example, thread dependencies in a multithreaded program). The topological sort introduced later is the more practical solution to this class of problems. A more down-to-earth version of the same problem would be examining a set of dependencies and trying to find a place to start. For example, you might have threads in a multithreaded application waiting for each other, with even some cyclical dependencies (so-called deadlocks), and you’re looking for one thread that isn’t waiting for any of the others but that all of the others are dependent on.]

```python
def naive_celeb(G):
    n = len(G)
    for u in range(n):                   # For every candidate...
        for v in range(n):               # For everyone else...
            if u == v: continue          # Same person? Skip.
            if G[u][v]: break            # Candidate knows other
            if not G[v][u]: break        # Other doesn't know candidate
        else:
            return u                     # No breaks? Celebrity!
    return None                          # Couldn't find anyone
```


```python
from random import randrange
n = 100
G = [[randrange(2) for i in range(n)] for i in range(n)]
c = 57                                   # Plant a celebrity for testing
for i in range(n):
    G[i][c] = True                       # Everyone knows the celebrity...
    G[c][i] = False                      # ...who knows no one

print(naive_celeb(G))  # 57
```


```python
def celeb(G):
    n = len(G)
    u, v = 0, 1                          # The first two
    for c in range(2, n + 1):            # Others to check
        if G[u][v]: u = c                # u knows v? Replace u
        else:       v = c                # Otherwise, replace v
    if u == n: c = v                     # u was replaced last; use v
    else:      c = u                     # Otherwise, u is a candidate
    for v in range(n):                   # For everyone else...
        if c == v: continue              # Same person? Skip.
        if G[c][v]: break                # Candidate knows other
        if not G[v][c]: break            # Other doesn't know candidate
    else:
        return c                         # No breaks? Celebrity!
    return None                          # Couldn't find anyone
```


[Reading to this point, I was reminded of another problem that looks similar: finding both the maximum and the minimum among n elements. Searching for the maximum and the minimum separately takes (2n - 2) comparisons (you can shave off a few, but it is still about 2n). Processing the elements in pairs does better: first compare the first two elements, taking the larger as the current maximum and the smaller as the current minimum (if n is odd, simply let the first element be both); then move forward two elements at a time, comparing each pair against each other first, then the smaller against the current minimum and the larger against the current maximum. This strategy needs at most $3\lfloor \frac{n}{2} \rfloor$ comparisons. The two problems are really unrelated, and yet the solutions feel subtly alike, don't they?]
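The pairwise strategy just described can be sketched as follows (`minmax` is a name of our own choosing, not from the book):

```python
def minmax(seq):
    """Find (min, max) of a nonempty seq in about 3n/2 comparisons
    by processing the elements in pairs."""
    seq = list(seq)
    n = len(seq)
    if n % 2:                        # Odd length: seq[0] is both, for now
        cur_min = cur_max = seq[0]
        i = 1
    else:                            # Even length: order the first pair
        cur_min, cur_max = min(seq[0], seq[1]), max(seq[0], seq[1])
        i = 2
    while i < n:
        a, b = seq[i], seq[i + 1]
        if a > b:
            a, b = b, a              # One comparison orders the pair
        if a < cur_min: cur_min = a  # Smaller one vs. current minimum
        if b > cur_max: cur_max = b  # Larger one vs. current maximum
        i += 2
    return cur_min, cur_max

print(minmax([3, 9, 1, 7, 5]))  # (1, 9)
```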

```python
def naive_topsort(G, S=None):
    if S is None: S = set(G)             # Default: All nodes
    if len(S) == 1: return list(S)       # Base case, single node
    v = S.pop()                          # Reduction: Remove a node
    seq = naive_topsort(G, S)            # Recursion (assumption), n-1
    min_i = 0
    for i, u in enumerate(seq):
        if v in G[u]: min_i = i + 1      # After all dependencies
    seq.insert(min_i, v)
    return seq

G = {'a': set('bf'), 'b': set('cdf'), 'c': set('d'),
     'd': set('ef'), 'e': set('f'), 'f': set()}
print(naive_topsort(G))  # ['a', 'b', 'c', 'd', 'e', 'f']
```


```python
def topsort(G):
    count = dict((u, 0) for u in G)      # The in-degree for each node
    for u in G:
        for v in G[u]:
            count[v] += 1                # Count every in-edge
    Q = [u for u in G if count[u] == 0]  # Valid initial nodes
    S = []                               # The result
    while Q:                             # While we have start nodes...
        u = Q.pop()                      # Pick one
        S.append(u)                      # Use it as first of the rest
        for v in G[u]:
            count[v] -= 1                # "Uncount" its out-edges
            if count[v] == 0:            # New valid start nodes?
                Q.append(v)              # Deal with them next
    return S
```


[An interesting aside: topological sorting is also related to Python's Method Resolution Order, which determines whether a method call should resolve to the instance's class, its parent class, or an ancestor further up. For a single-inheritance language this is easy: just follow the inheritance chain upward. For a multiple-inheritance language like Python it takes a more elaborate strategy; Python uses the C3 method resolution order. I don't understand it in depth myself; if you want to learn more, see the Python docs.]
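A tiny illustration (the class names are made up for the example): Python linearizes the inheritance graph with the C3 algorithm, and the result is exposed as `__mro__`:

```python
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass          # Diamond-shaped multiple inheritance

# C3 puts D first, then its bases in a consistent linear order
print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
```

Note that A comes after both B and C, just as a topological order must place a node after everything that depends on it.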

1. Strong Assumptions

2. Invariants and Correctness

(1) Use induction to show that it is, in fact, true after each iteration.
(2) Show that we’ll get the correct answer if the algorithm terminates.
(3) Show that the algorithm terminates.
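For the insertion sort above, the loop invariant is that `seq[:i+1]` is sorted after iteration i; step (1) can be made concrete by checking the invariant at runtime (an illustrative sketch, with the `assert` added by us):

```python
def ins_sort_checked(seq):
    for i in range(1, len(seq)):
        j = i
        while j > 0 and seq[j - 1] > seq[j]:
            seq[j - 1], seq[j] = seq[j], seq[j - 1]
            j -= 1
        # Invariant: the prefix seq[:i+1] is sorted after this iteration
        assert all(seq[k] <= seq[k + 1] for k in range(i))

s = [5, 3, 1, 4, 2]
ins_sort_checked(s)
print(s)  # [1, 2, 3, 4, 5]
```

Steps (2) and (3) then follow easily: when the loop ends, i has reached len(seq), so the invariant says the whole sequence is sorted, and the loop terminates because i strictly increases.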

4. Reduction + Contraposition = Hardness Proof

“fast + fast = fast” means: a fast reduction from A to B + a fast solution to B = a fast solution to A.

• If you can (easily) reduce A to B, then B is at least as hard as A.

• If you want to show that X is hard and you know that Y is hard, reduce Y to X.

(1) Make sure you really understand the problem.

What is the input? The output? What’s the precise relationship between the two? Try to represent the problem instances as familiar structures, such as sequences or graphs. A direct, brute-force solution can sometimes help clarify exactly what the problem is.

(2) Look for a reduction.

Can you transform the input so it works as input for another problem that you can solve? Can you transform the resulting output so that you can use it? Can you reduce an instance of size n to an instance of size k < n and extend the recursive solution (inductive hypothesis) back to n?

(3) Are there extra assumptions you can exploit?

Integers in a fixed value range can be sorted more efficiently than arbitrary values. Finding the shortest path in a DAG is easier than in an arbitrary graph, and using only non-negative edge weights is often easier than arbitrary edge weights.
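For example, integers known to lie in a small, fixed range can be sorted in linear time by counting occurrences (a counting-sort sketch; the function name and signature are our own):

```python
def counting_sort(seq, key_range):
    """Sort integers drawn from range(key_range) in O(n + key_range) time."""
    counts = [0] * key_range             # counts[x]: occurrences of x
    for x in seq:
        counts[x] += 1
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)   # Emit each value in sorted order
    return result

print(counting_sort([4, 1, 3, 1, 0], 5))  # [0, 1, 1, 3, 4]
```

No comparisons at all: the fixed value range is exactly the kind of extra assumption that lets us beat the O(n lg n) comparison-sort bound.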

Write a function for generating random DAGs. Write an automatic test that checks that topsort gives a valid ordering, using your DAG generator.

You could generate DAGs by, for example, randomly ordering the nodes and adding a random number of forward-pointing edges to each of them.
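One way to sketch the suggested generator and test (`gen_dag` and `is_valid_order` are our own names; `topsort` is repeated so the snippet runs standalone):

```python
from random import randrange, sample

def topsort(G):                          # Same as above
    count = dict((u, 0) for u in G)
    for u in G:
        for v in G[u]:
            count[v] += 1
    Q = [u for u in G if count[u] == 0]
    S = []
    while Q:
        u = Q.pop()
        S.append(u)
        for v in G[u]:
            count[v] -= 1
            if count[v] == 0:
                Q.append(v)
    return S

def gen_dag(n):
    """Random DAG: nodes 0..n-1 in a fixed order, each with a random
    number of forward-pointing edges (so cycles are impossible)."""
    G = {}
    for u in range(n):
        later = range(u + 1, n)          # Only edges to later nodes
        G[u] = set(sample(later, randrange(len(later) + 1)))
    return G

def is_valid_order(G, order):
    pos = {u: i for i, u in enumerate(order)}
    return len(order) == len(G) and all(
        pos[u] < pos[v] for u in G for v in G[u])

for _ in range(10):                      # Automatic test
    G = gen_dag(20)
    assert is_valid_order(G, topsort(G))
```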

Redesign topsort so it selects the last node in each iteration, rather than the first.

This is quite similar to the original. You now have to maintain the out-degrees of the remaining nodes, and insert each node before the ones you have already found. (Remember not to insert anything in the beginning of a list, though; rather, append, and then reverse it at the end, to avoid a quadratic running time.)

[Note: append and then reverse at the end; do not insert at the front.]
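A sketch of that redesign (`topsort_rev` is our own name): maintain out-degrees, repeatedly pick a node with out-degree zero (a valid "last" node), append it, and reverse once at the end:

```python
def topsort_rev(G):
    out = dict((u, len(G[u])) for u in G)   # Out-degree of each node
    preds = dict((u, set()) for u in G)     # Reverse adjacency
    for u in G:
        for v in G[u]:
            preds[v].add(u)
    Q = [u for u in G if out[u] == 0]       # Valid final nodes
    S = []
    while Q:
        u = Q.pop()
        S.append(u)                          # Collected back-to-front
        for v in preds[u]:
            out[v] -= 1                      # "Uncount" the edge v -> u
            if out[v] == 0:
                Q.append(v)
    S.reverse()                              # Now front-to-back
    return S

G = {'a': set('bf'), 'b': set('cdf'), 'c': set('d'),
     'd': set('ef'), 'e': set('f'), 'f': set()}
print(topsort_rev(G))  # ['a', 'b', 'c', 'd', 'e', 'f']
```

Both `S.append` and the final `S.reverse` are O(1) amortized and O(n) respectively, whereas repeated `S.insert(0, u)` would make the whole loop quadratic.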