http://www.reddit.com/r/dailyprogrammer/c

*The problem:*

There are N (<=10) 'interest points' on an infinite 2-D grid. The player [...] can move one unit at a time on this grid towards one of these interest points. The interest point they are told to move to is chosen at random. They also start at a random interest point. It is important to note that the player cannot move diagonally in any sense; the player must move parallel to one of the axes.

Given a set of interest points specified as 2-D integer coordinates, what is the minimum number of steps a player could take to get to them all and win the game? (order doesn't matter). What is the maximum number of steps? Assume the player heads optimally towards whichever interest point is randomly chosen until he gets it.
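Since all moves are axis-parallel unit steps, the number of steps between two interest points is their Manhattan (taxicab) distance. A minimal sketch of the `dist` computation used throughout this post (the example values are just an illustration):

```python
# Manhattan distance: the number of unit steps needed to move between
# two grid points when only axis-parallel moves are allowed.
def dist(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

# Moving from (62, -67) to (4, -71) takes 58 horizontal + 4 vertical steps.
print(dist((62, -67), (4, -71)))  # 62
```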

For reference, my solution produces a maximum of 1963 steps and a minimum of 617 steps on this input:

62 -67
4 -71
-86 71
-25 -53
-23 70
46 -34
-92 -62
-15 89
100 42
-4 43

EDIT: To clarify this a bit, what I mean is that you want to find the 'best possible game' and 'worst possible game'. So this min/max is over all possible starting locations. Also, you do not have to get back to the starting point.

This problem is NP-hard, but despite being classified as "difficult" it isn't a complex problem. It's an example of a more general class of problems, so solving it gives you ways to solve several other similar problems.

-----------------------------------

One simple and very short way to solve the problem is to look at all the permutations of the points, compute the path length for each one, and take the shortest and longest. Here is Python code (I am using Python 2.6.6); it's slow, with a run-time of about 70 seconds to find the solution 617 1963:

from itertools import permutations

points = [map(int, row.split()) for row in open("points.txt")]

tot = lambda p: sum(abs(a[0] - b[0]) + abs(a[1] - b[1])
                    for a, b in zip(p, p[1:]))

print min(tot(path) for path in permutations(points))
print max(tot(path) for path in permutations(points))

"points.txt" is a text file that contains all the point coordinates, two coordinate each line.

-----------------------------------

A performance problem of the previous basic permutation-search program is that the path length is computed from scratch for each permutation, so its computational complexity is about O(n! * n).

To improve the situation, the Steinhaus-Johnson-Trotter algorithm generates successive permutations by swapping adjacent items. This allows a quicker incremental update of the path length, giving an O(n!) algorithm, which finds the solution in about 50.5 seconds:

Permutation code adapted from:

http://rosettacode.org/wiki/Permutations

from operator import itemgetter

def spermutations(n):
    sign = 1
    p = [[i, 0 if i == 0 else -1] for i in xrange(n)]

    while any(pp[1] for pp in p):
        i1, (n1, d1) = max(((i, pp) for i, pp in enumerate(p) if pp[1]),
                           key=itemgetter(1))
        sign *= -1
        if d1 == -1:
            i2 = i1 - 1
            p[i1], p[i2] = p[i2], p[i1]
            if i2 > i1:
                yield i1, i2
            else:
                yield i2, i1
            if i2 == 0 or p[i2 - 1][0] > n1:
                p[i2][1] = 0
        elif d1 == 1:
            i2 = i1 + 1
            p[i1], p[i2] = p[i2], p[i1]
            if i2 > i1:
                yield i1, i2
            else:
                yield i2, i1
            if i2 == n - 1 or p[i2 + 1][0] > n1:
                p[i2][1] = 0
        for i3, pp in enumerate(p):
            n3, d3 = pp
            if n3 > n1:
                pp[1] = 1 if i3 < i2 else -1

def dist(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def total_dist(p, plen, i, j):
    # Only one adjacent pair is swapped, so the new total distance
    # can be updated incrementally instead of recomputed from scratch.
    new_plen = plen
    if i > 0:
        new_plen += dist(p[i-1], p[j]) - dist(p[i-1], p[i])
    if j < len(p) - 1:
        new_plen += dist(p[i], p[j+1]) - dist(p[j], p[j+1])
    p[i], p[j] = p[j], p[i]
    return new_plen

def main():
    p = [map(int, row.split()) for row in open("points.txt")]
    plen = sum(dist(a, b) for a, b in zip(p, p[1:]))  # path total length
    maxl = minl = plen

    for i, j in spermutations(len(p)):
        plen = total_dist(p, plen, i, j)
        if plen < minl:
            minl = plen
        if plen > maxl:
            maxl = plen

    print minl, maxl

main()

-----------------------------------

Another common way to think about this problem is to perform a search on the tree that represents the problem space. This very simple solution stores the current best solutions in two global variables and walks from the root of the tree, where no points are used, toward the leaves, where more and more points are used. At a leaf all points are used, and the min/max lengths are updated.

At the leaves there is no need to compute the length of the root-leaf path from scratch, because each recursive call accumulates the running total; this saves a lot of path-length computations, because the computation of the shared part of the path is re-used.

To know which points are already used at each node of the path, it keeps a set of the currently available points, and it creates a new set every time it moves to a new node; this is not time- or memory-efficient:

def dist((x0, y0), (x1, y1)):
    return abs(x0 - x1) + abs(y0 - y1)

min_d = 1e70
max_d = 0

def min_dist(p0, ps, tot):
    global min_d
    if not ps:
        min_d = min(min_d, tot)
    else:
        for p in ps:
            min_dist(p, ps.difference([p]), tot + dist(p0, p))

def max_dist(p0, ps, tot):
    global max_d
    if not ps:
        max_d = max(max_d, tot)
    else:
        for p in ps:
            max_dist(p, ps.difference([p]), tot + dist(p0, p))

def main():
    ps = set(tuple(map(int, row.split())) for row in open("points_small.txt"))
    for p in ps:
        min_dist(p, ps.difference([p]), 0)
        max_dist(p, ps.difference([p]), 0)
    print min_d, max_d

main()

"points_small.txt" is a smaller problem, with just 8 points, because this program is quite slow on the full problem composed of 10 points:

62 -67
4 -71
-86 71
-25 -53
-23 70
46 -34
-92 -62
-15 89

It finds the solution: 464 1459.

-----------------------------------

*Cosmologicon* on Reddit has written a shorter but logically similar program, which I have modified and simplified here to increase clarity:

http://www.reddit.com/r/dailyprogrammer/c

Like the previous program, this one explores every node of the search tree, but it avoids the need for the two global variables because the m*_dist functions return *partial* path lengths.

When the set of points is empty we are at a leaf, and a path length of 0 is returned. The distances are then added as the recursive calls return, so in the end the whole path lengths get computed. Thus, to avoid the global variables, the path length is computed in the opposite direction compared to the previous solution.

This program is almost purely functional, and it's shorter than the previous one.

def dist((x0, y0), (x1, y1)):
    return abs(x0 - x1) + abs(y0 - y1)

def min_dist(p0, ps):
    if not ps:
        return 0
    return min(dist(p0, p) + min_dist(p, ps.difference([p])) for p in ps)

def max_dist(p0, ps):
    if not ps:
        return 0
    return max(dist(p0, p) + max_dist(p, ps.difference([p])) for p in ps)

def main():
    ps = set(tuple(map(int, row.split())) for row in open("points_small.txt"))
    print min(min_dist(p, ps.difference([p])) for p in ps)
    print max(max_dist(p, ps.difference([p])) for p in ps)

main()

-----------------------------------

The previous program is quite slow on the full "points.txt" dataset, but memoization is a simple way to greatly reduce the amount of repeated computation (this program is logically closer to the Cosmologicon version).

Memoization means storing the input-output pairs in a hash table, so each partial computation is done only once.
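As a tiny self-contained illustration of the idea, independent of the path problem (the `slow_square` example here is mine):

```python
# Minimal memoization: a dict maps each input to its computed output,
# so repeated calls with the same argument do no recomputation.
calls = [0]  # counts how many times the "expensive" computation runs
cache = {}

def slow_square(x):
    if x not in cache:
        calls[0] += 1
        cache[x] = x * x  # stand-in for an expensive computation
    return cache[x]

print(slow_square(12))  # 144, computed
print(slow_square(12))  # 144, taken from the cache this time
print(calls[0])         # 1: the computation ran only once
```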

The memoize decorator is a Python Recipe by Ahmed El Deeb:

http://code.activestate.com/recipes/5772

def memoize(f):
    cache = {}
    def memf(*x):
        if x not in cache:
            cache[x] = f(*x)
        return cache[x]
    return memf

def dist((x0, y0), (x1, y1)):
    return abs(x0 - x1) + abs(y0 - y1)

@memoize
def min_dist(p0, ps):
    if not ps:
        return 0
    return min(dist(p0, p) + min_dist(p, ps.difference([p])) for p in ps)

@memoize
def max_dist(p0, ps):
    if not ps:
        return 0
    return max(dist(p0, p) + max_dist(p, ps.difference([p])) for p in ps)

def main():
    ps = frozenset(tuple(map(int, row.split())) for row in open("points.txt"))
    print min(min_dist(p, ps.difference([p])) for p in ps)
    print max(max_dist(p, ps.difference([p])) for p in ps)

main()

Like *Cosmologicon*, I have used frozensets instead of sets to allow the memoization, because Python sets are not hashable.

This program finds the 617 1963 solution of the whole 10-point problem in about 0.3 seconds, so here the memoization makes a huge difference.

-----------------------------------

Now I'll optimize this program step by step.

A first performance optimization is to scan the (memoized) tree only once instead of twice; this makes the code longer but almost halves the run-time:

def memoize(f):
    cache = {}
    def memf(*x):
        if x not in cache:
            cache[x] = f(*x)
        return cache[x]
    return memf

def dist((x0, y0), (x1, y1)):
    return abs(x0 - x1) + abs(y0 - y1)

@memoize
def min_max_dist(p0, ps):
    if not ps:
        return (0, 0)
    min_d = 1e70
    max_d = 0
    for p in ps:
        m1, m2 = min_max_dist(p, ps.difference([p]))
        d = dist(p, p0)
        min_d = min(min_d, m1 + d)
        max_d = max(max_d, m2 + d)
    return min_d, max_d

def main():
    ps = frozenset(tuple(map(int, row.split())) for row in open("points.txt"))
    min_d = 1e70
    max_d = 0
    for p in ps:
        m1, m2 = min_max_dist(p, ps.difference([p]))
        min_d = min(min_d, m1)
        max_d = max(max_d, m2)
    print min_d, max_d

main()

-----------------------------------

The next step is to avoid the creation of all those frozensets, and instead keep a bit set that represents which points are currently used. A compact way to represent a bit set is to manage the bits of a single unsigned integer, here named mask.
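As a quick self-contained illustration of the two bit operations (the same helpers used in the programs of this post):

```python
# Bit-set helpers: bit i of mask tells whether point i is still available.
def is_bit_set(mask, i):
    return mask & (1 << i)

def reset_bit(mask, i):
    return mask & ~(1 << i)

mask = 0b1011                      # points 0, 1 and 3 still to visit
print(bool(is_bit_set(mask, 1)))   # True: point 1 is available
print(bin(reset_bit(mask, 1)))     # 0b1001: point 1 marked as used
```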

The points are now kept in just a list. min_max_dist() needs ps to know which points to work on, but ps never changes, so memoizing it again and again is a waste of time and memory; therefore ps is now a global variable.

This program is now faster, so it can handle a bigger dataset; I have added five more random points in [-100, 100] to ps. It finds the solution 717 3049 in less than 7.4 seconds.

def memoize(f):
    cache = {}
    def memf(*x):
        if x not in cache:
            cache[x] = f(*x)
        return cache[x]
    return memf

def dist((x0, y0), (x1, y1)):
    return abs(x0 - x1) + abs(y0 - y1)

ps = [(62, -67), (4, -71), (-86, 71), (-25, -53), (-23, 70),
      (46, -34), (-92, -62), (-15, 89), (100, 42), (-4, 43),
      (21, 59), (86, -25), (93, -52), (-41, -48), (-45, 76)]

def is_bit_set(mask, i):
    return mask & (1 << i)

def reset_bit(mask, i):
    return mask & ~(1 << i)

@memoize
def min_max_dist(p0, mask):
    if not mask:
        return (0, 0)
    min_d = 1e70
    max_d = 0
    for i, p in enumerate(ps):
        if is_bit_set(mask, i):
            m1, m2 = min_max_dist(p, reset_bit(mask, i))
            d = dist(p0, p)
            min_d = min(min_d, m1 + d)
            max_d = max(max_d, m2 + d)
    return min_d, max_d

def main():
    mask = (2 ** len(ps)) - 1
    min_d = 1e70
    max_d = 0
    for i, p in enumerate(ps):
        m1, m2 = min_max_dist(p, reset_bit(mask, i))
        min_d = min(min_d, m1)
        max_d = max(max_d, m2)
    print min_d, max_d

main()

-----------------------------------

Another optimization step is to implement the memoization in a simpler way, using a matrix that stores the [min, max] values. This turns the hash access into a faster array lookup, and it saves memory because only the values need to be stored. The run-time for the 15 points is about 6 seconds.

def dist((x0, y0), (x1, y1)):
    return abs(x0 - x1) + abs(y0 - y1)

ps = [(62, -67), (4, -71), (-86, 71), (-25, -53), (-23, 70),
      (46, -34), (-92, -62), (-15, 89), (100, 42), (-4, 43),
      (21, 59), (86, -25), (93, -52), (-41, -48), (-45, 76)]

cache = []
EMPTY_CACHE = -1

def is_bit_set(mask, i):
    return mask & (1 << i)

def reset_bit(mask, i):
    return mask & ~(1 << i)

def min_max_dist(idx, mask):
    if not mask:
        return (0, 0)
    if cache[idx][mask][0] != EMPTY_CACHE:
        return cache[idx][mask]
    min_d = 1e70
    max_d = 0
    for i, p in enumerate(ps):
        if is_bit_set(mask, i):
            m1, m2 = min_max_dist(i, reset_bit(mask, i))
            d = dist(ps[idx], p)
            min_d = min(min_d, m1 + d)
            max_d = max(max_d, m2 + d)
    cache[idx][mask] = (min_d, max_d)
    return min_d, max_d

def main():
    global cache
    cache = [[[EMPTY_CACHE, EMPTY_CACHE]] * (2 ** len(ps))
             for _ in xrange(len(ps))]
    mask = (2 ** len(ps)) - 1
    min_d = 1e70
    max_d = 0
    for i, p in enumerate(ps):
        m1, m2 = min_max_dist(i, reset_bit(mask, i))
        min_d = min(min_d, m1)
        max_d = max(max_d, m2)
    print min_d, max_d

main()

-----------------------------------

Another simple optimization is to pre-compute a matrix of the distances between all pairs of points. Now the dist function is not needed, and neither is the global ps; only the dists matrix and the cache are needed (they don't have to be globals, they would also be fine as two extra arguments of min_max_dist). The run-time for 15 points is now less than 4.8 seconds (only 1.6 seconds with Psyco):

cache = []
dists = []
EMPTY_CACHE = -1

def is_bit_set(mask, i):
    return mask & (1 << i)

def reset_bit(mask, i):
    return mask & ~(1 << i)

def min_max_dist(idx, mask):
    if not mask:
        return (0, 0)
    if cache[idx][mask][0] != EMPTY_CACHE:
        return cache[idx][mask]
    min_d = 1e70
    max_d = 0
    for i, d in enumerate(dists):
        if is_bit_set(mask, i):
            m1, m2 = min_max_dist(i, reset_bit(mask, i))
            min_d = min(min_d, m1 + d[idx])
            max_d = max(max_d, m2 + d[idx])
    cache[idx][mask] = (min_d, max_d)
    return min_d, max_d

def main():
    global cache, dists
    ps = [(62, -67), (4, -71), (-86, 71), (-25, -53), (-23, 70),
          (46, -34), (-92, -62), (-15, 89), (100, 42), (-4, 43),
          (21, 59), (86, -25), (93, -52), (-41, -48), (-45, 76)]
    cache = [[[EMPTY_CACHE, EMPTY_CACHE]] * (2 ** len(ps))
             for _ in xrange(len(ps))]
    mask = (2 ** len(ps)) - 1
    dists = [[0] * len(ps) for _ in xrange(len(ps))]
    for i in xrange(len(ps)):
        for j in xrange(i):
            dists[i][j] = dists[j][i] = \
                abs(ps[i][0] - ps[j][0]) + abs(ps[i][1] - ps[j][1])
    min_d = 1e70
    max_d = 0
    for i, p in enumerate(ps):
        m1, m2 = min_max_dist(i, reset_bit(mask, i))
        min_d = min(min_d, m1)
        max_d = max(max_d, m2)
    print min_d, max_d

main()

-----------------------------------

Now, to speed up the program, I switch to the D language, a lower-level, statically typed and quite efficient language (I am using DMD 2.060, with -O -release -inline).

This program is very similar to the last Python version. The only relevant difference is that the [min, max] pairs in cache are now a D tuple of two shorts. This means each min-max pair is a value (instead of a tuple object as in Python), and being shorts, a pair requires just 4 bytes. This saves a lot of memory, and the reduced cache misses increase performance too.

This program is much faster (it runs on the 15 points in about 0.13 seconds, a time that includes the constant startup time of any D program), so I have added five more points in [-100, 100]. It solves the 20-point problem in less than 10 seconds; the answer is 801 3928.

import std.typecons;

alias int Coord;
alias Tuple!(Coord, Coord) Point;
alias uint Tmask;
alias short Tdist;

__gshared Tdist[][] dists;
__gshared Tuple!(Tdist, Tdist)[][] cache;
enum Tdist emptyCache = -1;

Tuple!(Tdist, Tdist) minMaxDist(in size_t idx, in Tmask mask) nothrow {
    if (!mask)
        return typeof(return).init;
    if (cache[idx][mask][0] != emptyCache)
        return cache[idx][mask];

    Tdist min = Tdist.max;
    Tdist max = Tdist.min;

    foreach (i, d; dists)
        if (mask & (1 << i)) {
            immutable md = minMaxDist(i, mask & ~(1 << i));
            immutable dist1 = cast(Tdist)(md[0] + d[idx]);
            if (dist1 < min)
                min = dist1;
            immutable dist2 = cast(Tdist)(md[1] + d[idx]);
            if (dist2 > max)
                max = dist2;
        }

    immutable result = tuple(min, max);
    cache[idx][mask] = result;
    return result;
}

void main() {
    import std.stdio, std.math;
    alias Point P;
    immutable ps = [P(62, -67), P(4, -71), P(-86, 71), P(-25, -53),
                    P(-23, 70), P(46, -34), P(-92, -62), P(-15, 89),
                    P(100, 42), P(-4, 43), P(21, 59), P(86, -25),
                    P(93, -52), P(-41, -48), P(-45, 76), P(85, -43),
                    P(-69, 64), P(-50, -32), P(48, -15), P(-14, 38)];
    writeln(ps.length, " points.");
    assert(ps.length <= Tmask.sizeof * 8);

    Tmask mask = 0;
    foreach (i; 0 .. ps.length)
        mask |= 1 << i;

    dists = new typeof(dists)(ps.length, ps.length);
    foreach (i1, p1; ps)
        foreach (i2, p2; ps)
            dists[i1][i2] = cast(Tdist)(abs(p1[0] - p2[0]) +
                                        abs(p1[1] - p2[1]));

    cache = new typeof(cache)(ps.length, 2 ^^ ps.length);
    foreach (row; cache)
        row[] = tuple(emptyCache, emptyCache);

    Tdist min = Tdist.max;
    Tdist max = Tdist.min;
    foreach (i; 0 .. ps.length) {
        immutable dMinMax = minMaxDist(i, mask & ~(1 << i));
        if (dMinMax[0] < min)
            min = dMinMax[0];
        if (dMinMax[1] > max)
            max = dMinMax[1];
    }
    writeln("Min, max: ", min, " ", max);
}

-----------------------------------

As the next optimization step I have ported the code to C, an even lower-level language (in theory D has about the same performance as C if you use only C-like features, but I am compiling the D code with DMD, whose back-end is less efficient than the GCC back-end; LDC2 and GDC probably do better).

Following the advice of the Reddit user *Ledrug*, in this C program the cache matrix is now fully static (instead of being represented as a dynamic array of dynamic arrays as in D), and its indexes are swapped. The run-time is now about 4.95 seconds for 20 points (GCC 4.7.1, with -std=c99 -Ofast -flto -s):

#include "stdio.h" #include "stdlib.h" #include "stdint.h" typedef int Coord; typedef uint32_t Tmask; typedef int16_t Tdist; typedef struct { Coord x, y; } P; typedef struct { Tdist min, max; } MinMax; const P ps[] = {{62, -67}, {4, -71}, {-86, 71}, {-25, -53}, {-23, 70}, {46, -34}, {-92, -62}, {-15, 89}, {100, 42}, {-4, 43}, {21, 59}, {86, -25}, {93, -52}, {-41, -48}, {-45, 76}, {85, -43}, {-69, 64}, {-50, -32}, {48, -15}, {-14, 38}}; #define N (sizeof(ps) / sizeof(ps[0])) Tdist dists[N][N]; MinMax cache[1 << N][N]; #define emptyCache -1 Coord abs(const Coord a) { return a >= 0 ? a : -a; } MinMax minMaxDist(const size_t idx, const Tmask mask) { if (!mask) return (MinMax){0, 0}; if (cache[mask][idx].min != emptyCache) return cache[mask][idx]; Tdist min = INT16_MAX; Tdist max = INT16_MIN; for (size_t i = 0; i < N; i++) { const Tdist *d = dists[i]; if (mask & (1 << i)) { const MinMax md = minMaxDist(i, mask & ~(1 << i)); const Tdist dist1 = (Tdist)(md.min + d[idx]); if (dist1 < min) min = dist1; const Tdist dist2 = (Tdist)(md.max + d[idx]); if (dist2 > max) max = dist2; } } const MinMax result = {min, max}; cache[mask][idx] = result; return result; } int main() { printf("%u points.\n", N); Tmask mask = 0; for (size_t i = 0; i < N; i++) mask |= 1 << i; for (size_t i = 0; i < N; i++) for (size_t j = 0; j < N; j++) dists[i][j] = (Tdist)(abs(ps[i].x - ps[j].x) + abs(ps[i].y - ps[j].y)); for (size_t i = 0; i < (1 << N); i++) for (size_t j = 0; j < N; j++) cache[i][j] = (MinMax){emptyCache, emptyCache}; Tdist min = INT16_MAX; Tdist max = INT16_MIN; for (size_t i = 0; i < N; i++) { const MinMax dMinMax = minMaxDist(i, mask & ~(1 << i)); if (dMinMax.min < min) min = dMinMax.min; if (dMinMax.max > max) max = dMinMax.max; } printf("Min, max: %d %d\n", min, max); return 0; }

Note: at first I didn't introduce the static-matrix optimization in the C code, because this C code is a translation from D, and in D you can't define a static matrix that large. My mind was anchored to the D limits, so I forgot that the C compiler doesn't have that limit.

A way to solve this problem in D while keeping the efficiency is to heap-allocate the cache as a single array MinMax[N * (1 << N)].
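The single-array idea amounts to flattening the two indexes into one; a minimal language-neutral sketch in Python (the names and the small N here are mine, for illustration):

```python
N = 4                                  # hypothetical number of points
cache = [(-1, -1)] * (N * (1 << N))    # one flat block instead of a matrix

def flat(idx, mask):
    # cache[idx][mask] becomes cache[idx * 2**N + mask]
    return idx * (1 << N) + mask

cache[flat(2, 0b0101)] = (10, 30)
print(cache[flat(2, 0b0101)])  # (10, 30)
```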

-----------------------------------

*Ledrug* has created a C program that is three times faster than the last one; here it is, with reformatting and small changes by me (run-time just 1.65 seconds; it also uses less RAM, which means this program is able to face larger inputs, maybe with 23 or 24 points):

#include "stdio.h" #include "stdlib.h" #include "stdint.h" struct { int x, y; } const co[] = { {62, -67}, { 4, -71}, {-86, 71}, {-25, -53}, {-23, 70}, {46, -34}, {-92, -62}, {-15, 89}, {100, 42}, { -4, 43}, {21, 59}, { 86, -25}, { 93, -52}, {-41, -48}, {-45, 76}, {85, -43}, {-69, 64}, {-50, -32}, { 48, -15}, {-14, 38} }; #define N (sizeof(co) / sizeof(co[0])) typedef uint16_t Sint; typedef struct { Sint min, max; } MinMax; uint32_t dist[N][N]; MinMax cache[N + (N << (N - 1))]; MinMax* pos[1 << N]; size_t masks[1 << N]; void calc_mask(const size_t mask) { MinMax *out = pos[mask]; for (size_t i = N; i--; ) { if (!(mask & (1U << i))) continue; const size_t m = mask & ~(1U << i); MinMax *in = pos[m]; MinMax r = {UINT16_MAX, 0}; for (size_t j = N; j--; ) { if (!(m & (1U << j))) continue; const uint32_t d = dist[i][j]; MinMax sub = *in; in++; sub.max += d; if (sub.max > r.max) r.max = sub.max; sub.min += d; if (sub.min < r.min) r.min = sub.min; } *out = r; out++; } } int main() { for (size_t i = 0; i < N; i++) for (size_t j = 0; j < i; j++) dist[i][j] = dist[j][i] = abs(co[i].x - co[j].x) + abs(co[i].y - co[j].y); { size_t *ptr = &masks[0]; for (size_t i = 0; i < (1 << N); i++) { const size_t m = masks[i]; for (size_t j = 1 << (N - 1); j != 0; j >>= 1) { if (!(j & m) && !pos[j | m]) { pos[j | m] = &cache[0]; *ptr = j | m; ptr++; } } } } { MinMax *p = &cache[0] + N; for (size_t i = 0; i < (1 << N); i++) { const size_t m = masks[i]; pos[m] = p; for (size_t j = 1 << (N - 1); j != 0; j >>= 1) if ((j & m)) p++; } } for (size_t i = N; i < (1 << N); i++) calc_mask(masks[i]); const MinMax *p = pos[(1 << N) - 1]; Sint min = UINT16_MAX; Sint max = 0; for (size_t i = 0; i < N; i++) { if (p[i].max > max) max = p[i].max; if (p[i].min < min) min = p[i].min; } printf("%d %d\n", min, max); return 0; }

This is how *Ledrug* explains this version, and why it uses less memory:

If you think of the original method as top-down (to calculate a bitmask with m set bits, go down to all its (m-1)-bit sub-masks), this one does it bottom-up. It first generates all possible bit masks and arranges them in order of increasing number of set bits, and has each bit pattern with m bits point to a consecutive block of m results in the cache. All results are then calculated in this mask order, so when a bit pattern is needed, all its sub-patterns are already there, and this saves the recursive function-call overhead. Because results with the same bit mask are all lumped together, cache behavior is pretty good. The smaller memory usage is only because the original method was wasting half of the cache space: cache[n][mask] is not a valid combination if mask has the n-th bit set.
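The mask ordering Ledrug describes can be sketched in a few lines of Python; this is just an illustration of the ordering property (any sub-mask precedes its super-mask), not Ledrug's actual generation code:

```python
N = 4  # small example size, mine for illustration
# All masks, ordered by increasing number of set bits (popcount).
masks = sorted(range(1 << N), key=lambda m: bin(m).count("1"))

# Every sub-mask (one bit cleared) of a mask appears earlier in the
# order, so a bottom-up sweep always finds its sub-results computed.
for m in masks:
    for i in range(N):
        if m & (1 << i):
            assert masks.index(m & ~(1 << i)) < masks.index(m)

print(masks[:6])  # [0, 1, 2, 4, 8, 3]
```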