This is a solution to problem 8 on the hard interview questions node. If you have not read the question, the following will make no sense to you:

For the first part, compute the sum of the numbers from 1 to N-1. Then sum the values in the array. The duplicate number is the difference between these sums.

For the second part, compute the sum of the numbers from 1 to N+1. Then sum the values in the array. The missing number is again the difference between these sums.

addition and subtraction are fine, but real men1 use xor...

See, the problem with summing the numbers from 1 to N-1, or 1 to N+1, is that if your list is, say, 1 to 1 billion, and you are dealing with 32-bit integers, you can add, add, add, right along, until you hit the ceiling of what a 32-bit integer can store, then POW! Like a clogged toilet when you try to flush it, overflow. And overflow is bad.

Ok, so overflow is bad. How do we avoid it? Well, you could start playing games like, "add the first element in the list, then subtract 1. Add the second element in the list, and subtract 2. Add ....", and so on. The problem is, no matter what algorithm you use, it is always possible to construct a case that results in over- (or under-) flow.

... Or, you could use the magical xor. If you can swap with it, why shouldn't you be able to find missing list elements with it? Well, you can, and it's no more complex to do that the addition/subtraction solution.

Step 1: XOR together all the numbers in the list.
Step 2: XOR that result with the XOR of all the numbers that you expect to be in the list. (1 to N-1, or 1 to N+1, depending on the problem.)
Step 3: there is no step 3; what you're left with is your answer.

How does it work?

Well, N xor N is 0, 0 xor N, and xor (the operation) is commutative, so after you xor together the range 1 to N-1 or N+1 and the list, you get a whole bunch of zeros, xored with the missing (or duplicate) element, you're done, and you haven't overflowed.

1or women.. whatever.

That's very neat. I'm not sure why it'd be a problem with overflow though. Just add the numbers mod m, where m is greater than the largest number, e.g. 2^32. For the first do sum(array) - sum(1..n-1), for the second do sum(1..n+1) - sum(array). Since e.g. the integers modulu n under addition is an abelian group, cancellation works as expected.

A less elegant, but completely different way that I thought of first would be to "attempt" a sort of count-sort of the numbers. When doing this for the first problem, at some point you're going to find the duplicate element. In Python (for 0..n-1):

def find_extra(a):
   for i in range(len(a)):
      while a[i] != i:
         j = a[i]
         a[i], a[j] = a[j], a[i] # swap
         if a[i] < i:
            return a[i]

The second problem could be sledge-hammered into a similar solution:

def find_missing(a):
   lastp = len(a)
   for i in range(len(a)):
      while a[i] != i:
         j = a[i]
         if a[i] == len(a):
            lastp = i
            break
         a[i], a[j] = a[j], a[i] # swap
   return lastp

Log in or register to write something here or to contact authors.