/random7

Golang implementation of a programming job interview puzzle, plus bonus analysis

Primary LanguageGo

Daily Coding Problem: Problem #761 [Easy]

This problem was asked by Two Sigma.

Using a function rand7() that returns an integer from 1 to 7 (inclusive) with uniform probability, implement a function rand5() that returns an integer from 1 to 5 (inclusive).

Build and Run

$ go build r7.go
$ ./r7 | sort | uniq -c
 100338 0
  99750 1
 100066 2
  99705 3
 100141 4
$

r7 invokes rand5 500,000 times. That's as close to uniform as you could expect.

Analysis

Code

I think solving the problem hinges on recognizing that rand7 returns uniformly-distributed numbers. The rand5 function should call rand7 while that return value is greater than 4. It should return whatever it gets that's the first value 4 or under.

If rand7 returns uniformly distributed numbers, so will such a rand5 function, it just "cuts off" the uniformly-distributed-values that don't fit the 0-4 range.

My algorithm doesn't have a bounded run time: it's entirely possible to get a series of values of 5 or 6 from the rand7 function, which would cause my function to never return. It's certainly possible to calculate that rand5 returns 5/7 of the time after 1 invocation of rand7, 2/7*5/7 of the time after 2 invocations, but I did an empirical investigation by writing a version that outputs the number of invocations of rand7 it did to get to a 0-4 valued output of rand5

$ go build r7a.go
$ ./r7a | sort -k1.1n | uniq -c | awk '{printf "%4d  %.5f\n", $2, $1/500000.}'
rand7 invocations Proportion of rand5 invocations Running proportion total
1 0.71435 0.71435
2 0.20300 0.91736
3 0.05899 0.97634
4 0.01681 0.99315
5 0.00492 0.99807
6 0.00138 0.99945
7 0.00038 0.99983
8 0.00014 0.99997
9 0.00003 0.99999
10 0.00000 1.00000
11 0.00000 1.00000

5/7 equates to approximately .71428, 2/7*5/7 is approximately .20317, 2/7*2/7*5/7 is approximately .05830, and so forth. It checks out. Seems like there's a vanishingly small number of invocations of rand5 that will invoke rand7 more than, say, 15 times.

Nicola Moro's solution

I don't like Moro's solution at all: Moro ends up creating an array based on the rand7 function's output, then calling a random-choice function on that array. Why bother calling rand7?

Interview Analysis

Asking a problem that hinges on recognizing some specific aspect of a non-programming question seems absurd. The coding involved is very easy, seeing it done won't get the interviewer closer to figuring out if the candidate is any good.

Similarly, the candidate can't fault themself for not figuring out some obscure factoid while in a very stressful situation.

In sum, don't use this question in an interview. Don't feel bad if you get this question in an interview then crash and burn. Not your fault.


Same problem in reverse

Daily Coding Problem: Problem #1064 [Easy]

This problem was asked by Two Sigma.

Using a function rand5() that returns an integer from 1 to 5 (inclusive) with uniform probability, implement a function rand7() that returns an integer from 1 to 7 (inclusive).

Analysis

Problem #1064 is the opposite of #761. I found it considerably harder than #761.

My first solution is based on the observation that the problem statement says nothing about the distribution of integers from 1 to 7. I doubt this is correct.

Stack Exchange answer

Reddit answers

My second solution uses one of the Stack Exchange answers. It uses 2 invocations of rand5 as indexes into a 5x5 array. Values of the array are 1 through 7, except for 4 zeros. If the code indexes a zero array value, it tries again.

If rand5 truly returns values with a uniform probability, then the code will give back 1-7 uniformly.

$ go build r5a.go
$ ./r5a | sort -n | uniq -c
  71237 1
  71637 2
  71398 3
  71345 4
  71457 5
  71209 6
  71717 7
$

Just to put this problem to bed, I wrote a third solution, which will create a uniform output with values 1 through M, from a "rand N" function that returns values 0 through N - 1 uniformly.

$ go build rmn.go
$ ./rmn 500000 7 5 | sort -n | uniq -c
Should end up with about 71428 of each value
  71905 1
  71430 2
  71734 3
  71590 4
  71029 5
  70931 6
  71381 7
$

The algorithm will work for M > N and M < N2. It requires extra space O(N2).

Interview analysis

The "rand5 into rand7" question in and of itself is a bad question, for the same reasons as the "rand7 into rand5" question: the candidate has to recognize some fact in a small amount of time. Once recognized, there's not a huge amount of coding.

The general "randN into randM, where M > N" could be a decent interview question, as long as the interviewer gives some hints. If the interviewer walks the candidate through two randN() values as indexes into a NxN array with 1-M values, each value appearing some times, the candidate could exhibit a lot of coding skills.

  • indirection in service of generalization while setting up the NxN array
  • remembering to seed the PRNG
  • array particulars, like indexing and initialization, off-by-1 recognition
  • mastery of division and modulo operators

All of these would show up in setting up the NxN array. Making one distribution from another merely motivates the work.