[ planet-factor ]

John Benediktsson: Finding Duplicate Numbers

One of my favorite interview questions goes something like this:

Given an array of 1001 elements which contains integers from 1 to 1000 inclusive. The numbers are randomly stored in the array. Only one number repeats itself. The candidate has to come up with an efficient solution for finding that duplicate given that you can access the elements of an array only once i.e., you can read the elements of the array only once.

I thought I would show some approaches to solving this, using both Python and Factor.

Numbers

First, we need to make a randomized array of our numbers (with one duplicate):

Python:

def make_numbers(n):
seq = [n] + range(1, 1001)
random.shuffle(seq)
return seq

Factor:

: make-numbers ( n -- seq )
[ 1000 [1,b] ] dip suffix randomize ;

It's worth pointing out that Factor has many "builtin" words that can be helpful for solving this, making the shortest solution:

IN: scratchpad 12 make-numbers duplicates first .
12

Enumeration

While this solution is not linear, it is often the "obvious" first way to solve this problem. It goes something like this: for each element in the list, check if duplicated by any other element:

Python:

def find_duplicate(seq):
for i, m in enumerate(seq):
for n in seq[i+1:]:
if m == n:
return m
return False

Factor:

: find-duplicate ( seq -- n )
dup '[ 1 + _ index-from ] find-index nip ;

Sets

Our first "linear" solution uses sets to track which elements we've seen, stopping when we encounter our first duplicate. We will gloss over the extra memory and time required to maintain the set.

Python:

def find_duplicate(seq):
d = set()
for m in seq:
if m in d:
return m
d.add(m)
return False

Factor:

: find-duplicate ( seq -- n ) 
f fast-set '[ _ ?adjoin not ] find nip ;

Both versions use a generic hash-set. Knowing that our data is numeric values, we can use a bit-set and gain some performance:

: find-duplicate ( seq -- n )
dup length <bit-set> '[ _ ?adjoin not ] find nip ;

Number Theory

Taking advantage of the fact that our numbers are from 1 to 1000, we can compute the sum of the numbers, and subtract the expected total (using the formula (n * n+1) / 2) to find the duplicate:

Python:

def find_duplicate(seq):
n = len(seq) - 1
return sum(seq) - (n*(n+1))/2

Factor:

: find-duplicate ( seq -- n )
[ sum ] [ length dup 1 - * 2 / ] bi - ;

What do you think? Any other good, fun, or unusual ways to solve this problem?

Update: Zeev pointed out in the comments that another way would be to XOR all the values in our sequence and the numbers 1 to 1000:

Python:

def find_duplicate(seq):
n1 = 0
for x in seq:
n1 ^= x
n2 = 0
for x in range(1, 1001):
n2 ^= x
return n1 ^ n2

Factor:

: find-duplicate ( seq -- n )
1000 [1,b] [ 0 [ bitxor ] reduce ] bi@ bitxor ;

Tue, 15 May 2012 04:15:00

John Benediktsson: Random Name Generator

In the spirit of the (almost) pure random demon name generator, I wanted to show a random name generator in Factor.

Some time ago, I implemented a vocabulary for creating fake data which could generate "fake names". The way it worked was to make a name by picking randomly from a list of valid first and last names. The drawback with that approach is that you can only create names that already exist in your list. It would be more interesting if you can use a list of valid names to "seed" a random name generator which uses that to create names that are similar to your list but do not appear in it.

Transition Tables

To do this, we will create a table of "transitions". A transition is a pair of characters that appear next to each other in a name. In pseudo-code, it will be something like this:

  1. For each name from a list of valid names,
  2. For every character in the name,
  3. Update the table by adding the "transition" at the characters index.

We can create a sequence of transitions for a given string by clumping each pair of characters together (as well as showing the last character transitions to f, meaning the end of the word):

: transitions ( string -- enum )
{ f } { } append-as 2 clump <enum> ;

Given a string and a transition table, we can update it with those transitions:

: (transition-table) ( string assoc -- )
[ transitions ] dip '[ swap _ push-at ] assoc-each ;

So, given a sequence of strings, we can create the transition table easily:

: transition-table ( seq -- assoc )
H{ } clone [ [ (transition-table) ] curry each ] keep ;

You can try it out and it will look something like this:

IN: scratchpad { "hello" } transition-table .
H{
{ 0 V{ { CHAR: h CHAR: e } } }
{ 1 V{ { CHAR: e CHAR: l } } }
{ 2 V{ { CHAR: l CHAR: l } } }
{ 3 V{ { CHAR: l CHAR: o } } }
{ 4 V{ { CHAR: o f } } }
}

Generating Names

We can use the make vocabulary to randomly choose the next transition from our table given a previous character, an index, and a transition table:

: next-char, ( prev index assoc -- next )
at swap [ [ nip = ] curry assoc-filter ] when*
random [ first , ] [ second ] bi ;

Generating the names is as easy as starting from zero and adding each character until we hit an f indicating the end of a word.

: generate-name ( seq -- name )
transition-table [
f 0 [
[ pick next-char, ] [ 1 + ] bi over
] loop 3drop
] "" make ;

Generating a number of names is just:

: generate-names ( n seq -- names )
[ generate-name ] curry replicate ;

Try It

So, does it work? Well, if we load a list of Star Trek races, we can generate some new names that sound pretty good!

IN: scratchpad 10 star-trek-races generate-names .
{
"Sooyllan"
"Brakalan"
"Napl"
"Drizi"
"Tevorri"
"Flladian"
"Skadi"
"Cynocak"
"Pamu"
"Patha"
}

The code for this is on my Github.

Thu, 10 May 2012 01:11:00

John Benediktsson: Twin Primes

The most recent programming challenge from Programming Praxis is to:

Pairs of prime numbers that differ by two are known as twin primes: (3,5), (5,7), (11,13), (17,19), (29,31), (41,43), (59,61), (71,73),... Your task is to write a function that finds the twin primes less than a given input n.

Samuel Tardieu has contributed many improvements to Factor's math.primes vocabulary, which we will be using to solve this puzzle.

We can solve this puzzle in the naive method by computing all prime numbers up to a specified input, and then filtering them for pairs that differ by two:

: twin-primes-upto ( n -- seq )
primes-upto 2 <clumps> [ first2 - abs 2 = ] filter ;

Another nice word would check to see whether any two numbers are twin primes (using short-circuit combinators to exit early if any of the conditions are not satisfied):

: twin-primes? ( x y -- ? )
{ [ - abs 2 = ] [ nip prime? ] [ drop prime? ] } 2&& ;

The puzzle page suggests a more efficient method for computing twin primes, which might be worth experimenting with...

Thu, 19 Apr 2012 22:01:00

Chris Double: Travelling to Pitcairn Island

Pitcairn Island Approach

From Sunday 15th April through to 29th April I’ll be mostly offline as I take some leave to visit Pitcairn Island, one of the remotest inhabited islands with a population of about 50 people.

My first stop is flying from New Zealand to Tahiti where I spend a couple of days, then I fly to Mangareva on the 17th to board the Xplore sailing yacht for the approximately two day trip to Pitcairn. I spend a couple of days on the island itself, then return on the Xplore back to Mangareva, followed by flying back to Tahiti for a few more days.

The trip was easy to organise through Pitcairn Travel. Longer trips are available than the one I’m taking but none are scheduled at this time of year. Assuming this trip goes well I hope to go for longer, and maybe to the other Islands in the Pitcairn group, in the future.

Why Pitcairn? Pitcairn is the island that was settled by the Bounty mutineers. My grandmother was born on the island and through her I’m a descendant of three mutineers (Fletcher Christian, John Mills, Ned Young and their Tahitian wives are my sixth great grandparents). I’m looking forward to visiting the Bounty monument in Tahiti and the Bounty Plaque on Pitcairn.

Electricity is available on Pitcairn for about 10 hours per day which limits laptop/gadget usage time. Luckily I plan to spend as much time as possible exploring the island, weather permitting.

I’ll have internet access while in Tahiti but I suspect access to be a bit hit or miss on Pitcairn. In the past internet access was available by sharing satellite internet that was provided by a United States Geologic Survey station on the island. A description of the setup is available here.

Later the Pitcairn Island Government arranged their own satellite internet capability. Recently speeds have been improved to 512 kilobits per second - shared amongst the approximately 50 people on the island. Costs for residents of the island are around $40 per 400MB of usage from what I hear. I would imagine that if someone wanted to regularly access the internet there for work they’d require a dedicated satellite internet connection just for that (Something like Pactel’s VSAT internet maybe). I’ll be sure to do a later post on what using the modern web is like in this part of the world.

Anyone in the area of Tahiti, Mangareva or Pitcairn, let me know, I’d be keen to meet.

Wed, 11 Apr 2012 00:00:00

John Benediktsson: Faster Big Ratios!

While working on a post about approximating pi, I noticed that the performance of Factor's large rational numbers was less than stellar.

Specifically, we defined this estimation function to find pi as a rational number:

:: find-pi-to ( accuracy -- n approx )
1 4 [
over [ 2 * 1 + ] [ odd? [ neg ] when ] bi
4 swap / [ + ] keep
abs accuracy >= [ 1 + ] 2dip
] loop ;

Using this for an accuracy of 0.001 was incredibly slow:

IN: scratchpad [ 0.001 find-pi-to ] time
Running time: 1.692090043 seconds

Bignum

In Factor, large integers are stored as arbitrary-precision "bignums". Similar to other languages such as Python and Scheme, Factor stores these numbers as a sequence of large (either 30-bit or 62-bit) digits, and then performs math on this sequence of digits.

It turns out that most of the ratios has both a bignum numerator and bignum denominator. For most basic math operation on ratios, Factor would compute the greatest common divisor to produce a normalized fraction (e.g., 6/4 would become 3/2).

For an accuracy of 0.001 in this example, the denominator had more than 1,700 digits in base 10. As these numbers grew, the performance of Euclid's GCD algorithm began degrading particularly due to the (relatively) expensive calculation of "mod" for bignum's.

Lehmer GCD

I noticed a Python bug report that suggested addressing a similar performance problem for their rational number implementation (in fractions.py) by implementing Lehmer's GCD algorithm.

After implementing a bignum-gcd primitive that used Lehmer's GCD, I created a fast-gcd word that used this for bignum's and the current gcd word for other real numbers.

Performance improvement was impressive! After recompiling with these patches, our original test case takes less than 10% of the time!

IN: scratchpad [ 0.001 find-pi-to ] time
Running time: 0.156713844 seconds

You can find this in the latest development version of Factor.

Thu, 5 Apr 2012 17:22:00

John Benediktsson: Mega Millions

With the Mega Millions jackpot over $640 million, you might be asking yourself:

How do I use Factor to win the lottery???

You can find a good article on Forbes about calculating your odds of winning. We can also calculate it with Factor. The lottery works like this: you have to choose 5 "main" numbers (1 through 56) and then pick a "mega" number (1 through 46).

We can use the "n choose k" formula in math.combinatorics to compute the number of ways of picking "5 unordered outcomes from 56 numbers and then 1 of 46 possible mega numbers":

IN: scratchpad 56 5 nCk 46 * .
175711536

Sure enough, thats 175,711,536 possibilities. So if you buy one random lottery ticket, you have a 1 in 175,711,536 chance of winning, or a 0.00000000569114597006% chance. Not that great! What are the other odds?

Since the jackpot is so large, it's got to be worth playing, right? In fact, if you take the total jackpot and the odds of each winning category, you can find the expected value of a ticket:

IN: scratchpad {
640000000/175711536
250000/3904701
10000/689065
150/15313
150/13781
7/306
10/844
3/141
2/75
} sum "$%.2f" printf
$3.82

Wow, $3.82 of expected value (not counting sharing the jackpot with someone who picked the same numbers, or the fact you can only win the highest prize you qualify for)! But, how do we pick our numbers... well, Factor to the rescue! Let's randomly sample our 5 main numbers and then pick a random mega number and output the result:

IN: scratchpad : mega ( -- seq )
56 [1,b] 5 sample natural-sort
46 [1,b] random suffix ;

In the spirit of fiver, we can generate these numbers to play:

IN: scratchpad 5 [ mega ] replicate .
{
{ 2 8 25 30 43 29 }
{ 6 15 31 41 47 33 }
{ 11 20 26 29 33 15 }
{ 3 6 19 23 32 15 }
{ 16 25 31 44 50 24 }
}

And tonight we can check the winning numbers and see if it worked!

Fri, 30 Mar 2012 21:41:00

Chris Double: Building and Running Boot To Gecko on the Nexus S

Update 2012-03-29 - The Nexus S port has moved to an ICS base system and the existing Gingerbread base no longer works correctly. I’ve adjusted the instructions below to build the ICS based system. It just involves using ‘config-nexuss-ics’ instead of ‘config-nexus’.

Last year Mozilla announced the Boot to Gecko project - a mobile OS based on web technologies. Recently it was demoed at MWC 2012. Work is being done to improve video playback on B2G using hardware codecs in bug 714408.

I’ve built and run B2G on the emulator before but I wanted to try it out on real hardware to test the video support and play around with the OS. I upgraded my main phone to a Galaxy Note recently leaving a my Nexus S spare for trying different ROMS on it. Support for the Nexus S has started becoming available for B2G (previously the main consumer phone for testing was the Galaxy S II) so I gave it a try. The Nexus S I have is the GSM (non-4G) version. The steps to get the source code:

$ git clone git://github.com/andreasgal/B2G
$ cd B2G
$ make sync

This takes a long time (on New Zealand networks anyway…). Multiple gigabytes of git submodules are cloned.

You’ll want to make sure you have a build environment set up, as per this MDN article so that ‘adb’ and other android tools work. Once done, configure for a Nexus S build:

$ make config-nexuss-ics

This will download binaries for the phone and get you to confirm a bunch of licenses. To build:

$ make gonk
...
$ make
...

The ‘gonk’ make invocation builds the underlying android layer. The following ‘make’ builds gecko and related parts of B2G. Once those are completed you can flash the phone with the result. Note, you do the following at your own risk! You’re flashing your phone, overwriting everything, with experimental, possibly buggy software.

To flash a Nexus S you need to have unlocked the bootloader. If you haven’t done this yet, boot into the bootloader (hold the up volume key down while pressing the power button) and run:

$ make unlock-bootloader

This runs “fastboot oem unlock” which is the command to unlock the Nexus S bootloader. You’ll need to agree to it on the phone. I also installed the CyanogenMod recovery firmware but this is optional. Instructions to do that are here.

To flash, boot the phone into recovery mode (Hold down the up volume key while pressing power, from the menu that appears choose ‘Recovery’), plugin in the USB cable to your PC. Run:

$ make flash-only

‘flash-only’ will flash your B2G build onto the phone. You’ll lose everything on the phone, sorry.

If your Nexus S was running a 2.3 based Android this should just work. If it was running ICS (as mine was) then you might get an error about unsupported baseband and/or bootloader versions. If you get this, I fixed it by editing ‘glue/gonk/device/samsung/crespo/board-info.txt’ to add the versions that my bootloader and baseband had. My file looked like:

require board=herring
require version-bootloader=I9020XXJK1|I9020XXKA3|I9020XXKL1
require version-baseband=I9020XXJK8|I9020XXKB1|...|I9020UCKE1|I9020XXKI1

Note the addition of ‘I9020XXKL1’ and ‘I9020XXKI1’ to the bootloader and baseband lines respectively. After editing I redid “make gonk”, “make” and “make flash-only”.

After ‘flash-only’ your phone will reboot. If it boots into an error box saying “no homescreen found” then do the following while the USB cable is connected and that error is showing:

$ make install-gaia
$ adb reboot

This will install the user files and reboot the phone. B2G should now be running on the device. Enjoy! I’ve found calls, text messages, Wifi and Web Browsing works.

Nexus S running B2G

Mon, 19 Mar 2012 23:00:00

John Benediktsson: Next Permutation

Yesterday, I noticed a programming challenge to find the "next greater permutation of digits" for a given number:

Given a number, find the next higher number that uses the same set of digits. For instance, given the number 38276, the next higher number that uses the digits 2 3 6 7 8 is 38627.

While reading the comments, I noticed that some of the C++ solutions used the std::next_permutation function that returns the "lexicographically next greater permutation of elements". Noticing that the math.combinatorics vocabulary lacks a next-permutation word, I thought it would be nice to contribute one to the Factor standard library.

std::next_permutation()

It's a useful place to start to get an overview and example of how the algorithm works. The C++ version is pretty dense and looks like this:

template<typename Iter>
bool next_permutation(Iter first, Iter last)
{
if (first == last)
return false;
Iter i = first;
++i;
if (i == last)
return false;
i = last;
--i;

for(;;)
{
Iter ii = i;
--i;
if (*i < *ii)
{
Iter j = last;
while (!(*i < *--j))
{}
std::iter_swap(i, j);
std::reverse(ii, last);
return true;
}
if (i == first)
{
std::reverse(first, last);
return false;
}
}
}

Example!

Walking through an example of the algorithm is particularly helpful to understand it:

As an example consider finding the next permutation of:

8342666411

The longest monotonically decreasing tail is 666411, and the corresponding head is 8342.

8342 666411

666411 is, by definition, reverse-ordered, and cannot be increased by permuting its elements. To find the next permutation, we must increase the head; a matter of finding the smallest tail element larger than the head’s final 2.

8342 666411

Walking back from the end of tail, the first element greater than 2 is 4.

8342 666411

Swap the 2 and the 4

8344 666211

Since head has increased, we now have a greater permutation. To reduce to the next permutation, we reverse tail, putting it into increasing order.

8344 112666

Join the head and tail back together. The permutation one greater than 8342666411 is 8344112666.

Implementation

We can use the steps in the example above to help organize our code. First, we find the "cut point" which is the index to the left of the longest monotonic tail:

: cut-point ( seq -- n )
[ last ] keep [ [ > ] keep swap ] find-last drop nip ;

Next, we want to find the smallest element larger than the element at the "cut point" (searching from the end of the sequence):

: greater-from-last ( n seq -- i )
[ nip ] [ nth ] 2bi [ > ] curry find-last drop ;

We then need a way to reverse the tail of a sequence (the ! denotes that this modifies the sequence in-place):

: reverse-tail! ( n seq -- seq )
[ swap 1 + tail-slice reverse! drop ] keep ;

Putting this together gives us a word that looks a bit like our example. I have decided that in the case where the sequence reaches its lexicographic greatest order, we reverse it to its smallest ordering. This allows it to cycle through all possible permutations no matter where you start.

: (next-permutation) ( seq -- seq )
dup cut-point [
swap [ greater-from-last ] 2keep
[ exchange ] [ reverse-tail! nip ] 3bi
] [ reverse! ] if* ;

We wrap this with a simple check to make sure the sequence is not empty. Arguably, we could instead check if the length is greater than 1 since a single element sequence has only possible permutation.

: next-permutation ( seq -- seq )
dup [ ] [ drop (next-permutation) ] if-empty ;

Testing

Factor makes it easy to do unit tests. Here are some of the ones I've used to test this code:

[ "" ] [ "" next-permutation ] unit-test

[ "1" ] [ "1" next-permutation ] unit-test

[ "21" ] [ "12" next-permutation ] unit-test

[ "8344112666" ] [ "8342666411" next-permutation ] unit-test

[ "ABC" "ACB" "BAC" "BCA" "CAB" "CBA" "ABC" ]
[ "ABC" 6 [ dup >string next-permutation ] times ] unit-test

This is available in the latest development branch of Factor.

Bonus

Solving the original programming challenge is as easy as:

IN: scratchpad 38276 number>string next-permutation string>number .
38627

Sat, 3 Mar 2012 05:38:00

Blogroll


planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp.

Syndicate