John Benediktsson: Finding Duplicate Numbers
One of my favorite interview questions goes something like this:
Given an array of 1001 elements which contains integers from 1 to 1000 inclusive. The numbers are randomly stored in the array. Only one number repeats itself. The candidate has to come up with an efficient solution for finding that duplicate given that you can access the elements of an array only once, i.e., you can read the elements of the array only once.
I thought I would show some approaches to solving this, using both Python and Factor.
Numbers
First, we need to make a randomized array of our numbers (with one duplicate):
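A minimal Python sketch of this setup step might look like the following. The name `make_numbers` and its single parameter are assumptions made for illustration, not the original code:

```python
import random

def make_numbers(n):
    """Return the integers 1..n plus one randomly chosen duplicate, shuffled.

    Hypothetical helper: the name and signature are assumptions of this sketch.
    """
    numbers = list(range(1, n + 1))
    numbers.append(random.randint(1, n))  # the single repeated value
    random.shuffle(numbers)
    return numbers

numbers = make_numbers(1000)  # 1001 elements, exactly one value appears twice
```

Calling `make_numbers(1000)` gives an array matching the interview question: 1001 elements drawn from 1 to 1000 with one repeat.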
It's worth pointing out that Factor has many "builtin" words that can be helpful for solving this, making the shortest solution:
IN: scratchpad 12 make-numbers duplicates first .
Enumeration
While this solution is not linear, it is often the "obvious" first way to solve this problem. It goes something like this: for each element in the list, check whether it is duplicated by any other element:
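In Python, the quadratic approach could be sketched roughly like this (the function name is an assumption of this sketch):

```python
def find_duplicate_quadratic(seq):
    """Compare every element against all later elements: O(n^2) comparisons."""
    for i, x in enumerate(seq):
        for y in seq[i + 1:]:
            if x == y:
                return x
    return None  # no duplicate present

print(find_duplicate_quadratic([3, 1, 4, 1, 5]))  # prints 1
```

Note that this reads each element many times, so it does not satisfy the "read only once" constraint from the question.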
Sets
Our first "linear" solution uses sets to track which elements we've seen, stopping when we encounter our first duplicate. We will gloss over the extra memory and time required to maintain the set.
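A Python sketch of the set-based idea, assuming the input is any iterable of hashable values (the function name is hypothetical):

```python
def find_duplicate_set(seq):
    """Read each element once, remembering what has been seen so far."""
    seen = set()
    for x in seq:
        if x in seen:
            return x  # first value encountered a second time
        seen.add(x)
    return None  # no duplicate present

print(find_duplicate_set([3, 1, 4, 1, 5]))  # prints 1
```

Each element is read exactly once, at the cost of a set that can grow to hold every distinct value.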
Both versions use a generic hash-set. Knowing that our data consists of numeric values, we can use a bit-set and gain some performance:
: find-duplicate ( seq -- n )
Number Theory
Taking advantage of the fact that our numbers are from 1 to 1000, we can compute the
What do you think? Any other good, fun, or unusual ways to solve this problem? Update: Zeev pointed out in the comments that another way would be to XOR all the values in our sequence and the numbers 1 to 1000:
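As a rough Python sketch of the XOR idea, assuming the sequence holds every integer from 1 to n plus exactly one repeat (so n is the length minus one), and with a hypothetical function name:

```python
from functools import reduce
from operator import xor

def find_duplicate_xor(seq):
    """XOR all values with 1..n; every value cancels except the duplicate."""
    n = len(seq) - 1  # assumption: seq is 1..n plus one repeated value
    return reduce(xor, seq, 0) ^ reduce(xor, range(1, n + 1), 0)

print(find_duplicate_xor([1, 2, 3, 2]))  # prints 2
```

Because `x ^ x == 0`, each number that appears once in the sequence cancels against its counterpart in 1..n, leaving only the repeated value. This needs constant extra memory and reads each element once.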
John Benediktsson: Random Name Generator
In the spirit of the (almost) pure random demon name generator, I wanted to show a random name generator in Factor. Some time ago, I implemented a vocabulary for creating fake data which could generate "fake names". The way it worked was to make a name by picking randomly from a list of valid first and last names. The drawback with that approach is that you can only create names that already exist in your list. It would be more interesting if you could use a list of valid names to "seed" a random name generator which uses that to create names that are similar to your list but do not appear in it.
Transition Tables
To do this, we will create a table of "transitions". A transition is a pair of characters that appear next to each other in a name. In pseudo-code, it will be something like this:
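One way to sketch such a table in Python is shown here. The function names, the choice of a dictionary mapping each character to its list of followers, and the use of `None` as an end-of-name marker are all assumptions of this sketch rather than the Factor design:

```python
from collections import defaultdict

def transitions(name):
    """Return (current, next) character pairs; the last character maps to None."""
    followers = list(name[1:]) + [None]  # None marks the end of the name
    return list(zip(name, followers))

def transition_table(names):
    """Map each character to every character observed to follow it."""
    table = defaultdict(list)
    for name in names:
        for current, following in transitions(name):
            table[current].append(following)
    return dict(table)

print(transition_table(["hello"]))
```

Keeping repeated followers in the list means a later random choice from it is automatically weighted by how often each transition occurred in the seed names.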
We can create a sequence of transitions for a given string by clumping each pair of characters together (as well as showing the last character transitions to
: transitions ( string -- enum )
Given a string and a transition table, we can update it with those transitions:
: (transition-table) ( string assoc -- )
So, given a sequence of strings, we can create the transition table easily:
: transition-table ( seq -- assoc )
You can try it out and it will look something like this:
IN: scratchpad { "hello" } transition-table .
Generating Names
We can use the make vocabulary to randomly choose the next transition from our table given a previous character, an index, and a transition table:
: next-char, ( prev index assoc -- next )
Generating the names is as easy as starting from zero and adding each character until we hit an
: generate-name ( seq -- name )
Generating a number of names is just:
: generate-names ( n seq -- names )
Try It
So, does it work? Well, if we load a list of Star Trek races, we can generate some new names that sound pretty good!
IN: scratchpad 10 star-trek-races generate-names .
The code for this is on my GitHub.
John Benediktsson: Twin Primes
The most recent programming challenge from Programming Praxis is to:
Pairs of prime numbers that differ by two are known as twin primes: (3,5), (5,7), (11,13), (17,19), (29,31), (41,43), (59,61), (71,73),... Your task is to write a function that finds the twin primes less than a given input n.
Samuel Tardieu has contributed many improvements to Factor's math.primes vocabulary, which we will be using to solve this puzzle. We can solve this puzzle naively by computing all prime numbers up to a specified input, and then filtering them for pairs that differ by two:
: twin-primes-upto ( n -- seq )
Another nice word would check to see whether any two numbers are twin primes (using short-circuit combinators to exit early if any of the conditions are not satisfied):
: twin-primes? ( x y -- ? )
The puzzle page suggests a more efficient method for computing twin primes, which might be worth experimenting with...
Chris Double: Travelling to Pitcairn Island
From Sunday 15th April through to 29th April I’ll be mostly offline as I take some leave to visit Pitcairn Island, one of the remotest inhabited islands with a population of about 50 people. My first stop is flying from New Zealand to Tahiti where I spend a couple of days, then I fly to Mangareva on the 17th to board the Xplore sailing yacht for the approximately two day trip to Pitcairn. I spend a couple of days on the island itself, then return on the Xplore back to Mangareva, followed by flying back to Tahiti for a few more days. The trip was easy to organise through Pitcairn Travel. Longer trips are available than the one I’m taking but none are scheduled at this time of year. Assuming this trip goes well I hope to go for longer, and maybe to the other Islands in the Pitcairn group, in the future.
Why Pitcairn? Pitcairn is the island that was settled by the Bounty mutineers. 
My grandmother was born on the island and through her I’m a descendant of three mutineers (Fletcher Christian, John Mills, Ned Young and their Tahitian wives are my sixth great grandparents). I’m looking forward to visiting the Bounty monument in Tahiti and the Bounty Plaque on Pitcairn.
Electricity is available on Pitcairn for about 10 hours per day which limits laptop/gadget usage time. Luckily I plan to spend as much time as possible exploring the island, weather permitting.
I’ll have internet access while in Tahiti but I suspect access to be a bit hit or miss on Pitcairn. In the past internet access was available by sharing satellite internet that was provided by a United States Geological Survey station on the island. A description of the setup is available here. Later the Pitcairn Island Government arranged their own satellite internet capability. Recently speeds have been improved to 512 kilobits per second, shared amongst the approximately 50 people on the island. Costs for residents of the island are around $40 per 400MB of usage from what I hear. I would imagine that if someone wanted to regularly access the internet there for work they’d require a dedicated satellite internet connection just for that (something like Pactel’s VSAT internet maybe).
I’ll be sure to do a later post on what using the modern web is like in this part of the world. Anyone in the area of Tahiti, Mangareva or Pitcairn, let me know, I’d be keen to meet.
John Benediktsson: Faster Big Ratios!
While working on a post about approximating pi, I noticed that the performance of Factor's large rational numbers was less than stellar. Specifically, we defined this estimation function to find
:: find-pi-to ( accuracy -- n approx )
Using this for an accuracy of
IN: scratchpad [ 0.001 find-pi-to ] time
Bignum
In Factor, large integers are stored as arbitrary-precision "bignums". Similar to other languages such as Python and Scheme, Factor stores these numbers as a sequence of large (either 30-bit or 62-bit) digits, and then performs math on this sequence of digits. It turns out that most of the ratios have both a bignum numerator and a bignum denominator. For most basic math operations on ratios, Factor would compute the greatest common divisor to produce a normalized fraction (e.g., 6/4 would become 3/2). For an accuracy of
Lehmer GCD
I noticed a Python bug report that suggested addressing a similar performance problem for their rational number implementation (in
After implementing a bignum-gcd primitive that used Lehmer's GCD, I created a fast-gcd word that used this for bignums and the current gcd word for other real numbers. The performance improvement was impressive! After recompiling with these patches, our original test case takes less than 10% of the time!
IN: scratchpad [ 0.001 find-pi-to ] time
You can find this in the latest development version of Factor.
John Benediktsson: Mega Millions
With the Mega Millions jackpot over $640 million, you might be asking yourself: How do I use Factor to win the lottery??? You can find a good article on Forbes about calculating your odds of winning. We can also calculate it with Factor.
The lottery works like this: you have to choose 5 "main" numbers (1 through 56) and then pick a "mega" number (1 through 46). We can use the "n choose k" formula in math.combinatorics to compute the number of ways of picking "5 unordered outcomes from 56 numbers and then 1 of 46 possible mega numbers":
IN: scratchpad 56 5 nCk 46 * .
Sure enough, that's 175,711,536 possibilities. So if you buy one random lottery ticket, you have a 1 in 175,711,536 chance of winning, or a 0.000000569114597006% chance. Not that great! What are the other odds?
Since the jackpot is so large, it's got to be worth playing, right? In fact, if you take the total jackpot and the odds of each winning category, you can find the expected value of a ticket:
IN: scratchpad {
Wow, $3.82 of expected value (not counting sharing the jackpot with someone who picked the same numbers, or the fact you can only win the highest prize you qualify for)!
But, how do we pick our numbers... well, Factor to the rescue! Let's randomly sample our 5 main numbers and then pick a random mega number and output the result:
IN: scratchpad : mega ( -- seq )
In the spirit of fiver, we can generate these numbers to play:
IN: scratchpad 5 [ mega ] replicate .
And tonight we can check the winning numbers and see if it worked!
Chris Double: Building and Running Boot To Gecko on the Nexus S
Update 2012-03-29 - The Nexus S port has moved to an ICS base system and the existing Gingerbread base no longer works correctly. I’ve adjusted the instructions below to build the ICS based system. It just involves using ‘config-nexuss-ics’ instead of ‘config-nexus’.
Last year Mozilla announced the Boot to Gecko project - a mobile OS based on web technologies. Recently it was demoed at MWC 2012. Work is being done to improve video playback on B2G using hardware codecs in bug 714408. I’ve built and run B2G on the emulator before but I wanted to try it out on real hardware to test the video support and play around with the OS.
I upgraded my main phone to a Galaxy Note recently, leaving my Nexus S spare for trying different ROMs on it. Support for the Nexus S has started becoming available for B2G (previously the main consumer phone for testing was the Galaxy S II) so I gave it a try. The Nexus S I have is the GSM (non-4G) version. The steps to get the source code:
This takes a long time (on New Zealand networks anyway…). Multiple gigabytes of git submodules are cloned. You’ll want to make sure you have a build environment set up, as per this MDN article so that ‘adb’ and other android tools work. Once done, configure for a Nexus S build:
This will download binaries for the phone and get you to confirm a bunch of licenses. To build:
The ‘gonk’ make invocation builds the underlying android layer. The following ‘make’ builds gecko and related parts of B2G. Once those are completed you can flash the phone with the result. Note, you do the following at your own risk! You’re flashing your phone, overwriting everything, with experimental, possibly buggy software. To flash a Nexus S you need to have unlocked the bootloader. If you haven’t done this yet, boot into the bootloader (hold the up volume key down while pressing the power button) and run:
This runs “fastboot oem unlock” which is the command to unlock the Nexus S bootloader. You’ll need to agree to it on the phone. I also installed the CyanogenMod recovery firmware but this is optional. Instructions to do that are here. To flash, boot the phone into recovery mode (hold down the up volume key while pressing power, then choose ‘Recovery’ from the menu that appears) and plug the USB cable into your PC. Run:
‘flash-only’ will flash your B2G build onto the phone. You’ll lose everything on the phone, sorry. If your Nexus S was running a 2.3 based Android this should just work. If it was running ICS (as mine was) then you might get an error about unsupported baseband and/or bootloader versions. If you get this, I fixed it by editing ‘glue/gonk/device/samsung/crespo/board-info.txt’ to add the versions that my bootloader and baseband had. My file looked like:
Note the addition of ‘I9020XXKL1’ and ‘I9020XXKI1’ to the bootloader and baseband lines respectively. After editing I redid “make gonk”, “make” and “make flash-only”. After ‘flash-only’ your phone will reboot. If it boots into an error box saying “no homescreen found” then do the following while the USB cable is connected and that error is showing:
This will install the user files and reboot the phone. B2G should now be running on the device. Enjoy! I’ve found calls, text messages, Wifi and Web Browsing work.
John Benediktsson: Next Permutation
Yesterday, I noticed a programming challenge to find the "next greater permutation of digits" for a given number:
Given a number, find the next higher number that uses the same set of digits. For instance, given the number 38276, the next higher number that uses the digits 2 3 6 7 8 is 38627.
While reading the comments, I noticed that some of the C++ solutions used the std::next_permutation function that returns the "lexicographically next greater permutation of elements". Noticing that the math.combinatorics vocabulary lacks a
std::next_permutation()
It's a useful place to start to get an overview and example of how the algorithm works. The C++ version is pretty dense and looks like this:
template<typename Iter>
Example!
Walking through an example of the algorithm is particularly helpful to understand it:
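As a rough illustration, here is a Python sketch of the standard next-permutation steps, traced on the digits of 38276. This is not the Factor code, and the function name is simply borrowed from the C++ one:

```python
def next_permutation(seq):
    """Return the lexicographically next permutation of seq as a list."""
    items = list(seq)

    # Step 1: walk left past the longest non-increasing tail.
    # For 3 8 2 7 6 the tail is "7 6", so i stops at index 2 (the digit 2).
    i = len(items) - 2
    while i >= 0 and items[i] >= items[i + 1]:
        i -= 1
    if i < 0:
        # Already the greatest ordering: wrap around to the smallest.
        items.reverse()
        return items

    # Step 2: from the end, find the first element greater than items[i].
    # Here that is the digit 6 at index 4.
    j = len(items) - 1
    while items[j] <= items[i]:
        j -= 1

    # Step 3: swap them, giving 3 8 6 7 2.
    items[i], items[j] = items[j], items[i]

    # Step 4: reverse the tail after position i, giving 3 8 6 2 7.
    items[i + 1:] = reversed(items[i + 1:])
    return items

print("".join(next_permutation("38276")))  # prints 38627
```

Like std::next_permutation, this sketch wraps around to the smallest ordering when given the largest one.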
Implementation
We can use the steps in the example above to help organize our code. First, we find the "cut point" which is the index to the left of the longest non-increasing tail:
: cut-point ( seq -- n )
Next, we want to find the smallest element larger than the element at the "cut point" (searching from the end of the sequence):
: greater-from-last ( n seq -- i )
We then need a way to reverse the tail of a sequence (the
: reverse-tail! ( n seq -- seq )
Putting this together gives us a word that looks a bit like our example. I have decided that in the case where the sequence reaches its lexicographically greatest order, we reverse it to its smallest ordering. This allows it to cycle through all possible permutations no matter where you start.
: (next-permutation) ( seq -- seq )
We wrap this with a simple check to make sure the sequence is not empty. Arguably, we could instead check if the length is greater than 1 since a single element sequence has only one possible permutation.
: next-permutation ( seq -- seq )
Testing
Factor makes it easy to do unit tests. Here are some of the ones I've used to test this code:
[ "" ] [ "" next-permutation ] unit-test
This is available in the latest development branch of Factor.
Bonus
Solving the original programming challenge is as easy as:
IN: scratchpad 38276 number>string next-permutation string>number .
planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp.