John Benediktsson: Bit Test
Factor has a bit? generic that is used to test an integer to see if a particular bit is set. When operating on fixnum integers, this is implemented by:

: fixnum-bit? ( x n -- ? )

Many CPU architectures have instructions for performing a bit test. On x86, the BT instruction is available. Below, we are going to implement a compiler intrinsic in the hopes of speeding up this operation in Factor.

Intrinsic

In cpu.architecture, add a generic:

HOOK: %bit-test cpu ( dst src1 src2 temp -- )

In cpu.x86, implement it:

M:: x86 %bit-test ( dst src1 src2 temp -- )

In compiler.cfg.instructions, add an instruction:

FOLDABLE-INSN: ##bit-test

In compiler.codegen, we link the instruction to the intrinsic:

CODEGEN: ##bit-test %bit-test

In compiler.cfg.intrinsics, enable replacing the generic with the intrinsic:

: enable-bit-test ( -- )

Disassemble

The old assembly looked like this:

000000010ea00250: mov [rip-0x1ff256], eax

The new assembly looks like this with BT:

000000010e656ec0: mov [rip-0x293ec6], eax

Performance

It turns out that this speeds things up. We benchmark with:

: bench-fixnum-bit ( x n -- ? )

Our old version was decently fast:

IN: scratchpad gc [ 0b101010101 0 bench-fixnum-bit ] time

But our new version is much faster!

IN: scratchpad gc [ 0b101010101 0 bench-fixnum-bit ] time

This has been committed to the development version and is available now.
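The operation being optimized is simple to state. As a rough sketch in Python (this is the semantics of a bit test, not Factor's implementation):

```python
def bit_test(x, n):
    """Return True if bit n of integer x is set -- the behavior of
    Factor's bit? generic, and of the x86 BT instruction for
    machine-width operands."""
    return (x >> n) & 1 == 1
```

For fixnum-sized operands, a single BT instruction can roughly replace a shift-and-mask sequence, which is where the intrinsic's speedup comes from.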
John Benediktsson: Compressed Sieve of Eratosthenes

In my previous post, we used "2-3-5-7" wheel factorization to produce a faster Sieve of Eratosthenes with a compression factor of 2 (by storing only the odd numbers). Samuel Tardieu reminded me that the sieve variation used by Factor uses less memory, with a higher compression factor of 3.75. I thought it would be a fun implementation to show below.

Version 6

The "2-3-5" wheel contains the numbers among the first 30 that are not divisible by 2, 3, or 5:

CONSTANT: wheel-2-3-5 $[

Note: We use 30 here because the "2-3-5" wheel repeats every 2 × 3 × 5 = 30 numbers.

If you look at the wheel, you see that there are 8 candidates in every 30 numbers:

IN: scratchpad wheel-2-3-5 .

Since there are 8 candidates per 30 numbers, each byte of the sieve can cover a block of 30 numbers (which is where the 30 / 8 = 3.75 compression factor comes from). We build a lookup table of masks:

CONSTANT: masks $[

We can index into this array to find the byte/mask for every number:

: byte-mask ( n -- byte mask/f )

Using the byte/mask, we can check if a number is marked in the sieve byte-array:

:: marked? ( n sieve -- ? )

And we can unmark a number in the sieve in a similar fashion:

:: unmark ( n sieve -- )

Using that, we can unmark multiples of prime numbers:

:: unmark-multiples ( i upper sieve -- )

That's all we need to build our sieve, starting from a byte-array initialized with all numbers marked and then progressively unmarking multiples of primes:

:: sieve6 ( n -- sieve )

As you can see, storing prime numbers up to 10 million takes only 333 KB!

IN: scratchpad 10,000,000 sieve6 length .

This approach is used by the math.primes vocabulary to quickly check primality of any number less than 9 million using only 300 KB of memory. Neat!
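The byte/mask machinery is elided above, so here is a hypothetical reconstruction in Python of what byte-mask and marked? compute, assuming the 8 residues coprime to 30:

```python
# Candidates in every block of 30 numbers: the residues coprime to 30.
WHEEL_2_3_5 = [1, 7, 11, 13, 17, 19, 23, 29]

# Residue mod 30 -> single-bit mask, or None for numbers the wheel skips.
MASKS = [1 << WHEEL_2_3_5.index(r) if r in WHEEL_2_3_5 else None
         for r in range(30)]

def byte_mask(n):
    """Locate n in the sieve: (byte index, bit mask or None if skipped)."""
    return n // 30, MASKS[n % 30]

def marked(n, sieve):
    """True if n is still marked (potentially prime) in the byte-array."""
    byte, mask = byte_mask(n)
    return mask is not None and sieve[byte] & mask != 0
```

One byte covers 30 numbers, so a sieve up to 10,000,000 needs about 10,000,000 / 30 ≈ 333 KB, matching the figure quoted in the post.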
John Benediktsson: Genuine Sieve of Eratosthenes

Iain Gray posted on the mailing list about a paper called the Genuine Sieve of Eratosthenes by Melissa O'Neill, a Professor of Computer Science at Harvey Mudd College. It begins with a discussion of some clever-looking Haskell code that is actually trial division, not the Sieve of Eratosthenes. At the end of the paper is an interesting discussion of performance improvements that I thought might be fun to implement in Factor as a followup to the three versions that I posted about recently.

Version 4

We are going to use wheel factorization as a way of reducing the search space by ignoring some multiples of small prime numbers. It is a bit similar to ignoring all even numbers in our previous "Version 3". In this case, we want to ignore all multiples of the first four prime numbers. To calculate the "2-3-5-7" wheel, we start with:

CONSTANT: wheel-2-3-5-7 $[

Note: We use 210 here because the "2-3-5-7" wheel repeats every 2 × 3 × 5 × 7 = 210 numbers.

Next, we want a way to iterate across all the numbers that might be prime, calling a quotation on each number that is prime:

:: each-prime ( upto sieve quot -- )

For each prime found, we will want to mark all odd multiples as not prime:

:: mark-multiples ( i upto sieve -- )

We will use a bit-array that is sized in 210-bit blocks, so the number of bits needed is:

: sieve-bits ( n -- bits )

Finally, we calculate our sieve:

:: sieve4 ( n -- primes )

Calculating prime numbers up to 10 million is getting even faster:

IN: scratchpad gc [ 10,000,000 sieve4 ] time

Version 5

If you use the compiler.tree.debugger to inspect the optimized output, you can see some dynamic dispatch with branches for fixed-width (fixnum) and arbitrary-precision (bignum) integers. It would be nice to perform fixnum specialization or improve type propagation in the compiler, but for now we are going to write some ugly code to force fixnum math for performance.
We make a fixnum-specialized version of each-prime:

:: each-prime2 ( upto sieve quot -- )

And similarly for mark-multiples:

:: mark-multiples2 ( i upto sieve -- )

And use them in our sieve computation:

:: sieve5 ( n -- primes )

Calculating prime numbers up to 10 million is super fast!

IN: scratchpad gc [ 10,000,000 sieve5 ] time

We can compute all 11,078,937 prime numbers up to 200 million in about a second!

IN: scratchpad gc [ 200,000,000 sieve5 ] time

And all 50,847,534 prime numbers up to 1 billion in about six seconds!

IN: scratchpad gc [ 1,000,000,000 sieve5 ] time

Not quite primegen or primesieve, but not bad for a laptop! This is available in the math.primes.erato.fast vocabulary on our nightly builds!
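The construction of the "2-3-5-7" wheel itself is elided above. As a sketch (in Python rather than Factor), the wheel is just the numbers in one 210-wide cycle that share no factor with 210:

```python
from math import gcd

# Hypothetical reconstruction of the "2-3-5-7" wheel: one full cycle is
# 2 * 3 * 5 * 7 = 210 wide, and the wheel keeps the numbers in that
# cycle that are coprime to 210 (not divisible by 2, 3, 5, or 7).
wheel_2_3_5_7 = [n for n in range(1, 211) if gcd(n, 210) == 1]
```

Only 48 of every 210 numbers survive, so the wheel lets the sieve skip roughly 77% of all candidates before doing any marking.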
John Benediktsson: Sieve of Eratosthenes

I noticed that the Crystal Programming Language homepage has an example of the Sieve of Eratosthenes algorithm for finding prime numbers. I thought it would be fun to show a few simple variations of that algorithm in Factor. The algorithm works by iteratively marking as not prime the multiples of each prime, starting with the multiples of 2. In pseudocode, it looks like this:

Input: an integer n > 1

We will be looking at the performance and memory usage to calculate the 664,579 prime numbers between 1 and 10,000,000.

Version 1

Our first version is a straightforward implementation of the pseudocode:

:: sieve1 ( n -- primes )

We can show that it works for the first few prime numbers:

IN: scratchpad 25 sieve1 .

Calculating prime numbers up to 10 million takes less than half a second:

IN: scratchpad [ 10,000,000 sieve1 ] time

The memory used is basically an array with 10 million elements in it; each element of the array is a boolean (64 bits on my laptop), so the total memory used is 80 MB.

Version 2

Storing booleans can be made more memory-efficient using bit-arrays, where each boolean is stored as a single bit. The logic is the same; however, primes are marked by false rather than true:

:: sieve2 ( n -- primes )

Calculating prime numbers up to 10 million is faster:

IN: scratchpad [ 10,000,000 sieve2 ] time

The memory used is now one bit per number, so only 10 million bits or 1.25 MB.

Version 3

Since we know that even numbers (except 2) are not prime, we only need to store bits for the odd numbers:

:: sieve3 ( n -- sieve )

Note: the convenience of selector is not available because we want to initialize the list of primes with 2 already included.

Calculating prime numbers up to 10 million is much faster!

IN: scratchpad [ 10,000,000 sieve3 ] time

As for memory usage, it is storing one bit for each odd number, so only 5 million bits or 625 KB.
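For reference, here is a minimal Python sketch of the same pseudocode (the approach of Version 1, not the Factor code itself):

```python
def sieve(n):
    """Sieve of Eratosthenes: return all primes up to and including n."""
    prime = [True] * (n + 1)
    prime[0] = prime[1] = False
    p = 2
    while p * p <= n:
        if prime[p]:
            # Multiples below p * p were already unmarked by smaller primes.
            for multiple in range(p * p, n + 1, p):
                prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(prime) if flag]
```

For example, sieve(25) gives [2, 3, 5, 7, 11, 13, 17, 19, 23]. The later versions keep this exact logic and only change how the boolean flags are stored.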
John Benediktsson: SEND + MORE = MONEY?

There's a classic verbal arithmetic puzzle that was published in 1924 by Henry Dudeney, where you solve the equation:

  S E N D
+ M O R E
---------
M O N E Y

Note: Each letter is a unique digit and the numbers may not begin with a zero.

A few days ago, I noticed a blog post with solutions in Perl 6 that references another blog post with a solution in Haskell. I thought I'd show a solution in Factor using the backtrack vocabulary, which provides something like John McCarthy's amb operator.

First, we need a list of available digits, just the numbers 0 through 9:

CONSTANT: digits { 0 1 2 3 4 5 6 7 8 9 }

Next, we turn a sequence of digits into a number (e.g., { 1 2 3 } becomes 123):

: >number ( seq -- n ) 0 [ [ 10 * ] dip + ] reduce ;

We can then implement our solver, by choosing digits while progressively restricting the possible values for future digits using the ones we've chosen so far (using local variables to store the digits):

:: send-more-money ( -- )

Note: We search all solutions using amb-all (even though there is only one). In this case, it is effectively an iterative search, which we could implement without backtracking. If we wanted only the first solution, we could use if-amb.

So, what's the answer? Let's see!

IN: scratchpad send-more-money

Neat! And it's fast too -- solving this puzzle in about 2.5 seconds on my laptop. The code for this is on my GitHub.
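The Factor solver is elided above. As an illustration of the same search (plain brute force over digit assignments rather than amb-style backtracking), a Python sketch:

```python
from itertools import permutations

def send_more_money():
    """Try every assignment of unique digits to the 8 distinct letters
    of SEND + MORE = MONEY, rejecting leading zeros, and return the
    first (and only) assignment where the sum holds."""
    for p in permutations(range(10), 8):
        s, e, n, d, m, o, r, y = p
        if s == 0 or m == 0:
            continue  # numbers may not begin with a zero
        send = 1000 * s + 100 * e + 10 * n + d
        more = 1000 * m + 100 * o + 10 * r + e
        money = 10000 * m + 1000 * o + 100 * n + 10 * e + y
        if send + more == money:
            return send, more, money
```

This finds the well-known unique solution, 9567 + 1085 = 10652. The backtracking version prunes the same search space incrementally instead of enumerating all 1,814,400 permutations.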
John Benediktsson: File Monitor

Factor has a cross-platform file-system change monitor which supports detecting changes to file names, attributes, and contents under a specified directory. There are some minor platform differences between Mac OS X, Windows, and Linux which may be worth looking at if you are building on top of the io.monitors vocabulary.

I was curious about what kind of events are generated for various test cases, so I built a small utility to experiment with it. Some code to monitor for changed paths recursively in a directory and print each one out:

: watch-loop ( monitor -- )

I've committed this as the file-monitor tool (with support for an optional command-line argument to specify which directory to monitor, as well as printing the change descriptors). You can run it very simply:

factor -run=file-monitor [path]

An example session on Linux, monitoring some simple changes to files in
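The io.monitors vocabulary wraps native change-notification facilities rather than polling. Purely as a cross-language illustration of what "report changed paths under a directory" means, here is a naive polling sketch in Python (this is not how Factor implements it):

```python
import os

def snapshot(root):
    """Map every file under root to its modification time."""
    state = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime
            except OSError:
                pass  # file vanished between walk and stat
    return state

def changed_paths(before, after):
    """Paths added, removed, or modified between two snapshots."""
    added_or_removed = set(before) ^ set(after)
    modified = {p for p in before.keys() & after.keys()
                if before[p] != after[p]}
    return sorted(added_or_removed | modified)
```

Native monitors deliver the same information as events pushed by the OS, which is both cheaper and more precise than diffing snapshots.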
John Benediktsson: File Server
Python has a neat feature that lets you serve files from the current directory:

# Python 2

I always thought this was a quick and useful way to share files on a local network. Given that Factor has an HTTP server, we should be able to implement this! We already have support for serving static content and serving CGI scripts, so we can very simply implement a script to create and launch an HTTP server for the current directory (or the one specified on the command line), logging HTTP connections to stdout.

This is available in the file-server vocabulary; now you can:

factor -run=file-server [--cgi] [path]

Currently, this defaults to serving files on port 8080 from all available network interfaces. In the future, it would be nice to add the ability to specify the port and network interfaces to bind.
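The elided snippet is Python's built-in one-liner web server (python -m SimpleHTTPServer in Python 2; python -m http.server in Python 3). Programmatically, the same behavior looks roughly like this in Python 3:

```python
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def serve_directory(directory, port=0):
    """Serve files from `directory` over HTTP in a background thread.
    Port 0 asks the OS for a free port; the chosen address is available
    as server.server_address on the returned server object."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Factor's file-server plays the same role, with the addition of optional CGI support via the --cgi flag.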
John Benediktsson: Burrows-Wheeler Transform

The Burrows–Wheeler transform is a reversible method of rearranging text used to improve the performance of compression algorithms, such as bzip2. We will implement the transform and its inverse below.

Version 1

We start with the pseudocode suggested in the Wikipedia article:

function BWT (string s)

In Python, this might look like:

def bwt(s):

In Factor, using all-rotations, it might look like this:

: bwt ( seq -- seq' )

The pseudocode to perform the inverse transform:

function inverseBWT (string s)

In Python, this might look like:

def ibwt(s):

In Factor, we could implement it like this:

: ibwt ( seq -- seq' )

Unfortunately, this is very slow, with most of the performance loss in the inverse transform.

Version 2

Another way to increase the speed of the BWT inverse is to use an algorithm that returns an index into the sorted rotations along with the transform. In Python, it looks like this:

def bwt(s):

In Factor, it might look like this:

: bwt ( seq -- i seq' )

In Python, the inverse transform looks like this:

def ibwt(k, s):

In Factor, that roughly translates to:

: ibwt ( i seq -- seq' )

An improved version 2 is available in the development version. In particular, it uses rotated virtual sequences for increased performance and returns transformations that match the type of the input sequence.
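The Python bodies are elided above, so here is a self-contained sketch of the Version 2 approach, where the transform returns an index into the sorted rotations instead of relying on sentinel characters:

```python
def bwt(s):
    """Burrows-Wheeler transform: sort all rotations of s, and return
    the index of s among them plus the last column of the sorted
    rotation matrix."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return rotations.index(s), "".join(r[-1] for r in rotations)

def ibwt(k, last):
    """Naive inverse transform: repeatedly prepend the last column to
    the table rows and re-sort; after len(last) rounds, row k of the
    table is the original string."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(c + row for c, row in zip(last, table))
    return table[k]
```

For example, bwt("banana") gives (3, "nnbaaa"): the transform groups similar characters together, which is what helps the compression stages that follow. The naive inverse is quadratic in both time and space, which is why the improved version uses rotated virtual sequences instead of materializing every rotation.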
Blogroll
planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp.