<?xml version="1.0" encoding="iso-8859-1"?><feed xmlns="http://www.w3.org/2005/Atom"><title>[ planet-factor ]</title><link href="http://planet.factorcode.org"/><entry><title type="html">Daniel Ehrenberg: Interval maps in Factor</title><link href="http://useless-factor.blogspot.com/2008/05/interval-maps-in-factor.html"/><published>2008-05-07T21:50:00.000-07:00</published><content type="html">Recently, I wrote a little library in Factor to get the script of a Unicode code point. It&apos;s in the Factor git repository in the vocab &lt;code&gt;unicode.script&lt;/code&gt;. Initially, I used a relatively simple representation of the data: there was a byte array, where the index was the code point and the elements were bytes corresponding to scripts. (It&apos;s possible to use a byte array because there are only seventy-some scripts to care about.) Lookup consisted of &lt;code&gt;char&gt;num-table nth num&gt;name-table nth&lt;/code&gt;. But this was pretty inefficient. The largest code point (that I wanted to represent here) was something around number 195,000, meaning that the byte array took up almost 200KB. Even if I somehow got rid of that empty space (and I don&apos;t see an obvious way how, without a bunch of overhead), there are 100,000 code points whose script I wanted to encode. &lt;br /&gt;&lt;br /&gt;But we can do better than taking up 100KB. The thing about this data is that scripts are in a bunch of contiguous ranges. That is, two characters that are next to each other in code point order are very likely to have the same script. The &lt;a href=&quot;http://unicode.org/Public/UNIDATA/Scripts.txt&quot;&gt;file&lt;/a&gt; in the Unicode Character Database encoding this information actually uses special syntax to denote a range, rather than write out each one individually. 
So what if we store these intervals directly rather than store each element of the intervals?&lt;br /&gt;&lt;br /&gt;A data structure to hold intervals with O(log n) lookup and insertion has already been developed: interval trees. They&apos;re described in Chapter 14 of &lt;a href=&quot;http://books.google.com/books?id=NLngYyWFl_YC&amp;dq=&amp;pg=PP1&amp;ots=BwOmAE4oG5&amp;sig=EP2XL5q4OCbvCdHfj44WGN8Nhpg&amp;hl=en&amp;sa=X&amp;oi=print&amp;ct=title&amp;cad=one-book-with-thumbnail&quot;&gt;Introduction to Algorithms&lt;/a&gt; starting on page 311, but I won&apos;t describe them here. At first, I tried to implement these, but I realized that, for my purposes, they&apos;re overkill. They&apos;re really easy to get wrong: if you implement them on top of another kind of balanced binary tree, you have to make sure that balancing preserves certain invariants about annotations on the tree. Still, if you need fast insertion and deletion, they make the most sense.&lt;br /&gt;&lt;br /&gt;A much simpler solution is to just have a sorted array of intervals, each associated with a value. The right interval, and then the corresponding value, can be found by simple &lt;a href=&quot;http://en.wikipedia.org/wiki/Binary_search&quot;&gt;binary search&lt;/a&gt;. I don&apos;t even need to know how to do binary search, because it&apos;s already in the Factor library! This is efficient as long as the interval map is constructed all at once, which it is in this case. By a high constant factor, this is also more space-efficient than using binary trees. The whole solution takes less than 30 lines of code.&lt;br /&gt;&lt;br /&gt;(Note: the intervals here are closed and must be disjoint. &amp;lt;=&gt; must be defined on them. They don&apos;t use the intervals in &lt;code&gt;math.intervals&lt;/code&gt;, both to save space and because those would be overkill. 
Interval maps don&apos;t follow the assoc protocol because intervals aren&apos;t discrete, eg floats are acceptable as keys.)&lt;br /&gt;&lt;br /&gt;First, the tuples we&apos;ll be using: an &lt;code&gt;interval-map&lt;/code&gt; is the whole associative structure, containing a single slot for the underlying array.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;TUPLE: interval-map array ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;That array consists of &lt;code&gt;interval-node&lt;/code&gt;s, which have a beginning, end and corresponding value.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;TUPLE: interval-node from to value ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Let&apos;s assume we already have the sorted interval maps. Given a key and an interval map, find-interval will give the index of the interval which might contain the given key.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: find-interval ( key interval-map -- i )&lt;br /&gt;    [ from&gt;&gt; &amp;lt;=&gt; ] binsearch ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;code&gt;interval-contains?&lt;/code&gt; tests if a node contains a given key.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: interval-contains? ( object interval-node -- ? )&lt;br /&gt;    [ from&gt;&gt; ] [ to&gt;&gt; ] bi between? ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Finally, &lt;code&gt;interval-at*&lt;/code&gt; searches an interval map to find a key, finding the correct interval and returning its value only if the interval contains the key.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: fixup-value ( value ? -- value/f ? )&lt;br /&gt;    [ drop f f ] unless* ;&lt;br /&gt;&lt;br /&gt;: interval-at* ( key map -- value ? )&lt;br /&gt;    array&gt;&gt; [ find-interval ] 2keep swapd nth&lt;br /&gt;    [ nip value&gt;&gt; ] [ interval-contains? ] 2bi&lt;br /&gt;    fixup-value ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;A few convenience words, analogous to those for assocs:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: interval-at ( key map -- value ) interval-at* drop ;&lt;br /&gt;: interval-key? ( key map -- ? 
) interval-at* nip ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So, to construct an interval map, there are a few things that have to be done. The input is an abstract specification, consisting of an assoc where the keys are either (1) 2arrays, where the first element is the beginning of an interval and the second is the end, or (2) numbers, representing an interval of the form [a,a]. This can be converted into a form of all (1) with the following:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: all-intervals ( sequence -- intervals )&lt;br /&gt;    [ &gt;r dup number? [ dup 2array ] when r&gt; ] assoc-map&lt;br /&gt;    { } assoc-like ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Once that is done, the objects should be converted to intervals:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: &gt;intervals ( specification -- intervals )&lt;br /&gt;    [ &gt;r first2 r&gt; interval-node boa ] { } assoc&gt;map ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;After that, and after the intervals are sorted, we need to ensure that all intervals are disjoint. For this, we can use the &lt;code&gt;monotonic?&lt;/code&gt; combinator, which checks to make sure that all adjacent pairs in a sequence satisfy a predicate. (This is more useful than it sounds at first.)&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: disjoint? ( node1 node2 -- ? )&lt;br /&gt;    [ to&gt;&gt; ] [ from&gt;&gt; ] bi* &lt; ;&lt;br /&gt;&lt;br /&gt;: ensure-disjoint ( intervals -- intervals )&lt;br /&gt;    dup [ disjoint? 
] monotonic?&lt;br /&gt;    [ &quot;Intervals are not disjoint&quot; throw ] unless ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And, to put it all together, using a tuple array for improved space efficiency:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: &amp;lt;interval-map&gt; ( specification -- map )&lt;br /&gt;    all-intervals [ [ first second ] compare ] sort&lt;br /&gt;    &gt;intervals ensure-disjoint &gt;tuple-array&lt;br /&gt;    interval-map boa ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;All in all, in the case of representing the table of scripts, a table which was previously 200KB is now 20KB. Yay!</content></entry><entry><title type="html">Slava Pestov: I/O changes, and process pipeline support</title><link href="http://factor-language.blogspot.com/2008/05/io-changes-and-process-pipeline-support.html"/><published>2008-05-05T18:02:00.004-04:00</published><content type="html">I made some improvements to the I/O system today.&lt;br /&gt;&lt;h3&gt;Default stream variables&lt;/h3&gt;&lt;br /&gt;The &lt;code&gt;stdio&lt;/code&gt; variable has been replaced by &lt;code&gt;input-stream&lt;/code&gt; and &lt;code&gt;output-stream&lt;/code&gt;, and there are four new words:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;code&gt;with-input-stream&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;with-output-stream&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;with-input-stream*&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;with-output-stream*&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;The first two close the stream after, the latter do not. The &lt;code&gt;with-stream&lt;/code&gt; and &lt;code&gt;with-stream*&lt;/code&gt; words are still around, they expect a duplex stream, unpack it, and bind both variables.&lt;br /&gt;&lt;br /&gt;I&apos;ve changed many usages of &lt;code&gt;with-stream&lt;/code&gt; to use one of the unidirectional variants instead. 
This means that you can now write code like the following:&lt;br /&gt;&lt;pre&gt;&quot;foo.txt&quot; utf8 [&lt;br /&gt;    &quot;blah.txt&quot; utf8 [&lt;br /&gt;        ... read from the first file, write to the second file,&lt;br /&gt;        using read and write ...&lt;br /&gt;    ] with-file-writer&lt;br /&gt;] with-file-reader&lt;/pre&gt;&lt;br /&gt;Before you had to use this really ugly &quot;design pattern&quot;:&lt;br /&gt;&lt;pre&gt;&quot;foo.txt&quot; utf8 &amp;lt;file-reader&gt; [&lt;br /&gt;    &quot;blah.txt&quot; utf8 &amp;lt;file-writer&gt; [&lt;br /&gt;        &amp;lt;duplex-stream&gt; [&lt;br /&gt;            ...&lt;br /&gt;        ] with-stream&lt;br /&gt;    ] with-disposal&lt;br /&gt;] with-disposal&lt;/pre&gt;&lt;br /&gt;Speaking of duplex streams, because they&apos;re not used by anything in the core anymore I have moved them to extra. They are still used by &lt;code&gt;&amp;lt;process-stream&gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;client&gt;&lt;/code&gt;. I added a &lt;code&gt;&amp;lt;process-reader&gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;process-writer&gt;&lt;/code&gt; word for those cases where you only want a pipe in one direction; they express intention better.&lt;br /&gt;&lt;h3&gt;Pipes&lt;/h3&gt;&lt;br /&gt;The &lt;code&gt;&amp;lt;process-stream&gt;&lt;/code&gt; word has been around for a while, and this word used pipes internally, but they were not exposed in a nice, cross-platform way, until now.&lt;br /&gt;&lt;br /&gt;The &lt;code&gt;io.pipes&lt;/code&gt; vocabulary contains two words:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;code&gt;&amp;lt;pipe&gt;&lt;/code&gt; creates a new pipe and wraps a pair of streams around it. The streams are packaged into a single duplex stream; any data written to the stream can be read back from the same stream (presumably, in a different thread). 
This is actually implemented with native pipes on Unix and Windows.&lt;/li&gt;&lt;li&gt;The &lt;code&gt;run-pipeline&lt;/code&gt; word takes a sequence of quotations or launch descriptors, and runs them all in parallel with input wired up as if it were a Unix shell pipe. For example,&lt;br /&gt;&lt;pre&gt;{ &quot;cat foo.txt&quot; &quot;grep x&quot; &quot;sort&quot; &quot;uniq&quot; } run-pipeline&lt;/pre&gt;&lt;br /&gt;corresponds to the following shell command:&lt;br /&gt;&lt;pre&gt;cat foo.txt | grep x | sort | uniq&lt;/pre&gt;&lt;br /&gt;In addition, being able to place process objects and quotations in the pipeline gives you a lot of expressive power.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h3&gt;Appending process output to files&lt;/h3&gt;&lt;br /&gt;The &lt;code&gt;io.launcher&lt;/code&gt; vocabulary supports the full array of input and output redirection features, and now, pipelines. There was one missing component: redirecting process output to a file opened for appending. Now this is possible. The following Factor code:&lt;br /&gt;&lt;pre&gt;&amp;lt;process&gt;&lt;br /&gt;    &quot;do-stuff&quot; &gt;&gt;command&lt;br /&gt;    &quot;log.txt&quot; &amp;lt;appender&gt; &gt;&gt;stderr&lt;br /&gt;run-process&lt;/pre&gt;&lt;br /&gt;corresponds to this shell command:&lt;br /&gt;&lt;pre&gt;do-stuff 2&gt;&gt; log.txt&lt;/pre&gt;&lt;br /&gt;Of course, shell script is a DSL for launching processes so it is more concise than Factor. However, Factor&apos;s &lt;code&gt;io.launcher&lt;/code&gt; library now supports all of the features that the shell does, and it&apos;s pretty easy to build a shell command parser using &lt;a href=&quot;http://bluishcoder.co.nz&quot;&gt;Chris Double&apos;s&lt;/a&gt; PEG library, which translates shell commands to sequences of Factor process descriptors in a pipeline.&lt;br /&gt;&lt;br /&gt;And now, I present a concise illustration of the difference between the Unix philosophy and the Windows philosophy. 
Here we have two pieces of code, which do the exact same thing: create a new pipe, open both ends, return a pair of handles.&lt;br /&gt;&lt;br /&gt;Unix:&lt;br /&gt;&lt;pre&gt;USING: system alien.c-types kernel unix math sequences&lt;br /&gt;qualified io.unix.backend io.nonblocking ;&lt;br /&gt;IN: io.unix.pipes&lt;br /&gt;QUALIFIED: io.pipes&lt;br /&gt;&lt;br /&gt;M: unix io.pipes:(pipe) ( -- pair )&lt;br /&gt;    2 &quot;int&quot; &amp;lt;c-array&gt;&lt;br /&gt;    dup pipe io-error&lt;br /&gt;    2 c-int-array&gt; first2&lt;br /&gt;    [ [ init-handle ] bi@ ] [ io.pipes:pipe boa ] 2bi ;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Windows:&lt;br /&gt;&lt;pre&gt;USING: alien alien.c-types arrays destructors io io.windows libc&lt;br /&gt;windows.types math.bitfields windows.kernel32 windows namespaces&lt;br /&gt;kernel sequences windows.errors assocs math.parser system random&lt;br /&gt;combinators accessors io.pipes io.nonblocking ;&lt;br /&gt;IN: io.windows.nt.pipes&lt;br /&gt;&lt;br /&gt;: create-named-pipe ( name -- handle )&lt;br /&gt;    { PIPE_ACCESS_INBOUND FILE_FLAG_OVERLAPPED } flags&lt;br /&gt;    PIPE_TYPE_BYTE&lt;br /&gt;    1&lt;br /&gt;    4096&lt;br /&gt;    4096&lt;br /&gt;    0&lt;br /&gt;    security-attributes-inherit&lt;br /&gt;    CreateNamedPipe&lt;br /&gt;    dup win32-error=0/f&lt;br /&gt;    dup add-completion&lt;br /&gt;    f &amp;lt;win32-file&gt; ;&lt;br /&gt;&lt;br /&gt;: open-other-end ( name -- handle )&lt;br /&gt;    GENERIC_WRITE&lt;br /&gt;    { FILE_SHARE_READ FILE_SHARE_WRITE } flags&lt;br /&gt;    security-attributes-inherit&lt;br /&gt;    OPEN_EXISTING&lt;br /&gt;    FILE_FLAG_OVERLAPPED&lt;br /&gt;    f&lt;br /&gt;    CreateFile&lt;br /&gt;    dup win32-error=0/f&lt;br /&gt;    dup add-completion&lt;br /&gt;    f &amp;lt;win32-file&gt; ;&lt;br /&gt;&lt;br /&gt;: unique-pipe-name ( -- string )&lt;br /&gt;    [&lt;br /&gt;        &quot;\\\\.\\pipe\\factor-&quot; %&lt;br /&gt;        pipe counter #&lt;br /&gt;        &quot;-&quot; 
%&lt;br /&gt;        32 random-bits #&lt;br /&gt;        &quot;-&quot; %&lt;br /&gt;        millis #&lt;br /&gt;    ] &quot;&quot; make ;&lt;br /&gt;&lt;br /&gt;M: winnt (pipe) ( -- pipe )&lt;br /&gt;    [&lt;br /&gt;        unique-pipe-name&lt;br /&gt;        [ create-named-pipe dup close-later ]&lt;br /&gt;        [ open-other-end dup close-later ]&lt;br /&gt;        bi pipe boa&lt;br /&gt;    ] with-destructors ;&lt;/pre&gt;&lt;br /&gt;The Windows API makes things difficult for no reason. Anonymous pipes do not support overlapped I/O, so you have to open a named pipe with a randomly-generated name (I&apos;m not making this up, many other frameworks do the same thing and Microsoft even recommends this approach).&lt;br /&gt;&lt;br /&gt;On the flipside, the nice thing about supporting both Unix and Windows is that I get to come up with true high-level abstractions that make sense, instead of being able to get away with having thin wrappers over Unix system calls as so many other language implementations do. For example, Ocaml&apos;s idea of &quot;high-level I/O&quot; is some basic &lt;a href=&quot;http://caml.inria.fr/pub/docs/manual-ocaml/libref/Unix.html&quot;&gt;POSIX bindings&lt;/a&gt;, together with incomplete emulation of Unix semantics on Windows. Look forward to writing a page of code if you want to map a file into memory or run a process and read its output into a string. And of course Java doesn&apos;t support pipes, I/O redirection for launching processes, or file system change monitoring, at all.</content></entry><entry><title type="html">Daniel Ehrenberg: A couple GC algorithms in more detail</title><link href="http://useless-factor.blogspot.com/2008/05/couple-gc-algorithms-in-more-detail.html"/><published>2008-05-03T00:51:00.000-07:00</published><content type="html">In previous posts on garbage collection, I&apos;ve given a pretty cursory overview as to how things actually work. 
In this post, I hope to give a somewhat more specific explanation of two incremental (and potentially concurrent or parallel, but we&apos;ll ignore that for now) GC algorithms: Yuasa&apos;s snapshot-at-the-beginning incremental mark-sweep collector, and the MC&lt;sup&gt;2&lt;/sup&gt; algorithm. Yuasa&apos;s collector is very widely used, for example in Java 5 when an incremental collector is requested. MC&lt;sup&gt;2&lt;/sup&gt; is a more recent algorithm designed to reduce the fragmentation that mark-sweep creates, and appears to get great performance, though it isn&apos;t used much yet. In their practical implementation, both collectors are generational.&lt;br /&gt;&lt;h3&gt;Yuasa&apos;s mark-sweep collector&lt;/h3&gt;&lt;br /&gt;The idea is pretty simple: take a mark-sweep collector and split up the work, doing a little bit on each allocation. When the heap occupancy passes a certain threshold, say 80%, switch into &quot;mark phase&quot;, and on each allocation, mark the right amount of the heap so that everything&apos;s marked by the time the heap is full. (You can ensure this by making the amount of marking proportional to the amount of memory allocated.) Then, switch into sweep phase, and on each allocation sweep the heap by a certain amount. If a big object is allocated, sweeping continues until there&apos;s enough room. Once sweeping is done, the collector returns to a neutral state and allocation takes place without any special collection actions until the free space dips below the threshold.&lt;br /&gt;&lt;h4&gt;Making this work&lt;/h4&gt;&lt;br /&gt;This is a neat little way to specify a GC algorithm. The implementor has three knobs at their disposal: the threshold to begin collection, the speed of marking, and the speed of sweeping. But there&apos;s a problem: the algorithm, as I described it, doesn&apos;t work. See, the graph of interconnections in the heap may change during the course of marking, and that&apos;s a problem. 
As I described &lt;a href=&quot;http://useless-factor.blogspot.com/2008/03/some-more-advanced-gc-techniques.html&quot;&gt;in a previous post&lt;/a&gt;, if a pointer gets moved to another location, the object it points to might evade marking and get swept, causing memory corruption.&lt;br /&gt;&lt;br /&gt;In a snapshot-at-the-beginning incremental marking GC, the technique to fix this is to trap all pointer writes and execute a little bit of code: if the collector is in the marking phase, and if the old pointer value isn&apos;t marked, it needs to get marked and get pushed on the marking stack so that its children get marked. (The marking stack is the explicit stack used for depth-first traversal of the heap, to mark everything it reaches.) This piece of code is called the write barrier, and it goes on in addition to the generational write barrier, if one is necessary.&lt;br /&gt;&lt;h4&gt;Conservativeness&lt;/h4&gt;&lt;br /&gt;One more thing: objects allocated during a GC cycle are allocated as marked. This means that they can&apos;t be collected until the next time around. Unfortunately, this means that any generational GC will be ineffective while marking is going on: everything is effectively allocated in the oldest generation. Nevertheless, generations still provide a significant performance advantage, since most time is spent in the neutral non-GC state.&lt;br /&gt;&lt;br /&gt;This is called snapshot-at-the-beginning not because an actual snapshot is made, but because everything is saved that had something referring to it at the beginning of the marking phase. (Everything that gets a reference to it during the cycle is also saved.) Of all incremental mark-sweep GC algorithms, a snapshot-at-the-beginning collector is the most conservative, causing the most floating garbage to lie around and wait, uncollected, until the next cycle. 
Other algorithms have techniques to avoid this, but it often comes at other costs.&lt;br /&gt;&lt;h3&gt;MC&lt;sup&gt;2&lt;/sup&gt;&lt;/h3&gt;&lt;br /&gt;Unfortunately, no matter what strategy is used to minimize fragmentation, there is a program which will cause bad fragmentation of the heap, making it less usable and allocation more expensive. For this reason, a compaction strategy is helpful, and the MC&lt;sup&gt;2&lt;/sup&gt; algorithm (Memory-Constrained Compaction), created by Narendran Sachindran, provides one within an incremental and generational system. The details are somewhat complicated, and in this blog post I&apos;ll offer a simplified view. You can also look at the &lt;a href=&quot;http://www.cs.umass.edu/~emery/pubs/04-15.pdf&quot;&gt;full paper&lt;/a&gt;.&lt;br /&gt;&lt;h4&gt;MC&lt;/h4&gt;&lt;br /&gt;The idea is based on the Mark-Copy (MC) algorithm. The heap is divided up into a number of equally sized windows, say 40. One of these is the nursery, and the others act as tenured space. (I don&apos;t know why, but the papers about this seem to use a two-generation rather than three-generation model. I think it could easily be updated to use three generations, but I&apos;ll stick with this for now.) Each window has a logical number, with the nursery having the highest number.&lt;br /&gt;&lt;br /&gt;Nursery collections go on as I&apos;ve described in &lt;a href=&quot;http://useless-factor.blogspot.com/2008/03/little-more-about-garbage-collection.html&quot;&gt;a previous post&lt;/a&gt;. A tenured space collection is triggered when there is only one (non-nursery) window left free. At this point, the heap is fully marked. During marking, remembered sets of pointers into each window are made. In turn, each window is copied (using Cheney&apos;s copying collector) to the open space that exists, starting in the one free window. The remembered sets can be used to update pointers that go to things that were moved. 
If the lowest number window is copied first, the remembered sets only need to contain pointers from higher windows to lower windows.&lt;br /&gt;&lt;h4&gt;New modifications&lt;/h4&gt;&lt;br /&gt;MC&lt;sup&gt;2&lt;/sup&gt; adds a few things to this, to make the algorithm incremental and give low upper bounds on space overhead. The first change is that incremental marking is done. This is similar to the incremental snapshot-at-the-beginning marker described above, though the creators of MC&lt;sup&gt;2&lt;/sup&gt; opted for a version called incremental update, which is less conservative and more complicated but equally sound. The next change is in the copying technique. If a window is determined to have high occupancy (like more than 95%), it is left as it is without copying. Otherwise, windows are collected into groups whose remaining data can fit into one window. Those groups are incrementally copied into a new window.&lt;br /&gt;&lt;br /&gt;Other changes make sure that the space overhead is bounded. The size of remembered sets is limited by switching to a card marking system in the event of an overflow. Objects with many references to them are put in semi-permanent storage in the lowest possible window number, minimizing the size of remembered set that they need.&lt;br /&gt;&lt;br /&gt;In a benchmark included in the MC&lt;sup&gt;2&lt;/sup&gt; paper, it is demonstrated that MC&lt;sup&gt;2&lt;/sup&gt; has the same or slightly better performance compared to &lt;em&gt;non-incremental&lt;/em&gt; generational mark-sweep or generational mark-compact, the alternatives for the domain of memory-constrained systems. 
Pauses more than 30ms are rare, and performance appears to be consistent over a wide range of Java programs.</content></entry><entry><title type="html">Slava Pestov: USA Zip code database</title><link href="http://factor-language.blogspot.com/2008/04/usa-zip-code-database.html"/><published>2008-04-29T21:45:00.004-04:00</published><content type="html">Recently someone posted a &lt;a href=&quot;http://mappinghacks.com/2008/04/28/civicspace-zip-code-database/&quot;&gt;freely available Zip code database&lt;/a&gt; on reddit. The database is in CSV format.&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://www.phildawes.net/blog/&quot;&gt;Phil Dawes&lt;/a&gt; contributed a CSV parser to Factor a few days ago, and I thought this would be the perfect use-case to test out the parser.&lt;br /&gt;&lt;br /&gt;I added the library to &lt;code&gt;extra/usa-cities&lt;/code&gt;. It begins with the usual boilerplate:&lt;br /&gt;&lt;pre&gt;USING: io.files io.encodings.ascii sequences sequences.lib&lt;br /&gt;math.parser combinators kernel memoize csv symbols inspector&lt;br /&gt;words accessors math.order sorting ;&lt;br /&gt;IN: usa-cities&lt;/pre&gt;&lt;br /&gt;Then, we define some singleton types for the various states of the union. 
While this isn&apos;t strictly necessary, it allows us to write generic words which dispatch on states; for example, I&apos;m sure Doug&apos;s &lt;code&gt;taxes&lt;/code&gt; library could use this:&lt;br /&gt;&lt;pre&gt;SINGLETONS: AK AL AR AS AZ CA CO CT DC DE FL GA HI IA ID IL IN&lt;br /&gt;KS KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK&lt;br /&gt;OR PA PR RI SC SD TN TX UT VA VI VT WA WI WV WY ;&lt;br /&gt;&lt;br /&gt;: states ( -- seq )&lt;br /&gt;    {&lt;br /&gt;        AK AL AR AS AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY&lt;br /&gt;        LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK&lt;br /&gt;        OR PA PR RI SC SD TN TX UT VA VI VT WA WI WV WY&lt;br /&gt;    } ; inline&lt;br /&gt;&lt;br /&gt;ERROR: no-such-state name ;&lt;br /&gt;&lt;br /&gt;M: no-such-state summary drop &quot;No such state&quot; ;&lt;br /&gt;&lt;br /&gt;MEMO: string&gt;state ( string -- state )&lt;br /&gt;    dup states [ word-name = ] with find nip&lt;br /&gt;    [ ] [ no-such-state ] ?if ;&lt;/pre&gt;&lt;br /&gt;Next up, we define a data type storing rows from the CSV database:&lt;br /&gt;&lt;pre&gt;TUPLE: city&lt;br /&gt;first-zip name state latitude longitude gmt-offset dst-offset ;&lt;/pre&gt;&lt;br /&gt;Now a word which reads the database, parses it as CSV, and then parses each column into a specific data type:&lt;br /&gt;&lt;pre&gt;MEMO: cities ( -- seq )&lt;br /&gt;    &quot;resource:extra/usa-cities/zipcode.csv&quot; ascii &amp;lt;file-reader&gt;&lt;br /&gt;    csv rest-slice [&lt;br /&gt;        7 firstn {&lt;br /&gt;            [ string&gt;number ]&lt;br /&gt;            [ ]&lt;br /&gt;            [ string&gt;state ]&lt;br /&gt;            [ string&gt;number ]&lt;br /&gt;            [ string&gt;number ]&lt;br /&gt;            [ string&gt;number ]&lt;br /&gt;            [ string&gt;number ]&lt;br /&gt;        } spread city boa&lt;br /&gt;    ] map ;&lt;/pre&gt;&lt;br /&gt;This word is tricky; some notes on its workings:&lt;br 
/&gt;&lt;ul&gt;&lt;li&gt;We begin by opening a stream for reading from the CSV file with ASCII encoding.&lt;/li&gt;&lt;li&gt;The &lt;code&gt;csv&lt;/code&gt; word reads CSV data from a stream.&lt;/li&gt;&lt;li&gt;The first line of the file consists of column headings and not actual data, so we ignore it by using the non-copying variant of &lt;code&gt;rest&lt;/code&gt;, &lt;code&gt;rest-slice&lt;/code&gt; (recall that the primary sequence type in Factor is an array, so removing the first element makes a copy; a slice is a virtual sequence presenting a view of a subsequence of an array).&lt;/li&gt;&lt;li&gt;The &lt;code&gt;spread&lt;/code&gt; combinator takes a sequence of quotations, &lt;code&gt;Q1,...,QN&lt;/code&gt;, and &lt;i&gt;n&lt;/i&gt; values from the stack, &lt;code&gt;X1,...,XN&lt;/code&gt;, and outputs &lt;code&gt;Q1[X1],...,QN[XN]&lt;/code&gt;. In this case we&apos;re taking the first 7 elements of each row of CSV data (each row has exactly 7 columns so in effect we&apos;re pushing every element of the sequence on the stack), then using &lt;code&gt;spread&lt;/code&gt; to convert some columns from their initial text format into something more useful to us: a state singleton or a number.&lt;/li&gt;&lt;li&gt;Next, we use &lt;code&gt;city boa&lt;/code&gt; to construct a city tuple &quot;by order of arguments&quot;; this slurps the 7 stack values and stores them into a new instance of &lt;code&gt;city&lt;/code&gt; (note that the definition of the &lt;code&gt;city&lt;/code&gt; type has exactly 7 slots and they are defined in the same order as the columns of the file).&lt;/li&gt;&lt;li&gt;Finally, we &lt;code&gt;map&lt;/code&gt; over the sequence of rows to perform the above steps on each row of the file. 
The result is a sequence of &lt;code&gt;city&lt;/code&gt; instances.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;The word is memoized so of course it will only load the database once.&lt;br /&gt;&lt;br /&gt;We can now define words to query it:&lt;br /&gt;&lt;pre&gt;MEMO: cities-named ( name -- cities )&lt;br /&gt;    cities [ name&gt;&gt; = ] with filter ;&lt;br /&gt;&lt;br /&gt;MEMO: cities-named-in ( name state -- cities )&lt;br /&gt;    cities [&lt;br /&gt;        tuck [ name&gt;&gt; = ] [ state&gt;&gt; = ] 2bi* and&lt;br /&gt;    ] with with filter ;&lt;br /&gt;&lt;br /&gt;: find-zip-code ( code -- city )&lt;br /&gt;    cities [ first-zip&gt;&gt; &amp;lt;=&gt; ] binsearch* ;&lt;/pre&gt;&lt;br /&gt;Now, let&apos;s look at some examples.&lt;br /&gt;&lt;br /&gt;First, let&apos;s look up my Zip code:&lt;br /&gt;&lt;pre&gt;( scratchpad ) 55406 find-zip-code .&lt;br /&gt;T{ city f 55406 &quot;Minneapolis&quot; MN 44.938615 -93.22082 -6 1 }&lt;/pre&gt;&lt;br /&gt;And another famous Zip code:&lt;br /&gt;&lt;pre&gt;( scratchpad ) 90210 find-zip-code .&lt;br /&gt;T{ city f 90210 &quot;Beverly Hills&quot; CA 34.088808 -118.40612 -8 1 }&lt;/pre&gt;&lt;br /&gt;How many states have a city named &quot;Minneapolis&quot;?&lt;br /&gt;&lt;pre&gt;( scratchpad ) &quot;Minneapolis&quot; cities-named [ state&gt;&gt; ] map prune .&lt;br /&gt;V{ NC MN KS }&lt;/pre&gt;&lt;br /&gt;What is the possible range of Zip codes for Austin?&lt;br /&gt;&lt;pre&gt;( scratchpad ) &quot;Austin&quot; TX cities-named-in [ first-zip&gt;&gt; ] map [ infimum . ] [ supremum . ] bi&lt;br /&gt;73301&lt;br /&gt;78972&lt;/pre&gt;&lt;br /&gt;There are many possible applications for this library, including form validation in web apps. 
It could be extended further: if the database were loaded into a quadtree sorted by latitude/longitude, you could perform queries such as finding all towns within 50 miles of a given city.</content></entry><entry><title type="html">Slava Pestov: An addendum to &quot;The new HTTP server, part 2&quot;</title><link href="http://factor-language.blogspot.com/2008/04/addendum-to-new-http-server-part-2.html"/><published>2008-04-29T17:45:00.003-04:00</published><content type="html">If you run the web app presented in the last blog post verbatim, you will get a &quot;500 Internal server error&quot; with no further indication of what&apos;s going wrong. This is because the code I presented has a minor omission.&lt;br /&gt;&lt;br /&gt;The opaque error message is intentional: if your web app crashes, you don&apos;t necessarily want to expose internal details to every user that comes along (one famous case was &lt;a href=&quot;http://reddit.com&quot;&gt;reddit.com&lt;/a&gt;, which leaked a portion of their Python codebase inside a stack trace at some point). However, if you set the &lt;code&gt;development-mode&lt;/code&gt; global variable to a true value, the behavior of the HTTP server changes in two respects:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;If an error occurs, the error page contains the error message as well as the full stack trace.&lt;/li&gt;&lt;li&gt;Every request begins by calling &lt;code&gt;refresh-all&lt;/code&gt;, so interactive testing of web app changes becomes very straightforward.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;If we enable development mode, we see the real error message, &quot;No such table: SESSIONS&quot;. 
This is because I didn&apos;t mention that one must initialize the database by creating the table for storing sessions first:&lt;br /&gt;&lt;pre&gt;&quot;counter.db&quot; sqlite-db [ init-sessions-table ] with-db&lt;/pre&gt;&lt;br /&gt;In the next installment of this series, which hopefully won&apos;t take as long as the second one did, I will discuss form validation and the templating system.</content></entry><entry><title type="html">Slava Pestov: Write once, run anywhere</title><link href="http://factor-language.blogspot.com/2008/04/write-once-run-anywhere.html"/><published>2008-04-29T17:25:00.003-04:00</published><content type="html">Apple finally released &lt;a href=&quot;http://docs.info.apple.com/article.html?artnum=307403&quot;&gt;Java 6 for Mac OS X&lt;/a&gt;, about a year late, and it only runs on 64-bit Intel Macs. It also requires Leopard, which isn&apos;t such a big deal because anyone can upgrade. I sold my G5 but a lot of people out there still use PowerPC Macs and I guess they&apos;re just screwed. It&apos;s a shame too because Java 6 is the first release that&apos;s starting to approach true usability for desktop applications.</content></entry><entry><title type="html">Daniel Ehrenberg: Potential ideas to explore</title><link href="http://useless-factor.blogspot.com/2008/04/potential-ideas-to-explore.html"/><published>2008-04-29T15:48:00.000-07:00</published><content type="html">I haven&apos;t written in a while, and it&apos;s a little hard to get started back up, so here are just a bunch of random ideas in my head that I&apos;d like to share with you guys. Sorry if it&apos;s a little incoherent...&lt;br /&gt;&lt;h3&gt;Possible extensions to Inverse&lt;/h3&gt;I&apos;ve been thinking about possible ways to generalize my system for concatenative pattern matching, currently in &lt;code&gt;extra/inverse&lt;/code&gt;. 
There are two ways to go about it: making a more general constraint solving system, and giving access to the old input when inverting something, as in the Harmony project. A third way is to add backtracking (in a different place than constraint solving would put it). To someone familiar with Inverse, these might seem like they&apos;re coming from nowhere, but they&apos;re actually very closely related. (To someone not familiar with it, see &lt;a href=&quot;http://useless-factor.blogspot.com/2007/06/concatenative-pattern-matching.html&quot;&gt;my previous blog post describing Inverse&lt;/a&gt;.)&lt;br /&gt;&lt;h4&gt;Constraint solving&lt;/h4&gt;The idea of resolving constraints is to figure out as much as you can about a situation given certain facts. This is easy in some cases, but impossible in others, even if enough facts are known to, potentially, figure out what everything is. For example, Diophantine equations can be solved by a fully general constraint-solving system, but they&apos;re known to be undecidable in general.&lt;br /&gt;&lt;br /&gt;So what can constraint solving get you in Inverse? Well, imagine an inverse to &lt;code&gt;bi&lt;/code&gt;. It&apos;s not difficult to make one within the current framework, but some information is lost: everything must be completely determined. Think about inverting &lt;code&gt;[ first ] [ second ] bi&lt;/code&gt;. Inverting this should get the same result as &lt;code&gt;first2&lt;/code&gt; (which has a hard-coded inverse right now, inverting to &lt;code&gt;2array&lt;/code&gt;). But it won&apos;t work.&lt;br /&gt;&lt;br /&gt;A way for &lt;code&gt;[ first ] [ second ] bi&lt;/code&gt; to work would be using the following steps:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Initialize a logic variable X as unbound&lt;/li&gt;&lt;li&gt;Unify X with the information, &quot;the first element is what&apos;s second from the top of the stack (at runtime)&quot;. 
Now it&apos;s known that X is a sequence of length at least 1.&lt;/li&gt;&lt;li&gt;Unify X with the information, &quot;the second element is what&apos;s on the top of the stack (at runtime)&quot;. Now it&apos;s known that X is a sequence of length at least two.&lt;/li&gt;&lt;li&gt;From the information we have about X, produce a canonical representation, since the inverted quotation is over: an array of the minimum possible length.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;This isn&apos;t easy to do in general, but it should be possible, in theory. It&apos;d be extremely cool if it worked out.&lt;br /&gt;&lt;br /&gt;Formally, you can think of Inverse as already a reasonable constraint solving system, for a limited problem domain. Given [ f ], and the statement about stacks A and B that f(A) = B, and given B, find a possible value for A. The strategy used right now is mathematically sound, and I hope to write it up some day. But a more general use of logic variables is possible: explicit logic variables in code. This could be used to make a better-integrated logic language in Factor.&lt;br /&gt;&lt;h4&gt;The Harmony Project&lt;/h4&gt;&lt;br /&gt;The &lt;a href=&quot;http://www.seas.upenn.edu/~harmony/&quot;&gt;Harmony Project&lt;/a&gt;, led by Benjamin C. Pierce, is an attempt to solve the &quot;view-update problem&quot; using a new programming language and type system which is largely invertible. The view-update problem is that we want to convert different storage formats into an abstract representation, manipulate that representation and put it back without duplicating code about the representation. 
Everything operates on edge-labeled trees.&lt;br /&gt;&lt;br /&gt;Within the Harmony framework, it&apos;s possible to do all your work in bijections (one-to-one onto functions, similar but not identical to the domain of Inverse right now), but there&apos;s extra power included: the function to put the abstract representation back into the original form has access to the original. This adds a huge amount of power, giving the possibility of conditionals and recursion, in limited cases. Also, it gives the power to ignore certain things about the surface structure when looking at the abstract form. (Harmony also has ideas about tree merging, and of course a new type system, but I&apos;m not as interested in that right now.)&lt;br /&gt;&lt;br /&gt;So far, only relatively trivial things have been made with Harmony, but the idea looks really useful, though there are two problems: (1) I don&apos;t really understand it fully (like constraints) and (2) I have no idea how it can fit together with Inverse as it is right now.&lt;br /&gt;&lt;h4&gt;Backtracking&lt;/h4&gt;In &lt;a href=&quot;http://citeseer.ist.psu.edu/337368.html&quot;&gt;Mark Tullsen&apos;s paper on first-class patterns&lt;/a&gt;, there was an interesting idea that Inverse could adopt. Tullsen used monads to sequence the patterns. It&apos;s the simplest to use the Maybe monad, and that corresponds to how pattern matching systems normally work. But if the List monad is used instead, then you easily get backtracking. This could be ported to Factor either by using monads or, maybe easier, by using continuations. Years ago, Chris Double implemented amb in Factor using continuations, though the code won&apos;t work anymore. The sequencing and backtracking I&apos;m talking about is relevant in things like &lt;code&gt;switch&lt;/code&gt; statements, rather than &lt;code&gt;undo&lt;/code&gt; itself. 
I&apos;m not sure if it&apos;d actually be useful in practice.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Garbage collection research ideas&lt;/h3&gt;Because the summer&apos;s coming up, and I&apos;ll be participating in Harvey Mudd&apos;s Garbage Collection REU, I&apos;ve been coming up with a few research ideas. The suggested one is to continue with the work of previous years&apos; REUs and think about simplifiers and collecting certain persistent data structures and weak hashtables, but here are a couple more:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Figure out how efficient garbage collection on Non-Uniform Memory Access systems can work.&lt;/strong&gt; The problem (if it is a problem) is that plain old garbage collection on multiprocessor NUMA systems isn&apos;t as fast as it could be, because a chunk of memory allocated for a thread may be far away from where it&apos;s used. One way to ensure locality is to give each processor (at least) its own heap, where the heap is guaranteed to be stored in the closest memory. But if data needs to be shared between processors, this can be too limiting. A piece of data can be kept on the RAM closest the processor which made the allocating call, but maybe it&apos;d be beneficial to collect data on which processor is using which data, and dynamically move data around to different places in RAM to put it closest to where it&apos;s used. A related issue is maximizing locality when actually performing the tracing in the GC, which I have no ideas about.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;Run a real benchmark comparing several GC algorithms.&lt;/strong&gt; Probably the most annoying thing for programming language implementors trying to pick a good GC algorithm is that &lt;em&gt;there&apos;s no comprehensive benchmark to refer to&lt;/em&gt;. No one really knows which algorithm is the fastest, so there are two strategies remaining: pick the one that sounds the fastest, or do trial and error among just a few. 
Each paper about a new algorithm reports speed improvements&amp;mdash;over significantly older algorithms. It&apos;d be a big project, but I think it&apos;s possible to make a good benchmark suite and test how long it takes for these algorithms to run, in terms of absolute throughput and pause length and frequency, given different allocation strategies. If it&apos;s possible, it&apos;d be nice to know what kind of GC performs best given a particular memory use pattern.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;Garbage collector implementation in proof-carrying code.&lt;/strong&gt; There are a couple invariants that garbage collectors have, that must be preserved. For example, the user can&apos;t be exposed to any forwarding pointers, and a new garbage collection can&apos;t be started when forwarding pointers exist. The idea of proof-carrying code (an explicit proof, which is type-checked to be accurate, is given with the code) isn&apos;t new; it&apos;s mostly been used to prove memory consistency safety given untrusted code. But maybe it could be used to prove that a GC implementation is correct.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;These ideas are really difficult, but I think they&apos;re interesting, and with four other smart people working with me, maybe in a summer we can do something really cool, like this or whatever other idea they come up with.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Ragel-style state machines in Factor&lt;/h3&gt;In my Automata and Computability class at Carleton, we&apos;ve been studying (what else) finite automata, and it got me thinking about regular expressions and their utility in Factor. By regular expression, I mean an expression denoting a regular language: a real, academic regexp. A regular language is one that can be written as a deterministic finite automaton (finite state machine). 
Hopefully, I&apos;ll explain more about this in a future blog post.&lt;br /&gt;&lt;br /&gt;Anyway, if you&apos;ve heard of &lt;a href=&quot;http://www.cs.queensu.ca/~thurston/ragel/&quot;&gt;Ragel&lt;/a&gt;, it&apos;s basically what I want to do. But the form it&apos;d take is basically the same as PEGs (Chris Double&apos;s Packrat parser), with the one restriction that no recursion is allowed. In return for this restriction, there is no linear space overhead. Basically everything else, as far as I know, could stay the same.&lt;br /&gt;&lt;br /&gt;I&apos;m thinking I&apos;ll redo the XML parser with this. The SAX-like view will be done with this regular languages parser (since all that&apos;s needed is a tokenizer), and then that can be formed into a tree using PEGs (since linear space overhead is acceptable there). Linear space overhead, by the way, is unacceptable for the SAX-like view, since it should be usable for extremely large documents that couldn&apos;t easily fit in memory all at once.&lt;br /&gt;&lt;br /&gt;(By the way, I know Ragel also allows you to explicitly make state charts, but I won&apos;t include this until I see a place where I want to use it.)</content></entry><entry><title type="html">Slava Pestov: The new HTTP server, part 2</title><link href="http://factor-language.blogspot.com/2008/04/new-http-server-part-2.html"/><published>2008-04-29T03:04:00.004-04:00</published><content type="html">It&apos;s been a month and a half since &lt;a href=&quot;http://factor-language.blogspot.com/2008/03/new-http-server-part-1.html&quot;&gt;the first part of this series&lt;/a&gt;. Why the long delay? I&apos;ve been busy with other things. I implemented &lt;a href=&quot;http://factor-language.blogspot.com/2008/04/inheritance-is-done.html&quot;&gt;inheritance&lt;/a&gt;, various &lt;a href=&quot;http://factor-language.blogspot.com/2008/04/performance-improvements.html&quot;&gt;compiler optimizations&lt;/a&gt;, and many other things. 
In the last couple of weeks I&apos;ve been working on the web framework again, tying up some loose ends and porting more existing web applications over (namely, the pastebin and planet factor).&lt;br /&gt;&lt;br /&gt;In this entry I will talk about session management. Session management was one of the first things I implemented in the new framework when I started working on it, but recently I gave the code an overhaul.&lt;br /&gt;&lt;h3&gt;Session management&lt;/h3&gt;&lt;br /&gt;The basic idea behind session management is that while HTTP is a stateless protocol, we can simulate state by sending a token to the client -- either in the form of a hidden element on the page, or a cookie, which the client sends back to the server with a later request. This token is associated with an object on the server and the object holds state between requests.&lt;br /&gt;&lt;br /&gt;Another approach for session management is to store state entirely on the client; instead of sending the client a session ID identifying an object on the server, you send the session data itself to the client. Traditionally this approach has only been used for user preferences and such where security is immaterial, but it can even be used for more sensitive data by encrypting it with a private key only known to the server. The client receives an opaque blob of binary data which cannot be inspected or tampered with (unless the public key encryption algorithm being used is compromised).&lt;br /&gt;&lt;br /&gt;Currently Factor&apos;s session manager does not support client-side sessions, but it will soon, using &lt;a href=&quot;http://code-factor.blogspot.com&quot;&gt;Doug Coleman&apos;s&lt;/a&gt; public-key encryption code. Server-side sessions are supported, however.&lt;br /&gt;&lt;br /&gt;The session manager uses two main strategies to pass state to the client:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;For GET and HEAD requests, a cookie is used. 
The cookie&apos;s value is a randomly-generated session ID.&lt;/li&gt;&lt;li&gt;For POST requests, the form must define a hidden field with the session ID. The value of the cookie is ignored to thwart cross-site scripting attacks.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;The idea is to strike a balance between security and convenience; we don&apos;t want to add a session ID to every link and start a new session if the user navigates to the site by directly entering a URL, but on the other hand we don&apos;t want potentially destructive POST requests to be accepted unless they were sent by a form generated from within the session itself.&lt;br /&gt;&lt;br /&gt;In Factor, a session is simply a hashtable where values can be stored. Keys are known as &quot;session variables&quot; and values can be read and written with the &lt;code&gt;sget&lt;/code&gt; and &lt;code&gt;sset&lt;/code&gt; words; there is also a &lt;code&gt;schange&lt;/code&gt; combinator which applies a quotation to an existing session variable to yield a new value. This is all entirely analogous to the &lt;code&gt;get&lt;/code&gt;/&lt;code&gt;set&lt;/code&gt;/&lt;code&gt;change&lt;/code&gt; words for dynamic variables.&lt;br /&gt;&lt;br /&gt;Session namespaces are serialized and stored in a database using Doug&apos;s &lt;code&gt;db.tuples&lt;/code&gt; O/R mapper. I originally supported pluggable &quot;session storage&quot; backends, with database storage and in-memory storage as the two options; however, I decided to simplify the code and hardcode database storage. 
This has the side-effect that you&apos;ll need to set up a database to use the session management feature, however SQLite presents a lightweight option which requires no configuration, so I don&apos;t think this is a big deal at all.&lt;br /&gt;&lt;br /&gt;I will show a small example of a &apos;counter&apos; web application, much like the &lt;a href=&quot;http://seaside.st/about/examples/counter&quot;&gt;counter example&lt;/a&gt; for the Seaside framework.&lt;br /&gt;&lt;br /&gt;We start off with a vocabulary search path:&lt;br /&gt;&lt;pre&gt;USING: math kernel accessors http.server http.server.actions&lt;br /&gt;http.server.sessions http.server.templating.fhtml locals ;&lt;br /&gt;IN: webapps.counter&lt;/pre&gt;&lt;br /&gt;Now, we define a symbol used to key a session variable:&lt;br /&gt;&lt;pre&gt;SYMBOL: count&lt;/pre&gt;&lt;br /&gt;Next, we define a pair of actions which increment the counter value, using the &lt;code&gt;schange&lt;/code&gt; combinator. The &lt;code&gt;display&lt;/code&gt; slot of an action contains code to be executed upon a GET request; it is expected to output a response object. In our case, the word outputs an action which applies the quotation to the current counter value; the action outputs a response which redirects back to the main page:&lt;br /&gt;&lt;pre&gt;:: &amp;lt;counter-action&gt; ( quot -- action )&lt;br /&gt;    &amp;lt;action&gt; [&lt;br /&gt;        count quot schange&lt;br /&gt;        &quot;&quot; f &amp;lt;standard-redirect&gt;&lt;br /&gt;    ] &gt;&gt;display ;&lt;/pre&gt;&lt;br /&gt;The action to decrement the counter is entirely analogous:&lt;br /&gt;&lt;pre&gt;: &amp;lt;dec-action&gt; ( -- action )&lt;br /&gt;    &amp;lt;action&gt; [ count [ 1- ] schange f ] &gt;&gt;display ;&lt;/pre&gt;&lt;br /&gt;Note that this word &lt;i&gt;constructs&lt;/i&gt; actions, instead of invoking them. 
This approach is more flexible than the old &quot;furnace&quot; web framework, where actions were mapped directly to word execution, because it allows one to write &quot;higher-order actions&quot; parametrized by values more easily.&lt;br /&gt;&lt;br /&gt;Here is the default action; it displays the counter value using a template:&lt;br /&gt;&lt;pre&gt;: &amp;lt;display-action&gt; ( -- action )&lt;br /&gt;    &amp;lt;action&gt; [&lt;br /&gt;        &quot;resource:extra/webapps/counter/counter.fhtml&quot; &amp;lt;fhtml&gt;&lt;br /&gt;    ] &gt;&gt;display ;&lt;/pre&gt;&lt;br /&gt;Finally we put everything together in a dispatcher:&lt;br /&gt;&lt;pre&gt;: &amp;lt;counter-app&gt; ( -- responder )&lt;br /&gt;    counter-app new-dispatcher&lt;br /&gt;        [ 1+ ] &amp;lt;counter-action&gt; &quot;inc&quot; add-responder&lt;br /&gt;        [ 1- ] &amp;lt;counter-action&gt; &quot;dec&quot; add-responder&lt;br /&gt;        &amp;lt;display-action&gt; &quot;&quot; add-responder&lt;br /&gt;    &amp;lt;sessions&gt; ;&lt;/pre&gt;&lt;br /&gt;We create a dispatcher, add instances of our actions to it, and wrap the whole thing in a session manager.&lt;br /&gt;&lt;br /&gt;Now, the template:&lt;br /&gt;&lt;pre&gt;&amp;lt;% USING: io math.parser http.server.sessions webapps.counter ; %&gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;html&gt;&lt;br /&gt;    &amp;lt;body&gt;&lt;br /&gt;        &amp;lt;h1&gt;&amp;lt;% count sget number&gt;string write %&gt;&amp;lt;/h1&gt;&lt;br /&gt;&lt;br /&gt;        &amp;lt;a href=&quot;inc&quot;&gt;++&amp;lt;/a&gt;&lt;br /&gt;        &amp;lt;a href=&quot;dec&quot;&gt;--&amp;lt;/a&gt;&lt;br /&gt;    &amp;lt;/body&gt;&lt;br /&gt;&amp;lt;/html&gt;&lt;/pre&gt;&lt;br /&gt;Finally, once we have all the parts, we can create the counter responder and start the HTTP server:&lt;br /&gt;&lt;pre&gt;&amp;lt;counter-app&gt; &quot;test.db&quot; sqlite-db &amp;lt;db-persistence&gt; main-responder set&lt;br /&gt;8888 httpd&lt;/pre&gt;&lt;br /&gt;Note that here we wrap the 
counter responder in another layer of indirection, this time for database persistence; while the counter web app doesn&apos;t use persistence the session manager does, and we chose to use SQLite since it requires no configuration or external services.&lt;br /&gt;&lt;br /&gt;Navigating over to http://localhost:8888/ should now display the counter app, and clicking the increment and decrement links should have an effect on the displayed value. Sessions persist between server restarts and time out after 20 minutes of inactivity by default. Looking at your web browser&apos;s cookie manager will show that a &lt;code&gt;factorsessid&lt;/code&gt; cookie has been set.&lt;br /&gt;&lt;br /&gt;As an aside, the Seaside version uses continuations to maintain state. The Factor version explicitly maintains state. Even though I ported &lt;a href=&quot;http://bluishcoder.co.nz&quot;&gt;Chris Double&apos;s&lt;/a&gt; modal web framework over to the new HTTP server, I&apos;m avoiding continuations in favor of explicit state for now. I am building up a form component framework with validation, easy persistence, and user authentication without resorting to continuations, and I plan on building a state-machine model with a page flow DSL, much like &lt;a href=&quot;http://www.jboss.com/products/jbpm&quot;&gt;jBPM&lt;/a&gt;, to handle more complex multi-page flows such as shopping carts. While this will result in more work for me, I believe the benefits include transparent support for load-balancing and fail-over, readable URLs, and ultimately, simpler and more reusable web application code because page flow can be decoupled from logic and expressed in a custom DSL intended for that purpose.&lt;br /&gt;&lt;br /&gt;The Seaside version is also somewhat shorter; it is easy to express with idiomatic Seaside (transparent session management, presentation logic mixed in with web app code). 
I will add better abstractions to make up for some of the difference, and for larger applications there should be no difference in code size; in fact since the scope of Factor&apos;s framework is wider than Seaside (it covers persistence, authentication and validation, and soon, versioning of persistent entities) you might even need less code to accomplish the same thing.&lt;br /&gt;&lt;h3&gt;Virtual hosting&lt;/h3&gt;&lt;br /&gt;The other topic I promised to cover last time was virtual hosting. Virtual hosting is done with dispatchers, much like nested directory structure is. You create a virtual host dispatcher with &lt;code&gt;&amp;lt;vhost-dispatcher&gt;&lt;/code&gt; and add responders for various virtual hosts using &lt;code&gt;add-responder&lt;/code&gt;; the &lt;code&gt;&gt;&gt;default&lt;/code&gt; slot can be used to set the default virtual host.  The key difference between the new approach and the old HTTP server virtual hosting implementation, which relied on a global hashtable mapping virtual host names to responders, is flexibility; the virtual host dispatcher does not necessarily have to be your top-level responder.&lt;br /&gt;&lt;br /&gt;For example, the &lt;code&gt;&amp;lt;boilerplate&gt;&lt;/code&gt; responder gives you a way of enforcing a common look and feel across a set of web apps, by adding common headers and footers to every page. 
While I will describe boilerplate responders and the template system in more detail in a later post, for now here is an example:&lt;br /&gt;&lt;pre&gt;&amp;lt;vhost-dispatcher&gt;&lt;br /&gt;    &amp;lt;online-store&gt; &quot;store.acme.com&quot; add-responder&lt;br /&gt;    &amp;lt;support-site&gt; &quot;support.acme.com&quot; add-responder&lt;br /&gt;    &amp;lt;main-site&gt; &quot;acme.com&quot; add-responder&lt;br /&gt;&amp;lt;boilerplate&gt;&lt;br /&gt;    &quot;acme-site.xml&quot; &gt;&gt;template&lt;br /&gt;acme-db &amp;lt;db-persistence&gt;&lt;br /&gt;&amp;lt;sessions&gt; main-responder set&lt;/pre&gt;&lt;br /&gt;Here, all virtual hosts share the same session management, database persistence, and common theme, and the virtual host dispatch only happens after the request filters through the mentioned layers of functionality. This would not be possible with the old HTTP server without duplicating code.&lt;br /&gt;&lt;h3&gt;Cookies&lt;/h3&gt;&lt;br /&gt;Finally I promised to talk about cookies. The session management support is great but sometimes you just want to get and set cookies directly. This can be done by reading the &lt;code&gt;cookies&lt;/code&gt; slot of the request object, and writing the &lt;code&gt;cookies&lt;/code&gt; slot of the response object. The slot contains a sequence of &lt;code&gt;cookie&lt;/code&gt; objects, which are parsed and unparsed from their HTTP representation for you. A cookie object contains a series of slots, such as name, value, expiration date (as a Factor timestamp object), max-age (as a Factor duration object), path, and host. While the expiration date is deprecated as of HTTP/1.1, most sites still use it in favor of max-age because older browsers don&apos;t support max-age. 
Factor&apos;s HTTP server sets the date header on each response so that expiration dates can work correctly.&lt;br /&gt;&lt;br /&gt;Here is an example of using the HTTP client (which shares the cookie code with the server) to look at Google&apos;s ridiculously long-lived cookies:&lt;br /&gt;&lt;pre&gt;( scratchpad ) &quot;http://www.google.com&quot; http-get-stream drop cookies&gt;&gt; first describe&lt;br /&gt;cookie instance&lt;br /&gt;&quot;delegate&quot;  f&lt;br /&gt;&quot;name&quot;      &quot;pref&quot;&lt;br /&gt;&quot;value&quot;     &quot;ID=c0f4c074cd87502e:TM=1209466656:LM=1209466656:S=_6gGEKtuTgP...&quot;&lt;br /&gt;&quot;path&quot;      &quot;/&quot;&lt;br /&gt;&quot;domain&quot;    &quot;.google.com&quot;&lt;br /&gt;&quot;expires&quot;   T{ timestamp f 2010 4 29 10 57 36 ~duration~ }&lt;br /&gt;&quot;max-age&quot;   f&lt;br /&gt;&quot;http-only&quot; f&lt;/pre&gt;</content></entry><entry><title type="html">Doug Coleman: Word renaming (part 2)</title><link href="http://code-factor.blogspot.com/2008/04/word-renaming-part-2.html"/><published>2008-04-28T10:49:00.000-07:00</published><content type="html">Here are the word names that have changed:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;find* -&gt; find-from&lt;/li&gt;&lt;li&gt;find-last* -&gt; find-last-from&lt;/li&gt;&lt;li&gt;index* -&gt; index-from&lt;/li&gt;&lt;li&gt;last-index* -&gt; last-index-from&lt;/li&gt;&lt;li&gt;subset -&gt; filter&lt;/li&gt;&lt;/ul&gt;New shorthand words:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;1 tail -&gt; rest&lt;/li&gt;&lt;li&gt;1 tail-slice -&gt; rest-slice&lt;/li&gt;&lt;li&gt;swap compose -&gt; prepose&lt;/li&gt;&lt;/ul&gt;Changes to existing word behavior:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;reverse stack effect of assoc-diff, diff&lt;/li&gt;&lt;li&gt;before? after? before=? after=? are now generic&lt;/li&gt;&lt;li&gt;min, max can compare more objects than before, such as timestamps&lt;br /&gt;&lt;/li&gt;&lt;li&gt;between? 
can compare more objects&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&amp;lt;=&amp;gt; returns symbols&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;There are several motivations at work here.  One is that words named &lt;code&gt;foo*&lt;/code&gt; are a variant of &lt;code&gt;foo&lt;/code&gt;, but otherwise the * gives no hint as to what the word actually does differently.  We&apos;re trying to move away from word names with stars to something more meaningful.&lt;br /&gt;&lt;br /&gt;Code with common patterns that come up a lot, like &lt;code&gt;1 tail&lt;/code&gt; and &lt;code&gt;swap compose&lt;/code&gt;, is more clearly understood if these patterns are given a single name.&lt;br /&gt;&lt;br /&gt;Factor&apos;s subset is not equivalent to the mathematical definition of subset, so it was renamed to &lt;code&gt;filter&lt;/code&gt; to avoid confusion.  Along these same lines, &lt;code&gt;diff&lt;/code&gt; and &lt;code&gt;assoc-diff&lt;/code&gt; are now more mathematically intuitive; you can think of diff like set subtraction now: &lt;code&gt;seq1 seq2 diff&lt;/code&gt; is like seq1 - seq2.&lt;br /&gt;&lt;br /&gt;Finally, the &quot;UFO operator&quot; &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; now returns symbols &lt;code&gt;+lt+ +eq+ +gt+&lt;/code&gt; instead of negative, zero, and positive numbers.  The &lt;code&gt;before?&lt;/code&gt; and &lt;code&gt;after?&lt;/code&gt; words can compare anything that defines a method on this operator.  
Since &lt;code&gt;between?&lt;/code&gt;, &lt;code&gt;min&lt;/code&gt;, and &lt;code&gt;max&lt;/code&gt;&lt;br /&gt;are defined in terms of these comparison words, they also work on more objects.&lt;br /&gt;&lt;br /&gt;Please let me know if you have any more suggestions for words that have awkward argument orders or imprecise names, or if you can suggest alternate names for words with stars in their names.</content></entry><entry><title type="html">Slava Pestov: Performance improvements</title><link href="http://factor-language.blogspot.com/2008/04/performance-improvements.html"/><published>2008-04-19T04:56:00.008-04:00</published><content type="html">Over the last three days I spent some time improving Factor&apos;s compiler. I made the following improvements:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Partial dispatch is performed on integer arithmetic operations. Previously, Factor&apos;s compiler would convert generic arithmetic to machine arithmetic when it knew the exact types involved; so if you had two floats on the stack, &lt;code&gt;+&lt;/code&gt; would become &lt;code&gt;float+&lt;/code&gt;, and if you had two fixnums it would become &lt;code&gt;fixnum+&lt;/code&gt; or &lt;code&gt;fixnum+fast&lt;/code&gt;, depending on whether interval inference determined that the overflow check was needed. While this worked well in many cases, there were a lot of instances where the compiler could only infer you were dealing with integers, but not fixnums in particular; either because interval inference was not smart enough, or because the values really could be out of bounds for a fixnum. Now, if it knows that both inputs are integers, it compiles a call to a special word which still performs a dispatch, but with only two possibilities. This is a win, because a conditional is faster than a jump table as used in the generic &lt;code&gt;+&lt;/code&gt;. 
It is an even bigger win if one of the inputs is known to be a fixnum, because then the two jump table dispatches are replaced by a single conditional.&lt;/li&gt;&lt;li&gt;Improved overflow check elimination. First, there is an enabling optimization. Suppose we have a positive integer on the stack. Then, the following two are equivalent:&lt;br /&gt;&lt;pre&gt;4096 mod&lt;br /&gt;4095 bitand&lt;/pre&gt;&lt;br /&gt;The optimizer now recognizes the case where mod is applied with a power of two to a positive integer, and converts it to a bitwise and. For the &lt;code&gt;rem&lt;/code&gt; word an even more general optimization is possible; we only have to assume the first input is an integer and the second is a power of two.&lt;br /&gt;&lt;br /&gt;While on a modern CPU, this is not a big win in itself for &lt;code&gt;mod&lt;/code&gt; (it is for &lt;code&gt;rem&lt;/code&gt;, which is more expensive), it does enable the following optimization. Note that the following two expressions are equivalent:&lt;br /&gt;&lt;pre&gt;4095 bitand&lt;br /&gt;16777215 bitand 4095 bitand&lt;/pre&gt;&lt;br /&gt;That is, if you mask off the first &lt;code&gt;n&lt;/code&gt; bits, then the first &lt;code&gt;m&lt;/code&gt; bits, and &lt;code&gt;m&amp;lt;n&lt;/code&gt;, you get the same result as masking off the first &lt;code&gt;m&lt;/code&gt; bits. This might seem trivial and useless at first, but then you realize that a truncating conversion of a positive bignum to a fixnum is simply masking off bits! So the following two are equivalent:&lt;br /&gt;&lt;pre&gt;4095 bitand&lt;br /&gt;&gt;fixnum 4095 bitand&lt;/pre&gt;&lt;br /&gt;But of course, both inputs to the latter &lt;code&gt;bitand&lt;/code&gt; are fixnums now, and can be converted to:&lt;br /&gt;&lt;pre&gt;4095 bitand&lt;br /&gt;&gt;fixnum 4095 fixnum-bitand&lt;/pre&gt;&lt;br /&gt;It gets even better. 
Consider the following piece of code from &lt;code&gt;project-euler.150&lt;/code&gt;:&lt;br /&gt;&lt;pre&gt;: next ( t -- new-t s )&lt;br /&gt;    615949 * 797807 + 20 2^ rem dup 19 2^ - ;&lt;/pre&gt;&lt;br /&gt;Constant folding gives us:&lt;br /&gt;&lt;pre&gt;: next ( t -- new-t s )&lt;br /&gt;    615949 * 797807 + 1048576 rem dup 524288 - ;&lt;/pre&gt;&lt;br /&gt;The above optimization, together with an overflow removal on the &lt;code&gt;-&lt;/code&gt;, gives us:&lt;br /&gt;&lt;pre&gt;: next ( t -- new-t s )&lt;br /&gt;    615949 * 797807 + &gt;fixnum 1048575 fixnum-bitand dup 524288 fixnum-fast ;&lt;/pre&gt;&lt;br /&gt;There is an existing optimization that takes advantage of these identities:&lt;br /&gt;&lt;pre&gt;* &gt;fixnum == [ &gt;fixnum ] bi@ fixnum*fast&lt;br /&gt;+ &gt;fixnum == [ &gt;fixnum ] bi@ fixnum+fast&lt;/pre&gt;&lt;br /&gt;By applying the above, the compiler converts this code to:&lt;br /&gt;&lt;pre&gt;: next ( t -- new-t s )&lt;br /&gt;    &gt;fixnum 615949 fixnum*fast 797807 fixnum+fast 1048575 fixnum-bitand dup 524288 fixnum-fast ;&lt;/pre&gt;&lt;br /&gt;All generic arithmetic and overflow checks have been removed, because of a single &lt;code&gt;rem&lt;/code&gt; in the middle of the calculation!&lt;br /&gt;&lt;br /&gt;In &lt;code&gt;project-euler.150&lt;/code&gt;, this word was actually used inside a loop, where it was iteratively applied to a starting generator value. With the new modular arithmetic optimization, Factor&apos;s existing interval inference code managed to infer that the result of the word is always in the interval &lt;code&gt;(-2^20,2^20)&lt;/code&gt;, and since the initial value is &lt;code&gt;0&lt;/code&gt;, even the call to &lt;code&gt;&gt;fixnum&lt;/code&gt; was optimized away.&lt;/li&gt;&lt;li&gt;I improved Factor&apos;s type inference algorithm to better deal with local recursive code blocks. 
While type inference worked okay with loops, it didn&apos;t fare so well with binary recursive functions because it wasn&apos;t able to obtain any information about return values of the recursive calls. Everybody&apos;s favorite binary recursive benchmark, fibonacci, was one example of this situation. Solving this required a bit of thought.&lt;br /&gt;&lt;br /&gt;Previously, type inference for recursive functions was done in a pretty ad-hoc manner. Factor&apos;s type inference is only used for optimization so it is okay if it is conservative. It worked pretty well; however, today I decided to add better support for recursion, and I also found a bug where it would infer invalid types, so I decided to redo it a little.&lt;br /&gt;&lt;br /&gt;Type inference of recursive functions is now an iterative process, where you begin by assuming the recursive calls take the top type as inputs and the bottom type as outputs, then you infer the type of the body under these assumptions, then apply the resulting types to the recursive calls, then infer again, and so on, until a fixed point is reached. There are smarter ways of doing this with other type systems; however, in Factor, the type functions of some words are pretty complex. For example, the output type of &lt;code&gt;+&lt;/code&gt; depends on not only the types of its inputs, but the interval ranges they lie in, so I&apos;m not sure if there&apos;s any other solution.&lt;br /&gt;&lt;br /&gt;With this out of the way, there is one major remaining limitation in Factor&apos;s type inference, and that is it only works within words. It does not attempt to infer types across word boundaries. In practice this means many optimizations only kick in if a lot of words are inlined, which is not practical in some cases due to code size explosion. 
My next project in this area is to cache type signatures for words and use this information when inferring types of callers.&lt;/li&gt;&lt;li&gt;I improved performance of inline object allocation. In the VM, the structure holding the allocation pointer is now stored directly in a global variable, instead of a pointer to it being stored in the global variable. This is one less indirection for inline allocators. Another improvement is that the heap exhaustion check is done in the allocator itself, so a call into the VM is avoided if the heap is not full. This saves a subroutine call in the common case of course, but also some saving and restoring of registers.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;These changes have led to some performance improvements on the two benchmarks I was working with. My computer here is a MacBook Pro with a 2.4 GHz Intel Core 2 Duo.&lt;br /&gt;&lt;br /&gt;The &lt;code&gt;project-euler.150&lt;/code&gt; benchmark saw its runtime decrease from 87 seconds to 22 seconds.&lt;br /&gt;&lt;br /&gt;The &lt;code&gt;benchmark.recursive&lt;/code&gt; benchmark, which is a Factor implementation of the &lt;a href=&quot;&quot;&gt;recursive benchmark&lt;/a&gt; from the Computer Language Shootout, saw its runtime decrease from 27 seconds to 9 seconds.&lt;br /&gt;&lt;br /&gt;For comparison, SBCL runs the recursive benchmark in 3.5 seconds, and Java runs it in 1.6 seconds.&lt;br /&gt;&lt;br /&gt;Using the &lt;code&gt;java -Xint&lt;/code&gt; result of 22 seconds, I guesstimated from the &lt;a href=&quot;http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=recursive&amp;lang=all&quot;&gt;results for all languages&lt;/a&gt; on the shootout that Factor is at around the performance of the Erlang HIPE JIT, and slightly faster than the Python Psyco JIT and GForth. 
By anybody&apos;s standard, this is anywhere between &quot;not very fast&quot; and &quot;bloody slow&quot;, but it&apos;s slowly improving.&lt;br /&gt;&lt;br /&gt;Now I&apos;ll end with a rant about the language shootout.&lt;br /&gt;&lt;br /&gt;The only so-called &quot;dynamic&quot; language implementations which come close to the performance of Java and C on this benchmark are SBCL, Ikarus Scheme, and Chicken Scheme. However, all these benchmarks are actually static programs in disguise, peppered with type declarations and even unsafe low-level features. The Ikarus and Chicken versions even implement the same functions twice, once for integer inputs and using integer arithmetic primitives, and another time for float inputs and using float arithmetic primitives. Unless their goal is really to promote their languages as nothing more than C with s-expression syntax, this is disappointing and dishonest.&lt;br /&gt;&lt;br /&gt;The Haskell benchmarks suffer from this also. Many benchmarks seem to use malloc and pointer arithmetic, and exclamation points (strictness annotations) abound. The &lt;a href=&quot;http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=nbody&amp;lang=ghc&amp;id=0&quot;&gt;n-body benchmark&lt;/a&gt; contains this gem:&lt;br /&gt;&lt;pre&gt;planets :: Ptr Double&lt;br /&gt;planets = unsafePerformIO $ mallocBytes (7 * nbodies * 8) -- sizeOf double = 8&lt;/pre&gt;&lt;br /&gt;I would expect better from the Haskell people than &lt;i&gt;hardcoding the size of a double to do pointer arithmetic on manually-managed memory&lt;/i&gt;!&lt;br /&gt;&lt;br /&gt;Right now I have no idea how idiomatic Haskell performs in practice; from looking at these benchmarks, it is clear to me that if one writes C in Haskell, pretty decent performance is possible, but that&apos;s not saying much. 
You can write C in any language.&lt;br /&gt;&lt;br /&gt;The runtime of the Factor recursive benchmark further improves by two-fold if I manually replace generic arithmetic with unsafe machine arithmetic; however, I&apos;m not interested in writing such code. I don&apos;t want to expose low-level primitives to the user, document their use as necessary for performance critical code, and declare that my job is done.</content></entry><entry><title type="html">Slava Pestov: My top 8 shell commands</title><link href="http://factor-language.blogspot.com/2008/04/my-top-8-shell-commands.html"/><published>2008-04-19T03:52:00.003-04:00</published><content type="html">I wrote a short Factor script to check my bash history and tally up the most frequently occurring commands.&lt;br /&gt;&lt;pre&gt;    git&lt;br /&gt;    ./factor&lt;br /&gt;    bf&lt;br /&gt;    cdf&lt;br /&gt;    ls&lt;br /&gt;    cd&lt;br /&gt;    fc&lt;br /&gt;    push&lt;/pre&gt;&lt;br /&gt;There are some aliases here:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;code&gt;cdf&lt;/code&gt; is aliased to &lt;code&gt;cd ~/factor&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;bf&lt;/code&gt; is aliased to &lt;code&gt;./factor -i=boot.x86.32.image&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;push&lt;/code&gt; is aliased to &lt;code&gt;git push origin master&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;fc&lt;/code&gt; is aliased to &lt;code&gt;ssh factorcode.org&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;It seems that the only thing I use my computer for is running Factor!</content></entry><entry><title type="html">Doug Coleman: Word renaming</title><link href="http://code-factor.blogspot.com/2008/04/word-renaming.html"/><published>2008-04-14T12:25:00.001-07:00</published><content type="html">Several words have been renamed and moved around to make Factor more consistent:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;new -&gt; new-sequence&lt;/li&gt;&lt;li&gt;construct-empty -&gt; new&lt;/li&gt;&lt;li&gt;construct-boa -&gt; 
boa&lt;/li&gt;&lt;li&gt;diff -&gt; assoc-diff&lt;/li&gt;&lt;li&gt;union -&gt; assoc-union&lt;/li&gt;&lt;li&gt;intersect -&gt; assoc-intersect&lt;/li&gt;&lt;li&gt;seq-diff -&gt; diff&lt;/li&gt;&lt;li&gt;seq-intersect -&gt; intersect&lt;/li&gt;&lt;/ul&gt;To make things symmetrical, a new word &lt;code&gt;union&lt;/code&gt; operates on sequences.&lt;br /&gt;&lt;br /&gt;Somehow, &lt;code&gt;seq-diff&lt;/code&gt; and &lt;code&gt;seq-intersect&lt;/code&gt; were implemented as O(n^2) algorithms.  Now, they use hashtables and are O(n).&lt;br /&gt;&lt;br /&gt;Lastly, a new vocabulary named &lt;code&gt;sets&lt;/code&gt; contains the set-theoretic words, along with a new word &lt;code&gt;unique&lt;/code&gt; that converts a sequence to a hash table whose keys and values are the same.  An efficient union and intersect are implemented in terms of this word.</content></entry><entry><title type="html">Doug Coleman: Adding a new primitive</title><link href="http://code-factor.blogspot.com/2008/04/adding-new-primitive.html"/><published>2008-04-13T14:20:00.000-07:00</published><content type="html">I added two primitives to the Factor VM to allow setting and unsetting of environment variables.  It&apos;s not that hard to do, but you have to edit several C files in the VM and a couple .factor files in the core.  Really they should not be primitives, so eventually they will be moved into the core.&lt;br /&gt;&lt;br /&gt;The primitives that I added are defined as follows:&lt;pre&gt;IN: system&lt;br /&gt;PRIMITIVE: set-os-env ( value key -- )&lt;br /&gt;PRIMITIVE: unset-os-env ( key -- )&lt;/pre&gt;&lt;div&gt;&lt;h2&gt;Adding a primitive to vm/&lt;/h2&gt;Since Factor&apos;s datatypes are not the same as C&apos;s datatypes, and because of the garbage collector, there are C functions for accessing and manipulating Factor objects.  
The data conversion functions are not documented yet, so here&apos;s a sampling of a few of them:&lt;ul&gt;&lt;li&gt;unbox_u16_string() - pop a Factor string off the datastack and return it as a F_CHAR*&lt;/li&gt;&lt;li&gt;from_u16_string() - convert a C string to a Factor object&lt;/li&gt;&lt;li&gt;REGISTER_C_STRING() - register a C string with Factor&apos;s garbage collector&lt;/li&gt;&lt;li&gt;UNREGISTER_C_STRING() - unregister a registered C string&lt;/li&gt;&lt;li&gt;dpush() - push an object onto the datastack&lt;/li&gt;&lt;li&gt;dpop() - pop an object off of the datastack&lt;/li&gt;&lt;/ul&gt;Registering a C string with the garbage collector is required when VM code calls code that may trigger a garbage collection (gc).  Any call to Factor from the VM might trigger a gc, and if that happened the object could be moved, thus invalidating your C pointer.  When a pointer is unregistered, it&apos;s popped from a gc stack with the corrected pointer value.&lt;br /&gt;&lt;br /&gt;Here is the implementation of &lt;code&gt;set-os-env&lt;/code&gt;:&lt;pre&gt;DEFINE_PRIMITIVE(set_os_env)&lt;br /&gt;{&lt;br /&gt; F_CHAR *key = unbox_u16_string();&lt;br /&gt; REGISTER_C_STRING(key);&lt;br /&gt; F_CHAR *value = unbox_u16_string();&lt;br /&gt; UNREGISTER_C_STRING(key);&lt;br /&gt; if(!SetEnvironmentVariable(key, value))&lt;br /&gt;     general_error(ERROR_IO, tag_object(get_error_message()), F, NULL);&lt;br /&gt;}&lt;/pre&gt;The function is defined with a macro &lt;code&gt;DEFINE_PRIMITIVE&lt;/code&gt; that takes only the function name.  A corresponding &lt;code&gt;DECLARE_PRIMITIVE&lt;/code&gt; goes in run.h as your function declaration.  Not all primitives use these C preprocessor macros; for instance, bignums don&apos;t because it doesn&apos;t improve the performance.  
Parameters to your primitive are popped or unboxed off the data stack, so a primitive&apos;s declaration expands to:&lt;pre&gt;F_FASTCALL primitive_set_os_env_impl(void);&lt;br /&gt;&lt;br /&gt;F_FASTCALL void primitive_set_os_env(CELL word, F_STACK_FRAME *callstack_top) {&lt;br /&gt;    save_callstack_top(callstack_top);&lt;br /&gt;    primitive_set_os_env_impl();&lt;br /&gt;}&lt;br /&gt;INLINE void primitive_set_os_env_impl(void)&lt;/pre&gt;F_FASTCALL is a wrapper around FASTCALL, which on x86 will pass the first two arguments in registers as an optimization.  Note that while it declares that it takes no arguments (void), most primitives will do something to the data stack.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Since unbox_u16_string() allocates memory for the Factor object, it could trigger a gc, so it&apos;s registered as a string.  You can also register values using &lt;code&gt;REGISTER_ROOT&lt;/code&gt; for cells, &lt;code&gt;REGISTER_BIGNUM&lt;/code&gt; for bignums, and &lt;code&gt;REGISTER_UNTAGGED&lt;/code&gt; for arrays, words, and other Factor object pointers for which the type is known.  The key string can immediately be unregistered after calling unbox on the next stack value since the rest of the function will not cause a gc.  If the win32 call fails, there&apos;s a function &lt;code&gt;general_error()&lt;/code&gt; that throws an exception.  In this case, it&apos;s an ERROR_IO that calls a helper function to return the Windows error message.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;Now that the function is written, you have to add it to the list of primitives in primitives.c.  The important thing is that this list remains in the same order as the list in core/ which you will edit in the next section.  Also, add a prototype to the run.h file.&lt;br /&gt;&lt;div&gt;&lt;h2&gt;Adding a primitive to core/&lt;/h2&gt;Everything in Factor compiles down to primitives.  
Because they are by definition &quot;primitive&quot;, the compiler cannot infer the stack effect and argument types.  To make a primitive&apos;s stack effect &quot;known&quot;, edit &lt;i&gt;core/inference/known-words/known-words.factor&lt;/i&gt;:&lt;pre&gt;\ set-os-env { string string } { } &amp;lt;effect&amp;gt; set-primitive-effect&lt;/pre&gt;The next step is to put your word in the file &lt;i&gt;core/bootstrap/primitives.factor&lt;/i&gt; in the same order as in &lt;i&gt;vm/primitives.c&lt;/i&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Sometime in the future there might be a &lt;code&gt;PRIMITIVE:&lt;/code&gt; word that will reduce the number of different places to edit to add a primitive.  If it used Factor&apos;s FFI, you could add a new primitive without even having to bootstrap again.&lt;/div&gt;</content></entry><entry><title type="html">Slava Pestov: Multi-touch gestures in the Factor UI</title><link href="http://factor-language.blogspot.com/2008/04/multi-touch-gestures-in-factor-ui.html"/><published>2008-04-11T23:34:00.004-04:00</published><content type="html">Recently I acquired a MacBook Pro. The new models come with a multi-touch trackpad and I&apos;ve found it extremely useful in Safari to be able to go back, go forward and zoom with the mouse. However, I missed the ability to do this in other applications, especially the Factor UI. The Factor UI already supports vertical and horizontal scrolling gestures, and I wanted to be able to use the other gestures as well.&lt;br /&gt;&lt;br /&gt;Apparently, Apple&apos;s official stance is that &lt;a href=&quot;http://lists.apple.com/archives/Cocoa-dev/2008/Jan/msg00917.html&quot;&gt;no multitouch API will be made public until 10.6&lt;/a&gt;. 
I was slightly discouraged by this but some more Googling turned up &lt;a href=&quot;http://cocoadex.com/2008/02/nsevent-modifications-swipe-ro.html&quot;&gt;a blog post by Elliott Harris&lt;/a&gt; detailing the undocumented API for receiving these events.&lt;br /&gt;&lt;br /&gt;While normally I shy away from relying on undocumented functionality in this case the API is dead-simple and it is almost an oversight on Apple&apos;s part not to document it. And if they break it, well, it will be easy to update Factor too.&lt;br /&gt;&lt;br /&gt;Here are the changes I had to make. First, I added some new UI gestures to &lt;code&gt;extra/ui/gestures/gestures.factor&lt;/code&gt;; these are cross-platform and theoretically the Windows and X11 UI backends could produce them too, perhaps as a result of button presses on those &quot;Internet&quot; keyboards:&lt;br /&gt;&lt;pre&gt;TUPLE: left-action ;        C: &amp;lt;left-action&gt; left-action&lt;br /&gt;TUPLE: right-action ;       C: &amp;lt;right-action&gt; right-action&lt;br /&gt;TUPLE: up-action ;          C: &amp;lt;up-action&gt; up-action&lt;br /&gt;TUPLE: down-action ;        C: &amp;lt;down-action&gt; down-action&lt;br /&gt;&lt;br /&gt;TUPLE: zoom-in-action ;  C: &amp;lt;zoom-in-action&gt; zoom-in-action&lt;br /&gt;TUPLE: zoom-out-action ; C: &amp;lt;zoom-out-action&gt; zoom-out-action&lt;/pre&gt;&lt;br /&gt;Next, I edited &lt;code&gt;extra/ui/cocoa/views/views.factor&lt;/code&gt; with some new methods for handling the new multitouch gestures, and translating them to Factor gestures:&lt;br /&gt;&lt;pre&gt;{ &quot;magnifyWithEvent:&quot; &quot;void&quot; { &quot;id&quot; &quot;SEL&quot; &quot;id&quot; }&lt;br /&gt;    [&lt;br /&gt;        nip&lt;br /&gt;        dup -&gt; deltaZ sgn {&lt;br /&gt;            {  1 [ T{ zoom-in-action } send-action$ ] }&lt;br /&gt;            { -1 [ T{ zoom-out-action } send-action$ ] }&lt;br /&gt;            {  0 [ 2drop ] }&lt;br /&gt;        } case&lt;br /&gt;    ]&lt;br /&gt;}&lt;br 
/&gt;&lt;br /&gt;{ &quot;swipeWithEvent:&quot; &quot;void&quot; { &quot;id&quot; &quot;SEL&quot; &quot;id&quot; }&lt;br /&gt;    [&lt;br /&gt;        nip&lt;br /&gt;        dup -&gt; deltaX sgn {&lt;br /&gt;            {  1 [ T{ left-action } send-action$ ] }&lt;br /&gt;            { -1 [ T{ right-action } send-action$ ] }&lt;br /&gt;            {  0&lt;br /&gt;                [&lt;br /&gt;                    dup -&gt; deltaY sgn {&lt;br /&gt;                        {  1 [ T{ up-action } send-action$ ] }&lt;br /&gt;                        { -1 [ T{ down-action } send-action$ ] }&lt;br /&gt;                        {  0 [ 2drop ] }&lt;br /&gt;                    } case&lt;br /&gt;                ]&lt;br /&gt;            }&lt;br /&gt;        } case&lt;br /&gt;    ]&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;Note that I&apos;m throwing away useful information for the sake of simplicity; the zoom gesture gives you a precise zoom amount, not just +1/-1. The swipe gestures seem to be completely discrete though. I haven&apos;t implemented rotation gestures yet because I haven&apos;t figured out what to use them for.&lt;br /&gt;&lt;br /&gt;With the above code written, the UI now sends multi-touch gestures to gadgets; however, no gadgets used them yet. I fired up the gesture logger, &quot;gesture-logger&quot; run, and tested that the new actions actually get sent. They were, except the first time I messed up the code and got the signs the wrong way round.&lt;br /&gt;&lt;br /&gt;With that fixed, I could proceed to add gesture handlers to the various UI tools. 
I edited various source files:&lt;br /&gt;&lt;pre&gt;browser-gadget &quot;multi-touch&quot; f {&lt;br /&gt;    { T{ left-action } com-back }&lt;br /&gt;    { T{ right-action } com-forward }&lt;br /&gt;} define-command-map&lt;br /&gt;&lt;br /&gt;inspector-gadget &quot;multi-touch&quot; f {&lt;br /&gt;    { T{ left-action } &amp;back }&lt;br /&gt;} define-command-map&lt;br /&gt;&lt;br /&gt;workspace &quot;multi-touch&quot; f {&lt;br /&gt;    { T{ zoom-out-action } com-listener }&lt;br /&gt;    { T{ up-action } refresh-all }&lt;br /&gt;} define-command-map&lt;/pre&gt;&lt;br /&gt;I think it&apos;s pretty clear from the above what the default gesture assignments are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;In the browser tool, swipe left/right navigate the help history.&lt;/li&gt;&lt;li&gt;In the inspector, swipe left goes back.&lt;/li&gt;&lt;li&gt;In any tool, a pinch (zoom out) closes the current tool, leaving only the listener visible. A swipe up reloads any changed source files (I&apos;m not sure if I like this yet).&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;15 minutes of Google, 20 minutes of hacking, and now Factor supports the fanciest feature of Apple&apos;s latest hardware.</content></entry><entry><title type="html">Slava Pestov: Improvements to io.monitors; faster refresh-all</title><link href="http://factor-language.blogspot.com/2008/04/improvements-to-iomonitors-faster.html"/><published>2008-04-11T13:19:00.002-04:00</published><content type="html">Factor&apos;s &lt;code&gt;io.monitors&lt;/code&gt; library previously supported &lt;a href=&quot;http://factor-language.blogspot.com/2008/02/file-system-change-monitoring-on-mac-os.html&quot;&gt;Mac OS X&lt;/a&gt;, &lt;a href=&quot;http://factor-language.blogspot.com/2008/02/file-system-change-notification-on.html&quot;&gt;Windows and Linux&lt;/a&gt;. Now it also supports BSD, but in a much more restricted fashion than the other platforms. Basically you cannot monitor directories, just individual files. 
This is because &lt;code&gt;kqueue()&lt;/code&gt; only provides very limited functionality in this regard. However, having something is better than nothing, and the functionality provided on BSDs is still useful for monitoring log files and such.&lt;br /&gt;&lt;br /&gt;On Linux, &lt;code&gt;inotify&lt;/code&gt; doesn&apos;t directly support monitoring recursive directory hierarchies so Factor&apos;s monitors didn&apos;t support recursive monitoring, but &lt;a href=&quot;http://mail.gnome.org/archives/dashboard-hackers/2004-October/msg00022.html&quot;&gt;a mailing list post by Robert Love&lt;/a&gt; discusses how to solve this issue in user-space. I used his solution to implement recursive monitors on Linux.&lt;br /&gt;&lt;br /&gt;Another oddity relating to &lt;code&gt;inotify&lt;/code&gt; is that if you add the same directory twice to the same inotify, you get the same watch ID both times, and events are only reported once. This means that the previous implementation where there was one global inotify instance shared by all monitors wasn&apos;t really as general as one would hope, because you couldn&apos;t run two programs that monitor overlapping portions of the file system. I thought of several possible fixes but in the end just changed the monitors API to accommodate this case. All monitor operations must now be wrapped in a &lt;code&gt;with-monitors&lt;/code&gt; combinator. On Linux, it creates an inotify instance and stores it in a dynamically-scoped variable, so that subsequent calls to &lt;code&gt;&amp;lt;monitor&gt;&lt;/code&gt; use this inotify. Independent inotifies in different threads don&apos;t interact at all. On Mac OS X, BSD and Windows, &lt;code&gt;with-monitors&lt;/code&gt; just calls the quotation without doing any special setup.&lt;br /&gt;&lt;br /&gt;Another issue I fixed was that on Mac OS X, monitors would only work when used from the UI because no run loop was running otherwise. 
I made a run loop run all of the time and this allows monitors to work in the terminal listener.&lt;br /&gt;&lt;br /&gt;Now that monitors are working better, I was able to use them to make &lt;code&gt;refresh-all&lt;/code&gt; faster. This word finds all changed source files in the vocabulary roots and reloads them. It does this by comparing cached CRC32 checksums with the actual checksum of the file. Previously it would also compare modification times, but I took that code out because filesystem meta-data queries got moved out of the VM and into the native I/O code, which isn&apos;t available during bootstrap. A side-effect of this is that &lt;code&gt;refresh-all&lt;/code&gt; became much slower, because it had to read all files. Using monitors I was able to make this faster than it has ever been. A thread waiting on a monitor is started on startup. Then, the source tree only has to be checksummed in its entirety the first time &lt;code&gt;refresh-all&lt;/code&gt; is used in a session. Subsequently, only files for which the monitor reported changes have to be scanned. So &lt;code&gt;refresh-all&lt;/code&gt; runs instantly if there are no changes, and so on.</content></entry><entry><title type="html">Slava Pestov: The golden rule of writing comments</title><link href="http://factor-language.blogspot.com/2008/04/golden-rule-of-programming.html"/><published>2008-04-08T20:36:00.004-04:00</published><content type="html">This is something I&apos;ve wanted to say for a while, and I think many (most?) programmers don&apos;t realize it:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Never comment out code.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Comments are for natural-language descriptions of code, or pseudocode maybe.&lt;br /&gt;&lt;br /&gt;If you comment out some code, then the parser isn&apos;t checking the syntax, the compiler isn&apos;t checking semantics, and the unit tests are not unit testing it.&lt;br /&gt;&lt;br /&gt;So the code may as well not work. 
Why have non-working code in your program, especially if it&apos;s not called anywhere?&lt;br /&gt;&lt;br /&gt;But perhaps the code is there so that you can see what the program &lt;i&gt;used&lt;/i&gt; to do. In that case, just fire up your favorite graphical GIT/SVN/P4/whatever frontend and check out an older revision.</content></entry><entry><title type="html">Slava Pestov: Multi-methods and hooks</title><link href="http://factor-language.blogspot.com/2008/04/multi-methods-and-hooks.html"/><published>2008-04-08T20:14:00.003-04:00</published><content type="html">For a while now Factor has had &apos;hooks&apos;, which are generic words dispatching on a dynamically scoped variable. Hooks can be used with variables which are essentially global: the current OS, current CPU, UI backend, etc -- or variables which are truly context-specific, such as the current database connection.&lt;br /&gt;&lt;br /&gt;I added support for hooks to the &lt;code&gt;extra/multi-methods&lt;/code&gt; library, which is going in the core soon. While doing this I was able to significantly generalize the concept. Suppose we want to define a piece of functionality which depends on both the operating system and processor type.&lt;br /&gt;&lt;br /&gt;We can begin with defining an ordinary generic word:&lt;br /&gt;&lt;pre&gt;GENERIC: cache-size ( -- l1 l2 )&lt;/pre&gt;&lt;br /&gt;Notice that it is defined as taking no inputs from the stack.&lt;br /&gt;&lt;br /&gt;I don&apos;t really know the APIs involved here, but suppose that Linux gives us a way to get this info that works across all CPUs. So we define a method specializing on the &lt;code&gt;os&lt;/code&gt; dynamic variable (which is really treated like global):&lt;br /&gt;&lt;pre&gt;METHOD: cache-size { { os linux } } ... read something from /proc ... ;&lt;/pre&gt;&lt;br /&gt;Now suppose on Mac OS X we use different APIs per CPU:&lt;br /&gt;&lt;pre&gt;METHOD: cache-size { { os macosx } { cpu ppc } } ... 
;&lt;br /&gt;&lt;br /&gt;METHOD: cache-size { { os macosx } { cpu x86.32 } } ... ;&lt;br /&gt;&lt;br /&gt;METHOD: cache-size { { os macosx } { cpu x86.64 } } ... ;&lt;/pre&gt;&lt;br /&gt;But perhaps on ARM CPUs, we just use an instruction to read the cache size without any OS-specific calls at all:&lt;br /&gt;&lt;pre&gt;METHOD: cache-size { { cpu arm } } ... ;&lt;/pre&gt;&lt;br /&gt;Now in this case, you have an issue where if you&apos;re on Linux and ARM, the method that ends up being called depends on the order in which they were defined. If you wanted to explicitly resolve this ambiguity, you would define a new method on &lt;code&gt;{ { os linux } { cpu arm } }&lt;/code&gt;; because it is more specific than the other two, it is always called first.&lt;br /&gt;&lt;br /&gt;The powerful thing about this new implementation of hooks is that not only can you dispatch on multiple variables, but you can add methods to any old generic which dispatches on a variable and the original designer of the generic does not have to explicitly take this into account.&lt;br /&gt;&lt;br /&gt;For example, Factor&apos;s compiler currently has a large number of hooks dispatching on the CPU type (as an aside, Phil Dawes wrote an excellent &lt;a href=&quot;http://www.phildawes.net/blog/2008/04/08/digging-into-factors-compiler/&quot;&gt;introduction to the Factor compiler&lt;/a&gt; recently). If those hooks need to be further refined by OS, as is often the case with FFI-related components, the method implementation on the hook needs to perform its own dispatch; this is the &quot;double dispatch&quot; pattern and design patterns are something to be avoided if one wants to write quality code. When multi-methods go in the core, the compiler will simply define a series of generic words taking no inputs from the stack, and each method will specialize on the CPU, and maybe an OS too.&lt;br /&gt;&lt;br /&gt;Another new capability is dispatching off stack values and variables in the same method. 
Among other things, this will be useful in eliminating a case of double dispatch in the core right now, where the &lt;code&gt;&amp;lt;client&gt;&lt;/code&gt; word for opening a client socket has to dispatch off the address type on the stack, and then call another hook which dispatches on the I/O backend stored in a variable. This can be combined into a single generic word where some methods dispatch on stack values, and others dispatch on the I/O backend variable.&lt;br /&gt;&lt;br /&gt;The other nice thing about this is that the multi-method &lt;code&gt;GENERIC:&lt;/code&gt; word unifies and generalizes four words in the core, &lt;code&gt;GENERIC:&lt;/code&gt;, &lt;code&gt;GENERIC#&lt;/code&gt; for dispatching on a value further down on the stack, &lt;code&gt;HOOK:&lt;/code&gt; for dispatching on a variable, and &lt;code&gt;MATH:&lt;/code&gt; which performs double dispatch on numbers only.&lt;br /&gt;&lt;br /&gt;One of my goals for Factor 1.0 is to get the object system &quot;right&quot;, and with the new-style slot accessors, inheritance, and singletons, we&apos;re almost there. All that remains to be done is to merge the multi-methods code. The code is still not quite ready to go in the core, though. The only feature that single dispatch has and multiple-dispatch lacks is &lt;code&gt;call-next-method&lt;/code&gt;, which is easy to implement. A bigger hurdle to clear is performance; right now multi-methods are implemented in a naive way, where the dispatch time is &lt;code&gt;O(mn)&lt;/code&gt; with &lt;code&gt;m&lt;/code&gt; methods and &lt;code&gt;n&lt;/code&gt; stack positions and variables to dispatch on. 
This can be improved significantly and I will spend time reading the literature on optimizing method dispatch over the next few weeks.</content></entry><entry><title type="html">Phil Dawes: Digging into Factor&amp;#8217;s compiler</title><link href="http://www.phildawes.net/blog/2008/04/08/digging-into-factors-compiler/"/><published>2008-04-08T08:27:16Z</published><content type="html">&lt;p&gt;I wrote this post partly as an advocacy piece and partly to put down a bunch of things I&amp;#8217;ve learnt about the factor compiler over the last few weeks. I should point out that I&amp;#8217;m no expert in this area and so there are probably inaccuracies and omissions - hopefully Slava or one of the factor gurus will point them out. With that in mind, check this out:&lt;/p&gt;
&lt;p&gt;Factor has an optimising compiler which generates machine code as you type code into the &lt;a href=&quot;http://en.wikipedia.org/wiki/REPL&quot;&gt;REPL&lt;/a&gt;. If you have gdb on your system you can see this in action by firing up factor and using the tools.disassembler vocab:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
( scratchpad ) : hello &quot;hello&quot; print ;           ! defines the word hello
( scratchpad ) USING: tools.disassembler ;
( scratchpad ) \ hello disassemble
&lt;em&gt;
Using host libthread_db library &quot;/lib/tls/i686/cmov/libthread_db.so.1&quot;.
[Thread debugging using libthread_db enabled]
[New Thread -1213614400 (LWP 25688)]
0xffffe410 in __kernel_vsyscall ()
Dump of assembler code from 0xb0f694b0 to 0xb0f694c7:
0xb0f694b0: mov    $0xb0f694b0,%ecx
0xb0f694b5: mov    0xb0f694e4,%ebx
0xb0f694bb: mov    %ebx,0x4(%esi)
0xb0f694be: add    $0x4,%esi
0xb0f694c1: jmp    0xb123e7d0
&amp;#8230;
End of assembler dump.
&lt;/em&gt;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;This is very cool in itself, but for me the real beauty of the factor compiler is the very modular design, composed of small pieces that you can pull apart and tinker with in isolation. This makes the compiler accessible both as a learning tool and to those wanting to generate highly optimized code for tight loops. &lt;/p&gt;
&lt;p&gt;The three stages of the compiler are&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Parsing the code and generating a &amp;#8216;dataflow&amp;#8217; abstract syntax tree. (Also called &amp;#8216;IR&amp;#8217; - intermediate representation)&lt;/li&gt;
&lt;li&gt;Optimizing the dataflow tree&lt;/li&gt;
&lt;li&gt;Generating machine code from the dataflow tree&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I&amp;#8217;ll dig into each of these steps in order:&lt;/p&gt;
&lt;h3&gt;Stage 1: Parsing factor code to dataflow IR&lt;/h3&gt;
&lt;p&gt;The first step parses factor code into a dataflow datastructure. You can run and inspect the results of this yourself using the dataflow word:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
USE: inference
[ &quot;hello&quot; print ] dataflow pprint
&lt;em&gt;
=&gt; T{
    #push
    T{
        node
        f
        f
        f
        V{ T{ value f &quot;hello&quot; 673850 f } }
        f
        f
        f
        f
        f
        f
        T{
            #call
            T{
                node
                f
                print
                V{ T{ value f &quot;hello&quot; 673850 f } }
                V{ }
                f
                f
                f
                f
                f
                f
                T{
                    #return
                    T{ node f f V{ } f f f f f f f f f }
                }
                f
            }
        }
        f
    }
}
&lt;/em&gt;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Inspecting this datastructure manually is pretty cumbersome, but fortunately there&amp;#8217;s some dataflow inspection functionality in the &amp;#8216;optimizer.debugger&amp;#8217; vocab. The dataflow&gt;quot word renders the dataflow structure back into quotations (code blocks) that you can print and inspect. I use it here to define some words for dataflow pretty-printing:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
USE: optimizer.debugger
: print-dataflow f dataflow&gt;quot pprint nl ;
: print-annotated-dataflow t dataflow&gt;quot pprint nl ;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;So now we can turn quotations into dataflow graphs and back again:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
[ &quot;hello&quot; print ] dataflow print-dataflow
&lt;em&gt;=&gt; [ &quot;hello&quot; print ]&lt;/em&gt;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;(N.B. there are already words in the optimizer.debugger vocab for displaying optimized dataflows, but for this post I wanted to be able to print dataflows prior to optimisation)&lt;/p&gt;
&lt;p&gt;This also works for pre-defined words using &amp;#8216;word-dataflow&amp;#8217; :&lt;br /&gt;
&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
: print-hello &quot;hello&quot; print ;
USE: generator
\ print-hello word-dataflow print-dataflow
&lt;em&gt;=&gt; [ &quot;hello&quot; print ]&lt;/em&gt;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;In most cases the output quotation will be the same as the input quotation. However, there are a couple of expansions that happen at this stage. The first is that words marked &amp;#8216;inline&amp;#8217; are inlined directly into the dataflow:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
: inlinedword &quot;this&quot; &quot;is&quot; &quot;an&quot; &quot;inlined&quot; &quot;word&quot; ; inline
[ inlinedword ] dataflow print-dataflow
&lt;em&gt;=&gt; [ &quot;this&quot; &quot;is&quot; &quot;an&quot; &quot;inlined&quot; &quot;word&quot; ]&lt;/em&gt;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
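&lt;p&gt;To make the inlining step concrete, here&amp;#8217;s a rough sketch in Python rather than Factor. It is not how the compiler is implemented: a quotation is modelled as a plain list where integers are literals and strings are words, and &amp;#8216;inline_words&amp;#8217; and the &amp;#8216;definitions&amp;#8217; table are made-up names for illustration. It also assumes that inline words are not recursive.&lt;/p&gt;
&lt;pre&gt;
```python
def inline_words(quot, definitions):
    """Expand words marked inline before any optimization runs.

    quot: list of ints (literals) and strs (words).
    definitions: hypothetical table mapping inline word names to bodies.
    """
    out = []
    for item in quot:
        if isinstance(item, str) and item in definitions:
            # Splice the body in, expanding any inline words it uses.
            out.extend(inline_words(definitions[item], definitions))
        else:
            out.append(item)
    return out
```
&lt;/pre&gt;
&lt;p&gt;With a table of {&quot;three-numbers&quot;: [1, 2, 3]}, the list [&quot;three-numbers&quot;, &quot;+&quot;] expands to [1, 2, 3, &quot;+&quot;], which is the same shape of result as the print-dataflow output above.&lt;/p&gt;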
&lt;p&gt;Also any compiler-transforms (macros) are evaluated at this stage. &lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
USE: shuffle
[ 1 2 2 ndup ] dataflow print-dataflow      ! ndup is a macro
&lt;em&gt;=&gt; [ 1 2 2 drop 2 drop &gt;r dup r&gt; swap 2 drop &gt;r dup r&gt; swap ]&lt;/em&gt;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;h3&gt;Stage 2: Dataflow Optimisation&lt;/h3&gt;
&lt;p&gt;Here&amp;#8217;s where the fun starts. You can get a feel for how this stage works by looking at &amp;#8216;optimizer.factor&amp;#8217;. Here it is in its entirety:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
! Copyright (C) 2006, 2008 Slava Pestov.
! See http://factorcode.org/license.txt for BSD license.
USING: kernel namespaces optimizer.backend optimizer.def-use
optimizer.known-words optimizer.math optimizer.control
optimizer.inlining inference.class ;
IN: optimizer

: optimize-1 ( node -- newnode ? )
    [
        H{ } clone class-substitutions set
        H{ } clone literal-substitutions set
        H{ } clone value-substitutions set
        dup compute-def-use
        kill-values
        dup detect-loops
        dup infer-classes
        optimizer-changed off
        optimize-nodes
        optimizer-changed get
    ] with-scope ;

: optimize ( node -- newnode )
    optimize-1 [ optimize ] when ;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&amp;#8216;optimize&amp;#8217; repeatedly calls &amp;#8216;optimize-1&amp;#8217; until nothing changes in the output graph, i.e. until it has reached a fixed point and no more optimizations can be performed. If you dig into the words used by optimize-1 (try executing them individually and inspecting the dataflow result) you&amp;#8217;ll find that optimize-1 performs a number of inferences and optimizations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It tracks the types (classes) of stack elements created within the code block&lt;/li&gt;
&lt;li&gt;It inlines specific generic word implementations (methods) when it can deduce the class instance on the stack&lt;/li&gt;
&lt;li&gt;It prunes unused literals and flushable words. (this is actually more useful than it sounds, since other optimisations can generate unused code)
&lt;/li&gt;
&lt;li&gt;It performs branch analysis, marking tail calls in loops and pruning branches that can&amp;#8217;t be executed
&lt;/li&gt;
&lt;li&gt;It evaluates &amp;#8216;foldable&amp;#8217; words at compile time if the values of arguments are known
&lt;/li&gt;
&lt;li&gt;It executes any custom inference code attached to words, allowing words to evaluate their results at compile time if inputs are known&lt;/li&gt;
&lt;/ul&gt;
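&lt;p&gt;The fixed-point loop is easy to sketch outside Factor. The Python below is only an illustration, not the optimizer&amp;#8217;s code: &amp;#8216;prune_once&amp;#8217; is a made-up stand-in for a real pass (it removes one literal that is immediately dropped), and it returns the new program plus a changed flag in the same spirit as the ( node -- newnode ? ) stack effect of optimize-1. Programs are modelled as lists where integers are literals and strings are words.&lt;/p&gt;
&lt;pre&gt;
```python
def prune_once(tokens):
    """One pass: remove the first literal that is immediately dropped.

    Returns (new_tokens, changed). This is a hypothetical stand-in for
    the real optimizer passes.
    """
    for i in range(len(tokens) - 1):
        if isinstance(tokens[i], int) and tokens[i + 1] == "drop":
            return tokens[:i] + tokens[i + 2:], True
    return tokens, False


def optimize(tokens, pass_fn=prune_once):
    """Re-run the pass until it reports no change (a fixed point)."""
    changed = True
    while changed:
        tokens, changed = pass_fn(tokens)
    return tokens
```
&lt;/pre&gt;
&lt;p&gt;Running optimize on [1, 2, &quot;drop&quot;, &quot;drop&quot;, &quot;x&quot;] takes two productive passes: removing &amp;#8216;2 drop&amp;#8217; exposes &amp;#8216;1 drop&amp;#8217;, which only the next pass can remove. That is why a single pass isn&amp;#8217;t enough.&lt;/p&gt;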
&lt;p&gt;Examples:&lt;/p&gt;
&lt;h5&gt;evaluating foldable words at compile time&lt;/h5&gt;
&lt;p&gt;&amp;#8216;+&amp;#8217; is a foldable word (see help for &amp;#8216;+&amp;#8217;), so the optimizer evaluates it at compile time if the values of both arguments are known. Here&amp;#8217;s the dataflow before and after optimization:&lt;br /&gt;
&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
[ 2 3 + ] dataflow print-dataflow            ! before optimization
&lt;em&gt;=&gt; [ 2 3 + ]&lt;/em&gt;

[ 2 3 + ] dataflow optimize print-dataflow   ! after optimization
&lt;em&gt;=&gt; [ 5 ]&lt;/em&gt;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
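&lt;p&gt;Here&amp;#8217;s a small Python sketch of the same idea, assuming a postfix program stored as a list of integers (literals) and strings (words). The &amp;#8216;FOLDABLE&amp;#8217; table and &amp;#8216;fold_constants&amp;#8217; are names I made up for the illustration; the real optimizer works on dataflow nodes, not token lists.&lt;/p&gt;
&lt;pre&gt;
```python
import operator

# Hypothetical table of foldable words: name -> (arity, pure function).
FOLDABLE = {"+": (2, operator.add), "*": (2, operator.mul)}


def fold_constants(quot):
    """Single left-to-right pass over a postfix program.

    A foldable word is evaluated at compile time only when all of its
    inputs are literal integers sitting directly before it.
    """
    out = []
    for item in quot:
        if isinstance(item, str) and item in FOLDABLE:
            arity, fn = FOLDABLE[item]
            args = out[len(out) - arity:]
            if len(args) == arity and all(isinstance(a, int) for a in args):
                del out[len(out) - arity:]
                out.append(fn(*args))
                continue
        out.append(item)
    return out
```
&lt;/pre&gt;
&lt;p&gt;fold_constants([2, 3, &quot;+&quot;]) gives [5], while [&quot;dup&quot;, 3, &quot;+&quot;] is left alone because one input of &amp;#8216;+&amp;#8217; is not known at compile time.&lt;/p&gt;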
&lt;h5&gt;type inference and inlining generic word implementations&lt;/h5&gt;
&lt;p&gt;To illustrate this we first create two tuples (classes) with constructors, and a generic word&lt;br /&gt;
&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
TUPLE: classa ;
C: &amp;lt;classa&gt; classa
TUPLE: classb ;
C: &amp;lt;classb&gt; classb 

GENERIC: dosomething ( obj -- val )
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Now we create an implementation of the generic word specialised for each class:&lt;br /&gt;
&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
M: classa dosomething drop &quot;something for class a&quot; ;
M: classb dosomething drop &quot;something for class b&quot; ;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Finally, some code which calls &amp;#8216;dosomething&amp;#8217; with an instance of &amp;#8216;classa&amp;#8217; on the stack, before and after optimization:&lt;br /&gt;
&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
[ &amp;lt;classa&gt; dosomething ] dataflow print-dataflow            ! before optimization
&lt;em&gt;=&gt; [ \ classa drop \ classa 2 &amp;lt;tuple-boa&gt; dosomething ]&lt;/em&gt;

[ &amp;lt;classa&gt; dosomething ] dataflow optimize print-dataflow    ! after optimization
&lt;em&gt;=&gt; [ \ classa 2 &amp;lt;tuple-boa&gt; drop &quot;something for class a&quot; ]&lt;/em&gt;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;It&amp;#8217;s a little messy because of the inlined tuple creation, but you can see that prior to optimization &amp;#8216;dosomething&amp;#8217; is a word call in the dataflow, and afterwards the optimizer has inlined the implementation of &amp;#8216;dosomething&amp;#8217; specialized on &amp;#8216;classa&amp;#8217;. (If you look you can also see an example of pruning literals here, as the first dataflow has resulted in a superfluous &amp;#8216;\ classa drop&amp;#8217;.)&lt;/p&gt;
&lt;p&gt;This is easier to see if I cheat a bit and use factor&amp;#8217;s &amp;#8216;declare&amp;#8217; word, which declares that elements on the top of the stack are instances of specific classes. So this quotation assumes that the top stack element, before the quotation is called, is of type &amp;#8216;classa&amp;#8217;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
[ { classa } declare dosomething ] dataflow optimize print-dataflow
&lt;em&gt;=&gt; [ drop &quot;something for class a&quot; ]&lt;/em&gt;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;N.B. you wouldn&amp;#8217;t normally use &amp;#8216;declare&amp;#8217; in user code, but it could be really handy for optimizing performance sensitive tight loops where the results of an external word call are known to the programmer but not the compiler.&lt;/p&gt;
&lt;h5&gt;conditional folding&lt;/h5&gt;
&lt;p&gt;The compiler optimizes out conditional branches when it can deduce the outcome of the conditional at compile time:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
[ 1 0 =  [ &quot;do if true&quot; ] [ &quot;do if false&quot; ] if ] dataflow optimize print-dataflow
&lt;em&gt;=&gt; [ &quot;do if false&quot; ]&lt;/em&gt;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;1 isn&amp;#8217;t equal to 0 so it optimizes this whole block into the contents of the false quotation.&lt;br /&gt;
This is a simple example, but it turns out to be really cool in performance sensitive code (e.g. tight loops) because you can use a generic library function whose behaviour depends on a conditional, specialize it with a hardcoded &amp;#8216;f&amp;#8217; and the compiler will optimize the conditional branch right out of the resulting code. You get the elegance of the generic combinator with the speed of a hand coded loop. &lt;/p&gt;
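&lt;p&gt;As a rough illustration of branch pruning, here&amp;#8217;s a Python sketch. It assumes the condition has already been folded down to a literal True or False, and it models quotations as nested lists; &amp;#8216;prune_branches&amp;#8217; is a made-up name, not a word in the compiler.&lt;/p&gt;
&lt;pre&gt;
```python
def prune_branches(quot):
    """Replace 'literal-bool [true-branch] [false-branch] if' with the
    branch that will actually run. Anything else is left untouched."""
    out = []
    for item in quot:
        if (item == "if" and len(out) >= 3
                and isinstance(out[-3], bool)
                and isinstance(out[-2], list)
                and isinstance(out[-1], list)):
            false_branch = out.pop()
            true_branch = out.pop()
            condition = out.pop()
            # Splice the chosen branch directly into the output.
            out.extend(true_branch if condition else false_branch)
        else:
            out.append(item)
    return out
```
&lt;/pre&gt;
&lt;p&gt;prune_branches([False, [&quot;do if true&quot;], [&quot;do if false&quot;], &quot;if&quot;]) gives [&quot;do if false&quot;]. When the condition is not a known literal, the conditional is kept as it is.&lt;/p&gt;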
&lt;h3&gt;Stage 3: Machine Code Generation&lt;/h3&gt;
&lt;p&gt;Code generation is implemented by the &amp;#8216;generate&amp;#8217; word. This iterates through the nodes calling &amp;#8216;generate-node&amp;#8217; on each. &lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
: generate-nodes ( node -- )
    [ node@ generate-node ] iterate-nodes end-basic-block ;

: generate ( node word label -- )
    [
        init-generate-nodes
        [ generate-nodes ] with-node-iterator
    ] with-generator ;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&amp;#8216;generate-node&amp;#8217; is a generic word with specialized implementations for each type of dataflow node. &lt;/p&gt;
&lt;p&gt;As described in &lt;a href=&quot;http://factor-language.blogspot.com/2006/04/look-at-new-compiler-design.html&quot;&gt;this post from Slava&amp;#8217;s excellent Factor blog&lt;/a&gt; there are a number of dataflow node types, the important ones being:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;
    * #push - push literals on the data stack&lt;br /&gt;
    * #shuffle - permute the elements of the data or call stack&lt;br /&gt;
    * #call - call a word&lt;br /&gt;
    * #label - an inlined recursive block (loop, etc)&lt;br /&gt;
    * #if - conditional with two child nodes&lt;br /&gt;
    * #dispatch - jump table with multiple nodes; jumps to the node indexed by a number on the data stack
&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;The generate-node implementations invoke lower level words in the &amp;#8216;architecture&amp;#8217; vocabulary, which in turn are generic words that write out small pieces of machine code specialized for each CPU architecture. &lt;/p&gt;
&lt;p&gt;The machine code generation code is particularly cool and easy to follow because factor has an assembler DSL for each cpu architecture it supports. The assembler words match the commands and registers of the target cpu architecture and evaluate to their corresponding machine code.&lt;/p&gt;
&lt;p&gt;You can even try this out in the REPL using &amp;#8216;make&amp;#8217; to collect the results into an array. I&amp;#8217;m on x86 so I load the cpu.x86.assembler vocabulary:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
USE: cpu.x86.assembler

[ EAX 35 MOV ] { } make .   ! postfix assembler evaluates to machine code!

&lt;em&gt;=&gt; { 184 35 0 0 0 }&lt;/em&gt;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
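&lt;p&gt;Those five bytes aren&amp;#8217;t magic. On x86, &amp;#8216;MOV r32, imm32&amp;#8217; is encoded as the opcode 0xB8 plus the register number, followed by the immediate as four little-endian bytes. Here&amp;#8217;s a small Python sketch of just that one encoding; the function name is made up and this is not how Factor&amp;#8217;s assembler is structured:&lt;/p&gt;
&lt;pre&gt;
```python
# Register numbers used by the x86 instruction encoding.
EAX = 0
ECX = 1


def mov_reg_imm32(reg_number, value):
    """Encode 'MOV r32, imm32' as a list of byte values."""
    return [0xB8 + reg_number] + list(value.to_bytes(4, "little"))
```
&lt;/pre&gt;
&lt;p&gt;mov_reg_imm32(EAX, 35) gives [184, 35, 0, 0, 0], the same bytes as the assembler output above.&lt;/p&gt;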
&lt;p&gt;The assembler DSLs make code generation easy to follow because you can see the assembler in the generation code and then check it against the disassembled machine code using the &amp;#8216;disassemble&amp;#8217; word we used at the start of the post. When following the code it helps to know which registers are used for what purpose. I found this information in assembler files in the factor VM source - I&amp;#8217;m on x86, so for me the definitions are in the &amp;#8216;cpu-x86.32.S&amp;#8217; file:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
#define ARG0 %eax
#define ARG1 %edx
#define XT_REG %ecx
#define STACK_REG %esp
#define DS_REG %esi
#define RETURN_REG %eax

#define CELL_SIZE 4

#define PUSH_NONVOLATILE \
	push %ebx ; \
	push %ebp

#define POP_NONVOLATILE \
	pop %ebp ; \
	pop %ebx

register CELL ds asm(&quot;esi&quot;);
register CELL rs asm(&quot;edi&quot;);
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;So let&amp;#8217;s check this against some generated code for a really basic word that just pushes the number 42 onto the stack:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
: myfunc 42 ;
\ myfunc disassemble
&lt;em&gt;=&gt;
Using host libthread_db library &quot;/lib/tls/i686/cmov/libthread_db.so.1&quot;.
[Thread debugging using libthread_db enabled]
[New Thread -1213696320 (LWP 32499)]
0xffffe410 in __kernel_vsyscall ()
Dump of assembler code from 0xb136b230 to 0xb136b247:
0xb136b230: mov    $0xb136b230,%ecx
0xb136b235: mov    $0x150,%ebx
0xb136b23a: mov    %ebx,0x4(%esi)
0xb136b23d: add    $0x4,%esi
0xb136b240: ret &lt;/em&gt;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;The first assembler line puts the address of the word into the XT_REG, which is %ecx. For some reason the start of each function puts its address into this register - not quite sure why.&lt;br /&gt;
The second line puts the number 42 into the ebx register. Note that factor uses the lowest 3 bits of a value (&amp;#8216;cell&amp;#8217;) to store its type (called a tag - see layouts.h). In this case it&amp;#8217;s a fixnum, whose tag is 000. 42 shifted left 3 bits is 336, which in hex is 0x150.&lt;br /&gt;
The third line puts the number onto the stack, and the fourth updates the stack pointer to point to the new top of the stack.&lt;/p&gt;
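&lt;p&gt;The tagging arithmetic is easy to check by hand. Here&amp;#8217;s a quick Python sketch, assuming (as above) a 3-bit tag in the low bits and a fixnum tag of 000; the function names are mine, not the VM&amp;#8217;s:&lt;/p&gt;
&lt;pre&gt;
```python
TAG_BITS = 3      # number of low bits reserved for the type tag
FIXNUM_TAG = 0    # fixnums carry the tag 000


def tag_fixnum(n):
    """Shift the integer left by TAG_BITS and put the tag in the low bits."""
    return n * 2**TAG_BITS + FIXNUM_TAG


def untag_fixnum(cell):
    """Recover the integer by shifting the tag bits back out."""
    return cell >> TAG_BITS


def tag_of(cell):
    """Read the type tag from the low bits of a cell."""
    return cell % 2**TAG_BITS
```
&lt;/pre&gt;
&lt;p&gt;tag_fixnum(42) is 336 and hex(336) is &quot;0x150&quot;, which matches the immediate in the disassembly.&lt;/p&gt;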
&lt;h4&gt;code generation optimizations&lt;/h4&gt;
&lt;p&gt;Factor has another couple of tricks up its sleeve during the code generation stages:&lt;/p&gt;
&lt;h5&gt;optimizing shuffle words&lt;/h5&gt;
&lt;p&gt;The first is that stack shuffle words (e.g. dup, swap, tuck etc.) don&amp;#8217;t get translated into machine code. Instead the compiler has a compile-time &amp;#8216;phantom stack&amp;#8217; which records the positions of items in the stack. When it generates the machine code, values are accessed from the stack out of order (the runtime stack is, after all, a random-access piece of memory). This makes stack shuffling words and the retain stack effectively &amp;#8216;free&amp;#8217; within a code block. A #merge node in the dataflow signifies a code boundary (usually before a subroutine call) which causes the compiler to output instructions that synchronise the physical runtime stack with its phantom stack.&lt;/p&gt;
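&lt;p&gt;Here&amp;#8217;s how I picture the phantom stack, as a Python sketch. It is a deliberately simplified model and not the compiler&amp;#8217;s actual data structure: the class, its method names and the &amp;#8216;slotN&amp;#8217; labels are all made up. Shuffle words only rearrange the compile-time model and emit nothing; the work is deferred until &amp;#8216;sync&amp;#8217; is called at a code boundary.&lt;/p&gt;
&lt;pre&gt;
```python
class PhantomStack:
    """Compile-time model of the data stack (a sketch only)."""

    def __init__(self, depth):
        # items[i] names the runtime slot whose value logically sits at
        # position i; the last element is the top of the stack.
        self.items = ["slot%d" % i for i in range(depth)]
        self.emitted = []  # machine instructions generated so far

    # Shuffle words only rearrange the compile-time model.
    def dup(self):
        self.items.append(self.items[-1])

    def swap(self):
        self.items[-1], self.items[-2] = self.items[-2], self.items[-1]

    def drop(self):
        self.items.pop()

    def sync(self):
        """Return the (source, target) moves needed to make the runtime
        stack match the model. They are meant as one parallel move; a real
        code generator would go through registers to avoid clobbering."""
        moves = [(source, "slot%d" % position)
                 for position, source in enumerate(self.items)
                 if source != "slot%d" % position]
        self.items = ["slot%d" % i for i in range(len(self.items))]
        return moves
```
&lt;/pre&gt;
&lt;p&gt;After a single swap on a two-element stack nothing has been emitted, and sync reports that slot1 and slot0 need to exchange contents. Two swaps in a row cancel out and sync reports no moves at all.&lt;/p&gt;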
&lt;h5&gt;word intrinsics&lt;/h5&gt;
&lt;p&gt;The second trick is word &amp;#8216;intrinsics&amp;#8217;. Word intrinsics are essentially blocks of open-coded assembler that are output in place of a subroutine call. They are associated with the word via a &amp;#8216;word-property&amp;#8217;, which is a nifty feature of factor that allows meta information to be attached to each word. For example the &amp;#8216;fixnum+fast&amp;#8217; word has intrinsics which you can see using &amp;#8216;word-prop&amp;#8217;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;/p&gt;
&lt;pre&gt;
\ fixnum+fast &quot;intrinsics&quot; word-prop pprint
&lt;em&gt;=&gt;
{
    {
        [ &quot;x&quot; operand &quot;y&quot; operand ADD ]
        H{
            { +output+ { &quot;x&quot; } }
            { +input+ { { f &quot;x&quot; } { [ small-tagged? ] &quot;y&quot; } } }
        }
    }
    {
        [ &quot;x&quot; operand &quot;y&quot; operand ADD ]
        H{
            { +output+ { &quot;x&quot; } }
            { +input+ { { f &quot;x&quot; } { f &quot;y&quot; } } }
        }
    }
}&lt;/em&gt;
&lt;/pre&gt;
&lt;p&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;This tells the compiler to inline the x86 ADD instruction instead of making a subroutine call to the implementation of fixnum+fast. You can add assembler intrinsics to existing words with &amp;#8216;define-intrinsics&amp;#8217;. Here&amp;#8217;s a description from the help for the define-intrinsics word:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;
Defines a set of assembly intrinsics for the word. When a call to the word is being compiled, each intrinsic is tested in turn; the first applicable one will be called to generate machine code. If no suitable intrinsic is found, a simple call to the word is compiled instead.
&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;What I particularly like about this feature is that it neatly provides the ability to specialize a highly optimized implementation for a particular hardware set, and then fall back gracefully on other architectures.&lt;/p&gt;
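&lt;p&gt;The selection rule in that help text (try each intrinsic in turn, use the first applicable one, otherwise compile a plain call) can be sketched in a few lines of Python. Everything here is hypothetical: the table layout, the predicates and the pseudo-assembly strings are my own illustration, not the format that define-intrinsics actually stores.&lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical intrinsics table: word -> ordered list of
# (applies-predicate, code-emitter) pairs.
INTRINSICS = {
    "fixnum+fast": [
        # Second operand is a literal: add it as an immediate.
        (lambda ops: isinstance(ops[1], int),
         lambda ops: ["ADD %s, $%d" % (ops[0], ops[1])]),
        # General case: both operands are in registers.
        (lambda ops: True,
         lambda ops: ["ADD %s, %s" % (ops[0], ops[1])]),
    ],
}


def compile_call(word, operands, intrinsics=INTRINSICS):
    """Use the first applicable intrinsic; otherwise emit a plain call."""
    for applies, emit in intrinsics.get(word, []):
        if applies(operands):
            return emit(operands)
    return ["CALL " + word]
```
&lt;/pre&gt;
&lt;p&gt;compile_call(&quot;fixnum+fast&quot;, [&quot;eax&quot;, 8]) picks the immediate form, and a word with no intrinsics, such as &quot;print&quot;, falls back to [&quot;CALL print&quot;]. An architecture that defines no intrinsics for a word simply gets the fallback.&lt;/p&gt;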
&lt;p&gt;&amp;#8211;&lt;/p&gt;
&lt;p&gt;That concludes my ad-hoc tour of the factor compiler. I&amp;#8217;ve skipped over a number of things and no doubt there are bits I haven&amp;#8217;t discovered yet and some inaccuracies, but I hope I&amp;#8217;ve supplied enough information to spark interest in this excellent compiler. &lt;/p&gt;
&lt;p&gt;As I mentioned in a previous post I got interested in factor as a direct result of tinkering with &lt;a href=&quot;http://annexia.org/forth&quot;&gt;jonesforth&lt;/a&gt;, which takes you through the entire forth bootstrap process starting with raw assembly. I&amp;#8217;ve been delighted to find that factor retains a lot of the &amp;#8216;right-down-to-the-metal&amp;#8217; accessibility of its low level cousin.&lt;/p&gt;
</content></entry><entry><title type="html">Daniel Ehrenberg: Programming in a series of trivial one-liners</title><link href="http://useless-factor.blogspot.com/2008/04/programming-in-series-of-trivial-one.html"/><published>2008-04-06T19:06:00.000-07:00</published><content type="html">Among Perl programmers, a one-line program is considered a useful piece of hackage, something to show off to your friends as a surprisingly simple way to do a particular Unix or text-processing task. Outsiders tend to deride these one-liners as line noise, but there&apos;s a certain virtue to them: in just one line, in certain programming languages, it&apos;s possible to create meaningful functionality.&lt;br /&gt;&lt;br /&gt;APL, which lives on in derivatives like K, Q, J and Dyalog, pioneered the concept of writing entire programs in a bunch of one-liners. Because their syntax is so terse and because of the powerful and high-level constructs of array processing, you can pack a lot into just 80 characters. In most K programs I&apos;ve seen, each line does something non-trivial, though this isn&apos;t always the case. It can take some time to decode just a single line. Reading Perl one-liners is the same way.&lt;br /&gt;&lt;br /&gt;Factor continues the one-line tradition. In general, it&apos;s considered good style to write your words in one, or sometimes two or three, lines each. But this isn&apos;t because we like to pack a lot into each line. Instead, each word is rather trivial, using the words defined before it. After enough simple things are combined, something non-trivial can result, but each step is easy to understand.&lt;br /&gt;&lt;br /&gt;Because Factor is concatenative (concatenation of programs denotes composition), it&apos;s easier to split things into these trivial one-liners. It can be done by copy and paste after the initial code is already written; there are no local variables whose names have to be changed. 
One-liners in Factor aren&apos;t exceptional or an eccentric trait of the community. They&apos;re the norm, and programs written otherwise are considered in bad style.&lt;br /&gt;&lt;br /&gt;Enough philosophizing. How does this work in practice? I&apos;m working on encodings right now, so I&apos;ll break down how this worked out in implementing 8-bit encodings like ISO-8859 and Windows-1252. These encodings are just a mapping of bytes to characters. Conveniently, a bunch of resource files describing these mappings, all in exactly the same format, &lt;a href=&quot;ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/&quot;&gt;already exists&lt;/a&gt; on the Unicode website. &lt;br /&gt;&lt;br /&gt;The first thing to do in implementing this is to parse and process the resource file, turning it into two tables for fast lookup in either direction. Instead of putting this in one word, it&apos;s defined in five, each one or two lines long. First, &lt;code&gt;tail-if&lt;/code&gt; is a utility word which works like &lt;code&gt;tail&lt;/code&gt; but leaves the sequence as it is if it&apos;s shorter than &lt;code&gt;n&lt;/code&gt;.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: tail-if ( seq n -- newseq )&lt;br /&gt;    2dup swap length &amp;lt;= [ tail ] [ drop ] if ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Using that, &lt;code&gt;process-contents&lt;/code&gt; takes an array of lines and turns it into an associative mapping (in the form of an array of pairs) from octets to code points.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: process-contents ( lines -- assoc )&lt;br /&gt;    [ &quot;#&quot; split1 drop ] map [ empty? 
not ] subset&lt;br /&gt;    [ &quot;\t&quot; split 2 head [ 2 tail-if hex&gt; ] map ] map ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;code&gt;byte&gt;ch&lt;/code&gt; takes this assoc, the product of &lt;code&gt;process-contents&lt;/code&gt; and produces an array which can be used to get the code point corresponding to a byte.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: byte&gt;ch ( assoc -- array )&lt;br /&gt;    256 replacement-char &amp;lt;array&gt;&lt;br /&gt;    [ [ swapd set-nth ] curry assoc-each ] keep ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;code&gt;ch&gt;byte&lt;/code&gt; is the opposite, taking the original assoc and producing an efficiently indexable mapping from code points to octets.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: ch&gt;byte ( assoc -- newassoc )&lt;br /&gt;    [ swap ] assoc-map &gt;hashtable ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Finally, &lt;code&gt;parse-file&lt;/code&gt; puts these all together and makes both mappings, given a stream for the resource file.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: parse-file ( stream -- byte&gt;ch ch&gt;byte )&lt;br /&gt;    lines process-contents&lt;br /&gt;    [ byte&gt;ch ] [ ch&gt;byte ] bi ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Next, the structure of the encoding itself is defined. A single tuple named &lt;code&gt;8-bit&lt;/code&gt; is used to represent all 8-bit encodings. It contains the encoding and decoding table, as well as the name of the encoding. 
The &lt;code&gt;encode-8-bit&lt;/code&gt; and &lt;code&gt;decode-8-bit&lt;/code&gt; words just take some encoding or decoding information and look the code point or octet up in the given table.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;TUPLE: 8-bit name decode encode ;&lt;br /&gt;&lt;br /&gt;: encode-8-bit ( char stream assoc -- )&lt;br /&gt;    swapd at* [ encode-error ] unless swap stream-write1 ;&lt;br /&gt;&lt;br /&gt;M: 8-bit encode-char&lt;br /&gt;    encode&gt;&gt; encode-8-bit ;&lt;br /&gt;&lt;br /&gt;: decode-8-bit ( stream array -- char/f )&lt;br /&gt;    swap stream-read1 dup&lt;br /&gt;    [ swap nth [ replacement-char ] unless* ] [ nip ] if ;&lt;br /&gt;&lt;br /&gt;M: 8-bit decode-char&lt;br /&gt;    decode&gt;&gt; decode-8-bit ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I wanted to design this, like existing Unicode functionality, to read resource files at parsetime rather than to generate Factor source code. Though I don&apos;t expect these encodings to change, the result is still more maintainable as it leaves a lower volume of code. If I were implementing this in C or Java or R5RS Scheme or Haskell98, this wouldn&apos;t be possible. So &lt;code&gt;make-8-bit&lt;/code&gt; defines an encoding given a word and the lookup tables to use:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: make-8-bit ( word byte&gt;ch ch&gt;byte -- )&lt;br /&gt;    [ 8-bit construct-boa ] 2curry dupd curry define ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;code&gt;define-8-bit-encoding&lt;/code&gt; puts everything together. 
It takes a string for the name of an encoding to be defined and a stream, reads the appropriate resource file and defines an 8-bit encoding.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: define-8-bit-encoding ( name stream -- )&lt;br /&gt;    &gt;r in get create r&gt; parse-file make-8-bit ;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;To top it all off, here&apos;s what&apos;s needed to define all the 8-bit encodings we want:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;: mappings {&lt;br /&gt;    { &quot;latin1&quot; &quot;8859-1&quot; }&lt;br /&gt;    { &quot;latin2&quot; &quot;8859-2&quot; }&lt;br /&gt;    ! ...&lt;br /&gt;} ;&lt;br /&gt;&lt;br /&gt;: encoding-file ( file-name -- stream )&lt;br /&gt;    &quot;extra/io/encodings/8-bit/&quot; &quot;.TXT&quot;&lt;br /&gt;    swapd 3append resource-path ascii &amp;lt;file-reader&gt; ;&lt;br /&gt;&lt;br /&gt;[&lt;br /&gt;    &quot;io.encodings.8-bit&quot; in [&lt;br /&gt;        mappings [ encoding-file define-8-bit-encoding ] assoc-each&lt;br /&gt;    ] with-variable&lt;br /&gt;] with-compilation-unit&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So by combining these trivial one-liners or two-liners, you can make something that&apos;s not as trivial. The end product is that hard things are made easy, which is the goal of every practical programming language. The point of this isn&apos;t to say that this code is perfect (it&apos;s very far from that), but just to demonstrate how clear things become when they&apos;re broken down in this way.&lt;br /&gt;&lt;br /&gt;When I first started programming in Factor, I thought that it only made sense to define things separately when it was conceivable that something else would use them, or that they&apos;d be individually useful for testing, or something like that. But actually, it&apos;s useful for more than that: for just making your program clear. In a way, the hardest thing to do when programming in Factor once you have the basics is to name each of these pieces and factor them out properly from your program. 
The result is far more maintainable and readable than if the factoring process had not been done.</content></entry><entry><title type="html">Slava Pestov: GC fixes and improvements</title><link href="http://factor-language.blogspot.com/2008/04/gc-fixes-and-improvements.html"/><published>2008-04-05T05:27:00.005-04:00</published><content type="html">With some advice from &lt;a href=&quot;http://useless-factor.blogspot.com&quot;&gt;Daniel Ehrenberg&lt;/a&gt;, I have made a few fixes and improvements to Factor&apos;s garbage collector.&lt;br /&gt;&lt;br /&gt;First of all, there was a potential memory leak situation I overlooked. Suppose that there is a compiled definition in the code heap which references a very large object in the data heap, and neither the compiled definition nor the large object is referenced from anywhere else. Because the code heap was only ever collected when it filled up, it was possible that this large object would never be reclaimed, and it would incur unnecessary collection cycles and memory pressure as a result. This could even result in unbounded heap growth if this was done in a loop. For example, the following test case would crash Factor even though it should have run in constant space:&lt;br /&gt;&lt;pre&gt;: leak-step 800000 f &amp;lt;array&gt; 1quotation call drop ;&lt;br /&gt;&lt;br /&gt;: leak-loop 1000 [ leak-step ] times ;&lt;/pre&gt;&lt;br /&gt;The fix is to always collect the code heap when collecting the oldest generation. Only collecting the code heap when it fills up is simply unsound because the code heap can reference objects in the data heap. If this were not the case, collecting them independently would be okay, but as things stand it isn&apos;t.&lt;br /&gt;&lt;br /&gt;The next thing I did was improve how allocation of large objects is handled. Previously, if a new object was too large to fit in the nursery, the entire heap would grow, and every time the heap grew it would increase the size of all generations. 
This is generally not what you want, because if the nursery is too large, we lose locality, and if the accumulation space is too large we waste time copying objects back and forth that should really be in tenured space.&lt;br /&gt;&lt;br /&gt;Now, the nursery and accumulation space have a fixed size that can be changed on startup with command line switches, but never changes while Factor is running. If an attempt is made to allocate an object larger than the nursery, it is directly allocated in tenured space. Presumably, if you&apos;re making a 2 megabyte array, you&apos;re going to do &lt;i&gt;something&lt;/i&gt; with it, and hold on to it for at least a few collection cycles, rather than discard it immediately, so it makes sense to avoid the copying altogether and stick it in tenured space. On contrived microbenchmarks this change might result in worse performance, but on more realistic workloads it should be better; having a small nursery helps with locality, and avoiding the inevitable copying of the large array from the nursery to accumulation space and then to tenured space is good too.&lt;br /&gt;&lt;br /&gt;Future improvements to the GC will include:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Shrinking the heap when memory usage drops and remains low for several collections (Zed Shaw experimented with implementing this but didn&apos;t finish due to lack of time).&lt;/li&gt;&lt;li&gt;Allocating chunks from the OS in small increments, say 1 MB, instead of allocating one large heap. This would make heap growth more efficient (no need to copy everything over) and it would also allow full use of the entire address space (and not half). 
However it will require rethinking Factor&apos;s card marking write barrier, which currently assumes a contiguous heap.&lt;/li&gt;&lt;li&gt;Using mark/sweep/compact for the oldest generation.&lt;/li&gt;&lt;li&gt;Incremental marking and sweeping.&lt;/li&gt;&lt;/ul&gt;Daniel and I will be working on all of this over the summer.</content></entry><entry><title type="html">Chris Double: Collection of Factor Articles in PDF</title><link href="http://www.bluishcoder.co.nz/2008/04/collection-of-factor-articles-in-pdf.html"/><published>2008-04-04T12:37:00.002+13:00</published><content type="html">Factor has experienced some rapid change in the libraries and language over the past few years. I&apos;ve written a few blog posts about Factor in the past and many of them suffer from bitrot due to this, making it hard to try out the examples in the latest Factor versions.&lt;br /&gt;&lt;br /&gt;I&apos;ve collected some of these articles, put them in a PDF document, and am slowly working through them so they are up to date with recent Factor versions. The document is split into sections for the articles that should work, and those that don&apos;t. Even the out of date articles make in