[ planet-factor ]

John Benediktsson: Rainbows

Rainbows are awesome, especially double rainbows, which can be quite intense when they stretch all the way across the sky. They are also awesome as rainbow flags, which are used to indicate that a place is welcoming, accepting, and safe for people.

Given that this is Pride Month, it might be fun to make some rainbows today using Factor.

I bumped into this nice tutorial on making annoying rainbows in JavaScript that has some background on color theory, including links to how color vision actually works, detailed science on light and the eye, and a “better rainbow method” called the sinebow.

After implementing a lot of color support for Factor, I get nerd sniped sometimes when it comes to colors. Instead of using HSL colors, let’s just use the more common RGB color model to make some rainbows!

:: rainbow-phase. ( str phase -- )
    2pi str length / :> frequency

    str >graphemes [| s i |
        frequency i * 2 + phase + sin 0.5 * 0.5 +
        frequency i * 0 + phase + sin 0.5 * 0.5 +
        frequency i * 4 + phase + sin 0.5 * 0.5 +
        1.0 <rgba> :> color

        s "" like H{ { foreground color } } format
    ] each-index nl ;

: rainbow. ( str -- ) 0 rainbow-phase. ;
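
Each color channel has to land between 0 and 1, while a sine wave swings between -1 and 1. One way to bridge the two, shown here as a small sketch, is to halve the wave and shift it up by 0.5. The word name angle>channel is made up for this illustration and is not part of any vocabulary:

```factor
USING: kernel math math.constants math.functions prettyprint ;

! Hypothetical helper: map an angle in radians to a value in 0..1.
: angle>channel ( angle -- x )
    sin 0.5 * 0.5 + ;

0 angle>channel .        ! 0.5
pi 2 / angle>channel .   ! 1.0
```

The three channels in a rainbow differ only in the constant added to the angle (2, 0, and 4), which keeps them out of phase with each other.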

This supports Unicode by calling >graphemes to split a word by grouping on visual characters.

And, it looks like this:

Happy Pride Month!

Fri, 2 Jun 2023 15:00:00

John Benediktsson: ZIM Builder

Apparently, it is just too much fun building tools to make an offline Wikipedia and the next thing we needed to build was a way to make offline Factor documentation. This documentation is available inside each Factor instance and generated by the Factor help system.

Since we implemented the zim vocabulary with support for reading the ZIM file format and the zim.server vocabulary with support for serving those files out as websites, the natural follow up is the zim.builder vocabulary to make the ZIM files in the first place!

Yesterday, I wrote the build-zim word that can archive all of the files in the current directory into a ZIM file at the specified output path. Just now, I generated some “offline Factor documentation” by running this command on a docs directory holding all the HTML files uploaded by a recent nightly build:

IN: scratchpad USE: zim.builder

IN: scratchpad "resource:docs" [ "resource:docs.zim" build-zim ] with-directory

You can then run a local Factor documentation server like so:

$ ./factor -run=zim.server docs.zim

It’s interesting that our Factor documentation is 1.4 GB of HTML files, 52 MB as a docs.tar.gz, and 47 MB as a docs.zim file using Zstandard compression. It’s a cool file format for serving this type of content.

I posted a ZIM snapshot of the Factor documentation if you’d like to download it and give this a try with a recent nightly build.

Wed, 17 May 2023 14:30:00

John Benediktsson: Offline Wikipedia

Pretty much everyone agrees that Wikipedia is awesome (except maybe during one of their controversial fundraising campaigns). In addition to Wikipedia, the Wikimedia Foundation operates a number of other projects, such as Wiktionary and Wikimedia Commons.

Even though the official Wikipedia iOS app and Wikipedia Android app are both great, they still require internet access to be useful. I am not alone in wondering how to build your own Hitchhiker’s Guide with Wikipedia and looking through the options to download a Wikipedia database.

One way you can do this is to implement support for the ZIM file format, for example using the libzim project. There are many archives available to download as a ZIM file for Wikipedia and various popular websites like StackOverflow, Project Gutenberg, and even some open source projects. You can also build your own ZIM file if you want to archive custom content.

ZIM stands for “Zeno IMproved”, as it replaces the earlier Zeno file format. Its file compression uses LZMA2, as implemented by the xz-utils library, and, more recently, Zstandard. The openZIM project is sponsored by Wikimedia CH, and supported by the Wikimedia Foundation.

Let’s implement this using Factor!

Each ZIM file starts with a header in little endian format:

PACKED-STRUCT: zim-header
    { magic-number uint32_t }
    { major-version uint16_t }
    { minor-version uint16_t }
    { uuid uint64_t[2] }
    { entry-count uint32_t }
    { cluster-count uint32_t }
    { url-ptr-pos uint64_t }
    { title-ptr-pos uint64_t }
    { cluster-ptr-pos uint64_t }
    { mime-list-ptr-pos uint64_t }
    { main-page uint32_t }
    { layout-page uint32_t }
    { checksum-pos uint64_t } ;
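
As a quick sanity check on the layout above, the field widths can be added up by hand. This is only arithmetic on the sizes listed in the struct (with the uuid counted as 16 bytes), not a call into the zim vocabulary:

```factor
USING: prettyprint sequences ;

! Byte widths of the header fields, in declaration order.
{ 4 2 2 16 4 4 8 8 8 8 4 4 8 } sum .   ! 80
```

So a packed header occupies 80 bytes at the start of the file.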

In addition to 16-bit, 32-bit, and 64-bit little-endian numbers, we need to be able to read null-terminated strings typically stored as UTF-8. For example, when reading the mime-type list:

: read-string ( -- str )
    { 0 } read-until 0 assert= utf8 decode ;

: read-mime-types ( -- seq )
    [ read-string dup empty? not ] [ ] produce nip ;
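
To try these two words without a real ZIM file, one option is to feed them a hand-built byte array. The sketch below repeats the definitions so it is self-contained; the sample bytes are invented and encode the names "a/b" and "c", followed by the empty string that ends the list:

```factor
USING: io io.encodings.binary io.encodings.string
io.encodings.utf8 io.streams.byte-array kernel prettyprint
sequences ;

: read-string ( -- str )
    { 0 } read-until 0 assert= utf8 decode ;

: read-mime-types ( -- seq )
    [ read-string dup empty? not ] [ ] produce nip ;

! 97 47 98 is "a/b", 99 is "c", each followed by a NUL byte;
! the final lone NUL is the empty terminating string.
B{ 97 47 98 0 99 0 0 } binary [ read-mime-types ] with-byte-reader .
```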

That’s enough to parse the file header, the list of mime-types, and the lists of pointers to urls, titles, and clusters used for indexing into the ZIM file.

TUPLE: zim path header mime-types urls titles clusters ;

: read-zim ( path -- zim )
    dup binary [
        zim-header read-struct dup {
            [ magic-number>> 0x44D495A assert= ]
            [
                mime-list-ptr-pos>> seek-absolute seek-input
                read-mime-types
            ] [
                dup url-ptr-pos>> seek-absolute seek-input
                entry-count>> [ 8 read le> ] replicate
            ] [
                dup title-ptr-pos>> seek-absolute seek-input
                entry-count>> [ 4 read le> ] replicate
            ] [
                dup cluster-ptr-pos>> seek-absolute seek-input
                cluster-count>> [ 8 read le> ] replicate
            ]
        } cleave zim boa
    ] with-file-reader ;


There are two types of directory entries:

  1. content entries

TUPLE: content-entry mime-type parameter-len namespace
    revision cluster-number blob-number url title parameter ;

: read-content-entry ( mime-type -- content-entry )
    read1
    read1
    4 read le>
    4 read le>
    4 read le>
    read-string
    read-string f
    content-entry boa
    dup parameter-len>> read >>parameter ;

  2. redirect entries

TUPLE: redirect-entry mime-type parameter-len namespace revision
    redirect-index url title parameter ;

: read-redirect-entry ( mime-type -- redirect-entry )
    read1
    read1
    4 read le>
    4 read le>
    read-string
    read-string f
    redirect-entry boa
    dup parameter-len>> read >>parameter ;

The mime-type indicates which type of entry we are reading:

: read-entry ( -- entry )
    2 read le> dup 0xffff =
    [ read-redirect-entry ] [ read-content-entry ] if ;

Now we can read the entry at index n in a ZIM file:

: read-entry-index ( n zim -- entry/f )
    urls>> nth seek-absolute seek-input read-entry ;


Content is stored in clusters, where each cluster is a sequence of binary blobs located by offsets into the cluster. Each cluster is stored either uncompressed or compressed (typically with LZMA or Zstandard).

We can read the “no compression” version:

: read-cluster-none ( -- offsets blobs )
    4 read le>
    [ 4 /i 1 - [ 4 read le> ] replicate ] [ prefix ] bi
    dup [ last ] [ first ] bi - read ;

And then read the “Zstandard compression” version:

: read-cluster-zstd ( -- offsets blobs )
    zstd-uncompress-stream-frame dup uint32_t deref
    [ 4 /i uint32_t <c-direct-array> ] [ tail-slice ] 2bi
    2dup [ [ last ] [ first ] bi - ] [ length assert= ] bi* ;

The cluster can then be read by checking the compression type in use:

: read-cluster ( -- offsets blobs )
    read1 [ 5 bit? f assert= ] [ 4 bits ] bi {
        { 1 [ read-cluster-none ] }
        { 2 [ "zlib not supported" throw ] }
        { 3 [ "bzip2 not supported" throw ] }
        { 4 [ "lzma not supported" throw ] }
        { 5 [ read-cluster-zstd ] }
    } case ;

To read the blob at index n, we read the entire cluster, then offset into the blobs data:

:: read-cluster-blob ( n -- blob )
    read-cluster :> ( offsets blobs )
    0 offsets nth :> zero
    n offsets nth :> from
    n 1 + offsets nth :> to
    from to [ zero - ] bi@ blobs subseq ;
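
The offset arithmetic is easier to see on in-memory data. In this sketch, nth-blob is a made-up word that takes the offsets and blob data as arguments instead of reading a cluster, and the sample cluster is invented. Offsets are measured from the start of the offset table, so subtracting the first offset converts them into indices into the blob data:

```factor
USING: kernel locals math prettyprint sequences ;

! Hypothetical helper: same arithmetic as read-cluster-blob,
! but on values passed in rather than read from a stream.
:: nth-blob ( n offsets blobs -- blob )
    0 offsets nth :> zero
    n offsets nth zero - :> from
    n 1 + offsets nth zero - :> to
    from to blobs subseq ;

! Three 4-byte offsets take 12 bytes, so the data starts at 12.
0 { 12 17 20 } "hellofoo" nth-blob .   ! "hello"
1 { 12 17 20 } "hellofoo" nth-blob .   ! "foo"
```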

Now we can read the blob by index into a given cluster in a ZIM file:

: read-blob-index ( blob-number cluster-number zim -- blob )
    clusters>> nth seek-absolute seek-input read-cluster-blob ;

And we can read the entry content from each entry type or index:

GENERIC#: read-entry-content 1 ( entry zim -- blob mime-type )

M:: content-entry read-entry-content ( entry zim -- blob mime-type )
    entry blob-number>>
    entry cluster-number>>
    zim read-blob-index
    entry mime-type>>
    zim mime-types>> nth ;

M: redirect-entry read-entry-content
    [ redirect-index>> ] [ read-entry-content ] bi* ;

M: integer read-entry-content
    [ read-entry-index ] keep '[ _ read-entry-content ] [ f f ] if* ;

Reading the “main page” content is simple using the index stored in the ZIM header:

: read-main-page ( zim -- blob/f mime-type/f )
    [ header>> main-page>> ] [ read-entry-content ] bi ;

We can find an entry by searching using a namespace and url, taking advantage of the fact the entries are sorted by <namespace><url> to perform a binary search. Some common namespaces include:

  • A - Article
  • C - User Content
  • M - ZIM metadata
  • W - Well known entries
  • X - Search indexes
:: find-entry-url ( namespace url zim -- entry/f )
    f zim header>> entry-count>> <iota> [
        nip zim read-entry-index
        namespace over namespace>> <=>
        dup +eq+ = [ drop url over url>> <=> ] when
    ] search 2drop dup {
        [ ] [ namespace>> namespace = ] [ url>> url = ]
    } 1&& [ drop f ] unless ;

If we find the entry after searching, we can read its content:

: read-entry-url ( namespace url zim -- blob/f mime-type/f )
    [ find-entry-url ] keep '[ _ read-entry-content ] [ f f ] if* ;

Web Server

This is all kinda awesome, but basically these ZIM files hold HTML data for an offline instance of the various wiki-type servers. So, wouldn’t it be awesome to make an HTTP server responder that loads a ZIM file and then returns data from it on a local Factor HTTP server?


TUPLE: zim-responder zim ;

: <zim-responder> ( path -- zim-responder )
    read-zim zim-responder boa ;

M: zim-responder call-responder*
    [
        dup { [ length 1 > ] [ first length 1 = ] } 1&&
        [ unclip-slice first ] [ CHAR: A ] if swap "/" join
    ] [
        zim>> dup path>> binary [
            over empty? [ 2nip read-main-page ] [ read-entry-url ] if
        ] with-file-reader
    ] bi* 2dup and [
        <content> binary >>content-encoding
    ] [
        2drop <404>
    ] if ;

We use that to make a little entry point that creates a zim-responder and then sets it as the main-responder and calls httpd to start a web server. Using the latest development version, we can run it like so:

$ ./factor -run=zim.server /path/to/wiki.zim [port]

There are a few features that would be nice to add, like searching URLs, titles, and content, or dealing with split ZIM files (when over 4GB on file systems like FAT32), but this is a pretty neat new tool that is available now in a nightly build and will be released soon in Factor 0.99.

Tue, 16 May 2023 14:30:00

John Benediktsson: Calendar Ranges

A post recently titled Python’s Missing Batteries: Essential Libraries You’re Missing Out On caught my eye. One of my favorite parts about Factor is the large standard library that we ship with. Looking at blogs like these sometimes helps me notice functionality that we are missing.

One of the provided examples from the timeutils module is the daterange function that provides an iterator between a start and stop date:

from datetime import date
from boltons import timeutils  # assumption: timeutils is boltons.timeutils

start_date = date(year=2023, month=4, day=9)
end_date = date(year=2023, month=4, day=30)

for day in timeutils.daterange(start_date, end_date, step=(0, 0, 2)):
    print(repr(day))
    # datetime.date(2023, 4, 9)
    # datetime.date(2023, 4, 11)
    # datetime.date(2023, 4, 13)
    # ...

I realize that although we have numeric ranges, the current support for numbers doesn’t allow extending them so that timestamp arithmetic is implicitly supported. Some future version of Factor might fix this when we finish merging support for multiple dispatch, but in the meantime I added a timestamp-range object that works identically to range but with calendar objects.

The above Python example would look something like this:

IN: scratchpad USE: calendar.ranges

IN: scratchpad 2023 4 9 <date-utc>
               2023 4 30 <date-utc>
               2 days <timestamp-range> [ . ] each
T{ timestamp { year 2023 } { month 4 } { day 9 } }
T{ timestamp { year 2023 } { month 4 } { day 11 } }
T{ timestamp { year 2023 } { month 4 } { day 13 } }

The current implementation has <timestamp-range> work the same way as <range> as it assumes an inclusive range [from,to]. Give it a try!

Tue, 9 May 2023 16:00:00

John Benediktsson: Case Conversion

One aspect of exposure to different programming languages and programmers is differing opinions on proper case conventions for class names, variable names, and other attribute names. Sometimes you want to convert between them for various reasons.

Looking around at other programming languages, you can find modules such as Change Case for JavaScript, case-converter for Python, a code golf challenge, a regular expression approach to convert string to different case styles, and even a PHP module written by Jawira Portugal called Case Converter that handles quite a few, ahem, cases:

Convert strings between 13 naming conventions: Snake case, Camel case, Kebab case, Pascal case, Ada case, Train case, Cobol case, Macro case, Upper case, Lower case, Title case, Sentence case and Dot notation.

Examples of which might look something like:

  • snake_case
  • camelCase
  • kebab-case
  • PascalCase
  • Ada_Case
  • Train-Case
  • lower case
  • Title Case
  • Sentence case
  • dot.case

I thought it would be an interesting example to make a Unicode-aware case conversion library for Factor that handles all of those same cases in a small amount of code (fewer than 35 lines!).

The first word looks for a lowercase grapheme, then finds the next one that is not lowercase:

: case-index ( str -- i/f )
    dup [ lower? ] find [
        swap [ lower? not ] find-from drop
    ] [ nip ] if ;

We can then use that method to split the graphemes at these case boundaries:

: split-case ( str -- words )
    >graphemes [ dup empty? not ] [
        dup [ case-index ] [ length or ] bi
        cut-slice swap concat
    ] produce nip ;

We split tokens first on the common token separators, and then on the case boundaries:

: split-tokens ( str -- words )
    " -_." split [ split-case ] map concat ;
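
The first stage of split-tokens can be tried on its own. This small illustration, with an invented input string, shows that split treats every character of " -_." as a separator, before any case boundaries are considered:

```factor
USING: prettyprint splitting ;

"foo-bar_baz.qux zap" " -_." split .
! { "foo" "bar" "baz" "qux" "zap" }
```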

And now the core of the algorithm that splits an input string into tokens, with two variants (one that applies a quotation to each token and another that handles the first token differently than the rest) before joining the tokens using a provided glue character.

: case1 ( str quot glue -- str' )
    [ split-tokens ] [ map ] [ join ] tri* ; inline

: case2 ( str first-quot rest-quot glue -- str' )
    {
        [ split-tokens 0 over ]
        [ change-nth dup rest-slice ]
        [ map! drop ]
        [ join ]
    } spread ; inline

Now that’s everything we need to implement all the case conversions!

: >camelcase ( str -- str' ) [ >lower ] [ >title ] "" case2 ;
: >pascalcase ( str -- str' ) [ >title ] "" case1 ;
: >snakecase ( str -- str' ) [ >lower ] "_" case1 ;
: >adacase ( str -- str' ) [ >title ] "_" case1 ;
: >macrocase ( str -- str' ) [ >upper ] "_" case1 ;
: >kebabcase ( str -- str' ) [ >lower ] "-" case1 ;
: >traincase ( str -- str' ) [ >title ] "-" case1 ;
: >cobolcase ( str -- str' ) [ >upper ] "-" case1 ;
: >lowercase ( str -- str' ) [ >lower ] " " case1 ;
: >uppercase ( str -- str' ) [ >upper ] " " case1 ;
: >titlecase ( str -- str' ) [ >title ] " " case1 ;
: >sentencecase ( str -- str' ) [ >title ] [ >lower ] " " case2 ;
: >dotcase ( str -- str' ) [ >lower ] "." case1 ;

These are available in the tokencase vocabulary, which is included in the latest nightly builds.

Fri, 5 May 2023 13:00:00

John Benediktsson: Unicode

The Rust programming language is pretty cool. I’ve enjoyed many aspects of the Rewrite It In Rust meme that appears as a part of the Rust Evangelism Strike Force. The Rust documentation includes a pretty awesome Rust book that is probably a gold standard for programming language documentation.

In the Rust book, there is a section on Storing UTF-8 Encoded Text with Strings. It contains a neat example that I would like to use to show how Factor string objects work, how we handle Unicode and other character encodings, and show how we probably can make some improvements in the future. At the time of this blog post, we support Unicode 15.0.0 which was released in September 2022.

Factor strings are sequences of Unicode code points, which we can explore to see how they work.


If we look at the Hindi word “नमस्ते” written in the Devanagari script, it is stored as a vector of u8 values that looks like this:

IN: scratchpad "नमस्ते" utf8 encode .
B{
    224 164 168 224 164 174 224 164 184 224 165 141 224 164 164
    224 165 135
}

Or, as a series of hex values:

IN: scratchpad "नमस्ते" utf8 encode .h
B{
    0xe0 0xa4 0xa8 0xe0 0xa4 0xae 0xe0 0xa4 0xb8 0xe0 0xa5 0x8d
    0xe0 0xa4 0xa4 0xe0 0xa5 0x87
}

You could instead print them as octal or binary quite easily.

Code Points

That’s 18 bytes and is how computers ultimately store this data. If we look at them as Unicode scalar values, which are what Rust’s char type is, those bytes look like this:

IN: scratchpad "नमस्ते" [ 1string ] { } map-as .
{ "न" "म" "स" "्" "त" "े" }

You can see what the code point numeric values are:

IN: scratchpad "नमस्ते" >array .
{ 2344 2350 2360 2381 2340 2375 }

Or even see what the code point names are:

IN: scratchpad "नमस्ते" [ char>name ] { } map-as .


There are six char values here, but the fourth and sixth are not letters: they’re diacritics that don’t make sense on their own. Finally, if we look at them as grapheme clusters, we’d get what a person would call the four letters that make up the Hindi word:

IN: scratchpad "नमस्ते" >graphemes [ >string ] map .
{ "न" "म" "स्" "ते" }

These graphemes are code points grouped like so:

IN: scratchpad "नमस्ते" >graphemes [ >array ] map .
{ { 2344 } { 2350 } { 2360 2381 } { 2340 2375 } }
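
The three views of the same word can be compared side by side by counting each one. This is a small sketch that assumes >graphemes is found in the unicode vocabulary, as in recent builds:

```factor
USING: io.encodings.string io.encodings.utf8 kernel prettyprint
sequences unicode ;

"नमस्ते"
[ utf8 encode length . ]    ! 18 bytes
[ length . ]                ! 6 code points
[ >graphemes length . ]     ! 4 grapheme clusters
tri
```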


Rust provides different ways of interpreting the raw string data that computers store so that each program can choose the interpretation it needs, no matter what human language the data is in.

Factor supports many encodings which can be used for interacting with other computer systems. These include ASCII, many legacy 8-bit encodings (including MacRoman, EBCDIC, and others), other Unicode variants (such as UTF-7, UTF-16, and UTF-32), ISO-2022, and several others.

There are a couple of space optimizations to save memory when only small code points are used, which is common in English as well as formats such as Base64. Looking at the Rust standard library, the improvements made to the Python unicode support, or other languages such as Strings and Characters in Swift, there are likely improvements we can make when working with text in Factor.

Thu, 4 May 2023 16:00:00

John Benediktsson: Color Picker Game

In the SwiftUI by Tutorials book by Ray Wenderlich, there is a tutorial on building RGBullsEye, which is a game for adjusting RGB Colors using sliders to match a provided random color and providing a “color score” to the user showing how well they matched it. Some users have even posted their solutions on GitHub.

I thought it would be fun to build a version of this example using Factor.

We could generate a color by using random-unit to make three random values for the red, green, and blue slots. Instead, we can pick randomly from the standard color database.

: random-color ( -- color )
    named-colors random named-color ;

Comparing two colors can use the rgba-distance word from the colors.distances vocabulary, returning an integer score out of 100 points:

: color-score ( color1 color2 -- n )
    rgba-distance 1.0 swap - 100.0 * round >integer ;
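
The scoring arithmetic can be checked without any colors by feeding it a distance directly. The word distance>score below is made up for this sketch and simply repeats the tail of color-score; it assumes distances fall between 0 and 1:

```factor
USING: math math.functions prettyprint ;

! Hypothetical helper: the part of color-score after rgba-distance.
: distance>score ( distance -- n )
    1.0 swap - 100.0 * round >integer ;

0.0 distance>score .    ! 100
0.25 distance>score .   ! 75
```

A perfect match has distance 0 and scores 100, and the score drops by one point for each 0.01 of distance.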

We can define a gadget type that can be used to find our object in a gadget hierarchy.

TUPLE: color-picker-game < track ;

Given a child of the color-picker-game instance, we can pull out the color-preview gadgets in a slightly fragile way by knowing where they are in the layout:

: find-color-previews ( gadget -- preview1 preview2 )
    [ color-picker-game? ] find-parent
    children>> first children>> first2 ;

Using that, we can make a button that, when clicked:

  1. finds the two color-preview objects
  2. grabs the latest color value from their models
  3. calculates the “color score”
  4. displays it by modifying the button text
: <match-button> ( -- button )
    "Match Color" [
        dup find-color-previews
        [ model>> compute-model ] bi@
        color-score "Your score: %d" sprintf
        over children>> first text<< relayout
    ] <border-button> ;

Another button can be used to reset the color we are trying to match against to a new random color, setting it on the model used by the left color-preview:

: <reset-button> ( -- button )
    "Random" [
        find-color-previews drop model>>
        random-color swap set-model
    ] <border-button> ;

Using these two buttons, and some gadgets from the color picker vocabulary, we can build up our interface, choosing a random color to start, and then laying out the other components we need:

:: <color-picker-game> ( -- gadget )
    vertical color-picker-game new-track { 5 5 } >>gap

    random-color <model>     :> left-model
    \ <rgba> <color-sliders> :> ( sliders right-model )

    horizontal <track>
        left-model <color-preview> 1/2 track-add
        right-model <color-preview> 1/2 track-add
    1 track-add

    sliders                     f track-add
    right-model <color-status>  f track-add
    <match-button>              f track-add
    <reset-button>              f track-add ;

We can make a main entry point, constructing the game and providing it as the main gadget:

MAIN-WINDOW: color-picker-game-window
    { { title "Color Picker Game" } }
    <color-picker-game> >>gadgets ;

This is available in the development version and includes some additional features such as support for additional color spaces along with some improvements to our tabbed gadgets. Give it a try!

Wed, 3 May 2023 16:00:00

John Benediktsson: OpenAI

It’s been pretty hard to avoid all of the incredible stories about artificial intelligence over the past few months. There seem to be incredible applications to the area of generative AI occurring on a daily basis. Image generation with Midjourney is pretty next-level. Code generation using GitHub Copilot seems pretty amazing. Interacting with large language models like GPT-4 or Bard or Bing Chat or Facebook LLaMA or StableLM and so many others seems like science fiction. Audio models like Whisper used for audio transcription even make popular audio assistants look pretty dated.

With all of the hype, it seemed inevitable that Factor would gain some kind of AI functionality.

Despite the non-profit vs for-profit controversy of OpenAI, they do seem to have a momentary lead in the race to make tools that others can build upon. One of those tools is the OpenAI API, which is made available using JSON and HTTP. Besides the OpenAI API Reference, there is an OpenAI Cookbook and popular libraries such as OpenAI Python for building systems using it.

Recently, I contributed the openai vocabulary which allows using all the methods currently made available by OpenAI. You will need an OpenAI API Key.

Once you obtain that, you can set it in the listener.

IN: scratchpad USE: openai

IN: scratchpad "sk-....................." openai-api-key set-global

And now you have enough to try it out…

IN: scratchpad "text-davinci-003"
               "what is the factor programming language"
               <completion> 100 >>max_tokens create-completion
               "choices" of first "text" of print

Factor is a stack-oriented programming language, designed for creating
flexible, reusable software components. It combines elements from both
object-oriented and functional programming, and provides powerful features,
including static typing and static type checking, an interactive program
development environment, built-in automated testing, and a wide range of
built-in data types. The language is designed to be easy to use, yet provide
a high degree of flexibility.


We even have a Discord bot using OpenAI that answers on the Factor Discord server.

Tue, 25 Apr 2023 08:15:00


planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp.