[ planet-factor ]

John Benediktsson: Neovim

Neovim is a modern implementation of a vim-like editor. It started as a refactor, described as “not a rewrite but a continuation and extension of Vim”. It does have some built-in ability to load plugins written in Vimscript, but most new plugins seem to be written in the Lua programming language.

Factor has many different editor integrations supporting various text editors as well as plugins for some that provide additional features. One of these is the factor.vim plugin for Vim, which I happen to use frequently.

In the Big Omarchy 2.0 Tour, DHH presents the Omarchy customization of Linux. I noticed that it has a pretty nice Neovim integration, particularly with the system themes. It turns out to be based somewhat on the LazyVim system.

In any event, I wondered how easy it would be to make a Neovim plugin for Factor. It isn’t strictly necessary, as there is support for Vimscript plugins and the Factor one works pretty well out of the box. However, I thought I’d ask Claude Code to go off and YOLO an implementation based on the existing one. Thankfully this was not a FAFO moment, and after a few cycles it came back with something that mostly works!

This is available in the factor.nvim repository and should be pretty easy to integrate into your Neovim setup. Perhaps give it a try and see what you think? I’ve been using it and it seems to work pretty well.

Wed, 27 Aug 2025 15:00:00

John Benediktsson: New Icon

Encouraged by a fun rant about MacOS Tahoe’s Dead-Canary Utility App Icons, the reality that Apple is moving into the wonderful squircle-filled future, and the particularly annoying fact that legacy icons look terrible on macOS Tahoe – we have a new icon for Factor!

The latest development version includes new icon files in both PNG and SVG formats, and these are being used across macOS, Windows, and Linux builds. It might be burying the lede, but this is a particularly good time to do this, as we finally have high-resolution “HiDPI” support working on Windows and Linux.

The next release is likely to be a good one!

Wed, 27 Aug 2025 01:00:00

John Benediktsson: TA-Lib

TA-Lib is a C project that supports adding “technical analysis to your own financial market trading applications”. It was originally created in 2001 and is well-tested, recently released, and popular:

  • 200 indicators such as ADX, MACD, RSI, Stochastic, Bollinger Bands etc… See complete list…
  • Candlestick patterns recognition
  • Core written in C/C++ with API also available for Python.
  • Open-Source (BSD License). Can be freely integrated in your own open-source or commercial applications.

Of course, I wanted to be able to call the library using Factor. We have a C library interface that makes it pretty easy to interface with C libraries.

First, we add the library we expect to load:

<< "ta-lib" {
    { [ os windows? ] [ "libta-lib.dll" ] }
    { [ os macos?   ] [ "libta-lib.dylib" ] }
    { [ os unix?    ] [ "libta-lib.so" ] }
} cond cdecl add-library >>

LIBRARY: ta-lib

Then, we can define some types and some library functions to calculate the relative strength index:

TYPEDEF: int TA_RetCode

FUNCTION: TA_RetCode TA_RSI ( int startIdx, int endIdx, double* inReal, int optInTimePeriod, int* outBegIdx, int* outNBElement, double* outReal )
FUNCTION: int TA_RSI_Lookback ( int optInTimePeriod )

We use a simple code generator to define all the functions, as well as wrappers that can be used to call them:

:: RSI ( real timeperiod -- real )
    0 int <ref> :> outbegidx
    0 int <ref> :> outnbelement
    real check-array :> inreal
    inreal length :> len
    inreal check-begidx1 :> begidx
    len 1 - begidx - :> endidx
    timeperiod TA_RSI_Lookback begidx + :> lookback
    len lookback make-double-array :> outreal
    0 endidx inreal begidx tail-slice timeperiod outbegidx outnbelement outreal lookback tail-slice TA_RSI ta-check-success
    outreal ;

And, now we can use it!

IN: scratchpad 10 10 randoms 3 RSI .
double-array{
    0/0.
    0/0.
    0/0.
    50.0
    62.16216216216216
    31.506849315068497
    46.38069705093834
    32.33644859813084
    59.75541967759867
    66.53570603189276
}

You’ll note that the first few values are 0/0., which represents NaN when we don’t have enough data to compute an answer – either because we are in the lookback phase or because the inputs contain NaNs.
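The indicator itself is Wilder’s smoothed average of gains and losses. As a rough cross-check of the lookback and NaN behavior described above, here is a minimal pure-Python sketch (my own illustration – the function name and structure are not taken from TA-Lib’s sources):

```python
import math

def rsi(values, period=3):
    """Wilder's RSI: NaN during the lookback, then smoothed gains/losses."""
    out = [math.nan] * len(values)
    gains = losses = 0.0
    for i in range(1, len(values)):
        change = values[i] - values[i - 1]
        gain, loss = max(change, 0.0), max(-change, 0.0)
        if i <= period:
            # Seed the averages from the first `period` changes.
            gains += gain
            losses += loss
            if i == period:
                avg_gain, avg_loss = gains / period, losses / period
                out[i] = 100.0 if avg_loss == 0 else 100 - 100 / (1 + avg_gain / avg_loss)
        else:
            # Wilder smoothing: carry (period - 1) parts of the old average.
            avg_gain = (avg_gain * (period - 1) + gain) / period
            avg_loss = (avg_loss * (period - 1) + loss) / period
            out[i] = 100.0 if avg_loss == 0 else 100 - 100 / (1 + avg_gain / avg_loss)
    return out
```

A strictly rising series pins the indicator at 100 once the lookback is satisfied, and a strictly falling one at 0, which makes for an easy sanity check.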

For convenience, we convert the inputs to double-array to perform the calculation, but if the input is already a double-array then there is not any data conversion cost.

There are some advanced features we should probably think about adding as well, including the Abstract API for meta-programming, default values for parameters, candlestick settings, streaming indicator support, and documentation.

This is available on my GitHub.

Sun, 24 Aug 2025 15:00:00

John Benediktsson: String Length

I was reminded recently about a great article about unicode string lengths:

It’s Not Wrong that "🤦🏼‍♂️".length == 7

But It’s Better that "🤦🏼‍♂️".len() == 17 and Rather Useless that len("🤦🏼‍♂️") == 5

This comes amid an emoji tsunami, thanks to the proliferation of large language models and probably lots of Gen Z in the training data sets. Sometimes emojis are fun and useful, as in Base256Emoji, and sometimes things get carried away, as in the Emoji Kitchen.

I have written about Factor’s unicode support before and wanted to use this example to show a bit more about how Factor represents text using the Unicode standard.

IN: scratchpad "🤦" length .
1

IN: scratchpad "🤦🏼‍♂️" length .
5

Wat.

Well, what is happening is that the current strings vocabulary stores Unicode code points. This can be both useful and useless depending on the task at hand. We can print out which ones are used in this example:

IN: scratchpad "🤦🏼‍♂️" [ char>name . ] each
"face-palm"
"emoji-modifier-fitzpatrick-type-3"
"zero-width-joiner"
"male-sign"
"variation-selector-16"

When a developer expresses a need to store or retrieve textual data, they likely need to know about character encodings. In this case, we can see the number of bytes required to store this string in different encodings:

IN: scratchpad "🤦🏼‍♂️" utf8 encode length .
17

IN: scratchpad "🤦🏼‍♂️" utf16 encode length .
16

IN: scratchpad "🤦🏼‍♂️" utf32 encode length .
24

But, what if we just want to know how many visual characters are in the string?

IN: scratchpad "🤦🏼‍♂️" >graphemes length .
1

This is covered in The Absolute Minimum Every Software Developer Must Know About Unicode in 2023, which is also a great article and covers this as well as a number of other aspects of the Unicode standard.

Sat, 23 Aug 2025 15:00:00

John Benediktsson: Anubis

Tavis Ormandy wrote a great blog post about the Anubis project, asking a very valid-sounding question:

Hey… quick question, why are anime catgirls blocking my access to the Linux kernel?

The answer seems to be that it “weighs the soul of your connection using one or more challenges in order to protect upstream resources from scraper bots”. In particular, this project is an attempt to fight the AI scraperbot scourge which is making many popular websites annoying to use these days and spawning a kind of arms race amongst website owners, content delivery networks, and well-funded and morally-ambiguous AI firms.

Tavis goes into great detail about the estimated costs and inconvenience of this approach — and why it might likely inconvenience scraper bots which use many different IP addresses more than normal traffic which typically does not — as well as how the proof-of-work methodology is implemented using a solution written in the C programming language.

Without going into the safety versus security debate (a focus of the discussion on Lobsters), I thought I would show how to implement this using Factor.

How does Anubis work?

The Anubis challenge is a message, for example 5d737f0600ff2dd, which we use as a prefix while trying up to 262144 different nonce suffixes to find a SHA-256 hash that starts with 16 bits of zero.

The Anubis response in this case is 47224, which has a SHA-256 hash starting with 0000:

$ printf "5d737f0600ff2dd%d" 47224 | sha256sum
000043f7c4392a781a04419a7cb503089ebcf3164e2b1d4258b3e6c15b8b07f1  -
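The same check is easy to reproduce with Python’s hashlib (this just verifies the response above; it is not part of Anubis itself):

```python
import hashlib

# The challenge prefix concatenated with the decimal nonce must hash
# to a SHA-256 digest with 16 leading zero bits (hex "0000...").
digest = hashlib.sha256(b"5d737f0600ff2dd" + b"47224").hexdigest()
print(digest)  # 000043f7c4392a781a04419a7cb503089ebcf3164e2b1d4258b3e6c15b8b07f1
assert digest.startswith("0000")
```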

Solving in Factor

Factor includes support for SHA-2 checksums that we can use to solve the example puzzle:

USING: checksums checksums.sha kernel math math.parser sequences ;

: find-anubis ( message -- nonce anubis )
    18 2^ <iota> [
        >dec append sha-256 checksum-bytes
        [ B{ 0 0 } head? ] keep f ?
    ] with map-find swap ;

And a test showing that it works:

{
    47224
    "000043f7c4392a781a04419a7cb503089ebcf3164e2b1d4258b3e6c15b8b07f1"
} [ "5d737f0600ff2dd" find-anubis bytes>hex-string ] unit-test

But, I noticed that it’s not that fast:

IN: scratchpad [ "5d737f0600ff2dd" find-anubis ] time
Running time: 0.216027939 seconds

Solving in Factor using C

Tavis used some functions from the OpenSSL library to compute the checksum.

We can take the same approach using the C library interface. It would be great to be able to parse header files and make this a little simpler, but for now we can define these C functions that we would like to call:

LIBRARY: libcrypto

STRUCT: SHA256_CTX
    { h uint[8] }
    { Nl uint }
    { Nh uint }
    { data uint[16] }
    { num uint }
    { md_len uint } ;

FUNCTION: int SHA256_Init ( SHA256_CTX* c )
FUNCTION: int SHA256_Update ( SHA256_CTX* c, void* data, size_t len )
FUNCTION: int SHA256_Final ( uchar* md, SHA256_CTX* c )

And then build the same C program in Factor:

:: find-anubis ( message -- nonce anubis )
    SHA256_CTX new :> base
    base SHA256_Init 1 assert=
    base message binary encode dup length SHA256_Update 1 assert=

    32 <byte-array> :> hash
    18 2^ <iota> [
        base clone :> ctx
        ctx swap >dec binary encode dup length SHA256_Update 1 assert=
        hash ctx SHA256_Final 1 assert=
        hash B{ 0 0 } head?
    ] find nip hash ;

Is it fast?

IN: scratchpad [ "5d737f0600ff2dd" find-anubis ] time
Running time: 0.009508132 seconds

Sure is!

How does that compare to the original C program?

$ gcc -Ofast -march=native anubis-miner.c -lcrypto -o anubis-miner
$ time ./anubis-miner 5d737f0600ff2dd
47224

real    0m0.005s
user    0m0.003s
sys     0m0.002s

Pretty favorably!

Solving in Factor using C approach

Part of the reason the C approach is fast is that it hashes the message once, and then for each candidate only has to hash the additional bytes of the nonce and check the result. We can try this by cloning our sha-256-state on each iteration and checking whether it passes the test:

USING: checksums checksums.sha kernel math math.parser sequences ;

: find-anubis ( message -- nonce anubis )
    sha-256 initialize-checksum-state swap add-checksum-bytes
    18 2^ <iota> [
        [ clone ] dip >dec add-checksum-bytes
        get-checksum [ B{ 0 0 } head? ] keep f ?
    ] with map-find swap ;
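For comparison, the same midstate-cloning trick can be sketched with Python’s hashlib, whose hash objects also support copy() (my own sketch, reusing the example challenge from above):

```python
import hashlib

def find_anubis(message: str, max_nonce: int = 2**18):
    """Search nonce suffixes, cloning the prefix midstate instead of rehashing."""
    base = hashlib.sha256(message.encode())  # hash the fixed prefix once
    for nonce in range(max_nonce):
        h = base.copy()                      # clone the midstate
        h.update(str(nonce).encode())        # hash only the nonce suffix
        digest = h.digest()
        if digest[:2] == b"\x00\x00":        # 16 leading zero bits
            return nonce, digest.hex()
    return None
```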

Is that faster?

IN: scratchpad [ "5d737f0600ff2dd" find-anubis ] time
Running time: 0.183254613 seconds

A little bit. But what’s the problem?

IN: scratchpad [ "5d737f0600ff2dd" find-anubis ] profile

IN: scratchpad top-down profile.
depth   time ms  GC %  JIT %  FFI %   FT %
  13     183.0   0.00   0.00  13.11   0.00 T{ thread f "Initial" ~quotation~ ~quotation~ 19 ~box~ f t f H{...
  14     149.0   0.00   0.00   8.72   0.00   M\ sha-256-state get-checksum
  15      85.0   0.00   0.00   2.35   0.00     M\ sha2-short checksum-block
  16      27.0   0.00   0.00   7.41   0.00       4be>
  17      10.0   0.00   0.00   0.00   0.00         M\ virtual-sequence nth-unsafe
  18       8.0   0.00   0.00   0.00   0.00           M\ slice virtual@
  19       6.0   0.00   0.00   0.00   0.00             +
  17       6.0   0.00   0.00   0.00   0.00         M\ fixnum shift
  17       4.0   0.00   0.00  50.00   0.00         fixnum-shift
  17       4.0   0.00   0.00   0.00   0.00         M\ byte-array nth-unsafe
  17       1.0   0.00   0.00   0.00   0.00         bitor
  17       1.0   0.00   0.00   0.00   0.00         >
  17       1.0   0.00   0.00   0.00   0.00         M\ slice length
  16       7.0   0.00   0.00   0.00   0.00       M\ fixnum integer>fixnum
  16       5.0   0.00   0.00   0.00   0.00       M\ fixnum >fixnum
  16       4.0   0.00   0.00   0.00   0.00       be>
  17       2.0   0.00   0.00   0.00   0.00         M\ slice length
  16       2.0   0.00   0.00   0.00   0.00       (byte-array)
  16       2.0   0.00   0.00   0.00   0.00       M\ slice length
  16       1.0   0.00   0.00   0.00   0.00       c-ptr?
  16       1.0   0.00   0.00   0.00   0.00       M\ fixnum integer>fixnum-strict
  15      35.0   0.00   0.00  25.71   0.00     pad-last-block
  16      26.0   0.00   0.00  26.92   0.00       %
  17       7.0   0.00   0.00  85.71   0.00         set-alien-unsigned-1
  17       7.0   0.00   0.00   0.00   0.00         M\ growable nth-unsafe
  18       1.0   0.00   0.00   0.00   0.00           M\ byte-vector underlying>>
  17       4.0   0.00   0.00   0.00   0.00         M\ growable set-nth-unsafe
  18       1.0   0.00   0.00   0.00   0.00           M\ byte-vector underlying>>
  17       2.0   0.00   0.00  50.00   0.00         resize-byte-array
  17       1.0   0.00   0.00   0.00   0.00         M\ growable lengthen
  18       1.0   0.00   0.00   0.00   0.00           M\ byte-vector length>>
  17       1.0   0.00   0.00   0.00   0.00         M\ growable length
  17       1.0   0.00   0.00   0.00   0.00         M\ byte-array set-nth-unsafe
  17       1.0   0.00   0.00   0.00   0.00         integer?
  16       5.0   0.00   0.00   0.00   0.00       >slow-be
  17       2.0   0.00   0.00   0.00   0.00         M\ fixnum shift
  17       1.0   0.00   0.00   0.00   0.00         fixnum-shift
  17       1.0   0.00   0.00   0.00   0.00         M\ fixnum integer>fixnum
  16       1.0   0.00   0.00 100.00   0.00       <byte-array>
  16       1.0   0.00   0.00 100.00   0.00       set-alien-unsigned-1
  16       1.0   0.00   0.00   0.00   0.00       M\ growable set-nth
  17       1.0   0.00   0.00   0.00   0.00         M\ byte-vector underlying>>
  16       1.0   0.00   0.00   0.00   0.00       ,
  17       1.0   0.00   0.00   0.00   0.00         assoc-stack
  18       1.0   0.00   0.00   0.00   0.00           M\ hashtable at*
  15      16.0   0.00   0.00   6.25   0.00     >slow-be
  16       8.0   0.00   0.00   0.00   0.00       M\ fixnum shift
  16       2.0   0.00   0.00  50.00   0.00       (byte-array)
  16       1.0   0.00   0.00   0.00   0.00       fixnum-shift
  16       1.0   0.00   0.00   0.00   0.00       <
  16       1.0   0.00   0.00   0.00   0.00       M\ fixnum integer>fixnum
  15       3.0   0.00   0.00   0.00   0.00     M\ chunking nth-unsafe
  16       2.0   0.00   0.00   0.00   0.00       M\ groups group@
  17       1.0   0.00   0.00   0.00   0.00         M\ fixnum min
  17       1.0   0.00   0.00   0.00   0.00         *
  15       2.0   0.00   0.00   0.00   0.00     >be
  15       1.0   0.00   0.00   0.00   0.00     M\ sha2-state H>>
  15       1.0   0.00   0.00 100.00   0.00     fixnum/i
  15       1.0   0.00   0.00   0.00   0.00     M\ sha2-state clone
  16       1.0   0.00   0.00   0.00   0.00       M\ sha2-state H<<
  15       1.0   0.00   0.00   0.00   0.00     M\ uint-array nth-unsafe
  14      23.0   0.00   0.00  30.43   0.00   M\ block-checksum-state add-checksum-bytes
  15      18.0   0.00   0.00  27.78   0.00     >byte-vector
  16       7.0   0.00   0.00   0.00   0.00       M\ virtual-sequence nth-unsafe
  17       1.0   0.00   0.00   0.00   0.00         M\ slice virtual@
  18       1.0   0.00   0.00   0.00   0.00           +
  16       5.0   0.00   0.00  80.00   0.00       set-alien-unsigned-1
  16       2.0   0.00   0.00   0.00   0.00       M\ byte-array nth-unsafe
  16       2.0   0.00   0.00   0.00   0.00       M\ growable nth-unsafe
  17       2.0   0.00   0.00   0.00   0.00         M\ byte-vector underlying>>
  16       1.0   0.00   0.00 100.00   0.00       (byte-array)
  16       1.0   0.00   0.00   0.00   0.00       M\ slice length
  15       2.0   0.00   0.00 100.00   0.00     fixnum/i
  15       1.0   0.00   0.00   0.00   0.00     M\ string nth-unsafe
  15       1.0   0.00   0.00   0.00   0.00     >
  15       1.0   0.00   0.00   0.00   0.00     number=
  14       4.0   0.00   0.00   0.00   0.00   head?
  15       1.0   0.00   0.00   0.00   0.00     M\ byte-array length
  15       1.0   0.00   0.00   0.00   0.00     >
  15       1.0   0.00   0.00   0.00   0.00     M\ byte-array nth-unsafe
  15       1.0   0.00   0.00   0.00   0.00     integer?
  14       3.0   0.00   0.00  66.67   0.00   M\ fixnum positive>dec
  15       2.0   0.00   0.00 100.00   0.00     <string>
  14       2.0   0.00   0.00 100.00   0.00   M\ sha2-state clone
  15       1.0   0.00   0.00 100.00   0.00     M\ checksum-state clone
  16       1.0   0.00   0.00 100.00   0.00       (clone)
  15       1.0   0.00   0.00 100.00   0.00     M\ uint-array clone
  16       1.0   0.00   0.00 100.00   0.00       (clone)
  14       1.0   0.00   0.00   0.00   0.00   M\ integer >base
  14       1.0   0.00   0.00   0.00   0.00   reverse!

Visualizing the profile using the flamegraph vocabulary allows us to dig a little bit further.

Looks like a lot of generic dispatch, inefficient byte swapping, memory allocations, and type conversions. Probably this could be made much faster by looking into how we handle block checksums.

PRs welcome!

Thu, 21 Aug 2025 15:00:00

John Benediktsson: Left to Right

An article about Left to Right Programming was posted a few days ago, with good discussions on Hacker News and on Lobsters. It’s a nice read with syntax examples in different languages, looking at some code blocks that are structured left-to-right or right-to-left.

We can look at a few of the shared examples and think about how they might look naturally in Factor, which inherits a left-to-right data flow style from being a concatenative language.

The Challenge

The blog post discusses Graic’s 2024 Advent of Code solution, written in Python:

len(list(filter(lambda line: all([abs(x) >= 1 and abs(x) <= 3 for x in line]) and (all([x > 0 for x in line]) or all([x < 0 for x in line])), diffs)))

And compares it to an equivalent improved form in JavaScript:

diffs.filter(line => 
    line.every(x => Math.abs(x) >= 1 && Math.abs(x) <= 3) &&
    (line.every(x => x > 0) || line.every(x => x < 0))
).length;

There’s nothing quite like syntax wars – the nerd version of the linguistic wars – to get people interested in a topic. It is only one dimension, but perhaps the most visible one, to evaluate a programming language on.

I usually get excited for any code that solves a problem, and I give kudos to Graic for their efforts solving the Advent of Code! It’s often only working code that we can make iterative improvements upon, and that should be appreciated.

The Response

On Hacker News, someone shared a version using Python’s list comprehensions:

len([line for line in diffs
     if all(1 <= abs(x) <= 3 for x in line)
     and (all(x > 0 for x in line) or all(x < 0 for x in line))])

As well as a direct translation in Rust:

diffs.iter().filter(|line| {
    line.iter().all(|x| x.abs() >= 1 && x.abs() <= 3) &&
    (line.iter().all(|x| x > 0) || line.iter().all(|x| x < 0))
}).count()

And a single pass version in Rust with improved performance:

diffs.iter().filter(|line| {
    let mut iter = line.iter();
    let range = match iter.next() {
        Some(-3..=-1) => -3..=-1,
        Some(1..=3) => 1..=3,
        Some(_) => return false,
        None => return true,
    };
    iter.all(|x| range.contains(x))
}).count()

Someone else shared a version using numpy arrays:

sum(1 for line in diffs
    if ((np.abs(line) >= 1) & (np.abs(line) <= 3)).all()
       and ((line > 0).all() or (line < 0).all()))

And another comment shared a version in Kotlin:

diffs.countIf { line -> 
    line.all { abs(it) in 1..3 } and ( 
        line.all { it > 0} or
        line.all { it < 0}
    )
}

There was also a shared version in Python perhaps a bit more idiomatic:

sum(map(lambda line: all(1 <= abs(x) <= 3 for x in line)
                     and (all(x > 0 for x in line) or all(x < 0 for x in line)),
        diffs))

What about Factor?

As you might imagine, I was also curious about what this would look like in Factor.

Directly translating using local variables does up to three passes through the line:

[| line |
    line [ abs 1 3 between? ] all?
    line [ 0 > ] all?
    line [ 0 < ] all? or and
] count

However, it would be better if we only check against zero if the first check passes:

[| line |
    line [ abs 1 3 between? ] all? [
        line [ 0 > ] all?
        line [ 0 < ] all? or
    ] [ f ] if
] count

And, despite still being three passes, it is better if we only check negative if the positive check fails:

[| line |
    line [ abs 1 3 between? ] all? [
        line [ 0 > ] all? [ t ] [
            line [ 0 < ] all?
        ] if
    ] [ f ] if
] count

We can do these short-circuiting checks using a short-circuit combinator:

[
    {
        [ [ abs 1 3 between? ] all? ]
        [ { [ [ 0 > ] all? ] [ [ 0 < ] all? ] } 1|| ]
    } 1&&
] count

Checking that the sign of all the numbers are the same only does two passes through the line:

[| line |
    line empty? [ t ] [
        line [ abs 1 3 between? ] all?
        line unclip sgn '[ sgn _ = ] all? and
    ] if
] count

Comparing the first value to subsequent values does a single pass through the line:

[
    [ t ] [
        unclip { [ abs 1 3 between? ] [ sgn ] } 1&& [
            '[ { [ abs 1 3 between? ] [ sgn ] } 1&& _ = ] all?
        ] [ drop f ] if*
    ] if-empty
] count

We often encourage writing combinators to do algorithmic things:

:: all-same? ( seq quot: ( elt -- obj/f ) -- ? )
    seq [ t ] [
        unclip quot call [ '[ quot call _ = ] all? ] [ drop f ] if*
    ] if-empty ; inline

Which makes for a satisfyingly simple version:

[ [ { [ abs 1 3 between? ] [ sgn ] } 1&& ] all-same? ] count

We could even do something like the Rust version above, getting the endpoints from the first value to check that the subsequent ones match:

[
    [ t ] [
        unclip {
            { [ dup 1 3 between? ] [ drop 1 3 ] }
            { [ dup -3 -1 between? ] [ drop -3 -1 ] }
            [ drop f f ]
        } cond [ '[ _ _ between? ] all? ] [ nip ] if*
    ] if-empty
] count

And that simplifies even more if we use range syntax:

[
    [ t ] [
        unclip { 1 ..= 3 -3 ..= -1 } [ in? ] with find nip
        [ '[ _ in? ] all? ] [ drop f ] if*
    ] if-empty
] count
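For one more comparison, the range-from-first-element idea translates into a single-pass Python version as well (my own sketch, not from the discussion threads):

```python
def count_safe(diffs):
    """Count lines whose diffs all fall in the range implied by the first diff."""
    def safe(line):
        it = iter(line)
        first = next(it, None)
        if first is None:
            return True            # an empty line is vacuously safe
        if 1 <= first <= 3:
            lo, hi = 1, 3
        elif -3 <= first <= -1:
            lo, hi = -3, -1
        else:
            return False           # first diff already out of range
        return all(lo <= x <= hi for x in it)
    return sum(map(safe, diffs))
```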

As usual, there is more than one way to do it, and that’s okay.

Are any of these best? How else might we write this better?

Tue, 19 Aug 2025 15:00:00

John Benediktsson: Pickle

Pretty much everything pickle is great: sweet, dill, bread and butter, full sour, half sour, gherkins, achar, even pickleball. In addition to being both yummy and fun and a great Tuesday night on the Playa, pickle is also the name for Python object serialization.

There are currently 6 different protocols which can be used for pickling. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.

  • Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.
  • Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.
  • Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
  • Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This was the default protocol in Python 3.0–3.7.
  • Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8. Refer to PEP 3154 for information about improvements brought by protocol 4.
  • Protocol version 5 was added in Python 3.8. It adds support for out-of-band data and speedup for in-band data. Refer to PEP 574 for information about improvements brought by protocol 5.

While recently learning about how the pickle protocol works, I was able to build a basic unpickler in Factor. The implementation is about 300 lines of code, and has some decent tests. There are a few more features we should add for completeness, but it’s a good start!

I thought I’d go over a few parts of the implementation here.

The pickle protocol is stack-based, which we represent by a growable vector, and uses a memoization feature to refer to objects by integer keys when they repeat in the data stream, which we store in a hashtable:

CONSTANT: stack V{ }

CONSTANT: memo H{ }

ERROR: invalid-memo key ;

: get-memo ( i -- )
    memo ?at [ stack push ] [ invalid-memo ] if ;

: put-memo ( i -- )
    [ stack last ] dip memo set-at ;

It also has the concept of markers which can be placed using the +marker+ symbol and then used, for example, to pop all items on the stack until the last marker was seen:

SYMBOL: +marker+

: pop-from-marker ( -- items )
    +marker+ stack last-index
    [ 1 + stack swap tail ] [ stack shorten ] bi ;
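To see this stack machine in action, Python’s own pickletools module can disassemble a small pickle into its opcodes (PROTO, EMPTY_LIST, MARK, …, APPENDS, STOP):

```python
import pickle
import pickletools

# Disassemble a tiny protocol-2 pickle to show the stack-based opcodes,
# including the MARK / APPENDS pair used to build the list.
payload = pickle.dumps(["abc", 123], protocol=2)
pickletools.dis(payload)
```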

Unpickling starts with a dispatch loop that acts on each supported opcode. We can use a +no-return+ symbol to indicate that we are not ready to return an object until the STOP opcode is seen.

ERROR: invalid-opcode opcode ;

SYMBOL: +no-return+

: unpickle-dispatch ( opcode -- value )
    +no-return+ swap {
        ! Protocol 0 and 1
        { CHAR: ( [ load-mark ] }
        { CHAR: . [ drop stack pop ] }
        { CHAR: 0 [ load-pop ] }
        { CHAR: 1 [ load-pop-mark ] }
        { CHAR: 2 [ load-dup ] }
        { CHAR: F [ load-float ] }
        { CHAR: I [ load-int ] }
        { CHAR: J [ load-binint ] }
        { CHAR: K [ load-binint1 ] }
        { CHAR: L [ load-long ] }
        { CHAR: M [ load-binint2 ] }
        { CHAR: N [ load-none ] }
        { CHAR: P [ load-persid ] }
        { CHAR: Q [ load-binpersid ] }
        { CHAR: R [ load-reduce ] }
        { CHAR: S [ load-string ] }
        { CHAR: T [ load-binstring ] }
        { CHAR: U [ load-short-binstring ] }
        { CHAR: V [ load-unicode ] }
        { CHAR: X [ load-binunicode ] }
        { CHAR: a [ load-append ] }
        { CHAR: b [ load-build ] }
        { CHAR: c [ load-global ] }
        { CHAR: d [ load-dict ] }
        { CHAR: } [ load-empty-dict ] }
        { CHAR: e [ load-appends ] }
        { CHAR: g [ load-get ] }
        { CHAR: h [ load-binget ] }
        { CHAR: i [ load-inst ] }
        { CHAR: j [ load-long-binget ] }
        { CHAR: l [ load-list ] }
        { CHAR: ] [ load-empty-list ] }
        { CHAR: o [ load-obj ] }
        { CHAR: p [ load-put ] }
        { CHAR: q [ load-binput ] }
        { CHAR: r [ load-long-binput ] }
        { CHAR: s [ load-setitem ] }
        { CHAR: t [ load-tuple ] }
        { CHAR: ) [ load-empty-tuple ] }
        { CHAR: u [ load-setitems ] }
        { CHAR: G [ load-binfloat ] }

        ! Protocol 2
        { 0x80 [ load-proto ] }
        { 0x81 [ load-newobj ] }
        { 0x82 [ load-ext1 ] }
        { 0x83 [ load-ext2 ] }
        { 0x84 [ load-ext4 ] }
        { 0x85 [ load-tuple1 ] }
        { 0x86 [ load-tuple2 ] }
        { 0x87 [ load-tuple3 ] }
        { 0x88 [ load-true ] }
        { 0x89 [ load-false ] }
        { 0x8a [ load-long1 ] }
        { 0x8b [ load-long4 ] }

        ! Protocol 3 (Python 3.x)
        { CHAR: B [ load-binbytes ] }
        { CHAR: C [ load-short-binbytes ] }

        ! Protocol 4 (Python 3.4-3.7)
        { 0x8c [ load-short-binunicode ] }
        { 0x8d [ load-binunicode8 ] }
        { 0x8e [ load-binbytes8 ] }
        { 0x8f [ load-empty-set ] }
        { 0x90 [ load-additems ] }
        { 0x91 [ load-frozenset ] }
        { 0x92 [ load-newobj-ex ] }
        { 0x93 [ load-stack-global ] }
        { 0x94 [ load-memoize ] }
        { 0x95 [ load-frame ] }

        ! Protocol 5 (Python 3.8+)
        { 0x96 [ load-bytearray8 ] }
        { 0x97 [ load-readonly-buffer ] }
        { 0x98 [ load-next-buffer ] }

        [ invalid-opcode ]
    } case ;

With that, we can build our unpickle word that acts on an input-stream, first clearing state and then looping until we see an object to return:

: unpickle ( -- obj )
    stack delete-all memo clear-assoc
    f [ drop read1 unpickle-dispatch dup +no-return+ = ] loop ;

For convenience, a pickle> word acts on concrete data:

GENERIC: pickle> ( string -- obj )

M: string pickle> [ unpickle ] with-string-reader ;

M: byte-array pickle> binary [ unpickle ] with-byte-reader ;

In addition, we needed to support Python’s string escapes, which are slightly different from the ones that Factor defines – mainly \u#### and \U######## – and then add support for some of the basic class types that we might encounter such as byte-arrays, decimals, timestamps, etc.

We currently do not support: persistent IDs, readonly vs. read/write buffers, out-of-band buffers, the object build opcode, and the extension registry. And of course, this is just unpickling – we do not yet support pickling of Factor objects, although that shouldn’t be too hard to add.

But, despite that, it works pretty well!

Here’s an example where we store some mixed data in a pickles file with Python:

>>> data = ["abc", 123, 4.56, {"a":1+5j}, {17,37,52}]

>>> import pickle

>>> with open('pickles', 'wb') as f:
...     pickle.dump(data, f)
...

And then inspect and load that pickles file with Factor!

IN: scratchpad USE: tools.hexdump

IN: scratchpad "pickles" hexdump-file
00000000  80 04 95 54 00 00 00 00 00 00 00 5d 94 28 8c 03  ...T.......].(..
00000010  61 62 63 94 4b 7b 47 40 12 3d 70 a3 d7 0a 3d 7d  abc.K{G@.=p...=}
00000020  94 8c 01 61 94 8c 08 62 75 69 6c 74 69 6e 73 94  ...a...builtins.
00000030  8c 07 63 6f 6d 70 6c 65 78 94 93 94 47 3f f0 00  ..complex...G?..
00000040  00 00 00 00 00 47 40 14 00 00 00 00 00 00 86 94  .....G@.........
00000050  52 94 73 8f 94 28 4b 11 4b 34 4b 25 90 65 2e     R.s..(K.K4K%.e.
0000005f

IN: scratchpad USE: pickle

IN: scratchpad "pickles" binary file-contents pickle> .
V{ "abc" 123 4.56 H{ { "a" C{ 1.0 5.0 } } } HS{ 17 52 37 } }

This is available in the latest development version.

Mon, 18 Aug 2025 15:00:00

John Benediktsson: Marp

Marp, also known as the Markdown Presentation Ecosystem, is a way to “create beautiful slide decks using an intuitive Markdown experience”.

If you’ve seen a presentation about Factor before, you might notice that we have a slides vocabulary that allows us to build presentations in Factor and then present them using the Factor UI. Many of our previous talks are shared in the factor-talks repository, including the slides for the SVFIG talk.

Today, I thought it would be fun to merge these two concepts, allowing us to build slides in Factor but do the presentation using Marp.

What does a Factor slide look like?

Let’s start by examining a $slide, and see what it looks like:

{ $slide "Quotations"
    "Quotation: un-named blocks of code"
    { $code "[ \"Hello, World\" print ]" }
    "Combinators: words taking quotations"
    { $code "10 dup 0 < [ 1 - ] [ 1 + ] if ." }
    { $code "{ -1 1 -2 0 3 } [ 0 max ] map ." }
}

As with most businessy slides, it starts with a title, and then has a sequence of various blocks to render.

What would a Marp slide look like?

We can manually translate this to a similar-looking slide using Markdown:

---

# Quotations
- Quotation: un-named blocks of code
```factor
[ "Hello, World" print ]
```

- Combinators: words taking quotations
```factor
10 dup 0 < [ 1 - ] [ 1 + ] if .
```

```factor
{ -1 1 -2 0 3 } [ 0 max ] map .
```

Can we automate this?

Of course!

Our slides vocabulary uses elements from the help system to provide markup (which is how we render it in the Factor user interface). These elements are specified as a kind of array, with typing provided by their first argument.

We can leverage this to manually support a few types:

GENERIC: write-marp ( element -- )

M: string write-marp write ;

M: array write-marp
    unclip {
        { \ $slide [ write-slide ] }
        { \ $code [ write-code ] }
        { \ $link [ write-link ] }
        { \ $vocab-link [ write-vocab-link ] }
        { \ $url [ write-url ] }
        { \ $snippet [ write-snippet ] }
        [ write-marp [ write-marp ] each ]
    } case ;
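The shape of that dispatch – a recursive renderer that branches on the element’s head tag – is easy to mirror in other languages. Here is a rough Python sketch of the same idea (the types and names are my own, not the Factor vocabulary’s):

```python
def write_marp(element, out):
    """Render a slide element tree to Marp markdown, dispatching on the head tag."""
    if isinstance(element, str):
        out.append(f"- {element}")             # plain strings become bullets
    elif isinstance(element, tuple):
        head, *rest = element
        if head == "$slide":
            title, *body = rest
            out.append(f"---\n\n# {title}")    # slide separator plus heading
            for item in body:
                write_marp(item, out)
        elif head == "$code":
            out.append(f"```factor\n{rest[0]}\n```")
    return out
```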

Using that, we can create a Marp file, with some chosen style:

: write-marp-file ( slides -- )
    "---
marp: true
theme: gaia
paginate: true
backgroundColor: #1e1e2e
color: #cdd6f4
style: |
  section {
    font-family: 'SF Pro Display', 'Segoe UI', sans-serif;
  }
  h1 {
    color: #89b4fa;
  }
  h2 {
    color: #94e2d5;
  }
  h3 {
    color: #f5c2e7;
  }
  code {
    background-color: #313244;
    color: #cdd6f4;
    border-radius: 0.25em;
  }
  pre {
    background-color: #313244;
    border-radius: 0.5em;
  }
  ul {
    list-style: none;
    padding-left: 0;
  }
  ul li::before {
    content: \"▸ \";
    color: #89b4fa;
    font-weight: bold;
  }" print [ write-marp ] each ;

And now we can use that to generate a Marp file of our talk!

IN: scratchpad "~/FACTOR.md" utf8 [
                   svfig-slides write-marp-file
               ] with-file-writer

And then use the Marp CLI to convert it to HTML and open in a browser!

$ marp FACTOR.md
[  INFO ] Converting 1 markdown...
[  INFO ] FACTOR.md => FACTOR.html

$ open -a Safari FACTOR.html

And then view the slides, embedded below for convenience.

This is available on my GitHub.

Fri, 15 Aug 2025 15:00:00

planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp.
