[ planet-factor ]

John Benediktsson: SHA-256 from URL

Álvaro Ramírez wrote a blog post about generating a SHA-256 hash from URL, the easy way where they describe wanting to download a file and generate a SHA-256 hash of the contents easily. Their solution involves copying a URL and then having some Emacs Lisp be able to read the clipboard, download the file, then generate and return the hash on the clipboard.

I thought I’d show how this can be done in Factor, by breaking the problem into smaller parts.

USING: checksums checksums.sha http.client io.directories io.files.temp kernel
math.parser namespaces sequences ui.clipboards ;

The first step is downloading a file to a temporary file, returning the path of the downloaded file:

: download-to-temp ( url -- path )
    dup download-name temp-file [
        [ ?delete-file ] [ download-to ] bi
    ] keep ;

The next step is to build a word that applies a checksum to the downloaded file contents:

: checksum-url ( url checksum -- value )
    [ download-to-temp ] [ checksum-file ] bi* ;

The last step is to use the clipboard to access the URL that was copied – checking minimally that it looks like an http or https URL – and then putting the checksum value back onto the clipboard:

: checksum-clipboard ( checksum -- )
    clipboard get clipboard-contents
    dup "http" head? [ throw ] unless
    swap checksum-url bytes>hex-string
    clipboard get set-clipboard-contents ;

This could be improved with better error checking, and maybe cleaning up the temporary file that was downloaded after running the checksum.

Give it a try!

IN: scratchpad sha-256 checksum-clipboard

Tue, 12 Sep 2023 15:00:00

John Benediktsson: Sequence Case

Some languages allow ranges to be used in switch statements. For example, like in this Swift code:

let count = 3_000_000_000_000
let countedThings = "stars in the Milky Way"
var naturalCount: String
switch count {
case 0:
    naturalCount = "no"
case 1...3:
    naturalCount = "a few"
case 4...9:
    naturalCount = "several"
case 10...99:
    naturalCount = "tens of"
case 100...999:
    naturalCount = "hundreds of"
case 1000...999_999:
    naturalCount = "thousands of"
default:
    naturalCount = "millions and millions of"
}
println("There are \(naturalCount) \(countedThings).")

Let’s build this functionality in Factor!

Range Syntax

First, let’s look at how we can construct a range.

1000 999,999 [a..b]

If we wanted to use that in a case statement, we could wrap it with a literal:

{ $[ 1000 999,999 [a..b] ] [ "thousands of" ] }

But that’s not that elegant, instead let’s define some syntax words to construct our range:

SYNTAX: ..= dup pop scan-object [a..b] suffix! ;

This is much cleaner:

{ 1000 ..= 999,999 [ "thousands of" ] }

Combinators

In Factor, we have combinators which are, at some level, just words that take code as input. We sometimes refer to these as higher-level concepts, but they allow us to be more expressive with fewer tokens.

Let’s build our combinator using a macro to transform some input cases:

MACRO: sequence-case ( assoc -- quot )
    [
        dup callable? [
            [ first dup set? [ in? ] [ = ] ? '[ dup _ @ ] ]
            [ second '[ drop @ ] ] bi 2array
        ] unless
    ] map [ cond ] curry ;

Now we can write a Factor version of the original Swift example above:

3,000,000,000,000 {
    { 0 [ "no" ] }
    { 1 ..= 3 [ "a few" ] }
    { 4 ..= 9 [ "several" ] }
    { 10 ..= 99 [ "tens of" ] }
    { 100 ..= 999 [ "hundreds of" ] }
    { 1000 ..= 999,999 [ "thousands of" ] }
    [ drop "millions and millions of" ]
} sequence-case "There are %s stars in the Milky Way.\n" printf

Cool!

This is available in the combinators.extras vocabulary in the development version of Factor.

Sun, 10 Sep 2023 15:00:00

John Benediktsson: ASCII Art

Raymond Hettinger, an active contributor to Python and responsible for many useful improvements to the language, likes to tweet short and sweet bits of useful Python knowledge. I stumbled into one fun one-liner from long ago:

I thought I would show how it might translate into Factor. To translate this code into a concatenative language, we are going to work from the inside and move out.

Step 1

We start with the sum of the cartesian product of eight sequences of the numbers 0 through 5:

map(sum, product(range(6), repeat=8))

vs.

8 6 <iota> <repetition> [ sum ] product-map

Step 2

Next, we see that it counts each element, and produces a sorted list of items:

sorted(Counter(...).items())

vs.

histogram sort-keys

Step 3

And finally, create a string of lines of stars and print it:

print "\n".join('*'*(c//2000) for i,c in ...)

vs.

values [ 2000 /i CHAR: * <string> ] map "\n" join print

Solution

Putting it all together:

IN: scratchpad USING: assocs io math math.statistics sequences
               sequences.product sorting strings ;

IN: scratchpad 8 6 <iota> <repetition> [ sum ] product-map
               histogram sort-keys values
               [ 2000 /i CHAR: * <string> ] map "\n" join print

It makes this nice ASCII Art visualization:

*
***
*****
********
************
******************
*************************
********************************
*****************************************
*************************************************
********************************************************
**************************************************************
******************************************************************
*******************************************************************
******************************************************************
**************************************************************
********************************************************
*************************************************
*****************************************
********************************
*************************
******************
************
********
*****
***
*

Fri, 8 Sep 2023 15:00:00

John Benediktsson: LEB128

LEB128 – Little Endian Base 128 – is a variable-length encoding format designed to store arbitrarily large integers in a small number of bytes. There are two variations: unsigned LEB128 and signed LEB128. These vary slightly, so a user program that wants to decode LEB128 values would explictly choose the appropriate unsigned or signed methods.

Recently, I implemented these in Factor and wanted to describe the implementation:

Decoding

For decoding unsigned LEB128 values, we accumulate the integer 7 bits at a time:

: uleb128> ( byte-array -- n )
    0 [ [ 7 bits ] [ 7 * shift ] bi* + ] reduce-index ;

For decoding signed LEB128 values, we do the same thing, but the last byte indicates if the value was negative, and if it was we bring the sign bit back in:

: leb128> ( byte-array -- n )
    [ uleb128> ] keep dup last 6 bit?
    [ length 7 * 2^ neg bitor ] [ drop ] if ;

Encoding

For encoding unsigned LEB128 values, we output in 7-bit segments, and then for the final segment use the eighth bit to indicate the end of the stream of bytes.

:: >uleb128 ( n -- byte-array )
    BV{ } clone :> accum
    n assert-non-negative [
        [ -7 shift dup zero? not ] [ 7 bits ] bi
        over [ 0x80 bitor ] when accum push
    ] loop drop accum B{ } like ;

For encoding signed LEB128 values, we output in 7-bit segments, but our exit condition depends on reaching the end of the integer and conditionally whether the sixth bit was set in that segment.

:: >leb128 ( n -- byte-array )
    BV{ } clone :> accum
    n [
        [ -7 shift dup ] [ 7 bits ] bi :> ( i b )
        {
            { [ i  0 = ] [ b 6 bit? not ] }
            { [ i -1 = ] [ b 6 bit? ] }
            [ f ]
        } cond b over [ 0x80 bitor ] when accum push
    ] loop drop accum B{ } like ;

Testing

Some test cases for unsigned LEB128:

[ -1 >uleb128 ] [ non-negative-number-expected? ] must-fail-with
{ B{ 255 255 127 } } [ 0x1fffff >uleb128 ] unit-test
{ 0x1fffff } [ B{ 255 255 127 } uleb128> ] unit-test
{ B{ 0xe5 0x8e 0x26 } } [ 624485 >uleb128 ] unit-test
{ 624485 } [ B{ 0xe5 0x8e 0x26 } uleb128> ] unit-test

Some test cases for signed LEB128:

{ B{ 255 255 255 0 } } [ 0x1fffff >leb128 ] unit-test
{ 0x1fffff } [ B{ 255 255 255 0 } leb128> ] unit-test
{ B{ 0xc0 0xbb 0x78 } } [ -123456 >leb128 ] unit-test
{ -123456 } [ B{ 0xc0 0xbb 0x78 } leb128> ] unit-test

Performance

This is available in a recent nightly build, with some support for reading and writing LEB128 values to streams, and some performance improvements by specifying some type information so the compiler can understand we are working with integers and byte-arrays and produce more optimal code.

My current laptop decodes and encodes around 40 million per second, which is pretty good for a dynamic language. There are some references in the fast decoding section to some papers that present some SIMD techniques called “Masked VByte” and “Stream VByte” for significantly improving upon this simple scalar implementation.

Sun, 3 Sep 2023 15:00:00

John Benediktsson: Periodic Table

Despite the difficulty that I had with freshman chemistry, I’ve always enjoyed learning about the periodic table, writing blog posts about making words with element symbols, and I do think it would be awesome to have a giant wall-mounted periodic table like Bill Gates has in his office.

The other day I wanted to demonstrate the UI framework by making another simple UI gadget. It occurred to me that we could make a simple periodic table as an example.

The code is about 140 lines to define all the elements and their groups, 10 lines to organize them into a table, and about 40 lines to build the element boxes, put them into a gadget, and add a legend at the bottom. Each element is a <roll-button> that opens the appropriate Wikipedia page for each chemical element.

I was reminded of this recently by a funny comment on the Factor Discord:

Factor crushing the competition thanks to periodic-table vocab

https://codegolf.stackexchange.com/questions/264714/is-it-an-element/264723#264723

You can view the source code for the periodic table vocabulary or try it yourself:

IN: scratchpad "periodic-table" run

Fri, 1 Sep 2023 23:00:00

John Benediktsson: RGBA Clock

Today, I’d like to build a little UI gadget clock that changes color as the time changes in Factor.

Unix time is measured as the time – usually seconds – since the Unix epoch at 00:00:00 UTC on January 1, 1970. It is used on many systems and, perhaps, will cause Year 2038 problems on some when it exceeds the maximum value of a signed 32-bit integer.

This is perfect – we need a way to convert a timestamp into a rgba color – we can calculate a unix time in integer seconds and then take each 8-bit segment to refer to a red, green, blue, and alpha value.

: timestamp>rgba ( timestamp -- color/f )
    timestamp>unix-time >integer
    24 2^ /mod 16 2^ /mod 8 2^ /mod
    [ 255 /f ] 4 napply <rgba> ;

We can then extend a UI label to have a timer that starts when the gadget becomes visible and updates its background colors based on the current time and chooses an appropriate foreground text color to match. For that we can use the contrast-text-color word from the colors.contrast vocabulary to select either white-over-background or black-over-background depending on the relative luminance of the background color.

TUPLE: rgba-clock < label timer ;

M: rgba-clock graft*
    [ timer>> start-timer ] [ call-next-method ] bi ;

M: rgba-clock ungraft*
    [ timer>> stop-timer ] [ call-next-method ] bi ;

: update-colors ( color label -- )
    [ [ contrast-text-color ] dip font>> foreground<< ]
    [ [ <solid> ] dip interior<< ] 2bi ;

: <rgba-clock> ( -- gadget )
    "99:99:99" rgba-clock new-label
        monospace-font >>font
        dup '[
            _ now
            [ timestamp>hms >>string ]
            [ timestamp>rgba swap update-colors ] bi
        ] f 1 seconds <timer> >>timer ;

And then we can try it out!

IN: scratchpad <rgba-clock> gadget.

This is on my GitHub.

Tue, 29 Aug 2023 14:00:00

John Benediktsson: Drunken Bishop

The OpenSSH project is a widely available tool for working with the SSH protocol in a variety of ways on a variety of operating systems. Their project description states:

OpenSSH is the premier connectivity tool for remote login with the SSH protocol. It encrypts all traffic to eliminate eavesdropping, connection hijacking, and other attacks. In addition, OpenSSH provides a large suite of secure tunneling capabilities, several authentication methods, and sophisticated configuration options.

One of the interesting features that it contains is a method of visualizing public key fingerprints to allow a user to more easily see that a key has changed by examining a visual output that looks something like this:

+----[RSA 2048]---+
|        . o.+o  .|
|     . + *  +o...|
|      + * .. ... |
|       o + .   . |
|        S o   .  |
|         o     . |
|          .     o|
|               .o|
|               Eo|
+------[MD5]------+

This is the Drunken Bishop algorithm, a variant of a technique called random art that was originally described in the paper Hash Visualization: a New Technique to improve Real-World Security. You can see more information about it in The drunken bishop: An analysis of the OpenSSH fingerprint visualization algorithm.

This OpenSSH feature is controlled by the VisualHostKey flag:

VisualHostKey

If this flag is set to yes, an ASCII art representation of the remote host key fingerprint is printed in addition to the fingerprint string at login and for unknown host keys. If this flag is set to no (the default), no fingerprint strings are printed at login and only the fingerprint string will be printed for unknown host keys.

This can be enabled by adding to your ~/.ssh/config:

VisualHostKey yes

Or by adding this option in your ssh command:

$ ssh -o VisualHostKey=yes your.host.name

Implementation

We are going to be implementing this in the Factor programming language.

The algorithm begins by defining a visual board – by default 9 rows by 17 columns – and a starting position in the middle of the board. Each 8-bit byte of input is split into 2-bit groups which indicate where the bishop moves:

Value Direction
00
01
10
11

As the bishop moves around the board diagonally, it increments a counter in each cell. However, the bishop cannot move through the walls on the edge of the board, remaining stuck in whichever direction is blocked.

CONSTANT: WIDTH 17
CONSTANT: HEIGHT 9

:: drunken-bishop ( bytes -- board )
    HEIGHT [ WIDTH 0 <array> ] replicate :> board
    HEIGHT 2/ :> y!
    WIDTH 2/ :> x!

    15 x y board nth set-nth ! starting position

    bytes [
        { 0 -2 -4 -6 } [
            shift 2 bits {
                { 0b00 [ -1 -1 ] }
                { 0b01 [ -1  1 ] }
                { 0b10 [  1 -1 ] }
                { 0b11 [  1  1 ] }
            } case :> ( dy dx )
            dy y + 0 HEIGHT 1 - clamp y!
            dx x + 0 WIDTH 1 - clamp x!
            x y board nth [ dup 14 < [ 1 + ] when ] change-nth
        ] with each
    ] each

    16 x y board nth set-nth ! ending position

    board ;

The output is rendered using an .o+=*BOX@%&#/^ alphabet with S for the starting position and E for the ending position specially rendered:

CONSTANT: SYMBOLS " .o+=*BOX@%&#/^SE"

: drunken-bishop. ( bytes -- )
    drunken-bishop [ SYMBOLS nths print ] each ;

We can try this out and see that it works:

IN: scratchpad "fc94b0c1e5b0987c5843997697ee9fb7"
               hex-string>bytes drunken-bishop.
       .=o.  .   
     . *+*. o    
      =.*..o     
       o + ..    
        S o.     
         o  .    
          .  . . 
              o .
               E.

This is available in the drunken-bishop vocabulary in a recent development version.

Sun, 27 Aug 2023 15:00:00

John Benediktsson: Next Combination

Eleven years ago, I blogged about computing the next permutation of a sequence.

As part of some performance improvements to the math.combinatorics vocabulary that I was making around the same time, I needed a way of calculating the indices of the next combination of a given sequence.

Implementation

Without going into a great explanation, we split the problem into a few steps, operating on a sequence of indices and modifying in place – either incrementing the indices after the “max index” or the last index.

: find-max-index ( seq n -- i )
    over length - '[ _ + >= ] find-index drop ; inline

: increment-rest ( i seq -- )
    [ nth-unsafe ] [ swap index-to-tail <slice-unsafe> ] 2bi
    [ drop 1 + dup ] map! 2drop ; inline

: increment-last ( seq -- )
    [ index-of-last [ 1 + ] change-nth-unsafe ] unless-empty ; inline

:: next-combination ( seq n -- )
    seq n find-max-index [
        1 [-] seq increment-rest
    ] [
        seq increment-last
    ] if* ; inline

We can show how it works by using clone to see the intermediate results of “5 choose 3”:

IN: scratchpad { 0 1 2 } 10 [
                   [ clone 5 next-combination ] keep
               ] replicate nip .
{
    { 0 1 2 }
    { 0 1 3 }
    { 0 1 4 }
    { 0 2 3 }
    { 0 2 4 } 
    { 0 3 4 }
    { 1 2 3 }
    { 1 2 4 }
    { 1 3 4 }
    { 2 3 4 }
}

This is used to implement our various combinations words: each-combination, map-combinations, filter-combinations, all-combinations, find-combination, reduce-combinations, etc.

Benchmarking

The benchmark I have chosen is one that intentionally creates a large array of almost 65 million items with the result of allocating all combinations of 200 items choosing 4 items at a time.

We can run this benchmark on an Apple Mac mini (2018) with a 3.2 GHz 6-Core Intel Core i7 processor and 16 GB of memory that is used as part of the Factor nightly build farm.

Factor 0.99 takes 8.3 seconds:

IN: scratchpad [ 200 <iota> 4 all-combinations length ] time .
Running time: 8.269610590999999 seconds

Additional information was collected.
dispatch-stats.  - Print method dispatch statistics
gc-events.       - Print all garbage collection events
gc-stats.        - Print breakdown of different garbage collection events
gc-summary.      - Print aggregate garbage collection statistics
64684950

Look at the gc-summary. to see that more than half the time (60% or 4.9 seconds) is involved in garbage collection as part of allocating the almost 65 million results:

IN: scratchpad gc-summary.
Collections:          1,480
Cards scanned:        7,070,518
Decks scanned:        9,124
Code blocks scanned:  2
Total time:           4,910,722 µs
Card scan time:       3,256,711 µs
Code block scan time: 162 µs
Marking time:         0 µs
Data heap sweep time: 0 µs
Code heap sweep time: 0 µs
Data compaction time: 0 µs

Other Languages

On that same machine we can try some other programming languages and see how they compare, attempting to get the same behavior as the Factor example above.

Python 3.11.4 takes 8.4 seconds:

In [1]: from itertools combinations

In [2]: %time print(len(list(combinations(range(200), 4))))
64684950
CPU times: user 5.64 s, sys: 2.77 s, total: 8.41 s
Wall time: 8.44 s

PyPy 7.3.12 (compatible with Python 3.10) takes 17 seconds:

>>>> from itertools import combinations
>>>> from time import time
>>>> t0 = time(); print(len(list(combinations(range(200), 4)))); time() - t0
64684950
17.00031805038452

Ruby 3.2.2 takes 37.4 seconds:

irb(main):001:0> measure
TIME is added.
=> nil

irb(main):002:0> (0...200).to_a.combination(4).to_a.length
processing time: 37.413635s
=> 64684950

Julia 1.9.2 takes 10.7 seconds:

julia> ]add Combinatorics

julia> using Combinatorics

julia> @time length(collect(combinations(range(1, 200), 4)))
 10.668982 seconds (129.37 M allocations: 8.193 GiB, 37.74% gc time)
64684950

Crystal 1.9.2 takes 515 seconds (interpreted) or 13.7 seconds (compiled):

# Run crystal interpreted
$ crystal i
icr:1> t0 = Time.utc;
       puts (0...200).to_a.combinations(4).to_a.size;
       Time.utc - t0
64684950 => 00:08:34.728442000

# Make a crystal source file
$ echo "t0 = Time.utc
puts (0...200).to_a.combinations(4).to_a.size
puts (Time.utc - t0)" > combo.cr

# Run crystal compiled
$ crystal combo.cr
64684950
00:00:13.739838000

Clojure 1.11.1 takes 10.7 seconds:

user=> (ns user (:use [clojure.math.combinatorics]))

user=> (time (count (combinations (range 0 200) 4)))
"Elapsed time: 10685.687736 msecs"
64684950

Rakudo v2023.08 takes 808.9 seconds:

$ time raku -e "print (1..200).combinations(4).Array.elems"
759.45s user 44.32s system 99% cpu 13:28.86 total
64684950

Considering that we only take the length of the large array of combinations, it is a bit artificial as a benchmark, with room for various optimizations, but it does seem to highlight both algorithmic and garbage collector issues in various languages.

Factor compares pretty favorably!

Fri, 25 Aug 2023 15:00:00

Blogroll


planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp.

Syndicate