Rainbows are awesome, especially double rainbows, which can be quite intense when they stretch all the way across the sky. They are also awesome as rainbow flags, which are used to indicate that a place is welcoming, accepting, and safe for people.
Given that this is Pride Month, it might be fun to make some rainbows today using Factor.
I bumped into this nice tutorial on making annoying rainbows in JavaScript. It has some background on color theory, including links to how color vision actually works and detailed science on light and the eye, as well as a “better rainbow method” called the sinebow.
After implementing a lot of color support for Factor, I sometimes get nerd sniped when it comes to colors. Instead of using HSL colors, let’s just use the more common RGB color model to make some rainbows!
:: rainbow-phase. ( str phase -- )
    2pi str length / :> frequency
    str >graphemes [| s i |
        ! scale each sine wave from [-1,1] into the [0,1] range of an RGB component
        frequency i * 2 + phase + sin 0.5 * 0.5 +
        frequency i * 0 + phase + sin 0.5 * 0.5 +
        frequency i * 4 + phase + sin 0.5 * 0.5 +
        1.0 <rgba> :> color
        s "" like H{ { foreground color } } format
    ] each-index nl ;
: rainbow. ( str -- ) 0 rainbow-phase. ;
This supports Unicode by calling >graphemes to split a word by grouping on visual characters.
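As a rough sketch of the sinebow idea, here is one way to map a sine wave into the [0,1] range that an RGB component expects, with three channels offset in phase. The word names unit-sin and rainbow-rgb are made up for this illustration and are not part of any Factor vocabulary.

```factor
USING: kernel math math.constants math.functions prettyprint ;
IN: scratchpad

! Hypothetical helper: squash a sine wave from [-1,1] into [0,1].
: unit-sin ( x -- y ) sin 0.5 * 0.5 + ;

! Three channels offset in phase, using the offsets 2, 0, and 4.
: rainbow-rgb ( x -- r g b )
    [ 2 + unit-sin ] [ unit-sin ] [ 4 + unit-sin ] tri ;

0 unit-sin .        ! 0.5
pi 2 / unit-sin .   ! 1.0
```

Every value that unit-sin produces stays between 0 and 1, so the three outputs of rainbow-rgb can be passed on as color components.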
And, it looks like this:
Happy Pride Month!
Apparently, it is just too much fun building tools to make an offline Wikipedia and the next thing we needed to build was a way to make offline Factor documentation. This documentation is available inside each Factor instance and generated by the Factor help system.
Since we implemented the zim vocabulary with support for reading the ZIM file format and the zim.server vocabulary with support for serving those files out as websites, the natural follow up is the zim.builder vocabulary to make the ZIM files in the first place!
Yesterday, I wrote the build-zim word that can archive all of the files in the current-directory into a ZIM file at the specified output path. Just now, I generated some “offline Factor documentation” by running this command on a docs directory holding all the HTML files uploaded by a recent nightly build:
IN: scratchpad USE: zim.builder
IN: scratchpad "resource:docs" [ "resource:docs.zim" build-zim ] with-directory
You can then run a local Factor documentation server like so:
$ ./factor -run=zim.server docs.zim
It’s interesting that our Factor documentation is 1.4 GB of HTML files, 52 MB as a docs.tar.gz, and 47 MB as a docs.zim file using Zstandard compression. It’s a cool file format for serving this type of content.
I posted a ZIM snapshot of the Factor documentation if you’d like to download it and give this a try with a recent nightly build.
Pretty much everyone agrees that Wikipedia is awesome (except maybe during one of their controversial fundraising campaigns). In addition to Wikipedia, the Wikimedia Foundation operates a number of sister projects.
Even though the official Wikipedia iOS app and Wikipedia Android app are both great, they still require access to the internet to be useful. I am not alone in wondering how to build your own Hitchhiker’s Guide with Wikipedia and looking through the options to download a Wikipedia database.
One way you can do this is to implement support for the ZIM file format, for example using the libzim project. There are many archives available to download as a ZIM file for Wikipedia and various popular websites like StackOverflow, Project Gutenberg, and even some open source projects. You can also build your own ZIM file if you want to archive custom content.
ZIM stands for “Zeno IMproved”, as it replaces the earlier Zeno file format. Its file compression uses LZMA2, as implemented by the xz-utils library, and, more recently, Zstandard. The openZIM project is sponsored by Wikimedia CH, and supported by the Wikimedia Foundation.
Let’s implement this using Factor!
Each ZIM file starts with a header in little endian format:
PACKED-STRUCT: zim-header
{ magic-number uint32_t }
{ major-version uint16_t }
{ minor-version uint16_t }
{ uuid uint64_t[2] }
{ entry-count uint32_t }
{ cluster-count uint32_t }
{ url-ptr-pos uint64_t }
{ title-ptr-pos uint64_t }
{ cluster-ptr-pos uint64_t }
{ mime-list-ptr-pos uint64_t }
{ main-page uint32_t }
{ layout-page uint32_t }
{ checksum-pos uint64_t } ;
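As a quick sanity check on the little-endian layout, here is a small sketch that decodes the four magic bytes by hand. It assumes a recent Factor where le> and >le live in the endian vocabulary (older releases keep them in io.binary).

```factor
USING: endian kernel prettyprint ;

! The first four bytes of a ZIM file spell "ZIM" followed by 0x04.
B{ 0x5a 0x49 0x4d 0x04 } le> 0x44D495A assert=

! Going the other way, >le produces the same four bytes.
0x44D495A 4 >le .   ! B{ 90 73 77 4 }
```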
In addition to 16-bit, 32-bit, and 64-bit little-endian numbers, we need to be able to read null-terminated strings typically stored as UTF-8. For example, when reading the mime-type list:
: read-string ( -- str )
{ 0 } read-until 0 assert= utf8 decode ;
: read-mime-types ( -- seq )
[ read-string dup empty? not ] [ ] produce nip ;
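To see these two words in action without a real ZIM file, here is a minimal sketch that feeds them a hand-made byte array through with-byte-reader. The sample data is invented for illustration: two null-terminated strings followed by the empty string that ends the list.

```factor
USING: byte-arrays io io.encodings.binary io.encodings.string
io.encodings.utf8 io.streams.byte-array kernel prettyprint sequences ;
IN: scratchpad

: read-string ( -- str )
    { 0 } read-until 0 assert= utf8 decode ;

: read-mime-types ( -- seq )
    [ read-string dup empty? not ] [ ] produce nip ;

! Two mime-types, then an empty string marking the end of the list.
"text/html\0image/png\0\0" >byte-array
binary [ read-mime-types ] with-byte-reader .
! { "text/html" "image/png" }
```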
That’s enough to parse the header, the list of mime-types, and the lists of pointers to urls, titles, and clusters used for indexing into the ZIM file.
TUPLE: zim path header mime-types urls titles clusters ;
: read-zim ( path -- zim )
    dup binary [
        zim-header read-struct dup {
            [ magic-number>> 0x44D495A assert= ]
            [
                mime-list-ptr-pos>> seek-absolute seek-input
                read-mime-types
            ] [
                dup url-ptr-pos>> seek-absolute seek-input
                entry-count>> [ 8 read le> ] replicate
            ] [
                dup title-ptr-pos>> seek-absolute seek-input
                entry-count>> [ 4 read le> ] replicate
            ] [
                dup cluster-ptr-pos>> seek-absolute seek-input
                cluster-count>> [ 8 read le> ] replicate
            ]
        } cleave zim boa
    ] with-file-reader ;
There are two types of directory entries:
TUPLE: content-entry mime-type parameter-len namespace
revision cluster-number blob-number url title parameter ;
: read-content-entry ( mime-type -- content-entry )
read1
read1
4 read le>
4 read le>
4 read le>
read-string
read-string
f
content-entry boa
dup parameter-len>> read >>parameter ;
TUPLE: redirect-entry mime-type parameter-len namespace revision
redirect-index url title parameter ;
: read-redirect-entry ( mime-type -- redirect-entry )
read1
read1
4 read le>
4 read le>
read-string
read-string
f
redirect-entry boa
dup parameter-len>> read >>parameter ;
The mime-type indicates which type of entry we are reading:
: read-entry ( -- entry )
2 read le> dup 0xffff =
[ read-redirect-entry ] [ read-content-entry ] if ;
Now we can read the entry at index n
in a ZIM file:
: read-entry-index ( n zim -- entry/f )
urls>> nth seek-absolute seek-input read-entry ;
Content is stored in clusters, where each cluster holds a sequence of binary blobs, each located at an offset into the cluster. A cluster is stored either uncompressed or compressed (typically with LZMA or Zstandard).
We can read the “no compression” version:
: read-cluster-none ( -- offsets blobs )
4 read le>
[ 4 /i 1 - [ 4 read le> ] replicate ] [ prefix ] bi
dup [ last ] [ first ] bi - read ;
And then read the “Zstandard compression” version:
: read-cluster-zstd ( -- offsets blobs )
zstd-uncompress-stream-frame dup uint32_t deref
[ 4 /i uint32_t <c-direct-array> ] [ tail-slice ] 2bi
2dup [ [ last ] [ first ] bi - ] [ length assert= ] bi* ;
The cluster can then be read by checking the compression type in use:
: read-cluster ( -- offsets blobs )
read1 [ 5 bit? f assert= ] [ 4 bits ] bi {
{ 1 [ read-cluster-none ] }
{ 2 [ "zlib not supported" throw ] }
{ 3 [ "bzip2 not supported" throw ] }
{ 4 [ "lzma not supported" throw ] }
{ 5 [ read-cluster-zstd ] }
} case ;
To read the blob at index n, we read the entire cluster, then offset into the blobs data:
:: read-cluster-blob ( n -- blob )
read-cluster :> ( offsets blobs )
0 offsets nth :> zero
n offsets nth :> from
n 1 + offsets nth :> to
from to [ zero - ] bi@ blobs subseq ;
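To make the offset arithmetic concrete, here is a small sketch that runs the uncompressed reader over a hand-made cluster body. The bytes are invented for illustration (three 4-byte offsets followed by two blobs), nth-blob is a hypothetical stand-in that does the same slicing as read-cluster-blob, and le> is assumed to come from the endian vocabulary of a recent Factor.

```factor
USING: endian io io.encodings.binary io.streams.byte-array kernel
locals math prettyprint sequences ;
IN: scratchpad

: read-cluster-none ( -- offsets blobs )
    4 read le>
    [ 4 /i 1 - [ 4 read le> ] replicate ] [ prefix ] bi
    dup [ last ] [ first ] bi - read ;

! Hypothetical helper: slice blob n out of the blob data.
:: nth-blob ( n offsets blobs -- blob )
    0 offsets nth :> zero
    n offsets nth zero -
    n 1 + offsets nth zero -
    blobs subseq ;

! Offsets 12, 15, 17, then the blobs "abc" and "de".
B{
    12 0 0 0 15 0 0 0 17 0 0 0
    97 98 99
    100 101
} binary [ read-cluster-none ] with-byte-reader
[ 1 ] 2dip nth-blob .   ! B{ 100 101 }
```

The first offset doubles as the size of the offset table, which is why dividing it by 4 gives the number of offsets, and why every offset has the first one subtracted before slicing.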
Now we can read the blob by index into a given cluster in a ZIM file:
: read-blob-index ( blob-number cluster-number zim -- blob )
clusters>> nth seek-absolute seek-input read-cluster-blob ;
And we can read the entry content from each entry type or index:
GENERIC#: read-entry-content 1 ( entry zim -- blob mime-type )
M:: content-entry read-entry-content ( entry zim -- blob mime-type )
entry blob-number>>
entry cluster-number>>
zim read-blob-index
entry mime-type>>
zim mime-types>> nth ;
M: redirect-entry read-entry-content
[ redirect-index>> ] [ read-entry-content ] bi* ;
M: integer read-entry-content
[ read-entry-index ] keep '[ _ read-entry-content ] [ f f ] if* ;
Reading the “main page” content is simple using the index stored in the ZIM header:
: read-main-page ( zim -- blob/f mime-type/f )
[ header>> main-page>> ] [ read-entry-content ] bi ;
We can find an entry by searching using a namespace and url, taking advantage of the fact the entries are sorted by <namespace><url> to perform a binary search. Some common namespaces include:
A - Article
C - User Content
M - ZIM metadata
W - Well known entries
X - Search indexes
:: find-entry-url ( namespace url zim -- entry/f )
    f zim header>> entry-count>> <iota> [
        nip zim read-entry-index
        namespace over namespace>> <=>
        dup +eq+ = [ drop url over url>> <=> ] when
    ] search 2drop dup {
        [ ] [ namespace>> namespace = ] [ url>> url = ]
    } 1&& [ drop f ] unless ;
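The same two-level comparison can be tried on an in-memory list, without any file access. This is only a sketch: sample-entries and find-pair are hypothetical names, and each entry is simplified to a { namespace url } pair sorted the way a ZIM file sorts its directory entries.

```factor
USING: arrays binary-search kernel locals math.order prettyprint
sequences ;
IN: scratchpad

! Made-up entries, sorted by namespace and then by url.
CONSTANT: sample-entries {
    { CHAR: A "Earth" }
    { CHAR: A "Mars" }
    { CHAR: C "index.html" }
    { CHAR: M "Title" }
}

:: find-pair ( namespace url pairs -- pair/f )
    pairs [| pair |
        ! compare the namespace first, then the url on a tie
        namespace pair first <=>
        dup +eq+ = [ drop url pair second <=> ] when
    ] search nip
    ! search returns the closest match, so confirm it is exact
    dup namespace url 2array = [ drop f ] unless ;

CHAR: C "index.html" sample-entries find-pair .   ! { 67 "index.html" }
CHAR: A "Venus" sample-entries find-pair .        ! f
```

The final equality check matters because search hands back the nearest element even when nothing matches exactly.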
If we find the entry after searching, we can read its content:
: read-entry-url ( namespace url zim -- blob/f mime-type/f )
[ find-entry-url ] keep '[ _ read-entry-content ] [ f f ] if* ;
This is all kinda awesome, but basically these ZIM files hold HTML data for an offline instance of the various wiki-type servers. So, wouldn’t it be awesome to make an HTTP server responder that loads a ZIM file and then returns data from it on a local Factor HTTP server?
Yes!
TUPLE: zim-responder zim ;
: <zim-responder> ( path -- zim-responder )
read-zim zim-responder boa ;
M: zim-responder call-responder*
    [
        ! split the request path into a namespace and url, defaulting to namespace A
        dup { [ length 1 > ] [ first length 1 = ] } 1&&
        [ unclip-slice first ] [ CHAR: A ] if swap "/" join
    ] [
        zim>> dup path>> binary [
            ! an empty url means the main page was requested
            over empty? [ 2nip read-main-page ] [ read-entry-url ] if
        ] with-file-reader
    ] bi* 2dup and [
        <content> binary >>content-encoding
    ] [
        2drop <404>
    ] if ;
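The path handling at the start of the responder can be tried on its own. In this sketch the logic is pulled out into a hypothetical word called split-namespace, which treats a single-character first path component as the namespace and otherwise falls back to namespace A.

```factor
USING: combinators.short-circuit kernel math prettyprint sequences ;
IN: scratchpad

! Hypothetical helper: turn path components into a namespace and url.
: split-namespace ( path -- namespace url )
    dup { [ length 1 > ] [ first length 1 = ] } 1&&
    [ unclip-slice first ] [ CHAR: A ] if swap "/" join ;

{ "C" "css" "style.css" } split-namespace [ . ] bi@
! 67
! "css/style.css"

{ "Earth" } split-namespace [ . ] bi@
! 65
! "Earth"
```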
We use that to make a little entry point that creates a zim-responder and then sets it as the main-responder and calls httpd to start a web server. Using the latest development version, we can run it like so:
$ ./factor -run=zim.server /path/to/wiki.zim [port]
There are a few features that would be nice to add, like searching URLs, titles, and content, or dealing with split ZIM files (when over 4GB on file systems like FAT32), but this is a pretty neat new tool that is available now in a nightly build and will be released soon in Factor 0.99.
A post recently titled Python’s Missing Batteries: Essential Libraries You’re Missing Out On caught my eye. One of my favorite parts about Factor is the large standard library that we ship with. Looking at blogs like these sometimes helps me notice functionality that we are missing.
One of the provided examples from the timeutils module is the daterange function that provides an iterator between a start and stop date:
from datetime import date

from boltons import timeutils  # assumption: timeutils here is boltons.timeutils

start_date = date(year=2023, month=4, day=9)
end_date = date(year=2023, month=4, day=30)
for day in timeutils.daterange(start_date, end_date, step=(0, 0, 2)):
    print(repr(day))
# datetime.date(2023, 4, 9)
# datetime.date(2023, 4, 11)
# datetime.date(2023, 4, 13)
# ...
I realize that although we have numeric ranges, the current support for numbers doesn’t allow extending them so that timestamp arithmetic is implicitly supported. Some future version of Factor might fix this when we finish merging support for multiple dispatch, but in the meantime I added a timestamp-range object that works identically to range but with calendar objects.
The above Python example would look something like this:
IN: scratchpad USE: calendar.ranges
IN: scratchpad 2023 4 9 <date-utc>
2023 4 30 <date-utc>
2 days <timestamp-range> [ . ] each
T{ timestamp { year 2023 } { month 4 } { day 9 } }
T{ timestamp { year 2023 } { month 4 } { day 11 } }
T{ timestamp { year 2023 } { month 4 } { day 13 } }
...
The current implementation has <timestamp-range> work the same way as <range>, as it assumes an inclusive range [from,to]. Give it a try!
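For comparison, here is how the inclusive behavior looks with a plain numeric <range>. This sketch assumes a recent Factor where <range> lives in the ranges vocabulary (older releases call it math.ranges).

```factor
USING: arrays prettyprint ranges ;

! The upper bound is included when the step lands on it...
1 10 3 <range> >array .   ! { 1 4 7 10 }

! ...and the range stops short of it otherwise.
1 9 3 <range> >array .    ! { 1 4 7 }
```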
One aspect of exposure to different programming languages and programmers is differing opinions on proper case conventions for class names, variable names, and other attribute names. Sometimes you want to convert between them for various reasons.
Looking around at other programming languages, you can find modules such as Change Case for JavaScript, case-converter for Python, a code golf challenge, a regular expression approach to convert string to different case styles, and even a PHP module written by Jawira Portugal called Case Converter that handles quite a few, ahem, cases:
Convert strings between 13 naming conventions: Snake case, Camel case, Kebab case, Pascal case, Ada case, Train case, Cobol case, Macro case, Upper case, Lower case, Title case, Sentence case and Dot notation.
Examples of which might look something like:
snake_case
camelCase
kebab-case
PascalCase
Ada_Case
Train-Case
COBOL-CASE
MACRO_CASE
UPPER CASE
lower case
Title Case
Sentence case
dot.case
I thought it would be an interesting example to make a Unicode-aware case conversion library for Factor that handles all of those same cases in a small amount of code (fewer than 35 lines of code!).
The first word looks for a lowercase grapheme, then finds the next one that is not lowercase:
: case-index ( str -- i/f )
dup [ lower? ] find [
swap [ lower? not ] find-from drop
] [ nip ] if ;
We can then use that word to split the graphemes at these case boundaries:
: split-case ( str -- words )
>graphemes [ dup empty? not ] [
dup [ case-index ] [ length or ] bi
cut-slice swap concat
] produce nip ;
We split tokens first on the common token separators, and then on the case boundaries:
: split-tokens ( str -- words )
" -_." split [ split-case ] map concat ;
And now for the core of the algorithm, which splits an input string into tokens, with two variants (one that applies a quotation to each token and another that handles the first token differently than the rest) before joining the tokens using a provided glue character:
: case1 ( str quot glue -- str' )
[ split-tokens ] [ map ] [ join ] tri* ; inline
: case2 ( str first-quot rest-quot glue -- str' )
{
[ split-tokens 0 over ]
[ change-nth dup rest-slice ]
[ map! drop ]
[ join ]
} spread ; inline
Now that’s everything we need to implement all the case conversions!
: >camelcase ( str -- str' ) [ >lower ] [ >title ] "" case2 ;
: >pascalcase ( str -- str' ) [ >title ] "" case1 ;
: >snakecase ( str -- str' ) [ >lower ] "_" case1 ;
: >adacase ( str -- str' ) [ >title ] "_" case1 ;
: >macrocase ( str -- str' ) [ >upper ] "_" case1 ;
: >kebabcase ( str -- str' ) [ >lower ] "-" case1 ;
: >traincase ( str -- str' ) [ >title ] "-" case1 ;
: >cobolcase ( str -- str' ) [ >upper ] "-" case1 ;
: >lowercase ( str -- str' ) [ >lower ] " " case1 ;
: >uppercase ( str -- str' ) [ >upper ] " " case1 ;
: >titlecase ( str -- str' ) [ >title ] " " case1 ;
: >sentencecase ( str -- str' ) [ >title ] [ >lower ] " " case2 ;
: >dotcase ( str -- str' ) [ >lower ] "." case1 ;
These are available in the tokencase vocabulary and are included in the latest nightly builds.
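A few conversions, as a sketch of what to expect. This assumes a build that ships the tokencase vocabulary with the words defined above.

```factor
USING: prettyprint tokencase ;

"helloWorld" >snakecase .     ! "hello_world"
"hello-world" >pascalcase .   ! "HelloWorld"
"Hello World" >macrocase .    ! "HELLO_WORLD"
```

Each input is first split into the tokens hello and world, either at a separator or at the lowercase-to-uppercase boundary, and only then recased and joined.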
The Rust programming language is pretty cool. I’ve enjoyed many aspects of the Rewrite It In Rust meme that appears as a part of the Rust Evangelism Strike Force. The Rust documentation includes a pretty awesome Rust book that is probably a gold standard for programming language documentation.
In the Rust book, there is a section on Storing UTF-8 Encoded Text with Strings. It contains a neat example that I would like to use to show how Factor string objects work, how we handle Unicode and other character encodings, and show how we probably can make some improvements in the future. At the time of this blog post, we support Unicode 15.0.0 which was released in September 2022.
Factor strings are sequences of Unicode code points, which we can explore to see how they work.
If we look at the Hindi word “नमस्ते” written in the Devanagari script, it is stored as a vector of u8 values that looks like this:
IN: scratchpad "नमस्ते" utf8 encode .
B{
224 164 168 224 164 174 224 164 184 224 165 141 224 164 164
224 165 135
}
Or, as a series of hex values:
IN: scratchpad "नमस्ते" utf8 encode .h
B{
0xe0 0xa4 0xa8 0xe0 0xa4 0xae 0xe0 0xa4 0xb8 0xe0 0xa5 0x8d
0xe0 0xa4 0xa4 0xe0 0xa5 0x87
}
You could instead print them as octal or binary quite easily.
That’s 18 bytes and is how computers ultimately store this data. If we look at them as Unicode scalar values, which are what Rust’s char type is, those bytes look like this:
IN: scratchpad "नमस्ते" [ 1string ] { } map-as .
{ "न" "म" "स" "्" "त" "े" }
You can see what the code point numeric values are:
IN: scratchpad "नमस्ते" >array .
{ 2344 2350 2360 2381 2340 2375 }
Or even see what the code point names are:
IN: scratchpad "नमस्ते" [ char>name ] { } map-as .
{
"devanagari-letter-na"
"devanagari-letter-ma"
"devanagari-letter-sa"
"devanagari-sign-virama"
"devanagari-letter-ta"
"devanagari-vowel-sign-e"
}
There are six char values here, but the fourth and sixth are not letters: they’re diacritics that don’t make sense on their own. Finally, if we look at them as grapheme clusters, we’d get what a person would call the four letters that make up the Hindi word:
IN: scratchpad "नमस्ते" >graphemes [ >string ] map .
{ "न" "म" "स्" "ते" }
These graphemes are code points grouped like so:
IN: scratchpad "नमस्ते" >graphemes [ >array ] map .
{ { 2344 } { 2350 } { 2360 2381 } { 2340 2375 } }
Rust provides different ways of interpreting the raw string data that computers store so that each program can choose the interpretation it needs, no matter what human language the data is in.
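The three views can be put side by side by counting the elements in each. This is a small sketch that assumes >graphemes is available from the unicode vocabulary, as in recent Factor releases.

```factor
USING: io.encodings.string io.encodings.utf8 kernel prettyprint
sequences unicode ;

"नमस्ते"
[ utf8 encode length . ]     ! 18 bytes
[ length . ]                 ! 6 code points
[ >graphemes length . ] tri  ! 4 grapheme clusters
```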
Factor supports many encodings which can be used for interacting with other computer systems. These include ASCII, many legacy 8-bit encodings (including MacRoman, EBCDIC, and others), other Unicode variants (such as UTF-7, UTF-16, and UTF-32), ISO-2022, and several others.
There are a couple of space optimizations to save memory when only small code points are used, which is common in English as well as formats such as Base64. Looking at the Rust standard library, the improvements made to the Python unicode support, or other languages such as Strings and Characters in Swift, there are likely improvements we can make when working with text in Factor.
In the SwiftUI by Tutorials book by Ray Wenderlich, there is a tutorial on building RGBullsEye, which is a game for adjusting RGB Colors using sliders to match a provided random color and providing a “color score” to the user showing how well they matched it. Some users have even posted their solutions on GitHub.
I thought it would be fun to build a version of this example using Factor.
We could generate a color by using random-unit to make three random values for the red, green, and blue slots. Instead, we can pick randomly from the standard color database.
: random-color ( -- color )
named-colors random named-color ;
Comparing two colors can use the rgba-distance word from the colors.distances vocabulary, returning an integer score out of 100 points:
: color-score ( color1 color2 -- n )
rgba-distance 1.0 swap - 100.0 * round >integer ;
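As a quick check of the scoring arithmetic, here is a sketch that scores a color against itself. It assumes rgba-distance reports a distance of zero for identical colors, in which case the score should come out as the full 100 points.

```factor
USING: colors colors.distances kernel math math.functions
prettyprint ;
IN: scratchpad

: color-score ( color1 color2 -- n )
    rgba-distance 1.0 swap - 100.0 * round >integer ;

! A pure red compared with itself: distance 0, so a perfect score.
1.0 0.0 0.0 1.0 <rgba> dup color-score .
```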
We can define a gadget type that can be used to find our object in a gadget hierarchy.
TUPLE: color-picker-game < track ;
Given a child of the color-picker-game instance, we can pull out the color-preview gadgets in a slightly fragile way by knowing where they are in the layout:
: find-color-previews ( gadget -- preview1 preview2 )
[ color-picker-game? ] find-parent
children>> first children>> first2 ;
Using that, we can make a button that, when clicked, computes the color score for the two color-preview objects and shows it as the button label:
: <match-button> ( -- button )
"Match Color" [
dup find-color-previews
[ model>> compute-model ] bi@
color-score "Your score: %d" sprintf
over children>> first text<< relayout
] <border-button> ;
Another button can be used to reset the color we are trying to match against to a new random color, setting it on the model used by the left color-preview:
: <reset-button> ( -- button )
"Random" [
find-color-previews drop model>>
random-color swap set-model
] <border-button> ;
Using these two buttons, and some gadgets from the color picker vocabulary, we can build up our interface, choosing a random color to start, and then laying out the other components we need:
:: <color-picker-game> ( -- gadget )
vertical color-picker-game new-track { 5 5 } >>gap
random-color <model> :> left-model
\ <rgba> <color-sliders> :> ( sliders right-model )
horizontal <track>
left-model <color-preview> 1/2 track-add
right-model <color-preview> 1/2 track-add
1 track-add
sliders f track-add
right-model <color-status> f track-add
<match-button> f track-add
<reset-button> f track-add ;
We can make a main entry point, constructing the game and providing it as the main gadget:
MAIN-WINDOW: color-picker-game-window
{ { title "Color Picker Game" } }
<color-picker-game> >>gadgets ;
This is available in the development version and includes some additional features such as support for additional color spaces along with some improvements to our tabbed gadgets. Give it a try!
It’s been pretty hard to avoid all of the incredible stories about artificial intelligence over the past few months. New applications of generative AI seem to appear on a daily basis. Image generation with Midjourney is pretty next-level. Code generation using GitHub Copilot seems pretty amazing. Interacting with large language models like GPT-4 or Bard or Bing Chat or Facebook LLaMA or StableLM and so many others seems like science fiction. Audio models like Whisper used for audio transcription even make popular audio assistants look pretty dated.
With all of the hype, it seemed inevitable that Factor would gain some kind of AI functionality.
Despite the non-profit vs for-profit controversy of OpenAI, they do seem to have a momentary lead in the race to make tools that others can build upon. One of those tools is the OpenAI API, which is made available using JSON and HTTP. Besides the OpenAI API Reference, there is an OpenAI Cookbook and popular libraries such as OpenAI Python for building systems using it.
Recently, I contributed the openai vocabulary which allows using all the methods currently made available by OpenAI. You will need an OpenAI API Key.
Once you obtain that, you can set it in the listener.
IN: scratchpad USE: openai
IN: scratchpad "sk-....................." openai-api-key set-global
And now you have enough to try it out…
IN: scratchpad "text-davinci-003"
"what is the factor programming language"
<completion> 100 >>max_tokens create-completion
"choices" of first "text" of print
Factor is a stack-oriented programming language, designed for creating
flexible, reusable software components. It combines elements from both
object-oriented and functional programming, and provides powerful features,
including static typing and static type checking, an interactive program
development environment, built-in automated testing, and a wide range of
built-in data types. The language is designed to be easy to use, yet provide
a high degree of flexibility.
Cool!
We even have a Discord bot using OpenAI that answers on the Factor Discord server.
planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp.