While looking into the multibase group of self-identifying base encodings, I discovered Base256Emoji which is an encoding format described by an emoji to use for each byte of an input buffer. This spec is implemented, for example, in both Go and Rust
Despite replacing each byte with a typically 3-byte or 4-byte UTF-8 sequence – which is unusual for byte encodings (they often seek to reduce the length of an input sequence) – there are some nice use cases.
We’re going to implement this in Factor and then discuss some variants.
First, we define a 256 item sequence of emojis, one for each byte:
CONSTANT: base256>emoji "๐๐ชโ๐ฐ๐๐๐๐๐๐๐๐๐๐๐๐๐โ๐ป๐ฅ\
๐พ๐ฟ๐โค๐๐คฃ๐๐๐๐ญ๐๐๐
๐๐๐ฅ๐ฅฐ๐๐๐๐ข๐ค๐๐๐ช๐โบ๐๐ค๐๐๐๐\
๐น๐คฆ๐๐โโจ๐คท๐ฑ๐๐ธ๐๐๐๐๐๐๐๐๐คฉ๐๐๐ค๐๐ฏ๐๐๐ถ๐๐คญโฃ๐๐\
๐๐ช๐๐ฅ๐๐๐ฉ๐ก๐คช๐๐ฅณ๐ฅ๐คค๐๐๐ณโ๐๐๐ด๐๐ฌ๐๐๐ท๐ป๐โญโ
๐ฅบ๐๐\
๐ค๐ฆโ๐ฃ๐๐โน๐๐๐ โ๐๐บ๐๐ป๐๐๐๐๐น๐ฃ๐ซ๐๐๐ต๐ค๐๐ด๐ค๐ผ๐ซโฝ๐ค\
โ๐๐คซ๐๐ฎ๐๐ป๐๐ถ๐๐ฒ๐ฟ๐งก๐โก๐๐โโ๐๐ฐ๐คจ๐ถ๐ค๐ถ๐ฐ๐๐ข๐ค๐๐จ๐จ\
๐คฌโ๐๐บ๐ค๐๐๐ฑ๐๐ถ๐ฅดโถโกโ๐๐ธโฌ๐จ๐๐ฆ๐ท๐บโ ๐
๐๐ต๐๐คฒ๐ค ๐คง๐๐ต๐
๐ง\
๐พ๐๐๐ค๐๐คฏ๐ทโ๐ง๐ฏ๐๐๐ค๐๐โ๐ด๐ฃ๐ธ๐๐๐ฅ๐คข๐
๐ก๐ฉ๐๐ธ๐ป๐ค๐คฎ๐ผ๐ฅต\
๐ฉ๐๐๐ผ๐๐ฃ๐ฅ"
And then we compute the reverse mapping:
CONSTANT: emoji>base256 $[ base256>emoji H{ } zip-index-as ]
With those two data blocks, we can define words to convert into and out of base256emoji:
: >base256emoji ( bytes -- str )
[ base256>emoji nth ] "" map-as ;
: base256emoji> ( str -- bytes )
[ emoji>base256 at ] B{ } map-as ;
You can try it out:
IN: scratchpad "Hello, Factor!" >base256emoji .
"๐โ๐๐๐๐ช๐
๐๐คค๐๐๐๐ฅบ๐"
IN: scratchpad "๐โ๐๐๐๐ช๐
๐๐คค๐๐๐๐ฅบ๐" base256emoji> "" like .
"Hello, Factor!"
The most interesting use case for base256emoji seems to be the ERC-7673: Distinguishable base256emoji Addresses proposal for Ethereum. This proposal seeks to “address spoofing attacks [that] have mislead tens of thousands of ether, and countless other tokens” by using visual emoji-based strings – a similar justification to the Drunken Bishop algorithm.
We can try base256emoji out with checksums to yield a similar benefit, displaying 16 emojis instead of a 32-byte hex-string:
IN: scratchpad "resource:license.txt" md5 checksum-bytes
bytes>hex-string .
"ebb5ab617e3a88ed43f7d247c6466d95"
IN: scratchpad "resource:license.txt" md5 checksum-bytes
>base256emoji .
"๐๐จ๐คจ๐คค๐ โจ๐น๐ฅ๐๐ผ๐ค ๐คฉโฌ๐๐ท๐ค"
Given a desire to use visually dissimilar emojis in identities, it would be useful to think about the chosen emoji set and how that might be interpreted in a visual context. Some criticism of this particular group of emojis, which are sometimes magnified by smaller font sizes, might focus on the visual similarity of:
It might be desirable to choose emojis not based on community membership or common usage, but on their most dissimilar visual identity. to make it even harder for scammers to deliberately pick lookalike emojis and rely on small text sizes, platform font differences, or user inattention.
A couple of other emoji sets can be found, for example @Equim-chan/base256 (which uses one of my favorite emojis: ๐พ), npm/base-emoji, or even the KittenMoji crate.
Fun!
This is available on my GitHub.
ICANN recently posted an update on Launching RDAP; Sunsetting WHOIS which got some discussion on Hacker News. Andy Newton, one of the creators of RDAP, has published A Guide to the Registration Data Access Protocol (RDAP), which is a pretty useful resource for understanding how it works. More information comes from the main RDAP website which describes it as:
The Registration Data Access Protocol (RDAP) is the successor to WHOIS. Like WHOIS, RDAP provides access to information about Internet resources (domain names, autonomous system numbers, and IP addresses). Unlike WHOIS, RDAP provides:
- A machine-readable representation of registration data;
- Differentiated access;
- Structured request and response semantics;
- Internationalisation;
- Extensibility.
The WHOIS protocol that it replaces is super simple, being described by RFC 3912 in a few paragraphs. And, in fact, you can test it out on the command-line of most computers:
$ echo -e "factorcode.org\r\n" | nc -i 1 whois.cloudflare.com 43
Domain Name: FACTORCODE.ORG
Registry Domain ID: c49c93dee3304f39b081383262d320c6-LROR
Registrar WHOIS Server: whois.cloudflare.com
Registrar URL: https://www.cloudflare.com
Updated Date: 2025-01-15T22:46:54Z
Creation Date: 2005-12-01T04:54:37Z
Registrar Registration Expiration Date: 2025-12-01T04:54:37Z
Registrar: Cloudflare, Inc.
Registrar IANA ID: 1910
Domain Status: clienttransferprohibited https://icann.org/epp#clienttransferprohibited
...
The RDAP protocol must be equally simple, right? Well, not so fast. Instead of a few paragraphs, and simple queries over sockets, you get many RFCs describing it:
These, along with things like the RDAP Extension registry, and the protocol reliance on HTTP/HTTPS, JSON, and JSONPath considerably increase the complexity of RDAP implementations.
Below we are going to start an implementation using Factor.
The first concept we have to implement is RDAP Bootstrapping which uses 5 IANA files to redirect searches to the correct upstream RDAP servers.
Type | Link |
---|---|
Forward DNS | https://data.iana.org/rdap/dns.json |
IPv4 Addresses | https://data.iana.org/rdap/ipv4.json |
IPv6 Addresses | https://data.iana.org/rdap/ipv6.json |
Autonomous System Numbers | https://data.iana.org/rdap/asn.json |
Object Tags | https://data.iana.org/rdap/object-tags.json |
We can abstract these by making a word to convert a type to a URL:
: bootstrap-url ( type -- url )
"https://data.iana.org/rdap/" ".json" surround ;
We don’t want to retrieve these files all the time, so let’s cache them for 30 days:
INITIALIZED-SYMBOL: bootstrap-cache [ 30 days ]
: bootstrap-get ( type -- data )
bootstrap-url cache-directory bootstrap-cache get
download-outdated-into path>json ;
And provide a way to force delete the cached bootstrap files:
CONSTANT: bootstrap-files { "asn" "dns" "ipv4" "ipv6" "object-tags" }
: reset-bootstrap ( -- )
[ bootstrap-files [ ".json" append ?delete-file ] each ] with-cache-directory ;
Each bootstrap file is described in RFC
9224,
but basically we want to extract and manipulate the "services"
block,
modifying the keys of the
assoc
for convenient searching:
: parse-services ( data quot: ( key -- key' ) -- services )
[ "services" of ] dip '[ [ _ map ] dip ] assoc-map ; inline
: search-services ( services quot: ( key -- ? ) -- urls )
'[ drop _ any? ] assoc-find drop nip ; inline
We can then provide bootstrap data structures that are used for searching. For example, we find the longest subdomain that has an entry in the dns bootstrap list to handle both SLD and TLD:
: dns-bootstrap ( -- services )
"dns" bootstrap-get "services" of ;
: split-domain ( domain -- domains )
"." split dup length <iota> [ tail "." join ] with map ;
: domain-endpoints ( domain -- urls )
split-domain dns-bootstrap [ swap member? ] with search-services ;
You can see that different domain names are directed to different RDAP endpoints:
IN: scratchpad "factorcode.org" domain-endpoints .
{ "https://rdap.publicinterestregistry.org/rdap/" }
IN: scratchpad "google.com" domain-endpoints .
{ "https://rdap.verisign.com/com/v1/" }
Or, find the correct endpoint for a given IPV4 address from the ipv4 bootstrap list:
: ipv4-bootstrap ( -- services )
"ipv4" bootstrap-get [ >ipv4-network ] parse-services ;
: ipv4-endpoints ( ipv4 -- urls )
ipv4-aton ipv4-bootstrap [ ipv4-contains? ] with search-services ;
The RDAP data is typically available from HTTP or HTTPS web servers, as JSON
files, but it uses a custom mime-type application/rdap+json
. We can
write a simple word to make the request and convert the response:
: accept-rdap ( request -- request )
"application/rdap+json" "Accept" set-header ;
: rdap-get ( url -- response rdap )
<get-request> accept-rdap http-request
dup string? [ utf8 decode ] unless json> ;
And now we can build a word to lookup a domain:
: lookup-domain ( domain -- results )
[ domain-endpoints random ]
[ "domain/%s" sprintf derive-url rdap-get nip ] bi ;
Or to lookup an IPV4 address:
: lookup-ipv4 ( ipv4 -- results )
[ ipv4-endpoints random ]
[ "ip/%s" sprintf derive-url rdap-get nip ] bi ;
And we can try it out, getting an extensive response:
IN: scratchpad "factorcode.org" lookup-domain
--- Data stack:
LH{ { "rdapConformance" ~array~ } { "notices" ~array~ } { ...
It would be nice to print the output in a more human-readable format. For now, we will just print these as a nested tree of keys and values:
GENERIC: print-rdap-nested ( padding key value -- )
M: linked-assoc print-rdap-nested
[ over write write ":" print " " append ] dip
[ swapd print-rdap-nested ] with assoc-each ;
M: array print-rdap-nested
[ print-rdap-nested ] 2with each ;
M: object print-rdap-nested
present [ 2drop ] [ [ ": " [ write ] tri@ ] dip print ] if-empty ;
: print-rdap ( results -- )
[ "" -rot print-rdap-nested ] assoc-each ;
This could probably be improved a fair bit – for example, the keys could be made more readable, and it doesn’t handle vCard entries very well.
You can try this out, by lookup up a domain name:
IN: scratchpad "factorcode.org" lookup-domain print-rdap
rdapConformance: rdap_level_0
rdapConformance: icann_rdap_response_profile_0
rdapConformance: icann_rdap_technical_implementation_guide_0
ldhName: factorcode.org
unicodeName: factorcode.org
nameservers:
ldhName: carl.ns.cloudflare.com
unicodeName: carl.ns.cloudflare.com
objectClassName: nameserver
handle: c34bedeccd8e4514b917e9e82a052077-LROR
status: associated
nameservers:
ldhName: kay.ns.cloudflare.com
unicodeName: kay.ns.cloudflare.com
objectClassName: nameserver
handle: 7fc12bf413944de088f27f837349a8da-LROR
status: associated
...
Or, by lookup up an IP address:
IN: scratchpad "1.1.1.1" lookup-ipv4 print-rdap
rdapConformance: history_version_0
rdapConformance: nro_rdap_profile_0
rdapConformance: cidr0
rdapConformance: rdap_level_0
events:
eventAction: registration
eventDate: 2011-08-10T23:12:35Z
events:
eventAction: last changed
eventDate: 2023-04-26T22:57:58Z
name: APNIC-LABS
status: active
type: ASSIGNED PORTABLE
endAddress: 1.1.1.255
ipVersion: v4
startAddress: 1.1.1.0
objectClassName: ip network
handle: 1.1.1.0 - 1.1.1.255
...
Pretty cool!
This was recently committed to the development version of Factor.
A few days ago, Ryan Petermen wrote a tweet about switching to Python from C++ in a solution for the Two Sum problem from LeetCode:
Given an array of integers
nums
and an integertarget
, return indices of the two numbers such that they add up totarget
.You may assume that each input would have exactly one solution, and you may not use the same element twice.
You can return the answer in any order.
Spoilers below!
They offered this iterative C++ solution:
std::vector<int> twoSum(std::vector<int>& nums, int target) {
std::unordered_map<int, int> hashMap;
std::vector<int> result;
for (int i = 0; i < nums.size(); ++i) {
int complement = target - nums[i];
if (hashMap.find(complement) != hashMap.end()) {
result.push_back(hashMap[complement]);
result.push_back(i);
return result;
}
hashMap[nums[i]] = i;
}
return result;
}
Followed by this improved Python solution:
def two_sum(nums, target):
hashmap = {}
for i, num in enumerate(nums):
complement = target - num
if complement in hashmap:
return [hashmap[complement], i]
hashmap[num] = i
Of course, I was curious what a Factor solution would look like.
We can start by making a mostly direct translation using local variables:
:: two-sum ( nums target -- i j )
H{ } clone :> hashmap
nums <enumerated> [
first2 :> ( i num )
target num - :> complement
complement hashmap at [
i num hashmap set-at f
] unless*
] map-find ?first ;
And a few test cases to show that it works:
{ 0 1 } [ { 2 7 11 15 } 9 two-sum ] unit-test
{ 1 2 } [ { 3 2 4 } 6 two-sum ] unit-test
{ 0 1 } [ { 3 3 } 6 two-sum ] unit-test
Sometimes a higher-level word exists that can simplify logic, for example map-find-index:
:: two-sum ( nums target -- i j )
H{ } clone :> hashmap
nums [| num i |
target num - :> complement
complement hashmap at [
i num hashmap set-at f
] unless*
] map-find-index drop ;
In the spirit of concatenative
thinking, we
could reduce the amount of named variables we have slightly by not naming the
complement
internal variable:
:: two-sum ( nums target -- i j )
H{ } clone :> hashmap
nums [| num i |
target num - hashmap at [
i num hashmap set-at f
] unless*
] map-find-index drop ;
And that works, but perhaps we could try some alternative syntax features like fried quotations:
: two-sum ( nums target -- i j )
H{ } clone dup '[
swap _ over - _ at [ 2nip ] [ _ set-at f ] if*
] map-find-index drop ;
Is that better? Perhaps.
How else might you solve this problem?
Can you do it in one-line?
I had a lot of fun writing code to compute derangements recently. I thought I was done with that topic until I bumped into a question on StackOverflow asking how to generate a random derangement of a list. Being nerd sniped is a real thing, and so I started looking at solutions.
There’s a paper called “An analysis of a simple algorithm for random derangements” that has an, ahem, simple algorithm. The basic idea is to generate a random permutation of indices, breaking early if the random permutation is obviously not a derangement.
One way to take a random permutation would be to use our permutations virtual sequence:
IN: scratchpad "ABCDEF" <permutations> random .
"FCEBDA" ! is a derangement
IN: scratchpad "ABCDEF" <permutations> random .
"DFBCEA" ! is NOT a derangement
And so you could loop until a derangement of indices is found:
: random-derangement-indices ( n -- seq )
f swap <iota> <permutations>
'[ drop _ random dup derangement? not ] loop ;
But, since only 36% or so of permutations are derangements, perhaps it would be faster and better to implement the algorithm from that paper – making our own random permutation of indices and breaking early if obviously not a derangement:
:: random-derangement-indices ( n -- indices )
n <iota> >array :> seq
f [
dup :> v
n 1 (a..b] [| j |
j 1 + random :> p
p v nth j = [ t ] [ j p v exchange f ] if
] any? v first zero? or
] [ drop seq clone ] do while ;
We can use that to build a random-derangement
word:
: random-derangement ( seq -- seq' )
[ length random-derangement-indices ] [ nths ] bi ;
And then, for example, get a random derangement of the alphabet – of which there are one hundred and forty-eight septillion derangements, give or take – in under a millisecond:
IN: scratchpad "ABCDEFGHIJKLMNOPQRSTUVWXYZ" random-derangement .
"CZFABMSUXRQDEHGYJLTPVOIKWN"
We could check to make sure that we generate all derangments with equal possibility using a simple test case:
IN: scratchpad 1,000,000 [
"ABCD" random-derangement
] replicate histogram sort-keys .
{
{ "BADC" 111639 }
{ "BCDA" 110734 }
{ "BDAC" 110682 }
{ "CADB" 111123 }
{ "CDAB" 111447 }
{ "CDBA" 111147 }
{ "DABC" 111215 }
{ "DCAB" 111114 }
{ "DCBA" 110899 }
}
Looks good to me!
Derangements, also sometimes known as deranged permutations, are described as:
In combinatorial mathematics, a derangement is a permutation of the elements of a set in which no element appears in its original position. In other words, a derangement is a permutation that has no fixed points.
There is a fun online derangements generator tool that you can use to play with computing the derangements of a sequence as well as calculating the number of derangements for a given sequence size.
As an example, we can use the math.combinatorics
vocabulary,
to generate all the permutations of the sequence { 0 1 2 }
:
IN: scratchpad { 0 1 2 } all-permutations .
{
{ 0 1 2 }
{ 0 2 1 }
{ 1 0 2 }
{ 1 2 0 }
{ 2 0 1 }
{ 2 1 0 }
}
Since a derangement is a permutation that requires each element to be in a different slot, we could write a word to check the permuted indices to see if that is true:
: derangement? ( indices -- ? )
dup length <iota> [ = ] 2any? not ;
These would be the two derangements of the indices { 0 1 2 }
:
IN: scratchpad { 0 1 2 } all-permutations [ derangement? ] filter .
{
{ 1 2 0 }
{ 2 0 1 }
}
The number of derangements is the subfactorial of the length of the sequence:
: subfactorial ( n -- ? )
[ 1 ] [ factorial 1 + e /i ] if-zero ;
We can build a <derangement-iota>
that is a sequence as long as that
number:
: <derangement-iota> ( seq -- <iota> )
length subfactorial <iota> ; inline
And we can build a next-derangement
word that calculates the next
permutation that is a
derangement:
: next-derangement ( seq -- seq )
[ dup derangement? ] [ next-permutation ] do until ;
We can then build upon some of the code for iterating
permutations,
designing an internal derangements-quot
word that is similar in form to the
existing permutations-quot
word:
: derangements-quot ( seq quot -- seq quot' )
[ [ <derangement-iota> ] [ length <iota> >array ] [ ] tri ] dip
'[ drop _ next-derangement _ nths-unsafe @ ] ; inline
And then use it to build a series of words that can provide iteration across derangements:
: each-derangement ( ... seq quot: ( ... elt -- ... ) -- ... )
derangements-quot each ; inline
: map-derangements ( ... seq quot: ( ... elt -- ... newelt ) -- ... newseq )
derangements-quot map ; inline
: filter-derangements ( ... seq quot: ( ... elt -- ... ? ) -- ... newseq )
selector [ each-derangement ] dip ; inline
: all-derangements ( seq -- seq' )
[ ] map-derangements ;
: all-derangements? ( ... seq quot: ( ... elt -- ... ? ) -- ... ? )
derangements-quot all? ; inline
: find-derangement ( ... seq quot: ( ... elt -- ... ? ) -- ... elt/f )
'[ _ keep and ] derangements-quot map-find drop ; inline
: reduce-derangements ( ... seq identity quot: ( ... prev elt -- ... next ) -- ... result )
swapd each-derangement ; inline
And, now we can use this to find the nine derangements for "ABCD"
:
IN: scratchpad "ABCD" all-derangements .
{
"BADC"
"BCDA"
"BDAC"
"CADB"
"CDAB"
"CDBA"
"DABC"
"DCAB"
"DCBA"
}
This is available on my GitHub.
Years ago, I remember reading a blog post called “Why is Groovy is big?” by Slava Pestov, the original author of Factor. In it, he talks about lines of code and sets the stage for how concatenative thinking can lead to properties like conciseness and readability, ending with this:
I tend to think the majority of code people write is overly complicated, full of redundancy, and designed for such flexibility that in practice is not needed at all. I hope one day this trend reverses.
I’ve been thinking a lot recently about 20 years of Factor and I thought it would be fun to write a Zen of Factor. Perhaps inspired by the Zen of Programming book written in 1987 by Geoffrey James, there are a couple of examples I wanted to point to first.
The Python programming language was one of the first languages that I am aware of to get a Zen, contributed at least as far back as February 2002 in a kind of easter egg fashion encrypted by ROT13:
$ python -c "import this"
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Quite a bit more recently, the Zig programming language
introduced a Zen through a series of commits starting in August
2017.
You can see the current version by running zig zen
:
$ zig zen
* Communicate intent precisely.
* Edge cases matter.
* Favor reading code over writing code.
* Only one obvious way to do things.
* Runtime crashes are better than bugs.
* Compile errors are better than runtime crashes.
* Incremental improvements.
* Avoid local maximums.
* Reduce the amount one must remember.
* Focus on code rather than style.
* Resource allocation may fail; resource deallocation must succeed.
* Memory is a resource.
* Together we serve the users.
In that spirit, I thought it would be fun to put together some ideas for what a Zen of Factor might look like, incorporating aspects of the language that I enjoy, some thematic elements that might make you more successful learning and developing in it, as well as pointing out the strong community that we have and hope to grow.
Here is the current draft:
The Zen of Factor
The REPL is your playground.
Working code beats perfect theory.
Words are better than paragraphs.
Stack effects tell the story.
Any syntax you need, you can create.
Simple primitives yield powerful combinations.
First make it work, then make it beautiful.
Make it beautiful, then make it fast.
Quick hacks grow into robust solutions.
When in doubt, factor it out.
Every word should do one thing well.
Let the stack guide your logic.
Write less, compose more.
If it works, ship it.
If it doesn't work, fix it.
If you don't like it, change it.
Today's beginner is tomorrow's core developer.
Questions encouraged, PRs welcome.
I really appreciate hearing about programs and problems that developers work on, getting feedback that allows us to iteratively improve, and knowing that every commit leads us towards a better future.
Thank you!
Factor has a watching words feature that allows you to see the inputs and outputs of a word when it is called. I have wanted a way to define a watched code block that we could use to see the stack before and after some inner part of a word.
It turns out that it’s pretty simple to make this syntax using our existing watch implementation from the tools.annotations vocabulary:
DEFER: WATCH>
SYNTAX: <WATCH
\ WATCH> parse-until >quotation dup (watch) append! ;
Using this, we can now watch the inner part of a word, for example this word that increments the input by 1:
: foo ( x -- y ) 1 <WATCH + WATCH> ;
And see it used:
IN: scratchpad 10 foo
--- Entering [ + ]
x 10
x 1
--- Leaving [ + ]
x 11
--- Data stack:
11
One thing that I’d like to improve on this someday, is making it so this watch syntax has a prettyprint implementation that allows it to be rendered as it is typed.
I have added this to the latest developer version if you’d like to update and give it a try!
Factor contains a REPL – called the Listener – available on the command-line and graphically as part of the UI developer tools. For many users, this is their main interface to programming in the Factor programming language.
We sometimes get requests to better support styling the user interface. This has led to improvements such as support for light and dark themes, adjustable font sizes, and other customizations. There have been a few existing ways to style the UI listener including support for keyboard commands to increase or decrease font sizes. But, until recently this only affected new output or new Listener sessions.
Today, I improved this to make adjusting the font size much more dynamic, using
traditional keyboard shortcuts of Ctrl
– or Cmd
on
macOS – combined with +
or -
:
Give it a try, and please let us know other ways we can make improvements!
planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp.