[ planet-factor ]

John Benediktsson: Human Sorting Improved

Factor has had human-friendly sorting since December 2007. It is unclear if it is related, but Ned Batchelder wrote about “human sorting” around the same time with links to a couple of other blog posts discussing similar topics, so perhaps it was in the zeitgeist of the time.

In any event, Ned recently wrote about “human sorting improved” which deals with the topic of how to sort two strings that are human-equivalent but are different and should probably have an ordering. This was the result of fixing a problem that actually happened in the coverage.py project.

For example, comparing "x1y" and "x001y" using the original algorithm would consider these to be equal given the same human keys: { "x" 1 "y" }.

You can see this in Factor 0.100:

IN: scratchpad USE: sorting.human

IN: scratchpad "x1y" "x001y" human<=> .
+eq+

Ned suggests that instead – if two strings are equal using the human sorting method – they should be compared for lexicographic ordering as a tie-breaker.

I have made that change in the latest development version of Factor so that the behavior is changed:

IN: scratchpad "x1y" "x001y" human<=> .
+gt+

Sun, 30 Mar 2025 15:00:00

John Benediktsson: Sports Betting

I have lately been curious about the Zig Programming Language and how it might be useful to Factor. In the process of doing some research, I bumped into a blog post about Converting decimal odds to probabilities with Zig, which shares some Zig source code for calculating sports betting odds:

As sports betting expands across the world, more and more players don’t understand tha math behind it. So, the purpose of this article it’s to clarify how to convert from decimal odds to the probability behind them. The calculations are implemented in Zig, a system programming language perfect to create CLI tools because it’s fast and simple.

It is particularly interesting given the growth and popularity of both sports betting on platforms like DraftKings, but also prediction markets like Polymarket and Manifold. You can read about some background information in Sports Betting Odds: How They Work and How To Read Them.

Note: I am not encouraging gambling, and in addition you should do your own research on the various platforms as there can be controversy about potential manipulation and transparency of the markets.

I don’t know much about (and do not do any) sports betting, but I thought it would be fun to learn about and then to translate some of these concepts to Factor.

Decimal

The “decimal odds” are popular in Europe and are the total returns including each original $1 bet if you win. These “decimal odds” effectively represent inverse probabilities. That is, if the odds are 4.5, then the probability is about 22% (which is 1/4.5).

Note: a variant of this is “fractional odds” popular in Britain which might instead quote those odds as 7/2 or 7-2 and describe the amount won ($7) in addition to the original bet ($2).

We can convert decimal odds to probabilities:

: odds>probs ( m -- n ) recip 100 * ;

: probs>odds ( n -- m ) 100 / recip ;

Sometimes the odds are quoted based on different outcomes – for example, in many team games there are three outcomes: win, lose, or tie. These probabilities should sum to 1, and seem to be often quoted with the payout included (typically less than 100%). The example given in the original blogpost is this:

We can calculate the payout for a group of odds:

: compute-payout ( odds -- k )
    [ recip ] map-sum recip ;

And given the example from the blogpost, the payout is about 95%:

IN: scratchpad { 1.27 6.00 10.25 } compute-payout .
0.9509054938365546

Which means, you could convert odds to probabilities using this payout number:

: compute-probs ( odds -- probs )
    dup compute-payout '[ odds>probs _ * ] map ;

This shows the different probabilities of the outcomes, using printf to format the sequence:

IN: scratchpad { 1.27 6.0 10.25 } compute-probs "%[%.2f, %]" printf
{ 74.87, 15.85, 9.28 }

Moneyline

The “moneyline odds” are more popular in the United States. The original blogpost that I mention above did not go into details, but I was curious, so we will explore how moneyline payouts work. These are quoted as positive (underdogs) or negative (favorites) values, and are the amount you would need to bet to receive $100 of winnings.

Using the NFL example from the Investopedia article.

Let’s say a betting website (also known as an online sportsbook) priced an NFL game between the Pittsburgh Steelers and the Kansas City Chiefs with the following moneyline odds.

Steelers: +585

Chiefs: -760

We can convert moneyline odds to decimal odds:

: moneyline>odds ( moneyline -- odds )
    100 / dup 1 < [ recip neg ] when 1 + ;

: odds>moneyline ( odds -- moneyline )
    1 - dup 1 < [ recip neg ] when 100 * ;

And use that to calculate the implied relative probabilities of that game’s winner:

IN: scratchpad { 585 -760 } [ moneyline>odds ] map
               compute-probs "%[%.2f, %]" printf
{ 14.18, 85.82 }

Parlay

There are also “parlay odds” which are multiple bets linked together as one.

Letโ€™s say you want to bet three heavy favorites on the moneyline because youโ€™re confident each team will win, but not sure if theyโ€™ll cover the spread.

So you bet Packers -300 against the Lions, Patriots -200 vs. the Jets, and Eagles -150 at the Washington Commanders.

We can convert these to odds:

IN: scratchpad { -300 -200 -150 } [ moneyline>odds ] map .
{ 1+1/3 1+1/2 1+2/3 }

And write a word to convert it to a parlay outcome:

: compute-parlay ( moneyline -- parlay )
    [ moneyline>odds ] map product odds>moneyline ;

So, this example would have a parlay payout of 233:

IN: scratchpad { -300 -200 -150 } compute-parlay .
233+1/3

And a probability of 30%:

IN: scratchpad 233+1/3 moneyline>odds odds>probs .
30

There are probably aspects of this that I did not cover above, but it’s kinda fun to explore new topics that you don’t know much about and to learn!

Wed, 26 Mar 2025 15:00:00

John Benediktsson: Base256Emoji

While looking into the multibase group of self-identifying base encodings, I discovered Base256Emoji which is an encoding format described by an emoji to use for each byte of an input buffer. This spec is implemented, for example, in both Go and Rust

Despite replacing each byte with a typically 3-byte or 4-byte UTF-8 sequence – which is unusual for byte encodings (they often seek to reduce the length of an input sequence) – there are some nice use cases.

We’re going to implement this in Factor and then discuss some variants.

Implementation

First, we define a 256 item sequence of emojis, one for each byte:

CONSTANT: base256>emoji "๐Ÿš€๐Ÿชโ˜„๐Ÿ›ฐ๐ŸŒŒ๐ŸŒ‘๐ŸŒ’๐ŸŒ“๐ŸŒ”๐ŸŒ•๐ŸŒ–๐ŸŒ—๐ŸŒ˜๐ŸŒ๐ŸŒ๐ŸŒŽ๐Ÿ‰โ˜€๐Ÿ’ป๐Ÿ–ฅ\
๐Ÿ’พ๐Ÿ’ฟ๐Ÿ˜‚โค๐Ÿ˜๐Ÿคฃ๐Ÿ˜Š๐Ÿ™๐Ÿ’•๐Ÿ˜ญ๐Ÿ˜˜๐Ÿ‘๐Ÿ˜…๐Ÿ‘๐Ÿ˜๐Ÿ”ฅ๐Ÿฅฐ๐Ÿ’”๐Ÿ’–๐Ÿ’™๐Ÿ˜ข๐Ÿค”๐Ÿ˜†๐Ÿ™„๐Ÿ’ช๐Ÿ˜‰โ˜บ๐Ÿ‘Œ๐Ÿค—๐Ÿ’œ๐Ÿ˜”๐Ÿ˜Ž๐Ÿ˜‡\
๐ŸŒน๐Ÿคฆ๐ŸŽ‰๐Ÿ’žโœŒโœจ๐Ÿคท๐Ÿ˜ฑ๐Ÿ˜Œ๐ŸŒธ๐Ÿ™Œ๐Ÿ˜‹๐Ÿ’—๐Ÿ’š๐Ÿ˜๐Ÿ’›๐Ÿ™‚๐Ÿ’“๐Ÿคฉ๐Ÿ˜„๐Ÿ˜€๐Ÿ–ค๐Ÿ˜ƒ๐Ÿ’ฏ๐Ÿ™ˆ๐Ÿ‘‡๐ŸŽถ๐Ÿ˜’๐Ÿคญโฃ๐Ÿ˜œ๐Ÿ’‹\
๐Ÿ‘€๐Ÿ˜ช๐Ÿ˜‘๐Ÿ’ฅ๐Ÿ™‹๐Ÿ˜ž๐Ÿ˜ฉ๐Ÿ˜ก๐Ÿคช๐Ÿ‘Š๐Ÿฅณ๐Ÿ˜ฅ๐Ÿคค๐Ÿ‘‰๐Ÿ’ƒ๐Ÿ˜ณโœ‹๐Ÿ˜š๐Ÿ˜๐Ÿ˜ด๐ŸŒŸ๐Ÿ˜ฌ๐Ÿ™ƒ๐Ÿ€๐ŸŒท๐Ÿ˜ป๐Ÿ˜“โญโœ…๐Ÿฅบ๐ŸŒˆ๐Ÿ˜ˆ\
๐Ÿค˜๐Ÿ’ฆโœ”๐Ÿ˜ฃ๐Ÿƒ๐Ÿ’โ˜น๐ŸŽŠ๐Ÿ’˜๐Ÿ˜ โ˜๐Ÿ˜•๐ŸŒบ๐ŸŽ‚๐ŸŒป๐Ÿ˜๐Ÿ–•๐Ÿ’๐Ÿ™Š๐Ÿ˜น๐Ÿ—ฃ๐Ÿ’ซ๐Ÿ’€๐Ÿ‘‘๐ŸŽต๐Ÿคž๐Ÿ˜›๐Ÿ”ด๐Ÿ˜ค๐ŸŒผ๐Ÿ˜ซโšฝ๐Ÿค™\
โ˜•๐Ÿ†๐Ÿคซ๐Ÿ‘ˆ๐Ÿ˜ฎ๐Ÿ™†๐Ÿป๐Ÿƒ๐Ÿถ๐Ÿ’๐Ÿ˜ฒ๐ŸŒฟ๐Ÿงก๐ŸŽโšก๐ŸŒž๐ŸŽˆโŒโœŠ๐Ÿ‘‹๐Ÿ˜ฐ๐Ÿคจ๐Ÿ˜ถ๐Ÿค๐Ÿšถ๐Ÿ’ฐ๐Ÿ“๐Ÿ’ข๐ŸคŸ๐Ÿ™๐Ÿšจ๐Ÿ’จ\
๐Ÿคฌโœˆ๐ŸŽ€๐Ÿบ๐Ÿค“๐Ÿ˜™๐Ÿ’Ÿ๐ŸŒฑ๐Ÿ˜–๐Ÿ‘ถ๐Ÿฅดโ–ถโžกโ“๐Ÿ’Ž๐Ÿ’ธโฌ‡๐Ÿ˜จ๐ŸŒš๐Ÿฆ‹๐Ÿ˜ท๐Ÿ•บโš ๐Ÿ™…๐Ÿ˜Ÿ๐Ÿ˜ต๐Ÿ‘Ž๐Ÿคฒ๐Ÿค ๐Ÿคง๐Ÿ“Œ๐Ÿ”ต๐Ÿ’…๐Ÿง\
๐Ÿพ๐Ÿ’๐Ÿ˜—๐Ÿค‘๐ŸŒŠ๐Ÿคฏ๐Ÿทโ˜Ž๐Ÿ’ง๐Ÿ˜ฏ๐Ÿ’†๐Ÿ‘†๐ŸŽค๐Ÿ™‡๐Ÿ‘โ„๐ŸŒด๐Ÿ’ฃ๐Ÿธ๐Ÿ’Œ๐Ÿ“๐Ÿฅ€๐Ÿคข๐Ÿ‘…๐Ÿ’ก๐Ÿ’ฉ๐Ÿ‘๐Ÿ“ธ๐Ÿ‘ป๐Ÿค๐Ÿคฎ๐ŸŽผ๐Ÿฅต\
๐Ÿšฉ๐ŸŽ๐ŸŠ๐Ÿ‘ผ๐Ÿ’๐Ÿ“ฃ๐Ÿฅ‚"

And then we compute the reverse mapping:

CONSTANT: emoji>base256 $[ base256>emoji H{ } zip-index-as ]

With those two data blocks, we can define words to convert into and out of base256emoji:

: >base256emoji ( bytes -- str )
    [ base256>emoji nth ] "" map-as ;

: base256emoji> ( str -- bytes )
    [ emoji>base256 at ] B{ } map-as ;

You can try it out:

IN: scratchpad "Hello, Factor!" >base256emoji .
"๐Ÿ˜„โœ‹๐Ÿ€๐Ÿ€๐Ÿ˜“๐Ÿ’ช๐Ÿ˜…๐Ÿ’“๐Ÿคค๐Ÿ’ƒ๐Ÿ˜ˆ๐Ÿ˜“๐Ÿฅบ๐Ÿ‘"

IN: scratchpad "๐Ÿ˜„โœ‹๐Ÿ€๐Ÿ€๐Ÿ˜“๐Ÿ’ช๐Ÿ˜…๐Ÿ’“๐Ÿคค๐Ÿ’ƒ๐Ÿ˜ˆ๐Ÿ˜“๐Ÿฅบ๐Ÿ‘" base256emoji> "" like .
"Hello, Factor!"

Use Cases

The most interesting use case for base256emoji seems to be the ERC-7673: Distinguishable base256emoji Addresses proposal for Ethereum. This proposal seeks to “address spoofing attacks [that] have mislead tens of thousands of ether, and countless other tokens” by using visual emoji-based strings – a similar justification to the Drunken Bishop algorithm.

We can try base256emoji out with checksums to yield a similar benefit, displaying 16 emojis instead of a 32-byte hex-string:

IN: scratchpad "resource:license.txt" md5 checksum-bytes
               bytes>hex-string .
"ebb5ab617e3a88ed43f7d247c6466d95"

IN: scratchpad "resource:license.txt" md5 checksum-bytes
               >base256emoji .
"๐Ÿ’Œ๐Ÿ’จ๐Ÿคจ๐Ÿคค๐Ÿ˜ โœจ๐Ÿ˜น๐Ÿฅ€๐Ÿ˜๐ŸŽผ๐Ÿค ๐Ÿคฉโฌ‡๐Ÿ’“๐ŸŒท๐Ÿค™"

Visual Collisions

Given a desire to use visually dissimilar emojis in identities, it would be useful to think about the chosen emoji set and how that might be interpreted in a visual context. Some criticism of this particular group of emojis, which are sometimes magnified by smaller font sizes, might focus on the visual similarity of:

  1. Similar “globe” emojis (๐ŸŒ ๐ŸŒ ๐ŸŒŽ)
  2. Similar “face” emojis (๐Ÿ™‚ ๐Ÿ˜ ๐Ÿ˜‘ ๐Ÿ™)
  3. Similar “kiss” emojis (๐Ÿ˜™ ๐Ÿ˜š ๐Ÿ˜— ๐Ÿ˜˜)
  4. Similar “star” emojis (โญ ๐ŸŒŸ)
  5. Similar “grin” emojis (๐Ÿ˜€ ๐Ÿ˜ƒ ๐Ÿ˜„ ๐Ÿ˜ ๐Ÿ˜† ๐Ÿ˜…)
  6. Similar “heart” emojis (๐Ÿ’” ๐Ÿ’— ๐Ÿ’• ๐Ÿ’– ๐Ÿ’˜ ๐Ÿ’™ ๐Ÿ’œ ๐Ÿ’š ๐Ÿ’› ๐Ÿ–ค)
  7. Similar “hand” emojis (๐Ÿคž ๐Ÿ‘‹ โœ‹ ๐Ÿ‘Š โœŠ ๐Ÿค ๐Ÿคฒ)

It might be desirable to choose emojis not based on community membership or common usage, but on their most dissimilar visual identity. to make it even harder for scammers to deliberately pick lookalike emojis and rely on small text sizes, platform font differences, or user inattention.

A couple of other emoji sets can be found, for example @Equim-chan/base256 (which uses one of my favorite emojis: ๐Ÿ‘พ), npm/base-emoji, or even the KittenMoji crate.

Fun!

This is available on my GitHub.

Sat, 22 Mar 2025 15:00:00

John Benediktsson: RDAP

ICANN recently posted an update on Launching RDAP; Sunsetting WHOIS which got some discussion on Hacker News. Andy Newton, one of the creators of RDAP, has published A Guide to the Registration Data Access Protocol (RDAP), which is a pretty useful resource for understanding how it works. More information comes from the main RDAP website which describes it as:

The Registration Data Access Protocol (RDAP) is the successor to WHOIS. Like WHOIS, RDAP provides access to information about Internet resources (domain names, autonomous system numbers, and IP addresses). Unlike WHOIS, RDAP provides:

  • A machine-readable representation of registration data;
  • Differentiated access;
  • Structured request and response semantics;
  • Internationalisation;
  • Extensibility.

The WHOIS protocol that it replaces is super simple, being described by RFC 3912 in a few paragraphs. And, in fact, you can test it out on the command-line of most computers:

$ echo -e "factorcode.org\r\n" | nc -i 1 whois.cloudflare.com 43
Domain Name: FACTORCODE.ORG
Registry Domain ID: c49c93dee3304f39b081383262d320c6-LROR
Registrar WHOIS Server: whois.cloudflare.com
Registrar URL: https://www.cloudflare.com
Updated Date: 2025-01-15T22:46:54Z
Creation Date: 2005-12-01T04:54:37Z
Registrar Registration Expiration Date: 2025-12-01T04:54:37Z
Registrar: Cloudflare, Inc.
Registrar IANA ID: 1910
Domain Status: clienttransferprohibited https://icann.org/epp#clienttransferprohibited
...

The RDAP protocol must be equally simple, right? Well, not so fast. Instead of a few paragraphs, and simple queries over sockets, you get many RFCs describing it:

These, along with things like the RDAP Extension registry, and the protocol reliance on HTTP/HTTPS, JSON, and JSONPath considerably increase the complexity of RDAP implementations.

Below we are going to start an implementation using Factor.

Bootstrapping

The first concept we have to implement is RDAP Bootstrapping which uses 5 IANA files to redirect searches to the correct upstream RDAP servers.

Type Link
Forward DNS https://data.iana.org/rdap/dns.json
IPv4 Addresses https://data.iana.org/rdap/ipv4.json
IPv6 Addresses https://data.iana.org/rdap/ipv6.json
Autonomous System Numbers https://data.iana.org/rdap/asn.json
Object Tags https://data.iana.org/rdap/object-tags.json

We can abstract these by making a word to convert a type to a URL:

: bootstrap-url ( type -- url )
    "https://data.iana.org/rdap/" ".json" surround ;

We don’t want to retrieve these files all the time, so let’s cache them for 30 days:

INITIALIZED-SYMBOL: bootstrap-cache [ 30 days ]

: bootstrap-get ( type -- data )
    bootstrap-url cache-directory bootstrap-cache get
    download-outdated-into path>json ;

And provide a way to force delete the cached bootstrap files:

CONSTANT: bootstrap-files { "asn" "dns" "ipv4" "ipv6" "object-tags" }

: reset-bootstrap ( -- )
    [ bootstrap-files [ ".json" append ?delete-file ] each ] with-cache-directory ;

Each bootstrap file is described in RFC 9224, but basically we want to extract and manipulate the "services" block, modifying the keys of the assoc for convenient searching:

: parse-services ( data quot: ( key -- key' ) -- services )
    [ "services" of ] dip '[ [ _ map ] dip ] assoc-map ; inline

: search-services ( services quot: ( key -- ? ) -- urls )
    '[ drop _ any? ] assoc-find drop nip ; inline

We can then provide bootstrap data structures that are used for searching. For example, we find the longest subdomain that has an entry in the dns bootstrap list to handle both SLD and TLD:

: dns-bootstrap ( -- services )
    "dns" bootstrap-get "services" of ;

: split-domain ( domain -- domains )
    "." split dup length <iota> [ tail "." join ] with map ;

: domain-endpoints ( domain -- urls )
    split-domain dns-bootstrap [ swap member? ] with search-services ;

You can see that different domain names are directed to different RDAP endpoints:

IN: scratchpad "factorcode.org" domain-endpoints .
{ "https://rdap.publicinterestregistry.org/rdap/" }

IN: scratchpad "google.com" domain-endpoints .
{ "https://rdap.verisign.com/com/v1/" }

Or, find the correct endpoint for a given IPV4 address from the ipv4 bootstrap list:

: ipv4-bootstrap ( -- services )
    "ipv4" bootstrap-get [ >ipv4-network ] parse-services ;

: ipv4-endpoints ( ipv4 -- urls )
    ipv4-aton ipv4-bootstrap [ ipv4-contains? ] with search-services ;

Lookup

The RDAP data is typically available from HTTP or HTTPS web servers, as JSON files, but it uses a custom mime-type application/rdap+json. We can write a simple word to make the request and convert the response:

: accept-rdap ( request -- request )
    "application/rdap+json" "Accept" set-header ;

: rdap-get ( url -- response rdap )
    <get-request> accept-rdap http-request
    dup string? [ utf8 decode ] unless json> ;

And now we can build a word to lookup a domain:

: lookup-domain ( domain -- results )
    [ domain-endpoints random ]
    [ "domain/%s" sprintf derive-url rdap-get nip ] bi ;

Or to lookup an IPV4 address:

: lookup-ipv4 ( ipv4 -- results )
    [ ipv4-endpoints random ]
    [ "ip/%s" sprintf derive-url rdap-get nip ] bi ;

And we can try it out, getting an extensive response:

IN: scratchpad "factorcode.org" lookup-domain

--- Data stack:
LH{ { "rdapConformance" ~array~ } { "notices" ~array~ } { ...

Output

It would be nice to print the output in a more human-readable format. For now, we will just print these as a nested tree of keys and values:

GENERIC: print-rdap-nested ( padding key value -- )

M: linked-assoc print-rdap-nested
    [ over write write ":" print "  " append ] dip
    [ swapd print-rdap-nested ] with assoc-each ;

M: array print-rdap-nested
    [ print-rdap-nested ] 2with each ;

M: object print-rdap-nested
    present [ 2drop ] [ [ ": " [ write ] tri@ ] dip print ] if-empty ;

: print-rdap ( results -- )
    [ "" -rot print-rdap-nested ] assoc-each ;

This could probably be improved a fair bit – for example, the keys could be made more readable, and it doesn’t handle vCard entries very well.

Try it out!

You can try this out, by lookup up a domain name:

IN: scratchpad "factorcode.org" lookup-domain print-rdap
rdapConformance: rdap_level_0
rdapConformance: icann_rdap_response_profile_0
rdapConformance: icann_rdap_technical_implementation_guide_0
ldhName: factorcode.org
unicodeName: factorcode.org
nameservers:
  ldhName: carl.ns.cloudflare.com
  unicodeName: carl.ns.cloudflare.com
  objectClassName: nameserver
  handle: c34bedeccd8e4514b917e9e82a052077-LROR
  status: associated
nameservers:
  ldhName: kay.ns.cloudflare.com
  unicodeName: kay.ns.cloudflare.com
  objectClassName: nameserver
  handle: 7fc12bf413944de088f27f837349a8da-LROR
  status: associated
...

Or, by lookup up an IP address:

IN: scratchpad "1.1.1.1" lookup-ipv4 print-rdap
rdapConformance: history_version_0
rdapConformance: nro_rdap_profile_0
rdapConformance: cidr0
rdapConformance: rdap_level_0
events:
  eventAction: registration
  eventDate: 2011-08-10T23:12:35Z
events:
  eventAction: last changed
  eventDate: 2023-04-26T22:57:58Z
name: APNIC-LABS
status: active
type: ASSIGNED PORTABLE
endAddress: 1.1.1.255
ipVersion: v4
startAddress: 1.1.1.0
objectClassName: ip network
handle: 1.1.1.0 - 1.1.1.255
...

Pretty cool!

This was recently committed to the development version of Factor.

Tue, 18 Mar 2025 15:00:00

John Benediktsson: Two Sum

A few days ago, Ryan Petermen wrote a tweet about switching to Python from C++ in a solution for the Two Sum problem from LeetCode:

Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target.

You may assume that each input would have exactly one solution, and you may not use the same element twice.

You can return the answer in any order.

Spoilers below!

They offered this iterative C++ solution:

std::vector<int> twoSum(std::vector<int>& nums, int target) {
    std::unordered_map<int, int> hashMap;
    std::vector<int> result;

    for (int i = 0; i < nums.size(); ++i) {
        int complement = target - nums[i];
        if (hashMap.find(complement) != hashMap.end()) {
            result.push_back(hashMap[complement]);
            result.push_back(i);
            return result;
        }
        hashMap[nums[i]] = i;
    }
    return result;
}

Followed by this improved Python solution:

def two_sum(nums, target):
    hashmap = {}

    for i, num in enumerate(nums):
        complement = target - num
        if complement in hashmap:
            return [hashmap[complement], i]
        hashmap[num] = i

Of course, I was curious what a Factor solution would look like.

Direct translation

We can start by making a mostly direct translation using local variables:

:: two-sum ( nums target -- i j )
    H{ } clone :> hashmap
    nums <enumerated> [
        first2 :> ( i num )
        target num - :> complement
        complement hashmap at [
            i num hashmap set-at f
        ] unless*
    ] map-find ?first ;

And a few test cases to show that it works:

{ 0 1 } [ { 2 7 11 15 } 9 two-sum ] unit-test
{ 1 2 } [ { 3 2 4 } 6 two-sum ] unit-test
{ 0 1 } [ { 3 3 } 6 two-sum ] unit-test

Using map-find-index

Sometimes a higher-level word exists that can simplify logic, for example map-find-index:

:: two-sum ( nums target -- i j )
    H{ } clone :> hashmap
    nums [| num i |
        target num - :> complement
        complement hashmap at [
            i num hashmap set-at f
        ] unless*
    ] map-find-index drop ;

Implicit arguments

In the spirit of concatenative thinking, we could reduce the amount of named variables we have slightly by not naming the complement internal variable:

:: two-sum ( nums target -- i j )
    H{ } clone :> hashmap
    nums [| num i |
        target num - hashmap at [
            i num hashmap set-at f
        ] unless*
    ] map-find-index drop ;

Fried quotations

And that works, but perhaps we could try some alternative syntax features like fried quotations:

: two-sum ( nums target -- i j )
    H{ } clone dup '[
        swap _ over - _ at [ 2nip ] [ _ set-at f ] if*
    ] map-find-index drop ;

Is that better? Perhaps.

How else might you solve this problem?

Can you do it in one-line?

Sat, 22 Feb 2025 15:00:00

John Benediktsson: Random Derangement

I had a lot of fun writing code to compute derangements recently. I thought I was done with that topic until I bumped into a question on StackOverflow asking how to generate a random derangement of a list. Being nerd sniped is a real thing, and so I started looking at solutions.

There’s a paper called “An analysis of a simple algorithm for random derangements” that has an, ahem, simple algorithm. The basic idea is to generate a random permutation of indices, breaking early if the random permutation is obviously not a derangement.

One way to take a random permutation would be to use our permutations virtual sequence:

IN: scratchpad "ABCDEF" <permutations> random .
"FCEBDA" ! is a derangement

IN: scratchpad "ABCDEF" <permutations> random .
"DFBCEA" ! is NOT a derangement

And so you could loop until a derangement of indices is found:

: random-derangement-indices ( n -- seq )
    f swap <iota> <permutations>
    '[ drop _ random dup derangement? not ] loop ;

But, since only 36% or so of permutations are derangements, perhaps it would be faster and better to implement the algorithm from that paper – making our own random permutation of indices and breaking early if obviously not a derangement:

:: random-derangement-indices ( n -- indices )
    n <iota> >array :> seq
    f [
        dup :> v
        n 1 (a..b] [| j |
            j 1 + random :> p
            p v nth j = [ t ] [ j p v exchange f ] if
        ] any? v first zero? or
    ] [ drop seq clone ] do while ;

We can use that to build a random-derangement word:

: random-derangement ( seq -- seq' )
    [ length random-derangement-indices ] [ nths ] bi ;

And then, for example, get a random derangement of the alphabet – of which there are one hundred and forty-eight septillion derangements, give or take – in under a millisecond:

IN: scratchpad "ABCDEFGHIJKLMNOPQRSTUVWXYZ" random-derangement .
"CZFABMSUXRQDEHGYJLTPVOIKWN"

We could check to make sure that we generate all derangments with equal possibility using a simple test case:

IN: scratchpad 1,000,000 [
                   "ABCD" random-derangement
               ] replicate histogram sort-keys .
{
    { "BADC" 111639 }
    { "BCDA" 110734 }
    { "BDAC" 110682 }
    { "CADB" 111123 }
    { "CDAB" 111447 }
    { "CDBA" 111147 }
    { "DABC" 111215 }
    { "DCAB" 111114 }
    { "DCBA" 110899 }
}

Looks good to me!

Mon, 16 Dec 2024 15:00:00

John Benediktsson: Derangements

Derangements, also sometimes known as deranged permutations, are described as:

In combinatorial mathematics, a derangement is a permutation of the elements of a set in which no element appears in its original position. In other words, a derangement is a permutation that has no fixed points.

There is a fun online derangements generator tool that you can use to play with computing the derangements of a sequence as well as calculating the number of derangements for a given sequence size.

As an example, we can use the math.combinatorics vocabulary, to generate all the permutations of the sequence { 0 1 2 }:

IN: scratchpad { 0 1 2 } all-permutations .
{
    { 0 1 2 }
    { 0 2 1 }
    { 1 0 2 }
    { 1 2 0 }
    { 2 0 1 }
    { 2 1 0 }
}

Since a derangement is a permutation that requires each element to be in a different slot, we could write a word to check the permuted indices to see if that is true:

: derangement? ( indices -- ? )
    dup length <iota> [ = ] 2any? not ;

These would be the two derangements of the indices { 0 1 2 }:

IN: scratchpad { 0 1 2 } all-permutations [ derangement? ] filter .
{
    { 1 2 0 }
    { 2 0 1 }
}

The number of derangements is the subfactorial of the length of the sequence:

: subfactorial ( n -- ? )
    [ 1 ] [ factorial 1 + e /i ] if-zero ;

We can build a <derangement-iota> that is a sequence as long as that number:

: <derangement-iota> ( seq -- <iota> )
    length subfactorial <iota> ; inline

And we can build a next-derangement word that calculates the next permutation that is a derangement:

: next-derangement ( seq -- seq )
    [ dup derangement? ] [ next-permutation ] do until ;

We can then build upon some of the code for iterating permutations, designing an internal derangements-quot word that is similar in form to the existing permutations-quot word:

: derangements-quot ( seq quot -- seq quot' )
    [ [ <derangement-iota> ] [ length <iota> >array ] [ ] tri ] dip
    '[ drop _ next-derangement _ nths-unsafe @ ] ; inline

And then use it to build a series of words that can provide iteration across derangements:

: each-derangement ( ... seq quot: ( ... elt -- ... ) -- ... )
    derangements-quot each ; inline

: map-derangements ( ... seq quot: ( ... elt -- ... newelt ) -- ... newseq )
    derangements-quot map ; inline

: filter-derangements ( ... seq quot: ( ... elt -- ... ? ) -- ... newseq )
    selector [ each-derangement ] dip ; inline

: all-derangements ( seq -- seq' )
    [ ] map-derangements ;

: all-derangements? ( ... seq quot: ( ... elt -- ... ? ) -- ... ? )
    derangements-quot all? ; inline

: find-derangement ( ... seq quot: ( ... elt -- ... ? ) -- ... elt/f )
    '[ _ keep and ] derangements-quot map-find drop ; inline

: reduce-derangements ( ... seq identity quot: ( ... prev elt -- ... next ) -- ... result )
    swapd each-derangement ; inline

And, now we can use this to find the nine derangements for "ABCD":

IN: scratchpad "ABCD" all-derangements .
{
    "BADC"
    "BCDA"
    "BDAC"
    "CADB"
    "CDAB"
    "CDBA"
    "DABC"
    "DCAB"
    "DCBA"
}

This is available on my GitHub.

Sun, 8 Dec 2024 15:00:00

John Benediktsson: Zen of Factor

Years ago, I remember reading a blog post called “Why is Groovy is big?” by Slava Pestov, the original author of Factor. In it, he talks about lines of code and sets the stage for how concatenative thinking can lead to properties like conciseness and readability, ending with this:

I tend to think the majority of code people write is overly complicated, full of redundancy, and designed for such flexibility that in practice is not needed at all. I hope one day this trend reverses.

I’ve been thinking a lot recently about 20 years of Factor and I thought it would be fun to write a Zen of Factor. Perhaps inspired by the Zen of Programming book written in 1987 by Geoffrey James, there are a couple of examples I wanted to point to first.

Python

The Python programming language was one of the first languages that I am aware of to get a Zen, contributed at least as far back as February 2002 in a kind of easter egg fashion encrypted by ROT13:

$ python -c "import this" 
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Zig

Quite a bit more recently, the Zig programming language introduced a Zen through a series of commits starting in August 2017. You can see the current version by running zig zen:

$ zig zen

 * Communicate intent precisely.
 * Edge cases matter.
 * Favor reading code over writing code.
 * Only one obvious way to do things.
 * Runtime crashes are better than bugs.
 * Compile errors are better than runtime crashes.
 * Incremental improvements.
 * Avoid local maximums.
 * Reduce the amount one must remember.
 * Focus on code rather than style.
 * Resource allocation may fail; resource deallocation must succeed.
 * Memory is a resource.
 * Together we serve the users.

Factor

In that spirit, I thought it would be fun to put together some ideas for what a Zen of Factor might look like, incorporating aspects of the language that I enjoy, some thematic elements that might make you more successful learning and developing in it, as well as pointing out the strong community that we have and hope to grow.

Here is the current draft:

The Zen of Factor

The REPL is your playground.
Working code beats perfect theory.
Words are better than paragraphs.
Stack effects tell the story.
Any syntax you need, you can create.
Simple primitives yield powerful combinations.
First make it work, then make it beautiful.
Make it beautiful, then make it fast.
Quick hacks grow into robust solutions.
When in doubt, factor it out.
Every word should do one thing well.
Let the stack guide your logic.
Write less, compose more.
If it works, ship it.
If it doesn't work, fix it.
If you don't like it, change it.
Today's beginner is tomorrow's core developer.
Questions encouraged, PRs welcome.

I really appreciate hearing about programs and problems that developers work on, getting feedback that allows us to iteratively improve, and knowing that every commit leads us towards a better future.

Thank you!

Thu, 5 Dec 2024 15:00:00

Blogroll


planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp.

Syndicate