In Python, the chemparse project is available as a “lightweight package for parsing chemical formula strings into python dictionaries” mapping chemical elements to numeric counts.
It supports parsing several formula variants, such as:
"H2O"
"C1.5O3"
"(CH3)2"
"((CH3)2)3"
"K4[Fe(SCN)6]"
I thought it would be fun to build similar functionality using Factor.
We are going to be using the EBNF syntax support to more simply write a parsing expression grammar. As is often the most useful way to implement things, we break it down into steps. We can parse a symbol as one or two letters, a number as an integer or float, and then a pair which is a symbol with an optional number prefix and postfix.
EBNF: split-formula [=[
symbol = [A-Z] [a-z]? => [[ sift >string ]]
number = [0-9]+ { "." [0-9]+ }? { { "e" | "E" } { "+" | "-" }? [0-9]+ }?
=> [[ first3 [ concat ] bi@ "" 3append-as string>number ]]
pair = number? { symbol | "("~ pair+ ")"~ | "["~ pair+ "]"~ } number?
=> [[ first3 swapd [ 1 or ] bi@ * 2array ]]
pairs = pair+
]=]
We can test that this works:
IN: scratchpad "H2O" split-formula .
V{ { "H" 2 } { "O" 1 } }
IN: scratchpad "(CH3)2" split-formula .
V{ { V{ { "C" 1 } { "H" 3 } } 2 } }
But we need to recursively flatten these into an assoc, mapping element to count.
: flatten-formula ( elt n assoc -- )
[ [ first2 ] [ * ] bi* ] dip pick string?
[ swapd at+ ] [ '[ _ _ flatten-formula ] each ] if ;
And combine those two steps to parse a formula:
: parse-formula ( str -- seq )
split-formula H{ } clone [
'[ 1 _ flatten-formula ] each
] keep ;
We can now test that this works with a few unit tests that show each of the features we hoped to support:
{ H{ { "H" 2 } { "O" 1 } } } [ "H2O" parse-formula ] unit-test
{ H{ { "C" 1.5 } { "O" 3 } } } [ "C1.5O3" parse-formula ] unit-test
{ H{ { "C" 2 } { "H" 6 } } } [ "(CH3)2" parse-formula ] unit-test
{ H{ { "C" 6 } { "H" 18 } } } [ "((CH3)2)3" parse-formula ] unit-test
{ H{ { "K" 4 } { "Fe" 1 } { "S" 6 } { "C" 6 } { "N" 6 } } }
[ "K4[Fe(SCN)6]" parse-formula ] unit-test
This is available in my GitHub.
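As a cross-check on the parse-then-flatten idea, here is a minimal Python sketch of the same approach. It is not how chemparse works internally; the names `TOKEN`, `to_number`, and `parse_formula` are made up for illustration. Unlike the grammar above, it only handles a count after a symbol or group (no prefix multipliers or exponents), does not check that bracket kinds match, and silently skips characters it does not recognize.

```python
import re

# Hypothetical tokenizer: element symbol, number, opening bracket, closing bracket.
TOKEN = re.compile(r"([A-Z][a-z]?)|(\d+(?:\.\d+)?)|([(\[])|([)\]])")


def to_number(text):
    """Keep integers as int and decimals as float, like the Factor output."""
    return float(text) if "." in text else int(text)


def parse_formula(formula):
    """Return a dict mapping element symbols to counts."""
    tokens = list(TOKEN.finditer(formula))
    stack = [{}]  # one dict of counts per open group; stack[0] is the result
    i = 0
    while i < len(tokens):
        symbol, number, opener, closer = tokens[i].groups()
        i += 1
        if opener:
            stack.append({})
            continue
        if number:
            raise ValueError("prefix counts are not supported in this sketch")
        if closer and len(stack) < 2:
            raise ValueError("unbalanced closing bracket")
        # An optional count may follow a symbol or a closing bracket.
        count = 1
        if i < len(tokens) and tokens[i].group(2):
            count = to_number(tokens[i].group(2))
            i += 1
        group = {symbol: 1} if symbol else stack.pop()
        # Flatten: merge the group into its parent, scaled by the count.
        for element, n in group.items():
            stack[-1][element] = stack[-1].get(element, 0) + n * count
    if len(stack) != 1:
        raise ValueError("unbalanced opening bracket")
    return stack[0]


print(parse_formula("K4[Fe(SCN)6]"))
```

The stack of dictionaries plays the role of the recursion in flatten-formula: each closing bracket multiplies the finished group by its count and folds it into the enclosing one.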
William Woodruff recently noticed that Python’s splitlines does a lot more than just newlines:
I always assumed that Python’s str.splitlines() split strings by “universal newlines”, i.e., \n, \r, and \r\n.
But it turns out it does a lot more than that.
The recent Factor 0.100 release included a change to make the split-lines word split on Unicode line breaks, which matches the Python behavior.
IN: scratchpad "line1\nline2\rline3\r\nline4\vline5\x1dhello"
split-lines .
{ "line1" "line2" "line3" "line4" "line5" "hello" }
These are considered line breaks:
| Character | Description |
|---|---|
| \n | Line Feed |
| \r | Carriage Return |
| \r\n | Carriage Return + Line Feed |
| \v | Line Tabulation |
| \f | Form Feed |
| \x1c | File Separator |
| \x1d | Group Separator |
| \x1e | Record Separator |
| \x85 | Next Line (C1 Control Code) |
| \u002028 | Line Separator |
| \u002029 | Paragraph Separator |
This might be surprising – or just what you needed!
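For comparison, the Python behavior that split-lines now matches can be checked directly. This small sketch uses only `str.splitlines()` from the standard library; the `separators` list is simply the table above written with Python escapes.

```python
# The same input as the Factor example, split with Python's str.splitlines().
text = "line1\nline2\rline3\r\nline4\vline5\x1dhello"
lines = text.splitlines()
print(lines)

# Every separator from the table splits "a<sep>b" into two lines.
separators = [
    "\n", "\r", "\r\n", "\v", "\f",
    "\x1c", "\x1d", "\x1e", "\x85", "\u2028", "\u2029",
]
for sep in separators:
    assert ("a" + sep + "b").splitlines() == ["a", "b"]
```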
I wrote about the Data Formats support that comes included in Factor. As I mentioned in that post, there are many more that we could implement. One of those is Extensible Data Notation – also known as EDN – and comes from the Clojure community.
We can see a nice example of the EDN format in Learn EDN in Y minutes:
; Comments start with a semicolon.
; Anything after the semicolon is ignored.
;;;;;;;;;;;;;;;;;;;
;;; Basic Types ;;;
;;;;;;;;;;;;;;;;;;;
nil ; also known in other languages as null
; Booleans
true
false
; Strings are enclosed in double quotes
"hungarian breakfast"
"farmer's cheesy omelette"
; Characters are preceded by backslashes
\g \r \a \c \e
; Keywords start with a colon. They behave like enums. Kind of
; like symbols in Ruby.
:eggs
:cheese
:olives
; Symbols are used to represent identifiers.
; You can namespace symbols by using /. Whatever precedes / is
; the namespace of the symbol.
spoon
kitchen/spoon ; not the same as spoon
kitchen/fork
github/fork ; you can't eat with this
; Integers and floats
42
3.14159
; Lists are sequences of values
(:bun :beef-patty 9 "yum!")
; Vectors allow random access
[:gelato 1 2 -2]
; Maps are associative data structures that associate the key with its value
{:eggs 2
:lemon-juice 3.5
:butter 1}
; You're not restricted to using keywords as keys
{[1 2 3 4] "tell the people what she wore",
[5 6 7 8] "the more you see the more you hate"}
; You may use commas for readability. They are treated as whitespace.
; Sets are collections that contain unique elements.
#{:a :b 88 "huat"}
;;;;;;;;;;;;;;;;;;;;;;;
;;; Tagged Elements ;;;
;;;;;;;;;;;;;;;;;;;;;;;
; EDN can be extended by tagging elements with # symbols.
#MyYelpClone/MenuItem {:name "eggs-benedict" :rating 10}
Recently, I implemented support for EDN. The first version used a Parsing Expression Grammar to do the parsing; I then added support for encoding Factor objects into EDN, and finally switched to a faster stream-based parsing approach.
This now allows us to parse that example above into:
{
null
t
f
"hungarian breakfast"
"farmer's cheesy omelette"
103
114
97
99
101
T{ keyword { name "eggs" } }
T{ keyword { name "cheese" } }
T{ keyword { name "olives" } }
T{ symbol { name "spoon" } }
T{ symbol { name "kitchen/spoon" } }
T{ symbol { name "kitchen/fork" } }
T{ symbol { name "github/fork" } }
42
3.14159
{
T{ keyword { name "bun" } }
T{ keyword { name "beef-patty" } }
9
"yum!"
}
V{
T{ keyword { name "gelato" } }
1
2
-2
}
LH{
{ T{ keyword { name "eggs" } } 2 }
{ T{ keyword { name "lemon-juice" } } 3.5 }
{ T{ keyword { name "butter" } } 1 }
}
LH{
{ V{ 1 2 3 4 } "tell the people what she wore" }
{ V{ 5 6 7 8 } "the more you see the more you hate" }
}
HS{
88
T{ keyword { name "a" } }
T{ keyword { name "b" } }
"huat"
}
T{ tagged
{ name "MyYelpClone/MenuItem" }
{ value
LH{
{ T{ keyword { name "name" } } "eggs-benedict" }
{ T{ keyword { name "rating" } } 10 }
}
}
}
}
The edn vocabulary is now included in the Factor standard library.
You can see some information about the various words currently available:
IN: scratchpad "edn" help
Extensible Data Notation (EDN)
The edn vocabulary supports reading and writing from the Extensible Data
Notation (EDN) format.
Reading from EDN:
read-edns ( -- objects )
read-edn ( -- object )
edn> ( string -- objects )
Writing into EDN:
write-edns ( objects -- )
write-edn ( object -- )
>edn ( object -- string )
Basic support is included for encoding Factor objects:
IN: scratchpad TUPLE: foo a b c ;
IN: scratchpad 1 2 3 foo boa write-edn
#scratchpad/foo {:a 1, :b 2, :c 3}
But we don’t automatically parse these tagged objects back into a Factor object at the moment.
Check it out!
Pseudo Encrypt is a function drawn from the PostgreSQL project.
pseudo_encrypt(int) can be used as a pseudo-random generator of unique values. It produces an integer output that is uniquely associated to its integer input (by a mathematical permutation), but looks random at the same time, with zero collision. This is useful to communicate numbers generated sequentially without revealing their ordinal position in the sequence (for ticket numbers, URLs shorteners, promo codes…)
Its implementation is defined as:
CREATE OR REPLACE FUNCTION pseudo_encrypt(value int) returns int AS $$
DECLARE
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
BEGIN
l1:= (value >> 16) & 65535;
r1:= value & 65535;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
l1 := l2;
r1 := r2;
i := i + 1;
END LOOP;
return ((r1 << 16) + l1);
END;
$$ LANGUAGE plpgsql strict immutable;
Let’s implement this in Factor using some of the words from the math.bitwise vocabulary, working with the intermediate results as 32-bit signed integers:
: pseudo-encrypt ( x -- y )
[ -16 shift ] keep [ 16 bits ] bi@ 3 [
[
1366 * 150889 + 714025 rem 714025.0 / 32767 *
round >integer bitxor 32 >signed
] keep swap
] times 16 shift + 32 >signed ;
We can compare our results for [-10..10], which are helpfully provided on the original linked page:
IN: scratchpad -10 ..= 10 [ dup pseudo-encrypt "%3d %12d\n" printf ] each
-10 -1270576520
-9 -236348969
-8 -1184061109
-7 -25446276
-6 -1507538963
-5 -518858927
-4 -1458116927
-3 -532482573
-2 -157973154
-1 -1105881908
0 1777613459
1 561465857
2 436885871
3 576481439
4 483424269
5 1905133426
6 971249312
7 1926833684
8 735327624
9 1731020007
10 792482838
Great – it matches!
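The same three-round structure can also be sketched in Python as an independent check. This is a direct transliteration of the SQL above, not an official port; it assumes the input fits in a signed 32-bit integer, and it relies on the rounded expression never landing exactly on a half, so Python's `round()` agrees with the `::int` cast.

```python
def pseudo_encrypt(value):
    """Three-round Feistel network over two 16-bit halves."""
    left = (value >> 16) & 0xFFFF
    right = value & 0xFFFF
    for _ in range(3):
        # Round function: a small linear congruential step scaled to 15 bits.
        mixed = round(((1366 * right + 150889) % 714025) / 714025.0 * 32767)
        left, right = right, left ^ mixed
    result = ((right << 16) + left) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed integer.
    return result - (1 << 32) if result >= (1 << 31) else result


for x in (-1, 0, 1):
    print(f"{x:3d} {pseudo_encrypt(x):12d}")
```

Because each round only XORs one half with a function of the other, the mapping is a permutation of the 32-bit integers, which is where the "zero collision" property comes from.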
Morwenn posted a blog about implementing a std::flip operation in C++:
This is basically walking up the tree from the child node as if it were a linked list. The reverse operation either implies walking through two children nodes, or simply flipping the order of parameters, which is where std::flip intervenes:
auto is_descendant_of = std::flip(is_ancestor_of);
// This property should always hold
assert(is_descendant_of(node1, node2) == is_ancestor_of(node2, node1));
Spoiler: the std::flip operator is not part of the C++ standard library, although an implementation is provided at the end of the blog post in around 90 lines of code.
Still, I thought it would be fun to implement in Factor.
As it turns out, we already have a flip word that operates on a sequence of sequences, essentially returning the transpose of a matrix. One could argue that transpose might be a better name for that operation. In any event, let’s focus on implementing the std::flip operation.
How would we reverse the arguments to a word?
a b can become b a by calling swap.
a b c can become c b a by calling swap rot.
a b c d can become d c b a by calling swap rot roll.
We can generalize this into a macro by repeatedly calling -nrot:
MACRO: nreverse ( n -- quot )
0 [a..b) [ '[ _ -nrot ] ] map [ ] concat-as ;
And then show that it works:
IN: scratchpad { } [ 0 nreverse ] with-datastack .
{ }
IN: scratchpad { "a" } [ 1 nreverse ] with-datastack .
{ "a" }
IN: scratchpad { "a" "b" } [ 2 nreverse ] with-datastack .
{ "b" "a" }
IN: scratchpad { "a" "b" "c" } [ 3 nreverse ] with-datastack .
{ "c" "b" "a" }
IN: scratchpad { "a" "b" "c" "d" } [ 4 nreverse ] with-datastack .
{ "d" "c" "b" "a" }
IN: scratchpad { "a" "b" "c" "d" "e" } [ 5 nreverse ] with-datastack .
{ "e" "d" "c" "b" "a" }
Note: this has been added to the shuffle vocabulary.
Using this, we can build some syntax that takes the next token and searches for a matching word with that name, and then calls it after reversing the inputs:
SYNTAX: flip:
scan-word [ stack-effect in>> length ] keep
'[ _ nreverse _ execute ] append! ;
As an example, we will use the 4array word that returns an array consisting of four arguments from the stack.
IN: scratchpad 10 20 30 40 4array .
{ 10 20 30 40 }
IN: scratchpad 10 20 30 40 flip: 4array .
{ 40 30 20 10 }
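For readers more familiar with applicative languages, the same idea is a small higher-order function. The Python sketch below is only an analogy: `flip` and `four_array` are hypothetical names, with `four_array` standing in for 4array, and only positional arguments are reversed.

```python
from functools import wraps


def flip(func):
    """Return a wrapper that calls func with its positional arguments reversed."""
    @wraps(func)
    def flipped(*args):
        return func(*reversed(args))
    return flipped


def four_array(a, b, c, d):
    # Hypothetical stand-in for Factor's 4array.
    return [a, b, c, d]


print(four_array(10, 20, 30, 40))
print(flip(four_array)(10, 20, 30, 40))
```

In Python the argument count comes for free from `*args`, whereas the Factor version has to read it from the word's stack effect at parse time.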
We could have different syntax for flipping arbitrary code: first parse a quotation, then infer its stack effect, and then inline a version that reverses its arguments.
SYNTAX: flip[
parse-quotation [ infer in>> length ] keep
'[ _ nreverse @ ] suffix! ;
We can try that out with a simple block of code:
IN: scratchpad 1 2 3 flip[ [ 10 * ] tri@ ] call 3array .
{ 30 20 10 }
And only a few lines of code in total.
Pretty cool!
Seth Larson wrote about a Scream Cipher:
You’ve probably heard of stream ciphers, but what about a scream cipher 😱? Today I learned there are more “Latin capital letter A” Unicode characters than there are letters in the English alphabet. You know what that means, it’s time to scream:
We can use bidirectional assocs to keep a single cipher data structure that efficiently maps into and out of the cipher:
CONSTANT: cipher $[
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"AÁĂẮẶẰẲẴǍÂẤẬẦẨẪÄǞȦǠẠȀÀẢȂĀĄ"
zip >biassoc
]
: >scream ( str -- SCREAM )
[ ch>upper cipher ?at drop ] map ;
: scream> ( SCREAM -- str )
[ cipher ?value-at drop ] map ;
And then give it a try!
IN: scratchpad "FACTOR!" >scream .
"ẰAĂẠẪȦ!"
IN: scratchpad "ẰAĂẠẪȦ!" scream> .
"FACTOR!"
Fun!
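A rough Python counterpart, for comparison, keeps two translation tables instead of one bidirectional assoc. This is a sketch using `str.maketrans` from the standard library, not Seth Larson's code; the names `to_scream` and `from_scream` are made up here, and characters outside A-Z pass through unchanged in both directions.

```python
PLAIN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
SCREAM = "AÁĂẮẶẰẲẴǍÂẤẬẦẨẪÄǞȦǠẠȀÀẢȂĀĄ"

# One table per direction; str.maketrans requires both strings to have equal length.
TO_SCREAM = str.maketrans(PLAIN, SCREAM)
FROM_SCREAM = str.maketrans(SCREAM, PLAIN)


def to_scream(text):
    """Upper-case the text, then replace each letter with its scream variant."""
    return text.upper().translate(TO_SCREAM)


def from_scream(text):
    """Map scream variants back to plain capital letters."""
    return text.translate(FROM_SCREAM)


print(to_scream("Factor!"))
print(from_scream(to_scream("Factor!")))
```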
Factor has an environment vocabulary for working with process environment variables on all the platforms we currently support: macOS, Windows, and Linux.
Recently, I noticed that .NET 9 added support for empty environment variables. This was particularly relevant due to a test failure of the new Dotenv implementation on Windows. It turns out that we inherited the same issue that earlier .NET versions had, which is an inability to disambiguate an unset environment variable from one that was set to the empty string. This issue has now been fixed in the latest development version.
Before:
IN: scratchpad "FACTOR" os-env .
f
IN: scratchpad "" "FACTOR" set-os-env
IN: scratchpad "FACTOR" os-env .
f
After:
IN: scratchpad "FACTOR" os-env .
f
IN: scratchpad "" "FACTOR" set-os-env
IN: scratchpad "FACTOR" os-env .
""
IN: scratchpad "FACTOR" unset-os-env
IN: scratchpad "FACTOR" os-env .
f
There might be other cross-platform environment-related topics to investigate, such as an open issue to look into case-preserving but case-insensitive environment variables on Windows.
PRs welcome!
HiDPI is a name for high resolution displays, sometimes called retina displays. A long long time ago, I added support for Retina Displays on macOS using Factor. But, they have not been well supported on either Linux or Windows platforms.
That ends today!
Some users have seen the “small window” problem on Linux, where on high resolution displays the Factor UI listener was rendered super tiny:
This is now fixed: it renders at the appropriate resolution, either by detecting the screen it is launched on or by using the GDK_SCALE environment variable:
There has been one report that this works in GNOME environments but not on KDE, so we might still have a few code changes necessary to make this more universal. We also still need to switch from our older GTK integration to the newer one with clean support for Wayland.
Other users have noticed the blurry text on Windows, due to using a legacy compatibility mode:
This is now fixed, rendering with the correct scaling factor:
It has been tested with 200% and 300% scaling factors. It is possible that intermediate scaling factors like 150% are not well supported and additional tweaks might be necessary to make this more universal.
Currently, on all three supported platforms, we use a global scaling factor which does not allow for moving Factor windows cleanly between screens with different scaling factors, for example when using HDMI on presentations, etc.
PRs welcome!
planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp.