JSON or “JavaScript Object Notation” is widely used as a data format for storing, transmitting, and retrieving data objects. It is language-independent and has parsers in most modern programming languages. Factor is no exception, containing the json vocabulary.
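For example, a quick round-trip through the json vocabulary looks something like this:
IN: scratchpad USING: json ;
IN: scratchpad H{ { "name" "John" } } >json .
"{\"name\":\"John\"}"
IN: scratchpad "{\"age\":46}" json> .
H{ { "age" 46 } }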
I wanted to go over some wtfjs that relates to JavaScript Arrays and JavaScript Objects, and then see how something similar might work in Factor!
You can define an array:
const person = ["John", "Doe", 46];
Or you can define an object:
const person = {firstName: "John", lastName: "Doe", age: 46};
You can start with an array and set its values by index:
const person = [];
person[0] = "John";
person[1] = "Doe";
person[2] = 46;
You can start with an object and set its values by key:
const person = {};
person["firstName"] = "John";
person["lastName"] = "Doe";
person["age"] = 46;
Or, using “dot notation” on an object:
const person = {};
person.firstName = "John";
person.lastName = "Doe";
person.age = 46;
In JavaScript, arrays are indexed by number, and objects are indexed by name. But, you can mix these and create association arrays that can be… both?
> const list = ["A", "B", "C"];
> list.length
3
> list[0]
"A"
> list["0"]
"A"
> list.key = "value";
> list.length
3
> list.key
"value"
> Object.assign({}, list);
{0: "A", 1: "B", 2: "C", key: "value"}
That’s kinda weird.
Maybe Factor needs something like that? What if we defined a type that holds both a sequence and an assoc, and supports both the sequence protocol and the assoc protocol? How might that look?
TUPLE: js-array seq assoc ;
: <js-array> ( -- js-array )
V{ } clone H{ } clone js-array boa ;
INSTANCE: js-array sequence
CONSULT: sequence-protocol js-array seq>> ;
INSTANCE: js-array assoc
CONSULT: assoc-protocol js-array assoc>> ;
And now we can do something kinda similar:
IN: scratchpad <js-array>
{ "A" "B" "C" } [ suffix! ] each
"value" "key" pick set-at
IN: scratchpad dup first .
"A"
IN: scratchpad dup length .
3
IN: scratchpad dup members .
V{ "A" "B" "C" }
IN: scratchpad dup >alist .
{ { "key" "value" } }
IN: scratchpad "key" of .
"value"
Well, it doesn’t handle converting string keys, so that "0" of would return the same value as first. And it doesn’t handle combining all the number-indexed keys and name-indexed keys into an alist output like the Object.assign({}, ...) call above. And there are probably a few other idiosyncrasies of the JavaScript association array that I’m not familiar with…
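As a rough sketch of that first improvement, we could override the consulted at* method to translate numeric string keys into sequence indices – hedged a bit, since ?nth makes a stored f indistinguishable from a missing key:
USING: accessors assocs kernel math math.parser sequences ;

M:: js-array at* ( key array -- value/f ? )
    key dup string? [ string>number ] when :> n
    n integer? [
        ! numeric key (like "0"): index into the underlying sequence
        n array seq>> ?nth dup >boolean
    ] [
        ! anything else: fall back to the assoc
        key array assoc>> at*
    ] if ;
Now "0" of and first would agree, at least for in-bounds indices.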
But do we like this? I dunno yet.
The Raku programming language community comes up with interesting modules, and I enjoy getting emails from the Rakudo Weekly News. This week, there was a link to a post from a few days ago about making a weird data structure in Raku:
I came up with a weird data structure. It’s a hash, but you can also add functions that receive the hash as input so you can do math with it (if you squint, it’s vaguely like a spreadsheet). Something like:
m = MagicDict()
m["a"] = 1
m["b"] = 2
m["sum"] = lambda self: self["a"] + self["b"]
print(m["sum"])
Ideally, I’d want to do this in #RakuLang. (I know it’s possible because I did something much weirder once (I gave Str a CALL-ME method).)
I thought it would be fun to build this in Factor – starting with a magic-dict class:
TUPLE: magic-dict assoc ;
: <magic-dict> ( -- magic-dict ) H{ } clone magic-dict boa ;
INSTANCE: magic-dict assoc
CONSULT: assoc-protocol magic-dict assoc>> ;
And the main piece of logic we need is to implement the at* lookup word and, if the value is a callable, call it with the assoc on the stack as an argument.
M: magic-dict at*
    swap over assoc>> at*             ! leaves: magic-dict value/f ?
    over callable? [
        drop call( assoc -- value ) t ! run the quotation with the dict as input
    ] [ nipd ] if ;                   ! plain value: drop the dict
This allows us to make a Factor version of the original example:
IN: scratchpad <magic-dict>
1 "a" pick set-at
2 "b" pick set-at
[ [ "a" of ] [ "b" of ] bi + ] "sum" pick set-at
IN: scratchpad "sum" of .
3
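Because at* runs the quotation on every lookup, the "sum" entry behaves like a live formula – change an input and the result recomputes:
IN: scratchpad <magic-dict>
               1 "a" pick set-at
               2 "b" pick set-at
               [ [ "a" of ] [ "b" of ] bi + ] "sum" pick set-at
               10 "a" pick set-at
IN: scratchpad "sum" of .
12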
Pretty cool!
Factor word definitions have required stack effects. These are used by the stack checker to verify that word definitions are consistent with their declared stack effects, that internal branches have matching stack effects, and to perform other validity checks.
Long ago, these stack effects were optional – words might have looked something like this:
: add2 2 + ;
At some point over a decade ago, we made stack effects required, and now it looks like this:
: add2 ( m -- n ) 2 + ;
That change has been generally positive for the Factor standard library – helping readability, improving natural word documentation, and improving the locality of stack checker errors. Every once in a while, I think fondly back to those simpler days.
Well, it’s still possible to have the good ol’ days back – check this out!
The stack-checker has an infer word to apply the stack checker algorithm to a quotation, returning the inferred stack effect. You can try it in the Listener using the Ctrl-I shortcut or by calling it directly:
IN: scratchpad [ 2 + ] infer .
( x -- x )
IN: scratchpad [ [ + ] with map ] infer .
( x x -- x )
We can use this to make new syntax that defines a word with the stack effect that was inferred:
USING: effects.parser kernel parser stack-checker words ;
SYNTAX: INFER:
[ scan-new-word parse-definition ] with-definition
dup infer define-declared ;
And now we can just write our words without bothering to write their stack effects:
INFER: add2 2 + ;
And then use it:
IN: scratchpad 1 add2 .
3
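We can confirm that the inferred effect was attached to the word – note the stack checker's placeholder variable names:
IN: scratchpad \ add2 stack-effect .
( x -- x )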
Pretty neat!
Factor has several types of classes that can be used to specialize methods on generic words and to disambiguate objects from other objects – for example, predicate classes, union classes, intersection classes, and maybe classes.
Today, we are going to discuss predicate classes and a recent feature that was contributed by @Capital-Ex to support anonymous predicates. A typical definition of a predicate class might look like this definition which describes all positive integers:
PREDICATE: positive < integer 0 > ;
You can use it in the listener to identify instances:
IN: scratchpad 12 positive? .
t
IN: scratchpad -5 positive? .
f
You can also dispatch on them:
GENERIC: wat ( obj -- str )
M: object wat drop "object" ;
M: positive wat drop "positive integer" ;
And see how that works:
IN: scratchpad 12 wat .
"positive integer"
IN: scratchpad f wat .
"object"
The new anonymous predicates feature allows us to rewrite that word with inline predicate definitions:
GENERIC: wat ( obj -- str )
M: object wat drop "object" ;
M: predicate{ integer [ 0 > ] } wat drop "positive integer" ;
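It dispatches the same way the named predicate class did:
IN: scratchpad 12 wat .
"positive integer"
IN: scratchpad -5 wat .
"object"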
And, in fact, we could extend it with some of the other anonymous classes to create this monstrosity:
GENERIC: wat ( obj -- str )
M: object wat drop "object" ;
M: predicate{ integer [ 0 > ] } wat drop "positive integer" ;
M: intersection{ bignum positive } wat drop "five" ;
M: union{ fixnum bignum } wat drop "integer" ;
M: maybe{ string } wat drop "maybe a string" ;
And then test most (all? who really knows?) of the cases:
IN: scratchpad 5 wat .
"positive integer"
IN: scratchpad 5 >bignum wat .
"five"
IN: scratchpad -5 wat .
"integer"
IN: scratchpad "hello" wat .
"maybe a string"
IN: scratchpad B{ } wat .
"object"
Pretty cool! This is available in the development version of Factor.
Recently, I was looking at the Zig programming language. As I often do, I started implementing a few typical things in new languages to learn them. Well, one of them was super slow and Zig is supposed to be super fast, so I was trying to understand where the disconnect was and compare it to Factor!
I was able to reduce the issue to a small test case and it turns out that there is a behavioral issue in their implementation of HashMap that makes their HashMaps get slow over time. The test case performs these steps:
- Create a HashMap of 2 million items, remembering each key in a list
- Perform 248 million actions against the HashMap: pick a random key, look up its value, and either decrement it or – when it reaches 1 – remove that key and insert a new one
- Finally, remove all remaining items from the HashMap
We record the total time each block of 1 million actions takes:
Something is very wrong!
This is the simple test case implemented in Zig using std.HashMap:
const std = @import("std");
pub fn main() !void {
var map = std.AutoHashMap(u64, u64).init(std.heap.page_allocator);
defer map.deinit();
var list = std.ArrayList(u64).init(std.heap.page_allocator);
defer list.deinit();
var prng = std.rand.DefaultPrng.init(0);
const random = prng.random();
var start = std.time.milliTimestamp();
var i: u64 = 0;
while (i < 2_000_000) : (i += 1) {
try map.put(i, 3);
try list.append(i);
}
while (i < 250_000_000) : (i += 1) {
var index = random.uintLessThan(usize, list.items.len);
var j = list.items[index];
var k = map.get(j).?;
if (k == 1) {
_ = map.remove(j);
try map.put(i, 3);
list.items[index] = i;
} else {
try map.put(j, k - 1);
}
if (i % 1_000_000 == 0) {
var end = std.time.milliTimestamp();
std.debug.print("{} block took {} ms\n", .{ i, end - start });
start = std.time.milliTimestamp();
}
}
while (list.items.len > 0) {
var j = list.pop();
_ = map.remove(j);
}
}
We can run it using ReleaseFast to get the best performance, and see that, over time, it gets super slow – so slow that it isn’t even able to really finish the test case:
$ zig version
0.11.0
$ zig run -O ReleaseFast maptest.zig
2000000 block took 156 ms
3000000 block took 122 ms
4000000 block took 127 ms
5000000 block took 133 ms
6000000 block took 138 ms
7000000 block took 141 ms
8000000 block took 143 ms
9000000 block took 145 ms
10000000 block took 147 ms
11000000 block took 148 ms
12000000 block took 151 ms
13000000 block took 153 ms
14000000 block took 155 ms
15000000 block took 157 ms
16000000 block took 159 ms
17000000 block took 164 ms
18000000 block took 167 ms
19000000 block took 171 ms
20000000 block took 173 ms
21000000 block took 180 ms
22000000 block took 186 ms
23000000 block took 190 ms
24000000 block took 195 ms
25000000 block took 205 ms
26000000 block took 213 ms
27000000 block took 221 ms
28000000 block took 234 ms
29000000 block took 247 ms
30000000 block took 264 ms
31000000 block took 282 ms
32000000 block took 301 ms
33000000 block took 320 ms
34000000 block took 346 ms
35000000 block took 377 ms
36000000 block took 409 ms
37000000 block took 448 ms
38000000 block took 502 ms
39000000 block took 550 ms
40000000 block took 614 ms
41000000 block took 694 ms
42000000 block took 767 ms
43000000 block took 853 ms
44000000 block took 961 ms
45000000 block took 1088 ms
46000000 block took 1250 ms
47000000 block took 1420 ms
48000000 block took 1612 ms
49000000 block took 1826 ms
50000000 block took 2056 ms
51000000 block took 2320 ms
52000000 block took 2688 ms
53000000 block took 3015 ms
54000000 block took 3467 ms
55000000 block took 3971 ms
56000000 block took 4618 ms
57000000 block took 5377 ms
58000000 block took 6172 ms
59000000 block took 7094 ms
60000000 block took 8173 ms
61000000 block took 9469 ms
62000000 block took 11083 ms
63000000 block took 12737 ms
64000000 block took 14000 ms
65000000 block took 16243 ms
66000000 block took 17912 ms
67000000 block took 20452 ms
68000000 block took 24356 ms
...
We can switch the example above to use std.ArrayHashMap, which does not have this problem, although it is a bit slower, with each block taking roughly 250 milliseconds.
We can compare that to a simple implementation in Factor:
USING: assocs calendar formatting io kernel math random
sequences ;
:: maptest ( -- )
H{ } clone :> m
V{ } clone :> l
now :> start!
0 :> i!
[ i 2,000,000 < ] [
3 i m set-at
i l push
i 1 + i!
] while
[ i 250,000,000 < ] [
l length random :> j
j l nth :> k
k m at 1 - [
k m delete-at
3 i m set-at
i j l set-nth
] [
k m set-at
] if-zero
i 1,000,000 mod zero? [
i now start time- duration>milliseconds
"%d block took %d ms\n" printf flush
now start!
] when
i 1 + i!
] while
[ l empty? ] [
l pop m delete-at
] until ;
We can run it in Factor and see how long it takes. Notably, there are some long delays in the first few blocks that I’d like to understand better – possibly due to excessive rehashing or some allocation pattern in the Factor garbage collector – after which it quickly reaches a steady state where each block takes about 250 milliseconds.
$ factor maptest.factor
2000000 block took 855 ms
3000000 block took 198 ms
4000000 block took 205 ms
5000000 block took 3579 ms
6000000 block took 4438 ms
7000000 block took 3624 ms
8000000 block took 2996 ms
9000000 block took 232 ms
10000000 block took 243 ms
11000000 block took 248 ms
12000000 block took 298 ms
13000000 block took 233 ms
14000000 block took 238 ms
15000000 block took 298 ms
16000000 block took 233 ms
17000000 block took 521 ms
18000000 block took 231 ms
19000000 block took 236 ms
20000000 block took 280 ms
21000000 block took 235 ms
22000000 block took 235 ms
23000000 block took 281 ms
24000000 block took 231 ms
25000000 block took 236 ms
26000000 block took 294 ms
27000000 block took 231 ms
28000000 block took 236 ms
29000000 block took 506 ms
30000000 block took 234 ms
31000000 block took 237 ms
32000000 block took 277 ms
33000000 block took 232 ms
34000000 block took 239 ms
35000000 block took 279 ms
36000000 block took 235 ms
37000000 block took 239 ms
38000000 block took 275 ms
39000000 block took 234 ms
40000000 block took 514 ms
41000000 block took 231 ms
42000000 block took 236 ms
43000000 block took 282 ms
44000000 block took 235 ms
45000000 block took 235 ms
46000000 block took 282 ms
47000000 block took 231 ms
48000000 block took 233 ms
49000000 block took 280 ms
50000000 block took 234 ms
51000000 block took 238 ms
52000000 block took 507 ms
53000000 block took 231 ms
54000000 block took 236 ms
55000000 block took 276 ms
56000000 block took 231 ms
57000000 block took 238 ms
58000000 block took 278 ms
59000000 block took 234 ms
60000000 block took 235 ms
61000000 block took 278 ms
62000000 block took 237 ms
63000000 block took 239 ms
64000000 block took 510 ms
65000000 block took 234 ms
66000000 block took 284 ms
...
Not bad!
So, Zig could be super fast, but the default std.HashMap implementation uses tombstone buckets to mark slots as deleted, and over time these tombstones fragment the HashMap, causing its linear probing to trend toward the worst case of examining every bucket when looking for a key.
We can implement a rehash() method on the HashMap that performs an in-place rehashing of all the elements, without allocations. Ideally, this would be done when the number of filled and deleted slots reaches some capacity threshold. But, for now, we can just run map.rehash() once per block and see how that improves performance:
diff --git a/lib/std/hash_map.zig b/lib/std/hash_map.zig
index 8a3d78283..7192ba733 100644
--- a/lib/std/hash_map.zig
+++ b/lib/std/hash_map.zig
@@ -681,6 +681,11 @@ pub fn HashMap(
self.unmanaged = .{};
return result;
}
+
+ /// Rehash the map, in-place
+ pub fn rehash(self: *Self) void {
+ self.unmanaged.rehash(self.ctx);
+ }
};
}
@@ -1505,6 +1510,92 @@ pub fn HashMapUnmanaged(
return result;
}
+ /// Rehash the map, in-place
+ pub fn rehash(self: *Self, ctx: anytype) void {
+ const mask = self.capacity() - 1;
+
+ var metadata = self.metadata.?;
+ var keys_ptr = self.keys();
+ var values_ptr = self.values();
+ var curr: Size = 0;
+
+ // While we are re-hashing every slot, we will use the
+ // fingerprint to mark used buckets as being used and either free
+ // (needing to be rehashed) or tombstone (already rehashed).
+
+ while (curr < self.capacity()) : (curr += 1) {
+ metadata[curr].fingerprint = Metadata.free;
+ }
+
+ // Now iterate over all the buckets, rehashing them
+
+ curr = 0;
+ while (curr < self.capacity()) {
+ if (!metadata[curr].isUsed()) {
+ assert(metadata[curr].isFree());
+ curr += 1;
+ continue;
+ }
+
+ var hash = ctx.hash(keys_ptr[curr]);
+ var fingerprint = Metadata.takeFingerprint(hash);
+ var idx = @as(usize, @truncate(hash & mask));
+
+ // For each bucket, rehash to an index:
+ // 1) before the cursor, probed into a free slot, or
+ // 2) equal to the cursor, no need to move, or
+ // 3) ahead of the cursor, probing over already rehashed
+
+ while ((idx < curr and metadata[idx].isUsed()) or
+ (idx > curr and metadata[idx].fingerprint == Metadata.tombstone))
+ {
+ idx = (idx + 1) & mask;
+ }
+
+ if (idx < curr) {
+ assert(metadata[idx].isFree());
+ metadata[idx].fingerprint = fingerprint;
+ metadata[idx].used = 1;
+ keys_ptr[idx] = keys_ptr[curr];
+ values_ptr[idx] = values_ptr[curr];
+
+ metadata[curr].used = 0;
+ assert(metadata[curr].isFree());
+ keys_ptr[curr] = undefined;
+ values_ptr[curr] = undefined;
+
+ curr += 1;
+ } else if (idx == curr) {
+ metadata[idx].fingerprint = fingerprint;
+ curr += 1;
+ } else {
+ assert(metadata[idx].fingerprint != Metadata.tombstone);
+ metadata[idx].fingerprint = Metadata.tombstone;
+ if (metadata[idx].isUsed()) {
+ var tmpkey = keys_ptr[idx];
+ var tmpvalue = values_ptr[idx];
+
+ keys_ptr[idx] = keys_ptr[curr];
+ values_ptr[idx] = values_ptr[curr];
+
+ keys_ptr[curr] = tmpkey;
+ values_ptr[curr] = tmpvalue;
+ } else {
+ metadata[idx].used = 1;
+ keys_ptr[idx] = keys_ptr[curr];
+ values_ptr[idx] = values_ptr[curr];
+
+ metadata[curr].fingerprint = Metadata.free;
+ metadata[curr].used = 0;
+ keys_ptr[curr] = undefined;
+ values_ptr[curr] = undefined;
+
+ curr += 1;
+ }
+ }
+ }
+ }
+
fn grow(self: *Self, allocator: Allocator, new_capacity: Size, ctx: Context) Allocator.Error!void {
@setCold(true);
const new_cap = @max(new_capacity, minimal_capacity);
@@ -2218,3 +2309,35 @@ test "std.hash_map repeat fetchRemove" {
try testing.expect(map.get(2) != null);
try testing.expect(map.get(3) != null);
}
+
+test "std.hash_map rehash" {
+ var map = AutoHashMap(u32, u32).init(std.testing.allocator);
+ defer map.deinit();
+
+ var prng = std.rand.DefaultPrng.init(0);
+ const random = prng.random();
+
+ const count = 6 * random.intRangeLessThan(u32, 100_000, 500_000);
+
+ var i: u32 = 0;
+ while (i < count) : (i += 1) {
+ try map.put(i, i);
+ if (i % 3 == 0) {
+ try expectEqual(map.remove(i), true);
+ }
+ }
+
+ map.rehash();
+
+ try expectEqual(map.count(), count * 2 / 3);
+
+ i = 0;
+ while (i < count) : (i += 1) {
+ if (i % 3 == 0) {
+ try expectEqual(map.get(i), null);
+ } else {
+ try expectEqual(map.get(i).?, i);
+ }
+ }
+}
We can apply that diff to lib/std/hash_map.zig and try again, now taking about 165 milliseconds per block, including the time for map.rehash():
$ zig run -O ReleaseFast maptest.zig --zig-lib-dir ~/Dev/zig/lib
2000000 block took 155 ms
3000000 block took 147 ms
4000000 block took 154 ms
5000000 block took 160 ms
6000000 block took 163 ms
7000000 block took 164 ms
8000000 block took 165 ms
9000000 block took 166 ms
10000000 block took 166 ms
11000000 block took 165 ms
12000000 block took 166 ms
13000000 block took 165 ms
14000000 block took 166 ms
15000000 block took 172 ms
16000000 block took 165 ms
17000000 block took 167 ms
18000000 block took 165 ms
19000000 block took 167 ms
20000000 block took 169 ms
21000000 block took 168 ms
22000000 block took 167 ms
23000000 block took 166 ms
24000000 block took 167 ms
25000000 block took 167 ms
26000000 block took 165 ms
27000000 block took 166 ms
28000000 block took 166 ms
29000000 block took 165 ms
30000000 block took 165 ms
31000000 block took 165 ms
32000000 block took 166 ms
33000000 block took 165 ms
34000000 block took 167 ms
35000000 block took 170 ms
36000000 block took 165 ms
37000000 block took 166 ms
38000000 block took 166 ms
39000000 block took 164 ms
40000000 block took 165 ms
41000000 block took 167 ms
42000000 block took 166 ms
43000000 block took 167 ms
44000000 block took 169 ms
45000000 block took 166 ms
46000000 block took 165 ms
47000000 block took 166 ms
48000000 block took 166 ms
49000000 block took 166 ms
50000000 block took 166 ms
51000000 block took 166 ms
52000000 block took 164 ms
53000000 block took 165 ms
54000000 block took 167 ms
55000000 block took 165 ms
56000000 block took 166 ms
57000000 block took 166 ms
58000000 block took 165 ms
59000000 block took 166 ms
60000000 block took 169 ms
61000000 block took 165 ms
62000000 block took 165 ms
63000000 block took 166 ms
64000000 block took 166 ms
65000000 block took 165 ms
66000000 block took 166 ms
67000000 block took 176 ms
68000000 block took 166 ms
...
Well now, Zig is fast and everything is right again with the world – Factor takes only about 50% more time than Zig’s std.HashMap with rehash(), and about the same as std.ArrayHashMap, which is pretty good for a dynamic language.
I submitted a pull request adding a rehash() method to HashMap. Hopefully it gets into the upcoming Zig 0.12 release, and maybe for Zig 0.13 they can adjust it to rehash automatically when the map gets sufficiently fragmented, consider using quadratic probing instead of linear probing, or perhaps switch to a completely different HashMap algorithm like Facebook’s F14 hash table, which doesn’t have this issue.
Maybe we should consider some of these improvements for Factor as well!
Open source is fun!
Recently, I was looking into the Zig programming language and bumped into this video tutorial:
It presents an example – in the spirit of Fizz buzz – called SMAC, which is basically a program that converts the first million numbers into strings or "SMAC" (if the number is divisible by 7 or ends with a 7) and prints them out. This is then implemented in three different forms: single-threaded, multi-threaded, and networked.
We first start with the basics, a word to check if a number is SMAC or not:
: smac? ( n -- ? )
{ [ 7 mod 0 = ] [ 10 mod 7 = ] } 1|| ;
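Checking a few values – 7 is divisible by 7, 17 ends in a 7, and 4 is neither:
IN: scratchpad 7 smac? .
t
IN: scratchpad 17 smac? .
t
IN: scratchpad 4 smac? .
f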
And then a word that converts a number into its string representation or SMAC:
: smac ( n -- str )
dup smac? [ drop "SMAC" ] [ number>string ] if ;
The simplest example – run in a single thread – iteratively prints to the output-stream.
: smac. ( n -- )
[1..b] [ smac print ] each ;
Trying it out, you can see that it works:
IN: scratchpad 20 smac.
1
2
3
4
5
6
SMAC
8
9
10
11
12
13
SMAC
15
16
SMAC
18
19
20
Slightly more complex, the multi-threaded example splits the numbers into four groups using <n-groups>, computes each group as a future in a background thread, and then waits for those computations to finish and iteratively prints them out.
: smac. ( n -- )
[1..b] 4 <n-groups>
[ '[ _ [ smac ] map ] future ] map
[ ?future [ print ] each ] each ;
It produces the same output as the single-threaded smac. word above.
The simple networked example creates a server configured to print n results when a client connects:
: smac-server ( n -- server )
utf8 <threaded-server>
"smac" >>name
7979 >>insecure
swap '[ _ smac. ] >>handler ;
You can run it:
IN: scratchpad 20 smac-server start-server
And then try it out:
$ nc localhost 7979
1
2
3
4
5
6
SMAC
8
9
10
11
12
13
SMAC
15
16
SMAC
18
19
20
A fun example for learning about a few introductory concepts in Factor!
A couple of days ago, an interesting article was posted about the performance impact of the memoization idiom on modern Ruby. It covers some of the internal changes that have been happening in the last few releases of Ruby and how some “object shape” optimizations have impacted memoization performance.
If you look at the Wikipedia article for memoization, you can see it described as:
In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls to pure functions and returning the cached result when the same inputs occur again.
While I use the memoize vocabulary in some of the posts on this blog and it is used in over 100 words in the Factor standard library, there is an opportunity to talk a bit more about how it works in Factor.
The first example often provided to show the benefit is a recursive definition of the Fibonacci sequence, which could be implemented as:
: fib ( m -- n )
dup 1 > [ [ 2 - fib ] [ 1 - fib ] bi + ] when ;
If you time this, it is quite slow:
IN: scratchpad [ 40 fib ] time .
Running time: 1.800567 seconds
102334155
In a similar manner to the functools.cache decorator in Python, you can easily improve this by caching the results of previous computations using the MEMO: syntax, which works with functions of any arity:
MEMO: fib ( m -- n )
dup 1 > [ [ 2 - fib ] [ 1 - fib ] bi + ] when ;
And see that it is now much faster:
IN: scratchpad [ 40 fib ] time .
Running time: 0.005302375 seconds
102334155
If we wanted to memoize some internal part of a word, we have historically used words like cache or 2cache with a hashtable literal, computing a result by key, storing it, and returning the cached value when it was previously calculated. This is a simple form of memoization – and I realized that we could create a simpler syntax that uses the memoize implementation to support arbitrary quotations:
SYNTAX: MEMO[ parse-quotation dup infer memoize-quot append! ;
And then we use it in the middle of a word, pretending that something takes a long time to compute and we need to memoize the computation:
: answer ( a b -- )
"The answer is: " write MEMO[ + 1 seconds sleep ] . ;
Trying it out shows that the first time is slow and the second time is fast:
IN: scratchpad [ 1 2 answer ] time
The answer is: 3
Running time: 1.011060583 seconds
IN: scratchpad [ 1 2 answer ] time
The answer is: 3
Running time: 0.00117075 seconds
That’s a cool syntax word – and it’s available in the memoize.syntax vocabulary!
Álvaro Ramírez wrote a blog post about generating a SHA-256 hash from a URL, the easy way, where they describe wanting to download a file and easily generate a SHA-256 hash of its contents. Their solution involves copying a URL, then having some Emacs Lisp read the clipboard, download the file, and generate and return the hash on the clipboard.
I thought I’d show how this can be done in Factor, by breaking the problem into smaller parts.
USING: checksums checksums.sha http.client io.directories io.files.temp kernel
math.parser namespaces sequences ui.clipboards ;
The first step is downloading a file to a temporary file, returning the path of the downloaded file:
: download-to-temp ( url -- path )
dup download-name temp-file [
[ ?delete-file ] [ download-to ] bi
] keep ;
The next step is to build a word that applies a checksum to the downloaded file contents:
: checksum-url ( url checksum -- value )
[ download-to-temp ] [ checksum-file ] bi* ;
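For example, with a hypothetical URL (output elided, since the digest depends on the downloaded contents):
IN: scratchpad "https://example.com/file.zip" sha-256 checksum-url
               bytes>hex-string .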
The last step is to use the clipboard to access the URL that was copied – checking minimally that it looks like an http or https URL – and then putting the checksum value back onto the clipboard:
: checksum-clipboard ( checksum -- )
clipboard get clipboard-contents
dup "http" head? [ throw ] unless
swap checksum-url bytes>hex-string
clipboard get set-clipboard-contents ;
This could be improved with better error checking, and maybe cleaning up the temporary file that was downloaded after running the checksum.
Give it a try!
IN: scratchpad sha-256 checksum-clipboard