[ planet-factor ]

John Benediktsson: File Monitor

Factor has a cross-platform file-system change monitor which supports detecting changes to file names, attributes and contents under a specified directory.

There is some minor platform differences between Mac OS X, Windows, and Linux which may be worth looking at if you are building on top of the io.monitors vocabulary. I was curious about what kind of events are generated for various test-cases and built a small utility to experiment with it.

Some code to monitor for changed paths recursively in a directory and print each one out:

: watch-loop ( monitor -- )
dup next-change path>> print flush watch-loop ;

: watch-directory ( path -- )
[ t [ watch-loop ] with-monitor ] with-monitors ;

I've committed this as the file-monitor tool (with support for an optional command-line argument to specify which directory to monitor as well as printing the change descriptors). You can run it very simply:

factor -run=file-monitor [path]

An example session on Linux, monitoring some simple changes to files in /tmp:


$ factor -run=file-monitor /tmp &
Monitoring /tmp

$ touch /tmp/a
{ +add-file+ } /tmp/a
{ +add-file+ } /tmp/a
{ +modify-file+ } /tmp/a

$ echo "test" > /tmp/a
{ +modify-file+ } /tmp/a

$ rm /tmp/a
{ +remove-file+ } /tmp/a

Tue, 5 May 2015 22:31:00

John Benediktsson: File Server

Python has a neat feature that lets you serve files from the current directory.

# Python 2
python2 -m SimpleHTTPServer

# Python 3
python3 -m http.server

I always thought this was a quick and useful way to share files on a local network. Given that Factor has a HTTP Server, we should be able to implement this!

We already have support for serving static content and serving CGI scripts, so we can very simply implement a script to create and launch a HTTP server for the current directory (or the one specified on the command-line), logging HTTP connections to stdout.

This is available in the file-server vocabulary, now you can:

factor -run=file-server [--cgi] [path]

Currently, this defaults to serving files on port 8080 from all available network interfaces. In the future, it would be nice to add the ability to specify port and network interfaces to bind.

Sat, 2 May 2015 00:17:00

John Benediktsson: Burrows-Wheeler Transform

The Burrows–Wheeler transform is a reversible method of rearranging text used to improve the performance of compression algorithms, such as bzip2.

We will implement transform, bwt, and inverse transform, ibwt, in both Python and Factor. First with a slow and simple algorithm, and then second with a faster version.

Version 1

We start with the pseudocode suggested in the Wikipedia article:

function BWT (string s)
append an 'EOF' character to s
create a table, rows are all possible rotations of s
sort rows alphabetically
return (last column of the table)

In Python, this might look like:

def bwt(s):
s = s + '\0'
n = len(s)
m = sorted(s[i:] + s[:i] for i in range(n))
return ''.join(x[-1] for x in m)

In Factor, using all-rotations, it might look like this:

: bwt ( seq -- seq' )
0 suffix all-rotations natural-sort [ last ] map ;

The pseudocode to perform the inverse transform:

function inverseBWT (string s)
create empty table

repeat length(s) times
// first insert creates first column
insert s as a column of table before first column
sort rows of the table alphabetically
return (row that ends with the 'EOF' character)

In Python, this might look like:

def ibwt(s):
n = len(s)
m = [''] * n
for _ in range(n):
m = sorted(s[i] + m[i] for i in range(n))
return [x for x in m if x.endswith('\0')][0][:-1]

In Factor, we could implement it like this:

: ibwt ( seq -- seq' )
[ length [ "" <array> ] keep ] keep
'[ _ [ prefix ] 2map natural-sort ] times
[ last 0 = ] find nip but-last ;

Unfortunately, this is very slow, with most of the performance loss in the invert transform.

Version 2

Another way to increase the speed of BWT inverse is to use an algorithm that returns an index into the sorted rotations along with the transform.

In Python, it looks like this:

def bwt(s):
n = len(s)
m = sorted(s[i:] + s[:i] for i in range(n))
return m.index(s), ''.join(x[-1] for x in m)

In Factor, it might look like this:

: bwt ( seq -- i seq' )
dup all-rotations natural-sort
[ index ] [ [ last ] map ] bi ;

In Python, the inverse transform looks like this:

def ibwt(k, s):
def row(k):
permutation = sorted((t, i) for i, t in enumerate(s))
for _ in s:
t, k = permutation[k]
yield t
return ''.join(row(k))

In Factor, that roughly translates to:

: ibwt ( i seq -- seq' )
[ length ] [ <enum> sort-values ] bi
'[ _ nth first2 ] replicate nip ;

An improved version 2 is available in the development version. In particular, it uses rotated virtual sequences for increased performance and returns transformations that match the type of the input sequence.

Wed, 29 Apr 2015 22:49:00

John Benediktsson: Writing MIDI Files

Previously, I wrote about Reading MIDI Files using Factor.

Now, we are going to create a writer for MIDI files in less than 180 lines of additional code.

Variable-Length Quantity

To write a variable-length integer, we first "reverse" it, tagging the 8th bit of each additional byte. Then, we write each byte out to the output-stream.

: write-number ( n -- )
[ 0x7f bitand ] keep

[ -7 shift dup zero? ] [
[ 8 shift ] dip
[ 0x7f bitand 0x80 bitor + ] keep
] until drop

[ [ -8 shift ] [ 7 bit? ] bi ]
[ dup 0xff bitand write1 ] do while drop ;
Note: there is probably a cleaner way to do this. Patches are welcome! ☺

Text

Strings are encoded in UTF-8, prefixed with their encoded length in bytes (as a variable-length quantity).

: write-string ( str -- )
utf8 encode [ length write-number ] [ write ] bi ;

Writing Events

The three types of events will each have to be handled differently. To do this, we will make a generic method that is given the previous status byte (to enable "running status" for MIDI events) and returns the new status byte.

GENERIC: write-event ( prev-status event -- status )

First, we write MIDI events, implementing the "running status".

: write-status ( prev-status status -- )
dup 0xf0 < [
[ = ] keep swap [ drop ] [ write1 ] if
] [
nip write1
] if ;

: write-channel ( prev-status value status quot -- status )
[
swap [
"channel" of + [ write-status ] keep
] keep
] dip call ; inline

M: midi-event write-event
[ delta>> write-number ] [ value>> ] [ name>> ] tri {

{ "note-off" [
0x80 [
[ "note" of write1 ]
[ "velocity" of write1 ] bi
] write-channel ] }
{ "note-on" [
0x90 [
[ "note" of write1 ]
[ "velocity" of write1 ] bi
] write-channel ] }
{ "polytouch" [
0xa0 [
[ "note" of write1 ]
[ "value" of write1 ] bi
] write-channel ] }
{ "control-change" [
0xb0 [
[ "control" of write1 ]
[ "value" of write1 ] bi
] write-channel ] }
{ "program-change" [
0xc0 [ "program" of write1 ] write-channel ] }
{ "aftertouch" [
0xd0 [ "value" of write1 ] write-channel ] }
{ "pitchwheel" [
0xe0 [
"pitch" of min-pitchwheel -
[ 0x7f bitand write1 ]
[ -7 shift write1 ] bi
] write-channel ] }

! system common messages
{ "sysex" [
[ drop 0xf0 dup write1 ] dip
write 0xf7 write1 ] }
{ "quarter-made" [
[ drop 0xf1 dup write1 ] dip
[ "frame-type" of 4 shift ]
[ "frame-value" of + ] bi write1 ] }
{ "songpos" [
[ drop 0xf2 dup write1 ] dip
[ 0x7f bitand write1 ]
[ -7 shift write1 ] bi ] }
{ "song-select" [
[ drop 0xf3 dup write1 ] dip write1 ] }
{ "tune-request" [ 2drop 0xf6 dup write1 ] }

! real-time messages
{ "clock" [ 2drop 0xf8 dup write1 ] }
{ "start" [ 2drop 0xfa dup write1 ] }
{ "continue" [ 2drop 0xfb dup write1 ] }
{ "stop" [ 2drop 0xfc dup write1 ] }
{ "active-sensing" [ 2drop 0xfe dup write1 ] }
{ "reset" [ 2drop 0xff dup write1 ] }
} case ;

Next, we write meta events:

M: meta-event write-event
[ delta>> write-number ] [ value>> ] [ name>> ] tri
0xff write1 {
{ "sequence-number" [
B{ 0x00 0x02 } write 2 >be write ] }
{ "text" [ 0x01 write1 write-string ] }
{ "copyright" [ 0x02 write1 write-string ] }
{ "track-name" [ 0x03 write1 write-string ] }
{ "instrument-name" [ 0x04 write1 write-string ] }
{ "lyrics" [ 0x05 write1 write-string ] }
{ "marker" [ 0x06 write1 write-string ] }
{ "cue-point" [ 0x07 write1 write-string ] }
{ "device-name" [ 0x09 write1 write-string ] }
{ "channel-prefix" [ B{ 0x20 0x01 } write write1 ] }
{ "midi-port" [ B{ 0x21 0x01 } write write1 ] }
{ "end-of-track" [ B{ 0x2f 0x00 } write drop ] }
{ "set-tempo" [ B{ 0x51 0x03 } write 3 >be write ] }
{ "smpte-offset" [
B{ 0x54 0x05 } write {
[ "frame-rate" of 6 shift ]
[ "hours" of + write1 ]
[ "minutes" of write1 ]
[ "seconds" of write1 ]
[ "frames" of write1 ]
[ "subframes" of write1 ]
} cleave ] }
{ "time-signature" [
B{ 0x58 0x04 } write {
[ "numerator" of write1 ]
[ "denominator" of 2 /i write1 ]
[ "clocks-per-tick" of write1 ]
[ "notated-32nd-notes-per-beat" of write1 ]
} cleave ] }
{ "key-signature" [
B{ 0x59 0x02 } write
key-signatures value-at write ] }
{ "sequencer-specific" [
0x7f write1
[ length write-number ] [ write ] bi ] }
} case drop f ;

Finally, we write system-exclusive events:

M: sysex-event write-event
drop
[ delta>> write-number ]
[ type>> write1 ]
[ bytes>> write ] tri f ;

Writing a MIDI header and tracks, generically as "chunks":

GENERIC: write-chunk ( chunk -- )

M: midi-header write-chunk
$[ "MThd" >byte-array ] write
$[ 6 4 >be ] write
[ format>> ] [ #chunks>> ] [ division>> ] tri
[ 2 >be write ] tri@ ;

M: midi-track write-chunk
$[ "MTrk" >byte-array ] write
binary [
events>> f swap [ write-event ] each drop
] with-byte-writer
[ length 4 >be write ] [ write ] bi ;

Finally, words to write MIDI objects, either to a byte-array, or to a file.

: write-midi ( midi -- )
[ header>> write-chunk ]
[ chunks>> [ write-chunk ] each ] bi ;

: midi> ( midi -- byte-array )
binary [ write-midi ] with-byte-writer ;

: midi>file ( midi path -- )
binary [ write-midi ] with-file-writer ;

This is available now in the midi vocabulary.

Tue, 28 Apr 2015 16:16:00

John Benediktsson: Reading MIDI Files

MIDI is a specification for music, describing how electronic musical instruments and computers can communicate with each other.

Unlike digital audio formats such as MP3, the Standard MIDI File does not contain sounds, but rather a stream of instructions for playing notes, volume, tempo, and sound effects, as well as track names and other descriptive information. Because of this, MIDI files tend to be much smaller and typically allow the music to be easily rearranged or edited.

Using Factor, we will be creating a parser for reading MIDI files in under 180 lines of code.

Variable-Length Quantity

Some integers will be encoded as variable length, using 7 bits per byte with one bit reserved for the stop bit (indicating you have finished reading the number). This means the numbers 0 through 127 can be encoded in a single byte, but larger numbers will require additional bytes.

: read-number ( -- number )
0 [ 7 bit? ] [
7 shift read1 [ 0x7f bitand + ] keep
] do while ;

MIDI Events

There are three types of events: MIDI events, system-exclusive events, and meta events. The majority of events will usually be MIDI events, so we will parse those first.

Some MIDI events will include the channel in 4 bits of the status byte, so we handle those separately from the system common and realtime messages.

TUPLE: midi-event delta name value ;

C: <midi-event> midi-event
: read-message ( delta status -- message )
dup 0xf0 < [
[
! channel messages
[ 0x0f bitand "channel" ,, ] [ 0xf0 bitand ] bi {
{ 0x80 [ "note-off"
read1 "note" ,, read1 "velocity" ,, ] }
{ 0x90 [ "note-on"
read1 "note" ,, read1 "velocity" ,, ] }
{ 0xa0 [ "polytouch"
read1 "note" ,, read1 "value" ,, ] }
{ 0xb0 [ "control-change"
read1 "control" ,, read1 "value" ,, ] }
{ 0xc0 [ "program-change"
read1 "program" ,, ] }
{ 0xd0 [ "aftertouch"
read1 "value" ,, ] }
{ 0xe0 [ "pitchwheel"
read1 read1 7 shift + "pitch" ,, ] }
} case
] H{ } make
] [
{
! system common messages
{ 0xf0 [ "sysex" { 0xf7 } read-until drop ] }
{ 0xf1 [ "quarter-made" [
read1
[ -4 shift "frame-type" ,, ]
[ 0x0f bitand "frame-value" ,, ] bi
] H{ } make ] }
{ 0xf2 [ "songpos" read1 read1 7 shift + ] }
{ 0xf3 [ "song-select" read1 ] }
{ 0xf6 [ "tune-request" f ] }

! real-time messages
{ 0xf8 [ "clock" f ] }
{ 0xfa [ "start" f ] }
{ 0xfb [ "continue" f ] }
{ 0xfc [ "stop" f ] }
{ 0xfe [ "active-sensing" f ] }
{ 0xff [ "reset" f ] }
} case
] if <midi-event> ;

Meta Events

Meta events contain descriptive information such as track name, tempo and time signatures. They are also used to indicate the end of the track has been reached.

TUPLE: meta-event delta name value ;

C: <meta-event> meta-event
: parse-meta ( status bytes -- name value )
swap {
{ 0x00 [ 2 head be> "sequence-number" ] }
{ 0x01 [ utf8 decode "text" ] }
{ 0x02 [ utf8 decode "copyright" ] }
{ 0x03 [ utf8 decode "track-name" ] }
{ 0x04 [ utf8 decode "instrument-name" ] }
{ 0x05 [ utf8 decode "lyrics" ] }
{ 0x06 [ utf8 decode "marker" ] }
{ 0x07 [ utf8 decode "cue-point" ] }
{ 0x09 [ utf8 decode "device-name" ] }
{ 0x20 [ first "channel-prefix" ] }
{ 0x21 [ first "midi-port" ] }
{ 0x2f [ drop t "end-of-track" ] }
{ 0x51 [ 3 head be> "set-tempo" ] }
{ 0x54 [
[
5 firstn {
[
[ -6 shift "frame-rate" ,, ]
[ 0x3f bitand "hours" ,, ] bi
]
[ "minutes" ,, ]
[ "seconds" ,, ]
[ "frames" ,, ]
[ "subframes" ,, ]
} spread
] H{ } make "smpte-offset" ] }
{ 0x58 [
[
first4 {
[ "numerator" ,, ]
[ 2 * "denominator" ,, ]
[ "clocks-per-tick" ,, ]
[ "notated-32nd-notes-per-beat" ,, ]
} spread
] H{ } make "time-signature" ] }
{ 0x59 [ "key-signature" ] }
{ 0x7f [ "sequencer-specific" ] }
} case swap ;

: read-meta ( delta -- event )
read1 read-number read parse-meta <meta-event> ;

Sysex Events

For system-exclusive events, which are typically a sequence of bytes that are proprietary to particularly MIDI devices, we just preserve the type (0xf0 or 0xf7) and raw bytes.

TUPLE: sysex-event delta status bytes ;

C: <sysex-event> sysex-event
: read-sysex ( delta status -- event )
read-number read <sysex-event> ;

Reading Events

We can now read all types of events, dispatching on the status byte.

: read-event ( delta status -- event )
{
{ 0xf0 [ 0xf0 read-sysex ] }
{ 0xf7 [ 0xf7 read-sysex ] }
{ 0xff [ read-meta ] }
[ read-message ]
} case ;

Status bytes can be "running", which means that for channel events they can be dropped from the stream if they are identical to the previous MIDI channel event. Meta events (0xff) do not set the running status.

: read-status ( prev-status -- prev-status' status )
peek1 dup 0x80 < [
drop dup
] [
drop read1 dup 0xff = [
nip dup
] unless
] if ;

Each event has a header that is the delta-time (encoded as a variable length integer) and the status (which may not be present if it is "running").

: read-event-header ( prev-status -- prev-status' delta status )
[ read-number ] dip read-status swapd ;

There are a few ways to parse all events from a byte-array, but I thought it was a good opportunity to try out peekable streams, checking if the next event is present.

: parse-events ( data -- events )
binary <byte-reader> <peek-stream> [
f [
peek1 [ read-event-header ] [ f f ] if dup
] [ read-event ] produce 2nip nip
] with-input-stream ;

Reading MIDI

MIDI files are grouped into a series of chunks. The first chunk is a MIDI header indicating the format (single or multiple simultaneous tracks), number of tracks in the file, and division (indicating how to interpret the delta-times in the file).

TUPLE: midi-header format #chunks division ;

: <midi-header> ( bytes -- header )
2 cut 2 cut [ be> ] tri@ midi-header boa ;

Typically, that is followed by MIDI tracks, each containing a series of events.

TUPLE: midi-track events ;

: <midi-track> ( bytes -- track )
parse-events midi-track boa ;

Reading the chunks in the file dispatch off the "chunk type":

: read-chunk ( -- chunk )
4 read 4 read be> read swap {
{ $[ "MThd" >byte-array ] [ <midi-header> ] }
{ $[ "MTrk" >byte-array ] [ <midi-track> ] }
} case ;

To read a MIDI stream, we read the header and then all the chunks in the file, storing them in a midi tuple.

TUPLE: midi header chunks ;

C: <midi> midi

: read-header ( -- header )
read-chunk dup midi-header? t assert= ;

: read-chunks ( header -- chunks )
#chunks>> [ read-chunk ] replicate ;

: read-midi ( -- midi )
read-header dup read-chunks <midi> ;

Parsing a MIDI from raw bytes or a file:

: >midi ( byte-array -- midi )
binary [ read-midi ] with-byte-reader ;

: file>midi ( path -- midi )
binary [ read-midi ] with-file-reader ;

This is available now in the midi vocabulary.

Fri, 24 Apr 2015 21:43:00

John Benediktsson: Long URLs

In a world of 140 characters, space is at a premium (even for the longest tweet ever). It has become very common to shorten URLs when embedding links.

There are a lot of URL shortening services available, including branded ones such as t.co (Twitter), goo.gl (Google), nyti.ms (New York Times), and youtu.be (YouTube). You might not know it, but Factor even includes one in the wee-url web application.

You could use something like the LongURL service to resolve short URLs back to the long URL they point to, but I thought it would be more fun to show how to use Factor to do it!

By default, our http.client automatically follows redirects until exceeding a configurable maximum. We will need to make requests that do not redirect, using HEAD to retrieve only the HTTP headers and not the full contents:

: http-head-no-redirects ( url -- response data )
<head-request> 0 >>redirects http-request* ;

We use symbols to configure a maximum number of redirects (defaulting to 5) and to store the current number of redirects.

SYMBOL: max-redirects
5 max-redirects set-global

SYMBOL: redirects

We want a word that takes a URL and retrieves the next URL, if redirected. If we exceed our maximum number of redirects, it should throw an error.

: next-url ( url -- next-url redirected? )
redirects inc
redirects get max-redirects get <= [
dup http-head-no-redirects drop
dup redirect? [
nip "location" header t
] [ drop f ] if
] [ too-many-redirects ] if ;

To find the "long URL", just loop until we are no longer redirected:

: long-url ( short-url -- long-url )
[ [ next-url ] loop ] with-scope ;

To see it work, we can try it out with a short URL that I just made:

IN: scratchpad "http://bit.ly/1J0vm1x" long-url .
"http://factorcode.org/"

Neat!

This code is available on my GitHub.

Mon, 20 Apr 2015 14:57:00

John Benediktsson: Interpolate

Today, I made some minor improvements to the interpolate vocabulary, which provides simple string interpolation and formatting.

We have had the ability to use named variables:

IN: scratchpad "World" "name" set
"Hello, ${name}" interpolate
Hello, World

But now we can just as easily use stack arguments (numbered from the top of the stack):

IN: scratchpad "Mr." "Anderson"
"Hello, ${1} ${0}" interpolate
Hello, Mr. Anderson

In any order, even repeated:

IN: scratchpad "James" "Bond"
"${0}, ${1} ${0}" interpolate
Bond, James Bond

As well as anonymously, by order of arguments:

IN: scratchpad "Roses" "red"
"${} are ${}" interpolate
Roses are red

And even mix named variables and stack arguments:

IN: scratchpad "Factor" "lang" set
"cool" "${lang} is ${0}!" interpolate
Factor is cool!

Right now we simply convert objects to human-readable strings using the present vocabulary. In the future, it would be nice to support something like Python's string format specifications, which are similar but slightly different than our printf support.

Sun, 19 Apr 2015 03:42:00

John Benediktsson: Morse Code

A couple days ago, Verizon posted a press release complaining about the FCC's recent changes to Internet regulations. Normally, I wouldn't really bother with things like this, but they posted their statement using morse code. While it would be easy enough to read their English version, I thought it would be fun to decode it using Factor.

This feels a little wonky and a little fragile, but part of that is probably not having higher level words that we can use for moving between parsed HTML and its TEXT representation.

USING: html.parser html.parser.analyzer html.parser.printer
http.client io kernel morse sequences splitting wrap.strings ;

Step 1. Download the blog post and parse the HTML.

"http://publicpolicy.verizon.com/blog/entry/fccs-throwback-thursday-move-imposes-1930s-rules-on-the-internet"
http-get nip parse-html

Step 2. Extract the morse code text from the post.

"blog" find-by-class-between
"p" find-between-first html-text

Step 3. Split the morse code into words.

"&nbsp;&nbsp;" split-subseq

Step 4. Parse each word's morse code, joining and wrapping the text.

[ morse> ] map " " join 60 wrap-string print

The result is:

today's decision by the fcc to encumber broadband internet
services with badly antiquated regulations is a radical
step that presages a time of uncertainty for consumers,
innovators and investors. over the past two decades
a bipartisan, light-touch policy approach unleashed
unprecedented investment and enabled the broadband internet
age consumers now enjoy. the fcc today chose to change
the way the commercial internet has operated since its
creation. changing a platform that has been so successful
should be done, if at all, only after careful policy
analysis, full transparency, and by the legislature, which
is constitutionally charged with determining policy. as a
result, it is likely that history will judge today's actions
as misguided. the fcc's move is especially regrettable
because it is wholly unnecessary. the fcc had targeted tools
available to preserve an open internet, but instead chose to
use this order as an excuse to adopt 300-plus pages of broad
and open-ended regulatory arcana that will have unintended
negative consequences for consumers and various parts of the
internet ecosystem for years to come. what has been and will
remain constant before, during and after the existence of
any regulations is verizon's commitment to an open internet
that provides consumers with competitive broadband choices
and internet access when, where, and how they want.

Sun, 1 Mar 2015 02:19:00

Blogroll


planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp.

Syndicate