John Benediktsson: Cuckoo Filters
A Cuckoo filter is a Bloom filter replacement that allows for space-efficient probabilistic membership checks. Cuckoo filters provide the ability to add and remove items dynamically without significantly degrading space and performance. False positive rates are typically low. This data structure is explained by Bin Fan, Dave Andersen, Michael Kaminsky, and Michael Mitzenmacher in two papers: Cuckoo Filter: Better Than Bloom and Cuckoo Filter: Practically Better Than Bloom. There is also an implementation in C++ that can be referred to. The Cuckoo filter is basically a dense hash table that can support high load factors (up to 95%) without degraded performance. Instead of storing objects, we will store a hashed fingerprint. BucketsFirst, we need to create a number of buckets. Each bucket will hold 4 fingerprints. Load factors over 96% will cause us to grow our capacity to the next-power-of-2. ! The number of fingerprints to store in each bucket Making our buckets is then just an array of arrays: : <cuckoo-buckets> ( capacity -- buckets ) Given a fingerprint, we can check if it is in a bucket by calling member?: : bucket-lookup ( fingerprint bucket -- ? ) To insert a fingerprint into the bucket, we find the first empty slot and replace it with the fingerprint. We return a boolean value indicating if we were able to insert it or not: : bucket-insert ( fingerprint bucket -- ? ) To delete a fingerprint, we finding its index (if present) and set it to false. : bucket-delete ( fingerprint bucket -- ? ) If the bucket is full, we need to be able to swap a fingerprint into the bucket, replacing/removing an existing one: : bucket-swap ( fingerprint bucket -- fingerprint' ) HashingOur hashing strategy will be to generate the SHA-1 hash value for a given byte-array, splitting it into two 32-bit values (a 32-bit fingerprint, and a 32-bit index value). We will also generate an alternate index value as well using a constant from the MurmurHash to mix with the primary index: : hash-index ( hash -- fingerprint index ) Insert/Lookup/DeleteOur Cuckoo filter holds our buckets: TUPLE: cuckoo-filter buckets ; To insert an item into the Cuckoo filter, we calculate its ! The maximum number of times we kick down items/displace from To lookup an item, we calculate the :: cuckoo-lookup ( bytes cuckoo-filter -- ? ) To delete an item, we calculate the :: cuckoo-delete ( bytes cuckoo-filter -- ? ) This is available in the cuckoo-filters vocabulary along with some tests, documentation, and a few extra features. John Benediktsson: Backticks
Most languages support running arbitrary commands using something like the Linux system function. Often, this support has both quick-and-easy and full-featured-but-complex versions. In Python, you can use os.system: In Ruby, you can use system as well as "backticks": Basically, the difference between "system" and "backticks" is:
Factor has extensive cross-platform support for launching processes, but I thought it would be fun to show how custom syntax can be created to implement "backticks", capturing and returning standard output from the process: SYNTAX: ` You can use this in a similar fashion to Ruby or Perl: IN: scratchpad ` ls -l` Note: This syntax currently requires a space after the leading backtick. In the future, we have plans for an improved lexer that removes this requirement. This is available in the backticks vocabulary. John Benediktsson: Clock Angles
Programming Praxis posted about calculating clock angles, specifically to: Write a program that, given a time as hours and minutes (using a 12-hour clock), calculates the angle between the two hands. For instance, at 2:00 the angle is 60°. Wikipedia has a page about clock angle problems that we can pull a few test cases from: The hour hand moves 360° in 12 hours and depends on the number of hours and minutes (properly handling midnight and noon to be :: hour° ( hour minutes -- degrees ) The minute hand moves 360° in 60 minutes: : minute° ( minutes -- degrees ) Using these words, we can calculate the clock angle from a time string: : clock-angle ( string -- degrees ) John Benediktsson: left-pad
In the wake of an epic ragequit where Azer Koçulu removed all of his modules from npm (the node.js package manager), there have been so many entertaining discussions and explanations covering what happened. Today, Programming Praxis posted the leftpad challenge, pointing out that the original solution ran in quadratic time due to it's use of character-by-character string concatenation (but not pointing out that it only works with strings). First, the original code in Javascript: function leftpad (str, len, ch) { Now, a (simpler? faster? more general?) version in Factor: :: left-pad ( seq n elt -- newseq ) Using it, you can see it works: IN: scratchpad "hello" 3 CHAR: h left-pad . And it even works with other types of sequences: IN: scratchpad { 1 2 3 } 3 0 left-pad . I should also point out that Factor has pad-head that does this in the standard library and node.js has a pad-left module that solves the quadratic time problem (but still only works with strings). John Benediktsson: ISBN
Most books are issued a unique International Standard Book Number (ISBN) number. Often different formats of the same book will have different ISBN numbers. On a print book, you can usually find the ISBN on a barcode on the back cover. Most countries seem to have a national ISBN registration agency. In some countries this is a free service provided by a government agency. In other countries, this is operated by a commercial entity. In the United States, one company (R.R. Bowker LLC) has an apparent monopoly on issuing ISBN numbers which can cost $125 for one (less if you buy in bulk). The ISBN is 13 digits long if assigned starting in 2007, and 10 digits long if assigned before 2007. Each ISBN contains a check digit which is used for basic error detection We are going to build a few words in Factor to calculate the check digits and validate ISBNs. We need to turn an ISBN (which might include spaces or dashes) into numeric digits: : digits ( str -- digits ) For ISBN-10, the check digit is the sum of each of 10 digits multiplied by a weight (descending from 10 to 1) modulo 11. : isbn-10-check ( digits -- n ) For ISBN-13, the check digit is the sum of each of 13 digits multiplied by a weight (alternating between 1 and 3) modulo 10. : isbn-13-check ( digits -- n ) We can validate an ISBN by grabbing the digits and running either the ISBN-10 or ISBN-13 check and verifying that the result is zero. : valid-isbn? ( str -- ? ) The code (and some tests) for this is on my GitHub. John Benediktsson: Pig Latin
Pig Latin is a somewhat ridiculous language game which modifies words in such a funny way that is hard to figure out if you don't know how it works but easy if you do. Using Factor, we will build a converter from English to Pig Latin words. There are two basic rules we should implement:
We can implement our two basic rules: : pig-latin ( str -- str' ) We could improve this by:
Anyway, this is available on my GitHub. John Benediktsson: Bowling Scores
Today we are going to explore building a bowling score calculator using Factor. In particular, we will be scoring ten-pin bowling. There are a lot of ways to "golf" this, including this short version in F#, but we will build this in several steps through transformations of the input. The test input is a string representation of the hits, misses, spares, and strikes. The output will be a number which is your total score. We will assume valid inputs and not do much error-checking. A sample game might look like this: 12X4--3-69/-98/8-8- Our first transformation is to convert each character to a number of pins that have been knocked down for each ball. Strikes are denoted with : pin ( last ch -- pin ) We use this to convert the entire string into a series of pins knocked down for each ball. : pins ( str -- pins ) A single frame will be either one ball, if a strike, or two balls. We are going to use cut-slice instead of cut because it will be helpful later. : frame ( pins -- rest frame ) A game is 9 "normal" frames and then a last frame that could have up to three balls in it. : frames ( pins -- frames ) Some frames will trigger a bonus. Strikes add the value of the next two balls. Spares add the value of the next ball. We build this by "un-slicing" the frame and calling sum on the next balls. : bonus ( frame -- bonus ) We can score the frames by checking for frames where all ten pins are knocked down (either spares or strikes) and adding their bonus. : scores ( frames -- scores ) We can solve the original goal by just adding all the scores: : bowl ( str -- score ) And write a bunch of unit tests to make sure it works: This is available on my GitHub. John Benediktsson: Haikunator
The Haikunator is a project to provide "Heroku-like memorable random names". These names usually consist of an adjective, a noun, and a random number or token. The original repository is implemented in Ruby, with ports to Go, Javascript, Python, PHP, Elixer, .NET, Java, and Dart. We will be implementing this in Factor using the qw vocabulary that provides a simple way to make "arrays of strings" using the qw{ syntax. First, a list of adjectives: CONSTANT: adjectives qw{ Next, a list of nouns: CONSTANT: nouns qw{ We will make a token out of digits: CONSTANT: token-chars "0123456789" Finally, a simple : haikunate ( -- str ) We can try it a few times, to see how it works: IN: scratchpad haikunate . Some versions of "haikunate" in other languages include features such as:
This is available on my GitHub. |
Blogroll
planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp. |