John Benediktsson: Non-repeating
The Programming Praxis blog, which recently achieved the milestone of One Million Hits, posted a challenge a couple days to find the first unrepeated character in a string: Given a string, find the first character that appears only once in the string. For instance, given the string “aabbcddd”, the first character that appears only once is the “c” found at the 4th character in the string, counting from 0. Be sure that your program properly handles the case that all characters appear more than once. Our quick-and-dirty solution, using lexical variables, will add each element in the sequence to a hash-set, adding the element to an accumulator if not seen before, removing the element otherwise: :: non-repeating ( seq -- seq' ) We can then use it like so: ! find all non-repeating characters (in-order) An improvement might be, instead of using remove! which will look for and remove all occurrences of an object in a sequence, we could instead only remove the first occurrence (knowing an object will only ever be in a sequence once): : remove-first! ( obj seq -- seq ) How else might we improve it? John Benediktsson: Terminfo
While investigating how to determine if a terminal is color capable, I re-discovered terminfo databases. These database files store the capabilities of terminals in a device-independent manner. tputThe simple answer is to use the tput program to lookup the terminal functionality (using the $ TERM=xterm-256color tput colors If you trace the system calls that ... I wanted to have access to these capabilities from Factor, without running terminfoThe compiled terminfo file is created by the tic program, and begins with a header containing six two-byte short integers:
We can parse this header pretty easily using the pack vocabulary: TUPLE: terminfo-header names-bytes boolean-bytes #numbers The names section comes next, containing the various names for the terminal separated by a "|" character and terminated by a NUL byte ("0"): : read-names ( header -- names ) The boolean section is stored as one byte per boolean flag, either a 0 or 1: : read-booleans ( header -- booleans ) The number section is stored as a sequence of two-byte short integers, aligned to an even byte (meaning if the name and boolean sections consume an "odd" number of bytes, an extra byte is inserted that should be skipped over to ensure the numbers start on an even byte): : read-shorts ( n -- seq' ) The strings are more complex, stored in two sections. The first section is a sequence of two-byte short integers and the second section is a sequence of bytes. To rebuild the string capabilities, interpret the integers as an offset into the string table: : read-strings ( header -- strings ) Putting this all together, we can "parse" our terminfo file into an object: TUPLE: terminfo names booleans numbers strings ; Finally, we can write a parsing word to convert a terminfo file into a terminfo object: : file>terminfo ( path -- terminfo ) /usr/share/terminfoThe terminfo files are stored in MEMO: terminfo-names ( -- names ) If instead, we wanted to lookup a specific terminal, we can map the name of the terminal to a directory. On Mac OS X, these are stored in a sub-directory with the hexadecimal representation of the first byte in the string. On Linux, the first character is the name of the sub-directory: HOOK: terminfo-path os ( name -- path ) Success!With just this much implemented, we can lookup our "max_colors" attribute, knowing it is the 14th number in the numbers table: : max-colors ( name -- n/f ) I added support for parsing all the capabilities into a hashtable, and allowing named lookup (rather than needing to know the offset like we used above). This is available now in the terminfo vocabulary. John Benediktsson: Factor 0.96 now available
"You're smart too late and old too soon." - Mike Tyson I'm very pleased to announce the release of Factor 0.96!
This release is brought to you with over 1100 commits by the following individuals: Alex Vondrak, Benjamin Pollack, Daniel Nagel, Doug Coleman, John Benediktsson, Jon Harper, Michael T. Richter, and @PGGB. Aside from bug fixes and various library improvements, I want to highlight the following changes:
Some possible backwards compatibility issues:
Since quite a few performance improvements have been made, I thought I would highlight some improvements using our benchmarks against version 0.95. The chart below shows benchmark time in 0.96 as a percent of the time it took in 0.95, lower is better: What is FactorFactor is a concatenative, stack-based programming language with high-level features including dynamic types, extensible syntax, macros, and garbage collection. On a practical side, Factor has a full-featured library, supports many different platforms, and has been extensively documented. The implementation is fully compiled for performance, while still supporting interactive development. Factor applications are portable between all common platforms. Factor can deploy stand-alone applications on all platforms. Full source code for the Factor project is available under a BSD license. New Libraries
John Benediktsson: Factorial
Calculating factorial numbers is frequently used to compare programming languages. Usually the implementation is simple, as is the one in Factor: MEMO: factorial ( n -- n! ) I hadn't realized until I skimmed the Wikipedia article on Factorials that there are actually many more kinds of factorials and what Factor really needed was implementations of all of them! Lots of Factorialsprimorial
double-factorial
multifactorial quadruple-factorial
super-factorial
hyper-factorial
alternating-factorial
exponential-factorial
factorial-prime?
primorial-prime?
falling-factorial
factorial-power
rising-factorial
Now available!That's probably more factorials than anyone really cares to know about, but now Factor has more than ten times as many factorials as before! The code could probably use some cleanup, but it is available now in the math.factorials vocabulary. John Benediktsson: Bitwise permutations
Using bit twiddling hacks can be fun. When I noticed compute the lexicographically next bit permutation, I knew it could be a neat feature for Factor. The idea is that given a number like 7 (represented as Next PermutationThe "bit hacks" website uses this C code to calculate the next permutation of bits: unsigned int v; // current permutation of bits A direct translation into Factor looks like this: : next-permutation-bits ( v -- w ) Permutation WordsFactor has a math.combinatorics vocabulary containing useful operations on permutations and combinations for sequences. Now that we have a way of obtaining the next bitwise permutation of a number, I thought we should have similar words for operating on permutations of bits. A helper word takes : permutation-bits-quot ( bit-count bits quot -- n pred body ) That might look a little complicated, but it makes these operations fairly simple: : each-permutation-bits ( bit-count bits quot: ( n -- ) -- ) Try ItYou can then do things like this: ! find all 5-bit numbers with 3 bits set This is available in the main repository in the math.combinatorics.bits vocabularly. John Benediktsson: move
I've used Factor to build several common unix programs including copy, cat, fortune, wc, and others. Today, I wanted to show how to build the
We can make a nice usage string to display if the arguments are not correct: : usage ( -- ) Moving files into a directory (and displaying the usage if the destination is not a directory): : move-to-dir ( args -- ) If we specify two arguments, we are either moving a single file into a directory, or renaming a file using the move-file word: : move-to-file ( args -- ) A "main" word checks the number of arguments and performs the appropriate action: : run-move ( -- ) An improvement that could be made would be producing better error messages when files don't exist. Something like the errors $ mv src dst The code for this is available on my Github. John Benediktsson: Improved hash-sets
For quite a long time, the hash-sets vocabulary in Factor has had this note at the top: ! In a better implementation, less memory would be used By wrapping a hashtable, we had a pleasantly simple set implementation: M: hash-set in? table>> key? ; Unfortunately, this implementation had the consequence of increased memory usage (members of the set would be stored twice - as both key and value in the hashtable) and some performance implications due to collecting the underlying hashtable into an array and then throwing out the values to return only the keys. After a new hash-set implementation and a series of minor improvements, we had a greatly improved hash-set. While working on this, I realized we could speed up growing the backing array and applied a similar improvement to hashtables. Some performance results using our benchmark.hash-sets test: Before: IN: scratchpad gc [ hash-sets-benchmark ] time After: IN: scratchpad gc [ hash-sets-benchmark ] time I have since converted various parts of the compiler infrastructure to use hash-sets which had previously used hashtables for set operations and have gotten some overall performance improvements. This is available now in the development version of Factor. John Benediktsson: Faster "shuffle"
A funny comment on a Reddit discussion of my last post about "shuffle" said: "Turtle races are fun indeed." ImprovementsWell, that is perhaps true given the performance of Factor, Python, and Ruby compared with implementations in C (or SBCL and LuaJIT). However, I'm happy to say that as of today, the performance of randomize (the standard library "shuffle" word) in Factor is greatly improved. Before:IN: scratchpad 10,000,000 iota >array After:IN: scratchpad gc [ randomize ] time IN: scratchpad gc [ randomize ] time CommitsSo, what changed? Well, while investigating the overhead of Factor versus C, I fixed a few things:
Looking at other programming languages is useful to know where your performance can improve and by approximately how much. Since Factor is an attempt at making a high level language that produces relatively fast code, I hope someday to turn this "turtle" into a "hare". Every little bit helps! |
Blogroll
planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp. |