05 Oct 20

This article discusses a relatively unknown data structure, the suffix tree, and shows how its characteristics can be used to attack difficult string matching problems.

by mlb

22 Sep 18

Article that describes the steps to generate the code in MS Excelwithout using any Visual Basic or other scripting.

by mlb

30 Aug 17

Overview of several approaches to rank results by an average score provided by users, and why most fail to provide a satisfying solution. Concluding with a formula that yields better results.

by mlb

10 Aug 16

Notes on how to programmatically generate fictional maps that look like something you’d find at the back of fantasy novels.

by mlb

19 Jan 12

Efficiently querying geospatial data is a considerable challenge: because the data is two-dimensional (or sometimes, more), you can’t use standard indexing techniques to query on position. Spatial indexes solve this through a variety of techniques. This post covers several - quadtrees, geohashes, and space-filling curves - and reveal how they’re all interrelated.

by mlb

29 Nov 11

Notes on the following algorithms and methods:

  • Levenshtein distance
  • Damerau-Levenshtein distance
  • Bitap algorithm with modifications by Wu and Manber
  • Spell-checker method
  • N-gram method
  • Signature hashing method
  • BK-trees
by mlb


02 Sep 11

Most people use this web site to get information about a particular number sequence. If you are a new visitor, then you might ask the database if it can recognize your favorite sequence, if you have one.

by mlb

05 May 11

If you have a web site with a search function, you will rapidly realize that most mortals are terrible typists. Many searches contain mispelled words, and users will expect these searches to magically work. This magic is often done using levenshtein distance. In this article, I’ll compare two ways of finding the closest matching word in a large dictionary.

by mlb

27 Aug 10

Pattern matching algorithm used in GNU Grep (see: http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html )

by mlb

06 Aug 10

The purpose of this note is to illustrate how the ordinary standards of ran- domness have little to do with the type of randomness required for cryptographic purposes. That is, there are really two standards of randomness.

by mlb

02 Jul 10

“When approaching the string comparison optimization problem, what we would like to do is to provide effective and efficient ways to rule out most of the candidate strings. We may refer to it as a “disqualifying comparison” - it lets us move faster down the search tree or move faster along the hash bucket linked list, until reaching the final string comparison in the search, keeping in mind that even the most efficient hash structure would probably waste a substantial amount of its time and cycles in string comparison.”

by mlb


25 Jun 10

That’s the crazy thing about malloc implementations. They all claim to be awesome in every way. So the burning question is: is this better than what I’m using today? I don’t know, but I’d sure like to!

by mlb

15 Mar 10

Most credit card numbers are validated using an algorithm called the “Luhn check”. This is a very simple algorithm that doubles the odd digits and does a sum to see if the number is divisible by 10.

by mlb

27 Jan 10

A new algorithm is proposed for removing large objects from digital images. The challenge is to fill in the hole that is left behind in a visually plausible way. A. Criminisi*, P. P´erez and K. Toyama

by mlb