rejoyce: A few implementations of flatten

A few implementations of flatten

December 20, 2014

The Set-up

Yesterday, a friend challenged me to reimplement flatten in Clojure.

In case you've never used flatten before, it takes a sequence that may have nested sequences inside of it, and returns a sequence with no sequences inside. Here's what it looks like in action:

(def simplest-possible-case '(1 2 (3) 4 5))
(flatten simplest-possible-case)
; => '(1 2 3 4 5)

Ya folla?

My initial goal was to reproduce the The Little Schemer's recursive solution. However, I had a few problems. First, I had forgotten the book's "Fourth Commandment," namely to plan for a termination condition / base case.

Once I remembered that I needed a base case, I was still a little confused. In the Scheme solution, if you've hit something that is an empty list, you return an empty list-- without calling the flatten function again. All of the other items can cons onto that empty list.

For some reason, I had thought that Clojure's differences with other Lisps would prevent me from consing onto an empty list. This is not the case:

(cons 4 '())
(cons 4 [])
(cons 4 nil)
;; => (4)

These all return the same result, and so any of them would work. But using nil as the base case's return value is the most idiomatic Clojure solution.

So there are three cases: you have an empty list, you have something (an atom), or you have a non-empty list. The first is the base case, and we know how to handle that: return nil.

Usually, a Scheme solution for checking the other cases uses an atom? or a (not (pair? x)) function. There isn't an atom? function in Clojure, so I made one:

(def atom? (complement sequential?))

Now we can make a Scheme-esque solution.

(defn flat
  [s]
  (let [f (first s)
        r (rest s)]
    (cond (empty? s) nil
          (atom? f) (cons f (flat r))
          ;; (sequential? f)
          :else (concat (flat f) (flat r)))))

This gets the job done. Still, I knew I wasn't making the most out of Clojure's core functions.

The Crew

Rosetta Code is a website that lists many standard programming tasks and puzzles, including flattening a list, and then lists solutions in many languages. Here's what it has for Clojure:

(defn flat [coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (if (coll? (first s))
        (concat (flat (first s)) (flat (rest s)))
        (cons (first s) (flat (rest s)))))))

Just like a sly con man or a Hollywood actor, this solution is pretty handsome. It's a lazy-seq; it uses a when-let / seq combo to take care of the base case, and wraps everything else into a concise if statement.

It does have one bug: by using coll? instead of sequential?, it will behave strangely when a sequence contains something like a map, which is a collection but not a sequence. In particular, calling this function on '({:a 3}) will return an empty list.

Even after fixing that problem, the truth has to come out: Clojure core has a different solution, which we simply have to give a scratch of our noses to.

The nose-touch

Hickey & co. use a function called tree-seq, to implement flatten. If you're like me, you've never even seen it before. Still, its name is a big clue.

The main idea is to think of flatten not just in terms of recursion, but in terms of a tree. You pass flatten a tree-like sequence, and you expect it to return the "leaves" of that tree.

Here's the source of tree-seq:

(defn tree-seq
  [branch? children root]
  (let [walk (fn walk [node]
               (lazy-seq
                (cons node
                      (when (branch? node)
                        (mapcat walk (children node))))))]
    (walk root)))

tree-seq takes three arguments: a predicate function, a function, and a sequence. These parameters are called branch?, children, and root.

The first value in the lazy-sequence is the input value. Then we check if the input value is a branch with the branch? predicate. If it is, we call the children function on it to get its children, and keep "walking" the function recursively. If it isn't, return nil, so that the highest level of the recursive calls can return a sensible list.

So tree-seq returns all of the branches and all of the leaves, as determined by the branch? and children functions. In flatten, we are expecting a sequence of sequences; thus, the branch? predicate is sequential, and the children function is seq. (We'll get to the rest of its source in a moment.) You'll recognize that both of these were used in the other flatten solutions.

Let's look at the result of calling tree-seq with sequential?, seq, and our simplest possible case:

(def result (tree-seq sequential? seq simplest-possible-case))
result
; => ((1 2 (3) 4 5) 1 2 (3) 3 4 5)

Woah! That result looks a little weird, and it sure isn't flat. But it does look like all of the branches (including the input value), and all of the leaves.

With flatten, we just want the leaves, or all the atoms. So this would be the obvious thing to do:

(filter atom? result)
; => (1 2 3 4 5)

That works great! But since we know that the very first item in the list will be the un-flattened input value, we can call rest on the result first, without losing any leaves:

(filter atom? (rest result))
; => (1 2 3 4 5)

Same return value, but this is presumably more efficient. And that, my friends, is how Hickey & co. implement flatten!:

(defn flatten [x]
  (filter (complement sequential?)
          (rest (tree-seq sequential? seq x))))

Aw yeah!

Success!

What else could we use tree-seq for? At this point, it should be pretty easy to implement a function that is the opposite of flatten:

(defn branches [x] (filter sequential? (tree-seq sequential? seq x)))

(branches simplest-possible-case)
; => ((1 2 (3) 4 5) (3))

It's not immediately obvious to me what you might use that for. I did think of another use of tree-seq that might actually come in handy: finding all of the values in a series of nested maps. Here's how:

(def nested-map {:a {:b {:c {:d 4}}} :e {:f 6}})

(defn all-vals 
  [m]
  (remove map? (rest (tree-seq map? vals m))))

(all-vals nested-map)
; => (6 4)

So we've read and evaluated several solutions of flatten, including Clojure core's solution. And we have a new Clojure core function under our belts.

You might be interested in checking out some other uses of tree-seq. It's used in two other core Clojure functions, and a handful of libraries. And who knows! You or I might make one of those libraries one day.

One last thing: it's true that flatten and tree-seq are unrelated to that wonderful film "The Sting." But the real Sting is that all of this came about by way of trying to solve Lewis Carroll's / Carin Meier's "Doublets" kata, and I have yet to solve it.

Perhaps my friend is right, though, and thinking about flatten will help a solution emerge...

Tags: Clojure

« 2014: A Year of Emacs || First post »

A few implementations of flatten

Links

Recent Posts

Tags