Sequences
Sequences are the key abstraction that connects two of the most important parts of Clojure - immutable persistent collections and the sequence library. Lisp has a deep tradition of list-oriented transformation functions. In Clojure, these functions have been separated from the specific linked list data structure and built instead upon the sequence abstraction. Sequences are logical lists that can be sequentially traversed.
The key sequence abstraction functions are:
(first coll)
- return the first element(rest coll)
- returns a logical collection of the rest of elements (not necessarily a seq). Never returns nil.(next coll)
- returns a seq of the rest of the elements, which will be nil if no elements remain.(cons item seq)
- construct a new sequence with the item prepended to seq
Sequences are traversed by a series of first
and rest
or next
calls. As elements of the sequence are realized, they are cached. Sequences are immutable unfolding views of a source (whether a collection or something else).
rest
and next
are importantly different in their usage. rest
returns a logical collection (possibly a sequence, possibly not). This allows some sequences to be lazier about forcing the evaluation of the next element before it’s needed. As the lazier choice, rest
is usually preferred when traversing a sequence.
next
may evaluate the next element and will return either nil or a sequence containing at least one element. (next s)
is equivalent to (seq (rest s))
.
Terminology and empty sequences
There is some very careful language used in the oldest docs and docstrings around the terminology of sequence, seq, etc. As far as I read it, a collection produces a seq, a lazy-seq
returns a lazy-seq, and a sequence is either a seq or a lazy-seq. This distinction was (at one point) reflected in the underlying Java interfaces, however that no longer exists and modern usage treats seq and sequence as interchangeable terms.
One related and confusing aspect of sequences centers around the representation of a sequence of no elements. A seq (or the result of calling seq
or next
) will always be either nil or a sequence with at least one element. The result of calling sequence
or rest
will represent an empty sequence as a non-nil empty sequence, which may be an empty list, an unrealized empty lazy sequence, or something else.
Depending on context, either nil or an empty sequence may be the expected representation.
Seqable
Collections or other things that can produce a sequence are seqable. Calling seq
on a seqable will return either nil or a sequence with at least one element. Because of this behavior, it is idiomatic to use seq
as a terminating condition check (aka nil-punning) while iterating a sequence:
Sequence functions (map, filter, etc) implicitly call seq
on the incoming (seqable) collection and return a sequence (possibly empty, not nil). You must use seq
on the result to nil pun.
Immutability
Because sequences are immutable, not stateful, they are safe to pass between threads, use in reference types, and use almost anywhere you would use persistent collections. Being able to blur the lines between use of collections and sequences is an important part of Clojure usability where things usually “just work”.
One place where the rule of immutability is bent is when working with Java interop. Sequences built on Java iterators (which are stateful) will be affected by the stateful nature of the source. If the iterator source is changing, all the normal iterator rules will apply depending on the iterator type, including the possibility of surfacing a ConcurrentModificationException
or missing values added during sequence traversal for concurrent iterators.
Sequences built on Java arrays are subject to mutability and do not separately cache values for performance. If you wish to create a safer sequence on a Java array, you must copy it first.
For example, as a Clojure programmer you should find it deeply disturbing to get different answers to calling first on a sequence over a mutated array:
So don’t do that. If your code fully controls the Java array backing the sequence, this is not an issue. If you need to protect against array mutation, copy the array into a new array instead:
Laziness
Because sequences are only evaluated an element at a time, they can represent an infinite stream of computed (or extracted) values. Lazy sequences can be created with the lazy-seq macro, which takes a body that yields a seq, nil, or anything seqable.
For example, a simplified version of map could be written:
Here, seq
is called on the incoming coll, which will return either nil (in which case nil is returned, terminating the lazy-seq) or a sequence with at least one element. We return a new sequence built using cons
on the mapped first element and a a nested call to map
. This looks like recursion, but as lazy-seq
is a macro, this is really an unfolding that occurs in separate stacks as the seq evaluates, not recursion.
One particular issue with lazy-seq
is that the body of the lazy-seq can close over locals in some cases, causing the entire lazy sequence to inadvertently be held in memory, even when no users of the lazy seq are retaining a reference to the sequence “head”. This is called “head-holding”.
The lazy-seq
macro specfically addresses this by clearing the locals in the tail call of the body. Because seqs cache their values, the body will only be invoked once per evaluation and clearing locals is safe.
Many of the lazy sequence functions in Clojure amortize the realization of elements by producing a “chunk” of N elements at a time. Chunking is an interesting (and very effective) optimization that is woven throughout various important parts of Clojure. I’m going to avoid talking about it further here though and leave that for a future post.
Java perspective
Clojure is implemented in both Java and Clojure. Many of the most important sequence interfaces and implementations are implemented in Java:
clojure.lang.ISeq
- the sequence abstraction interfaceclojure.lang.ASeq
- an abstract sequence for easier sequence implementationclojure.lang.LazySeq
- lazy sequence implementationclojure.lang.Seqable
- seqable markerclojure.lang.Sequential
- a collection trait indicating whether a collection is sequential (an ordered series of values). Lists, vectors, and effectively all seqs are sequential.
The Clojure list implementation (clojure.lang.PersistentList
) is a list data structure and thus is both a concrete data structure and also implements the ISeq abstraction directly. All of the other collections are Seqable, but not ISeq.
Predicates and functions
Some other important sequence predicates:
seq?
- checks whether an instance implements ISeqsequential?
- checks whether an instance implements Sequential
Update: There is no seqable?
predicate, although that might be useful to add. Such a function already exists in clojure.core.incubator and it’s implementation demonstrates that there are a number of cases that seq
can also handle: existing seqs, nil, Iterable, arrays, Strings, and java.util.Map instances.
Reference
A few important reference pages:
- Sequences - the main sequence documentation
- Lazy - a historical document at the time of the last major overhaul of Clojure sequences
- Cheat sheet - a great list of categorized sequence functions