Selected coding projects
Nonconsumptive
A standard and a set of Python libraries for distributing
large textual collections with fast random access, using
the Apache Arrow format.
Deepscatter
Fast, animated, interactive online maps that scale easily to
billions, not millions, of points using WebGL and Apache Arrow.
Stable Random Projection
General-purpose, lightweight dimensionality reduction for
book- or article-length texts. A trick involving cryptographic hashes
makes it possible to use the same space for any language without a
pre-trained model or dictionary.
WordVectors
An R package for training and exploring word2vec models with a fluent
vocabulary that takes advantage of R's ability to add, subtract, and perform
other vector-space operations.
Pandoc Svelte Components
An implementation of pandoc's rich document model as Svelte components,
allowing the creation of rich interactive documents from Markdown files.
Bookworm
Tools for tokenizing and visually exploring large textual collections
backed by an extremely fast MySQL architecture and served over the web
through an expressive API.
Markdown Lectures
Document transformation scripts for writing talks and course lectures that
simultaneously generate their own slide decks and outlines with identifying terms,
to keep everything aligned.
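
The hashing trick mentioned under Stable Random Projection can be illustrated with a minimal Python sketch. This is not the library's actual code: the function names (`word_signs`, `project`, `cosine`), the whitespace tokenizer, the 160-dimension output, and the log weighting are all assumptions chosen for illustration. The idea shown is that each word's cryptographic hash supplies a fixed pattern of +1/-1 values, so any text in any language lands in the same space without a trained model or dictionary.

```python
import hashlib
import math
from collections import Counter

# Assumption for this sketch: one output dimension per bit of a SHA-1 digest.
DIM = 160


def word_signs(word: str) -> list[int]:
    """Turn a word into a fixed list of +1/-1 values taken from its SHA-1 bits."""
    digest = hashlib.sha1(word.encode("utf-8")).digest()
    signs = []
    for byte in digest:
        for shift in range(8):
            bit = (byte >> shift) & 1
            signs.append(1 if bit else -1)
    return signs


def project(text: str) -> list[float]:
    """Sum the sign patterns of every distinct word, weighted by log(1 + count)."""
    counts = Counter(text.lower().split())  # naive whitespace tokenizer (assumption)
    vector = [0.0] * DIM
    for word, count in counts.items():
        weight = math.log(1 + count)
        for i, sign in enumerate(word_signs(word)):
            vector[i] += weight * sign
    return vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Because the sign pattern depends only on the bytes of the word, two people projecting different collections independently get vectors that can be compared directly, for example with `cosine(project(text_a), project(text_b))`.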