Selected coding projects

Nonconsumptive

A standard and set of Python libraries for distributing large textual collections with fast random access, using the Apache Arrow format.

Deepscatter

Fast, animated, interactive online maps that scale easily to billions, not millions, of points using WebGL and Apache Arrow.

Stable Random Projection

General-purpose, lightweight dimensionality reduction for book- or article-length texts. A trick involving cryptographic hashes makes it possible to project texts in any language into the same space without a pre-trained model or dictionary.
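The general idea behind the hashing trick can be illustrated with a minimal Python sketch. This is not the Stable Random Projection package's actual code: the choice of SHA-256, the 160 output dimensions, the log(1 + count) weighting, and the whitespace tokenizer are all illustrative assumptions. What it shows is the core property: each word's random projection vector is derived from its hash alone, so no dictionary or trained model is needed, and the same word always lands in the same place.

```python
import hashlib
import math
from collections import Counter

# Illustrative assumptions, not the package's actual parameters:
# SHA-256 as the hash, 160 output dimensions, log(1 + count) weighting.
DIMS = 160


def token_vector(token: str, dims: int = DIMS) -> list[int]:
    """Deterministic +1/-1 vector for a token, derived only from its hash.

    No dictionary or trained model is needed: any string in any script
    maps to the same vector on every machine.
    """
    bits: list[int] = []
    counter = 0
    while len(bits) < dims:
        # Hash the token with a counter suffix to get as many bits as needed.
        digest = hashlib.sha256(f"{token}:{counter}".encode("utf-8")).digest()
        bits.extend((byte >> i) & 1 for byte in digest for i in range(8))
        counter += 1
    return [1 if b else -1 for b in bits[:dims]]


def project(text: str, dims: int = DIMS) -> list[float]:
    """Sum weighted token vectors into one unit-length document vector."""
    # Whitespace tokenization is a placeholder; languages without spaces
    # between words would need a different tokenizer.
    counts = Counter(text.lower().split())
    vec = [0.0] * dims
    for token, n in counts.items():
        weight = math.log(1 + n)
        for i, sign in enumerate(token_vector(token, dims)):
            vec[i] += weight * sign
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; vectors are already unit length, so a dot product suffices."""
    return sum(x * y for x, y in zip(a, b))
```

Because every document lands in the same fixed space, `similarity(project(doc_a), project(doc_b))` can compare texts from separately processed collections without any shared vocabulary file.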

WordVectors

An R package for training and exploring word2vec models, with a fluent vocabulary that takes advantage of R's ability to add, subtract, and perform other vector-space operations.

Pandoc Svelte Components

An implementation of pandoc's rich document model as Svelte components, allowing the creation of rich interactive documents from Markdown files.

Bookworm

Tools for tokenizing and visually exploring large textual collections backed by an extremely fast MySQL architecture and served over the web through an expressive API.

Markdown Lectures

Document transformation scripts for writing talks and course lectures that simultaneously generate their own slide decks and outlines with identifying terms, keeping everything aligned.