Posts with tag bookworm
Andrew Piper announced yesterday that the McGill text lab is releasing their corpus of modern novels in three languages. One of my first thoughts with any corpus is: what existing Bookworm methods might add some value here? It only took about ten minutes to write the code to import it into a bookworm; the challenge is figuring out how methods developed for millions of books can be useful on a set of just 450.
A first pass at understanding the potential of the Hansard corpus through a Bookworm browser.
I’ve divided the native XML into individual speeches, using its intrinsic speaker tag to mark where each one begins.
A “speech” can be very short; on average, each one in the Hansard corpus is 225 words.
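As a rough illustration of that kind of split, here is a minimal Python sketch. The markup is an assumption: the element names `debate`, `speaker`, and `p`, the sample text, and the helper names `split_speeches` and `mean_words` are all hypothetical, since the actual Hansard XML is not shown here. The idea is only that each speaker tag closes the previous speech and opens a new one.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real Hansard markup is not reproduced in this post.
SAMPLE = """<debate>
  <speaker>Mr. Smith</speaker>
  <p>I beg to move the motion.</p>
  <p>It is a short one.</p>
  <speaker>Mrs. Jones</speaker>
  <p>I second it.</p>
</debate>"""


def split_speeches(xml_text):
    """Return a list of (speaker, text) pairs, one per speech."""
    speeches = []
    current_speaker, current_parts = None, []
    # Iterating over the root element visits its direct children in order.
    for element in ET.fromstring(xml_text):
        if element.tag == "speaker":
            # A new speaker tag closes the speech in progress, if any.
            if current_speaker is not None:
                speeches.append((current_speaker, " ".join(current_parts)))
            current_speaker = (element.text or "").strip()
            current_parts = []
        elif current_speaker is not None:
            # Any other element is treated as part of the current speech.
            current_parts.append("".join(element.itertext()).strip())
    if current_speaker is not None:
        speeches.append((current_speaker, " ".join(current_parts)))
    return speeches


def mean_words(speeches):
    """Average number of whitespace-separated words per speech."""
    if not speeches:
        return 0.0
    return sum(len(text.split()) for _, text in speeches) / len(speeches)


speeches = split_speeches(SAMPLE)
```

Calling `mean_words(speeches)` on the real corpus is the sort of computation that would produce an average speech length like the one quoted above.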
There’s no full description of the D3 bookworm package yet, because it’s still something of a moving target.
But Abby Mullen wanted to know what the different possibilities were for charts through the API, so I thought it was time to give a quick tour.
Core chart types
Bookworm 0.4 is now released on GitHub. It contains a number of improvements to the code from over the summer. Drawing on the experience of the many people who have used it so far, it makes the existing code much, much more sensible for anyone wanting to build a bookworm on their own collections of texts. All the stages (installation, configuration, and testing) are now a lot easier. So if you have a collection of texts you wish to explore, I welcome you to test it out. (I’ll explain at more length later, but for the absolute lowest investment of time you can just run a prebuilt bookworm virtual machine using Vagrant.)
This post is just kind of playing around in code, rather than any particular argument. It shows the outlines of using the features stored in a Bookworm for all sorts of machine learning, by testing how well a logistic regression classifier can predict IMDB genre based on the subtitles of television episodes.
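For readers who want a concrete picture of the technique, here is a minimal sketch of a bag-of-words logistic regression in plain Python. It is not the code from the post: the four toy "episodes", the two genre labels, and the function names `train` and `predict_proba` are hypothetical stand-ins, and a real experiment would draw word counts from a Bookworm and use a proper machine learning library.

```python
import math
from collections import Counter

# Hypothetical toy data standing in for subtitle text and IMDB genre labels.
EPISODES = [
    ("murder detective police evidence", 1),  # 1 = crime
    ("police suspect murder trial", 1),
    ("laugh joke funny party", 0),            # 0 = comedy
    ("funny wedding joke date", 0),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(examples, epochs=200, learning_rate=0.5):
    """Fit binary logistic regression on word counts by batch gradient descent."""
    weights, bias = {}, 0.0
    counted = [(Counter(text.split()), label) for text, label in examples]
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for counts, label in counted:
            z = bias + sum(weights.get(w, 0.0) * c for w, c in counts.items())
            error = sigmoid(z) - label  # gradient of log loss w.r.t. z
            for w, c in counts.items():
                grad_w[w] = grad_w.get(w, 0.0) + error * c
            grad_b += error
        # Step against the gradient, averaged over the examples.
        for w, g in grad_w.items():
            weights[w] = weights.get(w, 0.0) - learning_rate * g / len(counted)
        bias -= learning_rate * grad_b / len(counted)
    return weights, bias


def predict_proba(text, weights, bias):
    """Estimated probability that the text belongs to class 1 (crime)."""
    counts = Counter(text.split())
    z = bias + sum(weights.get(w, 0.0) * c for w, c in counts.items())
    return sigmoid(z)


weights, bias = train(EPISODES)
```

A probability above 0.5 from `predict_proba` is read as the crime label and anything below as comedy; words the model never saw contribute nothing to the score.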
I just saw Matt Wilkens’ talk at the Digital Humanities conference on places mentioned in books; I wanted to put up, mostly for him, a quick stab at some of the raw data running the equivalents on my movie bookworm.
This is a quick post to share some ideas for interacting with the data underlying the recent article by Ted Underwood and Jordan Sellers on the pace of change in literary standards for poetry.
Here are some interactives I’ve made in preparation for my talk at the Literary Lab at Stanford on Tuesday on plot arcs in television shows based on underlying language.
This is sort of in lieu of a handout for the talk, so some elements may not make much sense if you aren’t in the room.
Though more and more outside groups are starting to adopt Bookworm for their own projects, I haven’t yet written quite as much as I’d like about how it should work. This blog is an attempt to rectify that, and begin to explain how a combination of blogging software, interactive textual visualizations, and an exploratory data analysis API for bag-of-words models can make it possible to quickly and usefully share texts through a Bookworm installation.