Henry James Style

I wanted to see if I could learn anything about style using R and decided to examine Henry James’ short stories. He is an interesting case because he straddles two, maybe three, periods. Early in his career he could be classed as Victorian, and by the end of it he could be seen as an early modernist. Could I use text mining tools to see if there were any changes in style over the course of his career? I set about gathering material and used what I could find on Project Gutenberg (52 stories) to form my corpus. The biggest challenge was cleaning the text. Once I had usable material I was able to do some exploratory analysis.

The first thing I tried was to measure average sentence length. If there’s one thing everyone knows about Henry James, it’s that he wrote notoriously thorny sentences. But did they change over time? My graph suggests that they did.

According to this, he seems to have abandoned shorter sentences in 1890, though this apparent conclusion could be a result of my not having a full set of texts. I don’t think that much can be concluded from the upward trajectory of the trend line. “Julia Bride” is an outlier. Still, “The Beast in the Jungle” is a more indulgent story than “Daisy Miller”. By that, I mean that it is more clearly written for its own sake rather than with a view to earning popular acclaim. But what of “The Turn of the Screw”, which sits pretty close to “The Beast in the Jungle” and isn’t half so difficult to follow? My rough conclusion is that sentence length isn’t a meaningful way to judge a change in style over time.
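For anyone who wants to try the same measure, here is a minimal sketch in base R of how average sentence length per story could be computed and fitted with a trend line. It is not the code behind the graph: the function name mean_sentence_length, the toy data frame and its texts, and the rule of splitting on end punctuation followed by whitespace are all assumptions made for illustration. That splitting rule is crude and will miscount at abbreviations such as "Mr.", which matters for real prose.

```r
# Hypothetical helper: mean number of words per sentence in one text.
mean_sentence_length <- function(text) {
  # Rough heuristic: a sentence ends at ., ! or ? followed by whitespace.
  sentences <- unlist(strsplit(text, "[.!?]+\\s+"))
  sentences <- trimws(sentences)
  sentences <- sentences[nzchar(sentences)]
  # Count whitespace-separated tokens in each sentence.
  words_per_sentence <- lengths(strsplit(sentences, "\\s+"))
  mean(words_per_sentence)
}

# Toy stand-ins for cleaned story texts (invented, not from the corpus).
stories <- data.frame(
  title = c("toy_a", "toy_b", "toy_c"),
  year  = c(1880, 1890, 1900),
  text  = c(
    "Short one. Another short one.",
    "This one runs a little longer. So does this sentence here.",
    "This final toy sentence keeps going for rather a long while, as late James might."
  ),
  stringsAsFactors = FALSE
)

# One average per story, then a simple linear trend over publication year.
stories$avg_len <- vapply(stories$text, mean_sentence_length,
                          numeric(1), USE.NAMES = FALSE)
trend <- lm(avg_len ~ year, data = stories)
```

With real data, plot(stories$year, stories$avg_len) followed by abline(trend) would draw the scatter and the trend line. A single outlying story can tilt the slope noticeably when the corpus is this small.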

Around this time, I was reading Enumerations by Andrew Piper. I was intrigued by the way he used cluster analysis to show the genealogy of different editions of Leaves of Grass. I adapted some of his code for my own purposes (some of it was broken and we had a productive discussion via email about why that might be) and produced this chart:

Unfortunately, I don’t think this told us much. It’s true that there are two main groups and that the group on the right-hand side is largely made up of novellas whereas the remainder on the left are short stories. This suggests that James had two main styles: one for stories and one for novellas. This might be interesting in some general way but it does not get me very far. Style is hard to metricise and reading, here, is the better method. That might change over time and it is a project I intend to return to.
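To give a sense of what a cluster analysis of this kind involves, here is a minimal base R sketch. It is neither Piper's code nor my adaptation of it. It assumes relative word frequencies as features, Euclidean distance and Ward linkage, and the names texts, tokenise and dtm are hypothetical. The four toy texts are invented stand-ins, not James's prose.

```r
# Toy stand-ins for cleaned texts (invented for illustration).
texts <- c(
  story_a   = "the cat sat on the mat and the cat slept",
  story_b   = "the cat lay on the mat and the dog slept",
  novella_a = "she wondered whether he had perhaps understood what she meant",
  novella_b = "he wondered whether she had perhaps understood what he meant"
)

# Hypothetical tokeniser: lowercase, split on anything that is not a letter.
tokenise <- function(x) {
  tokens <- unlist(strsplit(tolower(x), "[^a-z]+"))
  tokens[nzchar(tokens)]
}

tokens <- lapply(texts, tokenise)
vocab  <- sort(unique(unlist(tokens)))

# Document-term matrix: rows are texts, columns are words,
# cells are relative frequencies within each text.
dtm <- t(vapply(tokens, function(tok) {
  as.numeric(table(factor(tok, levels = vocab))) / length(tok)
}, numeric(length(vocab))))
colnames(dtm) <- vocab

# Pairwise distances between texts, then agglomerative clustering.
distances <- dist(dtm, method = "euclidean")
tree      <- hclust(distances, method = "ward.D2")

# Cut the tree into two groups; plot(tree) would draw the dendrogram.
groups <- cutree(tree, k = 2)
```

On these toy texts the two "stories" fall into one group and the two "novellas" into the other, simply because they share vocabulary. A split on real texts may reflect length or subject matter as much as style.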
