Visualizing Tolkien’s readability

While I’m somewhere between Brazil and Argentina on vacation, I’ll leave you with this interesting article about the “readability” of Tolkien’s three major books: The Lord of the Rings, The Hobbit and The Silmarillion.

Which one of them is the hardest to read? Which one might a person who’s not a fan find a bit boring?

Let’s not guess at the answers here; let’s analyze the data instead! Read the article below:

Visualizing Tolkien

Why?

First and foremost, I am a huge fan of J.R.R. Tolkien’s work. I have lost count of the number of times I have read Lord of the Rings. But I had not read The Hobbit or The Silmarillion yet, and decided to put an end to that situation.

Prior to purchasing it, I read several reviews of The Silmarillion. One of the reviewers argued that it was the hardest book to read because ‘and’ was the most used word in the book. I wondered if that was the case. And if not, why is it that The Silmarillion is so hard to read? I can testify that it is definitely hard to read: I attempted it at least three times in the past year, but I ended up reading several other books instead, including The Lord of The Rings yet again.

The classic graphs

To find out if ‘and’ is the most frequent word in The Silmarillion, I wrote a simple program that counted how many times each word appears in the book. This quickly contradicted the reviewer’s claim, since the most frequent word in The Silmarillion is the, followed by and and of.
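The article does not show the program itself, so what follows is only a minimal sketch of how such a word counter might look in Python. The function name `word_frequencies`, the tokenization rule and the sample sentence are all assumptions made for illustration, not the author’s code.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs, ignoring case."""
    # Assumed tokenization: runs of letters, optionally joined by apostrophes.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


# Made-up sample sentence standing in for the full text of a book.
sample = "The sun and the moon and the stars of the sky"
freqs = word_frequencies(sample)

# Print the three most frequent words with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

To run it on a whole book, one would read the text from a file and pass it to the same function; how punctuation and hyphenated words are split will shift the exact counts a little.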

Obviously, as I had this program and it could analyse any text instantly, I thought that maybe I could analyse Tolkien’s other two main works: The Hobbit and The Lord Of The Rings. If I place all the results side by side, I may be able to deduce why The Silmarillion is not as readable as the others, I said to myself. So I did:

Interestingly enough, all three books share the same top three words. In fact, all of their top words are pretty much the same (the, and, of, in, to, he, that, …). So it was totally unfair (as well as incorrect) to blame them for a book’s lack of readability.

What about the proportions and the distribution of words? If we compare the shapes of each chart together, it is easy to see that while the shapes for The Hobbit and The Lord of The Rings charts are really similar, the same does not occur with The Silmarillion, where there is a huge quantitative difference between the top three words and the rest. Now that might explain something!

But I was not satisfied with this analysis yet. You cannot reduce style differences to numbers only; there were a number of factors that I had not considered yet: relations between words, typical constructions, language richness, even the length of the text itself! So I built a few more charts:

The word count chart confirms something we knew: The Hobbit is shorter than The Lord of The Rings. It slightly surprises me, though, by showing so clearly that The Silmarillion is almost half the length of LOTR, especially because it does not feel that way to the reader.

Maybe the readability differences could be attributed to the originality index? That is an index that I “invented”: the number of unique words in each book divided by its total word count. That would give us another way of comparing the books. But the originality index chart is surprising as well. I expected The Hobbit to have the lowest index, since that was the book I perceived as the easiest to read; in fact, I even thought it was slightly dumb at certain points, too child-oriented. But I was wrong: proportionally it is the most original book, and according to this chart The Silmarillion would be only a bit less enjoyable than LOTR, which, with an index of only 3%, should be a bore.
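The originality index described above is simple enough to sketch in a few lines of Python. The function name `originality_index`, the tokenization rule and the sample sentence are assumptions for illustration; the article does not show how the original figures were computed.

```python
import re


def originality_index(text):
    """Return unique words divided by total words (0.0 for empty text)."""
    # Assumed tokenization: runs of letters, optionally joined by apostrophes.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# Made-up sample: 7 words in total, 6 of them distinct ("on" appears twice).
sample = "the road goes ever on and on"
index = originality_index(sample)
print(f"{index:.2%}")
```

This ratio is commonly known as the type-token ratio, and it tends to fall as a text gets longer, so a short book such as The Hobbit may score higher partly because of its length.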

My assumptions were not working, because LOTR is not a bore!

Could it be that I had taken the stop words into account when I should not have? I am referring to common English words such as the, and, of…, which are the most frequent in these works! On the one hand, I was very tempted to run the program again, excluding those words. On the other hand, I did not believe it would be a good idea, since when we read a book, we read the stop words as well. We are not one of those rudimentary search engines that need to filter information out in order to distinguish keywords! If I removed those words from the text, the results would correspond to entirely different books.

Still, I decided to build a simple chart comparing the proportion of stop words vs. non-stop words. Again, the results were surprising. One would expect The Silmarillion to have more filler text, but it was quite the contrary, with The Hobbit being the richest in stop words. In any case, the differences between the books were not very significant.
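The stop-word comparison can be sketched the same way. The list below is a deliberately tiny, hypothetical one built from the top words mentioned earlier plus “a”; the article does not say which stop-word list was actually used, and a real list would be much longer. The function name `stop_word_share` is likewise an assumption.

```python
import re

# Hypothetical, very short stop-word list for illustration only.
STOP_WORDS = {"the", "and", "of", "in", "to", "he", "that", "a"}


def stop_word_share(text, stop_words=STOP_WORDS):
    """Return the fraction of words in the text that are stop words."""
    # Assumed tokenization: runs of letters, optionally joined by apostrophes.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    if not words:
        return 0.0
    stops = sum(1 for word in words if word in stop_words)
    return stops / len(words)


# Made-up sample: 10 words, 5 of which (in, a, in, the, a) are in the list.
sample = "In a hole in the ground there lived a hobbit"
share = stop_word_share(sample)
print(f"stop words: {share:.0%}, other words: {1 - share:.0%}")
```

The result depends heavily on the chosen list, so only figures computed with the same list are comparable across books.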

I ran another quick test (not pictured on this page) where I built these charts for Dracula instead of LOTR. That returned a very different set of results on every chart, so maybe instead of using these indices to compare books by the same author, they could be used to compare books by known authors with anonymous books. That way we could guess who the author of a book or piece of text was!

And here end the most classical-academic of my speculations about Tolkien. I was satisfied with refuting that reviewer regarding the overuse of ‘and’, and had also found some interesting surprises. I could think of more indices to calculate: the proportion of verbs, adverbs, adjectives, nouns, etc.; the tenses used; the types of constructions… But if I really wanted to get serious with this whole text analysis business, that would require way more time and resources than building a few charts and speculating about them.

By http://5013.es/p/1/

Well, well, well… next time someone criticizes The Silmarillion (my favorite book EVER), saying it’s hard to read, has a boring story and other absurdities like that, here is the empirical data proving that’s not the case! Full objectivity and zero subjectivity!


Filed under Linguistics, Prose, Silmarillion, The Hobbit, The Lord Of The Rings, Tolkien

6 responses to “Visualizing Tolkien’s readability”

  1. I was surprised to see Turin in the first chart.

    • Heheheheh…yeah….right at the bottom, 0.22%, huh? He’s a major character though. Sure, others could have been expected too. Interesting to note is the dominance of Hobbits! The only character in the Hobbit chart is Bilbo (no wonder) and in the Lord of the Rings chart are Frodo and Sam!

      Go Hobbits Go!🙂

  2. Gregory Goff

    Impressive. A fine piece of detective work.
