Much has been written about the sorry state of academic publishing. Luckily for me, the visualization community works with a number of publishers that are very sane, welcoming, and considerate with respect to personal copies of papers. All of them permit the free use of pre-prints on a personal website as long as it contains a text similar to the following:

Author's Copy. Find the definitive version at example.com/doi/12345

After doing this manually for a couple of papers, I finally decided to solve it directly in LaTeX. This is what I came up with: first, you need to add \usepackage[absolute]{textpos} to the preamble of your document. Next, add the following lines:

\setlength{\TPHorizModule}{\paperwidth}
\setlength{\TPVertModule}{\paperheight}

They ensure that the subsequent calls for placing text work with respect to the current paper size. In order to place a copyright notice at the top of your paper, you only have to do the following:

\begin{textblock*}{20cm}(0.5cm,0.5cm)
  Author's copy. To appear in IEEE Transactions on Visualization and Computer Graphics.
\end{textblock*}

Place this somewhere in your document, for example directly after \begin{document}, and you should be good to go.

Happy publishing!

Posted Thursday evening, August 3rd, 2017

The current political climate in the United States appears to be somewhat heated, to say the least. The current president, Donald J. Trump, is considered by many to be the product of an ever-declining political culture, in which many people feel increasingly disenfranchised and hopeless.

I was wondering whether these changes in political culture would show up as patterns—and where better to look for patterns than in the inauguration speeches of U.S. presidents? These speeches are meant to give a grand vision of the new term. Ideally, they should unite but also rally the voters to support the new administration and look forward to the subsequent years.

In this post, I am going to show you how to obtain all inauguration speeches, perform a brief language analysis on them, and report some interesting results. Onwards!

Getting the data

Thanks to the good folks of Wikisource, obtaining all speeches is quite easy. There is a special category for all speeches, and since the formatting of every speech contains (more or less) the same tags, it is a simple exercise in using BeautifulSoup to obtain and store all the speeches. See my Python script for more details. As a result, each speech is stored in the format of YYYY_Name.txt. We thus have 1789_George_Washington.txt, for example. Other than that, the text files are left as-is. I did not make any attempts at extracting more information from them.
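The naming scheme is easy to reproduce. The following is a minimal sketch that assumes the year and the name of the president are already known; `speech_filename` is a hypothetical helper for illustration and not taken from the actual script.

```python
import re


def speech_filename(year: int, name: str) -> str:
    """Build a file name of the form YYYY_Name.txt."""
    # Collapse runs of whitespace in the name into single underscores
    safe_name = re.sub(r"\s+", "_", name.strip())
    return f"{year:04d}_{safe_name}.txt"


print(speech_filename(1789, "George Washington"))  # 1789_George_Washington.txt
```

Putting the year first means that a plain alphabetical sort of the files is also a chronological one.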

Analysing the data

The simplest form of analysis that one might apply to these speeches involves basic word counting or tokenization in general. I will go a small step further and use stemming to reduce every word to its root form.

This procedure gives us a way to answer the following questions:

  • How long is the speech (in sentences)?
  • How long is the speech (in words)?
  • What is the average sentence length?
  • What is the number of unique words?
  • What is the number of unique lemmas?
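The first four questions can be answered with nothing more than the standard library. The following is a minimal sketch and not the actual analysis script: the function names are made up for illustration, and the sentence splitter is deliberately naive (it will stumble over abbreviations such as "U.S.").

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(text: str) -> list[str]:
    # Lower-cased alphabetic tokens; apostrophes inside a word are kept
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def speech_statistics(text: str) -> dict:
    sentences = split_sentences(text)
    words = tokenize(text)
    return {
        "num_sentences": len(sentences),
        "num_words": len(words),
        # Average sentence length in words; guard against empty input
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "num_unique_words": len(set(words)),
    }


sample = "We the people. We stand together! Do we?"
print(speech_statistics(sample))
```

For the sample text this reports 3 sentences, 8 words, and 6 unique words. A dedicated NLP library would handle sentence boundaries and tokenization far more robustly.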

The last question is worth explaining. In natural language processing (NLP), a lemma denotes the canonical form of a set of words in a corpus. For example, run, runs, ran, and running belong to the same lemma, viz. run.

I stole this example from the Wikipedia page on lemma; please refer to it for more details.
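To make the difference between unique words and unique lemmas concrete, here is a minimal sketch that uses a hand-written lookup table. Both the table and the function `count_unique_lemmas` are assumptions for illustration only; a real analysis would use a stemmer or lemmatizer from an NLP library instead.

```python
def count_unique_lemmas(words, lemma_of):
    """Count distinct lemmas; words missing from the table count as their own lemma."""
    return len({lemma_of.get(word, word) for word in words})


# Toy lookup table, for illustration only
TOY_LEMMAS = {"runs": "run", "ran": "run", "running": "run"}

words = ["run", "runs", "ran", "running", "walk"]
print(len(set(words)))                         # 5 unique words
print(count_unique_lemmas(words, TOY_LEMMAS))  # 2 unique lemmas
```

All four forms of run collapse into a single lemma, so the five distinct words yield only two lemmas.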

The reason for counting lemmas is that they give us a rough estimate of the complexity of a text. If a text contains many unique lemmas, it is likely to be more complex than a text with fewer unique lemmas.

I am aware that this is not the linguistically correct way of complexity analysis, but it gives us a qualitative overview of a text without delving deeper into its vocabulary.

Consequently, I wrote another script to analyse the speeches and store the statistics mentioned above in files. It is time for a quick analysis!

Results

Let’s first take a look at the average sentence length of all speeches. You can hover over each data point to see the name of the president giving the speech.

A plot of the average sentence length, in words, of all presidential inauguration speeches

We can see that, alas, the average length of a sentence is declining. This is not necessarily a bad thing, as shorter sentences are often thought to be understood better by readers and listeners. Given that metric, there are a few interesting outliers, such as John Adams, whose speech contains a real beast of a sentence:

On this subject it might become me better to be silent or to speak with diffidence; but as something may be expected, the occasion, I hope, will be admitted as an apology if I venture to say that if a preference, upon principle, of a free republican government, formed upon long and serious reflection, after a diligent and impartial inquiry after truth; if an attachment to the Constitution of the United States, and a conscientious determination to support it until it shall be altered by the judgments and wishes of the people, expressed in the mode prescribed in it; if a respectful attention to the constitutions of the individual States and a constant caution and delicacy toward the State governments; if an equal and impartial regard to the rights, interest, honor, and happiness of all the States in the Union, without preference or regard to a northern or southern, an eastern or western, position, their various political opinions on unessential points or their personal attachments; if a love of virtuous men of all parties and denominations; if a love of science and letters and a wish to patronize every rational effort to encourage schools, colleges, universities, academies, and every institution for propagating knowledge, virtue, and religion among all classes of the people, not only for their benign influence on the happiness of life in all its stages and classes, and of society in all its forms, but as the only means of preserving our Constitution from its natural enemies, the spirit of sophistry, the spirit of party, the spirit of intrigue, the profligacy of corruption, and the pestilence of foreign influence, which is the angel of destruction to elective governments; if a love of equal laws, of justice, and humanity in the interior administration; if an inclination to improve agriculture, commerce, and manufacturers for necessity, convenience, and defense; if a spirit of equity and humanity toward the aboriginal nations of America, and a disposition to meliorate their condition by inclining them to be more friendly to us, and our citizens to be more friendly to them; if an inflexible determination to maintain peace and inviolable faith with all nations, and that system of neutrality and impartiality among the belligerent powers of Europe which has been adopted by this Government and so solemnly sanctioned by both Houses of Congress and applauded by the legislatures of the States and the public opinion, until it shall be otherwise ordained by Congress; if a personal esteem for the French nation, formed in a residence of seven years chiefly among them, and a sincere desire to preserve the friendship which has been so much for the honor and interest of both nations; if, while the conscious honor and integrity of the people of America and the internal sentiment of their own power and energies must be preserved, an earnest endeavor to investigate every just cause and remove every colorable pretense of complaint; if an intention to pursue by amicable negotiation a reparation for the injuries that have been committed on the commerce of our fellow-citizens by whatever nation, and if success can not be obtained, to lay the facts before the Legislature, that they may consider what further measures the honor and interest of the Government and its constituents demand; if a resolution to do justice as far as may depend upon me, at all times and to all nations, and maintain peace, friendship, and benevolence with all the world; if an unshaken confidence in the honor, spirit, and resources of the American people, on which I have so often hazarded my all and never been deceived; if elevated ideas of the high destinies of this country and of my own duties toward it, founded on a knowledge of the moral principles and intellectual improvements of the people deeply engraven on my mind in early life, and not obscured but exalted by experience and age; and, with humble reverence, I feel it to be my duty to add, if a veneration for the religion of a people who profess and call themselves Christians, and a fixed resolution to consider a decent respect for Christianity among the best recommendations for the public service, can enable me in any degree to comply with your wishes, it shall be my strenuous endeavor that this sagacious injunction of the two Houses shall not be without effect.

Compare this to the second inauguration speech of George Washington, which is the briefest one, yet still uses longer sentences than the speech of any president starting a term in the 20th or the 21st century.

What about the active vocabulary of the presidents? To this end, let us take a look at the number of unique lemmas in a speech. Note that these are raw counts, so they will also give us an indication of how long a speech is. Again, hover over a data point to display the name.

Unique lemmas of all presidential inauguration speeches

Here, there is not so much of a clear trend but rather some interesting outliers. According to this metric, most presidents in the 20th century and onwards use about the same number of unique lemmas in a speech. This means that the speeches have roughly the same length and also the same complexity, provided you buy my argument that the number of unique lemmas is something interesting to consider. We can see that Donald J. Trump is not really an outlier in this regard; both Gerald Ford (a Republican) in 1974 and Jimmy Carter (a Democrat) in 1977 gave speeches that are rated approximately the same.

Interestingly, some other patterns arise. First, we see that the second inauguration speech of George Washington is really short and sweet. There are fewer than 200 unique lemmas. Second, the inauguration speech of William Henry Harrison is obviously extremely long and convoluted, more so than any other speech (so far; who knows what the future holds?). We can also see that the fourth inauguration speech of Franklin D. Roosevelt in 1945 really is on point. His message is short and simple. Afterwards, things start to get more complicated again.

Conclusion

What can we make of this? First, it is interesting to see that these simple qualitative analysis steps only give us a very narrow picture of what is happening in a speech. At least the oratory culture of the United States seems to be doing well. Most presidents use about the same complexity for their speeches. The number of unique lemmas has decreased in modern times, but this may be an artefact of the stemming method. As language changes over time, a simple stemmer that is trained on modern speeches will have trouble analysing older speeches.

I hope that this provided some insights. You are welcome to play with the data and do your own analysis. Please take a look at the GitHub repository for more information.

Posted Sunday afternoon, August 20th, 2017