When Nature Chemistry celebrated its 5th anniversary last year, we put together a word cloud (using Wordle) featuring the 150 words that appeared most often in the titles of the papers we had published up to that point. That was a collection of just under 600 papers, but a clear winner did emerge — ‘synthesis’ was the word used in titles more than any other (excluding some common words such as ‘from’, ‘by’, ‘to’, ‘with’, ‘and’, ‘so’, ‘on’…). It seems that a large part of chemistry is still very much about making things, and that reminds me of one of my favourite chemistry quotes:
‘la chimie crée son objet’ (chemistry creates its object) — Marcellin Berthelot, 1860.
The Nature Chemistry title-word cloud was not based on a particularly large data set, however, and is also from a very recent period. I wondered if the titles of chemistry papers have changed much over time, and so I decided to look to a journal with a lot more history. I wanted it to be a general chemistry journal to ensure there was no intrinsic bias towards words associated with a particular sub-field within chemistry and so I turned to the Journal of the American Chemical Society (JACS).
The date range I chose is somewhat arbitrary, but round numbers have a certain appeal and so I started at 1900 and worked my way up to 2014, the most recent complete year of JACS papers. This amounted to a little over 168,000 article titles and just shy of 2,000,000 words in total. I may well do more analysis in time, but first of all I decided to break down the data into decades (including a half-decade of 2010-2014 to cover the most recent papers) and look at the most popular 150 words for titles in each given period (excluding the same common words as we did when analysing the titles of Nature Chemistry papers).
Note that the size of each word corresponds to the number of times it appears in titles in that period — the larger it is, the more it is used. I have not combined words with the same root and nor have I combined singular and plural versions of the same word. I have made everything lowercase for the sake of simplicity though (otherwise ‘Synthesis’ appears as a separate entry to ‘synthesis’). Also, the number of papers published varies a lot between decades, so comparing the sizes of words between different clouds is meaningless.
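For anyone who wants to try something similar, the counting described above can be sketched in a few lines of Python. This is only an illustration and not the exact process used for the clouds here (those were drawn with Wordle). The function names, the `(year, title)` input format and the stop-word list are all assumptions; the post names only a handful of the excluded words.

```python
import string
from collections import Counter, defaultdict

# Assumed stop-word list: only a few excluded words are named in the post,
# so this set is purely illustrative.
STOP_WORDS = {"from", "by", "to", "with", "and", "so", "on",
              "of", "the", "a", "an", "in", "for"}

def period_label(year):
    """Map a year to its decade; 2010-2014 is treated as a final half-decade."""
    if year >= 2010:
        return "2010-2014"
    start = year - year % 10
    return f"{start}-{start + 9}"

def top_words(titles, n=150):
    """titles: iterable of (year, title) pairs.

    Returns {period: [(word, count), ...]} with the n most frequent words.
    Everything is lowercased; words sharing a root, and singular/plural
    forms, are deliberately left as separate entries.
    """
    counts = defaultdict(Counter)
    for year, title in titles:
        for raw in title.lower().split():
            word = raw.strip(string.punctuation)  # trim edge punctuation only
            if word and word not in STOP_WORDS:
                counts[period_label(year)][word] += 1
    return {period: c.most_common(n) for period, c in counts.items()}
```

Because each period gets its own counter, the counts are only comparable within a period, which is the same caveat that applies to the clouds.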
This is what I found, one period at a time, from 1900-1909 through to 2010-2014:
So, chemists at the start of the 20th century (yes, I know the century started on January 1st, 1901, but just go with it) were a determined bunch who liked to study milk, oil, wheat, sugar and urine — perhaps not all at the same time. Also, note the presence of a decent-sized ‘sulphur’. Yes, sulphur, with a ‘ph’. And remember, this is JACS, with all its American-ness. There’s not a hint of a ‘sulf’ to be found in JACS titles in this decade!
Still a healthy dose of determination, but also a lot of acid. And now ‘sulphur’ has become ‘sulfur’ — in fact, there are 143 ‘sulf’-based words and only 17 ‘sulph’ ones in titles from this decade.
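A spelling tally like the ‘sulf’ versus ‘sulph’ one is easy to reproduce. The sketch below is a guess at one way to do it, not a record of how the numbers above were obtained; the function name and the assumption that titles arrive as plain strings are mine.

```python
from collections import Counter

def spelling_counts(titles):
    """Count title words containing 'sulph' versus 'sulf'.

    'sulph' never contains the substring 'sulf', so the two tallies
    cannot overlap.
    """
    counts = Counter()
    for title in titles:
        for word in title.lower().split():
            if "sulph" in word:
                counts["sulph"] += 1
            elif "sulf" in word:
                counts["sulf"] += 1
    return counts
```

Run over the titles from a single decade, this gives the pair of numbers to compare.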
Acid still looms large, but a lot of derivatives and compounds now too. Note that there is a lot more preparation than there is synthesis.
Seriously, what is it with chemists and acid? Compounds and derivatives remain popular and it seems as though synthesis is catching up a little with preparation.
The age of synthesis is upon us. And note the appearance of the word ‘spectra’ too. Also, ‘esters’, what’s going on there?
Synthesis remains dominant, but words such as ‘kinetics’ and ‘mechanism’ are growing larger, suggesting that there is an increasing drive to understand reactions as well. And ‘stereochemistry’ rears its head in the cloud for the first time.
Synthesis is not quite as prominent in the 1960s, but still a popular word in the titles of JACS papers. A new (and quite prominent) entry is ‘resonance’, along with ‘magnetic’, and note that both ‘nuclear’ and ‘proton’ are there too, reflecting the growing use of NMR as a technique to characterize chemical compounds. Another notable entry: ‘carbonium’ (the old name for carbocations), which was an active area of research at this time.
Chemists’ fascination with acid finally seems to be wearing off somewhat. And ‘complexes’ is now much more prominent. I suspect that this is a result of host–guest chemistry really taking off in the 1970s and the word ‘complex’ being associated with many more things than just traditional metal-coordination compounds.
There’s a fairly sizeable entry for ‘total’, and the vast majority of the time it is used in the context of ‘total synthesis’; ‘synthesis’ itself dominates once more. Also note that the popularity of the word ‘via’ is increasing and both ‘novel’ and ‘new’ are well used (‘new’ seems to be a fairly constant presence in titles throughout the decades).
There’s still an awful lot of synthesis going on.
Nanotubes and nanoparticles make an appearance in the top 150 for the first time — nano comes of age? Other notable first-time entries (although small) are ‘supramolecular’, ‘self-assembly’ and ‘quantum’; I’m a little surprised it took so long.
Synthesis remains at the top, but look at the topics creeping into the top 150. ‘Metal–organic’ and ‘framework’ herald the growing popularity of MOFs and, although it’s easy to miss, there is also a little innocuous ‘graphene’ creeping into the picture at the bottom. ‘C–H’, which usually appears in titles in the context of C–H activation, is growing in size too. And finally, chemists’ love of ‘via’ is sealed!
To summarize, here are the top-ten words for each period:
(EDIT added June 3rd: I forgot to mention when I first posted this that for the top-ten lists I did combine simple singular and plural versions of the same word, so ‘reaction’ is actually ‘reaction’ and ‘reactions’ combined. Same goes for study/studies, complex/complexes, acid/acids and some of the others. What I did not do, however, is go beyond that and combine words that share the same root, so ‘synthesis’ and ‘synthetic’ have not been counted together and nor have ‘molecule’ and ‘molecular’, for example.)
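The singular/plural merging described in that edit could be approximated as follows. This is a rough sketch under my own assumptions (the `merge_plurals` name and the three suffix rules are hypothetical), and it will occasionally merge unrelated pairs such as ‘a’ and ‘as’, so the output is worth a quick check by eye.

```python
from collections import Counter

def merge_plurals(counts):
    """Fold simple plurals into their singular when both forms are present.

    Handles only the -s, -es and -y -> -ies patterns. Words that merely
    share a root (synthesis/synthetic, molecule/molecular) stay separate.
    """
    merged = Counter(counts)
    for word in counts:
        candidates = [word + "s", word + "es"]
        if word.endswith("y"):
            candidates.append(word[:-1] + "ies")
        for plural in candidates:
            # Merge only if both forms still exist as separate entries.
            if plural in merged and word in merged:
                merged[word] += merged.pop(plural)
    return merged
```

Calling `merge_plurals(counts).most_common(10)` on one period's counts would then give a top-ten list in the spirit of the ones summarized here.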
Just to give you a sense of scale, if you don’t exclude the really common words, the top-20 words for the last full decade (2000-2009) are shown below (and remember that the words are scaled relative to the number of times they appear: the larger the word, the more often it appears in JACS titles).
So, the most common word in JACS titles is probably ‘of’ or, more meaningfully, ‘synthesis’.
(EDIT added June 3rd: there’s now a follow-up post, with some cautionary notes about word clouds and how they can miss some concepts…)