All your base are belong to JACS

This is a follow-up to yesterday’s post, which looked at word clouds made from the titles of JACS papers from the last 115 years.

Jake Yeston commented on Twitter about the lack of catalysis-based words in the clouds. This also caught my eye, and I’ve now had a chance to dig a little deeper.

The word clouds (the ones you can make using Wordle, at any rate) work by counting exact copies of the same word and then scaling the size of the word in the cloud in proportion to the number of times it appears in the input text. So, if you look closely at the word clouds from yesterday’s post, you will see ‘reaction’ and ‘reactions’ both appearing in the same word cloud. The same goes for ‘acid’ and ‘acids’, ‘complex’ and ‘complexes’, ‘study’ and ‘studies’, and so on. Wordle also does not separate hyphenated words, so you will see things like ‘gas-phase’ and ‘electron-transfer’.
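To make the exact-match behaviour concrete, here is a minimal Python sketch. It is not Wordle's actual code; it simply assumes that words are split on whitespace and compared case-insensitively, and the titles are invented for illustration.

```python
from collections import Counter

# Hypothetical titles, invented purely for illustration
titles = [
    "A gas-phase reaction of acids",
    "Reactions of an acid complex",
    "The reaction of two complexes",
]

def count_exact_words(titles):
    """Count whitespace-separated words exactly as written (lower-cased).

    Assumption: no stemming and no splitting on hyphens, which is the
    behaviour the post describes for the word clouds.
    """
    counts = Counter()
    for title in titles:
        counts.update(title.lower().split())
    return counts

word_counts = count_exact_words(titles)

# Singular and plural are tallied separately, and hyphenated words stay whole
for word in ("reaction", "reactions", "acid", "acids", "gas-phase", "gas"):
    print(word, word_counts[word])
```

Under these assumptions ‘reaction’ and ‘reactions’ end up as two separate entries, and ‘gas-phase’ contributes nothing to a count for ‘gas’.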

What does this mean for catalysis? Well, I started looking through the titles for the 2010-2014 data and found all of the following words (and there are probably other variants that I missed):

anticatalysis, autocatalysis, autocatalytic, biocatalysts, biocatalytic, catalase, catalysis, catalyst, catalytic, catalytically, catalyze, catalyzed, catalyzes, catalyzing, cocatalysis, cocatalytic, cocatalyzed, electrocatalysis, electrocatalyst, electrocatalysts, electrocatalytic, electrocatalyze, multicatalytic, nanocatalysts, organocatalysts, organocatalytic, photocatalysis, photocatalyst, photocatalysts, photocatalytic, precatalyst

This means that catalysis is spread quite thin rather than being lumped together as a single entry in the word clouds. But it gets worse. In the 2010-2014 cloud, if you look carefully you can find ‘palladium-catalyzed’… and remember what I said above about Wordle not separating hyphenated words? Not only is ‘palladium-catalyzed’ counted separately from ‘palladium’ and ‘catalyzed’, it is also counted separately from things like ‘Pd-catalyzed’. And obviously you get lots of different ‘X-catalyzed’ terms, such as ‘gold-catalyzed’, ‘Rh-catalyzed’, ‘copper-catalyzed’, and so on. There’s an awful lot of catalysis going on; it just isn’t adequately captured in the word clouds. On the other hand, consider the word ‘synthesis’: sure, it might lose some of its count to ‘synthetic’, but that’s about it. There aren’t anywhere near as many derivatives of ‘synthesis’ as there are of ‘catalysis’.

To get a sense of how much catalysis (in any and all of its guises) has been published in JACS down the years, I went back to the lists of titles and then searched for ‘catal’ as a fragment. For comparison, I did the same for ‘synth’ and what I found is plotted below.


In the 2000s, ‘catal’ words were almost level with ‘synth’ words, and by the end of the current decade, it looks very much like they will be in the lead. Is this the decline of synthesis?
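The fragment search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it counts titles that contain the fragment (rather than total occurrences), it ignores case, and the titles and decade labels are made up rather than taken from the real data.

```python
def count_titles_with_fragment(titles, fragment):
    """Return how many titles contain `fragment`, ignoring case.

    Assumption: each title is counted once, however many times
    the fragment appears in it.
    """
    fragment = fragment.lower()
    return sum(1 for title in titles if fragment in title.lower())

# Hypothetical titles grouped by decade, invented for illustration
titles_by_decade = {
    "1990s": [
        "Total Synthesis of a Natural Product",
        "Synthetic Studies of an Alkaloid",
        "Palladium-Catalyzed Coupling",
    ],
    "2000s": [
        "Organocatalytic Aldol Reactions",
        "Pd-Catalyzed Synthesis of Biaryls",
        "A New Photocatalyst",
    ],
}

fragment_counts = {
    decade: {
        frag: count_titles_with_fragment(titles, frag)
        for frag in ("catal", "synth")
    }
    for decade, titles in titles_by_decade.items()
}

for decade, counts in fragment_counts.items():
    print(decade, counts)
```

Because the match is on a substring, ‘Pd-Catalyzed’, ‘Organocatalytic’ and ‘Photocatalyst’ all land in the same ‘catal’ bucket, which is exactly what the word clouds fail to do.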

Now, as I pointed out in yesterday’s post, it seems as though chemists really have a thing for ‘acid’ and ‘acids’. Those words dominate the clouds in the early-to-mid part of the 20th century. On Twitter, Cafer Yavuz suggested that ‘base’ and ‘basic’ might be excluded as part of the set of common words, but I don’t think that is the case. Wanting to get a sense of acid vs base, I repeated the ‘catal’/‘synth’ analysis for these words. The results are plotted below:


The analysis is not perfect, partly because ‘base’ and ‘basic’ can have different meanings (more so than ‘acid’ and ‘acidic’), and ‘base’ is also a fragment of ‘based’, which might be adding to its total. Nevertheless, something interesting appears to be happening. When it comes to acids and bases, it seems that the balance of power (in JACS at least) is shifting: where acids once reigned supreme, bases took the crown in the 2000s and seem to be consolidating their position in the current decade.
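One way to gauge how much ‘based’ inflates the ‘base’ total would be to compare the plain fragment search with a whole-word match. The sketch below is an assumption about how that check could be done, not the method used for the plot; the word list in `BASE_WORDS` and the titles are hypothetical.

```python
import re

# Assumed whitelist of whole words; "based" and "database" will not match
BASE_WORDS = re.compile(r"\b(?:base|bases|basic|basicity)\b", re.IGNORECASE)

def naive_count(titles, fragment="base"):
    """Count titles containing the raw fragment (also catches 'based')."""
    return sum(1 for title in titles if fragment in title.lower())

def whole_word_count(titles, pattern=BASE_WORDS):
    """Count titles containing one of the whitelisted whole words."""
    return sum(1 for title in titles if pattern.search(title))

# Hypothetical titles, invented for illustration
titles = [
    "Lewis Base Catalysis of Aldol Additions",
    "A Copper-Based Metal-Organic Framework",
    "Schiff Bases as Ligands",
    "Structure-Based Design of Inhibitors",
]

naive_total = naive_count(titles)
strict_total = whole_word_count(titles)
print(naive_total, strict_total)
```

The gap between the two totals gives a rough idea of the ‘based’ contribution. It does nothing about the other problem, though: ‘basic’ in the sense of ‘fundamental’ would still be counted.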

If you have any questions about the analysis (or other things you want me to look for in the titles), just leave a comment or drop me a line on Twitter. Similarly, if you want the raw data, drop me a line by e-mail; I’m happy to share.

5 Responses to All your base are belong to JACS

  2. Anon says:

    I haven’t done a rigorous analysis, but including alkali/alkaline seems to roughly triple the size of the base column for the 1900s and double it for the 1950s. (These are probably inflated to some extent by counting papers on groups I and II of the periodic table.)

  3. kayakphilip says:

    Interesting analysis. I’ve been doing a lot of text analytics recently (using an engine called Attivio, although using that here would be like bringing a sledgehammer). It is almost always key to do some level of normalization – essentially applying synonym dictionaries – to collapse various spellings or synonyms to a single concept. If you were to do this analysis on JMedChem instead, a pertinent example would be the way that biological targets are referenced (e.g. 5HT2a and HTR2 and Htr-2 and … are the same thing).

    • stu says:

      Yeah, I appreciate that. This is just a quick and dirty analysis that throws up a few interesting things – I realise it’s not terribly sophisticated.

