Why Aren’t We All Machine-Friendly Researchers?
I blame the writing and research impact advice we get. At least in part. It doesn’t prepare us as well as it could for our relationship with machines.
When we’re told to think of “the reader”, what follows usually neglects the non-human ones. And when was the last time you saw a stakeholder map with computers or robots getting so much as a look-in?
But the machines are our go-betweens, and that’s a pretty critical relationship. Other researchers generally won’t be able to find and use our research without them. Machines are helping people with visual impairments read too. And they’re mining text and data.
How well the machines can do all that depends a lot on us. And like so many relationships, little things count. A bit more consideration of our partners’ struggles would go a long way.
Take structured IDs. They’re the glue that can hold so much together in the world between humans and their machines.
Here’s a critical one – the ID of a clinical trial in a registry. With it, articles generated by the investigators can be automatically and reliably linked up with the vital details in the registry. This is what the International Committee of Medical Journal Editors (ICMJE) says journals and authors should do:
The ICMJE recommends that journals publish the trial registration number at the end of the abstract. The ICMJE also recommends that, whenever a registration number is available, authors list this number the first time they use a trial acronym to refer either to the trial they are reporting or to other trials that they mention in the manuscript.
The CONSORT reporting guidelines include it – and show you what it looks like:
You (and machines) can find that trial by its number in PubMed – and the PubMed and ClinicalTrials.gov entries can find each other, too. Citations link up reliably, and systematic reviewers can easily identify multiple publications from the same trial.
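That machine-to-machine lookup can be sketched in a few lines. This is only an illustration, not how PubMed works internally: it builds a query URL for NCBI's public E-utilities `esearch` endpoint (a real, documented service), with the trial's registry number as the search term. Actually fetching and parsing the JSON response is left out.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (documented public API).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(trial_id: str) -> str:
    """Build an esearch URL that looks up PubMed records by a trial registry ID."""
    params = {"db": "pubmed", "term": trial_id, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)

# Example: a well-formed NCT number drops straight into the query.
url = pubmed_search_url("NCT01234567")
```

The point of the sketch: when the ID is present and intact, the lookup is a one-liner. Everything that follows is about what happens when it isn't.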
Simple and clear, right? What could go wrong? Plenty, it turns out.
Some people just don’t think about it and leave the number out. Or they put their own trial’s number in the abstract, but leave it out when they mention or cite another trial. Some cut up the number, inserting something between the “NCT” and the digits. Others drop the “NCT” – and some spell it out (“National Clinical Trial 123456”).
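Those variations are exactly what trips machines up. Here’s a minimal sketch of the cleanup a machine has to attempt – a hypothetical normalizer, assuming the real-world format of a ClinicalTrials.gov ID (“NCT” plus 8 digits), that tolerates the common manglings above:

```python
import re
from typing import Optional

# A ClinicalTrials.gov registry ID is "NCT" followed by 8 digits.
# This loose pattern tolerates the manglings described above: a missing
# or spelled-out "NCT" prefix, and spaces or punctuation wedged between
# the prefix and the digits. (Being loose, it can also over-match bare
# 8-digit numbers - a reminder of why intact IDs matter.)
NCT_PATTERN = re.compile(
    r"(?:NCT|National\s+Clinical\s+Trial)?[\s#:.-]*(\d{8})",
    re.IGNORECASE,
)

def normalize_nct(text: str) -> Optional[str]:
    """Return a canonical 'NCT' + 8-digit ID, or None if nothing matches."""
    match = NCT_PATTERN.search(text)
    if match is None:
        return None
    return "NCT" + match.group(1)
```

So `normalize_nct("nct - 01234567")` and `normalize_nct("National Clinical Trial #01234567")` both recover `"NCT01234567"` – but only because someone wrote code to second-guess the authors. An intact ID needs no second-guessing.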
The desire to be found, read, and cited should motivate us to be meticulous about citing accurately. It might be only at that moment when we can’t find something – or one of our own publications doesn’t come up when we search – that it registers how important this side of science craft is.
A. J. Meadows has shown that after World War II, as the battle to be noticed grew, the titles of scientific papers grew longer too. By 1976, almost 3.5 million citations had been added to the precursor of PubMed by the National Library of Medicine (NLM). Adding key words arrived with the machines. So did citation analysis and much more.
We need to do more, though, than cite properly and throw in some key words. At the very least for the title and abstract, we need to be writing with both human and non-human readers in mind. Search engine optimization isn’t just for websites. (When I write an abstract, I write first for the humans, then I edit for the machines.)
We can give ourselves permanent unique identifiers now, too. ORCID just celebrated its third birthday and is gaining ground quickly. (I’m part way through adding my publications there – and the exercise is a reminder of how many publications get missed by major databases.)
To be a fully machine-friendly researcher, open access and open data are the ultimate goal. Jeffrey Furman and Scott Stern have written about research-enhancing institutions: organizations that enable others to find and exploit knowledge gains efficiently and, ideally, rapidly. At the beginning and the end of these chains are individual researchers who need to share and find information effectively.
Granted, sometimes trying to please the machines becomes a major exercise in frustration. They can be rather, well… let’s say high maintenance. Everything has to be Just. So. But even though they can lack a sense of humor, the results can be hilarious.
We speak a lot of the need for technology to be user-friendly. That’s definitely a two-way street, though. Even if we can’t feel affection for the machines we need to share our research with, we need to work out how to live with them. We definitely can’t afford to forget about their point of view.
This post builds on part of a plenary talk I gave this month at the Cochrane Colloquium in Vienna, in a panel on information overload, with John Ioannidis and Ben Goldacre. (Here on YouTube, slides on Slideshare.)
Disclosure: I work on PubMed projects at the NLM’s National Center for Biotechnology Information (NCBI).
The cartoon is my own (CC-NC license). (More at Statistically Funny and on Tumblr.)
The photo of the first computer at the U.S. National Library of Medicine (NLM) indexing the biomedical literature is from Cheryl Rae Dee (2007) in the Journal of the Medical Library Association. Further details from Wyndham D. Miles (1984), A History of the National Library of Medicine: The Nation’s Treasury of Medical Knowledge.
The tweet of the coded data tangle is from Michelle Reeve (@michellereeve).
The excerpt of the CONSORT reporting guidelines for clinical trials comes from David Moher and colleagues (2010) [PDF].
* The thoughts Hilda Bastian expresses here at Absolutely Maybe are personal, and do not necessarily reflect the views of the National Institutes of Health or the U.S. Department of Health and Human Services.