In 2019, I wrote a pair of posts about the risks of our reliance on Google Scholar (GS), and search engine alternatives for systematic reviewers. Back then, a thorough assessment of 28 searchable collections of academic records found only 2 that were large enough to come anywhere close to GS’ scope – the Microsoft Academic Graph (MAG) and WorldWideScience (WWS). And of those, only Microsoft indexed citations so you could search those as well. What’s more, MAG’s corpus was open, so you could download results – which you can’t in GS – or even the whole lot to use in creating something else.
But Microsoft canned Academic Search at the end of 2021. The exercise may have largely been a testing ground for the company’s “AI” search aspirations. Now Microsoft has started a chatbot-driven arms race between its search engine, Bing, and Google. Meanwhile, Google’s cutting jobs, and its search engine already seemed to be circling the drain, over-stuffed with ads and Google-surfaced content. It’s now even easier to picture GS joining the long list of products scrapped by Google.
I rely on GS a lot, and all the talk of threats to Google’s financial standing is making me nervous. So it was definitely time to see whether there are any viable, free-to-use alternatives spanning all fields now – or even on the distant horizon. And I’ve ended up a lot more relieved – even optimistic – than I expected to be!
With GS held tightly to Google’s secretive chest, it seems to me there are 3 major strands to follow:
- Federated search of a wide range of databases;
- The growing corpus of open metadata, open access literature, and academic repositories, together with the expansion of open citations; and
- Services building on the MAG legacy Microsoft had released for others to use.
The first of those is the minimum most of us likely need – a way to at least search across a wide range of databases, instead of having to scour all relevant databases one at a time.
The second is complex, with limitations that make no scientific sense – but at least it’s a growing proportion of many areas of scientific work.
The third – building on the MAG content and system – may be the best chance to approximate, and likely surpass, what GS can do. (For example, you can’t download GS search results.)
Let’s go through the options in those categories – and please let me know if you know of free-to-use, scholarship-wide services that I’ve missed. As well as other search services I’ve seen, I’ll be drawing on the 2019 paper I mentioned above for some assessments, even though the information is out of date [Gusenbauer and Haddaway, 2019]. Within the post, I’m only listing the services I think are frontrunners in their category: the others I considered but excluded are listed at the end.
1. Federated academic search
There is a large and stable one of these, thank goodness! It’s WorldWideScience (WWS), mentioned earlier. It has a lot of limitations – you can check some of those out in the 2019 assessment. Here’s some of what it does have:
- Wide and large scope – over 323 million records in 2019, when GS had over 389 million;
- Some advanced search functions;
- Alerts for new results for searches;
- Support for multiple languages; and
- Long-term viability.
About WWS: It comes from an international non-profit consortium, and it’s maintained by the US Department of Energy (DoE). It was a joint project of the DoE and The British Library, originally launched in 2008.
You can see the sources WWS searches across on the advanced search page.
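For anyone curious about what federated search involves, here is a minimal Python sketch of the general idea: send one query to several sources at once, then merge the results and drop duplicates. It makes no claim about how WWS itself is built. The source functions, the record fields (`title`, `doi`), and the `found_in` key are all hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def normalize_doi(doi):
    """Lower-case a DOI and strip common prefixes so duplicates can be matched."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def federated_search(query, sources):
    """Send one query to every source and merge the results.

    `sources` maps a (hypothetical) source name to a function that takes the
    query and returns a list of dicts with at least a "title" key.
    """
    # Query all sources in parallel rather than one at a time.
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(search, query) for name, search in sources.items()}
        results = {name: future.result() for name, future in futures.items()}

    merged = {}
    for name, records in results.items():
        for record in records:
            # Match on DOI when there is one, otherwise fall back to the title.
            key = normalize_doi(record.get("doi", "")) or record["title"].strip().lower()
            if key not in merged:
                merged[key] = {**record, "found_in": []}
            merged[key]["found_in"].append(name)
    return list(merged.values())
```

Matching on DOI, with the title as a fallback, is only one possible way of spotting duplicates. A real service would need something more forgiving for records that have no identifier.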
I hope WWS is still going to grow, because it has lots of gaps that I think could potentially be filled if the commitment to improve the service is there. It’s a decent service already, though, and it has some advantages over GS. If the plug is pulled on GS, thanks to WWS, I won’t freak out!
2. Open metadata and academic repositories as a basis for search
A lot of the open access sources are already systematically gathered in WWS. But not all. And the area is growing. It’s patchy, though.
The frontrunner here, I think, is a large service from Germany: BASE, the Bielefeld Academic Search Engine, from Bielefeld University.
According to the 2019 assessment, it covered over 144 million records (when GS had over 389 million). BASE has over 317 million records now. Like WWS, it doesn’t include citations.
BASE was launched in 2004, and comprises 2 types of records. First, the team harvest metadata from institutional repositories and libraries that support a standard protocol for sharing metadata, the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). They normalize (standardize) and index it for searching. Secondly, they index some websites and collections. BASE supports several languages, too.
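To give a feel for what harvesting and normalizing over OAI-PMH looks like, here is a rough Python sketch. The `ListRecords` request and the `oai_dc` (Dublin Core) metadata format are part of the OAI-PMH standard. The repository address, the output field names, and the normalization choices are assumptions for illustration only, not a description of BASE’s actual pipeline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request for a repository's OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Only ask for records added or changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def normalize_records(xml_text):
    """Turn a ListRecords response into simple, uniform dicts."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{OAI}record"):
        identifier = record.findtext(f"{OAI}header/{OAI}identifier", default="")
        titles = [t.text.strip() for t in record.iter(f"{DC}title") if t.text]
        creators = [c.text.strip() for c in record.iter(f"{DC}creator") if c.text]
        dates = [d.text.strip() for d in record.iter(f"{DC}date") if d.text]
        records.append({
            "id": identifier.strip(),
            "title": titles[0] if titles else "",
            "creators": creators,
            # Keep just the year, since repositories format dates differently.
            "year": dates[0][:4] if dates else "",
        })
    return records
```

A full harvester would also follow the protocol’s resumption tokens to page through large repositories, which this sketch leaves out.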
The BASE team is on Mastodon: @email@example.com
3. Services building on the MAG legacy
I think these are the most exciting developments, because they incorporate linkages for citations. Searching studies that cite other studies is one of the most useful functions of GS.
First up, MAG in context. According to the 2019 assessment, MAG had over 213 million records at the time GS had over 389 million. Microsoft stopped updating MAG at the end of 2021. But it has a non-profit successor…
You may be familiar with Unpaywall, which helps you find free versions of publications. It’s one of the services produced by the US-based non-profit group, OurResearch. In May 2021, the group received a $4.5 million grant from Arcadia, a private charitable foundation founded by Lisbet Rausing and Peter Baldwin. (The couple are scholars as well as philanthropists, and their foundation is one of the major benefactors of the Wikimedia Foundation aka Wikipedia.)
Soon after OurResearch received that funding, Microsoft announced they were pulling the plug on MAG, and OurResearch sprang into action. Using MAG as the legacy content, they launched OpenAlex, named after the Library of Alexandria, in January 2022. It picked up where MAG left off, but excluded records on patents.
OpenAlex provides content for others to build services with, but they don’t have a user-friendly search service to use directly. That’s on their agenda, though. VOSviewer is available, which you can use to construct author and citation networks, etc. (It’s explained in a pair of blog posts.)
There’s a preprint on arXiv describing OpenAlex, by Jason Priem, Heather Piwowar, and Richard Orr from OurResearch. They reported that, as of September 2022, they were adding about 50,000 records a day to OpenAlex, which then contained 209 million records.
According to Priem and colleagues, OpenAlex is drawing new content from “many sources, including Crossref, PubMed, institutional and disciplinary repositories (eg, arXiv).” (Crossref is the organization that provides DOIs – identifiers – to academic journals.)
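For those who want to build on OpenAlex themselves, here is a small Python sketch of how a keyword search and a “who cites this?” query might be put together. It assumes the public OpenAlex API as I understand its documentation: a `works` endpoint at `api.openalex.org`, with `search`, `filter=cites:…`, `per-page`, and `mailto` parameters, and responses that carry a `results` list. Check the current documentation before relying on any of it. The work ID `W123` and the sample response in the usage note are made up.

```python
from urllib.parse import urlencode

# Assumed address of the OpenAlex works endpoint.
OPENALEX_WORKS = "https://api.openalex.org/works"


def search_url(query, per_page=25, mailto=None):
    """URL for a keyword search across works."""
    params = {"search": query, "per-page": per_page}
    if mailto:
        # Optional contact address, passed along as a courtesy to the service.
        params["mailto"] = mailto
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


def citing_works_url(work_id, per_page=25):
    """URL for works that cite the work with the given OpenAlex ID."""
    params = {"filter": f"cites:{work_id}", "per-page": per_page}
    # Leave the colon unescaped so the filter stays readable.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':')}"


def summarize(response):
    """Pick a few fields out of a decoded JSON response (a dict)."""
    return [
        {
            "title": work.get("display_name"),
            "year": work.get("publication_year"),
            "cited_by": work.get("cited_by_count", 0),
        }
        for work in response.get("results", [])
    ]
```

Calling `citing_works_url("W123")` only builds the address. You would still need to fetch it and decode the JSON before passing the result to `summarize`.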
The OpenAlex team isn’t on Mastodon (please join!!!), but you can sign up for email updates here.
Although the OpenAlex search isn’t available yet, others can incorporate OpenAlex. I think the frontrunner here is Lens, an open source search engine that was launched for searching patents in 2000, but now includes scholarly works, too. It’s produced by Cambia, an Australia-based non-profit open science enterprise. They report that they have now normalized and indexed over 225 million scholarly works, more than 127 million patent records, and more than 370 million patent sequences.
(For systematic reviewers: EPPI-Reviewer uses OpenAlex to help keep reviews up-to-date.)
I’m quite hopeful that OpenAlex will deliver what I need to abandon GS before it abandons me. Fingers tightly crossed!
You can keep up with my work via my free newsletter, Living With Evidence.
Disclosure: I was a senior scientist at the NIH’s NCBI working on PubMed-related projects from 2011 to 2018. (NCBI is part of the U.S. National Library of Medicine.)
Search engines I considered, but excluded:
- Dimensions: A commercial service with a free online search engine, with 153 million records (including 100 million scholarly publications, plus grants and more). Uses machine learning and full-text searching. It was originally based on the GRID database.
- ResearchRabbit: I couldn’t find out much about this. I think it must have search, but I couldn’t find that either, and gave up.
- Scholar Archive: From the Internet Archive. Quite small – around 25 million papers which they have archived. It’s still under construction.
- Scinapse: A commercial service that currently only includes 48,000 journals. I couldn’t find much information about it.
- Semantic Scholar: A commercial service with a limited subject scope. According to the 2019 assessment, it included over 72 million records when GS had over 389 million.
Correction soon after posting: Removed a sentence from my description of Dimensions thanks to a pointer from Vincent Traag.