Deep Dive into Three AI Academic Search Tools
Introduction
AI-powered research assistants are transforming academic search — but how effective are they? A researcher tested Primo Research Assistant, Web of Science Research Assistant, and Scopus AI to find out how they really perform.
Product Overview
Primo Discovery Service (Ex Libris/Clarivate) serves as a key search platform for many academic libraries, indexing a wide range of scholarly content. Ex Libris also offers Summon, a similar discovery service, and introduced the Summon Research Assistant in March 2025. While the two AI tools appear similar, this review focuses on Primo Research Assistant.
Scopus (Elsevier) and Web of Science (Clarivate) are leading citation databases specializing in scholarly articles.
Primo Research Assistant, Web of Science Research Assistant, and Scopus AI all use retrieval-augmented generation (RAG) to deliver direct, summarized answers to user queries. This process involves retrieving relevant documents and generating concise responses with a large language model (LLM).
While each platform has its own implementation, they share the same foundation — supporting natural language queries and ranking results based on relevance. The key question remains: how effectively do they retrieve and interpret information using this AI-driven approach?
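The shared pattern can be sketched in a few lines of Python. Everything here (the `search_index` retriever and `call_llm` generator) is a hypothetical stand-in, not any vendor's actual API:

```python
def search_index(query: str) -> list[dict]:
    """Stand-in retriever: a real system queries the platform's metadata/abstract index."""
    return [{"title": "Example paper", "abstract": "Example abstract."}]

def call_llm(prompt: str) -> str:
    """Stand-in generator: a real system calls a hosted large language model."""
    return f"(answer grounded in a prompt of {len(prompt)} characters)"

def rag_answer(query: str, top_k: int = 5) -> str:
    records = search_index(query)[:top_k]           # 1. retrieve candidate records
    context = "\n\n".join(                          # 2. pack abstracts into the prompt,
        f"[{i + 1}] {r['title']}: {r['abstract']}"  #    numbered so the model can cite them
        for i, r in enumerate(records)
    )
    prompt = ("Answer the question using only the numbered sources, citing them inline.\n\n"
              f"{context}\n\nQuestion: {query}")
    return call_llm(prompt)                         # 3. generate a concise, cited answer

print(rag_answer("How do LLMs change academic search?"))
```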
Retrieval and Generation Mechanics
Primo, Web of Science, and Scopus AI all use LLMs to assist with literature discovery, but their retrieval and generation processes differ significantly.
The Primo Research Assistant generates multiple paraphrased versions of the original query, connecting them with OR. It then searches the entire Central Discovery Index (CDI), reranks the top 30 results using vector embeddings, and uses an LLM to synthesize an answer from the top five. A key limitation is that results may include items not held by the user’s institution. It accepts non-English queries and responds in the input language.
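A rough sketch of that two-stage design follows; the paraphrases, candidate titles, and toy `embed` function are purely illustrative, not Ex Libris's implementation:

```python
from math import sqrt

def embed(text: str) -> list[float]:
    """Toy embedding; a real system calls an embedding model."""
    return [text.lower().count(c) / (len(text) or 1) for c in "aeiou"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norms if norms else 0.0

# Stage 1: LLM paraphrases of the user's query, ORed into one broad CDI search.
paraphrases = ["large language models information retrieval",
               "LLMs in search engines",
               "language models for document ranking"]
boolean_query = " OR ".join(f"({p})" for p in paraphrases)

# Stage 2: rerank the top candidates by embedding similarity to the query,
# keeping the best five for the LLM to synthesize an answer from.
candidates = ["A survey of LLM-based retrieval",
              "Neural ranking models for search",
              "Bird migration patterns in Europe"]
query_vec = embed(paraphrases[0])
top_five = sorted(candidates, key=lambda t: cosine(embed(t), query_vec),
                  reverse=True)[:5]
```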
The Web of Science Research Assistant uses a “search block” approach. It identifies core concepts in the query, generates synonyms for each, and combines these blocks with AND. It searches only the content the user is entitled to and uses the platform’s default ranking without additional reranking to generate an answer from the top eight results. Like Primo, it accepts and responds in non-English languages.
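A minimal sketch of that block construction, assuming the LLM has already identified the concepts and their synonyms (the mapping below is illustrative):

```python
# Concept identification and synonym generation would be done by the LLM.
concepts = {
    "large language models": ["large language models", "llms"],
    "information retrieval": ["information retrieval", "search engines"],
}

def build_block_query(concepts: dict[str, list[str]]) -> str:
    """OR synonyms within each concept block, then AND the blocks together."""
    blocks = []
    for synonyms in concepts.values():
        terms = " OR ".join(f'"{s}"' if " " in s else s for s in synonyms)
        blocks.append(f"({terms})")
    return " AND ".join(blocks)

print(build_block_query(concepts))
# ("large language models" OR llms) AND ("information retrieval" OR search engines)
```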
Scopus AI employs a more adaptive and complex strategy. A “copilot” first interprets the query to decide among a vector search, a keyword search, or a hybrid of the two. It specifically uses a technique called RAG Fusion, which generates multiple related queries and fuses their results for a comprehensive retrieval. It offers both a concise summary (up to 10 references) and an expanded, in-depth summary (up to 30 references). While it accepts non-English queries, its responses are provided only in English.
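The fusion step in RAG Fusion is reciprocal rank fusion (RRF), which rewards documents that rank well across several result lists (Rackauckas, 2024). A sketch with illustrative result lists, not Elsevier's actual pipeline:

```python
from collections import defaultdict

def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score each doc by the sum of 1/(k + rank)."""
    scores: defaultdict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Result lists from three related queries the LLM generated for one question.
runs = [["doc_A", "doc_B", "doc_C"],
        ["doc_B", "doc_A", "doc_D"],
        ["doc_C", "doc_B", "doc_E"]]
print(rrf_fuse(runs))  # doc_B wins: it ranks consistently high across all runs
```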
Interpretability and Reproducibility of Searches
While Primo, Scopus AI, and Web of Science all support natural language queries, their underlying approaches lead to significant differences in how transparent and repeatable their searches are.
The Web of Science Research Assistant offers high interpretability. It clearly displays the generated Boolean query, and its results align with the platform’s standard, understandable relevance ranking, which allows users to validate the process. Reproducibility is medium: the LLM generates a slightly different query in roughly one of every five repeated identical searches.
The Primo Research Assistant provides medium interpretability. The initial Boolean query is visible, but the subsequent reranking of results using vector embeddings is an opaque “black box.” Its reproducibility is also medium, with similar LLM variability affecting the search strategy about 20% of the time.
Scopus AI has low interpretability and reproducibility. Although a Boolean query is shown, its hybrid search (combining keyword and vector search) makes it difficult to understand why specific results are returned. Reproducibility is the lowest of the three, with inconsistent results occurring about half the time due to LLM variability and the inherent randomness in vector search methods.
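These reproducibility figures can be estimated empirically by repeating an identical query and checking how often the returned top-k set changes. A sketch, with `run_assistant` as a hypothetical stand-in for any of the three tools:

```python
import random

def run_assistant(query: str, k: int) -> frozenset[str]:
    """Hypothetical stand-in: returns the IDs of the top-k results for one run."""
    pool = [f"doc_{i}" for i in range(12)]
    return frozenset(random.sample(pool, k))

def reproducibility(query: str, k: int = 5, trials: int = 20) -> float:
    """Fraction of repeat runs whose top-k set matches the first run's."""
    baseline = run_assistant(query, k)
    matches = sum(run_assistant(query, k) == baseline for _ in range(trials - 1))
    return matches / (trials - 1)

print(reproducibility("large language models for information retrieval"))
```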
Boolean Search Strategy
The tools employ distinct strategies for converting a natural language query into a Boolean search.
Web of Science uses a structured “block search.” It identifies key concepts, generates synonyms for each, and connects these conceptual blocks with AND. For example, a query is broken into blocks like (large language models OR llms…) AND (information retrieval OR search engines…).
Scopus AI also uses a block approach but tends toward significant query expansion, often adding related concepts not explicitly mentioned by the user, such as (“natural language processing” OR nlp…) and (“machine learning” OR ml…).
Primo generates multiple paraphrased versions of the full query, combining them with OR into a single, broad search string like (large language models information retrieval) OR (LLMs in search engines) OR….
In general, all three tools tend to over-expand queries, with Web of Science often being the most liberal. However, both Primo and Scopus AI apply post-search reranking (using vector embeddings and reciprocal rank fusion, respectively) to improve the relevance of their top results. Web of Science relies solely on its initial Boolean search and default ranking, which—while highly interpretable—can sometimes result in lower relevance compared to the reranked results from the other tools.
User Experience
Each tool is accessed as a separate feature alongside its platform’s main search and is not enabled by default.
The Primo Research Assistant features a simple, single-search-bar interface, accessible from the main menu or a floating widget. After a query, the AI-generated answer is displayed with its top five source documents listed prominently above it. A key feature is a link that lets users run the underlying Boolean search in Primo’s standard interface to see all results, as the top five may differ due to reranking. It conveniently tracks up to 200 past searches in a collapsible history panel.
Scopus AI also offers a clean, straightforward interface similar to Primo’s. Clicking on a citation opens a details panel without navigating away from the answer. While summaries can include basic tables, the tool lacks the advanced table-building synthesis of some competitors. Its integration with the wider Scopus platform is currently limited, with no saved AI search history or alerts and restricted export options for AI-generated content.
The Web of Science Research Assistant has a more complex interface. Beyond a main search bar, it provides three guided task options: “Understand a Topic,” “Literature Review,” and “Find a Journal.” The system adaptively interprets queries, which can sometimes lead to confusion; a search for a concept might trigger a guided task to show seminal papers instead of generating a direct, summarized answer.
Overall Comparison
| Feature | Primo Research Assistant (launched September 2024) | Scopus AI (launched January 2024) | Web of Science Research Assistant (launched September 2024) |
| --- | --- | --- | --- |
| Index Used | Central Discovery Index (CDI) with exceptions (e.g., news content, aggregator collections, and selected content-owner opt-outs) (Ex Libris, 2024) | Scopus index since 2003: metadata and abstracts (articles, books, reviews, chapters, proceedings, etc.) (Elsevier, 2024b) | Web of Science Core Collection (user-entitled holdings) |
| Content Used for RAG | CDI metadata and abstracts. No full text. | Scopus metadata and abstracts (from 2003 onward for summary generation). No full text. | Web of Science metadata and abstracts. No full text. |
| Retrieval Process | LLM generates 10 keyword variants ORed with the original query. Top 30 results reranked via vector embeddings (Ex Libris, 2024). | Copilot applies vector embedding search and/or keyword search (LLM-generated Boolean). Hybrid approach using RAG Fusion (Elsevier, 2024a; Rackauckas, 2024). | LLM generates a keyword Boolean search strategy. |
| RAG-Generated Answer or Summary | Yes; answer cites up to top 5 results. | Yes; answer cites up to 10 results. Expanded summary cites up to 30 results. | Yes; answer cites up to top 8 results. |
| Prefilters/Natural Language Parsing | Filters by publication type (journal articles, peer-reviewed, books) and publication years, e.g., “Give me peer reviewed articles about large language models from 2020–2024” (parsing sketched below the table) | Lacks pre-filters. No explicit support for metadata field parsing in queries. | Many filters available. Parses many metadata fields from natural language queries, including DOI, author, year, journal, institution, and country. |
| Interpretability/Explainability of Search | Medium. Boolean query used is shown, but the reranking step makes the search opaque. | Low. Boolean query shown (if used), but the vector search component lacks transparency. | High. Boolean search used is shown; results align with standard Web of Science search. |
| Reproducibility of Search | Medium. ~1 in 5 tries yields a different Boolean strategy and top 5 results. | Very low. ~1 in 2 tries yields a different Boolean strategy and top 10 results. | Medium. ~1 in 5 tries yields a different Boolean strategy and top 8 results. |
| Non-English Language Support | Non-English input accepted; output matches input language. | Non-English input accepted; output is English only. | Non-English input accepted; output matches input language. |
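To make the natural-language parsing row concrete, here is a toy sketch of extracting prefilters from the example query above. The real assistants rely on an LLM for this step; the regex patterns are purely illustrative:

```python
import re

def parse_prefilters(query: str) -> dict:
    """Pull a year range and a peer-review flag out of a natural-language query."""
    parsed: dict = {}
    years = re.search(r"from\s+(\d{4})\s*[-–]\s*(\d{4})", query)
    if years:
        parsed["year_range"] = (int(years.group(1)), int(years.group(2)))
    if re.search(r"peer[- ]?reviewed", query, re.IGNORECASE):
        parsed["peer_reviewed"] = True
    # Strip the recognized filter phrases; what remains is the topic.
    topic = re.sub(
        r"give me|peer[- ]?reviewed|articles about|from\s+\d{4}\s*[-–]\s*\d{4}",
        "", query, flags=re.IGNORECASE,
    )
    parsed["topic"] = " ".join(topic.split())
    return parsed

print(parse_prefilters(
    "Give me peer reviewed articles about large language models from 2020–2024"
))
# {'year_range': (2020, 2024), 'peer_reviewed': True, 'topic': 'large language models'}
```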