Bibliometrix & WoS

In today's lecture¹ I walked you through a bibliometric analysis workflow using Web of Science and Bibliometrix. Several of you asked for written instructions afterwards, so here they are. This guide covers installing R, installing the package, pulling data from WoS, merging exports without doing it by hand, and loading everything into the graphical interface.

I will also suggest which analyses are worth running depending on how deep you want to go.

Installing R

R is a free, open-source programming language for statistical computing. You do not need to know how to code to use Bibliometrix, but you do need R installed.

Windows

Go to the CRAN website and download the latest installer. Run it and follow the defaults. You do not need to customise anything.

Optionally, also install RStudio, which gives you a much friendlier interface for running R code.

Ubuntu

The version of R in Ubuntu’s default repositories tends to lag behind. To get the latest release, add the CRAN repository first:

sudo apt install --no-install-recommends software-properties-common dirmngr

wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc \
  | sudo tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc

sudo add-apt-repository \
  "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"

sudo apt update
sudo apt install --no-install-recommends r-base

Fedora

sudo dnf install R

Fedora’s repos are more up to date than Ubuntu’s, so you do not need the extra steps. That said, on Linux install.packages() normally builds R packages from source, which is slow because every package has to be compiled. If you want precompiled binaries, which install much faster, enable the iucar COPR repository first:

sudo dnf install 'dnf-command(copr)'
sudo dnf copr enable iucar/cran
sudo dnf install R-CoprManager

After that, R packages you install via install.packages() will pull precompiled binaries from that repo automatically.

macOS

Go to the CRAN page for macOS and download the package matching your chip (Apple Silicon or Intel). Install it like any other app.

If you use Homebrew: brew install r also works.

As with Windows, I recommend also installing RStudio.

Installing Bibliometrix

Once R is installed, open the RStudio console, or start R in a terminal by typing R (the usual route on Linux if you have not installed RStudio), and run:

install.packages("bibliometrix")

This installs the package and all its dependencies. It takes a few minutes, or longer on Linux if the packages have to be compiled from source. You only need to do this once.

Launching Biblioshiny

Biblioshiny is the graphical interface bundled with Bibliometrix. It runs as a local web app in your browser. To launch it:

library(bibliometrix)
biblioshiny()

Your browser will open automatically with the interface. Do not close the R session while you are using it: the browser tab is only a front end, and R does all the work in the background.

Getting Data from Web of Science

Access

As Uppsala University students you have access to Web of Science through the library. Go to the Uppsala University Library and search for Web of Science in the database list, or access it directly through the library’s database portal.

Searching with the Query Builder

The Advanced Search or Query Builder interface is what you want. It lets you combine field-specific searches precisely. The most common field tags are:

TS= topic (title, abstract, author keywords and Keywords Plus in one search)
TI= title
AB= abstract
AK= author keywords
KP= Keywords Plus (generated by WoS)
SO= publication source (journal name)
AU= author

A typical query might look like:

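TI=(bibliometri* OR scientometri*) AND AK=("research evaluation")

Use parentheses and AND, OR, NOT to build your query. The asterisk is a wildcard (bibliometri* matches bibliometric, bibliometrics, bibliometry and so on), and quotation marks restrict a multi-word term to the exact phrase. The Query Builder will let you combine fields visually if you are not comfortable writing the query directly.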

Exporting the Records

Once you have your results, click Export and select Plain Text File. In the options:

Set the record content to Full Record and Cited References. Do not export just the basic fields or you will lose citation data and keywords that are essential for most analyses.
WoS limits exports to 500 records per batch. If your search returns more than 500 results, you need to export in chunks: records 1–500, then 501–1000, and so on.
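If you are unsure which ranges to type into the export dialog, the arithmetic is easy to let R do. A minimal sketch; export_batches() is a hypothetical helper, and its 500-record default simply mirrors the limit mentioned above:

```r
# Hypothetical helper: record ranges to enter in the WoS export dialog
# when a search returns more results than fit in one batch.
export_batches <- function(n_records, batch_size = 500) {
  starts <- seq(1, n_records, by = batch_size)
  ends <- pmin(starts + batch_size - 1, n_records)
  data.frame(batch = seq_along(starts), from = starts, to = ends)
}

export_batches(1234)
```

For a search with 1,234 results this gives three batches: 1 to 500, 501 to 1000, and 1001 to 1234.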

By default the files will be called savedrecs.txt. Rename them as you go — savedrecs_1.txt, savedrecs_2.txt, etc. — and keep them all in the same folder.

Merging the Exports in R

If you only exported one batch, you can load it directly. If you have several files, use R to concatenate them into a single .txt before loading into Biblioshiny. Put all your exported files in a folder called data, then run:

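# List the exported txt files, leaving out merged.txt itself so that
# re-running the script does not add its contents a second time
files <- list.files(path = "data", pattern = "\\.txt$", full.names = TRUE)
files <- files[basename(files) != "merged.txt"]

# Read and merge them into one (warn = FALSE silences warnings about
# files that lack a final newline)
merged_lines <- unlist(lapply(files, readLines, encoding = "UTF-8", warn = FALSE))
writeLines(merged_lines, "data/merged.txt")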

You will end up with a single merged.txt in your data folder. That is the file you will load into Biblioshiny.
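Before loading, it can be worth checking that the merge neither lost nor duplicated records. In the WoS plain-text format every record ends with a line containing only the tag ER, so counting those lines gives the number of records. A minimal sketch assuming that layout; count_wos_records() is a hypothetical helper:

```r
# Hypothetical helper: count the records in a WoS plain-text file by
# counting "ER" (end of record) lines. Continuation lines are indented
# with spaces, so they cannot match the pattern by accident.
count_wos_records <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  sum(grepl("^ER\\s*$", lines))
}

# Compare the merged file with the sum of the individual exports
if (file.exists("data/merged.txt")) {
  parts <- list.files("data", pattern = "\\.txt$", full.names = TRUE)
  parts <- parts[basename(parts) != "merged.txt"]
  cat("individual exports:", sum(vapply(parts, count_wos_records, integer(1))), "\n")
  cat("merged file:       ", count_wos_records("data/merged.txt"), "\n")
}
```

The two numbers should be equal; if you exported every batch, they should also match the number of results WoS reported for your search.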

Loading the Data into Biblioshiny

Launch Biblioshiny as shown above, then in the left panel go to Load Data. Select Web of Science as the database and plaintext as the format. Upload the merged.txt file you created in the previous step.

If you only had one export file to begin with, upload it directly — no merging needed.

A Word on Data Cleaning

Before running any analysis, it is worth checking the quality of what you loaded. Bibliometrix gives you a summary when you first import a dataset. Pay attention to missing fields. If a large share of records have no keywords, keyword analyses will be unreliable. If many have no cited references, co-citation analysis is off the table.
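If you also work in R, the same check can be run directly on the data. A minimal sketch, assuming M is the data frame returned by bibliometrix's convert2df(), which has one column per WoS field tag (AB abstract, DE author keywords, ID Keywords Plus, CR cited references). missing_share() is a hypothetical helper that counts both NA and empty strings as missing:

```r
# Hypothetical helper: share of records with an empty value in each field.
missing_share <- function(M, tags = c("AB", "DE", "ID", "CR")) {
  tags <- intersect(tags, names(M))   # skip tags not present in M
  vapply(tags, function(tag) {
    x <- M[[tag]]
    mean(is.na(x) | trimws(x) == "")
  }, numeric(1))
}

# Toy data frame for illustration only
toy <- data.frame(
  DE = c("BIBLIOMETRICS; SCIENTOMETRICS", NA, ""),
  CR = c("REF A; REF B", "REF C", NA),
  stringsAsFactors = FALSE
)
round(missing_share(toy), 2)  # DE 0.67, CR 0.33
```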

Two things commonly require manual attention:

Author name variants. The same researcher may appear as “Smith J”, “Smith, John”, and “J. Smith” depending on how journals submitted the data. Biblioshiny has a tool under Data Filtering to merge these manually, but it takes time and judgement.
Keyword normalisation. “Machine learning”, “machine-learning”, and “ML” will appear as three separate nodes in a co-occurrence network unless you merge them. There is no automated fix; you have to decide which synonyms matter for your topic and collapse them by hand before running network analyses.
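Once you have decided on the synonyms, the collapsing itself is mechanical. A minimal sketch in R, assuming keywords are stored as semicolon-separated strings (as in the DE column of the data frame returned by convert2df()); normalise_keywords() and the synonyms table are hypothetical and only illustrate the idea:

```r
# Hypothetical synonym table: names are variants, values the form to keep.
# Names are upper case because keywords are upper-cased before lookup.
synonyms <- c("MACHINE-LEARNING" = "MACHINE LEARNING",
              "ML"               = "MACHINE LEARNING")

normalise_keywords <- function(x, synonyms) {
  vapply(x, function(kw_string) {
    if (is.na(kw_string)) return(NA_character_)
    kws <- toupper(trimws(strsplit(kw_string, ";", fixed = TRUE)[[1]]))
    kws <- kws[kws != ""]                     # drop empty entries
    hit <- kws %in% names(synonyms)
    kws[hit] <- synonyms[kws[hit]]            # replace variants
    paste(unique(kws), collapse = "; ")       # count each concept once
  }, character(1), USE.NAMES = FALSE)
}

normalise_keywords(c("ML; Deep learning", "machine-learning; MACHINE LEARNING"), synonyms)
# "MACHINE LEARNING; DEEP LEARNING"  "MACHINE LEARNING"
```

A named vector works as a lookup table here. Duplicates created by the merge are dropped within each record, so a paper tagged with both “ML” and “machine learning” counts only once.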

Neither of these is optional if you plan to publish the results. For an exploratory review, the raw data is usually good enough to get your bearings.

What Analyses to Run

For a first exploration

These give you a quick overview of the field without needing to interpret complex networks:

Annual Scientific Production: How many papers per year. Immediately tells you whether the field is growing, plateauing, or declining.
Most Cited Documents: The foundational papers. If a paper has ten times more citations than the rest, start there.
Most Productive Sources: Which journals dominate. Useful for knowing where to focus your reading and where to submit your own work.
Top Author Keywords: What concepts the authors themselves label their work with. A quick scan of the field’s vocabulary.
Bradford’s Law / Core Journals: Identifies the small set of journals that produce the bulk of the literature on a topic.

Run these first. Together they give you a reasonable summary of the area’s general boundaries.
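Annual Scientific Production and Bradford’s Law are simple counts underneath, and seeing them computed can help with interpretation. A minimal sketch in base R, assuming py is a vector of publication years and so a vector of source names (the PY and SO columns of the data frame returned by convert2df()). All function names are hypothetical. The growth rate is the compound annual rate between the first and last year, and Biblioshiny’s own summary figure may be calculated differently; the Bradford zoning is one simple variant (three zones holding roughly a third of the articles each), not necessarily the exact rule Bibliometrix applies.

```r
# Papers per year, including years with zero papers
annual_production <- function(py) {
  years <- seq(min(py, na.rm = TRUE), max(py, na.rm = TRUE))
  setNames(as.integer(table(factor(py, levels = years))), years)
}

# Compound annual growth rate (%) between the first and last year
growth_rate <- function(counts) {
  n <- length(counts)
  ((counts[[n]] / counts[[1]])^(1 / (n - 1)) - 1) * 100
}

# Bradford zones: rank sources by output, then place each source in the
# third of all articles in which its own articles begin
bradford_zones <- function(so) {
  freq <- sort(table(so), decreasing = TRUE)
  articles <- as.vector(freq)
  share_before <- (cumsum(articles) - articles) / sum(articles)
  data.frame(source = names(freq), articles = articles,
             zone = findInterval(share_before, c(1/3, 2/3)) + 1L,
             stringsAsFactors = FALSE)
}

counts <- annual_production(c(2019, 2019, 2021, 2022, 2022, 2022))
counts               # 2019: 2, 2020: 0, 2021: 1, 2022: 3
growth_rate(counts)  # about 14.5 (% per year)
```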

For a more detailed exploration

Once you have the basic picture, these analyses go deeper:

Keyword Co-occurrence Network: Shows which concepts appear together systematically. The clusters in the network often correspond to distinct research substreams. This is the analysis I find most useful for mapping a field.
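Thematic Map: Classifies topics by centrality (how strongly a theme is connected to the rest of the field) and density (how well developed it is internally). Motor themes (high centrality, high density) are the established core; niche themes (low centrality, high density), basic themes (high centrality, low density), and emerging or declining themes (low on both) occupy the other quadrants.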
Co-authorship Network: Reveals collaboration patterns. Are there isolated national clusters? A few central hubs connecting most of the field? Useful if you are looking for collaborators or understanding the field’s social structure.
Co-citation Analysis: Groups papers that are frequently cited together. This surfaces the intellectual foundations of different schools of thought, even if the co-cited papers never cite each other.
Bibliographic Coupling: The complement to co-citation. Groups papers that share the same references. Useful for identifying clusters of recent work building on the same foundations.
Thematic Evolution: Tracks how topic clusters have shifted across time periods you define. If you want to argue that a field has undergone a paradigm shift, this is your evidence.

You do not need to run all of these. For most literature reviews, the keyword co-occurrence network and the thematic map will tell you most of what you need. Add co-citation or bibliographic coupling if your review has a strong intellectual history component.
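Co-citation and bibliographic coupling are easy to confuse, and a toy example makes the difference concrete. A minimal sketch in base R using the standard counting formulation; the matrix A is made up for illustration, with citing papers as rows and cited references as columns. Multiplying A by its transpose one way gives coupling (references shared by two papers); multiplying the other way gives co-citation (papers citing both references). Keyword co-occurrence is the same product as co-citation, with a document-by-keyword matrix in place of A.

```r
# Toy incidence matrix: 1 means the paper (row) cites the reference (column)
A <- matrix(c(1, 1, 0,
              1, 1, 1,
              0, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(paper = c("P1", "P2", "P3"),
                            ref   = c("R1", "R2", "R3")))

# Bibliographic coupling: paper x paper counts of shared references
coupling <- A %*% t(A)      # same as tcrossprod(A)

# Co-citation: reference x reference counts of papers citing both
cocitation <- t(A) %*% A    # same as crossprod(A)

coupling
cocitation
```

Here P1 and P2 share two references (R1 and R2), so their coupling strength is 2, while R1 and R2 have a co-citation count of 2 because both P1 and P2 cite them.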

If something breaks or the interface behaves unexpectedly, the Bibliometrix documentation is well maintained and covers most common issues.

¹ This blog post was written for the students of the course “Methods” in the master's programme in Sociology of Education at Uppsala University.