Some work entities in OpenAlex include N-grams (word sequences and their frequencies) of their full text. The N-grams are obtained from the Internet Archive, which uses the spaCy parser to index scholarly works. See <https://docs.openalex.org/api-entities/works/get-n-grams> for coverage and more technical details.
Arguments
- works_identifier
Character. OpenAlex ID(s) of "works" entities as item identifier(s). These IDs start with "W". See more at <https://docs.openalex.org/api-entities/works#id>.
- ...
Unused.
- endpoint
Character. URL of the OpenAlex Endpoint API server. Defaults to endpoint = "https://api.openalex.org".
- verbose
Logical. If TRUE, print information on the querying process. Defaults to verbose = FALSE.
Note
A faster implementation is available for `curl` >= v5.0.0, and `oa_ngrams` will issue a one-time message about this. This can be suppressed with `options("oa_ngrams.message.curlv5" = FALSE)`.
Examples
if (FALSE) {
  library(openalexR)
  library(dplyr)

  # Fetch N-grams for two works
  ngrams_data <- oa_ngrams(c("W1963991285", "W1964141474"))

  # Ten most frequent N-grams of the first work
  first_paper_ngrams <- ngrams_data$ngrams[[1]]
  top_10_ngrams <- first_paper_ngrams %>%
    slice_max(ngram_count, n = 10, with_ties = FALSE)

  # Missing N-grams are `NULL` in the `ngrams` list-column
  oa_ngrams("https://openalex.org/W2284876136")
}
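Because missing N-grams appear as `NULL` in the `ngrams` list-column, it can help to filter those works out before analysing the rest. The following is a minimal base R sketch, not part of the package: it uses a hand-built stand-in for the returned object, in which the `id` and `ngram` columns, the work IDs, and all values are assumptions made up for illustration. Only the `ngrams` list-column and the `ngram_count` column are taken from the example above.

```r
# Hypothetical stand-in for the value returned by oa_ngrams().
# The `id` and `ngram` columns and all values are invented for illustration.
mock_result <- data.frame(id = c("W1", "W2"), stringsAsFactors = FALSE)
mock_result$ngrams <- list(
  data.frame(
    ngram = c("a b", "c d", "e f"),
    ngram_count = c(5L, 9L, 2L),
    stringsAsFactors = FALSE
  ),
  NULL  # a work without N-grams
)

# Keep only the works whose `ngrams` entry is not NULL
has_ngrams <- !vapply(mock_result$ngrams, is.null, logical(1))
with_ngrams <- mock_result[has_ngrams, , drop = FALSE]

# Sort the N-grams of the first remaining work by descending count
first <- with_ngrams$ngrams[[1]]
top <- first[order(first$ngram_count, decreasing = TRUE), ]
```

With real output, the same `vapply(..., is.null, logical(1))` check could be applied to `ngrams_data$ngrams` before extracting individual elements.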