The following is a quick analysis of the top organizations patenting in the field of databases.

  1. The first step is to download the relevant data fields from the PatentsView API:
  2. Now let’s identify who the top assignees are based on how many patents they have in our data set. We’ll also calculate how many total patents these assignees have and what fraction of their total patents relate to databases.
```r
library(patentsview)
library(dplyr)
library(DT)

# Unnest the data frames stored in the list columns of the API response,
# keyed by patent_number (pv_out is the response downloaded in step 1)
dl <- unnest_pv_data(pv_out$data, "patent_number")
dl
#> List of 3
#>  $ assignees   :'data.frame':    56197 obs. of  4 variables:
#>   ..$ patent_number             : chr [1:56197] "10000911" ...
#>   ..$ assignee_organization     : chr [1:56197] "Doosan Infracore Co., Lt"..
#>   ..$ assignee_total_num_patents: chr [1:56197] "149" ...
#>   ..$ assignee_key_id           : chr [1:56197] "175657" ...
#>  $ applications:'data.frame':    55318 obs. of  3 variables:
#>   ..$ patent_number: chr [1:55318] "10000911" ...
#>   ..$ app_date     : chr [1:55318] "2014-12-05" ...
#>   ..$ app_id       : chr [1:55318] "15/101707" ...
#>  $ patents     :'data.frame':    55318 obs. of  3 variables:
#>   ..$ patent_number                 : chr [1:55318] "10000911" ...
#>   ..$ patent_num_cited_by_us_patents: chr [1:55318] "0" ...
#>   ..$ patent_date                   : chr [1:55318] "2018-06-19" ...
```

```r
# Create a data frame with the top 75 assignees:
top_asgns <-
  dl$assignees %>%
    filter(!is.na(assignee_organization)) %>% # some patents are assigned to an inventor (not an org)
    mutate(ttl_pats = as.numeric(assignee_total_num_patents)) %>% # the API returns counts as strings
    group_by(assignee_organization, ttl_pats) %>%
    summarise(db_pats = n()) %>% # number of database patents per assignee
    mutate(frac_db_pats = round(db_pats / ttl_pats, 3)) %>%
    ungroup() %>%
    # Select by name (same column order as before) rather than by position
    select(assignee_organization, db_pats, ttl_pats, frac_db_pats) %>%
    arrange(desc(db_pats)) %>%
    slice(1:75)

# Create datatable
datatable(
  data = top_asgns,
  rownames = FALSE,
  colnames = c(
    "Assignee", "DB patents", "Total patents", "DB patents / total patents"
  ),
  caption = htmltools::tags$caption(
    style = 'caption-side: top; text-align: left; font-style: italic;',
    "Table 1: Top assignees in 'databases'"
  ),
  options = list(pageLength = 10)
)
```


IBM is far and away the biggest player in the field. However, we can see that Oracle and Salesforce.com are relatively more interested in this area, as indicated by the fraction of their patents that relate to databases.

  3. Let’s see how these assignees’ level of investment in databases has changed over time.

It’s hard to see any clear trends in this graph. What is clear is that the top assignees have all been patenting in the field for many years.
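For reference, the yearly counts behind a chart like this can be derived from the unnested data. The following is a minimal base-R sketch, not the original plotting code. It uses small made-up data frames in place of `dl$assignees` and `dl$applications`, and it assumes that the application year (taken from `app_date`) is the time axis. The names `joined`, `app_year`, `yearly_counts`, and `n_pats` are illustrative only.

```r
# Toy stand-ins for dl$assignees and dl$applications (all values are made up)
assignees <- data.frame(
  patent_number = c("1", "2", "3", "4", "5"),
  assignee_organization = c("Org A", "Org A", "Org B", "Org A", "Org B"),
  stringsAsFactors = FALSE
)
applications <- data.frame(
  patent_number = c("1", "2", "3", "4", "5"),
  app_date = c("2010-03-01", "2010-07-15", "2010-09-30",
               "2011-01-20", "2011-05-05"),
  stringsAsFactors = FALSE
)

# Join on patent_number, then take the year from the ISO date string
joined <- merge(assignees, applications, by = "patent_number")
joined$app_year <- as.integer(substr(joined$app_date, 1, 4))

# Count patents per assignee per application year
yearly_counts <- as.data.frame(
  table(assignee = joined$assignee_organization, app_year = joined$app_year),
  stringsAsFactors = FALSE
)
names(yearly_counts)[3] <- "n_pats"
yearly_counts
```

Each row of `yearly_counts` would then become one point on an assignee's line in a time-series chart.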

  4. Finally, let’s see how the organizations compare in terms of their citation rates. First, we’ll need to normalize the raw citation counts by publication year, so that older patents don’t have an unfair advantage over newer patents (older patents have had more time to accumulate citations).

| assignee_organization | mean_perc | db_pats | ttl_pats | frac_db_pats | color |
|:--|--:|--:|--:|--:|:--|
| International Business Machines Corporation | 0.4760141 | 5355 | 123668 | 0.043 | #f1c40f |
| Samsung Electronics Co., Ltd. | 0.4392502 | 313 | 81221 | 0.004 | #f1c40f |
| Canon Kabushiki Kaisha | 0.4384788 | 199 | 69889 | 0.003 | #f1c40f |
| Sony Corp. | 0.4400093 | 386 | 49538 | 0.008 | #f1c40f |
| Kabushiki Kaisha Toshiba | 0.4308881 | 214 | 47323 | 0.005 | #f1c40f |
| Hitachi Metals, Ltd. | 0.4709231 | 521 | 43296 | 0.012 | #f1c40f |
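One way to do the year-based normalization described in this step is to convert each patent's citation count into a percentile rank among patents published in the same year. The sketch below shows the idea in base R on made-up data. It is an assumption about the approach, not necessarily the code that produced `mean_perc`, and the names `perc_rank`, `pub_year`, `num_cites`, and `cite_perc` are hypothetical.

```r
# Toy stand-in for dl$patents (all values are made up; the API returns strings)
patents <- data.frame(
  patent_number = c("1", "2", "3", "4", "5", "6"),
  patent_date = c("2010-01-05", "2010-06-01", "2010-11-09",
                  "2015-02-03", "2015-08-18", "2015-12-29"),
  patent_num_cited_by_us_patents = c("40", "10", "25", "0", "4", "2"),
  stringsAsFactors = FALSE
)

patents$pub_year <- substr(patents$patent_date, 1, 4)
patents$num_cites <- as.numeric(patents$patent_num_cited_by_us_patents)

# Percentile rank on a 0-1 scale: (rank - 1) / (n - 1); ties share the average rank
perc_rank <- function(x) {
  if (length(x) < 2) return(rep(NA_real_, length(x)))
  (rank(x, ties.method = "average") - 1) / (length(x) - 1)
}

# Rank each patent only against patents published in the same year
patents$cite_perc <- ave(patents$num_cites, patents$pub_year, FUN = perc_rank)
patents[, c("patent_number", "pub_year", "num_cites", "cite_perc")]
```

In this toy data, the 2015 patent with 4 citations gets the same percentile (1) as the 2010 patent with 40 citations, because each is the most cited patent of its year. Averaging `cite_perc` over each assignee's patents would give a per-assignee figure comparable to `mean_perc`.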

Now let’s visualize the data. Each assignee will be represented by a point/bubble. The x-value of the point will represent the total number of patents the assignee has published in the field of databases (on a log scale), while the y-value will represent its average normalized citation rate. The size of the bubble will be proportional to the percent of the assignee’s patents that relate to databases.
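As a rough illustration of that encoding, the sketch below draws the six assignees from the table in step 4 with base R graphics rather than the interactive chart used in this analysis. The organization names are shortened, and the `cex` scaling (a maximum of 6, with bubble area rather than diameter tracking `frac_db_pats`) is an assumption made for the example.

```r
# Rows copied from the table in step 4 (organization names shortened)
asgn <- data.frame(
  org = c("IBM", "Samsung", "Canon", "Sony", "Toshiba", "Hitachi"),
  mean_perc = c(0.4760141, 0.4392502, 0.4384788,
                0.4400093, 0.4308881, 0.4709231),
  db_pats = c(5355, 313, 199, 386, 214, 521),
  frac_db_pats = c(0.043, 0.004, 0.003, 0.008, 0.005, 0.012),
  stringsAsFactors = FALSE
)

# cex scales a point's diameter, so take the square root to make the
# bubble *area* proportional to frac_db_pats (largest bubble: cex = 6)
asgn$cex <- 6 * sqrt(asgn$frac_db_pats / max(asgn$frac_db_pats))

plot(
  asgn$db_pats, asgn$mean_perc,
  log = "x",                      # database patents on a log scale
  pch = 21, bg = "#f1c40f", cex = asgn$cex,
  xlab = "Database patents (log scale)",
  ylab = "Mean citation percentile"
)
text(asgn$db_pats, asgn$mean_perc, labels = asgn$org, pos = 3, cex = 0.7)
```

Scaling by area is a common choice for bubble charts because readers tend to compare areas, not diameters; scaling `cex` linearly with `frac_db_pats` would exaggerate the differences.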


It looks like Microsoft has relatively high values across all three metrics (average citation percentile, number of database patents, and percent of total patents that are related to databases). IBM has more patents than Microsoft, but also has a lower average citation percentile.