
This vignette explores advanced uses of the npi package.

npi is an R package that gives R users access to the U.S. National Provider Identifier (NPI) Registry API, which is maintained by the Centers for Medicare & Medicaid Services (CMS). The package makes it easy to obtain administrative data linked to a specific individual or organizational healthcare provider. Users can also perform advanced searches based on provider name, location, type of service, credentials, and many other attributes.

See the npi::npi vignette for an introduction to the package.

Note on NPI Downloadable Files

CMS regularly releases full NPI data files for download. The API, and therefore npi_search(), returns a maximum of 1,200 records per search, so we recommend downloading the data file if you need to work with the entire dataset or with more records than that maximum. Data dissemination files are zipped and will exceed 4 GB upon decompression.
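One practical consequence of the record cap is that a result set with exactly 1,200 rows may be incomplete. The sketch below shows one way to flag that case. The helper name may_be_truncated() is hypothetical and not part of npi, and the data frames are stand-ins so the example runs without calling the API. It also assumes the search was run with the record limit raised to the cap, which is worth confirming in ?npi_search.

```r
# Hypothetical helper, not part of npi: flags a result set whose row count
# has reached the record cap, which suggests more matches may exist.
may_be_truncated <- function(results, max_records = 1200L) {
  nrow(results) >= max_records
}

# Stand-in data frames so the example runs offline
small <- data.frame(npi = as.character(seq_len(10)))
full  <- data.frame(npi = as.character(seq_len(1200)))

may_be_truncated(small)
#> [1] FALSE
may_be_truncated(full)
#> [1] TRUE
```

If a search is flagged this way, narrowing the query or switching to the downloadable files is likely the safer route.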

Run npi_search() on multiple search terms

npi_search() searches on a defined set of query parameters. The function is not designed to search on multiple values of the same argument at once, for example multiple NPI numbers in a single function call. However, users can still run searches serially for multiple values of a single query parameter by using npi in combination with the purrr package. In the example below, we search multiple NPI numbers. The purrr::map() function applies npi_search() to each element of the vector, and dplyr::bind_rows() then combines the resulting list of data frames into a single data frame. The result is a single tibble containing the records that matched.

library(npi)
library(magrittr)  # provides the %>% pipe used below

npis <- c(1992708929, 1831192848, 1699778688, 1111111111)  # Last element doesn't exist

out <- npis %>% 
  purrr::map(., ~ npi_search(number = .)) %>% 
  dplyr::bind_rows()
#> 10 records requested
#> Requesting records 0-10...
#> 10 records requested
#> Requesting records 0-10...
#> 10 records requested
#> Requesting records 0-10...
#> 10 records requested
#> Requesting records 0-10...

npi_summarize(out)
#> # A tibble: 3 × 6
#>   npi        name                            enumeration…¹ prima…² phone prima…³
#>   <chr>      <chr>                           <chr>         <chr>   <chr> <chr>  
#> 1 1992708929 NOVAMED MANAGEMENT SERVICES LLC Organization  3200 D… 404-… Dentis…
#> 2 1831192848 MATTHEW JAFFE                   Individual    3672 M… 770-… Orthop…
#> 3 1699778688 STEVEN PARNES                   Individual    NA      770-… Clinic…
#> # … with abbreviated variable names ¹​enumeration_type,
#> #   ²​primary_practice_address, ³​primary_taxonomy
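When searches run serially like this, a single failed request stops the whole batch. The following is a minimal sketch of one way to keep going, using base R's tryCatch(). The helper search_each() and the stand-in fake_search() are hypothetical names introduced for illustration; neither is part of npi, and the stand-in is there only so the example runs offline.

```r
# Hypothetical helper, not part of npi: call `search_fn` once per value and
# continue if an individual call raises an error.
search_each <- function(values, search_fn) {
  results <- lapply(values, function(v) {
    tryCatch(
      search_fn(v),
      error = function(e) {
        message("Search failed for ", v, ": ", conditionMessage(e))
        NULL  # marks this search as failed
      }
    )
  })
  # Drop failed searches, then combine the remaining data frames
  results <- Filter(Negate(is.null), results)
  do.call(rbind, results)
}

# Stand-in for npi_search() that fails on one value
fake_search <- function(number) {
  if (number == 2) stop("simulated request error")
  data.frame(npi = as.character(number), stringsAsFactors = FALSE)
}

out_safe <- search_each(c(1, 2, 3), fake_search)
nrow(out_safe)
#> [1] 2
```

With the real package, the second argument would presumably be something like function(n) npi_search(number = n).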

Here we search for multiple ZIP codes in Los Angeles County.

codes <- c(90210, 90211, 90212)

zip_3 <- codes %>%
  purrr::map(., ~ npi_search(postal_code = .)) %>%
  dplyr::bind_rows()
#> 10 records requested
#> Requesting records 0-10...
#> 10 records requested
#> Requesting records 0-10...
#> 10 records requested
#> Requesting records 0-10...

npi_flatten(zip_3)
#> # A tibble: 86 × 47
#>    npi   basic…¹ basic…² basic…³ basic…⁴ basic…⁵ basic…⁶ basic…⁷ basic…⁸ basic…⁹
#>    <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 1063… RICHARD ROGAL   A       PhD     NO      M       2006-0… 2007-0… A      
#>  2 1063… RICHARD ROGAL   A       PhD     NO      M       2006-0… 2007-0… A      
#>  3 1063… RICHARD ROGAL   A       PhD     NO      M       2006-0… 2007-0… A      
#>  4 1063… RICHARD ROGAL   A       PhD     NO      M       2006-0… 2007-0… A      
#>  5 1063… MINOO   MAHMOU… NA      MD      YES     F       2019-0… 2019-0… A      
#>  6 1063… MINOO   MAHMOU… NA      MD      YES     F       2019-0… 2019-0… A      
#>  7 1093… FRED    EMMANU… FERAYD… DDS     NO      M       2007-0… 2007-0… A      
#>  8 1093… FRED    EMMANU… FERAYD… DDS     NO      M       2007-0… 2007-0… A      
#>  9 1104… GILBERT KWONG   NA      D.D.S.  YES     M       2016-0… 2016-0… A      
#> 10 1104… GILBERT KWONG   NA      D.D.S.  YES     M       2016-0… 2016-0… A      
#> # … with 76 more rows, 37 more variables: basic_name_prefix <chr>,
#> #   basic_name_suffix <chr>, basic_organization_name <chr>,
#> #   basic_organizational_subpart <chr>,
#> #   basic_authorized_official_first_name <chr>,
#> #   basic_authorized_official_last_name <chr>,
#> #   basic_authorized_official_telephone_number <chr>,
#> #   basic_authorized_official_title_or_position <chr>, …
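The ZIP codes above are stored as numbers, which works for values such as 90210 but drops the leading zero from codes such as 02139. A small sketch of padding them back to five digits as character strings is shown below. Whether postal_code accepts character input is an assumption here and should be checked in ?npi_search.

```r
# Numeric storage drops the leading zero of a ZIP code such as 02139
codes_num <- c(2139, 90210)

# Pad to five digits and keep the result as character strings
codes_chr <- sprintf("%05d", as.integer(codes_num))
codes_chr
#> [1] "02139" "90210"
```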

Consult the R for Data Science chapter on iteration to learn more about using the purrr package.

If you are unfamiliar with the tidyverse approach, you can use a simple for loop instead.

npis <- c(1992708929, 1831192848, 1699778688, 1111111111)  # Last element doesn't exist
combined_df <- data.frame()
for (i in npis) {
  combined_df <- rbind(combined_df, npi_search(number = i))
}
#> 10 records requested
#> Requesting records 0-10...
#> 10 records requested
#> Requesting records 0-10...
#> 10 records requested
#> Requesting records 0-10...
#> 10 records requested
#> Requesting records 0-10...

npi_summarize(combined_df)
#> # A tibble: 3 × 6
#>   npi        name                            enumeration…¹ prima…² phone prima…³
#>   <chr>      <chr>                           <chr>         <chr>   <chr> <chr>  
#> 1 1992708929 NOVAMED MANAGEMENT SERVICES LLC Organization  3200 D… 404-… Dentis…
#> 2 1831192848 MATTHEW JAFFE                   Individual    3672 M… 770-… Orthop…
#> 3 1699778688 STEVEN PARNES                   Individual    NA      770-… Clinic…
#> # … with abbreviated variable names ¹​enumeration_type,
#> #   ²​primary_practice_address, ³​primary_taxonomy
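Calling rbind() inside the loop copies the accumulated data frame on every iteration. That is fine for a handful of searches, but for longer vectors a common alternative is to store each result in a pre-allocated list and combine once at the end. The sketch below illustrates that pattern with a hypothetical stand-in, fake_search(), in place of npi_search() so that it runs offline. The ids are placeholders, not real NPIs.

```r
# Stand-in for npi_search() so the sketch runs offline
fake_search <- function(number) {
  data.frame(npi = as.character(number), stringsAsFactors = FALSE)
}

ids <- c(101, 102, 103)  # placeholder values, not real NPIs

# Pre-allocate one list slot per search term
pieces <- vector("list", length(ids))
for (i in seq_along(ids)) {
  pieces[[i]] <- fake_search(ids[i])
}

# Combine all results in a single step
combined <- do.call(rbind, pieces)
nrow(combined)
#> [1] 3
```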