R package ropenaq

M. Salmon

2018-02-28

Introduction

This R package provides access to the OpenAQ API. OpenAQ is a community of scientists, software developers, and lovers of open environmental data who are building an open, real-time database that provides programmatic and historical access to air quality data. See their website at https://openaq.org/ and the API documentation at https://docs.openaq.org/. The package contains 5 functions that correspond to the 5 types of query offered by the OpenAQ API: cities, countries, latest, locations and measurements. The package uses the dplyr package: all output tables are data.frame (dplyr “tbl_df”) objects that can be further processed and analysed.

What data can you get?

Since November 2017 the API only provides access to the latest 90 days of OpenAQ data. The whole OpenAQ archive can be accessed via Amazon S3. See this announcement. You can interact with Amazon S3 using the aws.s3 package. The maintainer of ropenaq plans to write tutorials about how to access OpenAQ data and will also keep the documentation of ropenaq up to date regarding data access changes.

Finding measurements availability

Three functions of the package return lists of available information. Measurements are obtained from locations, which are in cities, which are in countries.

The aq_countries function

The aq_countries function shows for which countries information is available within the platform. It is the easiest function to use because it needs no arguments. The code for each country is its ISO 3166-1 alpha-2 code.

library("ropenaq")
countries_table <- aq_countries()
library("knitr")
kable(countries_table)
| name | code | cities | locations | count |
|:-----|:-----|-------:|----------:|------:|
| Andorra | AD | 2 | 3 | 16334 |
| Argentina | AR | 1 | 4 | 14976 |
| Australia | AU | 18 | 99 | 3421595 |
| Austria | AT | 16 | 306 | 1521351 |
| Bahrain | BH | 1 | 1 | 14724 |
| Bangladesh | BD | 1 | 2 | 16523 |
| Belgium | BE | 14 | 191 | 1280066 |
| Bosnia and Herzegovina | BA | 8 | 17 | 715153 |
| Brazil | BR | 72 | 119 | 2812094 |
| Canada | CA | 11 | 165 | 2174471 |
| Chile | CL | 138 | 113 | 4337618 |
| China | CN | 21 | 74 | 547416 |
| Colombia | CO | 1 | 1 | 15327 |
| Croatia | HR | 16 | 49 | 260437 |
| Czech Republic | CZ | 15 | 200 | 1344297 |
| Denmark | DK | 7 | 25 | 187236 |
| Ethiopia | ET | 1 | 2 | 21427 |
| Finland | FI | 35 | 107 | 589734 |
| France | FR | 134 | 1171 | 6739559 |
| Germany | DE | 36 | 1026 | 7116902 |
| Ghana | GH | 1 | 11 | 1595 |
| Gibraltar | GI | 2 | 6 | 36093 |
| Hong Kong | HK | 9 | 16 | 84882 |
| Hungary | HU | 14 | 50 | 480425 |
| India | IN | 62 | 171 | 7276087 |
| Indonesia | ID | 2 | 3 | 37548 |
| Ireland | IE | 11 | 26 | 90318 |
| Israel | IL | 14 | 137 | 62212529 |
| Italy | IT | 45 | 104 | 579467 |
| Kosovo | XK | 1 | 1 | 14825 |
| Kuwait | KW | 1 | 1 | 7251 |
| Latvia | LV | 4 | 4 | 35919 |
| Lithuania | LT | 8 | 17 | 100765 |
| Luxembourg | LU | 3 | 7 | 73022 |
| Macedonia, the Former Yugoslav Republic of | MK | 16 | 30 | 344919 |
| Malta | MT | 4 | 4 | 46194 |
| Mexico | MX | 5 | 95 | 1826197 |
| Mongolia | MN | 25 | 40 | 2147392 |
| Nepal | NP | 1 | 4 | 26313 |
| Netherlands | NL | 68 | 112 | 5195292 |
| Nigeria | NG | 1 | 1 | 2541 |
| Norway | NO | 32 | 70 | 1155462 |
| Peru | PE | 1 | 19 | 437326 |
| Philippines | PH | 1 | 1 | 958 |
| Poland | PL | 10 | 16 | 547921 |
| Portugal | PT | 15 | 64 | 197578 |
| Russian Federation | RU | 1 | 49 | 187117 |
| Serbia | RS | 4 | 5 | 15817 |
| Singapore | SG | 1 | 1 | 1275 |
| Slovakia | SK | 8 | 38 | 385116 |
| Slovenia | SI | 8 | 8 | 27183 |
| South Africa | ZA | 1 | 11 | 189963 |
| Spain | ES | 115 | 1066 | 8224521 |
| Sri Lanka | LK | 1 | 1 | 2686 |
| Sweden | SE | 3 | 13 | 203211 |
| Switzerland | CH | 14 | 25 | 267750 |
| Taiwan, Province of China | TW | 30 | 77 | 2938513 |
| Thailand | TH | 33 | 64 | 2700014 |
| Turkey | TR | 40 | 142 | 3899809 |
| Uganda | UG | 1 | 1 | 7274 |
| United Arab Emirates | AE | 1 | 1 | 1121 |
| United Kingdom | GB | 112 | 162 | 5332239 |
| United States | US | 747 | 1946 | 28129949 |
| Viet Nam | VN | 2 | 3 | 34342 |
attr(countries_table, "meta")
#> # A tibble: 1 x 6
#>   name       license   website                   page limit found
#>   <fct>      <fct>     <fct>                    <int> <int> <int>
#> 1 openaq-api CC BY 4.0 https://docs.openaq.org/     1 10000    64
attr(countries_table, "timestamp")
#> # A tibble: 1 x 1
#>   queriedAt          
#>   <dttm>             
#> 1 2018-02-28 19:18:02

The aq_cities function

Using the aq_cities function one can get all cities for which information is available within the platform. For each city, one gets the number of locations, the count of measurements, the URL encoded string for the city name, and the country it is in.

cities_table <- aq_cities()
kable(head(cities_table))
| city | country | locations | count | cityURL |
|:-----|:--------|----------:|------:|:--------|
| Escaldes-Engordany | AD | 2 | 16020 | Escaldes-Engordany |
| unused | AD | 1 | 314 | unused |
| Abu Dhabi | AE | 1 | 1121 | Abu+Dhabi |
| Buenos Aires | AR | 4 | 14976 | Buenos+Aires |
| Amt der Niedersterreichischen Landesregierung | AT | 39 | 322499 | Amt+der+Nieder%EF%BF%BDsterreichischen+Landesregierung |
| Amt der Steiermrkischen Landesregierung | AT | 41 | 320372 | Amt+der+Steierm%EF%BF%BDrkischen+Landesregierung |

The optional country argument restricts the list to a given country instead of the whole world.

cities_tableIndia <- aq_cities(country="IN", limit = 10, page = 1)
kable(cities_tableIndia)
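| city | country | locations | count | cityURL |
|:-----|:--------|----------:|------:|:--------|
| Mandideep | IN | 1 | 13247 | Mandideep |
| Navi Mumbai | IN | 1 | 9007 | Navi+Mumbai |
| Delhi | IN | 35 | 1181980 | Delhi |
| Bengaluru | IN | 8 | 387593 | Bengaluru |
| Kanpur | IN | 2 | 163382 | Kanpur |
| Howrah | IN | 4 | 54156 | Howrah |
| Hyderabad | IN | 15 | 486471 | Hyderabad |
| Dhanbad | IN | 1 | 3 | Dhanbad |
| Asansol | IN | 2 | 7113 | Asansol |
| Chandrapur | IN | 2 | 239419 | Chandrapur |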

If one inputs a country that is not in the platform (or misspells a code), then an error message is thrown.

#aq_cities(country="PANEM")
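One way to avoid that error is to check a code against the code column of the countries table before querying. The following is a minimal base R sketch: is_known_country is a hypothetical helper that is not part of ropenaq, and the codes are hard-coded from the countries table so that the example runs offline. In practice they would come from aq_countries()$code.

```r
# Hypothetical helper, not part of ropenaq: TRUE if `code` is one of
# the ISO 3166-1 alpha-2 codes in `known_codes`.
is_known_country <- function(code, known_codes) {
  code %in% known_codes
}

# A few codes copied from the countries table (assumption: in practice
# one would use known_codes <- aq_countries()$code).
known_codes <- c("AD", "AR", "AU", "IN", "US")

is_known_country("IN", known_codes)     # TRUE
is_known_country("PANEM", known_codes)  # FALSE
```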

The aq_locations function

The aq_locations function has far more arguments than the first two functions. One can filter locations by country, city or location, by parameter (valid values are “pm25”, “pm10”, “so2”, “no2”, “o3”, “co” and “bc”), from a given date and/or up to a given date, for values between a minimum and a maximum, and within a circle around a central point defined by the latitude, longitude and radius arguments. In the output table one also gets URL encoded strings for the city and the location. Below are several examples.

Here we only look for locations with PM2.5 information in Chennai, India.

locations_chennai <- aq_locations(country = "IN", city = "Chennai", parameter = "pm25")
kable(locations_chennai)
| location | city | country | count | sourceNames | lastUpdated | firstUpdated | distance | sourceName | latitude | longitude | pm25 | pm10 | no2 | so2 | o3 | co | bc | cityURL | locationURL |
|:---|:---|:---|---:|:---|---:|---:|---:|:---|---:|---:|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| Alandur Bus Depot | Chennai | IN | 13224 | CPCB | 1519271100 | 1487450700 | 13360023 | CPCB | 12.99711 | 80.19151 | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | Chennai | Alandur+Bus+Depot |
| IIT | Chennai | IN | 16204 | CPCB | 1519271100 | 1487442600 | 13362255 | CPCB | 12.99251 | 80.23745 | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | Chennai | IIT |
| Manali | Chennai | IN | 19515 | CPCB | 1519271100 | 1487452500 | 13345307 | CPCB | 13.16454 | 80.26285 | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | Chennai | Manali |
| US Diplomatic Post: Chennai | Chennai | IN | 17530 | StateAir_Chennai | 1519839000 | 1449869400 | 13353890 | StateAir_Chennai | 13.08784 | 80.27847 | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | Chennai | US+Diplomatic+Post%3A+Chennai |
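To get a feel for what a circle defined by latitude, longitude and radius selects, one can compute the distance between two locations from their coordinates. The sketch below uses the haversine formula in base R. haversine_m is a hypothetical helper that is not part of ropenaq, and the 10 km radius is an arbitrary value chosen for illustration, assuming the radius is expressed in metres.

```r
# Great-circle distance in metres between two points (haversine formula).
# Inputs are in decimal degrees; 6371000 m is the mean Earth radius.
haversine_m <- function(lat1, lon1, lat2, lon2, radius_earth = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_earth * asin(sqrt(a))
}

# Coordinates copied from the Chennai table
d <- haversine_m(12.99711, 80.19151,  # Alandur Bus Depot
                 12.99251, 80.23745)  # IIT

# Is IIT inside a 10 km circle centred on Alandur Bus Depot?
d < 10000  # TRUE
```

The two stations are roughly 5 km apart, so a query centred on one of them with a radius of a few kilometres would not be expected to return the other.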

Getting measurements

Two functions return data: aq_measurements and aq_latest. In both of them the arguments city and location need to be given as URL encoded strings.
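The cityURL and locationURL columns of the tables above already hold such strings. When only a plain name is at hand, an equivalent string can be built in base R. This is a sketch, not the package's own encoding routine: encode_name is a hypothetical helper that percent-encodes reserved characters with utils::URLencode and then writes spaces as "+", which reproduces the values shown in the tables.

```r
# Hypothetical helper, not part of ropenaq: percent-encode each name and
# write spaces as "+", as in the cityURL and locationURL columns.
encode_name <- function(x) {
  # URLencode is not vectorised, so apply it to one name at a time
  encoded <- vapply(x, utils::URLencode, character(1),
                    reserved = TRUE, USE.NAMES = FALSE)
  gsub("%20", "+", encoded, fixed = TRUE)
}

encode_name(c("Navi Mumbai", "US Diplomatic Post: New Delhi"))
# "Navi+Mumbai" "US+Diplomatic+Post%3A+New+Delhi"
```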

The aq_measurements function

The aq_measurements function has many arguments for building a query specific to, say, a given parameter in a given location, or to a circle around a central point defined by the latitude, longitude and radius arguments. Below we get the PM2.5 measurements for Delhi in India.

results_table <- aq_measurements(country = "IN", city = "Delhi", parameter = "pm25", limit = 10, page = 1)
kable(results_table)
| location | parameter | value | unit | country | city | latitude | longitude | dateUTC | dateLocal | cityURL | locationURL |
|:---|:---|---:|:---|:---|:---|---:|---:|:---|:---|:---|:---|
| US Diplomatic Post: New Delhi | pm25 | 108 | µg/m³ | IN | Delhi | 28.63576 | 77.22445 | 2018-02-28 18:30:00 | 2018-03-01 00:00:00 | Delhi | US+Diplomatic+Post%3A+New+Delhi |
| US Diplomatic Post: New Delhi | pm25 | 102 | µg/m³ | IN | Delhi | 28.63576 | 77.22445 | 2018-02-28 17:30:00 | 2018-02-28 23:00:00 | Delhi | US+Diplomatic+Post%3A+New+Delhi |
| US Diplomatic Post: New Delhi | pm25 | 88 | µg/m³ | IN | Delhi | 28.63576 | 77.22445 | 2018-02-28 16:30:00 | 2018-02-28 22:00:00 | Delhi | US+Diplomatic+Post%3A+New+Delhi |
| US Diplomatic Post: New Delhi | pm25 | 74 | µg/m³ | IN | Delhi | 28.63576 | 77.22445 | 2018-02-28 15:30:00 | 2018-02-28 21:00:00 | Delhi | US+Diplomatic+Post%3A+New+Delhi |
| US Diplomatic Post: New Delhi | pm25 | 69 | µg/m³ | IN | Delhi | 28.63576 | 77.22445 | 2018-02-28 14:30:00 | 2018-02-28 20:00:00 | Delhi | US+Diplomatic+Post%3A+New+Delhi |
| US Diplomatic Post: New Delhi | pm25 | 68 | µg/m³ | IN | Delhi | 28.63576 | 77.22445 | 2018-02-28 13:30:00 | 2018-02-28 19:00:00 | Delhi | US+Diplomatic+Post%3A+New+Delhi |
| US Diplomatic Post: New Delhi | pm25 | 62 | µg/m³ | IN | Delhi | 28.63576 | 77.22445 | 2018-02-28 12:30:00 | 2018-02-28 18:00:00 | Delhi | US+Diplomatic+Post%3A+New+Delhi |
| US Diplomatic Post: New Delhi | pm25 | 75 | µg/m³ | IN | Delhi | 28.63576 | 77.22445 | 2018-02-28 11:30:00 | 2018-02-28 17:00:00 | Delhi | US+Diplomatic+Post%3A+New+Delhi |
| US Diplomatic Post: New Delhi | pm25 | 93 | µg/m³ | IN | Delhi | 28.63576 | 77.22445 | 2018-02-28 10:30:00 | 2018-02-28 16:00:00 | Delhi | US+Diplomatic+Post%3A+New+Delhi |
| US Diplomatic Post: New Delhi | pm25 | 96 | µg/m³ | IN | Delhi | 28.63576 | 77.22445 | 2018-02-28 09:30:00 | 2018-02-28 15:00:00 | Delhi | US+Diplomatic+Post%3A+New+Delhi |

One could also get all possible parameters in the same table.
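Because the output is a plain data.frame, it can be summarised with the usual tools. The sketch below rebuilds the ten PM2.5 values of the Delhi table by hand, so that it runs without querying the API, and computes the mean per location and parameter with base R's aggregate. The object names delhi and summary_table are chosen for illustration only; the same summary could be written with dplyr.

```r
# The ten PM2.5 values from the Delhi table, rebuilt by hand so the
# example runs without querying the API.
delhi <- data.frame(
  location = "US Diplomatic Post: New Delhi",
  parameter = "pm25",
  value = c(108, 102, 88, 74, 69, 68, 62, 75, 93, 96),
  stringsAsFactors = FALSE
)

# Mean value per location and parameter
summary_table <- aggregate(value ~ location + parameter,
                           data = delhi, FUN = mean)
summary_table$value  # 83.5
```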

The aq_latest function

This function gives a table with the most recent measurements for the locations selected by the arguments. If all arguments are NULL, it gives the most recent measurements for all locations. Below are the latest values for Hyderabad at the time this vignette was compiled.

tableLatest <- aq_latest(country="IN", city="Hyderabad")
kable(head(tableLatest))
| location | city | country | distance | latitude | longitude | parameter | value | lastUpdated | unit | sourceName | averagingPeriod_value | averagingPeriod_unit | cityURL | locationURL |
|:---|:---|:---|:---|:---|:---|:---|---:|:---|:---|:---|---:|:---|:---|:---|
| Bollaram Industrial Area | Hyderabad | IN | NA | NA | NA | co | 420.0 | 2017-02-17 05:15:00 | µg/m³ | CPCB | 0.25 | hours | Hyderabad | Bollaram+Industrial+Area |
| Bollaram Industrial Area | Hyderabad | IN | NA | NA | NA | pm25 | 55.0 | 2017-02-17 05:15:00 | µg/m³ | CPCB | 0.25 | hours | Hyderabad | Bollaram+Industrial+Area |
| Bollaram Industrial Area | Hyderabad | IN | NA | NA | NA | pm10 | 137.0 | 2017-02-17 05:15:00 | µg/m³ | CPCB | 0.25 | hours | Hyderabad | Bollaram+Industrial+Area |
| Bollaram Industrial Area | Hyderabad | IN | NA | NA | NA | no2 | 16.2 | 2017-02-17 05:15:00 | µg/m³ | CPCB | 0.25 | hours | Hyderabad | Bollaram+Industrial+Area |
| Bollaram Industrial Area | Hyderabad | IN | NA | NA | NA | so2 | 16.8 | 2017-02-17 05:15:00 | µg/m³ | CPCB | 0.25 | hours | Hyderabad | Bollaram+Industrial+Area |
| Bollaram Industrial Area, Hyderabad - TSPCB | Hyderabad | IN | NA | NA | NA | co | 710.0 | 2018-02-22 03:15:00 | µg/m³ | CPCB | 0.25 | hours | Hyderabad | Bollaram+Industrial+Area%2C+Hyderabad+-+TSPCB |

Paging and limit

All endpoints/functions have a limit and a page argument, which indicate, respectively, how many results per page should be returned and which page should be queried. If you do not set these arguments, all results for the query are retrieved by default using async requests. This can still take a while, depending on the total number of results.

aq_measurements(city = "Delhi", parameter = "pm25")
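When paging by hand, the number of pages to request follows from the found and limit fields of the meta attribute shown earlier for the countries table. The sketch below is plain arithmetic in base R: n_pages is a hypothetical helper that is not part of ropenaq, and the second call uses made-up numbers for illustration.

```r
# Hypothetical helper, not part of ropenaq: number of pages needed to
# retrieve `found` results when each page holds at most `limit` results.
n_pages <- function(found, limit) {
  as.integer(ceiling(found / limit))
}

# Values from the meta attribute of the countries table: 64 results
# with a limit of 10000 fit on a single page.
n_pages(found = 64, limit = 10000)     # 1

# Assumed numbers for illustration: 25000 results, 10000 per page.
n_pages(found = 25000, limit = 10000)  # 3
```

One would then call the query function once per page, passing the same limit and page = 1, 2, and so on.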

Rate limiting

In October 2017 the API introduced a rate limit of 2,000 requests every 5 minutes. Please keep this in mind. If a request receives a response status of 429 (too many requests), the package waits 5 minutes.
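The wait-and-retry behaviour described above can be illustrated with a generic sketch. This is not the package's actual implementation: retry_on_429, its arguments and the fake request function are all hypothetical, and the fake request stands in for a real HTTP call so that the example runs offline. For reference, 2,000 requests per 5 minutes averages out to one request every 0.15 seconds.

```r
# Hypothetical helper, not part of ropenaq: call `request_fn` until it
# returns a status other than 429, waiting `wait_seconds` between tries.
# `request_fn` must return a list with a `status` element.
retry_on_429 <- function(request_fn, wait_seconds = 300, max_tries = 3) {
  for (i in seq_len(max_tries)) {
    response <- request_fn()
    if (response$status != 429) {
      return(response)
    }
    if (i < max_tries) {
      Sys.sleep(wait_seconds)
    }
  }
  stop("Still rate limited after ", max_tries, " tries")
}

# A fake request that is rate limited once and then succeeds; the wait
# is set to 0 to keep the example fast.
calls <- 0
fake_request <- function() {
  calls <<- calls + 1
  list(status = if (calls == 1) 429 else 200)
}

result <- retry_on_429(fake_request, wait_seconds = 0)
result$status  # 200
```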

Other packages of interest for getting air quality data