Background

Erroneous database entries and problematic geographic coordinates are a central issue in biogeography, and a set of tools is available to address different dimensions of the problem. CoordinateCleaner focuses on the fast and reproducible flagging of large numbers of records, and provides additional functions to detect dataset-level and fossil-specific biases. In the R environment, the scrubr and biogeo packages offer cleaning approaches complementary to CoordinateCleaner. scrubr combines basic geographic cleaning (comparable to cc_dupl, cc_zero and cc_count in CoordinateCleaner) with options to clean taxonomic names (see also taxize) and date information. biogeo includes some basic automated geographic cleaning (similar to cc_val, cc_count and cc_outl) but focuses on manually correcting suspicious coordinates using environmental information.

Table 1. Function-by-function comparison of CoordinateCleaner, scrubr and biogeo.

| Functionality | CoordinateCleaner 2.0-1 | scrubr 0.1.1 | biogeo 1.0 | Percent overlap |
|---|---|---|---|---|
| Missing coordinates | cc_val | coord_incomplete | missingvalsexclude | 100% |
| Coordinates outside CRS | cc_val | coord_impossible | - | 100% |
| Duplicated records | cc_dupl | dedup | duplicatesexclude | The aim is identical, methods differ |
| 0/0 coordinates | cc_zero | coord_unlikely | - | 100% |
| Identical lon/lat | cc_equ | - | - | 0% |
| Country capitals | cc_cap | - | - | 0% |
| Political unit centroids | cc_cen | - | - | 0% |
| Coordinates incongruent with additional location information | cc_count | coord_within | errorcheck, quickclean | 100% |
| Coordinates assigned to GBIF headquarters | cc_gbif | - | - | 0% |
| Coordinates assigned to the location of biodiversity institutions | cc_inst | - | - | 0% |
| Coordinates outside natural range | cc_iucn | - | - | 0% |
| Spatial outliers | cc_outl | - | outliers | 50%, biogeo uses environmental distance |
| Coordinates within the ocean | cc_sea | - | - | 0% |
| Coordinates in urban area | cc_urb | - | - | 0% |
| Coordinate conversion error | dc_ddmm | - | - | 0% |
| Rounded coordinates/rasterized collection | dc_round | - | precisioncheck | 20%, biogeo tests for predefined rasters |
| Fossils: invalid age range | tc_equal | - | - | 0% |
| Fossils: excessive age range | tc_range | - | - | 0% |
| Fossils: temporal outlier | tc_outl | - | - | 0% |
| Fossils: PyRate interface | WritePyrate | - | - | 0% |
| Wrapper functions to run all tests | CleanCoordinates, CleanCoordinatesDS, CleanCoordinatesFOS | - | - | 0% |
| Database of biodiversity institutions | institutions | - | - | 0% |
| Taxonomic cleaning | - | tax_no_epithet | - | 0% |
| Missing date | - | date_missing | - | 0% |
| Add date | - | date_create | - | 0% |
| Date format | - | date_standardize | - | 0% |
| Reformatting coordinate annotation | - | - | a large set of functions | 0% |
| Correcting coordinates using guessing and environmental distance | - | - | a large set of functions | 0% |
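
To make the overlap in the first rows of Table 1 more concrete, the following base R sketch shows what the most basic checks (missing coordinates, coordinates outside the valid range, 0/0 coordinates and duplicated records) amount to. It is only an illustration under stated assumptions, not the implementation used by CoordinateCleaner, scrubr or biogeo: the toy data, the column names and the exact flagging rules are made up for this example, and the packages may apply additional logic.

```r
# Toy occurrence records; values and column names are assumptions for illustration
occ <- data.frame(
  species          = c("A", "A", "B", "B", "C"),
  decimallongitude = c(12.5, 12.5, 0, 200, NA),
  decimallatitude  = c(41.9, 41.9, 0, 10, 5.2),
  stringsAsFactors = FALSE
)

lon <- occ$decimallongitude
lat <- occ$decimallatitude

# Missing coordinates (compare cc_val, coord_incomplete, missingvalsexclude)
missing_coord <- is.na(lon) | is.na(lat)

# Coordinates outside the valid range of a lat/lon reference system
# (compare cc_val, coord_impossible)
out_of_range <- !missing_coord & (abs(lon) > 180 | abs(lat) > 90)

# Exact 0/0 coordinates (compare cc_zero, coord_unlikely)
zero_coord <- !missing_coord & lon == 0 & lat == 0

# Records repeating an earlier species/coordinate combination
# (compare cc_dupl, dedup, duplicatesexclude)
dupl <- duplicated(occ[, c("species", "decimallongitude", "decimallatitude")])

# Keep only records that pass every check
keep <- !(missing_coord | out_of_range | zero_coord | dupl)
occ_clean <- occ[keep, ]
```

Keeping one logical vector per check, rather than filtering step by step, makes it possible to report which test flagged which record before anything is removed.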