Removes or flags duplicated records based on species name and coordinates, plus any user-defined additional columns. True (specimen) duplicates, and repeated records of the same species at the same coordinates, can make up the bulk of records in a biological collection database but are undesirable for many analyses. Both can be flagged with this function; identifying true specimen duplicates requires enough additional information, such as collector name and collector number.
cc_dupl(
x,
lon = "decimalLongitude",
lat = "decimalLatitude",
species = "species",
additions = NULL,
value = "clean",
verbose = TRUE
)
x: data.frame. Containing geographical coordinates and species names.
lon: character string. The column with the longitude coordinates. Default = "decimalLongitude".
lat: character string. The column with the latitude coordinates. Default = "decimalLatitude".
species: character string. The column with the species name. Default = "species".
additions: a vector of character strings. Additional columns to be included in the test for duplication. For example, collector name and collector number, as in the examples below.
value: character string. Defining the output value. See value.
verbose: logical. If TRUE, reports the name of the test and the number of records flagged.
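The effect of the additional columns can be illustrated with a small base R sketch. It uses `duplicated()` on a toy data frame as a stand-in for the test and is not the package's implementation; the data frame `recs` and the helper vector `key_cols` are hypothetical names introduced only for this illustration.

```r
# Toy records; column names mirror the defaults in the usage above.
recs <- data.frame(
  species = c("a", "a", "a", "b"),
  decimalLongitude = c(1, 1, 1, 1),
  decimalLatitude = c(2, 2, 2, 2),
  collector = c("Bonpl", "Bonpl", "Humb", "Bonpl"),
  stringsAsFactors = FALSE
)

# Columns compared by default (hypothetical helper vector).
key_cols <- c("decimalLongitude", "decimalLatitude", "species")

# Species and coordinates only: rows 2 and 3 both repeat row 1.
dup_basic <- duplicated(recs[, key_cols])

# Adding the collector column: only row 2 still repeats row 1,
# because row 3 was gathered by a different collector.
dup_strict <- duplicated(recs[, c(key_cols, "collector")])

print(dup_basic)
print(dup_strict)
```

Adding columns can only make the duplicate test stricter: a record is treated as a duplicate only if it matches an earlier record on every column compared.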
Depending on the "value" argument, either a data.frame containing the records considered correct by the test ("clean") or a logical vector ("flagged"), with TRUE = test passed and FALSE = test failed/potentially problematic. Default = "clean".
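The relationship between the two output forms can be sketched in base R. The logical vector below is built with `duplicated()` as a stand-in for a "flagged" result, on the assumption that the first occurrence of each record passes and later repeats fail; `recs`, `flags`, `clean`, and `suspect` are hypothetical names used only here.

```r
# Toy records with one repeated row.
recs <- data.frame(
  species = c("a", "a", "b"),
  decimalLongitude = c(1, 1, 1),
  decimalLatitude = c(2, 2, 2),
  stringsAsFactors = FALSE
)

# Stand-in for a "flagged" result: TRUE = passed, FALSE = duplicate.
# Assumption: the first occurrence passes, later repeats fail.
flags <- !duplicated(recs)

# Equivalent of the "clean" output: keep only rows that passed.
clean <- recs[flags, ]

# Rows that would be dropped, kept aside for manual inspection.
suspect <- recs[!flags, ]

print(flags)
print(nrow(clean))
print(nrow(suspect))
```

Working from the logical vector is convenient when flagged records should be reviewed rather than discarded outright, or when the results of several tests are combined before filtering.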
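library(CoordinateCleaner)

# 100 simulated records: 10 species with random integer coordinates.
# Shorter vectors are recycled to 100 rows. No seed is set, so the
# records flagged will differ between runs.
x <- data.frame(species = letters[1:10],
                decimalLongitude = sample(x = 0:10, size = 100, replace = TRUE),
                decimalLatitude = sample(x = 0:10, size = 100, replace = TRUE),
                collector = "Bonpl",
                collector.number = c(1001, 354),
                collection = rep(c("K", "WAG", "FR", "P", "S"), 20))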
cc_dupl(x, value = "flagged")
#> Testing duplicates
#> Flagged 5 records.
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [25] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [37] TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [49] TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
#> [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [73] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
#> [85] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [97] TRUE TRUE FALSE TRUE
cc_dupl(x, additions = c("collector", "collector.number"))
#> Testing duplicates
#> Removed 5 records.
#> species decimalLongitude decimalLatitude collector collector.number
#> 1 a 9 5 Bonpl 1001
#> 2 b 7 2 Bonpl 354
#> 3 c 10 4 Bonpl 1001
#> 4 d 4 7 Bonpl 354
#> 5 e 6 10 Bonpl 1001
#> 6 f 10 0 Bonpl 354
#> 7 g 10 4 Bonpl 1001
#> 8 h 8 0 Bonpl 354
#> 9 i 2 4 Bonpl 1001
#> 10 j 3 3 Bonpl 354
#> 11 a 2 4 Bonpl 1001
#> 12 b 4 6 Bonpl 354
#> 13 c 10 9 Bonpl 1001
#> 14 d 10 8 Bonpl 354
#> 15 e 2 3 Bonpl 1001
#> 16 f 7 10 Bonpl 354
#> 17 g 1 3 Bonpl 1001
#> 18 h 3 8 Bonpl 354
#> 19 i 7 6 Bonpl 1001
#> 20 j 0 7 Bonpl 354
#> 21 a 7 3 Bonpl 1001
#> 22 b 9 3 Bonpl 354
#> 23 c 7 3 Bonpl 1001
#> 24 d 3 9 Bonpl 354
#> 25 e 8 8 Bonpl 1001
#> 26 f 6 8 Bonpl 354
#> 27 g 8 6 Bonpl 1001
#> 28 h 7 0 Bonpl 354
#> 29 i 9 6 Bonpl 1001
#> 30 j 0 6 Bonpl 354
#> 31 a 6 1 Bonpl 1001
#> 32 b 4 3 Bonpl 354
#> 33 c 8 3 Bonpl 1001
#> 34 d 5 0 Bonpl 354
#> 35 e 1 10 Bonpl 1001
#> 36 f 9 8 Bonpl 354
#> 37 g 6 8 Bonpl 1001
#> 38 h 1 5 Bonpl 354
#> 39 i 9 10 Bonpl 1001
#> 41 a 3 5 Bonpl 1001
#> 42 b 6 10 Bonpl 354
#> 43 c 7 5 Bonpl 1001
#> 44 d 2 0 Bonpl 354
#> 45 e 8 10 Bonpl 1001
#> 46 f 3 3 Bonpl 354
#> 47 g 6 2 Bonpl 1001
#> 48 h 9 4 Bonpl 354
#> 49 i 9 0 Bonpl 1001
#> 50 j 2 7 Bonpl 354
#> 52 b 6 3 Bonpl 354
#> 53 c 5 2 Bonpl 1001
#> 54 d 5 1 Bonpl 354
#> 55 e 3 10 Bonpl 1001
#> 57 g 5 7 Bonpl 1001
#> 58 h 10 6 Bonpl 354
#> 59 i 5 2 Bonpl 1001
#> 60 j 1 6 Bonpl 354
#> 61 a 5 5 Bonpl 1001
#> 62 b 2 5 Bonpl 354
#> 63 c 5 9 Bonpl 1001
#> 64 d 7 8 Bonpl 354
#> 65 e 5 1 Bonpl 1001
#> 66 f 3 9 Bonpl 354
#> 67 g 10 2 Bonpl 1001
#> 68 h 8 4 Bonpl 354
#> 69 i 1 1 Bonpl 1001
#> 70 j 7 3 Bonpl 354
#> 71 a 0 7 Bonpl 1001
#> 72 b 10 1 Bonpl 354
#> 73 c 6 7 Bonpl 1001
#> 74 d 5 4 Bonpl 354
#> 75 e 5 8 Bonpl 1001
#> 76 f 2 4 Bonpl 354
#> 77 g 10 8 Bonpl 1001
#> 78 h 3 9 Bonpl 354
#> 79 i 6 2 Bonpl 1001
#> 80 j 6 1 Bonpl 354
#> 81 a 6 10 Bonpl 1001
#> 83 c 7 9 Bonpl 1001
#> 84 d 7 6 Bonpl 354
#> 85 e 10 6 Bonpl 1001
#> 86 f 1 8 Bonpl 354
#> 87 g 1 8 Bonpl 1001
#> 88 h 0 1 Bonpl 354
#> 89 i 6 9 Bonpl 1001
#> 90 j 10 3 Bonpl 354
#> 91 a 5 8 Bonpl 1001
#> 92 b 2 10 Bonpl 354
#> 93 c 8 4 Bonpl 1001
#> 94 d 8 3 Bonpl 354
#> 95 e 2 8 Bonpl 1001
#> 96 f 5 1 Bonpl 354
#> 97 g 1 9 Bonpl 1001
#> 98 h 5 2 Bonpl 354
#> 100 j 2 9 Bonpl 354
#> collection
#> 1 K
#> 2 WAG
#> 3 FR
#> 4 P
#> 5 S
#> 6 K
#> 7 WAG
#> 8 FR
#> 9 P
#> 10 S
#> 11 K
#> 12 WAG
#> 13 FR
#> 14 P
#> 15 S
#> 16 K
#> 17 WAG
#> 18 FR
#> 19 P
#> 20 S
#> 21 K
#> 22 WAG
#> 23 FR
#> 24 P
#> 25 S
#> 26 K
#> 27 WAG
#> 28 FR
#> 29 P
#> 30 S
#> 31 K
#> 32 WAG
#> 33 FR
#> 34 P
#> 35 S
#> 36 K
#> 37 WAG
#> 38 FR
#> 39 P
#> 41 K
#> 42 WAG
#> 43 FR
#> 44 P
#> 45 S
#> 46 K
#> 47 WAG
#> 48 FR
#> 49 P
#> 50 S
#> 52 WAG
#> 53 FR
#> 54 P
#> 55 S
#> 57 WAG
#> 58 FR
#> 59 P
#> 60 S
#> 61 K
#> 62 WAG
#> 63 FR
#> 64 P
#> 65 S
#> 66 K
#> 67 WAG
#> 68 FR
#> 69 P
#> 70 S
#> 71 K
#> 72 WAG
#> 73 FR
#> 74 P
#> 75 S
#> 76 K
#> 77 WAG
#> 78 FR
#> 79 P
#> 80 S
#> 81 K
#> 83 FR
#> 84 P
#> 85 S
#> 86 K
#> 87 WAG
#> 88 FR
#> 89 P
#> 90 S
#> 91 K
#> 92 WAG
#> 93 FR
#> 94 P
#> 95 S
#> 96 K
#> 97 WAG
#> 98 FR
#> 100 S