Data Wrangling

Author

K. W. Bridges

Published

October 23, 2022

Wrangling is a strange term for the process of massaging data into the correct format. It seems to be a slang term in the information technology community, one that ignores the word’s historical pejorative meaning. I’ll keep using wrangling here, as most people will understand that we’re not trying to distort or misuse data.

There are two general goals in this section: getting data tables into the column structure that the mapping functions require, and converting data values into map symbolism such as colors and sizes.

Getting Started

As usual, there are some tasks that need to be done. First, we’ll load some libraries and get the Google Map key registered.

## Libraries
library(readr)        ## Read in data
library(ggmap)        ## Show maps, handle Google key
library(ggplot2)      ## Build maps
library(dplyr)        ## Data wrangling
library(gt)           ## Tables
library(sitemaps)     ## Functions to help build site maps
library(parzer)       ## Convert HMS to digital coordinates
library(jsonlite)     ## Process files using the JSON format
library(httr)         ## Allow API calls

## Initialize Google Map key; the key is stored in a Project directory.
My_Key <- read_file("P://Hot/Workflow/Workflow/keys/Google_Maps_API_Key.txt")

## Test if Google Key is registered.
if (!has_google_key()) {
  ## Register the Google Maps API Key.
  register_google(key = My_Key, account_type = "standard")
} ## end Google Key test

Now we can initialize some data, just as it has been done in the other chapters.

## Use two functions from sitemaps to initialize parameters
column <- site_styles()
hide   <- site_google_hides()

## Establish a theme that improves the appearance of a map.
## This theme removes the axis labels and 
## puts a border around the map. No legend.
simple_black_box <- theme_void() +
              theme(panel.border = element_rect(color = "black", 
                                   fill=NA, 
                                   size=2),
                    legend.position = "none")

And we’re ready to go.

Column Names

The point, label, and name overlay information requires three columns of data: text, lon, and lat. Getting these columns named properly (both the correct words and all lower-case) is the first goal.

The remaining columns follow the names used for the parameters that are loaded with the site_styles() function.
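The column requirement above can be checked before any map is drawn. The helper below is hypothetical (it is not part of the sitemaps package); it simply tests for the three required lower-case column names:

```r
## Hypothetical helper: does a data table have the three lower-case
## columns that the overlay functions expect?
has_overlay_columns <- function(datatable) {
  all(c("text", "lon", "lat") %in% names(datatable))
}

good <- data.frame(text = "1", lon = -155.2515, lat = 19.3894)
bad  <- data.frame(Site = "1", Lon = -155.2515, Lat = 19.3894)

has_overlay_columns(good)  ## TRUE
has_overlay_columns(bad)   ## FALSE -- capitalized names don't count
```

A check like this makes renaming mistakes visible early rather than as a cryptic mapping error later.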

There are two useful functions: rename and rename_with (both from the dplyr package). We’ll use these in the following example. Be sure to check the code to see the process.

The data we’re using in this example comes from a quick survey in Hawaii Volcanoes National Park. There is an area, called the Puhimau Hotspot, that is heated by magma close beneath the surface. The soil temperature has essentially eliminated the forest that once grew here. It is now a mostly bare area with remnant tree stumps, clumps of grass, and various plants growing close to the hot surface.

## Read in the field data.
puhimau <- read_csv(col_names = TRUE, file = 
      "Site,  Lon,        Lat,        Temperature
       1,    -155.251451,   19.389354,  140
       2,    -155.251483,   19.388421,  125
       3,    -155.249834,   19.389709,  165
       4,    -155.248924,   19.389591,  180
       5,    -155.248982,   19.388827,  102
       6,    -155.248024,   19.389952,   97
       7,    -155.249994,   19.388816,  135
       8,    -155.251831,   19.387395,   95")

## Rename the site column and shift all the column names to lowercase
puhimau2 <- puhimau %>% 
  dplyr::rename(text = Site) %>% 
  dplyr::rename_with(tolower)

gt(puhimau2) %>% 
  fmt_number(columns = c(lat,lon), decimals = 5) %>% 
  tab_source_note(source_note = "Data: 2017 field survey") %>% 
  tab_footnote(footnote = "degrees F",
      locations = cells_column_labels(columns = temperature))
Table 1: Soil temperatures at the Puhimau Hotspot. Column names updated.
text lon lat temperature1
1 −155.25145 19.38935 140
2 −155.25148 19.38842 125
3 −155.24983 19.38971 165
4 −155.24892 19.38959 180
5 −155.24898 19.38883 102
6 −155.24802 19.38995 97
7 −155.24999 19.38882 135
8 −155.25183 19.38740 95
Data: 2017 field survey
1 degrees F

Let’s check this with a map.

## Modify some default parameters.
column$margin <- 0.1
column$gmaptype <- "satellite"

## Generate a basemap
basemap <- site_google_basemap(datatable = puhimau2)

## Show the map (with points & labels)
ggmap(basemap) +
  site_labels(datatable = puhimau2) +
  site_points(datatable = puhimau2) +
  simple_black_box

Figure 1: 2017 Puhimau soil temperature survey sites.

Everything looks OK. We’ll see how to color code the temperature for each survey site in the next section.

Data as Categories

Many types of data that we’ll use for map symbolism come as categories. Examples include vegetation types (e.g., deciduous forest, desert), species (e.g., mouse, rat), and lodging (e.g., hotel, resort). Each category can be assigned a color and map symbols will show the distribution of the categories on the map.

We need to convert quantitative values to categories to get the same visual power. We’ll deal with this quantitative problem first, then look at representing data already divided into categories as different colors.

Categories from Quantitative Values

Tip

Converting quantitative values to categories is one of the most important tasks in data visualization with maps. Knowing how to use the cut function is an essential skill.

First, we need to divide a column of quantitative values into segments. Usually the number of segments is quite small, perhaps between 3 and 5. This can be done in two ways.

Perhaps the easiest and most common way is to choose the number of segments directly. For example, a value of 3 divides the range from the smallest to the largest value into three equal-width parts.

An alternative is to use the statistical properties of the data distribution; in this case, the soil temperatures. In this example, we’ll use the site_cuts function to divide temperature into categories based on the mean and standard deviation of the set of temperature values. The bottom segment lies more than 1 SD below the mean, the next runs up to the mean, the next up to 1 SD above the mean, and the final segment lies more than 1 SD above the mean.
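This statistical segmentation can be approximated in base R with the cut function. The sketch below only illustrates the idea (site_cuts in the sitemaps package does the real work in this chapter), using the Puhimau temperatures typed in earlier:

```r
## Approximate a "statistical" four-way cut in base R. The breaks sit at
## mean - 1 SD, the mean, and mean + 1 SD; cut() assigns each value to a
## segment. (site_cuts() from the sitemaps package is used in the real
## analysis; this sketch only illustrates the idea.)
temps <- c(140, 125, 165, 180, 102, 97, 135, 95)
m <- mean(temps)   ## 129.875
s <- sd(temps)     ## about 31.6

index <- as.integer(cut(temps, breaks = c(-Inf, m - s, m, m + s, Inf)))
index  ## 3 2 4 4 2 1 3 1 -- the same index values shown in Table 2

## The simpler alternative: three equal-width segments from min to max.
equal3 <- as.integer(cut(temps, breaks = 3))
```

Note that cut returns a factor; as.integer converts the factor levels to the plain segment numbers used for the color lookup.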

The next step is to assign a color to each segment based on the index value created using the site_cuts function. The assignment is done with a color lookup table.

## Here are the three parts of the cut.
puhimau2$index <- site_cuts(quant_var = puhimau2$temperature, 
                            cuttype = "statistical4")

## Color codes for the temperature segments
color_table <- read_csv(col_names = TRUE, file =
  "index, point_color
   1,     yellow   
   2,     orange
   3,     red
   4,     black")  

## Merge the colors into the data table
puhimau2 <- merge(puhimau2, color_table, by = "index")

## Print the table for confirmation
gt(puhimau2) %>% 
  tab_source_note(source_note = "Data: 2017 field survey") %>% 
  tab_footnote(footnote = "degrees F",
      locations = cells_column_labels(columns = temperature)) %>% 
  tab_footnote(footnote = "Yellow=low, Orange=mid, Red=high, Black=very high temperature.",
      locations = cells_column_labels(columns = point_color))
Table 2: Soil temperatures divided into four ranges.
index text lon lat temperature1 point_color2
1 6 -155.2480 19.38995 97 yellow
1 8 -155.2518 19.38740 95 yellow
2 2 -155.2515 19.38842 125 orange
2 5 -155.2490 19.38883 102 orange
3 1 -155.2515 19.38935 140 red
3 7 -155.2500 19.38882 135 red
4 3 -155.2498 19.38971 165 black
4 4 -155.2489 19.38959 180 black
Data: 2017 field survey
1 degrees F
2 Yellow=low, Orange=mid, Red=high, Black=very high temperature.
## Increase the size of the points.
column$point_size <- 6

## We can use the existing basemap.
## Show the map (with points & labels)
ggmap(basemap) +
  site_labels(datatable = puhimau2) +
  site_points(datatable = puhimau2) +
  labs(caption="Key: yellow, orange, red, black; cooler to hotter") +
  simple_black_box

Figure 2: Yellow = lowest, Orange = mid, Red = high, Black = very high temperatures.

Converting Descriptors to Color Values

## Reinitialize the styles
column <- site_styles()

## Read the data from the Hawaii Volcanoes National Park.
sites <- read_csv(col_names = TRUE, file = 
  "location,         habitat,                  lat,      lon
Kipuka Puaulu,       Upland Forest & Woodland, 19.43765, -155.30317
Top of MLSR,         Upland Forest & Woodland, 19.49278, -155.38532
Keamoku Flow,        Upland Forest & Woodland, 19.47638, -155.36552
Thurston Lava Tube,  Rainforest,               19.41368, -155.23872
`Ola`a Forest,       Rainforest,               19.46464, -155.24967
Sulfur Banks,        Mid-elevation Woodland,   19.43278, -155.26224
Southwest Rift Zone, Mid-elevation Woodland,   19.40136, -155.29409
1974 Lava Flow,      Mid-elevation Woodland,   19.40056, -155.25574
Puhimau Hot Spot,    Mid-elevation Woodland,   19.39007, -155.24783
Observatory,         Mid-elevation Woodland,   19.41991, -155.28804")

## Show the data table
gt(sites) %>% 
  fmt_number(columns = c(lat,lon), decimals = 5) %>% 
  tab_style(style = cell_text(v_align = "top"),
            locations = cells_body())
Table 3: Research sites in Hawaii Volcanoes National Park.
location habitat lat lon
Kipuka Puaulu Upland Forest & Woodland 19.43765 −155.30317
Top of MLSR Upland Forest & Woodland 19.49278 −155.38532
Keamoku Flow Upland Forest & Woodland 19.47638 −155.36552
Thurston Lava Tube Rainforest 19.41368 −155.23872
`Ola`a Forest Rainforest 19.46464 −155.24967
Sulfur Banks Mid-elevation Woodland 19.43278 −155.26224
Southwest Rift Zone Mid-elevation Woodland 19.40136 −155.29409
1974 Lava Flow Mid-elevation Woodland 19.40056 −155.25574
Puhimau Hot Spot Mid-elevation Woodland 19.39007 −155.24783
Observatory Mid-elevation Woodland 19.41991 −155.28804

What we want to place on the map is a color-coded point with each color representing the habitat type.

The first step is to make a table that relates the different habitats to the colors. Note that the column with the habitats must have the same name as the column for the habitats in the main table. We’ll also use the column name point_color for the colors as this is the parameter name used to color location symbols.

Then it is just a matter of using the R base function merge to add the colors to the original table.

## Lookup table
lookup_list <- read_csv(col_names = TRUE, file = 
     "habitat,                  point_color
      Rainforest,               blue
      Upland Forest & Woodland, brown
      Mid-elevation Woodland,   green")

## Merge into the data table
sites <- merge(sites, lookup_list, by="habitat")

gt(sites) %>% 
  fmt_number(columns = c(lat,lon), decimals = 5) %>% 
  tab_style(style = cell_text(v_align = "top"),
            locations = cells_body())
Table 4: Habitats color-coded for the research sites.
habitat location lat lon point_color
Mid-elevation Woodland Sulfur Banks 19.43278 −155.26224 green
Mid-elevation Woodland Southwest Rift Zone 19.40136 −155.29409 green
Mid-elevation Woodland 1974 Lava Flow 19.40056 −155.25574 green
Mid-elevation Woodland Puhimau Hot Spot 19.39007 −155.24783 green
Mid-elevation Woodland Observatory 19.41991 −155.28804 green
Rainforest Thurston Lava Tube 19.41368 −155.23872 blue
Rainforest `Ola`a Forest 19.46464 −155.24967 blue
Upland Forest & Woodland Kipuka Puaulu 19.43765 −155.30317 brown
Upland Forest & Woodland Top of MLSR 19.49278 −155.38532 brown
Upland Forest & Woodland Keamoku Flow 19.47638 −155.36552 brown

We can check our progress with a map.

Note that we need to change the name of the location column to text so this site identifier appears in the point labels.

## Specify a satellite basemap
column$gmaptype <- "satellite"

## Change the location to text
sites2 <- sites %>% 
  dplyr::rename(text = location)

## Generate a basemap
basemap <- site_google_basemap(datatable = sites2)

## Show the map (with points & labels)
ggmap(basemap) +
  site_labels(datatable = sites2) +
  site_points(datatable = sites2) +
  labs(caption = "Key: point color designates habitat") +
  simple_black_box

Figure 3: Key: Blue=Rainforest, Green=Mid-elevation Woodland, Brown=Upland forest & Woodland.

Adding More Information

Here, we’re focused on making both the table and the map richer with the addition of information. There are several ways this can be done.

Typing New Columns

If there is a data source with additional information, you might simply type the values into a new column in the original table. Because the code re-runs all the basic data manipulation steps, it is straightforward to modify the original table and produce a fully updated version.
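A minimal sketch of the idea (the column name and values below are hypothetical, not from the survey data):

```r
## Start from a small site table and type new values into a new column.
field_sites <- data.frame(text = c("Kipuka Puaulu", "Thurston Lava Tube"))

## Hypothetical survey years, typed in by hand.
field_sites$surveyed <- c(2017, 2019)
field_sites
```

Because the new column travels with the table, every later step (renaming, merging colors, mapping) picks it up automatically on the next run.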

Data Services to Add Data

It may be possible to use existing table data to generate additional information. For our HVNP research sites, we can use Google Maps to obtain the elevation at each location.

The function we need, site_elevation, is another custom function in the sitemaps package.

The site_elevation function works on the datatable, just like most of the other sitemaps functions. This means it looks for table columns named lat and lon. The function adds a new column, called elevation, to the datatable.
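For context, here is a hedged sketch of what such an elevation lookup may do internally, using the public Google Maps Elevation API through httr and jsonlite (both loaded at the start of this chapter). The helper names are hypothetical; the real wrapper lives in the sitemaps package.

```r
## Hypothetical helpers sketching a Google Maps Elevation API lookup.
## The endpoint and JSON layout are those of the public Elevation API.
elevation_url <- function(lat, lon, key) {
  paste0("https://maps.googleapis.com/maps/api/elevation/json",
         "?locations=", lat, ",", lon, "&key=", key)
}

get_elevation_m <- function(lat, lon, key) {
  resp   <- httr::GET(elevation_url(lat, lon, key))
  parsed <- jsonlite::fromJSON(httr::content(resp, as = "text",
                                             encoding = "UTF-8"))
  if (parsed$status != "OK") stop("Elevation API error: ", parsed$status)
  parsed$results$elevation   ## meters above sea level
}

## Not run here (requires a live API key):
## get_elevation_m(19.41991, -155.28804, My_Key)
```

This explains why httr and jsonlite were loaded in the setup chunk even though they never appear directly in the code.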

We’ll do a couple more things in this chunk: sort so the highest elevation sites are at the top of the table and create a color-code for the elevations. We’ll be using this color-code as an outline around the data points.

## Add the elevation to the sites
sites3 <- site_elevation(datatable=sites2, APIkey = My_Key)

## Sort the sites with highest elevation at the top
sites4 <- sites3 %>% 
  arrange(desc(elevation))

## Color code the elevation ranges (top and bottom half of the elevation range)
sites4$index <- site_cuts(quant_var = sites4$elevation, 
                          cuttype = "topbottom")

## Color lookup table
color_table <- read_csv(col_names = TRUE, file = 
  "index,  point_outline_color
   1,      blue
   2,      cyan")

## Merge colors into the data table
sites4 <- merge(sites4, color_table, by = "index")

## Show the sorted table
gt(sites4) %>% 
  fmt_number(columns = c(lat,lon), decimals = 5) %>% 
  tab_style(style = cell_text(v_align = "top"),
            locations = cells_body()) %>% 
  tab_footnote(footnote = "Meters (from Google Maps)",
      locations = cells_column_labels(columns = elevation)) 
Table 5: Elevations added to the research locations.
index habitat text lat lon point_color elevation1 point_outline_color
1 Mid-elevation Woodland Observatory 19.41991 −155.28804 green 1245 blue
1 Mid-elevation Woodland Sulfur Banks 19.43278 −155.26224 green 1199 blue
1 Upland Forest & Woodland Kipuka Puaulu 19.43765 −155.30317 brown 1197 blue
1 Rainforest Thurston Lava Tube 19.41368 −155.23872 blue 1194 blue
1 Rainforest `Ola`a Forest 19.46464 −155.24967 blue 1179 blue
1 Mid-elevation Woodland Southwest Rift Zone 19.40136 −155.29409 green 1140 blue
1 Mid-elevation Woodland 1974 Lava Flow 19.40056 −155.25574 green 1120 blue
1 Mid-elevation Woodland Puhimau Hot Spot 19.39007 −155.24783 green 1094 blue
2 Upland Forest & Woodland Top of MLSR 19.49278 −155.38532 brown 2032 cyan
2 Upland Forest & Woodland Keamoku Flow 19.47638 −155.36552 brown 1721 cyan
1 Meters (from Google Maps)

We’re now ready to see the visualization of the data for the Hawaii Volcanoes National Park site.

column$point_outline_thickness <- 2

## Show the map (with points & labels)
ggmap(basemap) +
  site_labels(datatable = sites4) +
  site_points(datatable = sites4) +
  simple_black_box

Figure 4: Elevation key (rings): Cyan = upper-elevation, Blue = lower-elevation sites.

Showing Less Data

Sometimes, the data table has more values than you want to display on your map. Or you have duplicated locations and want to simplify the map without losing the information value of the duplicated points.

The data consolidation or reduction can be quite complex. The next section shows just one way this can be done that applies quite well to making maps.
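As a minimal base-R preview of the idea (using a hypothetical mini fuel log, not the real data), duplicate rows can be collapsed to one row per location while their information value is kept as a stop count and a gallons total:

```r
## Hypothetical mini fuel log with duplicated cities.
mini_log <- data.frame(
  city    = c("Madison", "Sun Prairie", "Sun Prairie", "Omaha"),
  gallons = c(4.239, 4.309, 3.700, 4.396)
)

## One row per city: total gallons plus a count of the stops.
combined       <- aggregate(gallons ~ city, data = mini_log, FUN = sum)
combined$stops <- as.integer(table(mini_log$city)[combined$city])
combined
```

The map then plots three points instead of four, yet the table still reports how often each location was visited.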

Combining Rows

The example used here is the fuel log for a Smart car used primarily for long-distance trips in the Lower 48 states.

Creamsicle, the long-distance Smart car, parked in Wisconsin on a cold day.

Information about each fuel stop is recorded and typed into a spreadsheet. A portion of that spreadsheet is contained in the following code chunk. Note that much more information is collected at each fuel stop; this is just a convenient extract to show the problem of excessive data overlap and one solution.

## Resetting the parameters
column <- site_styles()

## Read the fuel log data
fuel <- read_csv(col_names = TRUE, file = 
    "Date,       City,            State,  Gallons
     09/23/2021, Nixa,            MO,     6.000
     09/23/2021, Garden City,     MO,     5.124
     09/24/2021, Plattsburg,      MO,     4.033
     09/25/2021, Lincoln,         NE,     3.799
     09/25/2021, Gothensburg,     NE,     3.799
     09/25/2021, Atwood,          CO,     6.270
     09/26/2021, Arvada,          CO,     2.737
     09/26/2021, Grand Junction,  CO,     5.766
     09/27/2021, Salina,          UT,     5.213
     09/27/2021, St George,       UT,     4.839
     09/28/2021, Jean,            NV,     4.773
     09/28/2021, Victorville,     CA,     5.274
     10/15/2021, Torrance,        CA,     2.860
     10/15/2021, Barstow,         CA,     3.610
     10/15/2021, Searchlight,     NV,     4.824
     10/16/2021, Ash Fork,        AZ,     4.742
     10/16/2021, Sanders,         AZ,     4.869
     10/17/2021, Albuquerque,     NM,     4.324
     10/17/2021, Wagon Mound,     NM,     4.313
     10/17/2021, Colorado Springs,CO,     5.138
     10/18/2021, Colby,           KS,     4.356
     10/19/2021, Kearney,         NE,     3.852
     10/19/2021, Omaha,           NE,     4.396
     10/20/2021, Des Moines,      IA,     3.280
     10/20/2021, Anamosa,         IA,     4.261
     10/22/2021, Madison,         WI,     4.239
     10/28/2021, Sun Prairie,     WI,     4.309
     11/07/2021, Madison,         WI,     4.159
     11/13/2021, Sun Prairie,     WI,     3.700
     11/16/2021, Sun Prairie,     WI,     1.881
     11/20/2021, Sun Prairie,     WI,     3.458
     11/26/2021, Sun Prairie,     WI,     4.168
     12/01/2021, Sun Prairie,     WI,     1.813
     12/07/2021, Sun Prairie,     WI,     4.271
     12/14/2021, Sun Prairie,     WI,     2.636
     12/20/2021, Sun Prairie,     WI,     3.444
     12/30/2021, Sun Prairie,     WI,     4.084
      1/18/2022, Sun Prairie,     WI,     1.751
      1/18/2022, Mendota,         IL,     3.790
      1/19/2022, Springfield,     IL,     5.357
      1/20/2022, Shelbina,        MO,     4.685
      1/21/2022, Kansas City,     MO,     5.182
      1/21/2022, Junction City,   KS,     3.829
      1/22/2022, Hays,            KS,     5.542
      1/22/2022, Burlington,      CO,     4.795
      1/23/2022, Colorado Springs,CO,     4.663
      1/23/2022, Walsenburg,      CO,     2.633
      1/23/2022, Questa,          NM,     2.514
      1/24/2022, Grants,          NM,     5.698
      1/25/2022, Flagstaff,       AZ,     5.256
      1/26/2022, Williams,        AZ,     4.874
      1/27/2022, Lake Havasu City,AZ,     3.318
      1/27/2022, Indio,           CA,     4.311
      2/02/2022, Alhambra,        CA,     4.818
      5/20/2022, Torrance,        CA,     4.918
      5/23/2022, Torrance,        CA,     3.552
      5/29/2022, Bakersfield,     CA,     4.173
      5/29/2022, Firebaugh,       CA,     5.219
      5/30/2022, Williams,        CA,     5.627
      5/30/2022, Mt Shasta,       CA,     5.558
      5/31/2022, Roseburg,        OR,     5.577
      6/08/2022, Veneta,          OR,     4.674
      6/11/2022, Bandon,          OR,     4.329
      6/12/2022, Klamath,         CA,     2.927
      6/12/2022, Leggett,         CA,     4.200
      6/13/2022, Bodega Bay,      CA,     2.887
      6/13/2022, Santa Cruz,      CA,     2.714
      6/14/2022, San Luis Obispo, CA,     5.287")


## Geocode the locations
fuel_loc <- geocode(paste0(fuel$City,", ",fuel$State),
                    output = "latlon")
fuel <- cbind(fuel,fuel_loc)

## Output a confirmation table
gt(fuel)
Table 6: Fuel information for Creamsicle (data subset).
Date City State Gallons lon lat
09/23/2021 Nixa MO 6.000 -93.29435 37.04339
09/23/2021 Garden City MO 5.124 -94.19133 38.56112
09/24/2021 Plattsburg MO 4.033 -94.44801 39.56555
09/25/2021 Lincoln NE 3.799 -96.70260 40.81362
09/25/2021 Gothensburg NE 3.799 -100.16070 40.92766
09/25/2021 Atwood CO 6.270 -103.26966 40.54776
09/26/2021 Arvada CO 2.737 -105.08748 39.80276
09/26/2021 Grand Junction CO 5.766 -108.55065 39.06387
09/27/2021 Salina UT 5.213 -111.85993 38.95774
09/27/2021 St George UT 4.839 -113.56842 37.09653
09/28/2021 Jean NV 4.773 -115.32388 35.77887
09/28/2021 Victorville CA 5.274 -117.29276 34.53622
10/15/2021 Torrance CA 2.860 -118.34063 33.83585
10/15/2021 Barstow CA 3.610 -117.01728 34.89580
10/15/2021 Searchlight NV 4.824 -114.91970 35.46527
10/16/2021 Ash Fork AZ 4.742 -112.48407 35.22501
10/16/2021 Sanders AZ 4.869 -109.32879 35.20914
10/17/2021 Albuquerque NM 4.324 -106.65042 35.08439
10/17/2021 Wagon Mound NM 4.313 -104.70666 36.00893
10/17/2021 Colorado Springs CO 5.138 -104.82136 38.83388
10/18/2021 Colby KS 4.356 -101.05238 39.39584
10/19/2021 Kearney NE 3.852 -99.08168 40.69933
10/19/2021 Omaha NE 4.396 -95.93450 41.25654
10/20/2021 Des Moines IA 3.280 -93.62496 41.58684
10/20/2021 Anamosa IA 4.261 -91.28516 42.10834
10/22/2021 Madison WI 4.239 -89.40075 43.07217
10/28/2021 Sun Prairie WI 4.309 -89.21373 43.18360
11/07/2021 Madison WI 4.159 -89.40075 43.07217
11/13/2021 Sun Prairie WI 3.700 -89.21373 43.18360
11/16/2021 Sun Prairie WI 1.881 -89.21373 43.18360
11/20/2021 Sun Prairie WI 3.458 -89.21373 43.18360
11/26/2021 Sun Prairie WI 4.168 -89.21373 43.18360
12/01/2021 Sun Prairie WI 1.813 -89.21373 43.18360
12/07/2021 Sun Prairie WI 4.271 -89.21373 43.18360
12/14/2021 Sun Prairie WI 2.636 -89.21373 43.18360
12/20/2021 Sun Prairie WI 3.444 -89.21373 43.18360
12/30/2021 Sun Prairie WI 4.084 -89.21373 43.18360
1/18/2022 Sun Prairie WI 1.751 -89.21373 43.18360
1/18/2022 Mendota IL 3.790 -89.11759 41.54725
1/19/2022 Springfield IL 5.357 -89.65015 39.78172
1/20/2022 Shelbina MO 4.685 -92.04295 39.69393
1/21/2022 Kansas City MO 5.182 -94.57857 39.09973
1/21/2022 Junction City KS 3.829 -96.83140 39.02861
1/22/2022 Hays KS 5.542 -99.32677 38.87918
1/22/2022 Burlington CO 4.795 -102.26936 39.30611
1/23/2022 Colorado Springs CO 4.663 -104.82136 38.83388
1/23/2022 Walsenburg CO 2.633 -104.78040 37.62274
1/23/2022 Questa NM 2.514 -105.59501 36.70391
1/24/2022 Grants NM 5.698 -107.85145 35.14726
1/25/2022 Flagstaff AZ 5.256 -111.65130 35.19828
1/26/2022 Williams AZ 4.874 -112.19100 35.24946
1/27/2022 Lake Havasu City AZ 3.318 -114.32245 34.48390
1/27/2022 Indio CA 4.311 -116.21556 33.72058
2/02/2022 Alhambra CA 4.818 -118.12701 34.09529
5/20/2022 Torrance CA 4.918 -118.34063 33.83585
5/23/2022 Torrance CA 3.552 -118.34063 33.83585
5/29/2022 Bakersfield CA 4.173 -119.01871 35.37329
5/29/2022 Firebaugh CA 5.219 -120.45601 36.85884
5/30/2022 Williams CA 5.627 -122.14942 39.15461
5/30/2022 Mt Shasta CA 5.558 -122.31057 41.30987
5/31/2022 Roseburg OR 5.577 -123.34174 43.21650
6/08/2022 Veneta OR 4.674 -123.35093 44.04873
6/11/2022 Bandon OR 4.329 -124.40845 43.11900
6/12/2022 Klamath CA 2.927 -124.03841 41.52651
6/12/2022 Leggett CA 4.200 -123.71483 39.86548
6/13/2022 Bodega Bay CA 2.887 -123.04806 38.33325
6/13/2022 Santa Cruz CA 2.714 -122.03080 36.97412
6/14/2022 San Luis Obispo CA 5.287 -120.65962 35.28275

Use a preliminary map to check the fuel stop locations.

## Use the City column for the label.
fuel2 <- fuel %>% 
  dplyr::rename(text = City)

## Generate a basemap.
basemap <- site_google_basemap(datatable = fuel2)

## Show the map (with points)
ggmap(basemap) +
  site_points(datatable = fuel2) +
  simple_black_box

Figure 5: Check of the fuel stop locations.

The data cover a very wide area. To see the details, we can divide the data into two parts: West and East relative to some area in Kansas.

## Choose a division line
lon_divide <- -98.0

## Separate out the sites to the east of the divide
east_part <- fuel2 %>% 
  filter(lon > lon_divide)

## Create a basemap
basemap_east <- site_google_basemap(datatable = east_part)

## Plot the map
ggmap(basemap_east) +
  site_points(datatable = east_part) +
  simple_black_box

Figure 6: Fuel locations for the East segment.

Next, we combine the rows for the same city.

Colors are used to highlight the locations where there were multiple fuel stops.

## Combine the City (as text) and State
east_part$citystate <- paste0(east_part$text,", ",east_part$State)

## Count duplicate locations (compresses the list)
station_count <- as.data.frame(table(east_part$citystate))

## Modify the column names
station_count <- station_count %>% 
  rename(station = Var1) %>% 
  rename(freq = Freq)

## Make the station name character
station_count$station <- as.character(station_count$station)

## Number each of the locations (not gas stations)
station_count$text <- as.character(seq.int(nrow(station_count)))

## Re-geocode just the remaining set of stations
fuel_locs <- geocode(station_count$station, output="latlon")
map_data  <- cbind(station_count,fuel_locs)

## Add a column for the label color
quant_variable <- map_data$freq
set_of_colors  <- c("white","pink","red")
map_data$label_background_color <- cut(quant_variable, 
                                       breaks = c(0,1,2,100),
                                       labels = set_of_colors)

map_data$label_background_color<- as.character(map_data$label_background_color)

## Expand the basemap a bit to accommodate the labels
column$margin <- 0.1
basemap_east <- site_google_basemap(datatable = map_data)

## Plot the map
ggmap(basemap_east) +
  site_labels(datatable = map_data) +
  site_points(datatable = map_data) +
  simple_black_box

Figure 7: Labeled fuel location for the East segment.

You can see that enough time was spent around the Madison, Wisconsin area to require quite a few fuel stops. The other stations were visited just once.

A table gives the details appropriate to the map.

gt(map_data)  %>% 
  cols_move_to_start(text) %>% 
  cols_label(text="Ref", station="Location", freq="Frequency") %>% 
  cols_hide(columns = c(lon, lat, label_background_color)) %>% 
  tab_footnote(
    footnote = "Locations in alphabetical, not temporal order.",
    locations = cells_column_labels(columns = station)) 
Table 7: Locations and frequency of fuel stops in the East segment. The reference numbers match the point labels in Figure 7.
Ref Location1 Frequency
1 Anamosa, IA 1
2 Des Moines, IA 1
3 Garden City, MO 1
4 Junction City, KS 1
5 Kansas City, MO 1
6 Lincoln, NE 1
7 Madison, WI 2
8 Mendota, IL 1
9 Nixa, MO 1
10 Omaha, NE 1
11 Plattsburg, MO 1
12 Shelbina, MO 1
13 Springfield, IL 1
14 Sun Prairie, WI 11
1 Locations in alphabetical, not temporal order.
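The table()/rename/as.character steps above can be condensed. The count function from dplyr (already loaded) compresses duplicate rows and names the frequency column in one call. A minimal sketch with a hypothetical subset of the stops:

```r
library(dplyr)

## Hypothetical subset of the fuel-stop locations.
east <- data.frame(citystate = c("Madison, WI", "Sun Prairie, WI",
                                 "Madison, WI", "Omaha, NE"))

## One call replaces table(), the two rename() steps, and as.character().
station_count <- east %>% count(citystate, name = "freq")
station_count
```

Either approach works; count simply keeps the pipeline in the dplyr style used elsewhere in this chapter.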

A More Complex Example

Data points can have a number of properties that convey the characteristics of the data.

First, let’s get some data.

## Reinitialize the parameters
column <- site_styles()

inst <- read_csv(col_names = TRUE, file = 
     "Campus,       Acceptance, Undergrads, Aid,  Abbrev
      Berkeley,     0.17,       29310,      0.63, UC
      Davis,        0.42,       29379,      0.70, UCD
      Irvine,       0.41,       27331,      0.68, UCI
      Los Angeles,  0.18,       30873,      0.64, UCLA
      Merced,       0.74,        6815,      0.92, UCM
      Riverside,    0.66,       19799,      0.85, UCR
      San Diego,    0.36,       28127,      0.56, UCSD
      Santa Barbara,0.36,       21574,      0.70, UCSB
      Santa Cruz,   0.58,       16962,      0.77, UCSC")
     
source <- "https://www.thoughtco.com/comparison-university-of-california-campuses-786974"

gt(inst) %>% 
  fmt_percent(columns=c(Acceptance, Aid), decimals=0) %>% 
  fmt_number(columns=Undergrads, decimals=0, use_seps=TRUE) %>% 
  tab_source_note(source_note=source)
Table 8: Data for University of California campuses.
Campus Acceptance Undergrads Aid Abbrev
Berkeley 17% 29,310 63% UC
Davis 42% 29,379 70% UCD
Irvine 41% 27,331 68% UCI
Los Angeles 18% 30,873 64% UCLA
Merced 74% 6,815 92% UCM
Riverside 66% 19,799 85% UCR
San Diego 36% 28,127 56% UCSD
Santa Barbara 36% 21,574 70% UCSB
Santa Cruz 58% 16,962 77% UCSC
https://www.thoughtco.com/comparison-university-of-california-campuses-786974

Here is the data wrangling that needs to be done:

  • Reset the default parameters.

  • Get the geographic coordinates for each campus. Using the center of the city is good enough for the resolution of the map we’ll produce.

  • Scale the data values into categories.

  • Specify a wider line around the data points.

  • Make the column names compatible with those needed to make a map.

Don’t be concerned with how this is done here. There are methodology descriptions shown elsewhere. The goal is to demonstrate a more complex map, and that takes some data wrangling. The results are in Table 9.

## Add California to the campus cities.
location <- paste0(inst$Campus,", California, USA") 

## Get the coordinates from the location.
coord <- geocode(location, output = "latlon")

## Add the coordinates to the original location data.
inst2 <- cbind(inst,coord)

## Divide the Undergraduates (school size) into size categories.
inst2$sindex <- site_cuts(quant_var = inst2$Undergrads, cuttype = "quartiles4")

## Convert the index to a point size value; merge the size data into the data table
size_lookup <- read_csv(col_names = TRUE, file =
    "sindex, point_size
     1,      3
     2,      5
     3,      7
     4,      9")
inst2 <- merge(inst2, size_lookup, by = "sindex")

## Cut the Acceptance data into chunks
inst2$aindex <- site_cuts(quant_var = inst2$Acceptance, 
                          cuttype = "quartiles4")

## Convert the index to a point color value; merge the color data into the data table
color_lookup <- read_csv(col_names = TRUE, file =
    "aindex, point_color
     1,      red
     2,      orange
     3,      yellow
     4,      green")
inst2 <- merge(inst2, color_lookup, by = "aindex")

## Cut the Financial Aid percentage into categories
inst2$findex <- site_cuts(quant_var = inst2$Aid, 
                          cuttype = "quartiles4")

## Assign colors to the Aid categories; merge the aid colors into the data table
color_lookup2 <- read_csv(col_names = TRUE, comment = "#", file =
    "findex, point_outline_color
     1,      black           # little aid
     2,      gray60
     3,      darkorchid2
     4,      maroon1         # ample aid")
inst2 <- merge(inst2, color_lookup2, by = "findex")

## Change the stroke size (line around point).
column$point_outline_thickness <- 3

## Fix the names so they are standard column names to get labels.
inst2 <- inst2 %>% 
  dplyr::rename(text = Abbrev)

gt(inst2) %>% 
  fmt_number(columns = c(lat,lon), decimals = 5) %>% 
  tab_style(style = cell_text(v_align = "top"),
            locations = cells_body())
Table 9: Prepare data for the US campuses map.
findex aindex sindex Campus Acceptance Undergrads Aid text lon lat point_size point_color point_outline_color
1 1 3 Berkeley 0.17 29310 0.63 UC −122.27304 37.87152 7 red black
1 1 4 Los Angeles 0.18 30873 0.64 UCLA −118.24368 34.05223 9 red black
1 1 3 San Diego 0.36 28127 0.56 UCSD −117.16108 32.71574 7 red black
2 1 2 Santa Barbara 0.36 21574 0.70 UCSB −119.69819 34.42083 5 red gray60
2 3 4 Davis 0.42 29379 0.70 UCD −121.74052 38.54491 9 yellow gray60
2 2 2 Irvine 0.41 27331 0.68 UCI −117.82650 33.68457 5 orange gray60
3 3 1 Santa Cruz 0.58 16962 0.77 UCSC −122.03080 36.97412 3 yellow darkorchid2
4 4 1 Merced 0.74 6815 0.92 UCM −120.48297 37.30216 3 green maroon1
4 4 1 Riverside 0.66 19799 0.85 UCR −117.37549 33.98060 3 green maroon1

Now it is time to create the map (Figure 8).

The coding is as follows:

  • The size of the point indicates the number of undergraduates.

  • The point color is the acceptance rate (green = high, yellow and orange = middle, red = low).

  • The point outline color is the fraction of students getting financial aid (maroon = high, orchid and gray = middle, black = low).

For example, UC (Berkeley) is a big campus that is hard to get into, and it gives relatively little financial aid. On the other hand, UCM (Merced) is a new campus that is small, relatively easy to get into, and offers more students financial aid.

## Expand the map margins just a bit
column$margin <- 0.1

## Generate a basemap
basemap <- site_google_basemap(datatable = inst2)

## Show the map (with points & labels)
ggmap(basemap) +
  site_points(datatable = inst2) +
  site_labels(datatable = inst2) +
  simple_black_box

Figure 8: Properties of the University of California campuses.

You should now see some of the potential available through the use of symbolism.

Note that the labels have not been changed, nor have any names been added to this plot. The exploration of symbolism has only just begun.