28  区域数据分析

28.1 苏格兰唇癌数据分析

Everything is related to everything else, but near things are more related than distant things.

— Waldo Tobler (Tobler 1970)

空间区域数据分析

空间区域数据的贝叶斯建模

响应变量服从泊松分布

记录 1975-1986 年苏格兰 56 个地区的唇癌病例数,这是一个按地区汇总的数据。

library(sf)
Linking to GEOS 3.12.0, GDAL 3.7.2, PROJ 9.2.1; sf_use_s2() is TRUE
scotlips <- st_read('data/scotland/scotland.shp', crs = st_crs("EPSG:27700"))
Reading layer `scotland' from data source 
  `/__w/data-analysis-in-action/data-analysis-in-action/data/scotland/scotland.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 56 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 7150.759 ymin: 529557.2 xmax: 468393.4 ymax: 1218479
Projected CRS: OSGB36 / British National Grid
str(scotlips)
Classes 'sf' and 'data.frame':  56 obs. of  10 variables:
 $ SP_ID    : chr  "12" "13" "19" "02" ...
 $ NAME     : chr  "Sutherland" "Nairn" "Inverness" "Banff-Buchan" ...
 $ ID       : num  12 13 19 2 17 16 21 50 15 25 ...
 $ District : int  12 13 19 2 17 16 21 50 15 25 ...
 $ Observed : int  5 3 9 39 2 9 16 6 17 19 ...
 $ Expected : num  1.8 1.1 5.5 8.7 1.1 4.6 10.5 19.6 7.8 15.5 ...
 $ pcaff    : int  16 10 7 16 10 16 7 1 7 1 ...
 $ Latitude : num  58.1 57.5 57.2 57.6 57.1 ...
 $ Longitude: num  4.64 3.98 4.73 2.36 4.09 3 2.98 3.2 3.1 3.3 ...
 $ geometry :sfc_MULTIPOLYGON of length 56; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:73, 1:2] 254302 254442 253074 245057 259217 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
  ..- attr(*, "names")= chr [1:9] "SP_ID" "NAME" "ID" "District" ...
library(ggplot2)
ggplot() +
  geom_sf(data = scotlips, aes(fill = Observed)) +
  scale_fill_viridis_c() +
  theme_minimal()
图 28.1: 苏格兰各地区唇癌病例数分布

28.2 美国各州犯罪率分析

响应变量服从高斯分布的调查数据 (Bivand 2001)

数据集 USArrests 记录 1973 年美国各州每 10 万居民中因谋杀 Murder、袭击 Assault 和强奸 Rape 被警察逮捕的人数以及城市人口所占百分比(可以看作城市化率)。

表格 28.1: 数据集 USArrests(部分)
州名 区域划分 谋杀犯 袭击犯 城市化率 强奸犯
Alabama South 13.2 236 58 21.2
Alaska West 10.0 263 48 44.5
Arizona West 8.1 294 80 31.0
Arkansas South 8.8 190 50 19.5
California West 9.0 276 91 40.6
Colorado West 7.9 204 78 38.7
library(sf)
# 州数据
us_state_sf <- readRDS("data/us-state-map-2010.rds")
# 观测数据
us_state_df <- merge(x = us_state_sf, y = us_arrests,
  by.x = "NAME", by.y = "state_name", all.x = TRUE)

ggplot() +
  geom_sf(
    data = us_state_df, aes(fill = Assault), color = "gray80", lwd = 0.25) +
  scale_fill_viridis_c(option = "plasma", na.value = "white") +
  theme_void()
图 28.2: 因袭击被逮捕的人数分布

1973 年美国各州因袭击被逮捕的人数与城市化率的关系:相关分析

代码
library(ggrepel)
ggplot(data = us_arrests, aes(x = UrbanPop, y = Assault)) +
  geom_point(aes(color = state_region)) +
  geom_text_repel(aes(label = state_name), size = 3, seed = 2022) +
  theme_classic() +
  labs(x = "城市化率(%)", y = "因袭击被逮捕人数", color = "区域划分")
图 28.3: 逮捕人数比例与城市化率的关系

阿拉斯加州和夏威夷州与其它州都不相连,属于孤立的情况,下面在空间相关性的分析中排除这两个州。

# 州的中心
centers48 <- subset(
  x = data.frame(x = state.center$x, y = state.center$y),
  subset = !state.name %in% c("Alaska", "Hawaii")
)
# 观测数据
arrests48 <- subset(
  x = USArrests,
  subset = !rownames(USArrests) %in% c("Alaska", "Hawaii")
)
library(spData)
library(spdep)
# KNN
k4.48 <- knn2nb(knearneigh(as.matrix(centers48), k = 4))
# Moran I test
moran.test(x = arrests48$Assault, listw = nb2listw(k4.48))

    Moran I test under randomisation

data:  arrests48$Assault  
weights: nb2listw(k4.48)    

Moran I statistic standard deviate = 3.4216, p-value = 0.0003113
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.294385644      -0.021276596       0.008511253 
# Permutation test for Moran's I statistic
moran.mc(x = arrests48$Assault, listw = nb2listw(k4.48), nsim = 499)

    Monte-Carlo simulation of Moran I

data:  arrests48$Assault 
weights: nb2listw(k4.48)  
number of simulations + 1: 500 

statistic = 0.29439, observed rank = 498, p-value = 0.004
alternative hypothesis: greater