Spatial Autocorrelation

The Basics of Spatial Autocorrelation

Spatial autocorrelation measures the degree to which observations in a spatial dataset resemble one another as a function of geographic proximity. It is commonly used in spatial analysis to understand patterns and relationships in data influenced by location. Two common contiguity-based spatial weights are queen and rook weights: queen contiguity treats two polygons as neighbors if they share an edge or a corner, while rook contiguity requires a shared edge. Nearest-neighbor methods instead identify the closest points to each point, which is useful for understanding clustering or dispersion patterns. First-order nearest neighbors are the single closest neighbor of each point (k = 1).

Moran scatterplots plot a variable against the average of its neighboring values (its spatial lag), providing insight into spatial patterns of similarity or dissimilarity. The Moran I test under randomization determines whether significant spatial autocorrelation is present by comparing the observed Moran's I statistic with the expected value and variance it would have if the observed values were assigned to locations at random. These concepts and techniques are valuable for exploring and analyzing spatial patterns and relationships.
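The difference between the two contiguity rules can be sketched on a small regular grid. The example below is a minimal base-R illustration, not part of the asthma analysis: the 3 x 3 grid and the names `cells`, `rook`, and `queen` are assumptions made for the sketch, and real polygon data would go through a function such as poly2nb() instead.

```r
# Build queen and rook contiguity for a hypothetical 3 x 3 grid of cells (base R only).
n_side <- 3
cells <- expand.grid(col = 1:n_side, row = 1:n_side)  # cell i sits at (col, row)

# Absolute row and column offsets between every pair of cells.
d_row <- abs(outer(cells$row, cells$row, "-"))
d_col <- abs(outer(cells$col, cells$col, "-"))

# Rook: cells share an edge (exactly one step horizontally or vertically).
rook <- (d_row + d_col) == 1
# Queen: cells share an edge or a corner (at most one step in each direction).
queen <- pmax(d_row, d_col) == 1

rook_links <- rowSums(rook)
queen_links <- rowSums(queen)

# Number of neighbours per cell under each rule; cell 5 is the centre cell.
print(rbind(rook = rook_links, queen = queen_links))
```

Every rook neighbour is also a queen neighbour, so a queen weights object never has fewer links than the rook object built from the same polygons.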

Loading and Viewing the Asthma Data

Data used in this analysis: Prevalence of Adult Asthma in select Los Angeles County ZIP codes, 2013-2014. The image below shows the coverage extent of the data.

This analysis uses RStudio and GeoDa. R code is presented in light blue boxes, while the resulting output is provided in light grey boxes.

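library(sf)     # st_read(), sf_use_s2()
library(sp)     # coordinates(), spDists()
library(spdep)  # poly2nb(), nb2listw(), moran.test()
library(tmap)   # qtm(), tm_shape()
# Read the shapefile and drop ZIP codes with no asthma percentage
asthma <- st_read("Prevalence_of_Adult_Asthma%2C_2013-2014.shp")
a22 <- asthma[!is.na(asthma$Percent_), ]
a2 <- as(a22, "Spatial")  # convert to an sp object for the sp and spdep functions
sf_use_s2(FALSE)
qtm(a2)                   # quick map of the coverage extent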

Evaluating Queen and Rook Contiguity Spatial Weights

# Queen
a2_q <- poly2nb(a2, queen=TRUE)
summary(a2_q)
Neighbour list object:
Number of regions: 119
Number of nonzero links: 606
Percentage nonzero weights: 4.279359
Average number of links: 5.092437
Link number distribution:

 1  2  3  4  5  6  7  8 10
 1  6 14 22 23 32 14  6  1
1 least connected region:
84 with 1 link
1 most connected region:
62 with 10 links

plot(a2, border = 'lightgrey')
plot(a2_q, coordinates(a2), add=TRUE, col='red')

# Rook
a2_r <- poly2nb(a2, queen=FALSE)
summary(a2_r)
Neighbour list object:
Number of regions: 119
Number of nonzero links: 588
Percentage nonzero weights: 4.152249
Average number of links: 4.941176
Link number distribution:

 1  2  3  4  5  6  7  8  9
 1  7 15 25 24 29 11  6  1
1 least connected region:
84 with 1 link
1 most connected region:
62 with 9 links

plot(a2, border = 'lightgrey')
plot(a2_r, coordinates(a2), add=TRUE, col='blue')

# Queen and Rook
plot(a2, border = 'lightgrey')
plot(a2_q, coordinates(a2), add=TRUE, col='red')
plot(a2_r, coordinates(a2), add=TRUE, col='blue')

First Order Nearest Neighbor Objects

coords <- coordinates(a2)
head(coords)
       [,1]     [,2]
1 -118.2494 33.97368
2 -118.2471 33.94910
3 -118.2739 33.96411
4 -118.3110 34.07621
5 -118.3091 34.05887
6 -118.2941 34.04827
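# Single nearest neighbour of each region (knearneigh() defaults to k = 1)
k1 <- knn2nb(knearneigh(coords))
# Great-circle distances (km) from each region to its nearest neighbour
k1dists <- unlist(nbdists(k1, coords, longlat=TRUE))
summary(k1dists)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.335   2.006   2.526   2.672   3.107   7.667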
plot(a2, border = 'lightgrey')
plot(k1, coords, add=TRUE, col="red")
title(main = "Links of first-order nearest neighbors", cex = 0.6)
a2_8 <- dnearneigh(coords, 0, 8, longlat=TRUE)
plot(a2, border = 'lightgrey')
plot(a2_8, coords, add=TRUE, col="blue", pch=19, cex=0.6)
title(main="Neighbors within 8 km", cex=0.6)
# Sum of great-circle distances (km) from each region to all other regions
distances <- rowSums(spDists(coords, longlat=TRUE))
min_dist <- min(distances)
max_dist <- max(distances)
min_i <- which(distances==min_dist)
max_i <- which(distances==max_dist)
a_data <- as.data.frame(a2)
access <- a_data[min_i,c("ZIPCODE")]
remote <- a_data[max_i,c("ZIPCODE")]
access
"90048"
remote
"90732"
# Assumed definitions (not shown in this listing): the polygons of the two ZIP codes
highlighted_access <- a2[min_i, ]
highlighted_remote <- a2[max_i, ]
tm_shape(a2) + tm_polygons() +
  tm_shape(highlighted_access) + tm_fill(col = 'green', alpha = 0.5, highlight = TRUE) +
  tm_shape(highlighted_remote) + tm_fill(col = 'red', alpha = 0.5, highlight = TRUE) +
  tm_borders(col = 'black', lwd = 0.5)

The spDists() function, called with longlat=TRUE, calculates great-circle distances in kilometres between the coordinates that coordinates() returns for each ZIP code polygon. Summing each row of the resulting matrix gives the total distance from one polygon to all others. The minimum and maximum of these totals identify the ZIP codes of the most accessible region (the smallest total distance to the other polygons) and the most remote region (the largest total distance to the other polygons).
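The row-sum logic can be checked on a toy example. The sketch below uses five made-up planar points and base R's dist(), so the distances are Euclidean and are not those of the asthma data; the names `pts`, `total_dist`, `most_accessible`, and `most_remote` are hypothetical.

```r
# Hypothetical planar coordinates for five locations (not the asthma data).
pts <- cbind(x = c(0, 1, 2, 1, 10), y = c(0, 1, 0, -1, 10))
rownames(pts) <- c("A", "B", "C", "D", "E")

# Full pairwise Euclidean distance matrix (the diagonal is zero).
d <- as.matrix(dist(pts))

# Total distance from each location to all others.
total_dist <- rowSums(d)

# The smallest total marks the most accessible location, the largest the most remote.
most_accessible <- names(which.min(total_dist))
most_remote <- names(which.max(total_dist))

print(round(total_dist, 2))
cat("Most accessible:", most_accessible, "\n")
cat("Most remote:", most_remote, "\n")
```

Using which.min() and which.max() returns a single index even when totals tie, whereas which(distances == min_dist) returns every tied index.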

Mapping Asthma

tm_shape(a2) +
  tm_fill("Percent_", style="quantile", n=4, palette="Oranges", textNA="0") +
  tm_layout(main.title = "% Pop w/ Asthma, 2014",
            main.title.position = "center",
            title.size = .5,
            legend.position = c("LEFT", "BOTTOM"),
            legend.text.size = .65) +
  tm_borders(col = "light grey", lwd = 1)

This map of ZIP codes in the Los Angeles area shows that the ZIP codes in the two highest quantile classes (12.3 to 13.7% and 13.7 to 16.6% of the population diagnosed with asthma) appear to be clustered together, with only a few outliers. The lowest percentages of asthma appear in the lower half of the map.

Moran Scatterplot and Test Analysis

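sd(a2$Percent_)
2.047352
mean(a2$Percent_)
12.15462
# Standardise: subtract the mean first, then divide by the standard deviation
a2$stdPercent <- (a2$Percent_ - mean(a2$Percent_)) / sd(a2$Percent_)
# Queen weights list; the row-standardised style "W" (the nb2listw() default) is assumed
a2w_q <- nb2listw(a2_q, style = "W")
moran.plot(a2$stdPercent, listw=a2w_q,
main=c("Moran Scatterplot of Standardized",
"Percent of Population with Asthma"))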

The best-fit line in the Moran scatterplot has a positive slope, which indicates positive spatial autocorrelation between the standardized asthma percentages and those of the neighboring ZIP codes. (With row-standardized weights, this slope equals Moran’s I.) This is supported by the fact that the majority of observations fall in the first and third quadrants.
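The link between the scatterplot's regression line and Moran's I can be verified by hand. The following is a minimal base-R sketch under stated assumptions: six hypothetical regions arranged on a line, with row-standardized weights. The values in `x` and the names `W`, `lag_z`, `I_formula`, and `I_slope` are illustrative and are not taken from the asthma data.

```r
# Hypothetical example: 6 regions on a line, each neighbouring the adjacent regions.
x <- c(2, 3, 5, 8, 9, 12)
n <- length(x)

A <- matrix(0, n, n)
A[abs(row(A) - col(A)) == 1] <- 1   # binary contiguity between adjacent regions
W <- A / rowSums(A)                  # row-standardised weights (each row sums to 1)

z <- (x - mean(x)) / sd(x)           # standardised variable
lag_z <- as.vector(W %*% z)          # spatial lag: mean of the neighbours' values

# Moran's I from its definition: (n / S0) * sum(z_i * lag_i) / sum(z_i^2).
S0 <- sum(W)
I_formula <- (n / S0) * sum(z * lag_z) / sum(z^2)

# Slope of the regression of the spatial lag on the variable (the scatterplot line).
I_slope <- unname(coef(lm(lag_z ~ z))[2])

print(c(formula = I_formula, slope = I_slope))
```

The two values agree because row standardization makes the sum of all weights equal to n, so the scaling factor n / S0 is 1.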

moran.test(a2$stdPercent, a2w_q)

Moran I test under randomisation

data:  a2$stdPercent
weights: a2w_q

Moran I statistic standard deviate = 9.653, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
      0.557054993      -0.008474576       0.003432319

The Moran I test under randomization returned a p-value of less than 2.2e-16 (0.00000000000000022), a Moran I value of 0.557, an expected value of -0.008, and a standard deviate of 9.653. The p-value indicates that the null hypothesis, that the percentage of the population with asthma is distributed randomly across the ZIP codes, can be rejected. Both the slope of the Moran scatterplot and the results of the Moran I test suggest that there is positive spatial autocorrelation.
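The reported figures are internally consistent, which can be checked with a few lines of arithmetic. The sketch below takes the Moran I statistic, the variance, and the number of regions (119) from the output above, and assumes the usual relationships: an expectation of -1/(n - 1) and a standard deviate computed as the difference between the statistic and its expectation divided by the square root of the variance. The variable names are illustrative.

```r
# Values reported in the moran.test() output above.
I_obs <- 0.557054993   # Moran I statistic
V_I <- 0.003432319     # variance under randomisation
n <- 119               # number of regions

# Expected value of Moran's I under the null hypothesis of no spatial autocorrelation.
E_I <- -1 / (n - 1)

# Standard deviate: how many standard deviations the statistic lies above its expectation.
z <- (I_obs - E_I) / sqrt(V_I)

# One-sided p-value for the alternative "greater", using the normal approximation.
p <- pnorm(z, lower.tail = FALSE)

print(c(expectation = E_I, standard_deviate = z))
```

The expectation is slightly negative rather than zero, so a Moran's I near zero on a small dataset does not by itself indicate dispersion.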

LISA Cluster Map

The clear clusters of high-high regions on the LISA cluster map fall in quadrant 1 of the global Moran’s I scatterplot, representing positive spatial autocorrelation (Murack, 2015). The clusters of low-low regions seen in the LISA map fall in quadrant 3, also representing positive spatial autocorrelation. The high-low regions of the LISA map fall in quadrant 4 and the low-high regions in quadrant 2, which indicates that these values are negatively spatially autocorrelated, or spatial outliers. The LISA map shows a larger number of spatial clusters, represented by the high-high and low-low classifications, than spatial outliers, represented by the high-low and low-high classifications (Murack, 2015). The Moran’s I value indicates positive spatial autocorrelation, and the correspondence between the LISA map and the positions of regions on the Moran’s I scatterplot identifies the strength of influence the regions hold. The regions designated by the LISA map as high-high reside closer to the origin of the Moran’s I scatterplot than regions with other classifications. A LISA cluster indicates that the region “is more similar to its neighbors than would be the case under spatial randomness” (Murack, 2015, p.5). The positive Moran’s I value also indicates that asthma rates are clustered within the ZIP codes analyzed.
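The mapping from scatterplot quadrants to LISA labels can be sketched in base R. This is only an illustration under stated assumptions: six hypothetical regions on a line with row-standardized weights, classified purely by the signs of the centred value and its spatial lag. A LISA cluster map such as GeoDa's also filters regions by statistical significance, which this sketch omits.

```r
# Hypothetical values for 6 regions arranged on a line (not the asthma data).
x <- c(10, 9, 2, 8, 1, 2)
n <- length(x)

A <- matrix(0, n, n)
A[abs(row(A) - col(A)) == 1] <- 1   # adjacent regions are neighbours
W <- A / rowSums(A)                  # row-standardised weights

z <- x - mean(x)                     # centred values (sign shows above/below the mean)
lag_z <- as.vector(W %*% z)          # mean centred value of each region's neighbours

# Quadrant 1: high-high, quadrant 3: low-low (positive autocorrelation);
# quadrant 4: high-low, quadrant 2: low-high (spatial outliers).
quadrant <- ifelse(z >= 0 & lag_z >= 0, "high-high",
            ifelse(z < 0 & lag_z < 0, "low-low",
            ifelse(z >= 0 & lag_z < 0, "high-low", "low-high")))

print(data.frame(x = x, lag = round(lag_z, 2), quadrant = quadrant))
```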

Getis-Ord local G-Statistic Map

While the results of the Getis-Ord local G-statistic and LISA maps differ only minimally, there are some discrepancies between them because the two measures are calculated differently. First, the G-statistic cluster map produces neighborless clusters. Additionally, the G-statistic reassigned the low-high cluster from the LISA map as high, and it assigned the LISA clusters classified as high-low as low. The G-statistic and LISA are both local measures, but they are calculated differently from one another. The G-statistic “identifies areas of a study region where a larger than average mass of the sum of all X values for that region are concentrated (yielding positive spatial autocorrelation), or areas of a study region where a smaller than average mass of the sum of all X values are concentrated (yielding negative spatial autocorrelation)” (Rigby, 2023), whereas the LISA map compares similarities between locations and those that surround them (Murack, 2015; Rigby, 2023). The Gi cluster map shows that the percentages of the population with asthma in the ZIP codes are significantly clustered and do not align with a random spatial pattern (Murack, 2015).
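The idea of a "larger than average mass" around a location can be made concrete with a small calculation. The sketch below uses one common formulation of the standardized Gi* statistic, in which each region counts as its own neighbour; it is an assumption that this matches the variant GeoDa applied here. The six values on a line and all variable names are hypothetical.

```r
# Hypothetical values for 6 regions on a line (not the asthma data).
x <- c(10, 9, 8, 2, 1, 2)
n <- length(x)

# Binary weights: each region, its adjacent regions, and itself (the "star" in Gi*).
W <- matrix(0, n, n)
W[abs(row(W) - col(W)) <= 1] <- 1

x_bar <- mean(x)
S <- sqrt(sum(x^2) / n - x_bar^2)   # population standard deviation of x

w_sum <- rowSums(W)                  # sum of weights for each region
w_sq_sum <- rowSums(W^2)             # sum of squared weights for each region

# Standardised Gi*: the local weighted sum compared with its expected value.
gi_star <- (as.vector(W %*% x) - x_bar * w_sum) /
  (S * sqrt((n * w_sq_sum - w_sum^2) / (n - 1)))

print(round(gi_star, 2))
```

Positive values mark concentrations of high values (hot spots) and negative values mark concentrations of low values (cold spots). Unlike the LISA quadrants, the statistic has no separate high-low or low-high category, which is consistent with the reassignments described above.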

References

Murack, J. (2015). Spatial_Weights_GeoDa.

Rigby, D. (2023). Lecture Video 5: Local Indicators of Spatial Autocorrelation: Applied Geospatial Statistics.
