Morphometry is a the study of shape and its variations, allowing quantitative comparisons of shape among groups. Shape analysis typically also includes a consideration of size, since shape typically varies with size (allometric shape variation). Therefore morphometrics can be a good way to examine growth and scaling.

The borealis package and this vignette are designed for those beginning to do work and analysis in morphometry. Advanced undergraduate and graduate students, as well as investigators new to morphometry will hopefully find this tutorial helpful.

Overview of the workflow

A typical workflow for geometric morphometrics (GMM) looks this this:

  1. Image specimens - For 2D morphometrics, this typically involves photography
  2. Digitize anatomical position of landmarks
  3. Data curation - Are the specimens all roughly facing the same direction? Are there jointed structures whose position may interfere with downstream steps?
  4. Generalized Procrustes Analysis (GPA) aligns specimens to one another, minimizing variance due to relative position, rotation and size. Size is typically retained as a separate variable that can be included in the analysis later.
  5. Data curation - Are there obvious outliers that should be removed?
  6. Ordination by tangent space projection (similar to PCA) highlights differences among individuals and groups
  7. Model influences on size and shape (due to allometric scaling, evolutionary relationships, environmental factors, etc.)
  8. Disparity comparisons among groups of specimens
  9. Modularity tests among landmarks

Morphometry in general nd GMM in particular is a rich discipline where people have developed a number of excellent analytical tools. However, many of these tools can be difficult to use. The borealis package aims to provide R users with a set of relatively simple functions to perform GMM analysis, while maintaining an open-ended flexibility.

Importantly, GMM typically involves a number of data processing steps. The borealis package embraces the concept of data provenance. Each time a dataset is modified, functions in the package add to a list element recording important details about the execution and results of that step. This information can then be written to a markdown file as a report of the data’s history and handling throughout the workflow.

borealis also includes functions that produce publication-quality figures with minimal user effort.

Installing the R package

If you’ve previously loaded the package, it is helpful to “unload” it before installing a newer version.

detach("package:borealis", unload = TRUE)

If you’ve never installed a package from Git Hub before, it may be necessary for you to install the package devtools.

install.packages("devtools")

Install borealis from GitHub as shown below.

devtools::install_github("aphanotus/borealis")

Load the package as any other.

library(borealis)

Managing raw coordinate data

For landmark-based geometric morphometrics (GMM), the first step is to “digitize” coordinate positions in digital images of each specimen. ImageJ provides the easiest way to do this.

Digitizing specimens in ImageJ

Before you begin, think carefully about the landmarks you should use to represent the shape of the structure you’re interested in. Areas with a greater density of landmarks will capture more shape variation among specimens. However, you don’t necessarily need dozens of landmarks. Important insights can be obtained from a simple set of landmarks.

Often, it’s helpful to try a number of different landmark configurations on a small group of diverse specimens to see how well they perform. Importantly, you want landmarks that are present in all specimens and that can be placed with good confidence by everyone involved in the digitizing effort. I suggest writing up a good map of all the landmarks before proceeding with the digitization.

  1. Open a specimen image in ImageJ. Be sure metadata, such as sex, species, etc., are recorded somewhere. Be sure a scale bar is present in the image. If needed flip or rotate images so they are relatively consistent. Omit specimens with missing anatomical features.
  2. Use the line tool to measure the scale and trace any linear anatomical features.
  3. Use the multi-point tool to place landmarks. Place each landmark, in order. Consult your anatomy guide until you’re familiar with everything!
  4. Press [cmnd] M to record data in a table in ImageJ.
  5. Select and Copy the measured data from ImageJ.
  6. Paste the values into Google Sheets (or Excel). Be sure to keep the data in a consistent format.

The first row of the spread sheet should provide column names. Enter the X and Y values under columns named “x” and “y”. There must be a column giving specimen IDs, such as “specimen.ID”, and one for scale. Optionally, add other columns to record metadata. The ID names, scale and other metadata, should be entered on the row for the first landmark of the specimen. Any linear measurements accompanying the specimen should be treated as metadata. Coordinate positions and metadata for each specimen should appear in a consecutive block of rows. In other words, in you have 3 landmarks: row 1 is the header, rows 2-4 are the data for specimen 1, rows 5-7 are the data for specimen 2, etc. This spreadsheet format is convenient, since it allows rapid copy-and-paste of data from ImageJ with minimal formatting. It will also facilitate converting the coordinate positions and encoding the metadata into the tps format in the next step.

Convert raw landmark coordinates into the tps file format

The tps (“thin plate spline”) file format is one of the most commonly used formats among different GMM software (Rohlf 2015). Creating a large tps file manually can be difficult. A stray space or tab can prevent downstream software from handling it properly, and those issues can be very hard to track down. For that reason, borealis includes a simple function create.tps to convert spreadsheet data, as described above, into a tps file. It will also begin recording data provenance and embed metadata for each specimen.

The scale value can also be inverted, by setting invert.scale = TRUE. The default for downstream functions is to apply scale values by multiplication. Typically this is appropriate when scale is recorded as unit distance (e.g. mm) per pixel. However, if scale is recorded in pixels per unit distance (e.g. pixels/mm) it will be appropriate to first invert the scaling factor before importing coordinate data.

create.tps(
  input.filename = "wings.csv",
  output.filename = "wings.tps",
  id.factors = c('species','caste','digitizer'),
  include.scale = TRUE,
  invert.scale = TRUE)

The file that’s created will looks like this.

# ## TPS file creation 
# 
# Created by user `drangeli` with `borealis::create.tps` on Monday, 22 June 2020, 15:11:47
# 
# Input file:  ~/Documents/3.research/2.Bombus.mouthparts/Bombus wing GMM/Bombus Wings - forewings.csv 
# 
# The dataset is 20 x 2 x 99 (*p* landmarks x *k* dimensions x *n* specimens)
# 
# Metadata are encoded in specimen ID lines from the following factors:
# - species
# - caste
# - digitizer
# 
# Metadata separator: __
# 
# **Scale** included and **inverted** from the original dataset.
# 

LM=20
1942 354.667
1650 336
1690 361.333
1747.333 377.333
1851.333 400
1867.333 440.667
1832.667 503.333
1819.333 507.333
1743.333 479.333
1674 486
1608.667 419.333
1488.667 387.333
1231.333 467.333
1471.333 518
1478 584.667
1478 594
1724.667 590
1447.333 650
1192.667 539.333
1479.333 679
ID=DRA190718-001__vag__W__JL
SCALE=0.00702814773166532

Import tps data into R

To start a workflow, you will typically begin from a tps file. The function read.tps can do several early steps, but let’s start by looking at a simple usage.

shapes <- read.tps("wings.tps")

This creates the object shapes. It is what R knows as a list, a type of data object that can include many different elements, all accessible together. (Now the name “shapes” is pretty generic. If you will be doing multiple GMM analyses, it’s a good idea to modify your code to use a more descriptive name.) You can reveal the components of the list using the names function.

names(shapes) 
[1] "coords"          "landmark.number" "specimen.number" "scaled"
[5] "metadata"        "provenance"     

The coords element is a 3-dimensional array of the coordinated positions. landmark.number and specimen.number are obvious. The element scaled is a logical value stating whether the coordinates have already been scaled. If this is true (it is here), then you should avoid applying any scale a second time! The element metadata is a data frame with metadata. Each specimen is a row in this table, and ID names are the same in the metadata row and coords element. Specimens are also in the same order in those two elements. Finally the provenance element contains data provenance, which will be added to as the workflow continues.

names(shapes$provenance)
[1] "create.tps" "read.tps"

You can look at the contents of the data provenance elements this way.

cat(unlist(shapes$provenance$read.tps))
## TPS data import

Performed by user `drangeli` with `borealis::read.tps` on Tuesday, 23 June 2020, 17:40:40

Metadata were extracted from specimen ID lines for the following factors:
- specimen.id
- species
- caste
- digitizer

View the shape data

The borealis package has a simple function for viewing the relative positions of landmarks. The function is designed to be user-friendly. It will automatically detect whether you’re giving it coordinates for one specimen, an array with coordinates for multiple specimens, or a list object with such an array as one component. So all of the commands below will produce the same plot.

landmark.plot(shapes$coords[,,1])
landmark.plot(shapes$coords)
landmark.plot(shapes)
landmark.plot(shapes, specimen.number = 1)

The function defaults to looking at the first specimen in the array, but you can also specify others.

landmark.plot(shapes, specimen.number = 2)

Viewing shapes with landmark connections

Often it’s convenient to look at shape data with the landmarks connected in a way that reflects their biological meaning. You can ask landmark.plot to include these connections if you first define them as a matrix. In the example below, each connections is indicated by adjacent landmark numbers.

fw.links <- matrix(c(1,2, 1,5, 5,4, 4,3, 3,2, 5,6, 6,7, 7,8, 8,9, 9,4, 3,11, 11,12, 11,10, 9,10, 10,14, 14,15, 15,16, 16,18, 18,20, 16,17, 17,8, 12,13, 13,19, 14,13, 18,19, 2,12),
                   ncol = 2, byrow = TRUE)

landmark.plot(shapes, links = fw.links)

Viewing multiple specimens

It’s also frequently helpful to look at more than one specimen side-by-side. If you provide a vector to the specimen.number argument, the plotting function will display multiple panels for those specimens.

landmark.plot(shapes, specimen.number = 1:4, links = fw.links)

Bombus forewing data

The data as they exist in our shape object right now are included in the borealis package. You could choose to start the tutorial at this point using the following command.

data("Bombus.forewings", package = "borealis")
shapes <- Bombus.forewings

Reflect specimens

Occasionally, a few specimens in a dataset are entered upside-down or “reflected” relative to the others. Or perhaps ImageJ recorded XY coordinates with the origin at the lower left, instead of the upper left. In these instances, it’s helpful to go through all the specimens and orient them to have certain landmarks up or to the left.

The function align.reflect will accomplish this, and update the data provenance to list all the reflections that are made.

shapes <- align.reflect(shapes, top.pt = 2, left.pt = 1, links = fw.links )

To save time, the reflection step can be included in the initial read.tps statement.

shapes <- read.tps("wings.tps", 
                   reflect.specimens = TRUE,
                   top.pt = 2, left.pt = 1, links = fw.links)

Angular alignment

Joints can introduce nuisance variation in landmark-based geometric morphometrics. The borealis function align.angle rotates a subset of landmarks such that they come to a consistent angle. This step should be run before Procrustes alignment, and it is robust to differences in the relative position, orientation and size of specimens.

However, it’s not always necessary to run this step. If the structure you’re analyzing isn’t jointed, then there’s no need for it. So far in this vignette, we’ve been working with a bumblebee wing dataset that doesn’t have obvious joints. So, let’s switch to another sample dataset, mantis, which includes simulated shapes for the femur and tibia of 100 mantis forelegs.

data("mantis", package = "borealis")

# Define mantis.lines
{
  x <- 1:16
  mantis.lines <- matrix(c(x[-length(x)],x[-1]), ncol = 2)
  mantis.lines[10,] <- c(10,1)
  mantis.lines[15,] <- c(15,6)
  mantis.lines <- rbind(mantis.lines,
                        matrix(c(5,11, 6,11, 13,16, 14,16), ncol = 2, byrow = TRUE))
}

landmark.plot(mantis, specimen.number = 1:3, panels = c(1,3), links = mantis.lines)

As you can see, the angle of the femur and the tibia (the smaller segment) varies for each specimen. If you were to proceed into GPA with these data as they are, that angle would dominate the shape variance and obscure any signal from more interesting biological factors.

These angles can be standardized using the align.angle function. It takes several arguments, angle.pts.1, art.pt, and angle.pts.2, which define the angle on which we want to focus. art.pt is the vertex of the angle and the articulation point for rotation. rot.pts defines the landmarks to actually rotate. Each of these arguments can be the number of a single landmark or a vector of several landmark numbers. If multiple landmarks are provided, their mean position is used. Using multiple landmarks helps mitigate any variance individual landmarks may have. However, you should use your knowledge of anatomy to choose landmarks that make sense for your subject.

mant.2 <- align.angle(mantis,
                      art.pt = 11,
                      angle.pts.1 = c(1:10),
                      angle.pts.2 = c(12:15),
                      rot.pts = c(12:16) )

landmark.plot(mant.2, specimen.number = 1:3, panels = c(1,3), links = mantis.lines)

Angle alignment reduced variance in centroid size-scaled landmark distances by 20.4%.

By default, the function will plot overlaid histograms using the distribution of variance in inter-landmark distances, scaled by each specimen’s centroid size. This can be used to determine how well the alignment improves the consistency of landmark positions, with lower values indicating greater consistency.

The angle that is enforced on all specimens is based on the mean angle of all specimens, by default. However, arguments to reference.specimen can indicate a subset of specimens on which to base the reference angle. You can also add angular value to all specimens using the angle argument.

Procrustes alignment

Generalized Procrustes analysis minimizes the variance in distances between the landmarks of different specimens. It does this by translating, rotating and re-sizing each specimen’s coordinates, but not changing their relative distances to one another. In this way, GPA makes shapes comparable and removes trivial issues introduced by their original imaging. GPA also retains size information as centroid size.).

wing.gpa <- align.procrustes(shapes)

What’s happening behind the scenes is that borealis relies on the function gpagen in the R package geomorph. The output of the align.procrustes contains the same list elements produced by geomorph::gpagen. However it also retains any list elements from the input and add to the data provenance element.

GPA with semilandmarks

True landmarks are based on reliable anatomical features with unambiguous positions (Zelditch et al. 2012). However, it’s also possible to include “semilandmarks”, positions that are defined by a line or curve. For example, the edge of a limb does not have any true, fixed anatomical landmarks, but one or more semilandmarks can be defined along the edge to capture its shape. Semilandmarks are defined relative to their neighbors (which can also be semilandmarks). During GPA, semilandmarks are allowed to “slide” in the axis determined by their defining neighbors, so that they are positions equidistant from those neighbors in that axis. But their deviation from that axis is retained, in order to capture that shape variation. Ideally, semilandmarks are initially placed very close to their ultimate position. But sliding them during GPA helps optimize the alignment of specimens, and reduces variance in shapes that is not biologically meaningful (Zelditch et al. 2012).

In the bumblebee wing dataset, all landmarks are true landmarks, defined by the intersections of veins. However, to provide an example, let’s consider landmarks 6 and 15 to be sliding semilandmarks. To define the landmarks, gpagen and align.procrustes will take a matrix with 3 columns. Each row defines one semilandmark. The semilandmark itself is in the middle column. The flanking neighbors, which define the axis for sliding, are in columns 1 and 3.

semiLMs <- matrix(
  c(5,6,7,  14,15,16),
  ncol = 3, byrow = TRUE
)
semiLMs
     [,1] [,2] [,3]
[1,]    5    6    7
[2,]   14   15   16

The matrix defining semilandmarks is then included in the call to align.procrustes with the curves argument.

wing.gpa <- align.procrustes(shapes, curves = semiLMs)

The use of semilandmarks and their definitions is recorded in the data provenance.

GPA with outlier detection

After initial GPA, it’s typically important to remove any obvious outliers. These may be specimens where coordinate positions were recorded incorrectly. There presences in the dataset can skew signals that are actually of interest.

align.procrustes facilitates interactive outlier detection and removal. Just include the argument outlier.analysis = TRUE. The function will run geomorph::plotOutliers, then display warp grids for the most extreme shapes and prompt the user to remove some number of them. This can be done iteratively.

wing.gpa <- align.procrustes(shapes, outlier.analysis = TRUE)

In this case, removing the two most extreme outliers looks prudent.

The final GPA plot looks good: The individual specimen landmarks in gray are very close to the mean landmarks shown in black.

The geomorph data frame

The geomorph package (Adams et al. 2020) provides a powerful set of tools for GMM analysis. Many functions require shape, size and other metadata to be in a data structure called a geomorph.data.frame. In order to retain data provenance and other elements of our data object, we can use the function listed.gdf to convert the gpagen and metadata elements of our list into a geomorph.data.frame while still keeping it within a list with the other elements.

wing.gpa <- listed.gdf(wing.gpa)

We can then take a look at the new data structure. This can be done using the names function, as above. However, for complex data structures, the str function is often convenient. The output shows the nested relationships among list elements and previews the contents of each one.

str(wing.gpa)
List of 5
 $ gdf            :List of 6
  ..$ specimen.id: chr [1:97] "DRA190815-001" "DRA190815-002" "DRA190815-003" "DRA190815-004" ...
  ..$ species    : chr [1:97] "vag" "imp" "imp" "imp" ...
  ..$ caste      : chr [1:97] "W" "W" "W" "W" ...
  ..$ digitizer  : chr [1:97] "JL" "JL" "JL" "JL" ...
  ..$ coords     : num [1:20, 1:2, 1:97] -0.405 -0.0319 -0.0621 -0.1388 -0.2011 ...
  .. ..- attr(*, "dimnames")=List of 3
  .. .. ..$ : chr [1:20] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:2] "X" "Y"
  .. .. ..$ : chr [1:97] "DRA190815-001" "DRA190815-002" "DRA190815-003" "DRA190815-004" ...
  ..$ Csize      : Named num [1:97] 7.02 6.91 6.54 7.2 6.93 ...
  .. ..- attr(*, "names")= chr [1:97] "DRA190815-001" "DRA190815-002" "DRA190815-003" "DRA190815-004" ...
  ..- attr(*, "class")= chr "geomorph.data.frame"
 $ landmark.number: int 20
 $ specimen.number: int 97
 $ scaled         : logi TRUE
 $ provenance     :List of 8
  ..$ create.tps       : chr "## TPS file creation \n\nCreated by user `drangeli` with `borealis::create.tps` on Tuesday, 23 June 2020, 23:50"| __truncated__
  ..$ read.tps         : chr "## TPS data import\n\nPerformed by user `drangeli` with `borealis::read.tps` on Tuesday, 23 June 2020, 23:51:33"| __truncated__
  ..$ align.reflect    : chr "## Reflection alignment\n\nPerformed by user `drangeli` with `borealis::align.reflect` on Saturday, 10 October "| __truncated__
  ..$ align.procrustes : chr "## Generalized Procrustes analysis\n\nPerformed by user `drangeli` with `borealis::align.procrustes` on Saturda"| __truncated__
  ..$ listed.gdf       : chr "## Geomorph data frame conversion\n\nPerformed by user `drangeli` with `borealis::listed.gdf` on Saturday, 10 O"| __truncated__

Subsetting shape data

Later in the analysis we will have reasons to work with a subset of the full dataset. As data structures become complex, subsetting them becomes difficult. For example, there is no convenient way to subset a geomorph.data.frame using traditional R bracketing.

Subsetting specimens

The function subsetGMM provides a shortcut for subsetting geomorph.data.frame objects and data structures produced by the borealis functions. As always data provenance entries record these changes. In the example below, we will subset only the bumblebee wing specimens that present workers.

x <- which(wing.gpa$gdf$caste=="W")
workers <- subsetGMM(wing.gpa, specimens = x)

Subsetting landmarks

It’s also possible to use subsetGMM to select a subset of the landmarks in a data structure.

distal.wing <- subsetGMM(wing.gpa, landmarks = 1:10)
landmark.plot(distal.wing)

Custom additions to data provenance

It is possible you may need to make custom modifications to your shape dataset. In order to maintain data provenance in these circumstances the function add.provenance allows you to document any custom data manipulation.

wing.gpa$gdf$digitizer[1] <- "FJ"

wing.gpa <- add.provenance(
  wing.gpa,
  name="error.correction",
  title = "Corrected error in the metadata",
  text = "Upon checking lab notes, an error needed to be corrected. The digitizer of specimen 1 appears to have been FJ, not JL." )

cat(wing.gpa$provenance$error.correction)
## Corrected error in the metadata

Performed by user `drangeli` on Saturday, 10 October 2020, 10:15:50

Upon checking lab notes, an error needed to be corrected. The digitizer of specimen 1 appears to have been FJ, not JL.

Reporting data provenance

After creation of a geomorph.data.frame, a dataset is usually done with pre-processing and is ready for analysis. That means that the data will not change from this point forward. If so, then a final report on data provenance can be generated from the records kept in our data object.

write.provenance(wing.gpa, 
                 output.filename = "~/Desktop/wing.shape.provenance.md",
                 title = "Preliminary wing shape data provenance")

This creates a file in markdown format. These files can be treated as plain text and viewed directly, as below.

# Preliminary wing shape data provenance

 ## TPS file creation 

Created by user `drangeli` with `borealis::create.tps` on Tuesday, 23 June 2020, 23:50:46

Input file:  /Users/drangeli/Documents/3.research/2.Bombus.mouthparts/Bombus wing GMM/Bombus Wings - forewings.csv 

The dataset is 20 x 2 x 99 (*p* landmarks x *k* dimensions x *n* specimens)

Metadata are encoded in specimen ID lines from the following factors:
- species
- caste
- digitizer

Several free markdown viewers and editors also exist, such as Typora.

Ordination

Principal Component Analysis (PCA)

To visualize variance in shapes, we perform a form of ordination known as Kendall’s tangent space projection. Mathematically, this is similar to principal component analysis (PCA).

wing.pca <- gm.prcomp(wing.gpa$gdf$coords)

To examine the eigenvalues and proportional variance for each dimension in the analysis, run summary.

summary(wing.pca)
Ordination type: Principal Component Analysis 
Centering and projection: OLS 
Number of observations 97 
Number of vectors 40 

Importance of Components:
                              Comp1        Comp2        Comp3        Comp4        Comp5
Eigenvalues            0.0002302972 0.0001681885 8.952373e-05 6.013661e-05 4.486361e-05
Proportion of Variance 0.2694561165 0.1967867258 1.047460e-01 7.036203e-02 5.249206e-02
Cumulative Proportion  0.2694561165 0.4662428423 5.709889e-01 6.413509e-01 6.938430e-01

While it can be useful to know how to work with the raw output from PCA analysis, often people are most interested in plotting the first two PC axes. PCA is particularly helpful for understanding differences in shapes among groups. And group membership can be mapped onto the shape space plot.

plot(wing.pca, col = as.factor(wing.gpa$gdf$species))

sp <- as.factor(wing.gpa$gdf$species)
legend("topright", levels(sp), 
       col = c(1:length(levels(sp))),
       cex = 0.9, pch = 1, box.lwd = 0,
       bg="transparent")

This is okay, but it’s not very pretty. The borealis package has a function to plot these values using ggplot2 with colorblind-friendly palettes.

ggGMMplot(wing.pca, group = wing.gpa$gdf$species, 
          group.title = 'species', 
          convex.hulls = TRUE, include.legend = TRUE)

It can also be helpful to examine a version of the morphospace plot that provides example shapes. This is possible too, using ggGMMplot with the argument backtransform.examples = TRUE. It then requires a reference shape, which can be supplied with the geomorph function mshape, which calculates a mean shape. Since shape differences can be subtle, it can be helpful to apply a magnification factor using the bt.shape.mag argument.

ggGMMplot(wing.pca, group = wing.gpa$gdf$species, 
          group.title = 'species', convex.hulls = TRUE,
          backtransform.examples = TRUE,
          ref.shape = mshape(wing.gpa$gdf$coords),
          shape.method = "TPS",
          bt.shape.mag = 3,
          bt.links = fw.links)

Phylogenetic PCA

If different groups occupy similar areas of morphospace, an important follow-up question is whether that similarity is due to a shared evolutionary history or convergence (perhaps due to similar ecological influences). One way to examine this question is to look at ordination of shape data along with some representation of group relationships. Let’s look at two approaches.

We have a good phylogeny for species of bumblebees in our sample dataset. The tree is the consensus from a RAxML analysis of five genes, sequenced by Cameron et al. (2007), for 26 species found in northeastern North America. This tree is includes as data in the borealis package.

First, it’s necessary for us to find mean shapes for each species. If you have a tree with tips that represent each specimen in your shape dataset, then this step won’t be necessary.

Because shapes may differ by caste, we’ll limit this analysis to workers. We’ll subset the shape data using using the function subsetGMM, as shown above.

x <- which(wing.gpa$gdf$caste=="W")
workers <- subsetGMM(wing.gpa, specimens = x)

Next, the function coords.subset will split the coordinate shape data into groups, in this example, by species.

coords.by.species <- coords.subset(workers$gdf$coords, group = workers$gdf$species)

This produces a list with elements named for each species. Each element is an array of shape data for the specimens in that species.

names(coords.by.species)
[1] "bimac" "bor"   "ferv"  "imp"   "tern"  "terri" "vag"  

Next, we use the base R function lapply to apply a function to each element of the list. The function mshape can find the mean shape for each element.

mshape.by.species <- lapply(coords.by.species, mshape)

The output is a new list, with elements named for each species, but now each element contains just one shape representing the average for that species.

Ideally, we’d be all set at this point. However, this object still exists as a list. The functions we’ll use next will expect their input to be a 3-dimensional array. So we’ll use some base R commands to reformat the data with substantively changing it.

species.names <- names(mshape.by.species)
mshape.by.species <- array(
  data = unlist(mshape.by.species),
  dim = c(dim(mshape.by.species[[1]])[1], 2, length(species.names)),
  dimnames = list(NULL,NULL,species.names)
)

Now that the shape data is ready, we need to prepare the phylogeny. Start by making a copy of the tree object that’s included with borealis, and plotting it out.

data("Bombus.tree", package = "borealis")
btree <- Bombus.tree
plot(btree)

We’ll need to par it down to only the taxa included in our shape analysis. The names will also need to be changed to match the abbreviated species names we’ve used in the shape dataset. Doing some of these things will require tools from the phytools package. (If necessary run install.packages("phytools")).

The Bombus.tree dataset includes species name abbreviations in the element code.name, which we can copy to the tip.label element, where functions will look for taxon names.

btree$tip.label <- btree$code.name

# Check that species.names (our abbreviations in the shape data) are all present in the tree
species.names %in% btree$tip.label

library(phytools)
btree <- keep.tip(btree, species.names)
plot(btree)

Now we have a phylogeny and a set of mean shapes that can be combined for phylogenetic PCA.

pca.w.phylo <- gm.prcomp(mshape.by.species, phy = btree)
plot(pca.w.phylo, phylo = TRUE, main = "PCA with phylogeny")

The space plotted here is the first two principal component axes. It’s different from the plot we saw above, because each species is represented by only one point. The points are connected by branches that reflect their evolutionary history, as described by the phylogenetic tree above. Based on this result, we might conclude that “bor” and “ferv” (Bombus borealis and B. fervidus) have wing shapes different from the other species in the dataset, and that those shape differences are coincident with a shared ancestry of those two species. In contrast, while “imp” and “terri” (B. impatiens and B. terricola) have similar wing shapes, they are relatively distant in their relatedness. An important caveat here is that the output of the PCA here is exactly the same as before we considered the phylogeny. In this method, the phylogeny is used for plotting only.

geomorph provides a more sophisticated way to represent shape and relatedness in a plot like this. As described by its authors, phylogenetically-aligned PCA (PaCA) “…provides an ordination that aligns phenotypic data with phylogenetic signal, by maximizing variation in directions that describe phylogenetic signal, while simultaneously preserving the Euclidean distances among observations in the data space.” (Kaliontzopoulou 2020)

paca <- gm.prcomp(mshape.by.species, phy = btree, align.to.phy = TRUE)
plot(paca, phylo = TRUE, main = "Phylogenetically-aligned PCA")

In this case, the two methods provide very similar results. However, it will be wise to try both and compare the results.

Modeling

Before we discuss modeling, I want to offer a disclaimer. This subject is complex. I urge everyone to take a good statistics course; preferably several. That said, let’s briefly review what modeling is and then discuss how it can be done with GMM data.

All statistical models ask some version of a simple question: If a set of measurements has some variance, to what degree do certain factors explain that variance? Many different statistical methods exist to ask this question with different kinds of data.

Morphometric data sets are multivariate. In a traditional ANOVA, one dependent variable is measured and modeled based on one or more factors. The analysis of shape data however is complicated by the fact that the dependent variable (shape) consists of a constellation of coordinate positions and their relative distances from one another. Therefore, specialized multivariate analysis methods are necessary.

geomorph provides functions for model construction and comparison (Collyer 2020). In arguments to the function procD.lm, you describe a model using the ~ (tilda) character. I think of this as meaning “y is a function of x”. Below we will model shapes (coords) as a function of the log of centroid size (log(Csize)). These data all come from the wing.gpa$gdf geomorph data frame.

i <- 1e3-1 # number of iterations
size.model <- procD.lm(coords ~ log(Csize), data = wing.gpa$gdf, iter = i) 

The model object can then be examined. By passing it to the anova function, we get an ANOVA table where we can examine the statistics and p-values for individual factors in the model.

size.model
anova(size.model)
Linear Model fit with lm.rrpp

Number of observations: 97
Number of dependent variables: 40  
Data space dimensions: 40  
Sums of Squares and Cross-products: Type I
Number of permutations: 1000
Analysis of Variance, using Residual Randomization
Permutation procedure: Randomization of null model residuals 
Number of permutations: 1000 
Estimation method: Ordinary Least Squares 
Sums of Squares and Cross-products: Type I 
Effect sizes (Z) based on F distributions

           Df       SS        MS    Rsq      F     Z Pr(>F)   
log(Csize)  1 0.005193 0.0051933 0.0633 6.4194 4.142  0.001 **
Residuals  95 0.076855 0.0008090 0.9367                       
Total      96 0.082049                                        
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Call: procD.lm(f1 = coords ~ log(Csize), iter = i, data = wing.gpa$gdf)

From these results, we can conclude that wing size is a strong predictor of wing shape.

Let’s try some more complex models.

species.model <- procD.lm(coords ~ log(Csize) + species, data=wing.gpa$gdf, iter=i) 
sp.caste.model <- procD.lm(coords ~ log(Csize) + species + caste, data=wing.gpa$gdf, iter=i) 

anova(species.model) 
anova(sp.caste.model) 
           Df       SS        MS     Rsq      F       Z Pr(>F)   
log(Csize)  1 0.005193 0.0051933 0.06330 10.665  5.1665  0.001 **
species     7 0.034002 0.0048575 0.41442  9.975 10.6790  0.001 **
Residuals  88 0.042853 0.0004870 0.52229                         
Total      96 0.082049                                           
           Df       SS        MS     Rsq       F       Z Pr(>F)   
log(Csize)  1 0.005193 0.0051933 0.06330 10.8000  5.1911  0.001 **
species     7 0.034002 0.0048575 0.41442 10.1016 10.7222  0.001 **
caste       1 0.001018 0.0010181 0.01241  2.1173  2.1400  0.025 * 
Residuals  87 0.041835 0.0004809 0.50988                          
Total      96 0.082049

So far, all these models have factors that seem to be strong predictors of wing shape. But are the more complex models really better than a simple one? In modeling, there is a danger of false confidence in highly complex models. A model with many factors can partition variance so finely, that it appears to fit the data very well. Every factor have have a very low p-value. But is a more complex model a meaningful improvement over a simple one?

Model comparisons

geomorph allows comparisons among models by simply passing multiple model objects to the anova function. (As an aside, this is not the same “anova” function that is used by some other R packages. When you load geomorph it will overwrite those others for the duration of your R session, or until you load a package with another function called anova.)

The first model passed to anova is taken as the null model. Each of the others is compared to the null.

anova(size.model, species.model, sp.caste.model)
Analysis of Variance, using Residual Randomization
Permutation procedure: Randomization of null model residuals 
Number of permutations: 1000 
Estimation method: Ordinary Least Squares 
Effect sizes (Z) based on F distributions

                                      ResDf Df      RSS       SS        MS     Rsq      F      Z     P Pr(>F)
coords ~ log(Csize) (Null)               95  1 0.076855                    0.00000    
coords ~ log(Csize) + species            88  7 0.042853 0.034002 0.0048575 0.41442 9.9750 10.679 0.001
coords ~ log(Csize) + species + caste    87  8 0.041835 0.035021 0.0043776 0.42683 9.1036 10.899 0.001
Total                                    96    0.082049    

This result suggests that both of the more complex models are improvements over the simple null model that explains wing shape by size alone. The two more complex models can then be compared directly to one another.

anova(species.model, sp.caste.model)
                                      ResDf Df      RSS        SS        MS      Rsq      F    Z     P Pr(>F)
coords ~ log(Csize) + species (Null)     88  1 0.042853                     0.000000
coords ~ log(Csize) + species + caste    87  1 0.041835 0.0010181 0.0010181 0.012409 2.1173 2.14 0.025
Total                                    96    0.082049        

In this case, the model that adds “caste” as a predictive factor does not appear to be much better than the model with only size and species as factors.

A true null model

In the examples above, we compared models that included multiple factors. So our “null” model in each comparison simply needed to omit the factor of interest (e.g. caste in the last example). What if we wanted to make a comparison and our model only had one factor? What would be our null model then? We can actually define a silly null model that contains no factors at all: coords ~ 1

null.model <- procD.lm(coords ~ 1, data=wing.gpa$gdf, iter=i)

Post hoc pairwise comparisons

As with any ANOVA, these tests tell us that species is a factor that explains wing shape, at least in part. But it does not tell us which species differ from one another. Answering that question requires what are broadly called post hoc tests. There are many ways to do post hoc tests, but the simplest and one of the most common is to examine all pairwise contrasts. This can be dome using a function provided in the RRPP package, which underlies much of geomorph.

Right now our dataset includes some species with too few samples for the pairwise function to operate. You can examine sample sizes like this.

c(with(wing.gpa$gdf, by(species, species, length)))
bimac   bor  ferv   imp   san  tern terri   vag 
   23     3     2    28     1    26     2    12

So, first we’ll subset the data to just the four species that have good representation. Then we’ll create a model similar to the one we built earlier, with size and species as predictive factors.

common.sp <- subsetGMM(wing.gpa, specimens = (wing.gpa$gdf$species %in% c("bimac","imp","tern","vag")) )

common.sp.model <- procD.lm(coords ~ log(Csize) + species, data=common.sp$gdf, iter=i) 
anova(common.sp.model)
           Df       SS        MS     Rsq       F      Z Pr(>F)   
log(Csize)  1 0.004258 0.0042576 0.06657  9.4733 4.6087  0.001 **
species     3 0.021944 0.0073146 0.34312 16.2753 9.3920  0.001 **
Residuals  84 0.037752 0.0004494 0.59031                         
Total      88 0.063953 

As before, both factors are significant predictors of wing shape.

We should also construct a null model, that omits the factor we’re interested in. In this case that’s species. – The pairwise function we’ll use later can guess at this, but it’s good practice to be explicit about defining the null model. It can help avoid incorrect assumptions.

common.size.model <- procD.lm(coords ~ log(Csize), data=common.sp$gdf, iter=i) 

Now we can conduct the post hoc pairwise comparisons.

common.sp.pw <- pairwise(fit = common.sp.model, 
                         fit.null = common.size.model,
                         groups = common.sp$gdf$species)
summary(common.sp.pw)
Pairwise comparisons

Groups: bimac imp tern vag 

RRPP: 1000 permutations

LS means:
Vectors hidden (use show.vectors = TRUE to view)

Pairwise distances between means, plus statistics
                    d  UCL (95%)         Z Pr > d
bimac:imp  0.02644867 0.01059573 10.404643  0.001
bimac:tern 0.02364860 0.01426434  5.828100  0.001
bimac:vag  0.02433783 0.01368575  6.638058  0.001
imp:tern   0.03171487 0.01286605 10.051005  0.001
imp:vag    0.04044576 0.01301518 13.426703  0.001
tern:vag   0.02323543 0.01431901  5.549254  0.001

In this example, all of the species have significant differences from one another in their contribution to wing shape. Importantly, these effects are independent of the influence of wing size on shape. Because the model includes a size factor (log(Csize)), that influence is already accounted for. We’ve asked the pairwise function to make comparisons among species, using the argument groups = common.sp$gdf$species.

If we wanted to compare the influence of species, controlling for the influence of caste as well, we could do so this way.

common.caste.model <- procD.lm(coords ~ log(Csize) + caste, data=common.sp$gdf, iter=i) 
common.sp.caste.model <- procD.lm(coords ~ log(Csize) + species + caste, data=common.sp$gdf, iter=i)
common.sp.caste.pw <- pairwise(fit = common.sp.caste.model, 
                               fit.null = common.caste.model,
                               groups = common.sp$gdf$species)
summary(common.sp.caste.pw)
Pairwise distances between means, plus statistics
                    d  UCL (95%)         Z Pr > d
bimac:imp  0.02626333 0.01297581  8.191782  0.001
bimac:tern 0.02437228 0.01764947  4.328084  0.001
bimac:vag  0.02480890 0.01364798  6.801108  0.001
imp:tern   0.03298336 0.01334316 10.156108  0.001
imp:vag    0.04079689 0.01377951 12.566232  0.001
tern:vag   0.02291126 0.01554668  4.736486  0.003

Again, all species appear significantly different from one another in their contributions to wing shape. And these effects are independent of the influences of wing size and caste, which have already been accounted for in the model.

By the way…

This example absolutely requires you to define a null model for the pairwise function. If you try just passing the full model, the function will guess at the null model and may be wrong.

summary(pairwise(common.sp.caste.model, groups = common.sp$gdf$species))
                    d  UCL (95%)          Z Pr > d
bimac:imp  0.02626333 0.02980956 -0.4516418  0.665
bimac:tern 0.02437228 0.02801701 -0.4260401  0.653
bimac:vag  0.02480890 0.02921824 -0.2716658  0.606
imp:tern   0.03298336 0.03572867  0.2165722  0.410
imp:vag    0.04079689 0.04448370 -0.1000897  0.541
tern:vag   0.02291126 0.02770863 -0.7676849  0.784

What? How can there no significant pairwise differences? – The problem here is that pairwise has assumed the wrong null model. If you ever do want to use the short-cut of not specifying a null model, you can check what model the function will assume as its null.

reveal.model.designs(common.sp.caste.model)
                         Reduced                          Full                                  
log(Csize)                     1                    log(Csize)                                  
species               log(Csize)          log(Csize) + species                                  
caste       log(Csize) + species  log(Csize) + species + caste <- Null/Full inherent in pairwise

As you can see, the “Reduced” model that is assumed here is log(Csize) + species. Since we want to test differences among species, it’s essential that we take as our null model one that does not have species in it as a factor. So using this model is what leads to a different (and in this case inappropriate) result.

And while we’re talking about ways that these statistical functions can be misused, there’s another important issue we should cover in the next section!

The sequential order of factors matters

Remember, modeling is about assigning variance in the dependent variable (or shape coordinates) to the factors in the model. There are different ways to do this. The modeling functions in the geomorph and RRPP packages use what’s called a sequential sum of squares by default. This method examines the main effect each factor has on the response variable (or shapes), in the order that they’re named in the model, before considering any interaction terms. This is generally considered to be the most robust and versatile method. (Other types are used in more specialized situations.)

In practical terms however, this means that building your model with the terms in a different order can sometimes produce different results. For example, in the last section we modeled wing shape as a factor of wing size, species and caste. Let’s look at that model again, and compare it against a model that places caste before species.

anova(common.sp.caste.model)
            Df       SS        MS     Rsq       F      Z Pr(>F)   
log(Csize)  1 0.004258 0.0042576 0.06657  9.7017 4.6556  0.001 **
species     3 0.021944 0.0073146 0.34312 16.6678 9.4651  0.001 **
caste       1 0.001328 0.0013277 0.02076  3.0254 3.0695  0.003 **
Residuals  83 0.036424 0.0004388 0.56954                         
Total      88 0.063953 
common.caste.sp.model <- procD.lm(coords ~ log(Csize) + caste + species, data=common.sp$gdf, iter=i) 
anova(common.caste.sp.model)
           Df       SS        MS     Rsq       F      Z Pr(>F)   
log(Csize)  1 0.004258 0.0042576 0.06657  9.7017 4.6556  0.001 **
caste       1 0.002269 0.0022685 0.03547  5.1693 3.7851  0.001 **
species     3 0.021003 0.0070010 0.32841 15.9531 9.3146  0.001 **
Residuals  83 0.036424 0.0004388 0.56954                         
Total      88 0.063953   

Same factors; different order. Notice that the F statistics and p values (Pr > d in this table) are slightly different. In this example, it wouldn’t radically change your interpretation, but with a different dataset the difference might be more extreme.

So what order should you use? In general, you should place what you consider to be the most influential factors first. If you’re not sure, it’s worth examining both orders. If the outcomes differ a lot, then that may be a sign the factors have a strong interaction (e.g. caste influences wing shape, but differently in different species). If so, you should test a model that includes such as interaction, and see how it stacks up against the simpler model.

Interactions terms

We can also model the influence of interactions. For example, we already have support for the hypotheses that species and caste influence shape independently. But what if certain combinations of those factors have a shape that is unique and different from the influences of each factor alone? We can examine that by comparing models with and without an interaction term.

common.spXcaste.model <- procD.lm(coords ~ log(Csize) + species * caste, data=common.sp$gdf, iter=i) 
anova(common.sp.caste.model, common.spXcaste.model)
              Df       SS        MS     Rsq       F      Z Pr(>F)   
log(Csize)     1 0.004258 0.0042576 0.06657 10.0677 4.7297  0.001 **
species        3 0.021944 0.0073146 0.34312 17.2965 9.5719  0.001 **
caste          1 0.001328 0.0013277 0.02076  3.1396 3.1617  0.001 **
species:caste  2 0.002170 0.0010849 0.03393  2.5655 3.4042  0.002 **
Residuals     81 0.034254 0.0004229 0.53562                         
Total         88 0.063953  
                                             ResDf Df      RSS        SS        MS      Rsq      F      Z     P Pr(>F)
coords ~ log(Csize) + species + caste (Null)    83  1 0.036424                     0.000000                           
coords ~ log(Csize) + species * caste           81  2 0.034254 0.0021699 0.0010849 0.033929 2.5655 3.4042 0.002       
Total                                           88    0.063953 

As we suspected, in this case, there is a significant interaction of species and caste that significantly improves the model fit. (Careful readers will note that this might not be a solid conclusion, since the sample sizes for non-worker castes are not nearly as large as the workers. This lack of balanced sampling could impact the analysis.)

You can specify an interaction in a model using the * symbol, as in the example above, or in general terms as Y ~ A * B. However, you can also name the interaction term specifically using :. In general, this looks like Y ~ A + B + A:B. Or we could modify the call above to become…

common.spXcaste.model <- procD.lm(coords ~ log(Csize) + species + caste + species:caste, data=common.sp$gdf, iter=i) 

…it will accomplish the same thing! This : notation can be useful if you have a complex model with many factors where you want to be very explicit above the interactions you test!

Testing for common allometry

Size and shape are often correlated. This means that as specimens get larger, their shape changes in a predictable way. However, we may want to ask if that relationship is consistent for all of the groups in our dataset. Do species all share a common allometry or do some species have unique allometries? The way to approach this statistically is to compare a model, where size and species independently influence shape, to one that allows for an interaction between those two terms. The function plotAllometry also provides a way to visualize these relationships.

species.unique.model <- procD.lm(coords ~ log(Csize) * species, data=wing.gpa$gdf, iter=i) 
plotAllometry(fit = species.unique.model, size = log10(wing.gpa$gdf$Csize), 
              col = as.factor(wing.gpa$gdf$species), xlab = "log10 centroid size")

We can also use the borealis function gg.scaling.plot to look at the relationship between size and wing shape (represented by PC1 values)

gg.scaling.plot(
  x = log10(wing.gpa$gdf$Csize), y = wing.pca$x[,1],
  group=wing.gpa$gdf$species, group.title = "species",
  xlab = "log10 wing centroid size", ylab = "wing shape PC1",
  include.legend = TRUE,
  groups.trendlines = TRUE,
  fixed.aspect = FALSE
)

Note that the two plots above are fairly different. This is because we used PC1 values in the gg.scaling.plot, while plotAllometry is displaying fitted values based on the model of unique allometries.

Regardless of which plot we focus on, the differences in slopes among different species indicate that there might be some variation in allometries. We can test that hypothesis statistically, by comparing models.

anova(species.model, species.unique.model)
                                     ResDf Df      RSS        SS         MS      Rsq      F      Z     P Pr(>F)
coords ~ log(Csize) + species (Null)    88  1 0.042853                      0.000000
coords ~ log(Csize) * species           82  6 0.038875 0.0039784 0.00066307 0.048489 1.3987 1.7219 0.043
Total                                   96    0.082049

The results here are inconclusive. The benefit in model fit gained by adding unique influences of size for each species produce only a marginal difference.

However, it’s worth noting that this dataset is not ideal to answer this question. The sampling is very uneven across species. Ideally, equal numbers of species would be included and represent similar sizes.

Allometry-corrected PCA

Size often has a dominant effect on shape. Therefore it can be helpful to examine aspects of shape that remain when the effects of allometry are subtracted out.

allometry.corrected.pca <- gm.prcomp(size.model$residuals)
ggGMMplot(allometry.corrected.pca, group = wing.gpa$gdf$species, 
          group.title = 'species', convex.hulls = TRUE,
          include.legend = TRUE)

After this correction, we see more separation of the groups in the center, which may reveal aspects of their shape that are different, independent of differences in their size.

Phylogenetic ANOVA

One general feature of ANOVA is that is assumes the levels of each factor are independent of one another. For example, if the species factor has 8 different values, ANOVA assumes that there’s no connection between them. However, we know that some species share ancestors more recently than others. Therefore, this assumption of independence doesn’t hold when compared groups with an evolutionary history. Worse perhaps, is the fact that the degree of relatedness (and therefore the extent of non-independence) will vary among individual species.

Thankfully, methods have been developed to incorporate knowledge of species relationships into an ANOVA. geomoprh includes a function, procD.pgls, which will build models similar to procD.lm while also accounting for phylogeny. What’s happening under the hood here is that the tree is used in the permutations that generate the test statistics. So while the output is similar with procD.pgls and procD.lm, the former is more reliable if you have a good phylogeny available.

One limitation of this approach is that the model can only include one representative from each species (or each tip of the tree, at whatever taxonomic level you’re working with).

For this example, let’s examine the influence of wing size on shape, in a phylogenetic context. We can use the same shape data we created earlier, mshape.by.species, which contains mean worker wing shapes for the 7 most heavily sampled species. The object btree also contains the phylogeny for those species. We can calculate the mean centroid wing size from the larger dataset, then combine these elements into a new geomorph.data.frame.

species.gdf <- geomorph.data.frame(
  coords = mshape.by.species,
  Csize = c(by(workers$gdf$Csize, workers$gdf$species, mean)),
  tree = btree
)

size.pgls <- procD.pgls(coords ~ log(Csize), phy = tree, data = species.gdf, iter = i)

anova(size.pgls)
           Df       SS        MS     Rsq     F      Z Pr(>F)
log(Csize)  1 0.016836 0.0168363 0.25317 1.695 1.1565  0.135
Residuals   5 0.049665 0.0099331 0.74683                    
Total       6 0.066502                                      

In this analysis size does not have a significant influence on wing shape. If you recall, when we looked at the output of anova(size.model), size was a significant influence on shape. To interpret these results, it’s important to remember that earlier, we had a larger dataset and including size as a factor tested whether individual size differences influenced individual wing shapes. Here we’re asking a slightly different question: whether species-level size influence the mean worker wing shape of the species.

You can build multiple models with procD.pgls and compare them in the same way as with procD.lm, by passing both models to the anova function. (With our example dataset here, we don’t have another variable that makes sense to add to the model.)

Disparity comparisons

Organisms vary. The extent of that variation (disparity) may also be different among different groups. Are these differences greater than expected by chance? That can be tested, as shown below.

morphol.disparity(coords ~ Csize, groups = ~ species, data = wing.gpa$gdf)

The output is a table of pairwise mean Procrustes distances and p-values.

We don’t have balanced sampling by species or caste in this dataset, making this a rather inappropriate comparison. We could either increase sampling for the underrepresented species, or we could subset the data to only those species with good sample sizes.

c(with(wing.gpa$gdf, by(species,species, length)))
c(with(workers$gdf, by(species,species, length)))
bimac   bor  ferv   imp   san  tern terri   vag 
   23     3     2    28     1    26     2    12 

bimac   bor  ferv   imp  tern terri   vag 
   12     3     2    26    26     2     8   

Let’s just compare workers from the three most heavily sampled species.

x <- which(workers$gdf$species %in% c("bimac","imp","tern"))
well.sampled.workers <- subsetGMM(workers, specimens = x)

morphol.disparity(coords ~ Csize, groups = ~ species, data = well.sampled.workers$gdf)
Procrustes variances for defined groups
       bimac          imp         tern 
0.0007636383 0.0005756033 0.0005171270 


Pairwise absolute differences between variances
             bimac          imp         tern
bimac 0.0000000000 1.880350e-04 2.465113e-04
imp   0.0001880350 0.000000e+00 5.847636e-05
tern  0.0002465113 5.847636e-05 0.000000e+00


P-Values
      bimac   imp  tern
bimac 1.000 0.050 0.008
imp   0.050 1.000 0.426
tern  0.008 0.426 1.000

From these results we can conclude that workers of bimac (B. bimaculatus) have significantly more variation in wing shapes than workers of the other two species.

Modularity testing

An anatomical module is a group of structures, such as landmarks, that vary less among one another within the module, than they do relative to structures outside the module. In other words, if we were to examine the variance in distances between all landmarks in our dataset, if we consider some group of landmarks to be a module, then within-group variance should be low (and between group variance should be higher) than predicted by chance (Adams & Collyer 2019).

The geomorph function modularity.test implements such a test for two or module modules. Modules are defined by a character vector describing the membership of landmarks in each module.

The most sensible modularity hypotheses will be suggested by anatomical or developmental evidence of greater independence among the hypothetical modules. There are often separate developmental mechanisms governing the proximal and distal regions of appendages, such as the wing. So as an example, we’ll test the hypothesis that the wing exhibits modularity in proximal and distal regions, defined by a division just distal of the pterostigma.

modularity.hypothesis1 <- rep("proximal",wing.gpa$landmark.number)
modularity.hypothesis1[c(1,4,5,6:9,17)] <- "distal"

We’re ready to test whether the two regions exhibit modularity. The function modularity.test requires an array of shape data in the argument A. The modularity hypothesis is passed into the argument partition.gp. The argument CI is a logical parameter that says whether the function should calculate a confidence interval on the test statistic. (That’s usually useful.) Finally, iter specifies the number of iterations. This function is very computationally demanding. Use a low value of iter to explore hypotheses early on, remembering that the lowest possible p-value is 1/iter. Then for confirmation, invest compute time in a run with a higher iter value for greater precision.

mt1 <- modularity.test(wing.gpa$gdf$coords, modularity.hypothesis1, CI=TRUE, iter=99 )
summary(mt1) 
CR: 0.8787
P-value: 0.01
Effect Size: -3.1206
Based on 1000 random permutations
Confidence Intervals 0.8171
Confidence Intervals 0.9708

These results support this modularity hypothesis. However, testing several variations on the hypothesis can be helpful to identify the bounds of a module.


borealis GitHub page