7. Input data types in the mobest R package
The following section briefly lists the main mobest input data types with their constructors.
7.1. Basic data types
mobest employs a number of basic S3 data types to formalize the input to almost all of its functions. Most of them are tabular and inherit from tibble::tibble
. The constructors check certain properties to insure input correctness.
7.1.1. Spatial coordinates
mobest::create_geopos()
creates an object of class mobest_spatialpositions
which is a tibble
that represents spatial positions. Spatial positions in mobest are always 2-dimensional coordinates in a Cartesian space. For real world coordinates that means, that they have to be transformed to a projected coordinate system (e.g. with sf::st_transform()
): mobest can not be used with longitude and latitude coordinates (see Projecting the spatial data).
mobest::create_geopos(
id = 1:10, # ID columns
x = c(2,4,5,6,2,3,6,7,8,5), # spatial x coordinate
y = c(9,5,9,4,3,6,2,6,7,3) # spatial y coordinate
)
For the interpolation fields we often want regular, spatial grids covering a specific spatial area. These can be constructed with mobest::create_prediction_grid()
, which takes an object of class sf
with polygons in a projected coordinate system. It also yields an object of class mobest_spatialpositions
.
7.1.2. Spatiotemporal coordinates
mobest_spatialpositions
can be transformed to mobest_spatiotemporalpositions
with mobest::geopos_to_spatpos()
. This function calculates the permutations of all spatial positions with a new, implicitly temporal dimension z
. mobest_spatiotemporalpositions
is also derived from tibble
.
mobest::geopos_to_spatpos(
geopos = mobest::create_geopos(
id = 1:10,
x = c(2,4,5,6,2,3,6,7,8,5),
y = c(9,5,9,4,3,6,2,6,7,3)
),
z = c(-5, -4, -3)
)
mobest::create_spatpos()
directly creates mobest_spatiotemporalpositions
objects to represent spatiotemporal positions.
mobest::create_spatpos(
id = 1:10, # ID column
x = c(2,4,5,6,2,3,6,7,8,5), # spatial x coordinate
y = c(9,5,9,4,3,6,2,6,7,3), # spatial y coordinate
z = c(1,1,1,1,1,2,2,2,2,2) # temporal coordinate
)
7.1.3. Genetic coordinates
mobest::create_obs
creates an object mobest_observations
, which is a tibble
with genetic coordinates. Genetic coordinates can be any simple numeric measure of ancestry, for example the position of the samples in PCA space.
mobest::create_obs(
ac1 = runif(10, 0, 1), # "ac" here for "ancestry component", e.g. PCA coordinate 1
ac2 = runif(10, 0, 1), # e.g. PCA coordinate 2
)
Names and number of the components can be chosen freely, so instead of ac1
+ ac2
as in the example here, one could, for example, also have PC1
+ PC2
+ PC3
, or MDS1
+ MDS2
.
7.1.4. Kernel parameter settings
Gaussian process regression requires a parametrized covariance function: a “kernel”. One mobest_kernel
can be constructed with mobest::create_kernel()
.
mobest::create_kernel(
dsx = 800 * 1000, # lengthscale parameter spatial x dimension
dsy = 800 * 1000, # lengthscale parameter spatial y dimension
dsx = 800, # lengthscale parameter temporal dimension
g = 0.1, # nugget parameter
on_residuals = T, # should a linear model take over the main trends
# before the kriging interpolation? default: TRUE
auto = F # should the lengthscale and nugget values be
# automatically determined by laGP's maximum likelihood
# algorithm? default: FALSE
)
mobest_kernel
includes these input parameters in the form of an R list
. It only represents one specific kernel, though, for one specific dependent variable (e.g. an ancestry component ac1
). To account for the fact that a mobest analysis typically involves multiple genetic dimensions mobest::create_kernset()
provides a wrapper to bundle multiple named (by the dependent variable name) kernels directly in an object of class mobest_kernelsetting
.
mobest::create_kernset(
ac1 = mobest::create_kernel(1000000, 1000000, 200, 0.1),
ac2 = mobest::create_kernel(1000000, 1000000, 200, 0.1)
)
If a function requires both input of type mobest_observations
and mobest_kernelsetting
, then the names of the individual ancestry components must be identical, i.e. fit to each other.
7.2. Permutation data types
When working with real data we often need to explore permutations of data or account for uncertainty by sampling from distributions (e.g. uncertain dating). To represent that, mobest provides wrapper classes and constructors with a *_multi
suffix, to bundle multiple individual elements in a list class. Some of the core functions provide interfaces that automatically consider all permutations of these input lists.
Available are:
mobest_spatialpositions_multi
(mobest::create_geopos_multi()
)mobest_spatiotemporalpositions_multi
(mobest::create_spatpos_multi()
)mobest_observations_multi
(mobest::create_obs_multi()
)mobest_kernelsetting_multi
(mobest::create_kernset_multi()
)
And here is an example how they can be filled with named arguments:
multiple_kernel_settings <- mobest::create_kernset_multi(
kernel_1 = mobest::create_kernset(
ac1 = mobest::create_kernel(1000000, 1000000, 200, 0.1),
ac2 = mobest::create_kernel(1000000, 1000000, 200, 0.1)
),
kernel_2 = mobest::create_kernset(
ac1 = mobest::create_kernel(1000000, 1000000, 200, 0.1),
ac2 = mobest::create_kernel(1000000, 1000000, 250, 0.1)
)
)