The COMA package implements a new algorithm for optimum mate allocation (OMA), as well as the more traditional method, optimum contribution selection (OCS).
COMA requires two input files. The first file contains the marker
effects and allele dosage data for the parental candidates. Dominance
marker effects are optional, but when included should correspond to a
breeding value parameterization, including the effect of
heterosis/inbreeding depression (Endelman 2023).
The blup
command in R/StageWise is one
option to compute marker effects.
The vignette dataset comes from the University of Wisconsin potato breeding program. There are 170 tetraploid clones genotyped at 12K markers, and the marker effects are derived from a multi-trait index.
geno.file <- system.file("vignette_data", "geno.csv", package = "COMA")
geno <- read.csv(geno.file,check.names=F)
geno[1:4,1:5]
## marker add dom W17037-1 W17037-10
## 1 solcap_snp_c2_51460 -18.71 8.09 0.60 1.34
## 2 solcap_snp_c2_36608 8.48 19.14 2.00 1.00
## 3 solcap_snp_c2_36615 7.51 8.00 1.66 1.40
## 4 solcap_snp_c2_36658 46.53 4.71 4.00 2.00
The second input file is the kinship matrix to control inbreeding. Based on simulation results (Endelman 2024), the pedigree IBD kinship matrix is recommended for either OCS or OMA. The kinship matrix for the potato clones was generated using R package AGHmatrix.
K.file <- system.file("vignette_data", "kinship.csv", package = "COMA")
#preview files
K <- as.matrix(read.csv(K.file,check.names=F,row.names=1))
K[1:3,1:3]
## W17037-1 W17037-10 W17037-11
## W17037-1 0.2631 0.1397 0.1397
## W17037-10 0.1397 0.2631 0.1397
## W17037-11 0.1397 0.1397 0.2631
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00600 0.04700 0.08600 0.08062 0.09690 0.18820
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00360 0.01747 0.01820 0.02354 0.02050 0.09293
Now we are ready for read_data
:
library(COMA)
data <- read_data(geno.file=geno.file,
kinship.file=K.file,
ploidy=4, matings="all",
standardize=TRUE, n.core=2)
head(data$parents)
## id merit
## 1 W17037-1 -0.6518888
## 2 W17037-10 0.6155129
## 3 W17037-11 0.4235588
## 4 W17037-12 0.8016658
## 5 W17037-13 0.3160459
## 6 W17037-15 0.6178174
## parent1 parent2 merit
## 1 W17037-1 W17037-1 -2.06643337
## 2 W17037-10 W17037-1 -0.47212478
## 3 W17037-11 W17037-1 -0.42772748
## 4 W17037-12 W17037-1 -0.08777775
## 5 W17037-13 W17037-1 -0.24849597
## 6 W17037-15 W17037-1 -0.32733990
The matings
argument for read_data
indicates how to create the table of possible matings. The above example
used “all”, which creates all unique parental combinations. Consult the
function
documentation for other options. When standardize
=TRUE,
genetic values are scaled by the additive standard deviation (\(\sigma_A\)). This is for convenience when
interpreting the response values.
As shown above, the parents
output contains the
predicted merit (GEBV) for each individual. The matings
output contains the predicted merit (F1 progeny mean) for each possible
mating. For dioecious species (with separate sexes), a data frame with
sex information is needed as argument sex
in
read_data
, and the column headers for the matings are
“female”,“male” instead of “parent1”,“parent2”.
The first argument of the oma
function is
dF
=c(min,max), which represents lower and upper bounds on
the inbreeding rate for the next generation, dF1. The upper bound is
also applied to the average inbreeding rate over two generations, dF2,
assuming random mating of the progeny. The second argument,
parents
=data.frame(id,min,max), contains lower and upper
bounds on the contribution of each parent (do not include a column for
parental merit). The third argument, matings
, is a data
frame of the merit for each possible mating, as well as lower and upper
bounds on the allocation.
parents <- data.frame(id=data$parents$id, min=0, max=1)
matings <- data.frame(data$matings, min=0, max=1)
ans0 <- oma(dF=c(-1,0.005), parents=parents, matings=matings, ploidy=4, K=data$K)
ans0$response
## dF1 dF2 merit n.parent n.mate
## 1 NA NA NA 0 0
The above result shows there is no feasible solution at 0.5%
inbreeding for both generations, due to the relatedness of the clones.
The argument dF.adapt
can be used to automatically
increasing the upper bound until a solution is found:
ans1 <- oma(dF=c(-1,0.005), parents=parents, matings=matings, ploidy=4, K=data$K,
dF.adapt=list(step=0.001,max=0.05))
ans1$response
## dF1 dF2 merit n.parent n.mate
## 1 -0.0082 0.008 1.215909 33 31
With no lower bound, a solution was found at dF2 = 0.8%, while the negative value for dF1 indicates the progeny would be 0.8% more outbred than the parents. When a similar approach was used in a simulation study (Endelman 2024), the population became trapped in a local optimum with limited long-term gain. Imposing a lower bound of 0.5% for dF1 led to more long-term gain in the simulation. As the following code shows, for the potato dataset, this required increasing the dF2 rate to 1.0%.
ans2 <- oma(dF=c(0.005,0.008), parents=parents, matings=matings, ploidy=4, K=data$K,
dF.adapt=list(step=0.001,max=0.05))
ans2$response
## dF1 dF2 merit n.parent n.mate
## 1 0.005 0.01 1.228329 43 43
Interestingly, the second solution uses more parents even though the
inbreeding rate is higher. The optimal parental contributions are
returned as oc
, and the optimal mate allocations are
returned as om
.
## id value
## 40 W17038-4 0.0015535796
## 43 W17039-11 0.0008559787
## 71 W17039-69 0.0944907096
## 77 W17041-3 0.0167267015
## 78 W17042-3 0.0116170009
## 79 W17042-4 0.0227763478
## parent1 parent2 value
## 5987 W17062-26 W17038-4 0.0031071592
## 6374 W17062-26 W17039-11 0.0017119573
## 9497 W17042-8 W17039-69 0.0292801800
## 9581 W17AF6677-8 W17039-69 0.0197916027
## 9583 W17AF6684-10 W17039-69 0.0004466013
## 9584 W17AF6685-2 W17039-69 0.1394630353
The parental contributions of the two solutions can be visually
compared with the COMA function plot_ribbon
:
library(ggplot2)
plot_ribbon(oc=list(`-0.008`=ans1$oc,`0.005`=ans2$oc)) +
theme(legend.key.size=unit(0.1,'cm'))
COMA also has a function for OCS, which optimizes parental
contributions but not specific matings. The arguments are similar,
except dF is only a single number for the upper bound on group kinship,
and the parents
data frame includes a column for merit:
par2 <- data.frame(data$parents, min=0, max=1)
ans3 <- ocs(dF=0.005, parents=par2, ploidy=4, K=data$K,
dF.adapt=list(step=0.001, max=0.05))
ans3$response
## dF merit n.parent
## 1 0.015 1.222502 29
The smallest feasible inbreeding rate under OCS was 1.5%. The predicted merit is based on parental breeding values and therefore neglects specific combining ability (SCA). When SCA was accounted for, the OCS solution (using random mating) decreased from 1.2 to 1.0 \(\sigma_A\).
ocs.matings <- merge(expand.grid(parent1=ans3$oc$id,parent2=ans3$oc$id,stringsAsFactors = F),
data$matings)
i1 <- match(ocs.matings$parent1,ans3$oc$id)
i2 <- match(ocs.matings$parent2,ans3$oc$id)
ocs.matings$value <- ifelse(i1==i2,ans3$oc$value[i1]*ans3$oc$value[i1],
2*ans3$oc$value[i1]*ans3$oc$value[i2])
sum(ocs.matings$merit*ocs.matings$value)
## [1] 1.038668
Although COMA was developed for outbred populations, it can also be
used to manage diversity under long-term recurrent selection in inbred
crops. In this case, a sensible choice for the base, or reference,
population to compute inbreeding rate is a hypothetical progeny
population generated by random mating (RM) of the parents. This is
controlled by the argument base
in the ocs
and
oma
functions. The default value is “current”, which is
appropriate for outbred parents. For inbred parents, use “RM”.
Because COMA is based on convex optimization, additional linear constraints can be incorporated. The potato dataset contains presence/absence information for a genetic marker for potato virus Y resistance (PVYR), which is a critical trait. Two different selection approaches will be illustrated: (1) only allowing matings with a resistant parent; (2) imposing a lower bound on the R gene frequency.
The following code implements method 1 with the 0.5% lower bound:
library(dplyr)
par.file <- system.file("vignette_data", "parents.csv",
package = "COMA")
data2 <- read.csv(par.file)
PVYR.id <- data2$id[data2$PVYR=="Y"]
#calculate proportion of matings with PVYR parent from previous solution
n.mate <- ans2$response$n.mate
nR <- sum(ans2$om$parent1 %in% PVYR.id | ans2$om$parent2 %in% PVYR.id)
#add constraint
ans2a <- oma(dF=c(0.005,0.01), ploidy=4L, K=data$K, parents=parents,
matings=filter(matings, parent1%in%PVYR.id | parent2%in%PVYR.id))
#calculate frequency of PVYR gene, assuming single copy in parents
R.parents <- which(ans2$oc$id %in% PVYR.id)
freq2 <- sum(ans2$oc$value[R.parents])*0.25
R.parents <- which(ans2a$oc$id %in% PVYR.id)
freq2a <- sum(ans2a$oc$value[R.parents])*0.25
result <- data.frame(constraint=c("none","method 1"),
rbind(ans2$response,ans2a$response),
prop.resistant.matings=c(nR/n.mate,1),
Ry.freq=c(freq2,freq2a))
kable(result,digits=3)
constraint | dF1 | dF2 | merit | n.parent | n.mate | prop.resistant.matings | Ry.freq |
---|---|---|---|---|---|---|---|
none | 0.005 | 0.01 | 1.228 | 43 | 43 | 0.442 | 0.086 |
method 1 | 0.005 | 0.01 | 1.018 | 42 | 40 | 1.000 | 0.125 |
The above table compares the solution with and without the PVYR constraint. As expected, increasing the proportion of PVYR matings to 100% reduced the predicted merit. The frequency of the R gene in the progeny is also shown, assuming a single copy in the resistant parents.
Under method 2, the R gene frequency is constrained directly, which
is less restrictive than method 1 because some matings could have two
PVYR parents while others have none. Both ocs
and
oma
allow linear inequality or equality constraints on any
linear combination of the parental contributions. When the coefficients,
\(\mathbf{s}\), of the contribution
variables, \(\mathbf{y}\), are the R
gene frequencies of each parent, this creates the desired constraint.
The coefficients are included as an additional column in the
parents
data.frame, and the column header is parsed to
complete the equation. In this case, the header “gt0.125” represents
greater than or equal to 0.125: \[
\mathbf{s}^\prime \mathbf{y} \ge 0.125\]
par3 <- data.frame(parents,
gt0.125=ifelse(parents$id %in% PVYR.id, 0.25, 0))
ans2b <- oma(dF=c(0.005,0.01), ploidy=4L, K=data$K, parents=par3, matings=matings)
nR <- sum(ans2b$om$parent1 %in% PVYR.id | ans2b$om$parent2 %in% PVYR.id)
R.parents <- which(ans2b$oc$id %in% PVYR.id)
freq2b <- sum(ans2b$oc$value[R.parents])*0.25
result <- rbind(result,
data.frame(constraint="method 2",ans2b$response,
prop.resistant.matings=nR/n.mate, Ry.freq=freq2b))
kable(result,digits=3)
constraint | dF1 | dF2 | merit | n.parent | n.mate | prop.resistant.matings | Ry.freq |
---|---|---|---|---|---|---|---|
none | 0.005 | 0.01 | 1.228 | 43 | 43 | 0.442 | 0.086 |
method 1 | 0.005 | 0.01 | 1.018 | 42 | 40 | 1.000 | 0.125 |
method 2 | 0.005 | 0.01 | 1.070 | 43 | 42 | 0.628 | 0.125 |
As expected, the predicted merit under method 2 was higher than method 1.
The function read_data
computes genetic merit based on
genomic prediction of the progeny mean. But the oma
and
ocs
functions are not limited to this definition; the user
can supply their own merit values for each candidate parent or mating.
For OCS, individual phenotypes can be used. For OMA, it is possible to
incorporate progeny variance into the merit definition, e.g., through
the “usefulness” criterion.