# Multi-group analysis using generalized additive kernel canonical correlation analysis We first briefly review CCA and its variants. Then, we present our GAKCCA method and describe the algorithm for implementation.

### Canonical correlation analysis and its variants

For two multi-variate groups, canonical correlation analysis finds linear combination of each group that maximizes correlation between two linear combinations. That is, CCA finds ({{mathbf{b}}}_1=(b_{11},ldots , b_{1p_1})^T) and ({{mathbf{b}}}_2= (b_{21},ldots , b_{2p_2})^T) that satisfy the following: ( max _{{{mathbf{b}}}_1, {{mathbf{b}}}_2} displaystyle mathrm {Cov}left( {{mathbf{b}}}_1^T {{mathbf{X}}}_1, {{mathbf{b}}}_2^T {{mathbf{X}}}_2 right) ) subject to ( mathrm {Var}left( {{textbf{b}}}_1^T {{textbf{X}}}_1 right) = mathrm {Var}left( {{textbf{b}}}_2^T {{textbf{X}}}_2 right) = 1), where ({{textbf{X}}}_1 = (X_{11}, ldots , X_{1 p_1})^T) has (p_1) variables and ({{textbf{X}}}_2 = (X_{21}, ldots , X_{2 p_2} )^T) has (p_2) variables. Here, ((cdot ,)^T) denotes the transpose of a matrix. Variance constraints are to reduce the freedom of scaling for ({{mathbf{b}}}_1) and ({{mathbf{b}}}_2).

Instead of linear combination of variables in each group in CCA, Kernel canonical correlation analysis utilizes nonlinear functions to extract the relationship between two groups. KCCA can be formulated as follow: ( max _{f_1,f_2} mathrm {Cov}Big ( f_1({{textbf{X}}}_1), f_2({{textbf{X}}}_2) Big ) ) subject to (mathrm {Var}Big ( f_1({{textbf{X}}}_1) Big ) = mathrm {Var}Big ( f_2({{textbf{X}}}_2) Big ) = 1), where (f_j : {mathbb {R}}^{p_j} rightarrow {mathbb {R}}) for (j=1,2) is an unknown function in the reproducing kernel Hilbert space (RKHS)8.

Note that both CCA and KCCA assume two groups of variables. To expand beyond two groups, Kettenring12 introduced multi-group generalization of CCA (GCCA or MCCA). GCCA finds linear combinations of each group that optimize certain criterion to reveal multi-group structure. Given J multi-variate groups ({{textbf{X}}}_1, ldots , {{textbf{X}}}_J), GCCA finds ({{textbf{b}}}_1, ldots , {{textbf{b}}}_J) by considering ( max _{{{textbf{b}}}_1, ldots , {{textbf{b}}}_J} sum _{j,k=1;j not =k}^{J} c_{jk} g left[ mathrm {Cov}Big ( {{textbf{b}}}_j^T {{textbf{X}}}_j, {{textbf{b}}}_k^T {{textbf{X}}}_k Big ) right] ) subject to (mathrm {Var}Big ( {{textbf{b}}}_j^T {{textbf{X}}}_j Big ) = 1 hbox {~for~} j=1, ldots , J. ) A function g, which is called a scheme function, is related to a criterion for selecting canonical variates12. The examples of g are (g(x) = x) (Horst scheme15), (g(x)=|x|) (Centroid scheme16) or (g(x)=x^2) (Factorial scheme17). (c_{jk}) is an element of (J times J) design matrix C, where (c_{jk} = 1) if j and k groups are related and (c_{jk}=0), otherwise.

Tenenhaus and Tenenhaus18 extended GCCA to a regularization version by imposing a constraint on the norm of a coefficient vector in a linear combination as well as the variance of the linear combination (RGCCA). Specifically, the constraint is given by (tau _j ||{{textbf{b}}}_j||^2 + (1- tau _j) mathrm {Var}Big ( {{textbf{b}}}_j^T {{textbf{X}}}_j Big ) = 1) for ( j=1, ldots , J,) where (varvec{tau } = (tau _1, ldots , tau _J)^T) is a regularization parameter vector (or shrinkage parameter). Regularization parameters enable an inversion operation by avoiding ill-conditioned variance matrices13,18. All (tau _j)’s are between 0 and 1.

Also, Tenenhaus et al.13 developed a nonlinear version of GCCA (KGCCA) by considering a function for each group. That is, KGCCA finds (f_1,ldots , f_J) that satisfy ( max _{f_1,ldots , f_J} sum _{j,k=1;j not =k}^{J} c_{jk} g left[ mathrm {Cov}Big ( f_j({{textbf{X}}}_j), f_k ({{textbf{X}}}_k) Big ) right] ) subject to (mathrm {Var}Big ( f_j ({{textbf{X}}}_j) Big ) = 1 hbox {~for~} j=1, ldots , J,) where each (f_j : {mathbb {R}}^{p_j} rightarrow {mathbb {R}}) is an unknown function in the RKHS. g and (c_{jk}) are same as those in RGCCA.

### Generalized additive Kernel canonical correlation analysis

In this subsection, we introduce our approach that considers an additive structure in the multi-group setting. As in the previous subsection, we consider J multi-variate random variable groups ({{textbf{X}}}_j = left( X_{j1}, ldots , X_{j p_j} right) ^T in {mathbb {R}}^{p_j}) for (j=1, ldots , J). KCCA considers a function on the j-th group variable, (f_j({{textbf{X}}}_j)), where (f_j) is a nonlinear function in the RKHS. In our approach, GAKCCA, we assume that (f_j) is an additive function in RKHS as in Balakrishnan et al.14. That is,

begin{aligned} f_j in {mathscr {H}}_j = left{ h_j text { }Big | text { } h_j (x_1, ldots , x_{p_j} ) = sum _{l=1}^{p_j} h_{jl} (x_{l}) text { and } h_{jl} in {mathscr {H}}_{jl} right} , end{aligned}

where each ({mathscr {H}}_{jl}) is an RKHS with a kernel (phi _{jl} (cdot ,cdot )). Then, GAKCCA finds (f_j in {mathscr {H}}_j) that satisfies0

begin{aligned} max _{f_1, ldots , f_J} displaystyle sum _{j,k=1 ; j not = k}^{J} c_{jk} g Big [ mathrm {Cov}Big ( f_j({{textbf{X}}}_j), f_k({{textbf{X}}}_k) Big ) Big ] ,,,hbox {subject to}, mathrm {Var}Big ( f_j({{textbf{X}}}_j) Big ) = 1 ,hbox { for }, j=1, ldots , J, end{aligned}

(1)

where g and (c_{jk}) are a scheme function and an element of the design matrix C, respectively. Since we assume (f_j in {mathscr {H}}_j), we can write (f_j ({{textbf{X}}}_j) = sum _{l=1}^{p_j} f_{jl}(X_{jl})) so that (1) becomes

begin{aligned} max _{f_{11}, ldots , f_{1 p_1}, ldots , f_{J 1}, ldots , f_{J p_J}} displaystyle sum _{j,k=1 ; j not = k}^{J} c_{jk} g left[ sum _{l=1}^{p_j} sum _{m=1}^{p_k} mathrm {Cov}Big ( f_{jl} (X_{jl}), f_{km} (X_{km}) Big ) right] end{aligned}

(2)

subject to (sum _{l=1}^{p_j} sum _{l’=1}^{p_j} mathrm {Cov}Big ( f_{jl} (X_{jl}) , f_{jl’} (X_{jl’}) Big ) = 1) for (j=1, ldots , J). We denote the expression in the Eq. (2) as (rho _{{{textbf{X}}}_1, ldots , {{textbf{X}}}_J}).

When we introduce a covariance operator on the RKHS, mathematical treatment can be simpler13,19,20. The mean operator (m_{{mathscr {H}}_{jl}}) with respect to (X_{jl}) is defined by

begin{aligned} langle f_{jl} , m_{{mathscr {H}}_{jl}} rangle _{{mathscr {H}}_{jl}} = E Big ( f_{jl} (X_{jl}) Big ) = E Big ( langle f_{jl}, phi _{jl} (cdot , X_{jl} ) rangle _{{mathscr {H}}_{jl}} Big ), end{aligned}

where (langle cdot , cdot rangle _{{mathscr {H}}_{jl}} ) is an inner product on ({mathscr {H}}_{jl}). The covariance operator (Sigma _{jl,km}) with respect to (X_{jl}) and (X_{km}) can be also defined as

begin{aligned} langle f_{jl}, Sigma _{jl,km} f_{km} rangle _{{mathscr {H}}_{jl}}= & {} mathrm {Cov}Big ( f_{jl} (X_{jl}), f_{km} (X_{km}) Big ) \= & {} E Big ( langle f_{jl}, phi _{jl} (cdot , X_{jl}) – m_{{mathscr {H}}_{jl}} rangle _{{mathscr {H}}_{jl}} langle f_{km}, phi _{km} (cdot , X_{km}) – m_{{mathscr {H}}_{km}} rangle _{{mathscr {H}}_{km}} Big ). end{aligned}

Then, the Eq. (2) can be expressed as

begin{aligned} rho _{{{textbf{X}}}_1, ldots , {{textbf{X}}}_J} = max _{f_{11}, ldots , f_{1 p_1} ldots f_{J 1} ldots f_{J p_J}} displaystyle sum _{j,k=1 ; j not = k}^{J} c_{jk} g left[ sum _{l=1}^{p_j} sum _{m=1}^{p_k} langle f_{jl}, Sigma _{jl,km} f_{km} rangle _{{mathscr {H}}_{jl}} right] end{aligned}

(3)

subject to (sum _{l=1}^{p_j} sum _{l’=1}^{p_j} langle f_{jl}, Sigma _{jl,jl’} f_{jl’} rangle _{{mathscr {H}}_{jl}} = 1) for ( j=1, ldots , J.)

Note that the Eq. (3) is a theoretical expression. We now explain how to derive an empirical version using samples. Suppose that we have n samples of ( { {{textbf{X}}}_1, ldots , {{textbf{X}}}_J } ). The i-th sample of ({{textbf{X}}}_j) is denoted by ({{textbf{x}}}_j^{(i)} = left( x_{j1}^{(i)}, ldots , x_{j p_j}^{(i)} right) ). Fukumizu et al.21 suggested an estimated mean operator ({widehat{m}}_{jl}) and an estimated covariance operator ({widehat{Sigma }}_{jl,km}) which satisfy the following properties:

begin{aligned} langle f_{jl} , {widehat{m}}_{{mathscr {H}}_{jl}} rangle _{{mathscr {H}}_{jl}} = {widehat{E}} Big ( f_{jl} (X_{jl}) Big ) = frac{1}{n} sum _{i=1}^{n} langle f_{jl}, phi _{jl} (;dot, x^{(i)}_{jl} ) rangle _{{mathscr {H}}_{jl}} = leftlangle f_{jl}, frac{1}{n} sum _{i=1}^{n} phi _{jl} (cdot , x^{(i)}_{jl} ) rightrangle _{{mathscr {H}}_{jl}} end{aligned}

and

begin{aligned} langle f_{jl}, {widehat{Sigma }}_{jl,km} f_{km} rangle _{{mathscr {H}}_{jl}}= & {} {widehat{mathrm {Cov}}} left( f_{jl} (X_{jl}), f_{km} (X_{km}) right) = frac{1}{n} sum _{i=1}^{n} {hat{f}}_{jl} (x_{jl}^{(i)}) {hat{f}}_{km} (x_{km}^{(i)}), end{aligned}

(4)

where ( {hat{f}}_{jl} (x_{jl}^{(i)}) = langle f_{jl} , {widehat{phi }}_{jl}^{(i)} rangle _{{mathscr {H}}_{jl}}), ({widehat{phi }}_{jl}^{(i)} = phi _{jl}^{(i)} – frac{1}{n} sum _{xi =1}^{n} phi _{jl}^{(xi )} ) and ( phi _{jl}^{(i)} = phi _{jl} ( cdot , x_{jl}^{(i)} )).

Bach and Jordan8 utilized the linear space spanned by ({widehat{phi }}_{jl}^{(1)}, ldots , {widehat{phi }}_{jl}^{(n)}) denoted by ({mathscr {S}}_{jl}) to write (f_{jl} = sum _{i=1}^{n} a_{jl}^{(i)} {widehat{phi }}_{jl}^{(i)} + f_{jl}^{perp}), where (a_{jl}^{(i)}) is a coefficient corresponding to ({widehat{phi }}_{jl}^{(i)}) which needs to be estimated and (f_{jl}^{perp}) is orthogonal to ({mathscr {S}}_{jl}). With these facts, we can further simplify the Eq. (4) by introducing an (n times n) symmetric Gram matrix ({{textbf{K}}}_{jl})22 whose ((i,i’))-component is (({{textbf{K}}}_{jl})_{(i,i’)} = phi _{jl} (X_{jl}^{(i)}, X_{jl}^{(i’)}) = langle phi _{jl}^{(i)}, phi _{jl}^{(i’)} rangle _{{mathscr {H}}_{jl}}). The centered ({{textbf{K}}}_{jl}) can be represented as (widehat{{{textbf{K}}}}_{jl} = left( I_n – frac{1}{n} J_n right) ^T {{textbf{K}}}_{jl} left( I_n – frac{1}{n} J_n right) ), where (I_n) is the (n times n) identity matrix and (J_n) is the (n times n) matrix whose components are all ones, and its ((i,i’))-component is ( (widehat{{{textbf{K}}}}_{jl})_{(i,i’)} = langle {widehat{phi }}_{jl}^{(i)}, {widehat{phi }}_{jl}^{(i’)} rangle _{{mathscr {H}}_{jl}}. ) Then, using

begin{aligned} {hat{f}}_{jl} (x_{jl}^{(i)}) = leftlangle f_{jl} , {widehat{phi }}_{jl}^{(i)} rightrangle _{{mathscr {H}}_{jl}} = leftlangle sum _{i’=1}^{n} a_{jl}^{(i’)} {widehat{phi }}_{jl}^{(i’)} + f_{jl}^{perp}, {widehat{phi }}_{jl}^{(i)} rightrangle _{{mathscr {H}}_{jl}} = sum _{i’=1}^{n} a_{jl}^{(i’)} langle {widehat{phi }}_{jl}^{(i’)}, {widehat{phi }}_{jl}^{(i)} rangle _{{mathscr {H}}_{jl}} = sum _{i’=1}^{n} a_{jl}^{(i’)} (widehat{{{textbf{K}}}}_{jl})_{(i’,i)} , end{aligned}

(5)

the Eq. (4) becomes

begin{aligned} {widehat{mathrm {Cov}}} left( f_{jl} (X_{jl}), f_{km} (X_{km}) right)= & {} frac{1}{n} sum _{i=1}^{n} sum _{i’=1}^{n} sum _{i”=1}^{n} a_{jl}^{(i’)} (widehat{{{textbf{K}}}}_{jl})_{(i’,i)} (widehat{{{textbf{K}}}}_{km})_{(i,i”)} a_{km}^{(i”)} = frac{1}{n} {{textbf{a}}}_{jl}^T widehat{{{textbf{K}}}}_{jl}^T widehat{{{textbf{K}}}}_{km} {{textbf{a}}}_{km}, end{aligned}

where ( {{textbf{a}}}_{jl} = left( a_{jl}^{(1)}, ldots , a_{jl}^{(n)}right) ^T). The third equality in the Eq. (5) is due to the fact that (f_{jl}^{perp}) is orthogonal to ({mathscr {S}}_{jl}). ({mathscr {S}}_{jl}) is the inner-product linear space generated by (left{ {widehat{phi }}_{jl}^{(1)}, ldots , {widehat{phi }}_{jl}^{(n)} right} ) with inner product (langle cdot , cdot rangle _{{mathscr {H}}_{jl}}). This leads to (langle f_{jl}^{perp}, {widehat{phi }}_{jl}^{(i)} rangle _{{mathscr {H}}_{jl}}=0) for all (i=1,ldots ,n).

Note that the centered Gram matrix (widehat{{{textbf{K}}}}_{jl}) is singular since the sum of rows or columns is zero. Thus, the constraint (sum _{l=1}^{p_j} sum _{l’=1}^{p_j} langle f_{jl}, {widehat{Sigma }}_{jl,jl’} f_{jl’} rangle = sum _{l=1}^{p_j} sum _{l’=1}^{p_j} frac{1}{n} {{textbf{a}}}_{jl}^T widehat{{{textbf{K}}}}_{jl}^T widehat{{{textbf{K}}}}_{jl’} {{textbf{a}}}_{jl’} = 1) does not provide a unique solution to our method. So, similar to the regularization approach for the KCCA method8,13, we use (sum _{i=1}^{n} a_{jl}^{(i)} {widehat{phi }}_{jl}^{(i)}) instead of (f_{jl}) and introduce regularization parameters (tau _j >0) in the constraint conditions such as

begin{aligned} sum _{l=1}^{p_j} sum _{l’=1}^{p_j} leftlangle sum _{i=1}^{n} a_{jl}^{(i)} {widehat{phi }}_{jl}^{(i)}, left{ (1-tau _j){widehat{Sigma }}_{jl,jl’} + tau _j mathrm {I}_{jl, jl’} right} sum _{i=1}^{n} a_{jl’}^{(i)} {widehat{phi }}_{jl’}^{(i)} rightrangle _{{mathscr {H}}_{jl}} = 1, qquad j=1, ldots , J, end{aligned}

(6)

where (mathrm {I}_{jl, jl’}) is an identity operator if (l = l’) and a zero operator, otherwise. With the (widehat{{{textbf{K}}}}_{jl}), the Eq. (6) can be rewritten as

begin{aligned} (1-tau _j) sum _{l=1}^{p_j} sum _{l’=1}^{p_j} frac{1}{n} {{textbf{a}}}_{jl}^T widehat{{{textbf{K}}}}_{jl}^T widehat{{{textbf{K}}}}_{jl’} {{textbf{a}}}_{jl’} + tau _j sum _{l=1}^{p_j} {{textbf{a}}}_{jl}^T widehat{{{textbf{K}}}}_{jl} {{textbf{a}}}_{jl} = 1, qquad j=1, ldots , J. end{aligned}

In summary, the empirical version of the Eq. (3) with regularization parameters is expressed as

begin{aligned} {widehat{rho }}_{{{textbf{X}}}_1, ldots , {{textbf{X}}}_J} = max _{{{textbf{a}}}_{11}, ldots , {{textbf{a}}}_{1 p_1} ldots {{textbf{a}}}_{J 1} ldots {{textbf{a}}}_{J p_J}} displaystyle sum _{j,k=1 ; j not = k}^{J} c_{jk} g left[ sum _{l=1}^{p_j} sum _{m=1}^{p_k} frac{1}{n} {{textbf{a}}}_{jl}^T widehat{{{textbf{K}}}}_{jl}^T widehat{{{textbf{K}}}}_{km} {{textbf{a}}}_{km} right] end{aligned}

(7)

subject to ((1-tau _j) sum _{l=1}^{p_j} sum _{l’=1}^{p_j} frac{1}{n} {{textbf{a}}}_{jl}^T widehat{{{textbf{K}}}}_{jl}^T widehat{{{textbf{K}}}}_{jl’} {{textbf{a}}}_{jl’} + tau _j sum _{l=1}^{p_j} {{textbf{a}}}_{jl}^T widehat{{{textbf{K}}}}_{jl} {{textbf{a}}}_{jl} = 1,) for ( j=1, ldots , J.)

To find the solution, (left{ widehat{{{textbf{a}}}}_{11}, ldots , widehat{{{textbf{a}}}}_{1 p_1}, ldots , widehat{{{textbf{a}}}}_{J 1}, ldots , widehat{{{textbf{a}}}}_{J p_J} right} ) to the equation (7), an algorithm similar to the one considered in Tenenhaus et al.13 is developed. The detailed algorithm is described in the Supplementary Appendix A.

In the classical CCA method, the contribution of a variable in a group in relation between the group and the other group is measured by correlation23. To be specific, the contribution of (X_{1l}) in ({{textbf{X}}}_1) for the relation between ({{textbf{X}}}_1) and ({{textbf{X}}}_2) is measured by (mathrm {Corr}( {widehat{b}}_{1l} X_{1l}, {widehat{{{textbf{b}}}}}_{2}^T {{textbf{X}}}_2 )), where ({widehat{{{textbf{b}}}}}_1) and ({widehat{{{textbf{b}}}}}_2) are canonical weights in CCA. A high absolute value of (mathrm {Corr}( {widehat{b}}_{1l} X_{1l}, {widehat{{{textbf{b}}}}}_{2}^T {{textbf{X}}}_2 )) implies that (X_{1l}) plays a significant role in the relation between ({{textbf{X}}}_1) and ({{textbf{X}}}_2). Similarly, we can measure the contribution of a variable in a group in relation between the group and the other group in our approach, GAKCCA. We define the contribution coefficient of (X_{jl}), the lth variable in the jth group, in relation between ({{textbf{X}}}_j) and ({{textbf{X}}}_k) as

begin{aligned} r_{X_{jl},{{textbf{X}}}_k}= & {} mathrm {Corr}Big ( f_{jl} (X_{jl}), f_{k} ({{textbf{X}}}_{k}) Big ). end{aligned}

We also define the measure for the relation between ({{textbf{X}}}_{j}) and ({{textbf{X}}}_k) as

begin{aligned} r_{{{textbf{X}}}_j,{{textbf{X}}}_k}= & {} mathrm {Corr}Big ( f_{j} ({{textbf{X}}}_{j}), f_{k} ({{textbf{X}}}_{k}) Big ). end{aligned}

The empirical version of (r_{X_{jl}, {{textbf{X}}}_k}) and ( r_{{{textbf{X}}}_j, {{textbf{X}}}_k}) can be formulated as

begin{aligned} {widehat{r}}_{X_{jl},{{textbf{X}}}_k}= & {} {widehat{mathrm {Corr}}} Big ( f_{jl} (X_{jl}), f_{k} ({{textbf{X}}}_{k}) Big ) = displaystyle frac{displaystyle sum _{m=1}^{p_k} widehat{{{textbf{a}}}}_{jl}^T widehat{{{textbf{K}}}}_{jl}^T widehat{{{textbf{K}}}}_{km} widehat{{{textbf{a}}}}_{km}}{sqrt{widehat{{{textbf{a}}}}_{jl}^T widehat{{{textbf{K}}}}_{jl}^T widehat{{{textbf{K}}}}_{jl} widehat{{{textbf{a}}}}_{jl} }, sqrt{displaystyle sum _{m=1}^{p_k} displaystyle sum _{m’=1}^{p_k} widehat{{{textbf{a}}}}_{km}^T widehat{{{textbf{K}}}}_{km}^T widehat{{{textbf{K}}}}_{km’} widehat{{{textbf{a}}}}_{km’}} }, end{aligned}

and

begin{aligned} {widehat{r}}_{{{textbf{X}}}_j,{{textbf{X}}}_k}= & {} displaystyle frac{displaystyle sum _{l=1}^{p_j} sum _{m=1}^{p_k} widehat{{{textbf{a}}}}_{jl}^T widehat{{{textbf{K}}}}_{jl}^T widehat{{{textbf{K}}}}_{km} widehat{{{textbf{a}}}}_{km}}{sqrt{displaystyle sum _{l=1}^{p_j} displaystyle sum _{l’=1}^{p_j} widehat{{{textbf{a}}}}_{jl}^T widehat{{{textbf{K}}}}_{jl}^T widehat{{{textbf{K}}}}_{jl’} widehat{{{textbf{a}}}}_{jl’} } sqrt{ displaystyle sum _{m=1}^{p_k} displaystyle sum _{m’=1}^{p_k} widehat{{{textbf{a}}}}_{km}^T widehat{{{textbf{K}}}}_{km}^T widehat{{{textbf{K}}}}_{km’} widehat{{{textbf{a}}}}_{km’}} }. end{aligned}

Simulation study shows that empirical contribution coefficient and measure for the relation between two groups describe structural information of variable groups well.

#### Regularization parameter selection

There can be several approaches for choosing appropriate regularization parameters. We consider a cross validation idea for selecting regularization parameters for GAKCCA. Using the whole data, we approximate (f_j) and denote as ({hat{f}}_j). Using the split data, we approximate (f_j) and denote as ({hat{f}}_j^{-g}) which is obtained by excluding the gth split. Then, we compare these two estimates to select the regularization parameters. This approach is similar to that of Ashad Alam and Fukumizu24.

In detail, we describe the selection procedure as follows. We split the n samples of (left{ {{textbf{X}}}_1, ldots , {{textbf{X}}}_J right} ) into G subsets, denoting ( {{textbf{x}}}, ldots , {{textbf{x}}}[G]), where ({{textbf{x}}}[g]) contains (n_g) samples of (left{ {{textbf{X}}}_1, ldots , {{textbf{X}}}_J right} ) and (n_1 + ldots n_G = n). For each (j = 1, ldots , J), we estimate (f_j) such that

begin{aligned} {widehat{f}}_j= & {} sum _{l=1}^{p_j} sum _{i=1}^{n} {widehat{a}}^{(i)}_{jl} phi _{jl}^{(i)}, ~~~ {widehat{f}}^{-g}_j = sum _{l=1}^{p_j} sum _{ i : X_{jl}^{(i)} notin X[(g)] } {widehat{a}}_{jl}^{(i),(-g)} phi _{jl}^{(i),(-g)}, end{aligned}

where ({widehat{a}}_{jl}^{(i),(-g)}) and (phi _{jl}^{(i),(-g)}) are calculated from the data excluding ( {{textbf{x}}}[g]) while ({widehat{a}}^{(i)}_{jl} ) and (phi _{jl}^{(i)} ) are obtained from the entire data. Then, we obtain

begin{aligned} L(varvec{tau }) = L(tau _1, ldots , tau _J) = frac{1}{G} sum _{g=1}^{G} sum _{j =1}^{J} sum _{{{textbf{x}}}in {{textbf{x}}}[g]} left( frac{{widehat{f}}_j({{textbf{x}}}) – {widehat{f}}^{(-g)}_j({{textbf{x}}})}{{widehat{f}}_j({{textbf{x}}})} right) ^2 end{aligned}

and selection of (tau ) is made by minimizing (L(varvec{tau })). The main idea of this procedure is that (f_{jl}) can be expressed as (f_{jl} = sum _{i=1}^{n} a_{jl}^{(i)} {widehat{phi }}_{jl}^{(i)} + f_{jl}^{perp}) by reproducing property in RKHS and we consider (sum _{i=1}^{n} a_{jl}^{(i)} {widehat{phi }}_{jl}^{(i)}) as an approximation of (f_{jl}). Then cross validation procedure chooses (varvec{tau } = (tau _1, ldots , tau _J) ) which minimizes the variability of the estimate of (f_j) caused by the selection of data. Note that (tau _j)’s may not be equal, but for the purpose of simplicity in computation, we assume all (tau _j)’s are equal in the simulation study and real data analysis.

#### Permutation test

In the classical CCA method, Wilks’ lambda statistic is widely used to test the hypothesis that there is no relationship between two groups25. However, it is difficult to apply the Wilks’ lambda test for GAKCCA due to multivariate normal distribution assumption of the Wilks’ lambda test. Nonlinear extension of GAKCCA makes the model more complex, so formulating test statistics based on the unknown nonlinear function is not feasible. Thus, we consider a permutation test approach to test (rho _{{{textbf{X}}}_1, ldots , {{textbf{X}}}_J} =0). That is, we approximate the sampling distribution of test statistics, ({widehat{rho }}_{{{textbf{X}}}_1, ldots , {{textbf{X}}}_J}), by obtaining test statistics from resampling under the null hypothesis.

First, from the original data, we calculate ({widehat{rho }}_{{{textbf{X}}}_1, ldots , {{textbf{X}}}_J}), denoted as ({widehat{rho }}_{{{textbf{X}}}_1, ldots , {{textbf{X}}}_J}^{obs}). Second, for the j-th group, we sample ( left{ {{textbf{x}}}_j^{(1)*}, ldots , {{textbf{x}}}_j^{(n)*} right} ) from ( left{ {{textbf{x}}}_j^{(1)}, ldots , {{textbf{x}}}_j^{(n)} right} ) with replacement. We do the same procedure for all groups. Note that the resampled set (left{ {{textbf{x}}}_1^{(k)*}, ldots , {{textbf{x}}}_J^{(k)*} right} ) does not necessarily keep the order as it should not matter under the null hypothesis. Third, from the resampled data, we calculate ({widehat{rho }}_{{{textbf{X}}}_1, ldots , {{textbf{X}}}_J}). Fourth, we repeat second and third steps m times and obtain ({widehat{rho }}_{{{textbf{X}}}_1, ldots , {{textbf{X}}}_J}^{left{ 1 right} } ldots , {widehat{rho }}_{{{textbf{X}}}_1, ldots , {{textbf{X}}}_J}^{left{ mright} }). Lastly, we find an empirical distribution ({widehat{F}}) from ({widehat{rho }}_{{{textbf{X}}}_1, cdots , {{textbf{X}}}_J}^{left{ 1 right} } cdots , {widehat{rho }}_{{{textbf{X}}}_1, cdots , {{textbf{X}}}_J}^{left{ mright} }). We reject the null hypothesis if (1- {widehat{F}}left( {widehat{rho }}_{{{textbf{X}}}_1, ldots , {{textbf{X}}}_J}^{obs}right) ) is less than the pre-specified significant level. In this paper, we set (m=300).

Analogous hypothesis test methods can be applied to test whether a certain variable is helpful for relationship within groups or not via the contribution coefficient.

### Ethic approval

The data collection was approved by the Seoul National University Research Ethics Committee and all methods to collect the data were performed in accordance with the relevant guidelines and regulations. Informed written consent was obtained from all participants prior to actual participation. Also, all data were anonymized prior to analysis.