bd {Ball}  R Documentation 
Compute the Ball Divergence statistic, a generic dispersion measure in Banach spaces.
bd( x, y = NULL, distance = FALSE, size = NULL, num.threads = 1, kbd.type = c("sum", "maxsum", "max") )
x 
a numeric vector, matrix, data.frame, or a list containing at least two numeric vectors, matrices, or data.frames. 
y 
a numeric vector, matrix, data.frame. 
distance 
if TRUE, the arguments x and y are interpreted as distance matrices (or dist objects); otherwise, they are treated as data. Default: distance = FALSE. 
size 
a vector recording sample size of each group. 
num.threads 
number of threads. If num.threads = 0, all available cores are used. Default: num.threads = 1. 
kbd.type 
a character string specifying the K-sample Ball Divergence test statistic; must be one of "sum", "maxsum", or "max". Default: kbd.type = "sum". 
Given samples not containing missing values, bd returns the Ball Divergence statistic.
If we set distance = TRUE, the arguments x and y can be a dist object or a symmetric numeric matrix recording the distances between samples; otherwise, these arguments are treated as data.
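As a sketch of the distance-matrix interface (the call is guarded with requireNamespace since the Ball package may not be installed; per the usage line above, size gives the group sizes that split the pooled sample):

```r
# Sketch: calling bd() on a pooled pairwise distance matrix instead of raw data.
if (requireNamespace("Ball", quietly = TRUE)) {
  set.seed(1)
  x <- rnorm(30)
  y <- rnorm(30, mean = 1)
  dxy <- as.matrix(dist(c(x, y)))  # symmetric 60 x 60 distance matrix
  # distance = TRUE: dxy is read as distances; size splits it into the two groups
  Ball::bd(dxy, size = c(30, 30), distance = TRUE)
}
```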
The Ball Divergence statistic measures the difference between the distributions of two samples in Banach spaces. It is proven to be nonnegative, and equal to zero if and only if the two distributions are identical.
The definition of the Ball Divergence statistic is as follows. Given two independent samples \{x_{1}, …, x_{n}\} with the associated probability measure μ and \{y_{1}, …, y_{m}\} with ν, where the observations in each sample are i.i.d., let δ(x,y,z)=I(z\in \bar{B}(x, ρ(x,y))), where δ(x,y,z) indicates whether z is located in the closed ball \bar{B}(x, ρ(x,y)) with center x and radius ρ(x, y). We denote:
A_{ij}^{X}=\frac{1}{n}∑_{u=1}^{n}{δ(X_i,X_j,X_u)}, \quad A_{ij}^{Y}=\frac{1}{m}∑_{v=1}^{m}{δ(X_i,X_j,Y_v)},
C_{kl}^{X}=\frac{1}{n}∑_{u=1}^{n}{δ(Y_k,Y_l,X_u)}, \quad C_{kl}^{Y}=\frac{1}{m}∑_{v=1}^{m}{δ(Y_k,Y_l,Y_v)}.
A_{ij}^X represents the proportion of samples \{x_{1}, …, x_{n}\} located in the ball \bar{B}(X_i,ρ(X_i,X_j)) and A_{ij}^Y represents the proportion of samples \{y_{1}, …, y_{m}\} located in the ball \bar{B}(X_i,ρ(X_i,X_j)). Meanwhile, C_{kl}^X and C_{kl}^Y represent the corresponding proportions located in the ball \bar{B}(Y_k,ρ(Y_k,Y_l)). The Ball Divergence statistic is defined as:
D_{n,m}=A_{n,m}+C_{n,m}, \quad where \quad A_{n,m}=\frac{1}{n^{2}}∑_{i,j=1}^{n}{(A_{ij}^{X}-A_{ij}^{Y})^{2}}, \quad C_{n,m}=\frac{1}{m^{2}}∑_{k,l=1}^{m}{(C_{kl}^{X}-C_{kl}^{Y})^{2}}.
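The definitions above can be transcribed directly into a naive R sketch for univariate data with the Euclidean metric (bd_naive is a hypothetical name; this O(n^2(n+m)) double loop is for illustration only, not a substitute for the package's implementation):

```r
# Naive two-sample Ball Divergence following the definitions above.
bd_naive <- function(x, y) {
  n <- length(x); m <- length(y)
  A <- 0
  for (i in seq_len(n)) for (j in seq_len(n)) {
    r  <- abs(x[i] - x[j])           # radius rho(X_i, X_j)
    AX <- mean(abs(x - x[i]) <= r)   # A_ij^X: share of x in the ball
    AY <- mean(abs(y - x[i]) <= r)   # A_ij^Y: share of y in the ball
    A  <- A + (AX - AY)^2
  }
  C <- 0
  for (k in seq_len(m)) for (l in seq_len(m)) {
    r  <- abs(y[k] - y[l])           # radius rho(Y_k, Y_l)
    CX <- mean(abs(x - y[k]) <= r)   # C_kl^X
    CY <- mean(abs(y - y[k]) <= r)   # C_kl^Y
    C  <- C + (CX - CY)^2
  }
  A / n^2 + C / m^2                  # D_{n,m} = A_{n,m} + C_{n,m}
}
```

Identical samples yield exactly zero, since every A_ij^X equals A_ij^Y and every C_kl^X equals C_kl^Y.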
Ball Divergence can be generalized to the K-sample test problem. Suppose we have K groups of samples, where the k-th group contains n_{k} observations. The K-sample Ball Divergence statistic can be defined by directly summing up the two-sample Ball Divergence statistics over all sample pairs (kbd.type = "sum"):
∑_{1 ≤ k < l ≤ K}{D_{n_{k},n_{l}}},
or by picking out the one sample with the largest difference from the others (kbd.type = "maxsum"):
\max_{t}{∑_{s=1, s \neq t}^{K}{D_{n_{s}, n_{t}}}},
or by aggregating the K-1 most significantly different two-sample Ball Divergence statistics (kbd.type = "max"):
∑_{k=1}^{K-1}{D_{(k)}},
where D_{(1)}, …, D_{(K-1)} are the K-1 largest two-sample Ball Divergence statistics among \{D_{n_s, n_t} : 1 ≤ s < t ≤ K\}. When K=2, all three types of Ball Divergence statistics degenerate into the two-sample Ball Divergence statistic.
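For the K-sample case, a list of numeric vectors supplies the K groups; the three aggregation schemes are selected via kbd.type (a sketch, guarded since the Ball package may not be installed):

```r
# Sketch: K-sample Ball Divergence with the three kbd.type options.
if (requireNamespace("Ball", quietly = TRUE)) {
  set.seed(1)
  z <- list(rnorm(30), rnorm(30), rnorm(30, mean = 2))  # K = 3 groups
  Ball::bd(z, kbd.type = "sum")     # sum over all group pairs
  Ball::bd(z, kbd.type = "maxsum")  # group most different from the rest
  Ball::bd(z, kbd.type = "max")     # K-1 largest pairwise statistics
}
```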
See bd.test
for a test of distribution equality based on the Ball Divergence.

Ball Divergence statistic 
Wenliang Pan, Yuan Tian, Xueqin Wang, Heping Zhang
Wenliang Pan, Yuan Tian, Xueqin Wang, Heping Zhang. Ball Divergence: Nonparametric two-sample test. Ann. Statist. 46 (2018), no. 3, 1109–1137. doi:10.1214/17-AOS1579. https://projecteuclid.org/euclid.aos/1525313077
############# Ball Divergence #############
x <- rnorm(50)
y <- rnorm(50)
bd(x, y)