pysiglib.sig_mmd
Added in version v0.2.1.
- sig_mmd(sample1, sample2, dyadic_order, *, static_kernel=None, time_aug=False, lead_lag=False, end_time=1.0, n_jobs=1, max_batch=-1)
Computes the squared maximum mean discrepancy (MMD) between two distributions \(\mu\) and \(\nu\) on path space, with respect to the signature kernel \(k\) with reproducing kernel Hilbert space \(\mathcal{H}\):
\[d(\mu, \nu)^2 := \left(\sup_{\|f\|_{\mathcal{H}} \leq 1} \left(\mathbb{E}_{x \sim \mu}[f(x)] - \mathbb{E}_{y \sim \nu}[f(y)]\right)\right)^2\]
\[= \mathbb{E}_{x, x' \sim \mu}[k(x, x')] - 2\mathbb{E}_{(x, y) \sim \mu \times \nu}[k(x, y)] + \mathbb{E}_{y, y' \sim \nu}[k(y, y')].\]
Given batches of sample paths \(\{x_i\}_{i=1}^m \sim \mu\) and \(\{y_j\}_{j=1}^n \sim \nu\), the squared MMD is computed using the unbiased estimator
\[\widehat{d}(\mu, \nu)^2 = \frac{1}{m(m-1)}\sum_{i \neq j} k(x_i, x_j) - \frac{2}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} k(x_i, y_j) + \frac{1}{n(n-1)}\sum_{i \neq j} k(y_i, y_j).\]
Optionally, a static kernel can be specified. For details, see the documentation on static kernels.
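To make the estimator concrete, here is a minimal NumPy sketch that evaluates it from precomputed Gram matrices. The helper name `unbiased_mmd2` is hypothetical and not part of pysiglib, and the linear kernel on flattened paths is only a stand-in for the signature kernel \(k\) used by `sig_mmd`; the sketch shows the arithmetic of the estimator, not how pysiglib computes it.

```python
import numpy as np


def unbiased_mmd2(K_xx: np.ndarray, K_xy: np.ndarray, K_yy: np.ndarray) -> float:
    """Unbiased estimate of the squared MMD from precomputed Gram matrices.

    K_xx: (m, m) matrix with K_xx[i, j] = k(x_i, x_j)
    K_xy: (m, n) matrix with K_xy[i, j] = k(x_i, y_j)
    K_yy: (n, n) matrix with K_yy[i, j] = k(y_i, y_j)
    """
    m = K_xx.shape[0]
    n = K_yy.shape[0]
    # Subtracting the trace implements the i != j restriction of the first and last terms.
    xx = (K_xx.sum() - np.trace(K_xx)) / (m * (m - 1))
    yy = (K_yy.sum() - np.trace(K_yy)) / (n * (n - 1))
    # The cross term averages over all m * n pairs (x_i, y_j).
    xy = 2.0 * K_xy.sum() / (m * n)
    return float(xx - xy + yy)


# Illustration only: a linear kernel on flattened paths stands in for the
# signature kernel that sig_mmd actually uses.
rng = np.random.default_rng(0)
X = rng.random((20, 100 * 5))  # m = 20 flattened paths
Y = rng.random((15, 100 * 5))  # n = 15 flattened paths
print(unbiased_mmd2(X @ X.T, X @ Y.T, Y @ Y.T))
```

Excluding the diagonal entries \(k(x_i, x_i)\) and \(k(y_j, y_j)\) is what removes the bias; as a consequence, the estimate can be slightly negative when both samples come from the same distribution.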
- Parameters:
sample1 (numpy.ndarray | torch.Tensor) – The batch of sample paths drawn from \(\mu\), of shape (*batch_shape, length_1, dimension). All dimensions before length_1 are batch dimensions and are flattened into a single batch of \(m\) paths.
sample2 (numpy.ndarray | torch.Tensor) – The batch of sample paths drawn from \(\nu\), of shape (*batch_shape, length_2, dimension). All dimensions before length_2 are flattened into a single batch of \(n\) paths. This batch shape is independent of sample1’s batch shape.
dyadic_order (int | tuple) – If set to a positive integer \(\lambda\), the paths are refined by a factor of \(2^\lambda\). If set to a tuple of positive integers \((\lambda_1, \lambda_2)\), the first path is refined by \(2^{\lambda_1}\) and the second path by \(2^{\lambda_2}\).
static_kernel (None | pysiglib.StaticKernel) – Static kernel passed to the signature kernel computation. If None (default), the linear kernel is used. For details, see the documentation on static kernels.
time_aug (bool) – If set to True, computes the signature of the time-augmented path, \(\hat{x}_t := (t, x_t)\), defined as the original path with an extra channel set to time, \(t\). This channel spans \([0, t_L]\), where \(t_L\) is given by the parameter end_time.
lead_lag (bool) – If set to True, computes the signature of the path after applying the lead-lag transformation.
end_time (float) – End time for time augmentation, \(t_L\).
n_jobs (int) – (Only applicable to CPU computation) Number of threads to run in parallel. If n_jobs = 1, the computation is run serially. If set to -1, all available threads are used. For n_jobs below -1, (max_threads + 1 + n_jobs) threads are used. For example, if n_jobs = -2, all threads but one are used.
max_batch (int) – Maximum batch size to run in parallel. If the computation fails due to insufficient memory, decrease this parameter. If set to -1, the entire batch is computed in parallel.
- Returns:
Unbiased estimate of the squared signature MMD between sample1 and sample2.
- Return type:
numpy.ndarray | torch.Tensor
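The time_aug, lead_lag and end_time parameters, together with the batch flattening described for sample1 and sample2, conceptually act on the input paths before any kernel is evaluated. The NumPy sketch below illustrates these transformations under stated assumptions: the helper names are hypothetical (not pysiglib functions), the time channel is assumed to be uniformly spaced on \([0, t_L]\), and the lead-lag convention shown is one common choice that may differ from pysiglib's internal ordering.

```python
import numpy as np


def flatten_batch(paths: np.ndarray) -> np.ndarray:
    """Collapse (*batch_shape, length, dimension) into (batch, length, dimension)."""
    return paths.reshape(-1, paths.shape[-2], paths.shape[-1])


def time_augment(paths: np.ndarray, end_time: float = 1.0) -> np.ndarray:
    """Prepend a time channel spanning [0, end_time].

    Assumption: time points are uniformly spaced along the path.
    """
    batch, length, _ = paths.shape
    t = np.linspace(0.0, end_time, length)
    t = np.broadcast_to(t[None, :, None], (batch, length, 1))
    # Time goes in the first channel, matching x_hat_t = (t, x_t).
    return np.concatenate([t, paths], axis=-1)


def lead_lag(paths: np.ndarray) -> np.ndarray:
    """One common lead-lag convention: (x_0, x_0), (x_1, x_0), (x_1, x_1), ...

    Output shape is (batch, 2 * length - 1, 2 * dimension).
    """
    # Repeat every point twice along the time axis: x_0, x_0, x_1, x_1, ...
    doubled = np.repeat(paths, 2, axis=-2)
    lead = doubled[:, 1:, :]   # x_0, x_1, x_1, x_2, x_2, ...
    lag = doubled[:, :-1, :]   # x_0, x_0, x_1, x_1, x_2, ...
    return np.concatenate([lead, lag], axis=-1)


sample = np.random.default_rng(0).random((4, 5, 100, 3))  # batch_shape = (4, 5)
flat = flatten_batch(sample)
print(flat.shape)                # (20, 100, 3)
print(time_augment(flat).shape)  # (20, 100, 4)
print(lead_lag(flat).shape)      # (20, 199, 6)
```

Time augmentation adds one channel, while lead-lag doubles both the number of channels and (roughly) the path length, which increases the cost of each kernel evaluation accordingly.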
Example:
```python
import torch
import pysiglib

sample1 = torch.rand((20, 100, 5))
sample2 = torch.rand((15, 100, 5))

mmd = pysiglib.sig_mmd(sample1, sample2, dyadic_order=2)
print(mmd)
```

```python
# MMD with a static kernel and time augmentation
import torch
import pysiglib

sample1 = torch.rand((20, 100, 5))
sample2 = torch.rand((15, 100, 5))

rbf = pysiglib.RBFKernel(sigma=0.5)
mmd = pysiglib.sig_mmd(
    sample1,
    sample2,
    dyadic_order=2,
    static_kernel=rbf,
    time_aug=True,
    max_batch=8,
)
print(mmd)
```
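Both examples use dyadic_order=2, so each path is refined by a factor of \(2^2 = 4\). For a piecewise-linear path, one way to picture this refinement is to split every linear segment into \(2^\lambda\) equal pieces; the NumPy sketch below does exactly that, and also evaluates the thread-count rule given for n_jobs. Both helpers are hypothetical illustrations rather than pysiglib functions, pysiglib need not materialize refined paths internally, and the tuple form of dyadic_order is not covered.

```python
import os
from typing import Optional

import numpy as np


def dyadic_refine(path: np.ndarray, dyadic_order: int) -> np.ndarray:
    """Split each linear segment of a (length, dimension) path into 2**dyadic_order pieces."""
    factor = 2 ** dyadic_order
    length = path.shape[0]
    # Original points sit at integer positions 0, 1, ..., length - 1.
    s_old = np.arange(length, dtype=float)
    # Refined points are spaced 1 / factor apart, so the original points are kept.
    s_new = np.linspace(0.0, length - 1, (length - 1) * factor + 1)
    # Interpolate each channel linearly between the original points.
    return np.stack(
        [np.interp(s_new, s_old, path[:, c]) for c in range(path.shape[1])], axis=1
    )


def resolve_n_jobs(n_jobs: int, max_threads: Optional[int] = None) -> int:
    """Number of threads implied by the n_jobs rule in the parameter description."""
    if max_threads is None:
        max_threads = os.cpu_count() or 1
    if n_jobs == 0:
        # Assumption: the documentation does not define n_jobs = 0, so reject it here.
        raise ValueError("n_jobs = 0 is not covered by the documented rule")
    if n_jobs > 0:
        return n_jobs  # positive values are a thread count (1 = serial)
    return max_threads + 1 + n_jobs  # -1 -> all threads, -2 -> all but one


path = np.random.default_rng(0).random((100, 5))
print(dyadic_refine(path, 2).shape)       # (397, 5)
print(resolve_n_jobs(-2, max_threads=8))  # 7
```

A path with \(L\) points has \(L - 1\) segments, so refinement by \(2^\lambda\) yields \((L - 1)2^\lambda + 1\) points; this growth is why larger dyadic orders trade accuracy for memory and run time.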
Citation
If you found this library useful in your research, please consider citing the paper:
@article{shmelev2025pysiglib,
title={pySigLib-Fast Signature-Based Computations on CPU and GPU},
author={Shmelev, Daniil and Salvi, Cristopher},
journal={arXiv preprint arXiv:2509.10613},
year={2025}
}