Tensor sketch

In statistics, machine learning and algorithms, a tensor sketch is a type of dimensionality reduction that is particularly efficient when applied to vectors that have tensor structure.[1][2] Such a sketch can be used to speed up explicit kernel methods, bilinear pooling in neural networks, and is a cornerstone in many numerical linear algebra algorithms.

The term tensor sketch was coined in 2013[4] describing a technique by Rasmus Pagh[5] from the same year.

Later research works generalized it to a much larger class of dimensionality reductions via tensor random embeddings.

Tensor random embeddings were introduced in 2010 in a paper[6] on differential privacy and were first analyzed by Rudelson et al. in 2012 in the context of sparse recovery.[7]

Avron et al.[8] were the first to study the subspace embedding properties of tensor sketches, particularly focused on applications to polynomial kernels.

In this context, the sketch is required not only to preserve the norm of each individual vector with a certain probability but to preserve the norm of all vectors in each individual linear subspace.

This is a much stronger property, and it requires larger sketch sizes, but it allows kernel methods to be used very broadly, as explored in the book by David Woodruff.

The face-splitting product of two matrices, written $C = A \bullet B$, takes row-wise tensor products, $C_i = A_i \otimes B_i$. It satisfies the identity $(A \bullet B)(x \otimes y) = (Ax) \circ (By)$, where $\circ$ denotes the element-wise (Hadamard) product. Since the right-hand side avoids forming $x \otimes y$ explicitly, matrices of this form can be multiplied on vectors with tensor structure much faster than normal matrices.
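A minimal NumPy check of this identity (matrix sizes and variable names below are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 5, 4

# Two sketching matrices and two input vectors.
A = rng.standard_normal((m, d))
B = rng.standard_normal((m, d))
x = rng.standard_normal(d)
y = rng.standard_normal(d)

# Face-splitting product: row-wise tensor (Kronecker) products, shape (m, d*d).
C = np.vstack([np.kron(A[i], B[i]) for i in range(m)])

# Identity: (A face-split B)(x kron y) == (A x) elementwise-times (B y).
lhs = C @ np.kron(x, y)   # forms the d^2-dimensional vector explicitly
rhs = (A @ x) * (B @ y)   # O(md) work, never forms x kron y
assert np.allclose(lhs, rhs)
```

The right-hand side is what makes the sketch fast: only the two small matrix-vector products are ever computed.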

In 2020[15] it was shown that any matrix whose rows are sufficiently random and independent suffices to create a tensor sketch.

The required number of rows, however, grows exponentially with the order p of the tensor, and such a dependency is necessary for constructions using tensor randomized projections with Gaussian entries.

Because of the exponential dependency on p in tensor sketches based on the face-splitting product, a different approach was developed in 2020[15] which applies

$M(x \otimes y \otimes \cdots) = M^{(1)}\big(x \otimes M^{(2)}(y \otimes \cdots)\big).$

We can achieve such an $M$ by letting $M = M^{(1)}(I_d \otimes M^{(2)})$, where $I_d$ is the identity matrix. With this method, we only apply the general tensor sketch method to order-2 tensors, which avoids the exponential dependency in the number of rows.
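The recursion can be checked numerically. In the sketch below, dense Gaussian matrices stand in for the sketching matrices purely for illustration; it verifies that $M = M^{(1)}(I_d \otimes M^{(2)})$ applied to an order-3 tensor equals the recursive evaluation, so only order-2 sketches are ever needed.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m1, m2 = 3, 6, 5

# M2 sketches order-2 tensors (d*d -> m2); M1 sketches d x m2 blocks (d*m2 -> m1).
M2 = rng.standard_normal((m2, d * d))
M1 = rng.standard_normal((m1, d * m2))

x, y, z = (rng.standard_normal(d) for _ in range(3))

# Combined matrix M = M1 (I_d kron M2) acting on the full order-3 tensor.
M = M1 @ np.kron(np.eye(d), M2)

lhs = M @ np.kron(x, np.kron(y, z))        # apply M to x kron y kron z directly
rhs = M1 @ np.kron(x, M2 @ np.kron(y, z))  # recursive evaluation
assert np.allclose(lhs, rhs)
```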

The Fast Johnson–Lindenstrauss Transform (FJLT)[16] was introduced by Ailon and Chazelle in 2006. A version of this method takes $M = SHD$, where $S$ is a sampling matrix with exactly one nonzero entry per row, $H$ is a Hadamard matrix, and $D$ is a diagonal matrix with independent $\pm 1$ values on the diagonal. If, in the tensor setting, the diagonal matrix is given tensor structure $D = D^{(1)} \otimes D^{(2)}$, with independent $\pm 1$ values on the diagonal of each factor instead of being fully independent, it is possible to compute

$M(x \otimes y) = S H D (x \otimes y) = S\big((H_d D^{(1)} x) \otimes (H_d D^{(2)} y)\big),$

using the factorization $H_{d^2} = H_d \otimes H_d$. The computation thus splits up into two Fast Johnson–Lindenstrauss transformations, and the total reduction takes time $O(d \log d + m)$ rather than the $O(d^2)$ needed to even write down $x \otimes y$.

The same approach can be extended to compute higher degree products, such as $M(x \otimes y \otimes z) = S\big((H_d D^{(1)} x) \otimes (H_d D^{(2)} y) \otimes (H_d D^{(3)} z)\big)$.
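The following toy NumPy check illustrates the degree-2 split (it is a simplified, unnormalized stand-in for the constructions in [15][16], with arbitrary parameter choices): sampled coordinates of $H_{d^2} D (x \otimes y)$ are products of coordinates of the two small transforms $H_d D^{(1)} x$ and $H_d D^{(2)} y$.

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(2)
d, m = 8, 6  # d must be a power of two for a Hadamard matrix

H = hadamard(d)
d1 = rng.choice([-1, 1], size=d)  # diagonal of D1
d2 = rng.choice([-1, 1], size=d)  # diagonal of D2
x, y = rng.standard_normal(d), rng.standard_normal(d)

# Sample m coordinates of the d^2-dimensional sketch.
rows = rng.integers(0, d * d, size=m)
i, j = rows // d, rows % d        # coordinate (i, j) of the order-2 tensor

# Direct computation: S H_{d^2} (D1 kron D2) (x kron y), with H_{d^2} = H kron H.
big = np.kron(H, H) @ (np.kron(d1, d2) * np.kron(x, y))
lhs = big[rows]

# Split computation: two small transforms, then sampled element-wise products.
u, v = H @ (d1 * x), H @ (d2 * y)
rhs = u[i] * v[j]
assert np.allclose(lhs, rhs)
```

A third diagonal matrix and transform handle the degree-3 case in the same way.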

Jin et al.,[17] the same year, showed a similar result for the more general class of matrices satisfying the restricted isometry property (RIP), which includes the subsampled Hadamard matrices.

They showed that these matrices allow splitting into tensors provided the number of rows is sufficiently large.

These fast constructions can again be combined with the recursion approach mentioned above, giving the fastest overall tensor sketch.

It is also possible to do so-called "data aware" tensor sketching.[18]

Kernel methods are popular in machine learning as they give the algorithm designer the freedom to design a "feature space" in which to measure the similarity of their data points.

A simple kernel-based binary classifier is based on the following computation:

$\hat{y}(x') = \operatorname{sgn} \sum_{i=1}^{n} y_i\, k(x_i, x'),$

where $x_i \in \mathbb{R}^d$ are the data points, $y_i \in \{-1, +1\}$ are their labels, $\hat{y}(x')$ is the predicted class of the new point $x'$, and $k : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ is the kernel function.

Typical examples are the radial basis function kernel, $k(x, x') = \exp(-\|x - x'\|_2^2)$, and polynomial kernels such as $k(x, x') = (1 + \langle x, x' \rangle)^2$.
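For concreteness, a small synthetic illustration of this classifier with the radial basis function kernel (the data, parameters and names are made up and only meant to mirror the formula above):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic training data: two Gaussian blobs labelled -1 and +1.
n, d = 100, 5
X = np.vstack([rng.normal(-1.0, 1.0, (n // 2, d)),
               rng.normal(+1.0, 1.0, (n // 2, d))])
labels = np.array([-1] * (n // 2) + [+1] * (n // 2))

def rbf_kernel(a, b):
    """Radial basis function kernel k(a, b) = exp(-||a - b||^2)."""
    return np.exp(-np.sum((a - b) ** 2))

def predict(x_new):
    """Kernel classifier: sign of sum_i y_i k(x_i, x_new)."""
    return np.sign(sum(y_i * rbf_kernel(x_i, x_new) for x_i, y_i in zip(X, labels)))

print(predict(np.full(d, +1.0)))  # expected +1
print(predict(np.full(d, -1.0)))  # expected -1
```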

Sometimes it is faster to do an "explicit" kernel method, in which a pair of functions $f, g : \mathbb{R}^d \to \mathbb{R}^D$ is found such that $k(x, x') = \langle f(x), g(x') \rangle$. The above computation can then be expressed as

$\hat{y}(x') = \operatorname{sgn} \sum_{i=1}^{n} y_i \langle f(x_i), g(x') \rangle = \operatorname{sgn} \Big\langle \sum_{i=1}^{n} y_i f(x_i),\, g(x') \Big\rangle,$

where the vector $\sum_{i=1}^{n} y_i f(x_i)$ can be computed in advance.

The problem with this approach is that the feature dimension $D$ can be very large; for the degree-$p$ polynomial kernel it is $D = d^p$. The idea of tensor sketch is that we can compute approximate functions $f', g' : \mathbb{R}^d \to \mathbb{R}^t$, where $t$ is much smaller than $D$, and which still have the property that $\langle f'(x), g'(x') \rangle \approx k(x, x')$.
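As a concrete (illustrative) example of such approximate feature maps, the sketch below implements the count-sketch-plus-FFT construction of the original tensor sketch for the degree-2 polynomial kernel $k(x, x') = \langle x, x' \rangle^2$ and compares the sketched inner product with the exact kernel value; the sketch size and other parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
d, t = 50, 2048  # input dimension and sketch size (illustrative values)

# Two independent count-sketch hash functions: a bucket and a sign per coordinate.
h = [rng.integers(0, t, size=d) for _ in range(2)]
s = [rng.choice([-1, 1], size=d) for _ in range(2)]

def count_sketch(x, which):
    """Count sketch of x: add s[i] * x[i] into bucket h[i]."""
    c = np.zeros(t)
    np.add.at(c, h[which], s[which] * x)
    return c

def approx_feature_map(x):
    """Approximate feature map f'(x) for k(x, x') = <x, x'>^2:
    circular convolution (via FFT) of two count sketches of x."""
    f1 = np.fft.rfft(count_sketch(x, 0))
    f2 = np.fft.rfft(count_sketch(x, 1))
    return np.fft.irfft(f1 * f2, n=t)

x = rng.standard_normal(d)
y = x + 0.1 * rng.standard_normal(d)   # correlated inputs, so <x, y> is large
exact = np.dot(x, y) ** 2
approx = np.dot(approx_feature_map(x), approx_feature_map(y))
print(exact, approx)  # the estimate concentrates around the exact value as t grows
```

Higher-degree polynomial kernels are handled by convolving more count sketches, and larger sketch sizes $t$ reduce the approximation error.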

This method was shown in 2020[15] to work even for high degree polynomials and radial basis function kernels.

Assume two large datasets are given as matrices $X, Y \in \mathbb{R}^{n \times d}$ and we want to approximate their product $X Y^T$, for example to find pairs of rows with a large inner product. The idea of Compressed Matrix Multiplication is the general identity

$X Y^T = \sum_{k=1}^{d} X_{:,k} \otimes Y_{:,k},$

where $X_{:,k}$ and $Y_{:,k}$ are the $k$th columns of $X$ and $Y$ and $\otimes$ denotes the tensor (outer) product. Since a linear sketch of each term $X_{:,k} \otimes Y_{:,k}$ can be computed efficiently, the sketches can simply be summed up, giving a compressed version of the full product.
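A small NumPy check of this identity, together with the fact that any linear sketch can be accumulated one column pair at a time (a dense random projection is used below purely as a stand-in for the efficient count-sketch-based sketch; all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, t = 20, 10, 256  # rows, columns, sketch size

X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, d))

# Identity: X Y^T is the sum of outer products of matching columns.
Z = sum(np.outer(X[:, k], Y[:, k]) for k in range(d))
assert np.allclose(Z, X @ Y.T)

# Any linear sketch of the flattened n*n product can therefore be
# accumulated column pair by column pair, without forming X Y^T first.
M = rng.standard_normal((t, n * n))
sketch_sum = sum(M @ np.outer(X[:, k], Y[:, k]).ravel() for k in range(d))
assert np.allclose(sketch_sum, M @ (X @ Y.T).ravel())
```

In the actual compressed matrix multiplication algorithm, the sketch of each outer product is itself computed with count sketches and the FFT, so the $n \times n$ outer products are never formed explicitly.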

Bilinear pooling is the technique of taking two input vectors, $x$ and $y$, from different sources, and using the tensor product $x \otimes y$ as the feature vector. The dimension of this vector is the product of the two input dimensions, which can quickly become impractically large.

In [19] the authors considered using tensor sketch to reduce the number of variables needed.

In 2017 another paper[20] took the FFT of the input features before combining them using the element-wise product.

Tensor sketches can be used to decrease the number of variables needed when implementing bilinear pooling in a neural network.
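An illustrative sketch of such a pooling layer using the count-sketch/FFT construction (dimensions, hash functions and names below are made up, not taken from [19] or [20]):

```python
import numpy as np

rng = np.random.default_rng(6)
d1, d2, t = 512, 512, 4096  # feature dimensions of the two sources, pooled dimension

# Independent count-sketch hashes for the two inputs.
h1, s1 = rng.integers(0, t, d1), rng.choice([-1, 1], d1)
h2, s2 = rng.integers(0, t, d2), rng.choice([-1, 1], d2)

def count_sketch(v, h, s):
    c = np.zeros(t)
    np.add.at(c, h, s * v)
    return c

def compact_bilinear_pool(x, y):
    """t-dimensional sketch of the tensor product of x and y:
    FFT of the two count sketches, element-wise product, inverse FFT."""
    fx = np.fft.rfft(count_sketch(x, h1, s1))
    fy = np.fft.rfft(count_sketch(y, h2, s2))
    return np.fft.irfft(fx * fy, n=t)

x, y = rng.standard_normal(d1), rng.standard_normal(d2)
pooled = compact_bilinear_pool(x, y)
print(pooled.shape)  # (4096,) instead of the 512 * 512 = 262144 entries of x tensor y
```

Inner products between pooled vectors approximate inner products between the corresponding tensor products, which is what makes the compressed features usable in place of full bilinear pooling.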