From Matrix to Tensor:
The Transition to Numerical Multilinear Algebra
Lecture 9. The Curse of Dimensionality
Charles F. Van Loan
Cornell University
The Gene Golub SIAM Summer School 2010
Selva di Fasano, Brindisi, Italy
Where We Are
Lecture 1. Introduction to Tensor Computations
Lecture 2. Tensor Unfoldings
Lecture 3. Transpositions, Kronecker Products, Contractions
Lecture 4. Tensor-Related Singular Value Decompositions
Lecture 5. The CP Representation and Tensor Rank
Lecture 6. The Tucker Representation
Lecture 7. Other Decompositions and Nearness Problems
Lecture 8. Multilinear Rayleigh Quotients
Lecture 9. The Curse of Dimensionality
Lecture 10. Special Topics
What is this Lecture About?

Big Problems

1. A single N-by-N matrix problem is big if N is big.
2. A problem that involves N small p-by-p problems is big if N is big.
3. A problem that involves a tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ is big if $N = n_1 \cdots n_d$ is big, and that can happen rather easily if d is big.
What is this Lecture About?

Data Sparse Representation

We are used to solving big matrix problems when the matrix is data sparse, i.e., when $A \in \mathbb{R}^{N \times N}$ can be represented with many fewer than $N^2$ numbers.

What if N is so big that we cannot even store length-N vectors? How could we apply (for example) the Rayleigh quotient procedure in such a situation?
What is this Lecture About?

A Very Large Eigenvalue Problem

We will look at a problem where $A \in \mathbb{R}^{2^d \times 2^d}$ is data sparse but where d is sufficiently big to make the storage of length-$2^d$ vectors impossible.

Vectors will be approximated by data-sparse tensors of high order.
A Very Large Eigenvalue Problem

A Problem From Quantum Chemistry

Given a $2^d$-by-$2^d$ symmetric matrix H, find a vector a that minimizes
$$r(a) = \frac{a^T H a}{a^T a}$$
Of course: $a = a_{\min},\ \lambda = r(a_{\min}) \;\Rightarrow\; Ha = \lambda a$.

What if d = 100?
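For perspective, a quick storage estimate (assuming 8-byte reals):
$$2^{100} \approx 1.27 \times 10^{30} \quad\Longrightarrow\quad 8 \cdot 2^{100} \ \text{bytes} \;\approx\; 10^{31}\ \text{bytes} \;\approx\; 10^{19}\ \text{terabytes per vector},$$
so even a single length-$2^{100}$ vector is far beyond any conceivable memory.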
Perspective

[Figure: the H-matrix alongside the Google matrix.]

Both N-by-N matrices are data sparse, i.e., defined by many fewer than $N^2$ numbers.
A Very Large Eigenvalue Problem

The H-Matrix

$$H \;=\; \sum_{i,j=1}^{d} t_{ij}\, H_i^T H_j \;+\; \sum_{i,j,k,l=1}^{d} v_{ijkl}\, H_i^T H_j^T H_k H_l, \qquad H_i = I_{2^{i-1}} \otimes \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \otimes I_{2^{d-i}}$$

$T \in \mathbb{R}^{d \times d}$ is symmetric and $\mathcal{V} \in \mathbb{R}^{d \times d \times d \times d}$ has symmetries.

Sparsity

$$\mathrm{nzeros} \;=\; \left(\frac{1}{64}\,d^4 - \frac{3}{32}\,d^3 + \frac{27}{64}\,d^2 - \frac{11}{32}\,d + 1\right)2^d \;-\; 1$$
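To make the definition concrete, here is a minimal NumPy sketch (our own helper names, dense matrices, feasible only for tiny d since H is $2^d$-by-$2^d$ when formed explicitly) that builds the $H_i$ and assembles H from given weight arrays T and V:

```python
import numpy as np

def H_op(i, d):
    """H_i = I_{2^(i-1)} (x) [[0,1],[0,0]] (x) I_{2^(d-i)},  1 <= i <= d."""
    a = np.array([[0.0, 1.0], [0.0, 0.0]])
    return np.kron(np.kron(np.eye(2 ** (i - 1)), a), np.eye(2 ** (d - i)))

def build_H(T, V):
    """Assemble H = sum t_ij Hi' Hj + sum v_ijkl Hi' Hj' Hk Hl (tiny d only)."""
    d = T.shape[0]
    Hs = [H_op(i, d) for i in range(1, d + 1)]
    H = np.zeros((2 ** d, 2 ** d))
    for i in range(d):
        for j in range(d):
            H += T[i, j] * Hs[i].T @ Hs[j]
            for k in range(d):
                for l in range(d):
                    H += V[i, j, k, l] * Hs[i].T @ Hs[j].T @ Hs[k] @ Hs[l]
    return H

d = 3
T = np.random.rand(d, d); T = 0.5 * (T + T.T)
V = np.random.rand(d, d, d, d)
H = build_H(T, V)          # 8-by-8 here; 2^d-by-2^d in general
```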
A Very Large Eigenvalue Problem

Modeling Electron Interactions

We have d "sites" (grid points) in physical space.

The goal is to compute a wave function, an element of a $2^d$-dimensional Hilbert space. The Hilbert space is a product of d two-dimensional Hilbert spaces. (A site is either occupied or not occupied.)

A (discretized) wave function is a d-tensor, 2-by-2-by-2-by-2...

It is the vector that minimizes $a^T H a / a^T a$ where...
A Very Large Eigenvalue Problem

The H-Matrix

$$H \;=\; \sum_{i,j=1}^{d} \underbrace{t_{ij}}_{\substack{\text{kinetic}\\ \text{energy}\\ \text{weights}}} H_i^T H_j \;+\; \sum_{i,j,k,l=1}^{d} \underbrace{v_{ijkl}}_{\substack{\text{potential}\\ \text{energy}\\ \text{weights}}} H_i^T H_j^T H_k H_l$$

$$H_i = I_{2^{i-1}} \otimes \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \otimes I_{2^{d-i}}$$
A Very Large Eigenvalue Problem

Dealing with $N = 2^d \approx 2^{100}$

Intractable:
$$\min_{a \,\in\, \mathbb{R}^N} \frac{a^T H a}{a^T a}$$

Tractable:
$$\min_{\substack{a \,\in\, \mathbb{R}^N \\ a \text{ data sparse}}} \frac{a^T H a}{a^T a}$$
Tensor Networks

Definition

A tensor network is a tensor of high dimension that is built up from many sparsely connected tensors of low dimension.
[Figure: a tensor network of ten tensors $A^{(1)}, \ldots, A^{(10)}$ connected by edges.]

Nodes are tensors and the edges are contractions.
At each site there is a tensor. Its dimension is k + 1 where k is the number of site neighbors. E.g.,

$$A^{(2)} = A^{(2)}(1{:}m,\, 1{:}m,\, 1{:}2), \qquad A^{(4)} = A^{(4)}(1{:}m,\, 1{:}m,\, 1{:}m,\, 1{:}m,\, 1{:}2)$$
Each edge represents a contraction, e.g.,

$$\sum_{i_{2,3}=1}^{m} A^{(2)}(i_{2,3},\, i_{2,5},\, n_2) \cdot A^{(3)}(i_{1,3},\, i_{2,3},\, i_{3,4},\, n_3)$$

Contracting over all of the edges defines the order-10 tensor:

$$A(n_1, n_2, \ldots, n_{10}) \;=\; \sum_{\substack{i_{1,3},\, i_{1,8},\, i_{2,3},\, i_{2,5},\, i_{3,4},\, i_{4,5},\, i_{4,6},\\ i_{4,10},\, i_{6,7},\, i_{7,10},\, i_{8,9},\, i_{8,10},\, i_{9,10}}}
\begin{aligned}[t]
&A^{(1)}(i_{1,3}, i_{1,8}, n_1)\, A^{(2)}(i_{2,3}, i_{2,5}, n_2)\, A^{(3)}(i_{1,3}, i_{2,3}, i_{3,4}, n_3)\\
&A^{(4)}(i_{3,4}, i_{4,5}, i_{4,6}, i_{4,10}, n_4)\, A^{(5)}(i_{2,5}, i_{4,5}, n_5)\, A^{(6)}(i_{4,6}, i_{6,7}, n_6)\\
&A^{(7)}(i_{6,7}, i_{7,10}, n_7)\, A^{(8)}(i_{1,8}, i_{8,9}, n_8)\, A^{(9)}(i_{8,9}, i_{9,10}, n_9)\\
&A^{(10)}(i_{4,10}, i_{7,10}, i_{9,10}, n_{10})
\end{aligned}$$
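In NumPy, one edge contraction is one repeated letter in an einsum; a sketch of the $A^{(2)}$–$A^{(3)}$ contraction above, with shapes as assumed in the figure:

```python
import numpy as np

m = 3
A2 = np.random.rand(m, m, 2)       # A(2)(i23, i25, n2)
A3 = np.random.rand(m, m, m, 2)    # A(3)(i13, i23, i34, n3)

# Sum over the shared edge index i23 (letter 'a'); the free edge indices
# i25, i13, i34 and the "physical" indices n2, n3 all survive.
B = np.einsum('abn,cadp->bcdnp', A2, A3)   # B(i25, i13, i34, n2, n3)
assert B.shape == (m, m, m, 2, 2)
```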
Star Network

[Figure: a star network on nine tensors $A^{(1)}, \ldots, A^{(9)}$.]

Torus Network

[Figure: five tensors $A^{(1)}, \ldots, A^{(5)}$ connected in a ring.]

Grid Network

[Figure: a 4-by-5 grid of tensors $A^{(i,j)}$, $i = 1{:}4$, $j = 1{:}5$.]
A 5-Site Linear Tensor Network

[Figure: tensors $A^{(1)}, \ldots, A^{(5)}$ connected in a chain.]

$$A^{(1)}: 2 \times m \qquad A^{(2)}: m \times m \times 2 \qquad A^{(3)}: m \times m \times 2 \qquad A^{(4)}: m \times m \times 2 \qquad A^{(5)}: m \times 2$$

m is a parameter, typically around 100. This defines a length-32 ($= 2^5$) vector which we index using five subscripts...
Scalar Definition

$$a(n_1, n_2, n_3, n_4, n_5) \;=\; \sum_{i_1=1}^{m} \sum_{i_2=1}^{m} \sum_{i_3=1}^{m} \sum_{i_4=1}^{m} A^{(1)}(n_1, i_1)\, A^{(2)}(i_1, i_2, n_2)\, A^{(3)}(i_2, i_3, n_3)\, A^{(4)}(i_3, i_4, n_4)\, A^{(5)}(i_4, n_5)$$

In general, a length-$2^d$ vector is represented by $O(dm^2)$ numbers.
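A minimal NumPy sketch of this evaluation (core layouts and zero-based indexing are our assumptions): each entry costs a handful of m-by-m matrix-vector products.

```python
import numpy as np

def ltn_entry(A, n):
    """a(n1,...,n5) for cores A = [A1,...,A5] with shapes
    (2, m), (m, m, 2), (m, m, 2), (m, m, 2), (m, 2); n is zero-based."""
    v = A[0][n[0], :]                 # row n1 of A(1): length-m vector
    for k in range(1, 4):
        v = v @ A[k][:, :, n[k]]      # hit the n_k-th m-by-m slice of A(k+1)
    return v @ A[4][:, n[4]]          # finish with column n5 of A(5)

m = 5
A = [np.random.rand(2, m)] + [np.random.rand(m, m, 2) for _ in range(3)] \
    + [np.random.rand(m, 2)]
print(ltn_entry(A, (0, 1, 0, 1, 1)))  # one of the 32 entries
```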
Computing a(1:2, 1:2, 1:2, 1:2, 1:2)

[Figure: five snapshots of the chain $A^{(1)}, \ldots, A^{(5)}$, one per entry, e.g., $a(1,1,1,1,1)$, $a(2,1,1,1,1)$, $a(1,2,1,1,1)$, $a(1,1,2,1,2)$, $a(2,2,2,2,2)$: each subscript $n_k$ selects a slice of $A^{(k)}$ and the chain of products is evaluated.]
Linear Tensor Network

Recall the Block Vec Operation

$$\begin{bmatrix} F_1 \\ F_2 \end{bmatrix} \otimes \begin{bmatrix} G_1 \\ G_2 \\ G_3 \end{bmatrix} \otimes \begin{bmatrix} H_1 \\ H_2 \end{bmatrix} \;=\; \begin{bmatrix} F_1 \\ F_2 \end{bmatrix} \otimes \begin{bmatrix} G_1H_1 \\ G_1H_2 \\ G_2H_1 \\ G_2H_2 \\ G_3H_1 \\ G_3H_2 \end{bmatrix} \;=\; \begin{bmatrix} F_1G_1H_1 \\ F_1G_1H_2 \\ F_1G_2H_1 \\ F_1G_2H_2 \\ F_1G_3H_1 \\ F_1G_3H_2 \\ F_2G_1H_1 \\ F_2G_1H_2 \\ F_2G_2H_1 \\ F_2G_2H_2 \\ F_2G_3H_1 \\ F_2G_3H_2 \end{bmatrix}$$
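A small NumPy sketch of the block vec operation (our own helper, not library code); note the left factor's block index is outermost, matching the expansion above:

```python
import numpy as np

def blockvec_kron(X, Y):
    """[X1;...;Xp] (x) [Y1;...;Yq] -> [Xi @ Yj], i outermost, j innermost."""
    return [Xi @ Yj for Xi in X for Yj in Y]

rng = np.random.default_rng(0)
F = [rng.standard_normal((2, 3)) for _ in range(2)]
G = [rng.standard_normal((3, 3)) for _ in range(3)]
H = [rng.standard_normal((3, 1)) for _ in range(2)]

blocks = blockvec_kron(blockvec_kron(F, G), H)  # [F1G1H1, F1G1H2, ..., F2G3H2]
assert len(blocks) == 12
```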
Linear Tensor Network

In the "Language" of Block Vec Products...

$$a \;=\; \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} \otimes \begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix} \otimes \cdots \otimes \begin{bmatrix} A_{1,d-1} \\ A_{2,d-1} \end{bmatrix} \otimes \begin{bmatrix} A_{1d} \\ A_{2d} \end{bmatrix}$$

where

$$\begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} = \begin{bmatrix} w_1^T \\ w_2^T \end{bmatrix} \quad \text{(2 row vectors)}$$

$$\begin{bmatrix} A_{1k} \\ A_{2k} \end{bmatrix} = \begin{bmatrix} m\text{-by-}m \\ m\text{-by-}m \end{bmatrix} \quad k = 2{:}d-1$$

$$\begin{bmatrix} A_{1d} \\ A_{2d} \end{bmatrix} = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \quad \text{(2 column vectors)}$$

a is a length-$2^d$ vector that depends on $O(dm^2)$ numbers.
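For small d this representation can be expanded entry by entry as a sanity check; a sketch under assumed array layouts (w and z stored as 2-by-m arrays, middle cores as pairs of m-by-m matrices):

```python
import numpy as np

def ltn_full_vector(w, A_mid, z):
    """Expand a = [A11;A21](x)...(x)[A1d;A2d] into the full length-2^d vector.
    w, z: (2, m) arrays holding w1, w2 and z1, z2; A_mid: list of (A1k, A2k)."""
    blocks = [w[0], w[1]]                       # the two length-m row vectors
    for A1, A2 in A_mid:                        # k = 2:d-1, outer index first
        blocks = [b @ A for b in blocks for A in (A1, A2)]
    return np.array([b @ zc for b in blocks for zc in (z[0], z[1])])

m, d = 3, 5
w = np.random.rand(2, m)
A_mid = [(np.random.rand(m, m), np.random.rand(m, m)) for _ in range(d - 2)]
z = np.random.rand(2, m)
a = ltn_full_vector(w, A_mid, z)                # length 2^5 = 32
assert a.shape == (2 ** d,)
```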
Back to the Main Problem...

Constrained Minimization

Minimize
$$r(a) = \frac{a^T H a}{a^T a}$$
subject to the constraint that
$$a \;=\; \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} \otimes \begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix} \otimes \cdots \otimes \begin{bmatrix} A_{1,d-1} \\ A_{2,d-1} \end{bmatrix} \otimes \begin{bmatrix} A_{1d} \\ A_{2d} \end{bmatrix}$$

Let us look at both the denominator and the numerator in light of the fact that $N = 2^d$.
Avoiding $O(2^d)$

2-Norm of a Linear Tensor Network...

If
$$a \;=\; \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} \otimes \begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix} \otimes \cdots \otimes \begin{bmatrix} A_{1,d-1} \\ A_{2,d-1} \end{bmatrix} \otimes \begin{bmatrix} A_{1d} \\ A_{2d} \end{bmatrix}$$
then
$$a^T a \;=\; w^T \left( \prod_{k=2}^{d-1} \Big[ (A_{1k} \otimes A_{1k}) + (A_{2k} \otimes A_{2k}) \Big] \right) z$$
where
$$w = A_{11} \otimes A_{11} + A_{21} \otimes A_{21} = w_1 \otimes w_1 + w_2 \otimes w_2$$
$$z = A_{1d} \otimes A_{1d} + A_{2d} \otimes A_{2d} = z_1 \otimes z_1 + z_2 \otimes z_2$$

$A_{1k}$ and $A_{2k}$ are m-by-m, k = 2:d−1. Overall work is $O(dm^3)$.
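A hedged NumPy sketch of this computation (layouts as in the earlier sketches): carrying w as an m-by-m matrix W and using the identity $(A \otimes A)^T \mathrm{vec}(W) = \mathrm{vec}(A^T W A)$, each factor costs $O(m^3)$ and no $m^2$-by-$m^2$ Kronecker product is ever formed.

```python
import numpy as np

def ltn_norm_sq(w, A_mid, z):
    """a^T a in O(d m^3): never form A1k (x) A1k explicitly."""
    W = np.outer(w[0], w[0]) + np.outer(w[1], w[1])   # w1(x)w1 + w2(x)w2
    for A1, A2 in A_mid:                              # k = 2:d-1
        W = A1.T @ W @ A1 + A2.T @ W @ A2             # O(m^3) per factor
    Z = np.outer(z[0], z[0]) + np.outer(z[1], z[1])   # z1(x)z1 + z2(x)z2
    return float(np.sum(W * Z))                       # the final w^T( ... )z

m, d = 3, 5
w = np.random.rand(2, m)
A_mid = [(np.random.rand(m, m), np.random.rand(m, m)) for _ in range(d - 2)]
z = np.random.rand(2, m)
print(ltn_norm_sq(w, A_mid, z))
# Equals a @ a for the vector a expanded by ltn_full_vector in the earlier sketch.
```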
Avoiding $O(d^4)$

Recall...

$$H \;=\; \sum_{i,j=1}^{d} t_{ij}\, H_i^T H_j \;+\; \sum_{i,j,k,l=1}^{d} v_{ijkl}\, H_i^T H_j^T H_k H_l, \qquad H_i = I_{2^{i-1}} \otimes \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \otimes I_{2^{d-i}}$$

The V-Tensor Has Familiar Symmetries

$$\mathcal{V}(i,j,k,\ell) \;=\; \mathcal{V}(j,i,k,\ell) \;=\; \mathcal{V}(i,j,\ell,k) \;=\; \mathcal{V}(k,\ell,i,j)$$

and so we can find symmetric matrices $B_1, \ldots, B_r$ so

$$\mathcal{V} = B_1 \circ B_1 + \cdots + B_r \circ B_r$$
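One way to compute such a splitting (a sketch, not necessarily the lecture's construction): the symmetry $\mathcal{V}(i,j,k,\ell) = \mathcal{V}(k,\ell,i,j)$ makes the $(i,j) \times (k,\ell)$ unfolding a symmetric $d^2$-by-$d^2$ matrix, so an eigendecomposition delivers $\mathcal{V} = \sum_r \lambda_r\, B_r \circ B_r$; each $B_r$ with $\lambda_r \neq 0$ is symmetric, and for $\lambda_r > 0$ the factor $\sqrt{\lambda_r}$ can be absorbed into $B_r$.

```python
import numpy as np

def sym_pair_decomposition(V):
    """Sketch: v_ijkl = sum_r lam_r * B_r(i,j) * B_r(k,l).
    Assumes V has the symmetries listed above."""
    d = V.shape[0]
    M = V.reshape(d * d, d * d)    # pair (i,j) against (k,l)
    M = 0.5 * (M + M.T)            # symmetric since V(i,j,k,l) = V(k,l,i,j)
    lam, Q = np.linalg.eigh(M)
    return [(lam[r], Q[:, r].reshape(d, d)) for r in range(d * d)]

d = 3
X = np.random.rand(d, d, d, d)     # symmetrize a random tensor for the test
V = (X + X.transpose(1, 0, 2, 3)) / 2
V = (V + V.transpose(0, 1, 3, 2)) / 2
V = (V + V.transpose(2, 3, 0, 1)) / 2
terms = sym_pair_decomposition(V)
V_rec = sum(l * np.einsum('ij,kl->ijkl', B, B) for l, B in terms)
assert np.allclose(V, V_rec)
```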
Avoiding $O(d^4)$

Idea

Approximate $\mathcal{V}$ with $B_1 \circ B_1$ (or some short sum of the B's) because then $v_{ijk\ell} = B_1(i,j)\,B_1(k,\ell)$ and

$$H \;=\; \sum_{i,j=1}^{d} t_{ij}\, H_i^T H_j + \sum_{i,j,k,l=1}^{d} v_{ijkl}\, H_i^T H_j^T H_k H_l \;=\; \sum_{i,j=1}^{d} t_{ij}\, H_i^T H_j + \left( \sum_{i,j} B_1(i,j)\, H_i H_j \right)^{\!T} \left( \sum_{i,j} B_1(i,j)\, H_i H_j \right)$$

Think about $a^T H a$ and note that we have reduced the cost of evaluation by a factor of $O(d^2)$.
Optimization Approach

For k = 1:d ...

Minimize
$$r(a) = \frac{a^T H a}{a^T a} = r(A_{1k}, A_{2k})$$
subject to the constraint that
$$a \;=\; \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} \otimes \begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix} \otimes \cdots \otimes \begin{bmatrix} A_{1,d-1} \\ A_{2,d-1} \end{bmatrix} \otimes \begin{bmatrix} A_{1d} \\ A_{2d} \end{bmatrix}$$
and all but $A_{1k}$ and $A_{2k}$ are fixed.

This projected subproblem can be reshaped into a smaller, $2m^2$-by-$2m^2$ Rayleigh quotient minimization...
Optimization Approach

The Subproblem

Minimize
$$r(a_k) = \frac{a_k^T H_k a_k}{a_k^T a_k}$$
where
$$a_k = \begin{bmatrix} \mathrm{vec}(A_{1k}) \\ \mathrm{vec}(A_{2k}) \end{bmatrix}$$
and
$$H_k = T_k^T H T_k, \qquad T_k \in \mathbb{R}^{2^d \times 2m^2}$$
can be formed in time polynomial in m.
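Each subproblem is then an ordinary symmetric eigenvalue problem. A sketch with stand-in data (in the algorithm, $H_k$ would be assembled from H and the fixed cores):

```python
import numpy as np

m = 4
n = 2 * m * m
X = np.random.rand(n, n)
Hk = 0.5 * (X + X.T)                 # stand-in for Tk' H Tk (symmetric)

lam, W = np.linalg.eigh(Hk)          # eigenvalues in ascending order
ak = W[:, 0]                         # minimizer of the Rayleigh quotient
A1k = ak[: m * m].reshape(m, m)      # unpack vec(A1k)
A2k = ak[m * m :].reshape(m, m)      # unpack vec(A2k); cycle k = 1:d and repeat
```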
Tensor-Based Thinking

Key Attributes

1. An ability to reason at the index level about the constituent contractions and the order of their evaluation.
2. An ability to reason at the block matrix level in order to expose fast, underlying Kronecker product operations.
Data-Sparse Representations and Factorizations

How Could We Compute the QR Factorization of This?

$$\begin{bmatrix} F_1 \\ F_2 \end{bmatrix} \otimes \begin{bmatrix} G_1 \\ G_2 \\ G_3 \end{bmatrix} \otimes \begin{bmatrix} H_1 \\ H_2 \end{bmatrix} \;=\; \begin{bmatrix} F_1 \\ F_2 \end{bmatrix} \otimes \begin{bmatrix} G_1H_1 \\ G_1H_2 \\ G_2H_1 \\ G_2H_2 \\ G_3H_1 \\ G_3H_2 \end{bmatrix} \;=\; \begin{bmatrix} F_1G_1H_1 \\ F_1G_1H_2 \\ F_1G_2H_1 \\ F_1G_2H_2 \\ F_1G_3H_1 \\ F_1G_3H_2 \\ F_2G_1H_1 \\ F_2G_1H_2 \\ F_2G_2H_1 \\ F_2G_2H_2 \\ F_2G_3H_1 \\ F_2G_3H_2 \end{bmatrix}$$

Without "Leaving" the Data Sparse Representation?
Data-Sparse Representations and Factorizations

QR Factorization and Block Vec Products

If
$$\begin{bmatrix} F_1 \\ F_2 \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R$$
then
$$\begin{bmatrix} F_1 \\ F_2 \end{bmatrix} \otimes \begin{bmatrix} G_1 \\ G_2 \\ G_3 \end{bmatrix} \otimes \begin{bmatrix} H_1 \\ H_2 \end{bmatrix} \;=\; \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} \otimes \begin{bmatrix} RG_1 \\ RG_2 \\ RG_3 \end{bmatrix} \otimes \begin{bmatrix} H_1 \\ H_2 \end{bmatrix}$$
If
$$\begin{bmatrix} RG_1 \\ RG_2 \\ RG_3 \end{bmatrix} = \begin{bmatrix} U_1 \\ U_2 \\ U_3 \end{bmatrix} S$$
then...
Data-Sparse Representations and Factorizations

QR Factorization and Block Vec Products

$$\begin{bmatrix} F_1 \\ F_2 \end{bmatrix} \otimes \begin{bmatrix} G_1 \\ G_2 \\ G_3 \end{bmatrix} \otimes \begin{bmatrix} H_1 \\ H_2 \end{bmatrix} \;=\; \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} \otimes \begin{bmatrix} U_1 \\ U_2 \\ U_3 \end{bmatrix} \otimes \begin{bmatrix} SH_1 \\ SH_2 \end{bmatrix}$$
If
$$\begin{bmatrix} SH_1 \\ SH_2 \end{bmatrix} = \begin{bmatrix} V_1 \\ V_2 \end{bmatrix} T$$
then
$$\begin{bmatrix} F_1 \\ F_2 \end{bmatrix} \otimes \begin{bmatrix} G_1 \\ G_2 \\ G_3 \end{bmatrix} \otimes \begin{bmatrix} H_1 \\ H_2 \end{bmatrix} \;=\; \left( \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} \otimes \begin{bmatrix} U_1 \\ U_2 \\ U_3 \end{bmatrix} \otimes \begin{bmatrix} V_1 \\ V_2 \end{bmatrix} \right) T$$

The Matrix in Parentheses is Orthogonal
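A NumPy sketch of the whole three-stage, data-sparse QR (block sizes illustrative, helper ours); the assertions check the factorization and the orthonormality claimed above:

```python
import numpy as np

def blockvec_kron(X, Y):
    """[X1;...;Xp] (x) [Y1;...;Yq] -> [Xi @ Yj] (block vec product)."""
    return [Xi @ Yj for Xi in X for Yj in Y]

rng = np.random.default_rng(0)
F = [rng.standard_normal((3, 3)) for _ in range(2)]
G = [rng.standard_normal((3, 3)) for _ in range(3)]
H = [rng.standard_normal((3, 1)) for _ in range(2)]

Qf, R = np.linalg.qr(np.vstack(F)); Q = np.vsplit(Qf, 2)          # [F] = [Q]R
Qg, S = np.linalg.qr(np.vstack([R @ Gj for Gj in G])); U = np.vsplit(Qg, 3)
Qh, T = np.linalg.qr(np.vstack([S @ Hk for Hk in H])); V = np.vsplit(Qh, 2)

# [F](x)[G](x)[H] = ([Q](x)[U](x)[V]) T, and the factor in parentheses
# has orthonormal columns -- verify against the explicit vector:
a = np.vstack(blockvec_kron(blockvec_kron(F, G), H))
q = np.vstack(blockvec_kron(blockvec_kron(Q, U), V))
assert np.allclose(a, q @ T)
assert np.allclose(q.T @ q, np.eye(q.shape[1]))
```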
Summary of Lecture 9.
Key Words
The Curse of Dimensionality refers to the challenges that arise
when dimension increases.
Clever data-sparse representations are one way to address the
issues.
A tensor network is a way of combining low-order tensors to
obtain a high-order tensor.
Reliable methods that scale with dimension are the goal.