Active Wavelet Networks for Face Alignment
Changbo Hu, Rogerio Feris, Matthew Turk
Dept. Computer Science, University of California, Santa Barbara
{cbhu,rferis,mturk}@cs.ucsb.edu
Abstract
The active appearance model (AAM) algorithm has proved to be a successful
method for face alignment and synthesis. By elegantly combining both shape
and texture models, AAM allows fast and robust deformable image match-
ing. However, the method is sensitive to partial occlusions and illumination
changes. In such cases, the PCA-based texture model causes the reconstruc-
tion error to be globally spread over the image. In this paper, we propose
a new method for face alignment called active wavelet networks (AWN),
which replaces the AAM texture model by a wavelet network representa-
tion. Since we consider spatially localized wavelets for modeling texture,
our method shows more robustness against partial occlusions and some illu-
mination changes.
1 Introduction
Many computer vision tasks such as face recognition and facial expression analysis re-
quire the accurate alignment between a given face and a canonical face. Extensive re-
search has been conducted on this topic, especially using model-based approaches [5, 3].
Among model-based methods, the active appearance model (AAM) algorithm [5] has
achieved good results in face alignment. The method makes use of statistical models of
both shape and texture, allowing fast and robust deformable image matching. Several
variations of AAM have also been proposed to improve the original algorithm, namely
view-based AAM [6], Direct Appearance Models [9], a compositional approach [2] and
3D AAM [1].
Despite the success of AAM and its variations, problems still remain to be solved.
AAM is sensitive to illumination changes, especially if the lighting in the test image sig-
nificantly differs from the lighting encoded in the training set. Moreover, under the pres-
ence of partial occlusion, the PCA-based texture model of AAM causes the reconstruction
error to be globally spread over the image, thus impairing alignment.
Figure 1: (a) Partial occluded image. (b) PCA reconstruction. Note that the error is spread
over the image. (c) Wavelet reconstruction.
This paper proposes a new method, called active wavelet networks (AWN), in which
a Gabor wavelet network representation (GWN) [11] is used to model the texture vari-
ation in the training set. The GWN approach represents a face image through a linear
combination of 2D Gabor functions whose parameters (position, scale and orientation)
and weights are optimally determined to preserve the maximum image information for a
chosen number of wavelets.
Because of the localization property of wavelets, when partial occlusion or highlight
illumination problems arise, the matching is more robust than with AAM. Figure 1 illus-
trates a comparison between PCA and a Gabor wavelet reconstruction for a partial oc-
cluded face image. Note that the error is globally spread over the image in PCA, whereas
it remains local in the wavelet representation.
Our method also offers some advantages regarding efficiency, since some computa-
tions are limited to the spatial support of the filters, rather than the whole image.
The remainder of this paper is organized as follows. Related work is discussed in
Section 2. In Section 3, we present the AWN approach for face alignment. Experiments
are presented in Section 4 and conclusions are given in Section 5.
2 Related Work
Recently, Gabor wavelet networks [11] has been proposed as an effective approach for
object representation, with successful applications in face tracking [7] and pose estimation
[10]. The idea is to represent an object as a linear combination of 2D Gabor wavelets, with
parameters and weights optimally determined from the continuous space. Face alignment
may be performed by affinely transforming the wavelet representation to match a new
image. However, since a GWN is optimized for a particular image, alignment becomes a
problem when different individuals are considered.
This problem was tackled by Feris et al. [8] using a set of exemplars, but the method
is computationally expensive. The solution we adopt in this paper is to optimize a single
GWN over a set of face images. This requires all the face images to have the same shape.
More details about this technique will be described in section 3.2.
Another wavelet-based approach is the bunch-graph method [13]. In this approach,
Gabor jets are used as feature vectors and an elastic graph matching algorithm is adopted
for face alignment. Our method is faster than the bunch graph method because we use a
sparser representation based on GWN and a more efficient search technique based on the
active appearance algorithm.
The work most closely related to this paper is the active appearance model method
[5]. AAM uses PCA to encode both shape and texture variation, as well as the correla-
tions between them. By assuming a linear relationship between appearance variation and
texture variation and between texture variation and pose variation, AAM learns the linear
regression models from training data. The model search is driven by the residual of the
search image and model reconstruction.
The Shape-AAM method [4] is a variation of the standard AAM algorithm, generally
applied when shape modes are fewer than texture modes. Instead of manipulating the
combined appearance parameters, ShapeAAM uses image residuals to drive shape and
pose parameters, and then compute texture parameters directly from the image, given the
current shape. In this paper, our texture model is the GWN representation from the shape-
free image set and the search method is similar to Shape-AAM. Our method enables the
search to be robust to partial occlusion and some illumination changes, while providing
more efficiency.
It is worth mentioning that AAM is better suited for image synthesis than our method,
since a small number of eigenfaces can generate photo-realistic images, whereas a high
number of wavelets would be required for this purpose.
3 Active Wavelet Networks
In this section, we introduce active wavelet networks for face alignment. Our method
starts with a training set, in which each image is labelled with landmark points on the
subject’s face. Thus, each sample has a labelled shape and an image texture.
Consider the training set of shape and texture to be Ω = {(xi,gxi )}, i = 1...N,where N
is the number of training images, xi = {(xij,yij)}, j = 1...M, is a shape specified by a set of
M points, and gxi is the texture enclosed by the shape xi. We model the shape variation by
PCA, and the texture is represented by a GWN model. We will describe the shape model
and the GWN texture representation in the following subsections.
3.1 Statistical Shape Model
Given the training set, all shapes are aligned to a common coordinate frame and then
the shape variation can be modelled by PCA in a lower dimensional shape space. So, a
Figure 2: (a) Labelled training image. (b) Shape-free texture.
normalized shape x can be approximated as:
x = x¯+Pb (1)
where x¯ is the mean shape, P is a set of orthogonal modes of variation and b is a set of
shape parameters. Using the shape landmarks as control points, we can warp the training
images to the mean shape. Figure 2 illustrates a labelled image and its texture warped
into the mean shape. The set of shape-free textures G = {gx¯i }, i = 1...N is used to learn
the GWN representation, as described next.
3.2 Wavelet Network Model
We have used a wavelet network to model the face texture as an alternative to Principal
Component Analysis in standard AAM. As already mentioned, the use of spatially local-
ized wavelets allows more robustness with respect to partial occlusions and illumination
changes.
The constituents of a wavelet network are single wavelets and their associated coeffi-
cients. We adopted the odd-Gabor function as the mother wavelet. It is well known that
Gabor filters are recognized as good feature detectors and provide the best trade-off be-
tween spatial and frequency resolution [12]. Considering the 2D image case, each single
odd Gabor wavelet can be expressed as follows:
ψn(x) = exp
[
−1
2
(S(x−µ))T (S(x−µ))
]
× sin
[
(S(x−µ))T
(
1
0
)]
(2)
where x represents image coordinates and n = (sx,sy,θ ,µx,µy) are parameters which
compose the terms S =
(
sx cosθ −sy sinθ
sx sinθ sy cosθ
)
, and µ =
(
µx
µy
)
, that allow scaling,
orientation, and translation. A Gabor wavelet network for a given image consists in a set
of n wavelets {ψnk} and a set of associated weights {wk}, specifically chosen so that the
GWN reconstruction:
ˆI(x) =
n
∑
k=1
wkψnk(x) (3)
116 216 original
Number of Wavelets
52
Figure 3: The image shows a facial reconstructions with variable accuracy, considering
(from left to right) 52, 116 and 216 wavelets.
best approximates the target image. We modified the original formulation of GWNs to
allow the optimization of a single GWN in a set of shape-free images, obtained through
warping, as described in previous section.
3.2.1 Calculation of Wavelet Parameters
Assuming that we have a set of shape-free face images of different people, {gx¯i },1 ≤
i ≤ N, that are truncated to the region that the face occupies, we can calculate the GWN
representation parameters as follows:
1. Randomly drop n wavelets of assorted position, scale, and orientation, within the
bounds of the normalized face images.
2. Perform gradient descent (e.g., via Levenberg-Marquardt optimization) over the set
of wavelet parameters to minimize the total sum of differences between the training
images and their wavelet reconstructions:
arg min
nk,wik
∥∥∥∥∥ N∑i=1 gx¯i − (
n
∑
k=1
wikψnk(x))
∥∥∥∥∥
2
. (4)
One advantage of the GWN approach is that one can trade-off computational effort
with representational accuracy, by increasing or decreasing the number n of wavelets (see
Figure 3).
3.2.2 Calculation of Texture Parameters
In the standard Shape-AAM method, the texture parameters for a given image are com-
puted by projecting the image into an eigenspace learned from the training set. In our
method, the texture parameters {tk},k= 1...n correspond to wavelet coefficients, obtained
by orthogonally projecting the image into the learned wavelet subspace.
However, Gabor wavelet functions are not orthogonal, thus implying that, for a given
family Ψ of Gabor wavelets, it is not possible to calculate tk by a simple inner product of
the Gabor wavelet ψnk with the image.
In fact, a family of dual wavelets ˜Ψ= {ψ˜n1 . . . ψ˜nn} has to be considered. The wavelet
ψ˜n j is the dual wavelet of the wavelet ψni iff 〈ψni , ψ˜n j〉= δi, j.
Given a normalized face image g and a set of optimized wavelets Ψ= {ψn1 , . . . ,ψnN},
the texture parameters are given by:
tk = 〈g, ψ˜nk〉. (5)
It can be shown that ψ˜nk = ∑l
(
A−1
)
k,l ψnl , where A is the wavelet interference matrix,
with Ak,l = 〈ψnk ,ψnl 〉.
3.3 AWN Search
Given a new face image, and a rough estimation of face pose, the search process aims to
determine shape and pose parameters that best fit the model into the new image. The AWN
search algorithm is a variation of the Shape-AAM method, where the main difference is
the calculation of texture parameters and image reconstruction, which are based on the
GWN model.
Let gx¯ be the normalized image enclosed by a shape x and gˆx¯ its GWN reconstruction.
The residual between both images is:
δg = gx¯− gˆx¯ (6)
The residual δg is used to drive the shape parameters b and the affine pose parameters
p, assuming a linear relationship:
δb = Bδg, δp = Pδg (7)
where the two regression matrices B and P are computed offline, by perturbing the face
model parameters on training data. Our search algorithm can be described as follows.
Given a new face image, a starting shape x and pose p,
1. Sample the image enclosed by the current shape and normalize it to obtain gx¯
2. Use GWN to compute texture parameters using eq. 5 and reconstruct the texture
gˆx¯ = ∑nk=1 tkψnk .
3. Compute the residual using eq. 6
4. Predict the the shape and the pose parameters using eq. 7.
5. If the change of δg is small enough or the maximum number of iterations was
reached, stop; else go to step 1.
A successful search results in a AWN model that is well aligned with the input face
image.
0 10 20 30 40 50 60 70 80 90 100
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
Image Number
Av
era
ge
Po
int
Er
ror
AAM
AWN
Figure 4: Comparison of precision of AAM and AWN in images captured under normal
conditions. The two methods achieve similar performance.
4 Experimental Results
In this section, we present our experimental results, comparing our method with AAM.
We have learned the AWN and AAM models in a training set containing 40 face images
of different individuals. We derived a statistical shape model using a 15-dimensional
eigenspace, capturing 98% of variation in the training set. The texture model for AAM
was built using 75 modes, also capturing 98% of the total texture variation.
Considering a small test database of 90 images of different individuals under normal
conditions, the performance of AWN and AAM is similar, as shown in Figure 4. The
graph shows the average error per shape point for both methods for each image in the test
set. We tested several different wavelet subspaces for the AWN texture model, varying the
number and initial parameters of wavelets. Nine wavelets were used for comparison in
Figure 4. We also verified that the range of initial location for good convergence is about
8 pixels in both methods. This range could in fact be enlarged by a multi-scale approach.
In this experiment, the initial model location was randomly chosen with maximum range
of 8 pixels from the ground-truth.
We evaluated the performance of AAM and AWN against occlusions in simulated
images, using white patches of different sizes at random locations. Figure 5 shows ex-
amples of such simulated images. The correct convergence rate in 50 partial occluded
images, for both AAM and AWN methods, is shown as a function of the occluding patch
size in Figure 6. The occluded parts are easily detected in our wavelet representation and
the algorithm compensates to perform robust alignment. In this experiment, we used 60
small-support wavelets in the wavelet network model. We considered the convergence to
be correct when the average error per shape point was less than 3.5 pixels. Figure 7 shows
an example of alignment on a face with sunglasses, where AAM fails due to the occluded
Figure 5: Examples of random occluding white patches of sizes 5x5,10x10 up to 50x50.
0 10 20 30 40 50 60
0
0.2
0.4
0.6
0.8
1
Occluding patch size
Co
nv
er
ge
nc
e
pe
rc
en
ta
ge
AAM
AWN
Figure 6: The graph shows the correct convergence rate in 50 partial occluded images, for
both AAM and AWN methods, considering each size of the occluding patch.
part, while our method works well.
We also did experiments with images under different illumination conditions. AWN
showed to be more robust due to the fact that local highlight changes have a correspondent
local effect in the representation. Figure 8 shows an example of matching under varying
illumination where our method performs better than AAM. In future work we intend to
use an explicit illumination model and also quantitatively evaluate the robustness of AWN
against illumination changes.
The AWN approach offers the advantage in efficiency of using just the support of Ga-
bor filters, rather than the whole image, for faster subspace projection and reconstruction.
So far, this improvement has been minor because the time for image reconstruction in
AAM is not significant compared to the time for other computations, such as warping the
image into the mean shape. Our method takes 10ms per iteration in a 1.6 GHz Pentium
IV, using images of size 128x192.
5 Conclusions
We have presented a new method for automatic face alignment called active wavelet net-
works (AWN). Our method takes the advantages of active appearance models for de-
Figure 7: Example of sensitivity of AAM (upper) and AWN (lower) to partial occlusion.
Images from left to right are initialization, iteration 2 and final matches.
Figure 8: Example of sensitivity of AAM (upper) and AWN (lower) to illumination
changes. Images from left to right are initialization, iteration 2 and final matches.
formable image matching, while relying in a texture model based on Gabor wavelet net-
works. Our experimental results show that AWN is more robust to illumination changes
and partial occlusion than AAM.
As future work, we plan to extend our approach to view-based face alignment and
recognition. We also intend to use an explicit model for handling illumination changes.
Acknowledgments
We are thankful to Volker Krueger for valuable discussions and part of the source code.
References
[1] J. Ahlberg. Using the active appearance algorithm for face and facial feature track-
ing. In ICCV’01 Workshop on Recognition, Analysis and Tracking of Faces and
Gestures in Real-Time Systems, Vancouver, BC, Canada, 2001.
[2] S. Baker and I. Matthews. Equivalency and efficiency of image alignment algo-
rithms. In Computer Vision and Pattern Recognition, pages 1090–1097, 2001.
[3] T. Cootes, D. Cooper, C. Taylor, and J. Graham. Active shape models – their training
and application. Computer Vision and Image Understanding, 61(1):38–59, 1995.
[4] T. Cootes, G. Edwards, and C. Taylor. A comparative evaluation of active appear-
ance model algorithms. In British Machine Vision Conference, pages 680–689,
Southampton, UK, 1998.
[5] T. Cootes, G. Edwards, and C. Taylor. Active appearance models. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, 23(6):681–685, 2001.
[6] T. Cootes, G. Wheeler, K. Walker, and C. Taylor. View-based active appearance
models. Image and Vision Computing, 20:657–664, 2002.
[7] R. Feris, J. Gemmell, K. Toyama, and V. Krueger. Hierarchical wavelet networks
for facial feature localization. In ICCV’01 Workshop on Recognition, Analysis and
Tracking of Faces and Gestures in Real-Time Systems, Vancouver, BC, Canada,
2001.
[8] R. Feris, V. Krueger, and R. Cesar Jr. Efficient real-time face tracking in wavelet
subspace. In ICCV’01 Workshop on Recognition, Analysis and Tracking of Faces
and Gestures in Real-Time Systems, Vancouver, BC, Canada, 2001.
[9] X. Hou, S. Li, H. Zhang, and Q. Cheng. Direct appearance models. In Computer
Vision and Pattern Recognition, pages 828–833, 2001.
[10] V. Krueger, S. Bruns, and G. Sommer. Efficient head pose estimation with ga-
bor wavelet networks. In British Machine Vision Conference, University of Bristol,
2000.
[11] V. Krueger and G. Sommer. Gabor wavelet networks for object representation. Jour-
nal of the Optical Society of America, 2002.
[12] B.S. Manjunath and R. Chellappa. A unified approach to boundary perception:
edges, textures, and illusory contours. IEEE Transactions on Neural Networks,
4(1):96–107, 1993.
[13] L. Wiskott, J. Fellous, N. Krueger, and C. Malsburg. Face recognition by elastic
bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intel-
ligence, 19:775–779, 1997.