18.415/6.854 Advanced Algorithms October 1994
Linear Programming
Lecturer: Michel X. Goemans
1 An Introduction to Linear Programming
Linear programming is a very important class of problems, both algorithmically and
combinatorially. Linear programming has many applications. From an algorithmic
point of view, the simplex method was proposed in the forties (soon after the war,
motivated by military applications) and, although it has performed very well in
practice, is known to run in exponential time in the worst case. On the other hand, since
the early seventies when the classes P and NP were defined, it was observed that linear
programming is in NP∩ co-NP although no polynomial-time algorithm was known at
that time. The first polynomial-time algorithm, the ellipsoid algorithm, was only dis-
covered at the end of the seventies. Karmarkar’s algorithm in the mid-eighties led to
very active research in the area of interior-point methods for linear programming. We
shall present one of the numerous variations of interior-point methods in class. From
a combinatorial perspective, systems of linear inequalities were already studied at the
end of the last century by Farkas and Minkowski. Linear programming, and especially
the notion of duality, is very important as a proof technique. We shall illustrate its
power when discussing approximation algorithms. We shall also talk about network
flow algorithms where linear programming plays a crucial role both algorithmically
and combinatorially. For a more in-depth coverage of linear programming, we refer
the reader to [1, 4, 7, 8, 5].
A linear program is the problem of optimizing a linear objective function in the
decision variables, x1 . . . xn, subject to linear equality or inequality constraints on the
xi’s. In standard form, it is expressed as:
\[
\begin{aligned}
\min \quad & \sum_{j=1}^{n} c_j x_j && \text{(objective function)}\\
\text{subject to:} \quad & \sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = 1, \ldots, m && \text{(constraints)}\\
& x_j \geq 0, \quad j = 1, \ldots, n && \text{(non-negativity constraints)}
\end{aligned}
\]
where the $a_{ij}$, $b_i$ and $c_j$ are given.
A linear program is expressed more conveniently using matrices:
\[
\min c^T x \quad \text{subject to} \quad
\begin{cases}
Ax = b\\
x \geq 0
\end{cases}
\]
where
\[
x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^{n \times 1}, \quad
b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix} \in \mathbb{R}^{m \times 1}, \quad
c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \in \mathbb{R}^{n \times 1}, \quad
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \in \mathbb{R}^{m \times n}.
\]
2 Basic Terminology
Definition 1 If x satisfies Ax = b, x ≥ 0, then x is feasible.
Definition 2 A linear program (LP) is feasible if there exists a feasible solution,
otherwise it is said to be infeasible.
Definition 3 An optimal solution x∗ is a feasible solution s.t. cTx∗ = min{cTx :
Ax = b, x ≥ 0}.
Definition 4 LP is unbounded (from below) if ∀λ ∈ R, ∃ a feasible x∗ s.t. cTx∗ ≤ λ.
3 Equivalent Forms
A linear program can take on several forms. We might be maximizing instead of
minimizing. We might have a combination of equality and inequality constraints.
Some variables may be restricted to be non-positive instead of non-negative, or be
unrestricted in sign. Two forms are said to be equivalent if they have the same set of
optimal solutions or are both infeasible or both unbounded.
1. A maximization problem can be expressed as a minimization problem.
max cTx ⇔ min −cTx
2. An equality can be represented as a pair of inequalities.
\[
a_i^T x = b_i
\;\Leftrightarrow\;
\begin{cases} a_i^T x \leq b_i \\ a_i^T x \geq b_i \end{cases}
\;\Leftrightarrow\;
\begin{cases} a_i^T x \leq b_i \\ -a_i^T x \leq -b_i \end{cases}
\]
3. By adding a slack variable, an inequality can be represented as a combination
of equality and non-negativity constraints.
aTi x ≤ bi ⇔ aTi x + si = bi, si ≥ 0.
4. Non-positivity constraints can be expressed as non-negativity constraints.
To express xj ≤ 0, replace xj everywhere with −yj and impose the condition
yj ≥ 0.
5. x may be unrestricted in sign.
If xj is unrestricted in sign, i.e. it may be either positive or negative, everywhere
replace xj by xj^+ − xj^−, adding the constraints xj^+, xj^− ≥ 0.
In general, an inequality can be represented using a combination of equality and
non-negativity constraints, and vice versa.
Using these rules, min{cTx s.t. Ax ≥ b} can be transformed into min{cTx+ − cTx−
s.t. Ax+ − Ax− − Is = b, x+, x−, s ≥ 0}. The former LP is said to be in canonical
form, the latter in standard form.
Conversely, an LP in standard form may be written in canonical form. min{cTx
s.t. Ax = b, x ≥ 0} is equivalent to min{cTx s.t. Ax ≥ b, −Ax ≥ −b, Ix ≥ 0}.
This may be rewritten as A′x ≥ b′, where
\[
A' = \begin{pmatrix} A \\ -A \\ I \end{pmatrix}, \qquad
b' = \begin{pmatrix} b \\ -b \\ 0 \end{pmatrix}.
\]
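The canonical-to-standard transformation above is mechanical enough to write down. The sketch below (our own illustration, assuming numpy is available; the name `canonical_to_standard` is ours) builds the block matrices for the new variable vector z = (x+, x−, s):

```python
# Sketch (assumption: numpy available): converting a canonical-form LP
# min{c^T x : Ax >= b} into standard form, with variables z = (x+, x-, s),
# following the substitution described in the text.
import numpy as np

def canonical_to_standard(A, b, c):
    """Return (A_std, b_std, c_std) over z = (x+, x-, s)."""
    m, n = A.shape
    A_std = np.hstack([A, -A, -np.eye(m)])     # A x+ - A x- - I s = b
    c_std = np.concatenate([c, -c, np.zeros(m)])
    return A_std, b.copy(), c_std

# Two of the constraints of the Section 4 example, as illustrative data.
A = np.array([[1.0, 0.0], [1.0, 1.0]])
b = np.array([2.0, 6.0])
c = np.array([0.0, 1.0])
A_std, b_std, c_std = canonical_to_standard(A, b, c)
print(A_std.shape)  # (2, 6): the n variables are doubled, plus one slack per row
```

A feasible x of the canonical form maps to the feasible z = (x, 0, Ax − b) of the standard form, which is a quick sanity check on the construction.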
4 Example
Consider the following linear program:
min x2 subject to
x1 ≥ 2
3x1 − x2 ≥ 0
x1 + x2 ≥ 6
−x1 + 2x2 ≥ 0
The optimal solution is (4, 2) of cost 2 (see Figure 1). If we were maximizing x2
instead of minimizing under the same feasible region, the resulting linear program
would be unbounded since x2 can increase arbitrarily. From this picture, the reader
should be convinced that, for any objective function for which the linear program is
bounded, there exists an optimal solution which is a “corner” of the feasible region.
We shall formalize this notion in the next section.
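The “corner” intuition can be checked numerically. The following sketch (our own, assuming numpy is available) enumerates the intersection points of pairs of constraint boundaries, keeps the feasible ones, and picks the one minimizing x2:

```python
# Sketch (not from the notes): verify that the optimum of the example LP
# lies at a "corner" by enumerating intersections of constraint boundaries.
from itertools import combinations
import numpy as np

# Constraints of the example, in the form a^T x >= b.
A = np.array([[1.0, 0.0],    # x1 >= 2
              [3.0, -1.0],   # 3x1 - x2 >= 0
              [1.0, 1.0],    # x1 + x2 >= 6
              [-1.0, 2.0]])  # -x1 + 2x2 >= 0
b = np.array([2.0, 0.0, 6.0, 0.0])
c = np.array([0.0, 1.0])     # objective: minimize x2

best, best_val = None, float("inf")
for i, j in combinations(range(len(b)), 2):
    M = A[[i, j]]
    if abs(np.linalg.det(M)) < 1e-12:
        continue  # parallel boundaries: no unique intersection
    x = np.linalg.solve(M, b[[i, j]])
    if np.all(A @ x >= b - 1e-9):       # keep only feasible corners
        val = c @ x
        if val < best_val:
            best, best_val = x, val

print(best, best_val)  # expected: [4. 2.] 2.0
```

The winner is the vertex (4, 2) of cost 2, in agreement with the discussion above.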
[Figure: the feasible region defined by x1 ≥ 2, 3x1 − x2 ≥ 0, x1 + x2 ≥ 6 and
−x1 + 2x2 ≥ 0, with the points (2, 1) and (2, 4) marked.]

Figure 1: Graph representing primal in example.
An example of an infeasible linear program can be obtained by reversing some of
the inequalities of the above LP:
x1 ≤ 2
3x1 − x2 ≥ 0
x1 + x2 ≥ 6
−x1 + 2x2 ≤ 0.
5 The Geometry of LP
Let P = {x : Ax = b, x ≥ 0} ⊆ Rn.
Definition 5 x is a vertex of P if there is no y ≠ 0 s.t. x + y, x − y ∈ P .
Theorem 1 Assume min{cTx : x ∈ P} is finite, then ∀x ∈ P, ∃ a vertex x′ such that
cTx′ ≤ cTx.
Proof:
If x is a vertex, then take x′ = x.
If x is not a vertex, then, by definition, ∃y ≠ 0 s.t. x + y, x − y ∈ P . Since
A(x+ y) = b and A(x− y) = b, Ay = 0.
WLOG, assume cTy ≤ 0 (take either y or −y). If cTy = 0, choose y such that ∃j
s.t. yj < 0. Since y ≠ 0 and cTy = cT(−y) = 0, this must be true for either y or −y.
Consider x + λy, λ > 0. cT (x + λy) = cTx + λcTy ≤ cTx, since cTy is assumed
non-positive.
Case 1 ∃j such that yj < 0
As λ increases, component j decreases until x+ λy is no longer feasible.
Choose $\lambda = \min_{\{j : y_j < 0\}} \{x_j / (-y_j)\} = x_k / (-y_k)$. This is the largest λ such that
x + λy ≥ 0. Since Ay = 0, A(x + λy) = Ax + λAy = Ax = b. So x + λy ∈ P ,
and moreover x + λy has one more zero component, (x + λy)k, than x.
Replace x by x+ λy.
Case 2 yj ≥ 0 ∀j
By assumption, cTy < 0 and x + λy is feasible for all λ ≥ 0, since A(x + λy) =
Ax + λAy = Ax = b, and x + λy ≥ x ≥ 0. But cT(x + λy) = cTx + λcTy → −∞
as λ → ∞, implying the LP is unbounded, a contradiction.
Case 1 can happen at most n times, since x has n components. By induction on
the number of non-zero components of x, we obtain a vertex x′.
✷
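The proof is constructive: starting from any feasible x, it repeatedly follows a null-space direction supported on the support of x until an extra component hits zero. Specializing to c = 0 (so any such direction is acceptable), it can be sketched as follows; `move_to_vertex` is our illustrative name and numpy is assumed:

```python
# Sketch (assumption: numpy available): the vertex-finding procedure from
# the proof of Theorem 1, specialized to c = 0 so that any null-space
# direction may be followed (Case 1 always applies after a sign flip).
import numpy as np

def move_to_vertex(A, x, tol=1e-9):
    """Push a feasible x (Ax = b, x >= 0) to a vertex of the polyhedron."""
    x = x.astype(float).copy()
    while True:
        support = np.flatnonzero(x > tol)
        if support.size == 0:
            return x                       # x = 0 is a vertex
        As = A[:, support]
        _, s, Vt = np.linalg.svd(As)
        if int(np.sum(s > tol)) == support.size:
            return x                       # independent columns: vertex (Thm 4)
        y = np.zeros_like(x)
        y[support] = Vt[-1]                # direction with A y = 0
        if np.all(y[support] >= -tol):
            y = -y                         # ensure some component is negative
        neg = support[y[support] < -tol]
        lam = np.min(x[neg] / -y[neg])     # largest step keeping x >= 0
        x = x + lam * y

A = np.array([[1.0, 1.0, 1.0],
              [2.0, -1.0, 2.0]])
x0 = np.array([1.0, 3.0, 1.0])   # feasible: A @ x0 = (5, 1), x0 >= 0
v = move_to_vertex(A, x0)
print(v)  # a vertex of {x : Ax = (5, 1), x >= 0}
```

Each pass through the loop zeroes at least one more component, mirroring the induction in the proof.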
Remark: The theorem was described in terms of the polyhedral set P = {x :
Ax = b, x ≥ 0}. Strictly speaking, the theorem is not true for P = {x : Ax ≥
b}. Indeed, such a set P might not have any vertex. For example, consider P =
{(x1, x2) : 0 ≤ x2 ≤ 1} (see Figure 2). This polyhedron has no vertex, since for any
x ∈ P , we have x + y, x − y ∈ P , where y = (1, 0). It can be shown that P has a
vertex iff Rank(A) = n. Note that, if we transform a program in canonical form into
standard form, the non-negativity constraints imply that the resulting matrix A has
full column rank, since
\[
\operatorname{Rank}\begin{pmatrix} A \\ -A \\ I \end{pmatrix} = n.
\]
Corollary 2 If min{cTx : Ax = b, x ≥ 0} is finite, there exists an optimal solution,
x∗, which is a vertex.
Proof:
Suppose not. Take an optimal solution. By Theorem 1 there exists a vertex
costing no more and this vertex must be optimal as well. ✷
Corollary 3 If P = {x : Ax = b, x ≥ 0} ≠ ∅, then P has a vertex.
[Figure: the strip 0 ≤ x2 ≤ 1 in the (x1, x2) plane, with the points (0, 0) and (0, 1)
marked.]

Figure 2: A polyhedron with no vertex.
Theorem 4 Let P = {x : Ax = b, x ≥ 0}. For x ∈ P , let Ax be a submatrix of A
corresponding to j s.t. xj > 0. Then x is a vertex iff Ax has linearly independent
columns. (i.e. Ax has full column rank.)
Example
\[
A = \begin{pmatrix} 2 & 1 & 3 & 0 \\ 7 & 3 & 2 & 1 \\ 0 & 0 & 0 & 5 \end{pmatrix}, \qquad
x = \begin{pmatrix} 2 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad
A_x = \begin{pmatrix} 2 & 3 \\ 7 & 2 \\ 0 & 0 \end{pmatrix},
\]
and x is a vertex.
Proof:
Show ¬i→ ¬ii.
Assume x is not a vertex. Then, by definition, ∃y ≠ 0 s.t. x + y, x − y ∈ P .
Let Ay be submatrix corresponding to non-zero components of y.
As in the proof of Theorem 1,
\[
\left.\begin{aligned} Ax + Ay &= b \\ Ax - Ay &= b \end{aligned}\right\}
\;\Rightarrow\; Ay = 0.
\]
Therefore, Ay has dependent columns since y ≠ 0.
Moreover,
\[
\left.\begin{aligned} x + y &\geq 0 \\ x - y &\geq 0 \end{aligned}\right\}
\;\Rightarrow\; y_j = 0 \text{ whenever } x_j = 0.
\]
Therefore Ay is a submatrix of Ax, and hence Ax has linearly dependent
columns.
Show ¬ii→ ¬i.
Suppose Ax has linearly dependent columns. Then ∃y ≠ 0 s.t. Axy = 0.
Extend y to Rn by adding 0 components. Then ∃y ∈ Rn s.t. Ay = 0, y ≠ 0
and yj = 0 wherever xj = 0.
Consider y′ = λy for small λ > 0. Claim that x + y′, x − y′ ∈ P , by an argument
analogous to that in Case 1 of the proof of Theorem 1, above. Hence, x is not
a vertex.
✷
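Theorem 4 gives an easy computational test for vertexhood. A sketch (our own, assuming numpy; `is_vertex` is an illustrative name) applied to the example above:

```python
# Sketch (assumption: numpy available): the vertex criterion of Theorem 4:
# x is a vertex of {x : Ax = b, x >= 0} iff the columns of A indexed by the
# support of x are linearly independent.
import numpy as np

def is_vertex(A, x, tol=1e-9):
    support = [j for j in range(A.shape[1]) if x[j] > tol]
    Ax = A[:, support]
    return bool(np.linalg.matrix_rank(Ax) == len(support))

# The example from the text.
A = np.array([[2.0, 1.0, 3.0, 0.0],
              [7.0, 3.0, 2.0, 1.0],
              [0.0, 0.0, 0.0, 5.0]])
x = np.array([2.0, 0.0, 1.0, 0.0])
print(is_vertex(A, x))  # True: the columns (2,7,0) and (3,2,0) are independent
```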
6 Bases
Let x be a vertex of P = {x : Ax = b, x ≥ 0}. Suppose first that |{j : xj > 0}| = m
(where A is m × n). In this case we denote B = {j : xj > 0}. Also let AB = Ax; we
use this notation not only for A and B, but also for x and for other sets of indices.
Then AB is a square matrix whose columns are linearly independent (by Theorem
4), so it is non-singular. Therefore we can express x as xj = 0 if j ∉ B, and since
ABxB = b, it follows that xB = A_B^{-1}b. The variables corresponding to B will be called
basic. The others will be referred to as nonbasic. The set of indices corresponding to
nonbasic variables is denoted by N = {1, . . . , n} − B. Thus, we can write the above
as xB = A_B^{-1}b and xN = 0.
Without loss of generality we will assume that A has full row rank, rank(A) = m.
Otherwise either there is a redundant constraint in the system Ax = b (and we can
remove it), or the system has no solution at all.
If |{j : xj > 0}| < m, we can augment Ax with additional linearly independent
columns, until it is an m×m submatrix of A of full rank, which we will denote AB.
In other words, although there may be fewer than m positive components in x, it is
convenient to always have a basis B such that |B| = m and AB is non-singular. This
enables us to always express x as we did before, xN = 0, xB = A_B^{-1}b.
Summary x is a vertex of P iff there is B ⊆ {1, . . . , n} such that |B| = m and
1. xN = 0 for N = {1, . . . , n} −B
2. AB is non-singular
3. xB = A_B^{-1}b ≥ 0
In this case we say that x is a basic feasible solution. Note that a vertex can have
several basic feasible solutions corresponding to it (by augmenting {j : xj > 0} in
different ways). A basis might not lead to any basic feasible solution since A_B^{-1}b is
not necessarily nonnegative.
Example:
x1 + x2 + x3 = 5
2x1 − x2 + 2x3 = 1
x1, x2, x3 ≥ 0
We can select as a basis B = {1, 2}. Thus, N = {3} and
\[
A_B = \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix}, \qquad
A_B^{-1} = \begin{pmatrix} \frac{1}{3} & \frac{1}{3} \\[2pt] \frac{2}{3} & -\frac{1}{3} \end{pmatrix}, \qquad
A_B^{-1} b = \begin{pmatrix} 2 \\ 3 \end{pmatrix},
\]
x = (2, 3, 0).
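The computation xB = A_B^{-1}b, xN = 0 for this example can be reproduced directly (a sketch assuming numpy is available):

```python
# Sketch (assumption: numpy available): computing the basic feasible
# solution for the basis B = {1, 2} in the example above.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [2.0, -1.0, 2.0]])
b = np.array([5.0, 1.0])
B = [0, 1]                            # 0-indexed basis {x1, x2}

x = np.zeros(A.shape[1])
x[B] = np.linalg.solve(A[:, B], b)    # x_B = A_B^{-1} b, x_N = 0
print(x)               # [2. 3. 0.]
print(bool(np.all(x >= 0)))  # True: this basis yields a basic *feasible* solution
```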
Remark. A crude upper bound on the number of vertices of P is $\binom{n}{m}$. This number
is exponential (it is upper bounded by n^m). We can come up with a tighter approx-
imation of $\binom{n - \frac{m}{2}}{\frac{m}{2}}$, though this is still exponential. The reason why the number is
much smaller is that most basic solutions to the system Ax = b (which we counted)
are not feasible, that is, they do not satisfy x ≥ 0.
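The gap between bases and basic feasible solutions is easy to see on a small instance (our own illustrative data, assuming numpy is available): enumerate all C(n, m) column subsets, discard the singular ones, and check which basic solutions satisfy x ≥ 0.

```python
# Sketch (not from the notes): counting bases versus basic *feasible*
# solutions for a small system, illustrating why the C(n, m) bound is crude.
from itertools import combinations
import numpy as np

A = np.array([[1.0, 1.0, 1.0, 0.0],
              [2.0, -1.0, 0.0, 1.0]])
b = np.array([5.0, 1.0])
m, n = A.shape

bases = feasible = 0
for Bt in combinations(range(n), m):
    B = list(Bt)
    AB = A[:, B]
    if abs(np.linalg.det(AB)) < 1e-12:
        continue                  # singular: not a basis
    bases += 1
    xB = np.linalg.solve(AB, b)
    if np.all(xB >= -1e-9):       # basic solution is feasible iff x_B >= 0
        feasible += 1

print(bases, feasible)  # 6 4: two of the six bases give infeasible basic solutions
```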
7 The Simplex Method
The Simplex algorithm [Dantzig,1947] [2] solves linear programming problems by
focusing on basic feasible solutions. The basic idea is to start from some vertex v and
look at the adjacent vertices. If an improvement in cost is possible by moving to one
of the adjacent vertices, then we do so. Thus, we will start with a bfs corresponding
to a basis B and, at each iteration, try to improve the cost of the solution by removing
one variable from the basis and replacing it by another.
We begin the Simplex algorithm by first rewriting our LP in the form:
\[
\begin{aligned}
\min \quad & c_B x_B + c_N x_N\\
\text{s.t.} \quad & A_B x_B + A_N x_N = b\\
& x_B, x_N \geq 0
\end{aligned}
\]
Here B is the basis corresponding to the bfs we are starting from. Note that, for
any solution x, xB = A_B^{-1}b − A_B^{-1}ANxN and that its total cost, cTx, can be specified
as follows:
\[
\begin{aligned}
c^T x &= c_B x_B + c_N x_N\\
&= c_B (A_B^{-1} b - A_B^{-1} A_N x_N) + c_N x_N\\
&= c_B A_B^{-1} b + (c_N - c_B A_B^{-1} A_N) x_N
\end{aligned}
\]
We denote the reduced cost of the non-basic variables by $\tilde{c}_N$, $\tilde{c}_N = c_N − c_B A_B^{-1} A_N$,
i.e. the quantity which is the coefficient of xN above. If there is a j ∈ N such that
$\tilde{c}_j < 0$, then by increasing xj (up from zero) we will decrease the cost (the value of
the objective function). Of course xB depends on xN , and we can increase xj only as
long as all the components of xB remain positive.
So in a step of the Simplex method, we find a j ∈ N such that $\tilde{c}_j < 0$, and increase
it as much as possible while keeping xB ≥ 0. It is no longer possible to increase
xj when one of the components of xB reaches zero. What has happened is that a
non-basic variable is now positive, so we include it in the basis, and one variable
which was basic is now zero, so we remove it from the basis.
If, on the other hand, there is no j ∈ N such that $\tilde{c}_j < 0$, then we stop, and
the current basic feasible solution is an optimal solution. This follows from the new
expression for cTx, since xN is nonnegative.
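Putting the pieces together, one iteration of the method just described (compute xB, compute the reduced costs, pick an entering index, do the ratio test) can be sketched as below. This is our own minimal implementation, not the notes' code; numpy and a known initial feasible basis are assumed, and the entering variable is chosen by smallest index.

```python
# Sketch (assumptions: numpy available, an initial feasible basis is given,
# no degeneracy handling beyond smallest-index entering choice).
import numpy as np

def simplex(A, b, c, basis, tol=1e-9):
    m, n = A.shape
    basis = list(basis)
    while True:
        x = np.zeros(n)
        x[basis] = np.linalg.solve(A[:, basis], b)       # x_B = A_B^{-1} b
        y = np.linalg.solve(A[:, basis].T, c[basis])     # y^T = c_B A_B^{-1}
        reduced = c - y @ A                              # c~ = c - c_B A_B^{-1} A
        entering = next((j for j in range(n)
                         if j not in basis and reduced[j] < -tol), None)
        if entering is None:
            return x, c @ x                              # optimal: all c~_j >= 0
        d = np.linalg.solve(A[:, basis], A[:, entering]) # change in basic vars
        ratios = [(x[basis[i]] / d[i], i) for i in range(m) if d[i] > tol]
        if not ratios:
            raise ValueError("LP is unbounded")
        _, leave = min(ratios)                           # ratio test
        basis[leave] = entering

# The Section 6 example with objective -x3 (i.e. maximize x3).
A = np.array([[1.0, 1.0, 1.0], [2.0, -1.0, 2.0]])
b = np.array([5.0, 1.0])
c = np.array([0.0, 0.0, -1.0])
x, val = simplex(A, b, c, basis=[0, 1])
print(x, val)  # [0. 3. 2.] -2.0
```

Starting from the basis {1, 2}, the method pivots once (x3 enters, x1 leaves) and stops at the optimal vertex (0, 3, 2).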
Remarks:
1. Note that some of the basic variables may be zero to begin with, and in this
case it is possible that we cannot increase xj at all. In this case we can replace
say j by k in the basis, but without moving from the vertex corresponding to
the basis. In the next step we might replace k by j, and be stuck in a loop.
Thus, we need to specify a “pivoting rule” to determine which index should
enter the basis, and which index should be removed from the basis.
2. While many pivoting rules (including those that are used in practice) can lead
to infinite loops, there is a pivoting rule which will not: the minimal index rule
(choose the minimal j and k possible), discovered by Bland in 1977 [Bland, 1977].
There are other methods of “breaking ties” which eliminate infinite loops.
3. There is no known pivoting rule for which the number of pivots in the worst
case is better than exponential.
4. The question of the complexity of the Simplex algorithm and the last remark
leads to the question of what is the length of the shortest path between two
vertices of a convex polyhedron, where the path is along edges, and the length
of the path is measured in terms of the number of vertices visited.
Hirsch Conjecture: For m hyperplanes in d dimensions the length of the
shortest path between any two vertices of the arrangement is at most m− d.
This question is wide open; there is not even a polynomial bound proven on
this length.
On the other hand, one should note that even if the Hirsch Conjecture is true,
it doesn’t say much about the Simplex Algorithm, because Simplex generates
paths which are monotone with respect to the objective function, whereas the
shortest path need not be monotone.
Recently, Kalai (and others) has considered a randomized pivoting rule. The
idea is to randomly permute the index columns of A and to apply the Simplex
method, always choosing the smallest j possible. In this way, it is possible to
show a subexponential bound on the expected number of pivots. This leads to
a subexponential bound for the diameter of any convex polytope defined by m
hyperplanes in a d dimension space.
The question of the existence of a polynomial pivoting scheme is still open
though. We will see later a completely different algorithm which is polynomial,
although not strongly polynomial (the existence of a strongly polynomial algo-
rithm for linear programming is also open). That algorithm will not move from
one vertex of the feasible domain to another like the Simplex, but will confine
its interest to points in the interior of the feasible domain.
A visualization of the geometry of the Simplex algorithm can be obtained from
considering the algorithm in 3 dimensions (see Figure 3). For a problem in the form
min{cTx : Ax ≤ b} the feasible domain is a polyhedron in R3, and the algorithm
moves from vertex to vertex in each step (or does not move at all).
8 When is a Linear Program Feasible?
We now turn to another question which will lead us to important properties of linear
programming. Let us begin with some examples.
We consider linear programs of the form Ax = b, x ≥ 0. As the objective function
has no effect on the feasibility of the program, we ignore it.
We first restrict our attention to systems of equations (i.e. we neglect the non-
negativity constraints).
Example: Consider the system of equations:
x1 + x2 + x3 = 6
2x1 + 3x2 + x3 = 8
2x1 + x2 + 3x3 = 0
Figure 3: Traversing the vertices of a convex body (here a polyhedron in R3).

and the linear combination
\[
\begin{aligned}
-4 \times \;(x_1 + x_2 + x_3 &= 6)\\
1 \times (2x_1 + 3x_2 + x_3 &= 8)\\
1 \times (2x_1 + x_2 + 3x_3 &= 0)
\end{aligned}
\]
The linear combination results in the equation
\[
0 x_1 + 0 x_2 + 0 x_3 = -16
\]
which means of course that the system of equations has no feasible solution.
In fact, an elementary theorem of linear algebra says that if a system has no
solution, there is always a vector y such as in our example (y = (−4, 1, 1)) which
proves that the system has no solution.
Theorem 5 Exactly one of the following is true for the system Ax = b:
1. There is x such that Ax = b.
2. There is y such that ATy = 0 but yT b = 1.
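The certificate y from the example above is easy to verify mechanically (a sketch assuming numpy is available):

```python
# Sketch (not from the notes): checking the infeasibility certificate
# y = (-4, 1, 1) for the system of equations above.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [2.0, 3.0, 1.0],
              [2.0, 1.0, 3.0]])
b = np.array([6.0, 8.0, 0.0])
y = np.array([-4.0, 1.0, 1.0])

print(y @ A)  # [0. 0. 0.]: y^T A = 0, so y^T (Ax) = 0 for every x
print(y @ b)  # -16.0: but y^T b != 0, so Ax = b can have no solution
```

Scaling y by −1/16 gives a certificate with yT b = 1, matching the normalization in Theorem 5.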
This is not quite enough for our purposes, because a system can be feasible,
but still have no non-negative solution x ≥ 0. Fortunately, the following lemma
establishes the equivalent result for our system Ax = b, x ≥ 0.
Theorem 6 (Farkas’ Lemma) Exactly one of the following is true for the system
Ax = b, x ≥ 0:
1. There is x such that Ax = b, x ≥ 0.
2. There is y such that ATy ≥ 0 but bTy < 0.
Proof:
We will first show that the two conditions cannot happen together, and then that
at least one of them must happen.
Suppose we do have both x and y as in the statement of the theorem. Then
\[
Ax = b \;\Longrightarrow\; y^T A x = y^T b \;\Longrightarrow\; x^T A^T y = y^T b,
\]
but this is a contradiction: yT b < 0, while x ≥ 0 and ATy ≥ 0 imply xTATy ≥ 0.
The other direction is less trivial, and usually shown using properties of the Sim-
plex algorithm, mainly duality. We will use another tool, and later use Farkas’ Lemma
to prove properties about duality in linear programming. The tool we shall use is the
Projection theorem, which we state without proof:
Theorem 7 (Projection Theorem) Let K be a closed, convex, non-empty set in
Rn (see Figure 4), and let b be any point in Rn. The projection of b onto K is a point
p ∈ K that minimizes the Euclidean distance ‖b − p‖. Then p has the property that
for all z ∈ K, (z − p)T (b − p) ≤ 0 (see Figure 5).
Figure 4: Convex and non-convex sets in R2.
We are now ready to prove the other direction of Farkas’ Lemma. Assume that
there is no x such that Ax = b, x ≥ 0; we will show that there is y such that ATy ≥ 0
but yT b < 0.
Let K = {Ax : x ≥ 0} ⊆ Rm (A is an m× n matrix). K is a cone in Rm and it is
convex, non-empty and closed. According to our assumption, Ax = b, x ≥ 0 has no
solution, so b does not belong to K. Let p be the projection of b onto K.
Figure 5: The Projection Theorem.
Since p ∈ K, there is a w ≥ 0 such that Aw = p. According to the Projection
Theorem, for all z ∈ K, (z − p)T (b − p) ≤ 0. That is, for all x ≥ 0, (Ax − p)T (b − p) ≤ 0.
We define y = p − b, which implies (Ax − p)Ty ≥ 0. Since Aw = p, (Ax − Aw)Ty ≥
0, i.e. (x − w)T (ATy) ≥ 0 for all x ≥ 0 (remember that w is fixed).
Set x = w + ei, where ei is the i-th unit vector (all zeros except for a 1 in the
i-th row). Note that x is non-negative, because w ≥ 0.
This choice extracts the i-th column of A, so we conclude that the i-th component of
ATy is non-negative, (ATy)i ≥ 0; since this is true for all i, ATy ≥ 0.
Now it only remains to show that yT b < 0. Since y = p − b, we have b = p − y, so
yT b = (p − y)Ty = pTy − yTy.
Since (Ax − p)Ty ≥ 0 for all x ≥ 0, taking x to be zero shows that pTy ≤ 0. Since
b ∉ K, y = p − b ≠ 0, so yTy > 0. So yT b = pTy − yTy < 0.
✷
Using a very similar proof one can show th