## Description

2 Data Capture and Overview

Figure 1: You will capture your cellphone images to reconstruct camera pose and 3D

points.

In this assignment, you will use your cellphone images (more than 5) to reconstruct

3D camera poses and points with full bundle adjustment. Make sure you have enough

baseline (translation) between images for well conditioned fundamental matrix while

retaining enough number of correspondences between image. Avoid a scene dominated

by a planar surface, i.e., the images need to contain many 3D objects as shown in

Figure 1.

2

CSCI 5980: Assignment #5

Bundle Adjustment

You will write a full pipeline of the structure from motion algorithm including matching, camera pose estimation using fundamental matrix, PnP, triangulation, and bundle

adjustment. A nonlinear optimization is always followed by the initial estimate by

linear least squares solution. The pipeline is described in Algorithm 1.

Algorithm 1 Structure from Motion

1: [Mx, My] = GetMatches(I1, · · · , IN )

2: Normalize coordinate in Mx and My, i.e., x = K−1x.

3: Select two images Ii1 and Ii2 for the initial pair reconstruction.

4: [R, C, X] = CameraPoseEstimation([Mx(:,i1) My(:,i1)], [Mx(:,i2) My(:,i2)])

5: P = {P1,P2} where Pi1 =

I3 0

, Pi2 = R

I3 −C

6: R = {i1, i2}

7: while |R| < N do
8: i = GetBestFrame(Mx, My, R);
9: [Ri
, Ci
] = PnP RANSAC([Mx(:,i) My(:,i)], X)
10: [Ri
, Ci
] = PnP Nonlinear(Ri Ci
, [Mx(:,i) My(:,i)], X)
11: Pi = Ri
I3 −Ci
12: for f = 1 : |R| do
13: U = FindUnreconstructedPoints(X, Rf , i, Mx, My)
14: for j = 1 : |U| do
15: u = [Mx(Uj
, i), My(Uj
, i)] and v = [Mx(Uj
, Rf ), My(Uj
, Rf )]
16: x = LinearTriangulation(u, Pi
, v, PRf
)
17: x = NonlinearTriangulation(X, u, Ri
, Ci
, v, RRf
, CRf
)
18: X = X ∪ x
19: end for
20: end for
21: P = P ∪ Pi and R = R ∪ i.
22: [P, X] = BundleAdjustment(P, X, R, Mx, My)
23: end while
3
CSCI 5980: Assignment #5
Bundle Adjustment
3 Matching
Given a set of images, I1, · · · , IN , you will find matches across all images where N is
the number of images similar to HW #4. Pick a reference image, Iref , and match with
other images using SIFT features from VLFeat, i.e., Iref ↔ I1, · · · , Iref ↔ IN (no need
to match Iref ↔ Iref ).
Your matches are outlier free, i.e., bidirectional knn match → ratio test → inliers from
the fundamental matrix based RANSAC. Based on the matches, you will build a measurement matrix, Mx and My:
[Mx, My] = GetMatches(I1, · · · , IN )
Mx: F×N matrix storing x coordinate of correspondences
My: F×N matrix storing y coordinate of correspondences
The f
th feature point in image Ii corresponds to a point in image Ij
. The x and y
coordinates of the correspondence is stored at (f, i) and (f, j) elements in Mx and My,
respectively. If (f, i) does not correspond to any point in image Ik, you set -1 to indicate
no match as shown in Figure 2.
Important: For better numerical stability, you can transform the measurements to the
normalized coordinate by multiplying K−1
, i.e., x = K−1x where x is 2D measured
points in homogeneous coordinate. You can run structure from motion in the normalized coordinate by factoring out K. When visualizing projection in the image, the
coordinate needs to be transformed back to original coordinate by multiplying K.
F
N
f ,i x f , j f x
i j
Mx
N
f ,i y f , j f y
i j
My
( xf ,i , y f ,i) ↔ ( xf , j , y f , j) ↔ ( xf ,k , y f ,k )
−1
k
−1
k
Figure 2: The f
th feature point in image Ii corresponds to a point in image Ij
. The
x and y coordinates of the correspondence is stored at (f, i) and (f, j) elements in Mx
and My, respectively. If (f, i) does not correspond to any point in image Ik, you set -1
to indicate no match.
4
CSCI 5980: Assignment #5
Bundle Adjustment
4 Camera Pose Estimation
You will write a camera pose estimation code that takes correspondences between two
images, Ii1 and Ii2 where i1 and i2 are the indices of the initial images to reconstruct
selected manually.
[R, C, X] = CameraPoseEstimation(u1, u2)
R and C: the relative transformation of the i2 image
u1 and u2: 2D-2D correspondences
As studied in HW #4, you will compute:
1. Fundamental matrix via RANSAC on correspondences, Mx(:,i1), My(:,i2)
2. Essential matrix from the fundamental matrix
3. Four configurations of camera poses given the essential matrix
4. Disambiguation via chierality (using 3D point linear triangulation):
X = LinearTriangulation(u, Pi
, v, Pj )
Write-up:
(a) Inlier matches
Top view Oblique view
(b) 3D camera pose
Figure 3: Camera pose estimation.
(1) Visualize inlier matches as shown in Figure 3(a).
(2) Visualize camera pose and 3D reconstructed points in 3D as shown in Figure 3(b).
5
CSCI 5980: Assignment #5
Bundle Adjustment
5 Nonlinear 3D Point Refinement
You will write a nonlinear triangulation code. Given the linear estimate for the point
triangulation, X, you will refine the 3D point X =
X Y Z T
to minimize geometric
error (reprojection error) via iterative nonlinear least squares estimation,
∆X =
∂f(X)
∂X
T
∂f(X)
∂X
!−1
∂f(X)
∂X
T
(b − f(X)). (1)
Write-up:
(1) Derive the point Jacobian, i.e., ∂f(X)j
∂X
and write the following code.
df dX = JacobianX(K, R, C, X)
(2) Write a code to refine the 3D point by minimizing the reprojection error and visualize reprojection error reduction similar to Figure 5.
X = NonlinearTriangulation(X, u1, R1, C1, u2, R2, C2)
Algorithm 2 Nonlinear Point Refinement
1: b =
u
T
1 u
T
2
T
2: for j = 1 : nIters do
3: Build point Jacobian, ∂f(X)j
∂X
.
4: Compute f(X).
5: ∆X =
∂f(X)
∂X
T ∂f(X)
∂X + λI
−1
∂f(X)
∂X
T
(b − f(X))
6: X = X + ∆X
7: end for
6
CSCI 5980: Assignment #5
Bundle Adjustment
6 Camera Registration
You will register an additional image, Ij using 2D-3D correspondences.
Write-up:
(1) (3D-2D correspondences) Given 3D triangulated points, find 2D-3D matches, X ↔
u.
(2) (Perspective-n-Point algorithm) Write a code that computes 3D camera pose from
3D-2D correspondences:
[R, C] = LinearPnP(u, X)
X: n × 3 matrix containing n 3D reconstructed points
u: n × 2 matrix containing n 2D points in the additional image I3
R and C: rotation and translation for the additional image.
Hint: After the linear solve, rectify the rotation matrix such that det(R) = 1 and scale
C according to the rectification.
7
CSCI 5980: Assignment #5
Bundle Adjustment
(3) (RANSAC PnP) Write a RANSAC algorithm for the camera pose registration (PnP)
given n matches using the following pseudo code:
Algorithm 3 PnP RANSAC
1: nInliers ← 0
2: for i = 1 : M do
3: Choose 6 correspondences, Xr and ur, randomly from X and u.
4: [Rr, tr] = LinearPnP(ur, Xr)
5: Compute the number of inliers, nr, with respect to Rr, tr.
6: if nr > nInliers then

7: nInliers ← nr

8: R = Rr and t = tr

9: end if

10: end for

Visualize 3D registered pose as shown in Figure 4.

-40

-20

0

20

(a) Front view

0

10

20

30

40

50

60

0 -80 -60 -40 -20 0 20 40

(b) Top view

Figure 4: Additional image registration.

(4) (Reprojection) Visualize measurement and reprojection to verify the solution.

8

CSCI 5980: Assignment #5

Bundle Adjustment

7 Nonlinear Camera Refinement

Given the initial estimate Ri and ti

, you will refine the camera pose to minimize

geometric error (reprojection error) via iterative nonlinear least squares estimation,

∆p =

∂f(p)

∂p

T

∂f(p)

∂p

!−1

∂f(p)

∂p

T

(b − f(p)),

f(p) =

u1/w1

v1/w1

.

.

.

un/wn

vn/wn

,

u

v

w

= Ri

I3 −C

X

1

, b =

x1

y1

.

.

.

xn

yn

(2)

where p =

CT q

T

T

. C ∈ R3

is the camera optical center and q ∈ S

3

is the

quaternion representation of the camera rotation.

It is possible to minimize the overshooting by adding damping, λ as follows:

∆p =

∂f(p)

∂p

T

∂f(p)

∂p

+ λI

!−1

∂f(p)

∂p

T

(b − f(p)), (3)

where λ is the damping parameter. You can try λ ∈ [0, 10].

Note that the conversion between quaternion and rotation matrix is given as follows:

R =

1 − 2q

2

z − 2q

2

y −2qzqw + 2qyqx 2qyqw + 2qzqx

2qxqy + 2qwqz 1 − 2q

2

z − 2q

2

x

2qzqy − 2qxqw

2qxqz − 2qwqy 2qyqz + 2qwqx 1 − 2q

2

y − 2q

2

x

,

q =

qw

(R32 − R23)/(4qw)

(R13 − R31)/(4qw)

(R21 − R12)/(4qw)

, where qw =

√

1 + R11 + R22 + R33

2

, kqk = 1 (4)

9

CSCI 5980: Assignment #5

Bundle Adjustment

Write-up:

(1) Derive the quaternion Jacobian to rotation using Equation (4), i.e., ∂R

∂q

and write

the following code. Note: ignore the normalization kqk = 1.

dR dq = JacobianQ(q)

(2) Derive the rotation Jacobian to projection using Equation (2), i.e., ∂f(p)j

∂R where

f(p)j =

uj/wj vj/wj

T

and write the following code. Note: use vectorized form of

the rotation matrix.

df dR = JacobianR(R, C, X)

(3) Derive the expression of ∂f(p)j

∂q

using the chain rule.

(4) Derive the camera center Jacobian to projection using Equation (2), i.e., ∂f(p)j

∂C

and

write the following code.

df dC = JacobianC(R, C, X)

(5) Write a code to refine the camera pose by minimizing the reprojection error and

visualize reprojection error reduction as shown in Figure 5:

[R, C] = PnP Nonlinear(R C, u, X)

Algorithm 4 Nonlinear Camera Pose Refinement

1: p =

CT q

T

T

2: for j = 1 : nIters do

3: C = p1:3, R=Quaternion2Rotation(q), q = p4:7

4: Build camera pose Jacobian for all points, ∂f(p)j

∂p =

h

∂f(p)j

∂C

∂f(p)j

∂q

i

.

5: Compute f(p).

6: ∆p =

∂f(p)

∂p

T ∂f(p)

∂p + λI

−1

∂f(p)

∂p

T

(b − f(p)) using Equation (3).

7: p = p + ∆p

8: Normalize the quaternion scale, p4:7 = p4:7/kp4:7k.

9: end for

10

CSCI 5980: Assignment #5

Bundle Adjustment

Measurement

Linear estimate (reproj: 0.199104)

Nonlinear estimate (reproj: 0.119272)

(a) Reprojection error

Measurement

Linear estimate (reproj: 0.199104)

Nonlinear estimate (reproj: 0.119272)

(b) Close up

Figure 5: Nonlinear refinement reduces the reprojection error (0.19→0.11).

8 Bundle Adjustment

You will write a nonlinear refinement code that simultaneously optimizes camera poses

and 3D points using the sparse nature of the Jacobian matrix.

[P, X] = BundleAdjustient(P, X, R, Mx, My)

For example, consider 3 camera poses and 2 points. The Jacobian matrix can be written

as follows:

J =

∂f(p1,X1)

∂p1

02×7 02×7

∂f(p1,X1)

∂X1

02×3

02×7

∂f(p2,X1)

∂p2

02×7

∂f(p2,X1)

∂X1

02×3

02×7 02×7

∂f(p3,X1)

∂p3

∂f(p3,X1)

∂X1

02×3

∂f(p1,X2)

∂p1

02×7 02×7 02×3

∂f(p1,X2)

∂X2

02×7

∂f(p2,X2)

∂p2

02×7 02×3

∂f(p2,X2)

∂X2

02×7 02×7

∂f(p3,X2)

∂p3

02×3

∂f(p3,X2)

∂X2

=

Jp JX

(5)

where Jp and JX are the Jacobian for camera and point, respectively, and λ ∈ [0, 10].

The normal equation, J

TJ∆x = J

T

(b − f(x)) can be decomposed into:

A B

BT D

∆pb

∆Xb

=

ep

eX

, (6)

where

A = J

T

pJp + λI, B = J

T

pJX, D = J

T

XJX + λI

ep = J

T

p

(b − f(x)), eX = J

T

X(b − f(x))

where pb =

p

T

1

· · · p

T

I

and Xb =

XT

1

· · · XT

M

where I and M are the number

of images and points, respectively.

11

CSCI 5980: Assignment #5

Bundle Adjustment

The decomposed normal equation in Equation (6) allows us to efficiently compute the

inverse of J

TJ using Schur complement of D:

∆pb = (A − BD−1B

T

)

−1

(ep − BD−1

eX),

∆Xb = D−1

(eX − B

T∆pb)

where D is a block diagonal matrix whose inverse can be efficiently computed by inverting small block matrix:

D =

d1

.

.

.

dM

, D−1 =

d

−1

1

.

.

.

d

−1

M

(7)

The bundle adjustment algorithm is summarized in Algorithm 5. Note that not all

points are visible from cameras. You need to reason about the visibility, i.e., if the

point is not visible from the camera, the corresponding Jacobian and measurement

from J and b will be omitted, respectively.

12

CSCI 5980: Assignment #5

Bundle Adjustment

Algorithm 5 Bundle Adjustment

1: pb =

p

T

1

· · · p

T

I

T

and Xb =

XT

1

· · · XT

M

2: for iter = 1 : nIters do

3: Empty Jp, JX, b, f, Dinv.

4: for i = 1 : M do

5: d = 03×3

6: for j = 1 : I do

7: if the i

th point is visible from the j

th image then

8: J1 = 02×7I and J2 = 02×3M

9: J1(:, 7(j − 1) + 1 : 7j) = ∂f(pj ,Xi)

∂pj

10: J2(:, 3(i − 1) + 1 : 3i) = ∂f(pj ,Xi)

∂Xi

11: Jp =

J

T

p J

T

1

T

and JX =

J

T

X J

T

2

T

12: d = d +

∂f(pj ,Xi)

∂Xi

T ∂f(pj ,Xi)

∂Xi

13: b =

b

T u

T

ij

14: f =

f

T x

T

ij

where µ

xij

1

= Rj

I −Cj

15: end if

16: end for

17: d = d + λI

18: Dinv = blkdiag(Dinv, d

−1

)

19: end for

20: ep = J

T

p

(b − f)

21: eX = J

T

X(b − f)

22: A = J

T

pJp + λI, B = J

T

pJX, D−1 = Dinv

23: ∆pb = (A − BD−1BT

)

−1

(ep − BD−1eX)

24: Normalize quaternions.

25: ∆Xb = D−1

(eX − BT∆pb)

26: end for

Write-up: You will first start with two images and 10 3D points to test your bundle

adjustment program.

(1) Derive Jp and JX.

(2) Run Algorithm 5 and visualize the reprojection error similar to Figure 5.

13

CSCI 5980: Assignment #5

Bundle Adjustment

9 Putting All Things Together

Write-up: You will run with all images and 3D points based on Algorithm 1.

(1) Visualize 3D camera pose and points as shown in Figure 6.

(2) Visualize reprojection for all images.

Top view Oblique view

Figure 6: You will reconstruct all images and 3D points using structure from motion.

14