time solution on commodity hardware. As a result, Dynam-
icFusion is the first system capable of real-time dense recon-
struction in dynamic scenes using a single depth camera.
The remainder of this paper is structured as follows. Af-
ter discussing related work, we present an overview of Dy-
namicFusion in Section 2 and provide technical details in
Section 3. We provide experimental results in Section 4 and
conclude in Section 5.
1.1. Related Work
While no prior work achieves real-time, template-free,
non-rigid reconstruction, there are two categories of closely
related work: 1) real-time non-rigid tracking algorithms,
and 2) offline dynamic reconstruction techniques.
Real-time non-rigid template tracking. The vast ma-
jority of non-rigid tracking research focuses on human body
parts, for which specialised shape and motion templates are
learnt or manually designed. The best of these demonstrate
high-accuracy, real-time performance capture for tracking
faces [16, 3], hands [21, 20], complete bodies [27], or gen-
eral articulated objects [23, 33].
Other techniques directly track and deform more gen-
eral mesh models. [12] demonstrated the ability to track
a statically acquired low-resolution shape template and upgrade its appearance with high-frequency geometric details
not present in the original model. Recently, [37] demon-
strated an impressive real-time version of a similar tech-
nique, using GPU accelerated optimisations. In that sys-
tem, a dense surface model of the subject is captured while the subject remains static, yielding a template for use in their real-time tracking pipeline. This separation into template generation and tracking limits the system to objects and scenes
that are completely static during the geometric reconstruc-
tion phase, precluding reconstruction of things that won’t
reliably hold still (e.g., children or pets).
Offline simultaneous tracking and reconstruction of
dynamic scenes. There is a growing literature on offline
non-rigid tracking and reconstruction techniques. Several
researchers have extended ICP to enable small non-rigid
deformations, e.g., [1, 2]. Practical advancements to pair-
wise 3D shape and scan alignment over larger deformations
make use of reduced deformable model parametrisations
[14, 4]. In particular, embedded deformation graphs [25]
use a sparsely sampled set of transformation basis func-
tions that can be efficiently and densely interpolated over
space. Quasi-rigid reconstruction has also been demon-
strated [15, 35] and hybrid systems, making use of a known
kinematic structure (e.g., a human body), are able to per-
form non-rigid shape denoising [36]. Other work combines
non-rigid mesh template tracking and temporal denoising
and completion [13], but does not obtain a single consistent
representation of the scene.
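The embedded deformation graphs of [25] mentioned above can be sketched as distance-weighted blending of sparse per-node transforms, densely interpolated over space. The toy example below illustrates only that interpolation idea; the node layout, Gaussian weights, and the restriction to pure translations are our simplifying assumptions, not the formulation of [25] (which blends full local transformations):

```python
import math

# Hypothetical deformation graph: each node has a position and, for
# simplicity, a pure translation (the full method blends local affine
# or rigid transforms). Weights fall off with distance to the node.
nodes = [
    {"pos": (0.0, 0.0, 0.0), "t": (0.1, 0.0, 0.0)},
    {"pos": (1.0, 0.0, 0.0), "t": (0.0, 0.2, 0.0)},
    {"pos": (0.0, 1.0, 0.0), "t": (0.0, 0.0, 0.0)},
]

def warp_point(p, nodes, sigma=0.5, k=2):
    """Deform point p by blending the transforms of its k nearest nodes."""
    dists = sorted(((math.dist(p, n["pos"]), n) for n in nodes),
                   key=lambda d: d[0])
    nearest = dists[:k]
    # Gaussian weights, normalised over the k nearest nodes.
    ws = [math.exp(-(d * d) / (2 * sigma * sigma)) for d, _ in nearest]
    total = sum(ws) or 1.0
    blended = [0.0, 0.0, 0.0]
    for w, (_, n) in zip(ws, nearest):
        for i in range(3):
            blended[i] += (w / total) * n["t"][i]
    return tuple(p[i] + blended[i] for i in range(3))

# A point midway between the first two nodes receives an equal blend
# of their translations.
print(warp_point((0.5, 0.0, 0.0), nodes))
```

Because the basis is sparse but the weights vary smoothly, a handful of nodes suffices to define a deformation at every point in the volume.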
More closely related to our work are template-free tech-
niques. An intriguing approach to template-free non-rigid
alignment, introduced in [17] and [26], treats each non-
rigid scan as a view from a 4D geometric observation and
performs 4D shape reconstruction. [30, 29] reconstruct
a fixed topology geometry by performing pair-wise scan
alignment. [24] use a space-time solid incompressible flow
prior that results in watertight reconstructions and is effective against noisy input point-cloud data. [28] introduce animation cartography, which also estimates shape and a per-frame deformation by developing a dense correspondence matching scheme seeded with sparse landmark matches. Recent work using multiple fixed Kinect cameras [8, 7] demonstrates larger-scale non-rigid reconstruction by
densely tracking and fusing all depth map data into a novel
directional distance function representation.
All of these techniques require three to four orders of
magnitude more time than is available within a real-time
setting.
2. DynamicFusion Overview
DynamicFusion decomposes a non-rigidly deforming
scene into a latent geometric surface, reconstructed into a
rigid canonical space S ⊆ R³; and a per-frame volumetric warp field that transforms that surface into the live frame.
There are three core algorithmic components to the system
that are performed in sequence on arrival of each new depth
frame:
1. Estimation of the volumetric model-to-frame warp
field parameters (Section 3.3)
2. Fusion of the live frame depth map into the canonical
space via the estimated warp field (Section 3.2)
3. Adaptation of the warp-field structure to capture newly
added geometry (Section 3.4)
Figure 2 provides an overview.
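Sketched as code, the per-frame sequence above might look as follows; all function bodies here are placeholders standing in for the components detailed in Section 3, not the actual implementation:

```python
# Hypothetical skeleton of the DynamicFusion per-frame loop. The three
# step functions are stand-ins for Sections 3.3, 3.2, and 3.4.

def estimate_warp_field(warp_field, canonical_model, depth_frame):
    """Step 1: estimate model-to-frame warp parameters (Section 3.3)."""
    warp_field["frame"] = depth_frame  # stand-in for the real solver
    return warp_field

def fuse_depth(canonical_model, warp_field, depth_frame):
    """Step 2: fuse depth into the canonical space via the warp (3.2)."""
    canonical_model.append(depth_frame)  # stand-in for volumetric fusion
    return canonical_model

def extend_warp_field(warp_field, canonical_model):
    """Step 3: adapt the warp-field structure to new geometry (3.4)."""
    return warp_field

def process_frame(canonical_model, warp_field, depth_frame):
    # The three steps run in sequence on each new depth frame.
    warp_field = estimate_warp_field(warp_field, canonical_model, depth_frame)
    canonical_model = fuse_depth(canonical_model, warp_field, depth_frame)
    warp_field = extend_warp_field(warp_field, canonical_model)
    return canonical_model, warp_field

model, warp = [], {}
for frame in ["depth_0", "depth_1"]:
    model, warp = process_frame(model, warp, frame)
print(len(model))
```

The ordering matters: fusion (step 2) uses the warp estimated in step 1, and only afterwards is the warp-field structure extended to cover geometry that entered the canonical model for the first time.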
3. Technical Details
We will now describe the components of DynamicFusion
in detail. First, we describe our dense volumetric warp-field
parametrisation. This allows us to model per-frame defor-
mations in the scene. The warp-field is the key extension
over static state space representations used in traditional re-
construction and SLAM systems, and its estimation is the
enabler of both non-rigid tracking and scene reconstruction.
3.1. Dense Non-rigid Warp Field
We represent dynamic scene motion through a volumet-
ric warp-field, providing a per-point 6D transformation W : S ↦ SE(3). Whereas a dense 3D translation field would be sufficient to describe time-varying geometry, we
have found that representing the real-world transformation