We present a method for deriving a time-evolving triangle mesh representation from a sequence of binary volumetric data capturing an arbitrary motion. Multi-view reconstruction studios use a multi-camera setup to turn an actor's performance into a time series of visual hulls. However, because each frame is reconstructed independently, the resulting sequence lacks temporal coherence, preventing easy post-production editing with off-the-shelf modeling tools. We propose an automated tracking approach that converts the raw input sequence into a single animated mesh. An initial mesh is globally deformed through as-rigid-as-possible, detail-preserving transformations guided by a motion flow estimated between consecutive frames. A local optimization step then refines the fit of the mesh surface to the current visual hull, yielding a robust 4D mesh reconstruction.