4D multiview reconstruction of moving actors has many applications in the entertainment industry, and although studios providing such services are becoming more accessible, the underlying technology still needs improvement to produce high-quality 4D content. In this paper, we enable surface matching across an animated mesh sequence in order to introduce temporal coherence into the data. The context is provided by an indoor multi-camera system that performs synchronized video capture from multiple viewpoints in a chroma key studio. Our input is given by a volumetric silhouette-based reconstruction algorithm that generates a visual hull at each frame of the video sequence. These 3D volumetric models differ from one frame to another in structure and topology, which makes them very difficult to use in post-production and 3D animation software. Our goal is to transform this input sequence of independent 3D volumes into a single dynamic volumetric structure, directly usable in post-production. These volumes are then converted into an animated mesh. Our approach is based on a motion estimation procedure: an unsigned distance function on the volumes serves as the main shape descriptor, and a 3D surface matching algorithm minimizes the interference between unrelated surface regions. Experimental results on our multiview datasets show that our method outperforms approaches based on optical flow in terms of robustness over several frames.
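The main shape descriptor mentioned above is an unsigned distance function defined on the volumes. As a minimal illustrative sketch (not the paper's implementation), such a descriptor on a binary voxel grid can be computed by measuring, at every voxel, the Euclidean distance to the nearest surface voxel; the function name and the brute-force formulation here are assumptions for illustration:

```python
import numpy as np

def unsigned_distance(volume):
    """Unsigned Euclidean distance from every voxel to the surface of a
    binary voxel volume (surface = occupied voxels with an empty 6-neighbour)."""
    occ = volume.astype(bool)
    # Pad with empty space so voxels on the grid border count as surface.
    padded = np.pad(occ, 1, constant_values=False)
    nb_empty = np.zeros_like(occ)
    for axis in range(3):
        for shift in (1, -1):
            # True where the 6-neighbour along this axis/direction is empty.
            nb_empty |= ~np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    boundary = occ & nb_empty
    coords = np.argwhere(boundary)                    # (K, 3) surface voxels
    grid = np.indices(volume.shape).reshape(3, -1).T  # (N, 3) all voxels
    # Brute-force nearest-surface distance; fine for small illustrative grids.
    d = np.sqrt(((grid[:, None, :] - coords[None, :, :]) ** 2).sum(-1)).min(1)
    return d.reshape(volume.shape)
```

A production pipeline would use a fast distance transform instead of this O(N·K) loop, but the resulting scalar field is the same kind of per-voxel descriptor: identical in value for matching shapes regardless of mesh connectivity, which is what makes it usable across topologically inconsistent visual hulls.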