Abstract: | Abstracting complex 3D shapes with simple geometric primitives is a long-standing idea. Applications such as modeling, design, recognition, robotic grasping, and computer graphics benefit from parsimonious, low-dimensional, interpretable shape representations. However, the idea was repeatedly abandoned for lack of computational power and data, until advances in hardware and computer vision allowed it to resurface.
While many works are dedicated to this problem, most of them require supervision, rely on geometric primitives of low descriptive capacity, or operate on impractical data, such as depth images or multi-view images, that cannot properly and directly capture depth and shape details. In this thesis, we tackle the problem with an unsupervised, learning-based approach, using superquadric surfaces as the de facto geometric primitive. We work with point clouds, a very popular data format that can be easily captured by sensing devices and inherently incorporates depth information. To take advantage of the success of convolutional neural networks, we use a volumetric representation of the input point clouds. Contrary to most works, we introduce a model based on sparse convolutions, eliminating the memory disadvantages of volumetric approaches and allowing a significantly deeper architecture.
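As a brief illustration (a sketch, not code from the thesis), a superquadric is commonly described by its inside-outside function: points evaluating below 1 lie inside the surface, exactly 1 on it, and above 1 outside. The parameter names `scale` and `eps` below are illustrative; the two exponents control how the shape morphs between ellipsoid-, box-, and octahedron-like forms.

```python
import numpy as np

def superquadric_implicit(p, scale=(1.0, 1.0, 1.0), eps=(1.0, 1.0)):
    """Standard inside-outside function of an origin-centered superquadric.

    p     : 3D point (x, y, z)
    scale : semi-axis lengths (a1, a2, a3)
    eps   : shape exponents (eps1, eps2); eps1 = eps2 = 1 gives an ellipsoid

    Returns < 1 inside the surface, == 1 on it, > 1 outside.
    """
    # Normalize by the semi-axes; abs() exploits the shape's symmetry.
    x, y, z = np.abs(np.asarray(p, dtype=float)) / np.asarray(scale, dtype=float)
    e1, e2 = eps
    return (x ** (2.0 / e2) + y ** (2.0 / e2)) ** (e2 / e1) + z ** (2.0 / e1)
```

With unit semi-axes and both exponents equal to 1, this reduces to the unit sphere equation, so `superquadric_implicit((1, 0, 0))` evaluates to 1.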
We first diagnose potential problems with the model and attempt to remedy them by employing popular architectural and optimization techniques. We then evaluate our models on three categories of the ShapeNet dataset, using common metrics to assess their quality. Finally, we make empirical observations about the model's behaviour and test these observations by experimenting with missing and noisy data, demonstrating the model's robustness under harsh conditions.
|