Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks

Primitive-based representations seek to infer semantically consistent part arrangements across different object instances. Existing primitive-based methods rely on simple shapes for decomposing complex objects into parts such as cuboids, superquadrics, spheres or convexes. Due to their simple parametrization, these primitives have limited expressivity and cannot capture arbitrarily complex geometries. Therefore, existing part-based methods require a large number of primitives for extracting geometrically accurate reconstructions. However, using more primitives comes at the expense of less interpretable reconstructions. Namely, a primitive is not an identifiable part anymore.

We introduce a novel 3D primitive representation that is defined as a deformation between shapes and is parametrized as a learned homeomorphic mapping implemented with an Invertible Neural Network (INN). We argue that a primitive should be a non trivial genus-zero shape with well defined implicit and explicit representations. Using an INN allows us to efficiently compute the implicit and explicit representation of the predicted shape and impose various constraints on the predicted parts. In contrast to prior work, that directly predict the primitive parameters (i.e. centroids and sizes for cuboids and superquadrics and hyperplanes for convexes), we employ the INN to fully define each primitive. This allows us to have primitives that capture arbitrarily complex geometries, hence the ability of our model to parse objects into expressive shape abstractions that are more geometrically accurate using an order of magnitude fewer primitives compared to approaches that rely on simple convex shape primitives.

Given an input image and a watertight mesh of the target object we seek to learn a representation with M primitives that best describes the target object. We define our primitives via a deformation between shapes that is parametrized as a learned homeomorphism implemented with an Invertible Neural Network (INN). For each primitive, we seek to learn a homeomorphism between the 3D space of a simple genus-zero shape and the 3D space of the target object, such that the deformed shape matches a part of the target object. Due to its simple implicit surface definition and tesselation, we employ a sphere as our genus-zero shape. Note that using an INN allows us to efficiently compute the implicit and explicit representation of the predicted shape and impose various constraints on the predicted parts.

In the following interactive visualization, the naming of the parts has been done manually. However, the model had no part supervision during training. The semantic parts have emerged naturally from reconstructing the geometry.

Humans

Show all parts heads bodies left-arms right-arms left-legs right-legs

Planes

Show all parts noses bodies left-wings right-wings tails

We compare the representation power of Neural Parts to other primitive-based methods by evaluating the reconstruction quality with varying number of primitives on three datasets. We observe that our model is more geometrically accurate, more semantically consistent and yields more meaningful parts (i.e. primitives are identifiable parts such as thumbs, legs, wings, tires, etc.) compared to simpler primitives.

We observe that Neural Parts consistently use the same primitive for representing the same object part regardless of the breadth of the part's motion. Notably, this temporal consistency is an emergent property of our method and not one that is enforced with any kind of loss.

This research was supported by the Max Planck ETH Center for Learning Systems.