Menpo’s Data Types¶
Menpo is a high level software package. It is not a replacement for scikit-image, scikit-learn, or opencv - it ties all these types of packages together in to a unified framework for building and fitting deformable models. As a result, most of our algorithms take as input a higher level representation of data than simple numpy arrays.
Why have data types - what’s wrong with numpy arrays?¶
Menpo’s data types are thin wrappers around numpy arrays. They give semantic
meaning to the underlying array through providing clearly named and consistent
properties. As an example let’s take a look at PointCloud
, Menpo’s
workhorse for spatial data. Construction requires a numpy array:
x = np.random.rand(3, 2)
pc = PointCloud(x)
It’s natural to ask the question:
Is this a collection of three 2D points, or two 3D points?
In Menpo, you never do this - just look at the properties on the pointcloud:
pc.n_points # 3
pc.n_dims # 2
If we take a look at the properties we can see they are trivial:
@property
def n_points(self):
return self.points.shape[0]
@property
def n_dims(self):
return self.points.shape[1]
Using these properties makes code much more readable in algorithms accepting Menpo’s types. Let’s imagine a routine that does some operation on an image and a related point cloud. If it accepted numpy arrays, we might see something like this on the top line:
def foo_arrays(x, img):
# preallocate the result
y = np.zeros(x.shape[1],
x.shape[2],
img.shape[-1])
...
On first glance it is not at all apparent what y
‘s shape is semantically.
Now let’s take a look at the equivalent code using Menpo’s types:
def foo_menpo(pc, img):
# preallocate the result
y = np.zeros(pc.n_dims,
pc.n_points,
img.n_channels)
...
This time it’s immediately apparent what y
‘s shape is. Although this is a
somewhat contrived example, you will find this pattern applied consistently
across Menpo, and it aids greatly in keeping the code readable.
Key points¶
1. Containers store the underlying numpy array in an easy to access
attribute. For the PointCloud
family see the .points
attribute. On
Image
and subclasses, the actual data array is stored at .pixels
.
2. Importing assets though menpo.io will result in our data
containers, not numpy arrays. This means in a lot of situations you never
need to remember the Menpo conventions for ordering of array data - just ask
for an image and you will get an Image
object.
3. All containers copy data by default. Look for the copy=False
keyword
argument if you want to avoid copying a large numpy array for performance.
4. Containers perform sanity checks. This helps catch obvious bugs like
misshaping an array. You can sometimes suppress them for extra performance with
the skip_checks=True
keyword argument.