Properties in C++
In object-oriented programming languages, encapsulation takes different forms. The Java and C++ approach is to write accessor and mutator (getter and setter) methods for every variable that needs to be read or modified outside the class's own methods.
In C++, one can return a const reference to a data member from an accessor to prevent benign client code from corrupting the data. Java, not having const, cannot prevent this at compile time, but can enforce access control at runtime through further mutators. For any given data member x, one can define an accessor getX and a mutator setX to control the value to which x may be set. This method breaks transparency without adding value to object-oriented design. C++ programmers tend not to use the prefixes get and set, but rather give the data member a different name, such as x_ or m_x. This keeps the name x free for the accessor (T x ()) and the mutator (void x (T)). Still, this does not help transparency very much. Consider the following code:
    // Public data members
    struct point
    {
      int x;
      int y;
    };

    point p = { 1, 2 };
    p.x += 20;

    // Private data members with accessor/mutator functions
    struct point
    {
      int x () const { return x_; }
      int y () const { return y_; }
      void x (int v) { x_ = v; }
      void y (int v) { y_ = v; }

      // Needs a constructor, due to the private data members
      point (int xv, int yv) : x_ (xv), y_ (yv) { }

      // Private data members make the class non-POD (under C++03 rules),
      // disabling many optimisations and making the class a whole
      // lot less useful (for example, offsetof is not guaranteed to
      // work, the objects cannot be passed through `...', the objects
      // cannot be cast to a reference to their first data member's
      // type, etc.).
    private:
      int x_;
      int y_;
    };

    // aggregate initialisers won't work anymore
    point p (1, 2);
    p.x (p.x () + 20); // more code to be written for the same effect
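The const-reference accessor mentioned in the introduction can be sketched as follows. This is a minimal illustration, not code from the article: the account class and its owner member are hypothetical names, and the validation rule in the mutator (rejecting empty strings) is an assumption made only to show where such a check would live.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical example class (not from the article).
class account
{
public:
  explicit account (std::string owner) : owner_ (owner) { }

  // Accessor: hands out a reference to const, so no copy is made and
  // well-behaved client code cannot modify the member through it.
  std::string const &owner () const { return owner_; }

  // Mutator: the single place where new values are checked.
  // Assumption for this sketch: empty names are silently rejected.
  void owner (std::string const &v) { if (!v.empty ()) owner_ = v; }

private:
  std::string owner_;
};

// The accessor's return type really is a reference to const.
static_assert (std::is_same<decltype (std::declval<account const &> ().owner ()),
                            std::string const &>::value,
               "accessor returns a const reference");
```

Client code reads with a.owner () and writes with a.owner ("bob"), which is exactly the extra call syntax that properties are meant to hide.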
Properties
Many modern programming languages provide a syntactic construct known as properties. Different languages have different ways of expressing the same idea: publicly accessible class data members with code attached that controls and validates input from client code. One may also attach code that calculates the actual value returned to the user. For instance, you might want to store the value in a database and transparently handle database queries using properties.
Previous attempts
Many have attempted to implement properties in C++. The common approach is to use a nested class. Emad Barsoum has implemented properties using a class template in which he stores the this pointer of the containing object and member function pointers to the get and set functions. This approach and its implementation have several serious flaws:
- Member function pointers
These are in fact small structs containing things like a vtable offset and the actual function's address. Calling through member function pointers is therefore a comparatively inefficient process.
- operator ValueType
This is an implementation issue. Consider a property of type bitmap, which consists of a 64KB matrix. operator ValueType copies the entire bitmap and its data every time it is used.
- operator =
The assignment operator alone is not enough. To have a fully functional property, all operators have to be supported. This is a minor flaw, as these can be added later.
- Storage of pointers
The property class stores two member function pointers that may be NULL and a this pointer that may also be NULL. On a 32-bit platform, this means 20 bytes of storage per property: 8 bytes per member function pointer and 4 bytes for the this pointer. On 64-bit platforms using the LP64, ILP64 or LLP64 data models, the size of that property is 40 bytes.
- High overhead
Due to the use of member function pointers and this pointers, a compiler generally cannot inline the calls. The property's pointers may even change after they were initialised, so even the best optimising compiler imaginable cannot inline anything. This means high overhead when dereferencing the pointers and even more overhead when calling the functions.
- Need to initialise
The property requires the containing class' constructor to initialise it. If a class has many properties, the constructor becomes one big mess. The implementation does not even use constructor initialiser lists, so in the worst case, initialising the properties will be two times six full machine word copies (in the best case, it's six). If you forget to initialise a property, your program will die with an assertion error in a getter or setter.
- Read/write semantics
Whether or not client code is allowed to call an accessor or mutator is checked at runtime. An assertion failure is the result of invalid access.
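To make the storage and indirection costs in the list above concrete, here is a rough sketch of a pointer-based property. It is not Emad Barsoum's code: pointer_prop, widget and the helper functions are hypothetical names, and the layout only approximates the design described above (one object pointer plus two member function pointers).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container with an ordinary getter and setter.
struct widget
{
  int width_;
  int get_width () const { return width_; }
  void set_width (int v) { width_ = v; }
};

// Hypothetical pointer-based property: everything is resolved at run time.
template<typename Class, typename T>
struct pointer_prop
{
  Class *object;                 // the container's this pointer
  T (Class::*getter) () const;   // member function pointer to the accessor
  void (Class::*setter) (T);     // member function pointer to the mutator

  T get () const { return (object->*getter) (); }
  void set (T v) { (object->*setter) (v); }
};

// Storage used by one such property on the current platform.
inline std::size_t pointer_prop_size ()
{
  return sizeof (pointer_prop<widget, int>);
}

// Writes a value through the property and reads it back.
inline int round_trip (int v)
{
  widget w = { 0 };
  pointer_prop<widget, int> p = { &w, &widget::get_width, &widget::set_width };
  p.set (v);
  return p.get ();
}
```

The exact size depends on the ABI, but the property can never be smaller than the three pointers it stores, and every access goes through two levels of indirection.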
Other approaches, such as The MSVC hack or Another MSVC hack, require compiler support and are therefore not acceptable as a generic solution.
The new approach
I present a zero-overhead approach whose property types have the very convenient benefit of being trivial standard-layout classes, also known as Plain Old Data (or POD). These properties do not, by themselves, store any bookkeeping data. Value properties use the memory required to store their value; non-value properties use the mandatory byte of memory that every empty struct is required to occupy. (This is, by the way, required so that every object has its own address.)
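The size claims above can be checked with a small sketch. The two structs are hypothetical stand-ins for a non-value property and a value property of type int; the C++11 type traits are used here as an approximation of the C++03 notion of POD, and the "no padding" result is what mainstream ABIs produce rather than something the standard guarantees.

```cpp
#include <cassert>
#include <type_traits>

struct empty_prop { };            // stand-in for a non-value property
struct int_prop { int value; };   // stand-in for a value property of type int

// Both stand-ins are trivial standard-layout classes.
static_assert (std::is_trivial<int_prop>::value
               && std::is_standard_layout<int_prop>::value,
               "int_prop is POD");
static_assert (std::is_trivial<empty_prop>::value
               && std::is_standard_layout<empty_prop>::value,
               "empty_prop is POD");

// An empty struct still occupies at least one byte.
static_assert (sizeof (empty_prop) >= 1, "empty structs have non-zero size");

// Two distinct empty objects have distinct addresses.
inline bool distinct_addresses ()
{
  empty_prop a, b;
  return &a != &b;
}

// True if wrapping an int adds no storage (expected on common ABIs).
inline bool no_extra_storage ()
{
  return sizeof (int_prop) == sizeof (int);
}
```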
The prop class template
The prop class template is parametrised by the container type, the value type, the accessor and mutator functions and an offset function. As you can see, I did not use member function pointers anywhere. Instead, the object is passed as first argument to a free (namespace scope) or static member function. The reason for this is, plainly put, speed. Current C++ compilers do a pretty bad job at optimising calls through member function pointers, but a terrific job when it comes to static functions.
    template<
      typename Class,
      typename T,
      T const & (get) (Class const &),
      void (set) (Class &, T const &),
      size_t (offset) ()
    >
    struct prop
    {
You may be wondering what the offset function is for. As I said earlier, this property has zero overhead, so it can't store the this pointer of its container. That is where the offset function comes in. We use it to calculate the location of the container's this in memory.
It is very important that this is a function, as we will see later.
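Passing a function as a template argument, as prop does for get, set and offset, can be illustrated in isolation. The following is a sketch with hypothetical names (static_caller, pointer_caller, twice); it only shows that the callee becomes part of the type, so the wrapper itself stores nothing.

```cpp
#include <cassert>

// Hypothetical stateless wrapper: the function to call is part of the
// type, so the call target is known at compile time.
template<int (fn) (int)>
struct static_caller
{
  int operator () (int v) const { return fn (v); }
};

// Pointer-based counterpart for comparison: the target is stored in the
// object and only known at run time.
struct pointer_caller
{
  int (*fn) (int);
  int operator () (int v) const { return fn (v); }
};

inline int twice (int v) { return 2 * v; }

// Calls the same function through both wrappers and adds the results.
inline int call_both (int v)
{
  static_caller<twice> a;
  pointer_caller b = { &twice };
  return a (v) + b (v);
}
```

Because static_caller<twice> has no data members, it is an empty class, whereas pointer_caller has to carry a function pointer around.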
The following two functions calculate the this pointer and return a reference to the parent object. Due to offset being a function and its location being known at compile time, it can easily be inlined.
      Class &self ()
      {
        return *reinterpret_cast<Class *> (reinterpret_cast<char *> (this) - offset ());
      }

      Class const &self () const
      {
        return *reinterpret_cast<Class const *> (reinterpret_cast<char const *> (this) - offset ());
      }
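The pointer arithmetic in self () can be tried out on its own. In this sketch the owner and child structs and the child_offset function are hypothetical; the technique is the same as above: subtract the member's byte offset from the member's address to get back to the enclosing object.

```cpp
#include <cassert>
#include <cstddef>

struct owner;

// Hypothetical member type that can find the object it lives in.
struct child
{
  owner &self ();
};

// Hypothetical container; child sits at a non-zero offset behind id.
struct owner
{
  int id;
  child c;
};

// Byte offset of the member c inside owner.
inline std::size_t child_offset () { return offsetof (owner, c); }

// Step back from the member's address to the start of the container.
inline owner &child::self ()
{
  return *reinterpret_cast<owner *> (reinterpret_cast<char *> (this) - child_offset ());
}

// True if the member recovers exactly the object that contains it.
inline bool recovers_owner ()
{
  owner o = { 42, { } };
  return &o.c.self () == &o && o.c.self ().id == 42;
}
```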
We need all assignment operators (=, +=, -=, *=, /=, %=, ^=, |=, &=, <<= and >>=). We can define them all, because prop is a class template and ISO/IEC 14882:2003(E), 14.7.1, clause 1 states that the implicit instantiation of a class template specialization causes the implicit instantiation of "the declarations, but not of the definitions or default arguments, of the class member functions". In other words, the member functions of a class template are instantiated lazily, on demand, so an operator that T does not support only causes an error if it is actually used.
      prop &operator = (T const &rhs)
      {
        set (self (), rhs);
        return *this;
      }

      prop &operator += (T const &rhs)
      {
        set (self (), get (self ()) + rhs);
        return *this;
      }

      // Analogous for the other operators
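The lazy instantiation rule can be demonstrated with a much smaller template. The box template and the opaque type below are hypothetical; opaque deliberately has no operator +, yet box<opaque> compiles as long as += is never called on it.

```cpp
#include <cassert>

// Hypothetical value type without any arithmetic operators.
struct opaque { int tag; };

template<typename T>
struct box
{
  T value;

  box &operator = (T const &rhs) { value = rhs; return *this; }

  // For T = opaque this body would not compile, but its definition is
  // only instantiated if the operator is actually used.
  box &operator += (T const &rhs) { value = value + rhs; return *this; }
};

// Uses only operator = on box<opaque>; operator += is never instantiated.
inline int assign_opaque ()
{
  box<opaque> b = { { 1 } };
  opaque o = { 7 };
  b = o;
  return b.value.tag;
}

// For int, operator += is valid and gets instantiated on first use.
inline int add_int ()
{
  box<int> b = { 2 };
  b += 3;
  return b.value;
}
```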
These operators define all we need for the mutator, but not for the accessor. Instead of defining all operators (unary +, unary -, binary +, binary *, etc.), we can define operator T const & and the C++ compiler will take care of required conversions:
operator T const & () const { return get (self ()); }
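How a single conversion operator covers the read side can be seen in a small sketch. The celsius struct and the two helper functions are hypothetical; the point is that the built-in arithmetic operators become usable because the object converts to its underlying type.

```cpp
#include <cassert>

// Hypothetical wrapper with one conversion operator and no arithmetic
// operators of its own.
struct celsius
{
  double degrees;
  operator double const & () const { return degrees; }
};

// Binary operators: c is converted to double before the built-in
// multiplication, division and addition are applied.
inline double to_fahrenheit (celsius const &c)
{
  return c * 9.0 / 5.0 + 32.0;
}

// Unary operators work through the same conversion.
inline double negate (celsius const &c)
{
  return -c;
}
```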
Now, all we need is a way to access members of the property. Then we are done:
      T *operator -> ()
      {
        return &const_cast<T &> (get (self ()));
      }

      T const *operator -> () const
      {
        return &get (self ());
      }
    };
Modifying a value through the result of this const_cast is undefined behaviour if the underlying object is actually const, but if the property itself is non-const, we know the value is non-const as well. This is a dire hack, but I don't see a better way to solve it. One way would be to pass a non-const accessor function as an additional template argument, but that would probably cause more code bloat than it is worth.
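The distinction that matters for the const_cast can be sketched briefly. The function names are hypothetical; the sketch shows the well-defined case only, where the object behind the const reference was never declared const.

```cpp
#include <cassert>

// Hypothetical accessor that only hands out a const view of its argument.
inline int const &read_only (int const &v) { return v; }

// The object itself is non-const, so casting the constness of the view
// away and writing through it is well defined.
inline int bump_through_cast ()
{
  int counter = 1;
  int const &view = read_only (counter);
  const_cast<int &> (view) += 1;
  return counter;
}

// Had counter been declared as `int const counter = 1;`, the same write
// would be undefined behaviour.
```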
Value properties
The above class template provides support for properties that do not store their own value. Often, we want the value to be stored, though, and we do not want the overhead of storing it in the containing class (remember: empty classes still need one byte of storage). Therefore, we introduce another class template, called value_prop, with different accessor and mutator function types:
    template<
      typename Class,
      typename T,
      T const & (get) (Class const &, T const &),
      void (set) (Class &, T &, T const &),
      size_t (offset) ()
    >
    struct value_prop
    {
Now, in addition to the container's this pointer, the accessor get and mutator set also receive a reference to the value stored in the instantiated property class template. get receives a const reference.
Now, the operator definitions look slightly different:
      value_prop &operator = (T const &rhs)
      {
        set (self (), value, rhs);
        return *this;
      }

      value_prop &operator += (T const &rhs)
      {
        set (self (), value, get (self (), value) + rhs);
        return *this;
      }
and the class has a public member variable called value. I decided to make it public so the instantiated class template would remain POD. This means one can, by directly accessing value, corrupt the class state, but benign client code will not do that.
Using properties
To use a property in a class, we need four things: the property object, the accessor, the mutator and, very importantly, the offset function.
    struct point
    {
      // The accessor, mutator and offset functions are declared before
      // the property object that names them as template arguments.

      // The X-coordinate
      static int const &get_x (point const &self, int const &property) { return property; }
      static void set_x (point &self, int &property, int const &value) { property = value; }
      static size_t offset_x () { return offsetof (point, x); }
      value_prop<point, int, get_x, set_x, offset_x> x;

      // The Y-coordinate
      int y_;
      static int const &get_y (point const &self) { return self.y_; }
      static void set_y (point &self, int const &value) { self.y_ = value; }
      static size_t offset_y () { return offsetof (point, y); }
      prop<point, int, get_y, set_y, offset_y> y;
    };
The offset returned by the offset_* functions is the number of bytes between the location of the property object and the this pointer of the enclosing class. We use that in the prop class template to calculate the memory location of the enclosing object. The reason this is a function is simple: we need to pass the offset to the property class template somehow. The first possibility that comes to mind is storing it in a static class member or a global variable and passing a pointer to that. This makes the declaration of a property messier, and forgetting to actually define the global variable gives annoying undefined reference errors. We cannot make the value an integral constant, because the class we are invoking offsetof on is incomplete when we want to do it inside the class. If it has to be a variable, it has to have external linkage so the template accepts it. A function is the best solution I could come up with: it can be inlined completely, it does not have extra declaration or definition overhead and it can be passed as a template argument.
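Putting the fragments above together gives roughly the following. This is a sketch assembled from the pieces shown so far, reduced to the = and += operators; the mixer container, its clamping rule and the writes counter are hypothetical and exist only to show that set really receives the enclosing object.

```cpp
#include <cassert>
#include <cstddef>

// value_prop reduced to the operators needed for this sketch.
template<
  typename Class,
  typename T,
  T const & (get) (Class const &, T const &),
  void (set) (Class &, T &, T const &),
  std::size_t (offset) ()
>
struct value_prop
{
  T value;   // public, so the class stays POD

  Class &self ()
  { return *reinterpret_cast<Class *> (reinterpret_cast<char *> (this) - offset ()); }
  Class const &self () const
  { return *reinterpret_cast<Class const *> (reinterpret_cast<char const *> (this) - offset ()); }

  value_prop &operator = (T const &rhs)
  { set (self (), value, rhs); return *this; }
  value_prop &operator += (T const &rhs)
  { set (self (), value, get (self (), value) + rhs); return *this; }

  operator T const & () const { return get (self (), value); }
};

// Hypothetical container: a level clamped to [0, 100] and a counter of
// how often the mutator ran.
struct mixer
{
  int writes;

  static int const &get_level (mixer const &, int const &property) { return property; }
  static void set_level (mixer &self, int &property, int const &value)
  {
    ++self.writes;   // reaches the container through self ()
    property = value < 0 ? 0 : (value > 100 ? 100 : value);
  }
  static std::size_t offset_level () { return offsetof (mixer, level); }

  value_prop<mixer, int, get_level, set_level, offset_level> level;
};

// Three writes through the property; the result combines the final
// level (30) and the number of mutator calls (3).
inline int demo_clamped_level ()
{
  mixer m = { 0, { 50 } };
  m.level = 250;     // clamped to 100
  m.level += -500;   // 100 - 500, clamped to 0
  m.level += 30;     // 0 + 30
  return m.level * 10 + m.writes;
}
```

Client code uses m.level like a plain int, aggregate initialisation still works, and on common ABIs the property occupies exactly the storage of the int it wraps.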
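The practiced reader will notice the use of offsetof and think "Hey, that will only work for POD structs". This is true: offsetof is applied to the containing class, so that class has to be POD. As I mentioned earlier, both prop and value_prop are POD structs, so adding properties to a POD class leaves it POD. For such classes, this is 100% standard compliant.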
ISO/IEC 14882:2003(E): 18.1 Types, clause 5:
The macro offsetof accepts a restricted set of type arguments in this International Standard. type shall be a POD structure or a POD union (clause 9).
ISO/IEC 14882:2003(E): 9 Classes, clause 4:
A POD-struct is an aggregate class that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-defined copy assignment operator and no user-defined destructor.
The above does not mean POD structs may not have member functions.
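That member functions do not affect POD-ness can be checked with the C++11 type traits, used here as an approximation of the C++03 definition quoted above. The pod_pair struct and the helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical aggregate with member functions and an operator, but no
// constructors, destructor, virtual functions or private data.
struct pod_pair
{
  int a;
  int b;

  int sum () const { return a + b; }
  pod_pair &operator += (int v) { a += v; b += v; return *this; }
};

// Member functions change neither triviality nor layout.
static_assert (std::is_trivial<pod_pair>::value, "pod_pair is trivial");
static_assert (std::is_standard_layout<pod_pair>::value, "pod_pair is standard-layout");

// offsetof is therefore well defined for this class.
inline std::size_t offset_of_b () { return offsetof (pod_pair, b); }

// Aggregate initialisation and the member functions work side by side.
inline int sum_after_add ()
{
  pod_pair p = { 1, 2 };
  p += 10;
  return p.sum ();
}
```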
Using macros
The above class definition is repetitive and error-prone, so we use the following macros:
    #define def_prop(T, Class, name, get, set) \
      static size_t offset_ ## name () { return offsetof (Class, name); } \
      static T const &get_ ## name (Class const &self) get \
      static void set_ ## name (Class &self, T const &value) set \
      ::prop<Class, T, get_ ## name, set_ ## name, offset_ ## name> name

    #define def_value_prop(T, Class, name, get, set) \
      static size_t offset_ ## name () { return offsetof (Class, name); } \
      static T const &get_ ## name (Class const &self, T const &property) get \
      static void set_ ## name (Class &self, T &property, T const &value) set \
      ::value_prop<Class, T, get_ ## name, set_ ## name, offset_ ## name> name
Now, we can rewrite the class definition of point as follows:
    struct point
    {
      def_value_prop (int, point, x,
        { return property; },
        { property = value; }
      );

      int y_;
      def_prop (int, point, y,
        { return self.y_; },
        { self.y_ = value; }
      );
    };
We use value properties as well as normal properties here. Extra caution has to be taken with aggregate initialisers when doing this. It will generally not be an issue, since classes with non-value properties are unlikely to get aggregate initialisation.
The usage of def_prop and def_value_prop is as follows:
    def_prop (type, type of parent, name, accessor code, mutator code);
    def_value_prop (type, type of parent, name, accessor code, mutator code);
The problem with these macros is that they do not support code like { int a, b, c; }, because the commas would be interpreted as macro argument delimiters. One solution would be to use GCC's variadic macros to implement an UNPAREN macro:
    #define UNPAREN_(x...) x
    #define UNPAREN(x) UNPAREN_ x
The advantage is that commas can be used outside parentheses inside accessor and mutator code; the disadvantage is that all code needs to be enclosed in parentheses, making the other code plain ugly (uglier than it already is).
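The UNPAREN trick can be tried on a toy macro. Note two assumptions in this sketch: it uses the standard __VA_ARGS__ form of variadic macros instead of GCC's named variant shown above, and DEFINE_FUNCTION is a hypothetical macro standing in for def_prop.

```cpp
#include <cassert>

// Strips one pair of parentheses from its argument.
#define UNPAREN_(...) __VA_ARGS__
#define UNPAREN(x) UNPAREN_ x

// Hypothetical stand-in for def_prop: defines a function whose body is
// passed in parentheses, so commas inside the body are harmless.
#define DEFINE_FUNCTION(name, body) inline int name () UNPAREN (body)

// Without the extra parentheses, the commas in `int a = 1, b = 2, c = 3`
// would split the body into several macro arguments.
DEFINE_FUNCTION (sum_abc, ({ int a = 1, b = 2, c = 3; return a + b + c; }))
```

The call site shows the cost mentioned above: every body now has to be wrapped in an additional pair of parentheses.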
A larger example
    #include <cstdio> // puts, printf; prop, value_prop and the macros are defined as above

    struct point
    {
      def_value_prop (int, point, x,
        { puts (__PRETTY_FUNCTION__); return property; },
        { puts (__PRETTY_FUNCTION__); property = value; }
      );

      int y_;
      def_prop (int, point, y,
        { puts (__PRETTY_FUNCTION__); return self.y_; },
        { puts (__PRETTY_FUNCTION__); self.y_ = value; }
      );
    };

    struct coordinate
    {
      def_value_prop (point, coordinate, x,
        { puts (__PRETTY_FUNCTION__); return property; },
        { puts (__PRETTY_FUNCTION__); property = value; }
      );

      point y_;
      def_prop (point, coordinate, y,
        { puts (__PRETTY_FUNCTION__); return self.y_; },
        { puts (__PRETTY_FUNCTION__); self.y_ = value; }
      );
    };

    int main ()
    {
      // Aggregate initialisers work as expected, even with classes
      // containing (classes containing properties) as properties
      // (parentheses added for disambiguation).
      coordinate p = { { 2, 3 }, { 4, 5 } };

      printf ("p.x->x == %d\n", (int) p.x->x); // Both accessors are called
      printf ("p.x->y == %d\n", (int) p.x->y);
      printf ("p.y->x == %d\n", (int) p.y->x);
      printf ("p.y->y == %d\n", (int) p.y->y);
    }
The output would look something like this:
    static const point& coordinate::get_x(const coordinate&, const point&)
    static const int& point::get_x(const point&, const int&)
    p.x->x == 2
    static const point& coordinate::get_x(const coordinate&, const point&)
    static const int& point::get_y(const point&)
    p.x->y == 3
    static const point& coordinate::get_y(const coordinate&)
    static const int& point::get_x(const point&, const int&)
    p.y->x == 4
    static const point& coordinate::get_y(const coordinate&)
    static const int& point::get_y(const point&)
    p.y->y == 5
Note that you need GCC for __PRETTY_FUNCTION__ to work.
Performance
As I mentioned in the introduction, I chose to use static member functions for efficiency reasons. Just to give you an idea of how efficient this zero overhead property template is, here is the assembler code for the following simple example:
C++ code:
    int main ()
    {
      coordinate p = { { 2, 3 }, { 4, 5 } };
      volatile int x = p.x->x;
    }
And the assembler output:
    main:
      xorl %eax, %eax     ; set return value to 0
      movl $2, -4(%rsp)   ; set the variable 'x' to 2
      ret                 ; return
You can see how the compiler completely inlined the calls to the accessors and then constant-folded the expression, simply copying an immediate integer value to a memory location on the stack.
Non-POD classes
The more interesting classes are all non-POD ones. I have tested the property class template with the strangest configurations I could think of. The only configuration where offsetof fails is this:
    struct A { virtual ~A () { } };
    struct B1 : virtual A { int i; };
    struct B2 : virtual A { };
    struct D : virtual B1, virtual B2 { };

    size_t offset = offsetof (D, i);  // This fails
    size_t offset = offsetof (B1, i); // This succeeds and is correct
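so in fact, the property class never fails: the offset_* functions always apply offsetof to the class that directly declares the property (B1 in the example above), never to a class derived from it. However, I have only tested it on the GNU C++ compiler, and applying offsetof to a non-POD class goes beyond what ISO/IEC 14882:2003 guarantees. Other compilers might mess up, but one thing is for certain: prop being POD allows for 100% standard compliant properties in POD classes. (Note that cfront, the first C++ compiler, would have done it right, so I don't really see any reason why any compiler could mess it up.)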