Information hiding

This text is about the principle known as information hiding, data hiding, or (not entirely correctly) encapsulation. The examples use the C programming language because it illustrates the principle particularly well. This is a bit ironic: of all languages in wide use today, C is said to be the least object-oriented, assembly language aside.

Introduction

A simple example of information hiding would be a string object:

struct string
{
  char *rep;
  size_t len;
  size_t capacity;
};

This struct knows its length and has a data pointer pointing at the in-memory string representation. It also knows its maximum capacity, so it can allocate more memory on demand. In order to avoid corruption of the internal state (in this case, the len member), we write accessor functions, for example:

size_t strLength (struct string const *str);
char const *strToCStr (struct string const *str);
char strCharAt (struct string const *str, size_t pos);
void strSetCharAt (struct string *str, size_t pos, char c);
void strAddChars (struct string *str, char const *chars);

These functions operate on the data structure declared above. Functions that take a pointer-to-const promise not to modify the structure.
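
As a rough illustration, the simpler accessors could be implemented as follows. This is only a sketch: the bounds checks with assert and the assumption that rep is NUL-terminated are ours, not something the declarations above prescribe.

```c
#include <assert.h>
#include <stddef.h>

struct string
{
  char *rep;        // character data, assumed to be NUL-terminated
  size_t len;       // number of characters, excluding the terminator
  size_t capacity;  // bytes allocated for rep
};

// Report the length without letting callers touch the len member.
size_t
strLength (struct string const *str)
{
  return str->len;
}

// Read-only view of the characters.
char const *
strToCStr (struct string const *str)
{
  return str->rep;
}

// Bounds-checked read (the check is an assumption of this sketch).
char
strCharAt (struct string const *str, size_t pos)
{
  assert (pos < str->len);
  return str->rep[pos];
}

// Bounds-checked write; this is the only place that stores into rep.
void
strSetCharAt (struct string *str, size_t pos, char c)
{
  assert (pos < str->len);
  str->rep[pos] = c;
}
```

Because every read and write goes through one of these functions, a rule such as the bounds check lives in exactly one place instead of being repeated at every call site.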

Information hiding means hiding as much knowledge as possible from client code. As part of this, we could define new type aliases to hide the fact that a string is represented by a pointer to a string structure:

typedef struct string *String;
typedef struct string const *ConstString; // Immutable string

By doing this, we allow the representation to change at any time. For example, we could choose to represent String objects by a plain data pointer that does not know its size, or by a data pointer preceded by its size.

See <a href="c/strings">C string representations</a> if you are interested in other string representations.

Now, our accessor functions look like this:

size_t strLength (ConstString str);
char const *strToCStr (ConstString str);
char strCharAt (ConstString str, size_t pos);
void strSetCharAt (String str, size_t pos, char c);
void strAddChars (String str, char const *chars);

We not only abstracted away some knowledge, we also reduced the amount of typing required.

Violations of the principle

There are various more or less obvious ways to violate the principle of information hiding. An obvious example would be directly accessing the structure's members:

// Wrong:
void
OnAdded (String str)
{
  strncat (str->rep, " has been added", str->capacity - str->len);
  str->len = strlen (str->rep);
}
// Right:
void
OnAdded (String str)
{
  strAddChars (str, " has been added");
}

The function strAddChars hides the knowledge about String's internal structure and abstracts away the logic needed to add characters to the string. The resulting code is more descriptive and less error-prone. Instead of hand-rolling our own string concatenation every time, we write a single testable function that does it for us. The version marked "Wrong" truncates the added text if the string does not have enough room for it.
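
To make the contrast concrete, here is one way strAddChars might be written so that it grows the buffer instead of truncating. It is a sketch under several assumptions of ours: rep is either NULL or was obtained from malloc/realloc, capacity counts bytes including the terminator, chars does not point into the string itself, and on allocation failure the string is simply left unchanged.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct string
{
  char *rep;
  size_t len;
  size_t capacity;
};
typedef struct string *String;

void
strAddChars (String str, char const *chars)
{
  assert (str != NULL && chars != NULL);

  size_t add = strlen (chars);
  size_t needed = str->len + add + 1;   // +1 for the terminating NUL

  if (needed > str->capacity)
    {
      // Grow geometrically so that repeated appends stay cheap.
      size_t newCap = str->capacity ? str->capacity : 16;
      while (newCap < needed)
        newCap *= 2;

      char *newRep = realloc (str->rep, newCap);
      if (newRep == NULL)
        return;                         // assumption: leave str unchanged
      str->rep = newRep;
      str->capacity = newCap;
    }

  memcpy (str->rep + str->len, chars, add + 1);   // copies the NUL as well
  str->len += add;
}
```

Callers such as the OnAdded function marked "Right" do not change at all when the growth strategy does, which is the point of hiding it.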

A less obvious violation of information hiding would be pointer operations on String objects. Consider our String to be defined as char*. This type allows several operations that are built into C:

typedef char *String;
typedef char const *ConstString;

void
someFunc (String str)
{
  printf ("The second character is `%c'\n", str[1]); // Index
  str += 20; // Addition
  printf ("The twenty-second character is `%c'\n", str[1]); // Index
  printf ("The twenty-first character is `%c'\n", *str); // Dereference
  String newStr = str - 10; // Subtracting integer
  printf ("The difference between newStr and str is %td\n", newStr - str); // Pointer difference (ptrdiff_t)
}

All these operations are defined on char *, and therefore also on String. This does not have to be the case, though. If String were a pointer to an incomplete struct string (see the next section), indexing, dereferencing, and pointer arithmetic would all cause compilation errors; with the complete struct string * from earlier they would compile, perhaps with warnings, but would no longer address characters. Using these operations requires knowledge about the internal representation of String and is therefore in violation of the information hiding principle.

Enforcing encapsulation in C

C provides a very good way to enforce information hiding: opaque pointers. Opaque here means that client code cannot touch or even see the internal structure. C allows us to forward-declare structures and pass around pointers to them:

struct string;
typedef struct string *String;
typedef struct string const *ConstString;
// All of the above accessor functions still work
// The OnAdded function marked with "Right" also still works, but the one
//   marked "Wrong" will cause a compilation error

String is now what we call a pointer to an incomplete type. Incomplete means that it does exist as a type, but its representation is not known to the compiler. Using opaque pointers is a good way of preventing direct member access in client code.
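
The following sketch shows how the pieces might be split between a public header and an implementation file; both are shown in one listing for brevity. Since client code cannot allocate an incomplete type, some kind of constructor and destructor is needed. The names strNew and strFree are hypothetical and do not appear in the accessor list above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// ---- Public header: everything client code gets to see ----
struct string;                            // incomplete type, no members
typedef struct string *String;
typedef struct string const *ConstString;

String strNew (char const *init);         // hypothetical constructor
void strFree (String str);                // hypothetical destructor
size_t strLength (ConstString str);
char const *strToCStr (ConstString str);

// ---- Implementation file: the only place that knows the layout ----
struct string
{
  char *rep;
  size_t len;
  size_t capacity;
};

String
strNew (char const *init)
{
  assert (init != NULL);

  String str = malloc (sizeof *str);
  if (str == NULL)
    return NULL;

  str->len = strlen (init);
  str->capacity = str->len + 1;           // room for the terminating NUL
  str->rep = malloc (str->capacity);
  if (str->rep == NULL)
    {
      free (str);
      return NULL;
    }
  memcpy (str->rep, init, str->capacity);
  return str;
}

void
strFree (String str)
{
  if (str != NULL)
    {
      free (str->rep);
      free (str);
    }
}

size_t
strLength (ConstString str)
{
  return str->len;
}

char const *
strToCStr (ConstString str)
{
  return str->rep;
}
```

A translation unit that includes only the header part can call these functions, but an expression such as str->len or sizeof (struct string) is rejected by the compiler there, because the type is incomplete at that point.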

Advantages

There are definite advantages of information hiding and those are good reasons to apply it to your code.

Disadvantages

This all sounds great, so we may ask ourselves why everybody doesn't encapsulate everything. There are also disadvantages to consider when designing software.

The compromise

In C, there are a few ways to speed up and shrink the code without violating information hiding. One of these is the use of macros. Instead of defining a function:

size_t strLength (ConstString str);

we define a macro doing the same:

#define strLength(s)  ((s)->len)

This has disadvantages as well. You can no longer use opaque pointers, because the macro needs the structure definition, so you lose the protection against ABI breakages and the faster incremental builds. It is also no longer type-safe, as you could pass a pointer to any structure with a len member to this macro. It still avoids API breakages, though. This way, you can have some form of encapsulation even in speed-critical applications.
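
The type-safety problem can be demonstrated with a small sketch. The struct packet type and the helper packetLenViaMacro are made up for illustration. The static inline function strLengthInline is likewise our own addition: it is another way to avoid call overhead that keeps the type check, although, like the macro, it needs the full structure definition and so also rules out opaque pointers.

```c
#include <assert.h>
#include <stddef.h>

struct string
{
  char *rep;
  size_t len;
  size_t capacity;
};
typedef struct string const *ConstString;

// Macro accessor, parenthesised so it is safe inside larger expressions.
#define strLength(s)  ((s)->len)

// Hypothetical alternative: an inline function keeps the parameter type.
static inline size_t
strLengthInline (ConstString str)
{
  assert (str != NULL);
  return str->len;
}

// An unrelated structure that merely happens to have a len member.
struct packet
{
  unsigned char *payload;
  size_t len;
};

// This compiles: the macro accepts a pointer to anything with a len member.
size_t
packetLenViaMacro (struct packet const *p)
{
  return strLength (p);
}
```

Replacing strLength (p) with strLengthInline (p) in the last function makes the compiler diagnose an incompatible pointer type, which is the check the macro gives up.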

Information hiding in other languages

I chose C, as I said, because it is the most suitable language for illustrating encapsulation. This does not mean that other languages do not have equally good or even better ways to encapsulate data.