2013-01-18

If you come from another language, strings in Objective-C are surprising in one respect : objects of the string class, NSString, are immutable. Their value cannot be changed.

Objective-C also has Constant Strings. Just as C embeds string literals in the binary, Objective-C embeds constant objects of the NSString class. Of course, Constant Strings are immutable.

Let’s look at some of the surprising details of these objects.

Throughout this post, whenever I display more info about an expression, I use a little macro that prints :

  • the class of the result object,
  • its address,
  • and its retain count. 1

This macro, as well as test code for this post, is available as a gist.

A closer look at NSString

So what does a Constant NSString look like ?

NSString * a = @"a";
a :
    __NSCFConstantString 0x10c2fe0d8      -1

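Constant strings are of the special subclass __NSCFConstantString and have a retain count of “-1”, which means “infinite retain count”. (-retainCount returns an unsigned integer, so what we see is the largest possible value printed as a signed number.) These objects are actually part of the executable itself : they can never be deallocated.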

The compiler is your friend, and gives you the same object when you specify the same string in code :

NSString * a_too = @"a";
a_too :
    __NSCFConstantString 0x10c2fe0d8      -1

In C parlance, the two pointers are equal. In other words :

assert(@"a"==@"a"); // always.

It’s not just about saving a few bytes of memory : when comparing two strings, the first step is to compare the pointers. Thanks to this behaviour, that cheap check succeeds quite often.
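To make the "compare the pointers first" idea concrete, here is a minimal sketch in plain C. It is not Foundation's code : str_equal is a hypothetical helper, and C strings stand in for NSString objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Foundation function): equality test with a
 * pointer fast path. If both arguments are the very same object, there is
 * no need to look at a single character. */
bool str_equal(const char *lhs, const char *rhs)
{
    if (lhs == rhs) {
        return true;            /* same object, or both NULL */
    }
    if (lhs == NULL || rhs == NULL) {
        return false;           /* exactly one of them is NULL */
    }
    return strcmp(lhs, rhs) == 0;   /* slow path: compare the contents */
}
```

The more often equal strings are also the same object, the more often the comparison stops at the first test.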

Similarly, -copying a Constant string returns the original object :

[@"a" copy] :
    __NSCFConstantString 0x10c2fe0d8      -1 //same address

Let's see what happens when we make a mutable copy :

[@"a" mutableCopy] :
    __NSCFString         0x7fdf1ac10630   1

A real string object is created, with its own address on the heap, and a nice, brand-new-object retain count of 1.
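The contrast between the two kinds of copy can be sketched in plain C. This is only an illustration under my own assumptions : imm_string and both functions are hypothetical names, not anything from Foundation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable string type, for illustration only. */
typedef struct {
    const char *bytes;
} imm_string;

/* Copying an immutable value: nobody can change it, so handing back the
 * original object is indistinguishable from making a real copy. */
const imm_string *imm_string_copy(const imm_string *s)
{
    return s;
}

/* A mutable copy needs private storage, because the caller is allowed to
 * modify it. The caller owns the result and must free() it. */
char *imm_string_mutable_copy(const imm_string *s)
{
    size_t size = strlen(s->bytes) + 1;     /* include the terminator */
    char *buffer = malloc(size);
    if (buffer != NULL) {
        memcpy(buffer, s->bytes, size);
    }
    return buffer;
}
```

The immutable copy costs nothing and keeps the same address, while the mutable copy has to allocate on the heap.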

Now what if we copy this object :

[[@"a" mutableCopy] copy] :
    __NSCFConstantString 0x7fff768183d0   -1

NSString just returned a Constant String out of nowhere !

Is there something special about @"a"? What if we build the "a" string in a more roundabout way :

[NSString stringWithFormat:@"%s","a"] :
    __NSCFConstantString 0x7fff768183d0   -1
[[[@"path/a" lastPathComponent] mutableCopy] copy] :
    __NSCFConstantString 0x7fff768183d0   -1

We still get the Constant String object !

Let’s try some others :

[[@"b" mutableCopy] copy] :
    __NSCFConstantString 0x7fff7682b9f0   -1
[[@"$" mutableCopy] copy] :
    __NSCFConstantString 0x7fff7682b770   -1
[[@"/" mutableCopy] copy] :
    __NSCFConstantString 0x7fff76822310   -1
[[@"€" mutableCopy] copy] :
    __NSCFString         0x1001076b0      1
[[@"§" mutableCopy] copy] :
    __NSCFString         0x100106e30      1
[[@"ab" mutableCopy] copy] :
    __NSCFString         0x100103230      1

Well, apparently, single-ASCII-character NSStrings are very special :

When you build a “single-ASCII-character” NSString in your code, what you get is the constant from the framework.
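One way to picture this behaviour is a table of preallocated one-character strings that the creation function consults before allocating anything. The sketch below is a toy model in plain C, not a description of how Foundation really does it : toy_string, toy_string_create and toy_string_release are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy immutable string. Every name in this block is hypothetical. */
typedef struct {
    int         shared;   /* 1: lives in the table below and is never freed */
    size_t      length;
    const char *bytes;
} toy_string;

static char       shared_bytes[128][2];  /* "x\0" for each ASCII code */
static toy_string shared_table[128];     /* one shared object per ASCII code */
static int        shared_ready = 0;

/* Fill the table once (not thread-safe, this is only a sketch). */
static void shared_init(void)
{
    for (int c = 0; c < 128; c++) {
        shared_bytes[c][0] = (char)c;
        shared_bytes[c][1] = '\0';
        shared_table[c].shared = 1;
        shared_table[c].length = 1;
        shared_table[c].bytes  = shared_bytes[c];
    }
    shared_ready = 1;
}

/* Single ASCII character: return the shared object. Anything else: allocate. */
const toy_string *toy_string_create(const char *text)
{
    size_t length = strlen(text);
    unsigned char first = (unsigned char)text[0];

    if (length == 1 && first < 128) {
        if (!shared_ready) {
            shared_init();
        }
        return &shared_table[first];
    }

    /* Header and characters in one heap block. */
    toy_string *s = malloc(sizeof *s + length + 1);
    if (s == NULL) {
        return NULL;
    }
    char *storage = (char *)(s + 1);
    memcpy(storage, text, length + 1);
    s->shared = 0;
    s->length = length;
    s->bytes  = storage;
    return s;
}

/* Shared objects ignore release, a bit like an infinite retain count. */
void toy_string_release(const toy_string *s)
{
    if (s != NULL && !s->shared) {
        free((void *)s);
    }
}
```

In this model, creating "a" twice yields the same pointer, while "ab" or a non-ASCII character gets a fresh heap object each time. Sharing is only safe because the objects can never be modified.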

Let’s dig a little deeper.

Yummy, memory

Here’s what the memory around 0x7fff768183d0 looks like :

0x7fff768183d0:383E8176FF7F0000 C807000000000000 E4F7688CFF7F0000 0100000000000000 
0x7fff768183f0:383E8176FF7F0000 C807000000000000 E6F7688CFF7F0000 0200000000000000 
0x7fff76818410:383E8176FF7F0000 C807000000000000 E9F7688CFF7F0000 0300000000000000 
0x7fff76818430:383E8176FF7F0000 C807000000000000 EDF7688CFF7F0000 1700000000000000 

What do we have here ? First thing to notice : the same structure repeats every 32 bytes. What we see is a bunch of Objective-C objects, each 32 bytes in size. The first group of 8 bytes is a pointer (always the same one), and the third group is a pointer as well.

In Objective-C, the first ivar in an object is the isa pointer, which points to the Class of the object. 2

(lldb) po 0x7FFF76813E38
$0 = 140735181569592 __NSCFConstantString
(lldb) po [__NSCFConstantString class]
$5 = 0x00007fff76813e38 __NSCFConstantString

As expected, this is a pointer to our __NSCFConstantString class.
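To go from the bytes of the dump to the address typed in lldb, the eight bytes have to be read in little-endian order, least significant byte first. A small sketch in plain C, where read_le64 is a hypothetical helper of my own :

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble a 64-bit value from 8 bytes stored in
 * little-endian order (bytes[0] is the least significant byte). */
uint64_t read_le64(const unsigned char bytes[8])
{
    uint64_t value = 0;
    for (size_t i = 0; i < 8; i++) {
        value |= (uint64_t)bytes[i] << (8 * i);
    }
    return value;
}
```

Fed with the first group of the dump (38 3E 81 76 FF 7F 00 00), it gives 0x7FFF76813E38, the address passed to po above.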

What about the other pointer in our object ?

(lldb) p (char*) 0x7FFF8C68F7E4
(char *) $9 = 0x00007fff8c68f7e4 "a"

Well, here’s our data : a C string. These are the symbols for Foundation’s constant strings.
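Putting the pieces together, each 32-byte record of the dump can be read as a C struct of four 8-byte fields. This is a sketch, not an official definition : only the first and third fields are identified in this post. The names info and length are guesses : the second group is 0x7C8 in every record, and the fourth is 1 for the "a" record, which is at least consistent with a character count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One record of the dump, as four 64-bit words. Field names are
 * hypothetical, and info and length are assumptions (see above). */
typedef struct {
    uint64_t isa;      /* pointer to the __NSCFConstantString class */
    uint64_t info;     /* assumed: flags, 0x7C8 in every record shown */
    uint64_t bytes;    /* pointer to the C string holding the characters */
    uint64_t length;   /* assumed: number of characters */
} const_string_record;

/* The dump repeats every 32 bytes, so the struct must be 32 bytes too. */
static_assert(sizeof(const_string_record) == 32, "record should be 32 bytes");

/* Address of the record at a given index, from the address of the first. */
uint64_t record_address(uint64_t first, unsigned index)
{
    return first + (uint64_t)index * sizeof(const_string_record);
}
```

Starting from 0x7fff768183d0, this yields 0x7fff768183f0, 0x7fff76818410 and 0x7fff76818430 for the next records, which matches the left column of the dump.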

OK, to sum up

As we’ve seen, NSString can do some tricky optimizations with constant strings. This is actually possible because NSString is immutable. All of this would be much harder if all string objects were mutable.

Since NSString is a Class Cluster, none of the objects we’ve seen is just an NSString or an NSMutableString. 3 Rather than having a big string class that can handle every use, we have a number of specific implementations, tailored for every corner case.

  1. Of course, you should never use -retainCount.

  2. It’s a little-endian machine : pointers in memory are “backwards”. 

  3. Actually, it’s very hard, if not pointless, to create an object of the NSString class.