Section 6 - Array Names and Pointers

2A.6.1 Memory Management

The most important single concept for any programmer is memory management. If you can safely and effectively control how memory is allocated and where your data really live, you have the power to control your destiny ... or at least your program.

The most important specific topic in the area of memory management is understanding arrays and pointers.  Arrays are covered in CS 2A, and are also in the assigned reading for this week. I won't cover them from an elementary standpoint in this session, but we will now look at arrays and how they relate to pointers.

2A.6.2 The Meaning of arrayName[n]

You probably think of an array as something that is declared like this:

double copeland[100];

I call this a fixed-size array or bracketed array.

With a fixed-size array you should visualize these things:

  1. A block of memory is allocated whose size is determined by the number inside the brackets, e.g. 100 doubles. These are accessed by using the syntax copeland[0] through copeland[99].
  2. The array name without brackets, e.g. copeland, should be thought of as a pointer to the zero'th element of the array. Another way to think of copeland is as the address of copeland[0], that is, copeland is the same as &copeland[0].
  3. The array name cannot be reassigned to anything, e.g., you cannot do this: copeland = gershwin.

Don't let the second item throw you.  You know that array elements are just individual variables.  So copeland[0] is just a double.  You also know what it means to take the address, &, of a variable.  Putting these two things together means that &copeland[0] is the address of the double variable copeland[0].
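To make the second item concrete, here is a minimal sketch that compares the two ways of writing the same address. The function name nameIsAddressOfFirst and the local pointer names are hypothetical, chosen only for this illustration.

```cpp
// Returns true when the bare array name and &copeland[0]
// turn out to be the same address.
bool nameIsAddressOfFirst()
{
  double copeland[100];

  double *viaName = copeland;          // array name used as an address
  double *viaAddress = &copeland[0];   // explicit address of element 0

  return viaName == viaAddress;
}
```

Calling nameIsAddressOfFirst() from main() should return true on any conforming compiler.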

An executable expression like

copeland[k] = 123;

is interpreted by the compiler as the directive: store the number 123 in the location k doubles past the beginning of the copeland array. Technically this becomes:

*(copeland + k) = 123;

You are possibly puzzled at the concept of adding an int to an array name. I'll walk you through it.

copeland[0] is equivalent to *copeland

As I said a moment ago, the array name, copeland, is the address of the zero'th element.  That means one way to access copeland[0] is through the alternative notation *copeland. After all, if you have an address and you want to get at the variable it points to, that's what you do.  So copeland[0] and *copeland are the same thing and can be used interchangeably.

copeland[1] is equivalent to *(copeland + 1)

Since copeland is the address of the zero'th element of the array, copeland + 1 is one location past that address (not one byte, mind you, but one full location of whatever type the array holds: in this case, doubles). So copeland + 1 is simply the address of copeland[1].  As with any pointer or address, we can use the dereference operator, *, to access that element, itself.  This explains how *(copeland +1) is the same as copeland[1].
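The "one location, not one byte" point can be checked with a small sketch. The helper names bytesPerStep and plusOneIsElementOne are assumptions made for this example; the cast to char pointers is only there so the distance can be measured in bytes.

```cpp
#include <cstddef>

// Number of bytes between copeland and copeland + 1.
std::size_t bytesPerStep()
{
  double copeland[100];

  // View both addresses as byte addresses so the difference is in bytes.
  const char *start = reinterpret_cast<const char *>(copeland);
  const char *next = reinterpret_cast<const char *>(copeland + 1);

  return static_cast<std::size_t>(next - start);
}

// True when copeland + 1 is the address of copeland[1].
bool plusOneIsElementOne()
{
  double copeland[100];
  return (copeland + 1) == &copeland[1];
}
```

bytesPerStep() should equal sizeof(double), which is commonly 8, rather than 1.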

copeland[k] is equivalent to *(copeland + k)

Now replace the 1 by k in the last analysis and you will  see that we are accessing the location k items past the beginning of the array, so *(copeland+k) is the same as copeland[k].  We can use either syntax in a program:

copeland[k] = 123;

or

*(copeland+k) = 123;
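As a quick check that the two notations reach the same element, the following sketch stores with one and reads back with the other. The function name notationsAgree is hypothetical, the value 456 is arbitrary, and the sketch assumes k is between 0 and 99.

```cpp
// Stores with one notation and reads back with the other.
// Assumes 0 <= k <= 99.
bool notationsAgree(int k)
{
  double copeland[100] = {0.0};

  copeland[k] = 123;                         // store with brackets
  bool firstOk = (*(copeland + k) == 123);   // read with pointer notation

  *(copeland + k) = 456;                     // store with pointer notation
  bool secondOk = (copeland[k] == 456);      // read with brackets

  return firstOk && secondOk;
}
```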

Professional C++ programmers often resort to this less readable pointer notation to access array elements, but I don't advise you to follow their lead.  Instead, understand the pointer notation and be ready to use it in places where it reveals something important about the logic.  Most of the time, however, use the brackets notation in your programs.  (To encourage you to form this good habit, if you use the * notation to access an array element unnecessarily, I will deduct a point.)

2A.6.3 Relationship Between Arrays and Pointers

Arrays and pointers bear a close relationship to one another.

The simplest situation which demonstrates the relationship is this:

#include <iostream>
#include <string>
using namespace std;

int main()
{
  int dogs[100], *dogPtr;

  dogs[3] = -21;          // always OK
  // dogPtr[3] = -21;     // compiles, but run-time error

  dogPtr = dogs;          // always OK
  dogPtr[3] = -21;        // now OK

  // dogs = dogPtr;       // won't compile - dogs not lval

  // prove that we can use either notation:
  cout << dogs[3] << " " << dogPtr[3] << endl;
  return 0;
}

With output:

-21 -21

In this example, both dogs and dogPtr can be used as array names in expressions like dogs[3] = -21 or dogPtr[3] = -21. However, the latter will yield a run-time error (or worse) if it occurs before dogPtr has been made to point at allocated memory.

Before you can dereference a pointer variable with * or [], you must make sure it points to actual allocated memory.  One way is through the new operator (fp = new float;).  Another way is through an assignment statement (dogPtr = dogs).
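Both ways of giving a pointer something to point at can be sketched as follows. The function names viaNew and viaAssignment are hypothetical, and the delete statement is an addition not discussed above; it is included only so the sketch releases what new allocated.

```cpp
// Way 1: point at memory obtained from the new operator.
float viaNew()
{
  float *fp;

  fp = new float;   // fp now points to one allocated float
  *fp = 2.5f;       // safe to dereference
  float result = *fp;

  delete fp;        // assumption: release the float when done
  return result;
}

// Way 2: point at memory that a bracketed array already owns.
int viaAssignment()
{
  int dogs[100] = {0}, *dogPtr;

  dogPtr = dogs;    // dogPtr now points to dogs[0]
  dogPtr[3] = -21;  // safe to dereference
  return dogs[3];   // the same element, seen through the array name
}
```

Called from main(), viaNew() returns 2.5 and viaAssignment() returns -21.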

Declaring a bracketed array like dogs allocates memory. The amount of memory is dictated by the number inside the brackets: int dogs[100] allocates 100 ints.

Declaring a pointer allocates no memory for the ints or floats, but it does always allocate one location for the pointer itself.
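One way to see the difference in what each declaration allocates is to ask sizeof about each variable. This is a sketch under the assumption that sizeof is an acceptable stand-in for "how much was allocated"; the function names arrayBytes and pointerBytes are hypothetical.

```cpp
#include <cstddef>

// Bytes set aside by the bracketed array declaration.
std::size_t arrayBytes()
{
  int dogs[100];
  return sizeof(dogs);     // room for all 100 ints
}

// Bytes set aside by the pointer declaration.
std::size_t pointerBytes()
{
  int *dogPtr = nullptr;
  return sizeof(dogPtr);   // room for one pointer, no ints at all
}
```

The exact numbers depend on the platform, but arrayBytes() is always 100 * sizeof(int), while pointerBytes() is the size of a single int pointer.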

On the other hand, a pointer is a "free agent" and may change what it points to at any time. A bracketed array is dedicated to always point to the 0th element of the array it controls. That cannot be changed. One way to say this is that the array name is an illegal l-value, or lval (i.e., it cannot be placed on the LHS, or "left hand side," of an assignment operator).
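The "free agent" idea can be illustrated with a pointer that is aimed first at one array and then at another. The second array, cats, and the function name retarget are hypothetical names introduced only for this sketch.

```cpp
// Aims one pointer at two different arrays in turn.
int retarget()
{
  int dogs[100] = {0};
  int cats[100] = {0};   // hypothetical second array
  int *p;

  dogs[3] = -21;
  cats[3] = 7;

  p = dogs;              // p points to dogs[0]
  int first = p[3];      // reads dogs[3]

  p = cats;              // legal: a pointer can be re-aimed
  int second = p[3];     // reads cats[3]

  // dogs = cats;        // won't compile: an array name cannot be reassigned

  return first + second;
}
```

Here retarget() returns -14, the sum of dogs[3] and cats[3].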

The following table summarizes the difference between a bracketed array, and a pointer which potentially points to an array.  This table uses the base primitive type of int for the array.

Bracketed ("static") Array vs. Pointer

int a[100]                      int *b
------------------------------  -----------------------------------
100 ints allocated              0 ints allocated
0 pointers allocated            1 pointer allocated
a illegal lval                  b legal lval
Can always be dereferenced,     Must first be assigned (like b = a)
a[k] or *(a+k)                  before it can be dereferenced,
                                b[k] or *(b+k)

In an expression, both a and b can be de-referenced using brackets, a[3], b[3], or pointer notation, *a, *b, *(a+4), *b++, and so on.
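Of the forms listed, *b++ is the least obvious: it reads the int that b currently points to and then advances b to the next int. The sketch below uses it to total an array purely to illustrate the notation; the function name sumWithPointer and its parameters are hypothetical, and in ordinary course work the bracket form would be the usual choice.

```cpp
// Adds up count ints, starting at the address held in b.
int sumWithPointer(const int *b, int count)
{
  int total = 0;

  for (int k = 0; k < count; k++)
    total += *b++;   // read the current int, then move b forward one int

  return total;
}
```

A caller can pass a bracketed array directly, e.g. sumWithPointer(a, 100) for int a[100], because the array name supplies the address of a[0].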