Arrays and Array Processing

There are many ways to store data. So far, we have covered linear recursive structures (lists) and binary recursive structures (trees). Let's consider another way of storing data: as a contiguous, numbered (indexed) set of data storage elements:

anArray =

TABLE 1
element	itemA	itemB	itemC	itemD	itemE	itemF	itemG	itemH	itemI	itemJ
index	0	1	2	3	4	5	6	7	8	9

This "array" of elements allows us to access any individual element using a numbered index value.

DEFINITION 1: array: In its most basic form, a random access data structure where any element can be accessed by specifying a single index value corresponding to that element.
EXAMPLE
anArray[4] gives us itemE. Likewise, the statement anArray[7] = 42 replaces itemH with 42.

NOTE:

Notice, however, that the above definition is not a recursive definition. This will cause problems.

Arrays in Java

Arrays...
- are contiguous (in memory) sets of object references (or values, for primitives),
- are objects,
- are dynamically created (via new), and
- may be assigned to variables of type Object.
An array object contains zero or more unnamed variables of the same type. These variables are commonly called the elements of the array.
A non-negative integer is used to name each element. For example, arrayOfInts[i] refers to the (i+1)th element in the arrayOfInts array. In computer-ese, an array is said to be a "random access" container, because you can directly (and I suppose, randomly) access any element in the array.
An array has a limited amount of intelligence. For instance, it knows its length at all times, e.g. arrayOfInts.length.
Arrays have the advantage that they
- provide random access to any element,
- are fast, and
- require minimal amounts of memory.
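
As a minimal sketch of the points above, the following creates the ten-element array from Table 1 (using strings for the items), reads and writes elements by index, and asks the array for its length. The class name ArrayBasicsDemo and the helper makeItems are made up for illustration.

```java
// Illustrative sketch only; ArrayBasicsDemo and makeItems are hypothetical names.
class ArrayBasicsDemo {
    // Builds the ten-element array from Table 1, using strings for the items.
    static String[] makeItems() {
        return new String[] { "itemA", "itemB", "itemC", "itemD", "itemE",
                              "itemF", "itemG", "itemH", "itemI", "itemJ" };
    }

    public static void main(String[] args) {
        String[] anArray = makeItems();
        System.out.println(anArray[4]);      // index 4 names the fifth element
        anArray[7] = "42";                   // overwrite the element at index 7
        System.out.println(anArray[7]);
        System.out.println(anArray.length);  // the array knows its own length
    }
}
```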

More information on arrays can be found on the Java Resources web site page on arrays.

REMEMBER:

Arrays are size and speed at a price.

Array Types

An array type is written as the name of an element type followed by one or more empty pairs of square brackets.
- For example, int[] is the type corresponding to a one-dimensional array of integers.
An array's length is not part of its type.
The element type of an array may be any type, whether primitive or reference, including interface types and abstract class types.

Array Variables

Array variables are declared like other variables: a declaration consists of the array's type followed by the array's name. For example, double[][] matrixOfDoubles; declares a variable whose type is a two-dimensional array of double-precision floating-point numbers.
Declaring a variable of array type does not create an array object. It only creates the variable, which can contain a reference to an array.
Because an array's length is not part of its type, a single variable of array type may contain references to arrays of different lengths.
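
A small sketch of these three points, with invented names (ArrayVariableDemo, lengthsSeen): the declaration alone creates no array, the same variable can later refer to arrays of different lengths, and an array can be assigned to a variable of type Object.

```java
// Illustrative sketch only; ArrayVariableDemo and lengthsSeen are hypothetical names.
class ArrayVariableDemo {
    // Returns the lengths observed while one variable refers to two different arrays.
    static int[] lengthsSeen() {
        int[] data;               // declares a variable only; no array object exists yet
        data = new int[3];        // now the variable refers to an array of length 3
        int first = data.length;
        data = new int[10];       // same variable, different array, different length
        int second = data.length;
        return new int[] { first, second };
    }

    public static void main(String[] args) {
        int[] seen = lengthsSeen();
        System.out.println(seen[0] + " then " + seen[1]);

        Object o = new double[2][2];   // arrays are objects, so this assignment is legal
        System.out.println(o instanceof double[][]);
    }
}
```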
To complicate declarations, C/C++-like syntax is also supported, for example,
```
double rowvector[], colvector[], matrix[][];
```
This declaration is equivalent to
```
double[] rowvector, colvector, matrix[];
```
or
```
double[] rowvector, colvector;
double[][] matrix;
```
Please use the latter!

Array Creation

Array objects, like other objects, are created with new. For example, String[] arrayOfStrings = new String[10]; declares a variable whose type is an array of strings, and initializes it to hold a reference to an array object with room for ten references to strings.

Another way to initialize array variables is


int[] arrayOf1To5 = { 1, 2, 3, 4, 5 };
String[] arrayOfStrings = { "array",
                            "of",
                            "String" };
Widget[] arrayOfWidgets = { new Widget(), new Widget() };
int[][] arrayOfArrayOfInt = {{ 1, 2 }, { 3, 4 }};

Once an array object is created, it never changes length!
The array's length is available as a final instance variable length. For example,
```
int[] arrayOf1To5 = { 1, 2, 3, 4, 5 };
System.out.println(arrayOf1To5.length);
```
would print "5".
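
Two consequences of the rules above are easy to miss, so here is a hedged sketch (ArrayCreationDemo and grow are invented names). First, new String[3] creates room for three references, but each one starts out as null. Second, since the length is fixed, "growing" an array really means creating a longer one and copying the contents, here with the standard java.util.Arrays.copyOf.

```java
import java.util.Arrays;

// Illustrative sketch only; ArrayCreationDemo and grow are hypothetical names.
class ArrayCreationDemo {
    // Returns a longer copy of source; the original array is left unchanged.
    static int[] grow(int[] source, int newLength) {
        return Arrays.copyOf(source, newLength);  // extra slots hold the default value 0
    }

    public static void main(String[] args) {
        String[] names = new String[3];
        System.out.println(names[0]);             // reference elements start out as null

        int[] small = { 1, 2, 3 };
        int[] bigger = grow(small, 5);
        System.out.println(small.length);         // the original keeps its length
        System.out.println(Arrays.toString(bigger));
    }
}
```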

Array Accesses

Indices for arrays must be int values that are greater than or equal to 0 and less than the length of the array. Remember that computer scientists always count starting at zero, not one!
All array accesses are checked at run time: An attempt to use an index that is less than zero or greater than or equal to the length of the array causes an ArrayIndexOutOfBoundsException (a kind of IndexOutOfBoundsException) to be thrown.
Array elements can be used on either side of an equals sign:
- myArray[i] = aValue;
- someValue = myArray[j];
Accessing elements of an array is fast and the time to access any element is independent of where it is in the array.
Inserting elements into an array is very slow because all the other elements following the insertion point have to be moved to make room, if that is even possible.
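
The following sketch illustrates both the run-time bounds check and why insertion is costly. The names ArrayAccessDemo, isOutOfBounds and insertAt are invented for this example. insertAt assumes pos is a valid index and uses the standard System.arraycopy to shift the later elements; because the array cannot grow, the last element is simply lost.

```java
// Illustrative sketch only; ArrayAccessDemo and its methods are hypothetical names.
class ArrayAccessDemo {
    // Returns true if reading data[index] throws the run-time bounds exception.
    static boolean isOutOfBounds(int[] data, int index) {
        try {
            int unused = data[index];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    // Inserts value at pos (assumed valid), shifting later elements one slot right.
    // The last element falls off the end because the length cannot change.
    static void insertAt(int[] data, int pos, int value) {
        System.arraycopy(data, pos, data, pos + 1, data.length - pos - 1);
        data[pos] = value;
    }

    public static void main(String[] args) {
        int[] data = { 1, 2, 3, 4, 5 };
        System.out.println(isOutOfBounds(data, 5));
        insertAt(data, 1, 9);
        System.out.println(java.util.Arrays.toString(data));
    }
}
```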

Array Processing Using Loops

More information on loops can be found at the Java Resources web site page on loops.

The main technique used to process arrays is the for loop. A for loop is a way of processing each element of the array in a sequential manner.

Here is a typical for loop:


// Sum the elements in an array of ints, myArray.
int sum = 0;  // initialize the sum

for(int idx=0; idx < myArray.length; idx++) {  //start idx @ 0; end idx at length-1;
                                               //increment idx every time the loop is processed.
 sum += myArray[idx];  // add the idx'th element of myArray to the sum
}

There are a number of things going on in the above for loop:

Before the loop starts, the index idx is being declared and initialized to zero. idx is visible only within the for loop itself (its header and the body between the curly braces).
At the beginning of each loop iteration, the index idx is being tested in a "termination condition"; in this case, idx is compared to the length of the array. If the termination condition evaluates to false, the loop will immediately terminate.
During each loop iteration, the value of the idx'th element in the array is being added to the running sum.
After each loop iteration, the index idx is being incremented.

One can traverse an array in any direction one wishes:


// Sum the elements in an array of ints, myArray.
int sum = 0; // initialize the sum

for(int idx=myArray.length-1; 0<=idx; idx--) { //start idx @ length-1; end idx at 0;
                                               //decrement idx every time the loop is processed. 
    sum += myArray[idx]; // add the idx'th element of myArray to the sum
}

The above loop sums the array just as well as the first example, but it does it from back to front. Note, however, that we had to be a little careful about how we initialized the index and how we set up the termination condition.
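
For readers who want to run the two loops, here is a sketch that wraps each in a method. The class name ArraySumDemo and the method names are assumptions made for illustration; the loop bodies are the ones shown above.

```java
// Illustrative sketch only; ArraySumDemo and its methods are hypothetical names.
class ArraySumDemo {
    // Sums the elements from index 0 up to length-1.
    static int sumForward(int[] myArray) {
        int sum = 0;
        for (int idx = 0; idx < myArray.length; idx++) {
            sum += myArray[idx];
        }
        return sum;
    }

    // Sums the elements from index length-1 down to 0.
    static int sumBackward(int[] myArray) {
        int sum = 0;
        for (int idx = myArray.length - 1; 0 <= idx; idx--) {
            sum += myArray[idx];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] myArray = { 1, 2, 3, 4, 5 };
        System.out.println(sumForward(myArray));
        System.out.println(sumBackward(myArray));
    }
}
```

Both methods return 0 for an empty array, because the termination condition is already false before the first iteration.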

Here's a little more complicated loop:


// Find the index of the smallest element in an array of ints, myArray.
int minIdx = 0; // initialize the index. Must be declared outside the loop.
int j;          // loop index; declared here because a for-loop initializer cannot mix an assignment with a declaration

if(0==myArray.length) throw new NoSuchElementException("Empty array!"); // no good if array is empty! (needs import java.util.NoSuchElementException)
else {
    for(minIdx = 0, j = 1; j<myArray.length; j++) { //start minIdx @ 0, start index @ 1 ;
                                                    //end index at length-1; increment index every time the loop is processed. 
        if(myArray[minIdx] > myArray[j])
            minIdx = j; // found new minimum
    }
}

Some important things to notice about this algorithm:

The empty case must be checked explicitly — no polymorphism to help you out here!
The desired result index cannot be declared inside the for loop because otherwise it won't be visible to the outside world.
Be careful about using the minIdx value if the array was indeed empty--it's an invalid value! It can't be set to a valid value because otherwise you can't tell the difference between a value that was never set and one that was.
The for loop has two initialization statements separated by a comma.
The loop does work correctly if the array only has one element, but only because the termination check is done before the loop body.
Notice that to prove that this algorithm works properly, one must make separate arguments about the empty case, the one-element case and the n-element case. Contrast this to the much simpler list algorithm, which needs only an empty case and a non-empty case.
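
One possible way to package this algorithm, sketched under the assumption that it lives in its own method (MinIndexDemo and indexOfMin are invented names): returning the index from a method sidesteps the visibility issue, and the three cases discussed above become easy to exercise separately.

```java
import java.util.NoSuchElementException;

// Illustrative sketch only; MinIndexDemo and indexOfMin are hypothetical names.
class MinIndexDemo {
    // Returns the index of the first smallest element; throws if the array is empty.
    static int indexOfMin(int[] myArray) {
        if (myArray.length == 0) {
            throw new NoSuchElementException("Empty array!");
        }
        int minIdx = 0;                          // start with the first element
        for (int j = 1; j < myArray.length; j++) {
            if (myArray[minIdx] > myArray[j]) {
                minIdx = j;                      // found a new, strictly smaller minimum
            }
        }
        return minIdx;
    }

    public static void main(String[] args) {
        System.out.println(indexOfMin(new int[] { 5, 3, 8, 3 }));  // n-element case
        System.out.println(indexOfMin(new int[] { 7 }));           // one-element case
    }
}
```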

For convenience, Java (since version 5.0) offers a compact syntax for traversing all the elements of an array or of any object whose class implements the Iterable interface:


MyType[] myArray;  // array is initialized with data somewhere

for(MyType x: myArray){
    // code involving x, i.e. each element in the array
}

It is important to remember that this syntax is meant for processing every element in an array (or an Iterable object) when the index itself is not needed. For an array, the elements are visited in index order, from 0 to length-1; for an Iterable object, the order is whatever that object's iterator supplies.
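
A short sketch of the compact syntax on an int array follows, with invented names (ForEachArrayDemo and its methods). It also shows a limitation worth knowing: the loop variable holds a copy of each element, so assigning to it does not change the array. To modify elements, one still needs an index.

```java
// Illustrative sketch only; ForEachArrayDemo and its methods are hypothetical names.
class ForEachArrayDemo {
    // Sums every element using the compact syntax.
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Has no effect on the array: v is only a copy of each element.
    static void tryToDouble(int[] values) {
        for (int v : values) {
            v = v * 2;
        }
    }

    // Doubles every element in place, which requires an index.
    static void doubleInPlace(int[] values) {
        for (int i = 0; i < values.length; i++) {
            values[i] = values[i] * 2;
        }
    }

    public static void main(String[] args) {
        int[] values = { 1, 2, 3 };
        tryToDouble(values);
        System.out.println(sum(values));   // array is unchanged
        doubleInPlace(values);
        System.out.println(sum(values));   // every element has been doubled
    }
}
```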

Let's look at an algorithm where we might not want to process the entire array:


// Find the first index of a given value in an array

int idx = -1;  // initialize the index to an invalid value.

for(int j=0; j<myArray.length; j++) {  //start index @ 0; end index at length-1;
                                       //increment index every time the loop is processed. 
 if(desiredValue == myArray[j]) { // found match!
  idx = j;  // save the index.
  break;  // break out of the loop.
 }
}

Notes:

The only way you can tell if the desired value was actually found or not is if the value of idx is -1 or not. Thus the value of idx must be checked before it is ever used.
The resultant idx variable should not double as the loop index. If it did, a failed search would leave idx equal to the length of the array, which can be recognized as an invalid value only by also checking it against the length of the array.
The break statement stops the loop right away and execution resumes at the point right after end of the loop.

There is a counterpart to break called continue that causes the loop to immediately progress to the beginning of the next iteration. It is used much more rarely than break, usually in situations where there are convoluted and deeply nested if-else statements.
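
To see break and continue side by side, here is a hedged sketch. BreakContinueDemo, firstIndexOf and sumOfNonNegatives are invented names, and the second method is an example made up only to show continue in action.

```java
// Illustrative sketch only; BreakContinueDemo and its methods are hypothetical names.
class BreakContinueDemo {
    // Returns the first index holding desiredValue, or -1 if it is absent.
    static int firstIndexOf(int[] myArray, int desiredValue) {
        int idx = -1;                         // invalid value means "not found"
        for (int j = 0; j < myArray.length; j++) {
            if (desiredValue == myArray[j]) {
                idx = j;
                break;                        // stop looking after the first match
            }
        }
        return idx;
    }

    // Sums only the non-negative elements.
    static int sumOfNonNegatives(int[] myArray) {
        int sum = 0;
        for (int j = 0; j < myArray.length; j++) {
            if (myArray[j] < 0) {
                continue;                     // skip the rest of this iteration; j++ still runs
            }
            sum += myArray[j];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = { 4, 7, 7, 2 };
        int idx = firstIndexOf(data, 7);
        if (idx != -1) {                      // always check before using the index
            System.out.println("found at " + idx);
        }
        System.out.println(sumOfNonNegatives(new int[] { 3, -1, 4, -5 }));
    }
}
```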

Can you see that the price of the compact syntax of for loops is a loss of clarity about what the process is actually doing?

While loops

for loops are actually a specialized version of while loops. while loops have no special provisions for initialization or loop incrementing, just the termination condition.

while loops iterate through the loop body until the termination condition evaluates to a false value.

The following for loop:


for([initialization statement]; [termination expr] ; [increment statement]) {

 [loop body]

}

is equivalent to the following (provided the loop body does not use continue, which in a for loop still runs the increment statement):


{

 [initialization statement];

 while([termination expr]) {

  [loop body]

  [increment statement];

 }

}

Note the outermost curly braces that create the scoping boundary that encapsulates any variable declared inside the for loop.

The Java compiler handles a for loop in essentially the same way as the above while loop.

Here is the above algorithm that finds a desired value in an array, translated from a for loop to a while loop:


// Find the index of the first occurrence of desiredValue in myArray, using a while loop.
{
 idx = -1;  // initialize the final result (idx is declared before this block so it stays visible afterwards)
 int j = 0; // initialize the index

 while(j < myArray.length) {   // loop through the array
  if(desiredValue == myArray[j]) {   // check if found the value
   idx = j;  // save the index
   break;   // exit the loop.
  }
  
  j++;   // increment the index
 }
}

Basically, for loops give up some of the flexibility of a while loop in favor of a more compact syntax.

while loops are very useful when the data is not sequentially accessible via some sort of index. Another useful scenario for while loops is when the algorithm is waiting for something to happen or for a value to come into the system from an outside (relatively) source.

do-while loops are the same as while loops except that the termination condition is evaluated at the end of the loop body, not at its beginning as in a for or while loop. As a result, the body of a do-while loop always executes at least once.
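
The following sketch illustrates both points, using invented names (WhileLoopDemo, sumAll, halvingSteps). sumAll uses a while loop over a java.util.Iterator, a standard way to consume data that has no index; it relies on generics, which are covered in another module. halvingSteps is a made-up example of a do-while loop whose body runs before the condition is first tested.

```java
import java.util.Arrays;
import java.util.Iterator;

// Illustrative sketch only; WhileLoopDemo and its methods are hypothetical names.
class WhileLoopDemo {
    // Sums values from a source that hands out one element at a time, with no index.
    static int sumAll(Iterator<Integer> source) {
        int sum = 0;
        while (source.hasNext()) {     // condition is tested before each iteration
            sum += source.next();
        }
        return sum;
    }

    // Halves n repeatedly until it reaches zero and returns the number of halvings.
    // The body runs once even when n is already zero.
    static int halvingSteps(int n) {
        int count = 0;
        do {
            n = n / 2;
            count++;
        } while (n > 0);               // condition is tested after each iteration
        return count;
    }

    public static void main(String[] args) {
        System.out.println(sumAll(Arrays.asList(1, 2, 3).iterator()));
        System.out.println(halvingSteps(8));
        System.out.println(halvingSteps(0));
    }
}
```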

See the Java Resources web site page on loops for more information on processing lists using while loops.

for-each loops

An exceedingly common for-loop to write is the following:


Stuff[] s_array = new Stuff[n];
// fill s_array with values

for(int i = 0; i < s_array.length; i++) {
 // do something with s_array[i]
}

Essentially, the loop does some invariant processing on every element of the array.

To make life easier, Java implements the for-each loop, which is just an alternate for loop syntax:


Stuff[] s_array = new Stuff[n];
// fill s_array with values

for(Stuff s:s_array) {
 // do something with s
}

Simpler, eh?

It turns out that the for-each loop is not limited to arrays. Any object whose class implements the Iterable interface will work. This is discussed in another module, as it involves the use of generics.
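
As a brief preview, and only as a sketch, here is the same for-each syntax applied to an Iterable. The names ForEachIterableDemo and totalLength are invented; java.util.List is one standard type that implements Iterable, and the angle-bracket notation is the generics syntax covered in that other module.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; ForEachIterableDemo and totalLength are hypothetical names.
class ForEachIterableDemo {
    // Adds up the lengths of all strings supplied by any Iterable of strings.
    static int totalLength(Iterable<String> words) {
        int total = 0;
        for (String w : words) {       // same syntax as for an array
            total += w.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("array", "of", "String");
        System.out.println(totalLength(words));
    }
}
```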

Arrays vs. Lists

In no particular order...

Arrays:
- Fast access to all elements.
- Fixed number of elements held.
- Difficult to insert elements.
- Can run into problems with uninitialized elements.
- Minimal safety for out-of-bounds indices.
- Minimal memory used
- Simple syntax
- Must use procedural techniques for processing.
- Often incompatible with OO architectures.
- Difficult to prove that processing algorithms are correct.
- Processing algorithms can be very fast.
- Processing algorithms can be minimally memory intensive
Lists:
- Slow access except to first element, which is fast.
- Unlimited number of elements held.
- Easy to insert elements.
- Encountering uninitialized elements very rare to impossible.
- Impossible to make out-of-bounds errors.
- Not optimized for memory usage.
- More cumbersome syntax.
- Can use OO and polymorphic recursive techniques for processing.
- Very compatible with OO architectures.
- Easy to prove that processing algorithms are correct.
- Processing algorithms can be quite fast if tail-recursive and using a tail-call optimizing compiler.
- Processing algorithms can be very memory intensive unless tail-recursive and using a tail-call optimizing compiler.

BOTTOM LINE:

Arrays are optimized for size and random access speed at the expense of OO design and recursion. If you do not need speed or low memory, do not use an array. If you must use an array, tightly encapsulate it so that it does not affect the rest of your system.
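
What "tightly encapsulate" might look like in practice is sketched below. IntBag is a hypothetical class invented for this example, not part of any library: the array is a private field, callers see only add, size and get, and the fixed-length limitation is handled internally by copying into a larger array.

```java
import java.util.Arrays;

// Illustrative sketch only; IntBag is a hypothetical class.
class IntBag {
    private int[] items = new int[4];  // the array never leaves this class
    private int count = 0;             // number of slots actually in use

    // Appends a value, replacing the array with a larger copy when it is full.
    void add(int value) {
        if (count == items.length) {
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[count++] = value;
    }

    // Returns how many values have been added.
    int size() {
        return count;
    }

    // Returns the value at index; unused slots of the array are never exposed.
    int get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return items[index];
    }
}
```

Because nothing outside IntBag can reach the array, the rest of the system never deals with uninitialized elements or index arithmetic on the raw storage.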

Java

Friday, January 14, 2011