Section 4 - Our First Lab Rat: iTunes
1A.4.1 Our First Sample Data Set - iTunes
Some students and teachers like using ints and names like foo() and BaseClass when doing examples. I prefer something a bit more real. Certainly, when demonstrating a dozen different techniques on a reference page, it makes sense to abstract the idea with classes named BaseClass and SubClass, and methods like void foo(int), but we are trying to do two things in this course:
- Learn data structures, and
- Stay awake.
With Item 2 in mind, I will give you several examples that, while still "toy" in the sense that they are a few clicks simpler than a real-world application, have the basic structure of such an application. Our first example will come from the world of e-Commerce and music: iTunes.
An iTunes Music Library is actually stored in an XML file. Internally, it is an ASCII text file that contains a dictionary entry for every "tune" (or video or lecture or streaming radio station). One such entry in the file looks something like this:
<dict>
   <key>Track ID</key>
   <integer>576</integer>
   <key>Name</key>
   <string>Everytime We Say Goodbye</string>
   <key>Artist</key>
   <string>Irene Kral</string>
   <key>Album</key>
   <string>Kral Space</string>
   <key>Genre</key>
   <string>Jazz</string>
   <key>Kind</key>
   <string>AAC audio file</string>
   <key>Size</key>
   <integer>6287513</integer>
   <key>Total Time</key>
   <integer>389932</integer>
   <key>Track Number</key>
   <integer>9</integer>
   <key>Track Count</key>
   <integer>10</integer>
   <key>Date Modified</key>
   <date>2005-01-08T05:17:15Z</date>
   <key>Date Added</key>
   <date>2005-01-08T05:16:47Z</date>
   <key>Bit Rate</key>
   <integer>128</integer>
   <key>Sample Rate</key>
   <integer>44100</integer>
   <key>Persistent ID</key>
   <string>D0EFC2D5EBF3CD3A</string>
   <key>Track Type</key>
   <string>File</string>
   <key>File Type</key>
   <integer>1295270176</integer>
   <key>File Creator</key>
   <integer>1752133483</integer>
   <key>Location</key>
   <string>file://localhost/09%20Everytime%20We%20Say%20Goodbye.m4a</string>
   <key>File Folder Count</key>
   <integer>4</integer>
   <key>Library Folder Count</key>
   <integer>1</integer>
</dict>
There is a lot of information here. Don't worry, we are not going to engage in an exercise in reading this file or parsing the data. Nevertheless, take a moment to look at the file format and what it contains. Every two lines make up a new field: the field name appears between the <key> ... </key> tags, followed, on the next line, by the field value between tags like <string> ... </string> (if the field contains string data) or <integer> ... </integer> (if the field contains int data). Just as newlines and white space are irrelevant to the actual parsing of our C++ programs, they don't play any formal role in the XML iTunes library file, either. But indentation and style are included for readability by human programmers, who often need to open the file to do maintenance.
Many of these field name / field value pairs are hard to decipher, but some are pretty obvious.
- Genre: Jazz
- Artist: Irene Kral
- Name: Everytime We Say Goodbye
By the way, Name means the title of the song. Here's one that's kind of interesting:
- Total Time: 389932
389,932 what? Food for thought.
1A.4.2 A Class To Work With: iTunesEntry
We want to read an iTunes Music Library file into our program in order to play with the data, massage it, display it, etc. (Remember, I will do that for you, so don't panic). When we do so, we need a structured data type -- a class -- that mimics the information in one iTunes Music Library entry between the <dict> ...</dict> tags. I can already hear you thinking ... "class ... call it iTunesEntry ... must have private members ... member names should correspond to names of key fields like artist, genre, time, ... type of each member -- probably either string (for string data) ... or int (for int data) ... ". Did I correctly read your mind?
We're not going to use all the data. In fact, to keep things simple, I have selected three fields: Artist, Name and Total Time (which, in my iTunesEntry class, I will call artist, title, and tuneTime, respectively). In preparation for reading and manipulating this data, then, we'll need to get familiar with a simple class. I will now list most of the prototype for that class:
class iTunesEntry
{
public:
private:
   string title, artist;
   int tuneTime;

public:
   static const unsigned int MIN_STRING = 1;
   static const unsigned int MAX_STRING = 300;
   static const int MAX_TIME = 10000000;

   iTunesEntry();

   // mutators
   bool setTitle(string sArg);
   bool setArtist(string sArg);
   bool setTime(int nArg);

   // accessors
   string getTitle() const { return title; }
   string getArtist() const { return artist; }
   int getTime() const { return tuneTime; }

   // helpers
   static int convertStringToSeconds(string sToCnvrt);
   string convertTimeToString() const;

   // comparator tools
   // could use static const int, instead:
private:
   static int sortKey;

public:
   static enum { SORT_BY_TITLE, SORT_BY_ARTIST, SORT_BY_TIME } eSortType;
   static bool setSortType( int whichType );
   bool operator<(const iTunesEntry &other) const;
   bool operator>(const iTunesEntry &other) const;
   bool operator==(const iTunesEntry &other) const;
   bool operator!=(const iTunesEntry &other) const;
   string getArtistLastName() const;
};
As advertised, I have three private instance members, some accessors and mutators for them, and a few static constants to help filter bad arguments passed to mutators. I have supplied only a default constructor, which implies we will be instantiating objects with no arguments, thereby getting default values into the members, and then writing to them after instantiation (using the mutators, of course). I will provide you with a .h header file that contains the above prototype and a .cpp source file that contains the implementations of these methods. The behavior of the methods, however, should be pretty clear from their names and the comments above, so you won't really need to study the implementation.
Peruse this carefully since you will be using it over the next couple weeks.
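To make the role of the bool-returning mutators and the static constants concrete, here is a minimal sketch of how such filtering might look. It is not the code from the course's .cpp file: ValidatedTune is a hypothetical stand-in class, and the default values, the range checks, and the decision to leave the member unchanged on a bad argument are all assumptions.

```cpp
#include <string>

// Hypothetical stand-in for iTunesEntry (the real class lives in iTunes.h).
// The validation rules below are assumptions, not the course implementation.
class ValidatedTune
{
private:
   std::string title;
   int tuneTime;

public:
   static const unsigned int MIN_STRING = 1;
   static const unsigned int MAX_STRING = 300;
   static const int MAX_TIME = 10000000;

   // assumed defaults; the real default constructor may choose others
   ValidatedTune() : title(" (undefined) "), tuneTime(0) {}

   // accept only titles whose length is in [MIN_STRING, MAX_STRING]
   bool setTitle(const std::string &sArg)
   {
      if (sArg.length() < MIN_STRING || sArg.length() > MAX_STRING)
         return false;   // bad argument: leave the old value alone
      title = sArg;
      return true;
   }

   // accept only times in [0, MAX_TIME]
   bool setTime(int nArg)
   {
      if (nArg < 0 || nArg > MAX_TIME)
         return false;
      tuneTime = nArg;
      return true;
   }

   std::string getTitle() const { return title; }
   int getTime() const { return tuneTime; }
};
```

A client would instantiate with no arguments and then test each return value, for example if (!tune.setTime(-5)) followed by whatever error handling it prefers. Under these assumptions a rejected call leaves the object exactly as it was.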
We are going to add more to this class before the week is out, since we need to make these objects comparable. Thus, I'll eventually overload the < operator so we can compare two iTunesEntry objects. That will be an interesting task since we need to be able to use < to mean different things at different times depending on what we want to compare (tuneTime or title or something else?).
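One way to let < mean different things at different times is to consult a static sort key inside the operator. The sketch below illustrates that idea using the member names from the prototype (sortKey, setSortType, the SORT_BY_ constants). SortableTune is a hypothetical stand-in, and comparing the whole artist string is an assumption; the course version may compare something else, such as the artist's last name.

```cpp
#include <string>

// Hypothetical stand-in for iTunesEntry, reduced to what comparison needs.
class SortableTune
{
private:
   std::string title, artist;
   int tuneTime;
   static int sortKey;   // one copy shared by every object of the class

public:
   enum { SORT_BY_TITLE, SORT_BY_ARTIST, SORT_BY_TIME };

   SortableTune(const std::string &t, const std::string &a, int time)
      : title(t), artist(a), tuneTime(time) {}

   // choose what < means from now on; reject unknown keys
   static bool setSortType(int whichType)
   {
      if (whichType < SORT_BY_TITLE || whichType > SORT_BY_TIME)
         return false;
      sortKey = whichType;
      return true;
   }

   // the same operator compares different members, depending on sortKey
   bool operator<(const SortableTune &other) const
   {
      switch (sortKey)
      {
      case SORT_BY_TITLE:
         return title < other.title;
      case SORT_BY_ARTIST:
         return artist < other.artist;
      default:   // SORT_BY_TIME
         return tuneTime < other.tuneTime;
      }
   }
};

// a static data member needs exactly one definition outside the class
int SortableTune::sortKey = SortableTune::SORT_BY_TITLE;
```

With this arrangement the client calls SortableTune::setSortType(SortableTune::SORT_BY_TIME) once, and every later use of < (including inside a sort routine) compares times until the key is changed again. The price of the design is that the key is global to the class, so two parts of a program cannot use different orderings at the same moment.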
Here is a global-scope function that will send an iTunesEntry object to the screen so we can keep our client, main(), clean. I will often keep the output functions at global scope in an effort to emphasize the idea of separating I/O from data manipulation.
void displayOneTune(const iTunesEntry & tune)
{
   cout << tune.getArtist() << " | ";
   cout << tune.getTitle() << " | ";
   // cout << tune.getTime() << " | ";
   cout << " " << tune.convertTimeToString() << endl;
}
This is the first of many examples where we will be creating classes and using all the vocabulary of CS 2A/2B (CIS 15A/B) without explanation. There will be no time in this course to teach concepts like static consts, mutators, default constructors, inheritance and the like, which you are assumed to already know, cold. I'll do a quick review, however, of a few CS 2B concepts later this week.
1A.4.3 A Sample Client For Our Experiments
You will be given a pre-processed data file called itunes_file.txt that contains just the data we are interested in. It will be in a non-XML, easy-to-read form if you decide to open it manually. I provide all the file input code for you. The few functions that read in the data are encapsulated in a second class that I won't discuss in detail. As you can see from the next sample, it is almost invisible to you as the client writer. The class is called iTunesEntryReader, and you can simply use the code and feel secure that it gets the data for you. This "reader class" allows you to
- get the data from the file, and
- have a means to move it, one object at a time, into any array, vector or other data structure that is built of iTunesEntry objects.
A reader object can be indexed with the brackets operator, using indices from 0 to getNumTunes() - 1, as you will see below.
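The indexing pattern just described works the same way whatever container receives the objects. Here is a minimal sketch that copies every record into a std::vector. ReaderSketch is a hypothetical stand-in that mimics only the two members mentioned above (getNumTunes() and the brackets operator), with strings standing in for iTunesEntry objects. copyToVector is likewise an illustrative helper, not part of the course code.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for iTunesEntryReader: it offers only
// getNumTunes() and operator[], and holds strings instead of tunes.
class ReaderSketch
{
private:
   std::vector<std::string> tunes;

public:
   explicit ReaderSketch(const std::vector<std::string> &data)
      : tunes(data) {}
   int getNumTunes() const { return (int)tunes.size(); }
   const std::string &operator[](int k) const { return tunes[k]; }
};

// copy records 0 through getNumTunes() - 1 into a vector, one at a time
template <class Entry, class Reader>
std::vector<Entry> copyToVector(Reader &reader)
{
   std::vector<Entry> result;
   int numTunes = reader.getNumTunes();

   result.reserve(numTunes);
   for (int k = 0; k < numTunes; k++)
      result.push_back(reader[k]);
   return result;
}
```

Because the helper is a template, a call such as copyToVector<std::string>(reader) works for the stand-in. The same function should accept any reader type that supplies those two members, though that has not been checked against the course's class.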
The client template
Here is an example main() that reads the data from the file and displays it to the screen. It is this template that we will be using when we try out different algorithms or data structures throughout the course, even when we switch to other kinds of databases.
// Main file for iTunes project.  See Read Me file for details
// CS 2C, Foothill College, Michael Loceff, creator

#include <iostream>
#include <cstdlib>   // for exit()
using namespace std;

#include "iTunes.h"

// for timing our algorithms
#include <time.h>

// ----------- prototypes -------------
void displayOneTune(const iTunesEntry & tune);

// --------------- main ---------------
int main()
{
   // how we read the data from files
   iTunesEntryReader tunesInput("itunes_file.txt");
   int arraySize;

   // how we test the success of the read:
   if (tunesInput.readError())
   {
      cout << "couldn't open " << tunesInput.getFileName()
         << " for input.\n";
      exit(1);
   }

   cout << tunesInput.getFileName() << endl;
   cout << tunesInput.getNumTunes() << endl;

   // create an array of objects for our own use:
   arraySize = tunesInput.getNumTunes();
   iTunesEntry *tunesArray = new iTunesEntry[arraySize];
   for (int k = 0; k < arraySize; k++)
      tunesArray[k] = tunesInput[k];

   // how we time our algorithms -------------------------
   clock_t startTime, stopTime;
   startTime = clock();

   // do something interesting like search or sort or build a hash-table, then...

   // how we determine the time elapsed -------------------
   stopTime = clock();

   // show the array
   for (int k = 0; k < arraySize; k++)
      displayOneTune(tunesArray[k]);
   cout << endl << endl;

   // report algorithm time
   cout << "\nAlgorithm Elapsed Time: "
      << (double)(stopTime - startTime)/(double)CLOCKS_PER_SEC
      << " seconds." << endl << endl;

   delete[] tunesArray;
   return 0;
}

void displayOneTune(const iTunesEntry & tune)
{
   cout << tune.getArtist() << " | ";
   cout << tune.getTitle() << " | ";
   // cout << tune.getTime() << " | ";
   cout << " " << tune.convertTimeToString() << endl;
}
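The timing portion of the template can be tried on its own, without the iTunes files. The sketch below wraps the same clock() / CLOCKS_PER_SEC arithmetic around a call to std::sort as a stand-in for the "something interesting". The helper names makeDescending and timeSort are made up for this illustration.

```cpp
#include <algorithm>
#include <time.h>
#include <vector>

// build test data: the values n, n-1, ..., 1 (descending order)
std::vector<int> makeDescending(int n)
{
   std::vector<int> data;

   data.reserve(n);
   for (int k = n; k > 0; k--)
      data.push_back(k);
   return data;
}

// sort the vector in place and return the processor time used, in seconds
double timeSort(std::vector<int> &data)
{
   clock_t startTime, stopTime;

   startTime = clock();
   std::sort(data.begin(), data.end());
   stopTime = clock();

   return (double)(stopTime - startTime) / (double)CLOCKS_PER_SEC;
}
```

A client might build a vector with makeDescending(100000), pass it to timeSort(), and print the returned value. Keep in mind that clock() reports processor time rather than wall-clock time and has limited resolution, so a small input may well report 0 seconds.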
1A.4.4 Style
Now that you've studied the code for content, look at it again for style. If you have had me in prior courses, you know I take off points for bad style. That's because in the real world, bad style wastes time and money, and in well-managed software groups you can't even get your code accepted for logic/debug review if it does not have every character placed correctly, all tabs removed and variable names properly formed.
I have a few rules about indentation that you need to know. I'm not quite so strict about variable names, although I have a couple rules for those, as well. While there is a style handout, you can just look up and you'll get the main idea. If your code looks like this, you won't lose points for bad style.
1A.4.5 Instructions For Download
All the iTunes code, as well as code for two other application examples we'll cover, can be downloaded from here:
Unzip the archive and you will see a CS_2C_Client_Support folder, which contains three sub-folders, one of which is an iTunes Folder. Go into that folder and open the Read Me file, which explains how to incorporate the data file into your project. You'll also have a sample main() to get you started.