Problem 1 (50%): The first part of this problem set will exercise your knowledge of objects and strings (as defined in the C++ standard library).
For this problem you will convert part of the RDF dump of the Open Directory Project into an HTML page suitable for viewing. (I am using the TOP/Computers/Computer_Science branch, since the whole thing now stands at 250 MB.) Basically, you will convert this file (also available in the class directory) into this file (use the "View, Source" option of your browser to view the HTML) by running the main program shown below.
You will achieve this by implementing a URL class which implements operator>>, operator<<, and the appropriate constructors and destructor. Even though the main program below does not use them, you also need to implement a copy constructor, a constructor that takes only one string parameter (the url), and a constructor that takes three string parameters (the url, the title, and the description). A URL object stores the actual url, the title, and the description associated with that url. You should use the string class since it will make your life much easier and full of joy.
operator>> will be the hardest to implement since it needs to parse the RDF file. Take a look at the input file. You will notice Topic blocks and ExternalPage blocks. You can completely ignore the Topic blocks and anything that is in them. Each call to operator>> needs to read exactly one ExternalPage block into a URL object; this includes the actual URL, the title, and the description (if any). To put it more succinctly, the call to operator>> should return right after it has read a whole ExternalPage block, not before and not later (unless, of course, you reach the EOF). Notice that each element in the RDF file is on its own line. You might find the getline(istream &, string &) function useful. Also, note that the string find function can look for arbitrary-length strings. That is, you can say string s = "abracadabra"; s.find("cada"), and it will return the starting position of the substring "cada" within s.
#include "URL.h"
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

// Note: having operator>> read in one format (RDF)
// and operator<< write in another format (HTML)
// is almost always a bad idea. We are just doing it
// as an exercise. In this case you would want to
// give these functions different names (e.g. writeAsHTML).
int main(int argc, char* argv[]) {
    ifstream opendir("//Engr_asu/ECE352/computer_science.txt");
    if (!opendir) {
        cout << "Could not open input file" << endl;
        return 1;  // nothing to read, so stop here
    }
    // Change this filename to write to your own folder.
    ofstream fout("//Engr_asu/ECE352/ps2out.html");
    if (!fout) {
        cout << "Could not open output file" << endl;
        return 1;  // nowhere to write, so stop here
    }
    fout << "<html><head><title>PS2 Output</title></head><body>" << endl
         << "<h1>PS2 Output</h1>" << endl
         << "<ul>" << endl;
    URL next;
    // Each successful read yields one ExternalPage block.
    while (opendir >> next) {
        fout << "<li>" << next << endl;
    }
    fout << "</ul>" << endl
         << "</body></html>" << endl;
    fout.close();
    return 0;
}
In this example you can see the utility of objects. The URL knows how to read itself from a stream and write itself out. If we needed more I/O methods we would just need to define more functions. We could also define other operators on the URL, such as operator==, which could do a smart comparison of two URLs (e.g. ignoring a trailing "index.htm" or "index.html").
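As an illustration of the "smart comparison" idea, the sketch below works on plain strings rather than URL objects so that it stands alone. The function names (endsWith, stripIndexPage, sameUrl) are hypothetical, and the rule it applies (drop a trailing "index.html" or "index.htm" that follows a slash) is just one reasonable reading of the sentence above.

```cpp
#include <string>

// True if `s` ends with `suffix`.
bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Removes a trailing "index.html" or "index.htm" that follows a slash,
// keeping the slash itself.
std::string stripIndexPage(const std::string& url) {
    static const char* const names[] = {"/index.html", "/index.htm"};
    for (int i = 0; i < 2; ++i) {
        const std::string suffix(names[i]);
        if (endsWith(url, suffix)) {
            return url.substr(0, url.size() - suffix.size() + 1);
        }
    }
    return url;
}

// Two URLs are treated as the same if they match once the index page is dropped.
bool sameUrl(const std::string& a, const std::string& b) {
    return stripIndexPage(a) == stripIndexPage(b);
}
```

An operator== on the URL class could delegate to a helper like sameUrl, applied to the stored url string.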
Problem 2 (50%):