EECE 352: PS 8

Due 26 October 1999

Problem 1 (100%) I hope you remember the RDF file from PS2, because we are going to use it again. This time, however, we are going to be a bit more elegant. You will turn the same RDF file into a much nicer set of HTML files, like this, where the root node is Computer_Science.html.

You will do this by implementing a tree that can read the RDF file and output it in the needed form. You are free to use any STL containers that you want, as long as you still implement a tree "from scratch". That is, your tree needs to have a root node, and that node needs to be of a class that you defined. The node will have pointers to the first child, and to the next sibling, as explained in class.
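As a rough sketch of the first-child/next-sibling layout described above, a node might look like the following. The names TopicNode, firstChild, nextSibling, addChild, and countChildren are made up for illustration; they are not required by the assignment, and your own class will probably carry more data (for example, the URLs that belong to the node).

```cpp
#include <cassert>
#include <string>

// Hypothetical node class; all names here are illustrative assumptions.
struct TopicNode {
    std::string title;       // e.g. "Computer_Science/People"
    TopicNode* firstChild;   // leftmost child, or nullptr for a leaf
    TopicNode* nextSibling;  // next child of the same parent, or nullptr

    explicit TopicNode(const std::string& t)
        : title(t), firstChild(nullptr), nextSibling(nullptr) {}
};

// Append child at the end of parent's child list so file order is preserved.
void addChild(TopicNode* parent, TopicNode* child) {
    assert(parent != nullptr && child != nullptr);
    if (parent->firstChild == nullptr) {
        parent->firstChild = child;
        return;
    }
    TopicNode* last = parent->firstChild;
    while (last->nextSibling != nullptr) {
        last = last->nextSibling;
    }
    last->nextSibling = child;
}

// Count the children of a node by walking its sibling chain.
int countChildren(const TopicNode* node) {
    int n = 0;
    for (const TopicNode* c = node->firstChild; c != nullptr; c = c->nextSibling) {
        ++n;
    }
    return n;
}
```

Note that every node needs only two pointers no matter how many children it has; visiting all children means following firstChild once and then nextSibling until it is null.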

Parsing the RDF file is not as hard as you might think. Notice that the file has been printed in preorder. Notice also that all the information you need to extract from the file in order to build the tree is found within the "Topic" sections, that is, between the "<Topic" and "</Topic>". (You can completely ignore the ExternalPage sections; you will see that the main.cpp reads them for you.) The value right after "r:id=" is the title of the node, which tells us where in the hierarchy this node fits. As you can expect from a preorder listing, the first node is the root, which is "Top/Computers/Computer_Science". The next node is a child of this node. In general, the next node is always either a child of the last node we read, or a child of one of the ancestors of the last node we read.
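One possible way to use the preorder property is to keep the chain of nodes from the root down to the last node read, and to back up along that chain until an ancestor of the new node is found. The sketch below is only an illustration under that assumption; Node, isAncestor, and place are hypothetical names, and the node is stripped down to the bare minimum.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal node, for illustration only.
struct Node {
    std::string title;
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
    explicit Node(const std::string& t) : title(t) {}
};

// True if `title` starts with `ancestor` followed by '/' and at least one
// more character, i.e. `ancestor` sits strictly above `title`.
bool isAncestor(const std::string& ancestor, const std::string& title) {
    return title.size() > ancestor.size() + 1 &&
           title.compare(0, ancestor.size(), ancestor) == 0 &&
           title[ancestor.size()] == '/';
}

// `path` holds the chain from the root to the most recently placed node.
// Back up until the top of the chain is an ancestor of `node`, attach
// `node` as its last child, then make `node` the new end of the chain.
void place(std::vector<Node*>& path, Node* node) {
    if (path.empty()) {          // the first node read is the root
        path.push_back(node);
        return;
    }
    while (path.size() > 1 && !isAncestor(path.back()->title, node->title)) {
        path.pop_back();
    }
    Node* parent = path.back();
    // Holds as long as the input is a preorder listing under a single root.
    assert(isAncestor(parent->title, node->title));
    if (parent->firstChild == nullptr) {
        parent->firstChild = node;
    } else {
        Node* last = parent->firstChild;
        while (last->nextSibling != nullptr) last = last->nextSibling;
        last->nextSibling = node;
    }
    path.push_back(node);
}
```

Checking for the '/' right after the ancestor's title keeps a topic such as "Computer_Science_Ed" (a made-up name) from being mistaken for a child of "Computer_Science".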

The best way to attack the parsing problem is to first implement a function that reads one and only one topic section into a node. Once you have this, you can start reading other nodes and figure out how to place them in the tree. You will know that one node is a child of another node because the title of the parent is a prefix (a leading substring) of the title of the child. For example, "Top/Computers/Computer_Science" is the parent of "Top/Computers/Computer_Science/People". Also, since they all start with "Top/Computers/", you can just get rid of that to make the output nicer.
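As a starting point for the one-topic reader, here is a small sketch of the two string steps involved: pulling the title out of a line and dropping the common prefix. It assumes the value after r:id= is enclosed in double quotes, which you should confirm against the actual file; extractId and shortTitle are hypothetical helper names.

```cpp
#include <string>

// Returns the value that follows r:id= on the line, or "" if there is none.
// Assumption: the value is wrapped in double quotes.
std::string extractId(const std::string& line) {
    const std::string key = "r:id=\"";
    std::string::size_type start = line.find(key);
    if (start == std::string::npos) return "";
    start += key.size();
    std::string::size_type end = line.find('"', start);
    if (end == std::string::npos) return "";
    return line.substr(start, end - start);
}

// Drops the leading "Top/Computers/" when present; otherwise returns id as is.
std::string shortTitle(const std::string& id) {
    const std::string prefix = "Top/Computers/";
    if (id.compare(0, prefix.size(), prefix) == 0) {
        return id.substr(prefix.size());
    }
    return id;
}
```

A reader function could call extractId on each line until it finds a non-empty result, then skip ahead to the line containing "</Topic>".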

Hint: The following code replaces every "/" in string s with "-". In general, review the solutions to PS2 and become reacquainted with strings.

  string s;
  string::size_type start;
  while ((start = s.find("/")) != string::npos){
   s.replace(start,1, "-");
  }

The output: Each node in your tree will represent a Topic section, and each node will also be written out in its own file. The name of the file is the title of the Topic section (the "r:id" without the "Top/Computers/" part), with all the "/"s replaced with "-" and with ".html" appended at the end. Each file should also have pointers to all the node's children's pages, as well as pointers to the URLs that belong to that node. You will use the URL class, along with the map (as seen in the main.cpp below), to print these URLs. The URL object already does all the pretty-printing of the URLs, so you do not have to worry about that.
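The file-naming rule can be sketched as a single function. This is only an illustration; toFileName is a hypothetical name, and it assumes the "Top/Computers/" part has already been removed from the title.

```cpp
#include <cassert>
#include <string>

// Turns a title such as "Computer_Science/People" into
// "Computer_Science-People.html".
std::string toFileName(std::string title) {
    std::string::size_type pos = 0;
    while ((pos = title.find('/', pos)) != std::string::npos) {
        title.replace(pos, 1, "-");
        ++pos;  // continue searching after the character just replaced
    }
    assert(title.find('/') == std::string::npos);  // no slashes remain
    return title + ".html";
}
```

The same function gives you the link targets for a node's children, since each child's page is named from the child's own title.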

This is the main.cpp I used to generate the desired output.

#include<iostream>
#include<fstream>
#include<string>
#include<map>

#include"YahooTree.h"
#include"URL.h"

using namespace std;

int main() {
  ifstream opendir("//Engr_asu/ECE352/computer_science.txt");
  if (!opendir) {  // the stream failed to open
    cout << "Could not open input file" << endl;
    return 1;
  }
  
  YahooTree cs;
  opendir  >> cs;	//reads the whole file into the tree.
  opendir.close();
  
  
  ifstream od("//Engr_asu/ECE352/computer_science.txt");
  if (!od) {  // the stream failed to open
    cout << "Could not open input file" << endl;
    return 1;
  }
  map<string, URL> m; //the string is the url text (e.g. "http://me.com")
  // while the URL object holds the title and description.
  URL next;
  while (od >> next) {
    m[next.getURL()] = next;
  }
  
  cs.printAsHTML(m); //this one does all the work. We pass it a map that
  //matches url strings to URL objects (so you can print them easily).
  od.close();
  return 0;
}

This is the new URL.h. I only added one member function. URL.cpp stays the same as in the PS2 solutions.

// URL.h: interface for the URL class.
//
//////////////////////////////////////////////////////////////////////

#if !defined(AFX_URL_H__E9240792_614A_11D3_933D_0060674E1056__INCLUDED_)
#define AFX_URL_H__E9240792_614A_11D3_933D_0060674E1056__INCLUDED_

#if _MSC_VER > 1000
#pragma once
#endif // _MSC_VER > 1000

#include<string>
#include<iostream>

using namespace std;
class URL;
ostream & operator<<(ostream & os, const URL & u);
istream & operator>>(istream & is, URL & u);

class URL  
{
	string url;
	string description;
public:
	string title;
	URL(const URL &u);
	URL(string & u, string &t, string &d);
	URL();
	URL(string &u);
	virtual ~URL();
	friend ostream & operator<<(ostream & os, const URL & u);
	friend istream & operator>>(istream & is, URL & u);
	int operator==(const URL &u) const;
	int operator<(const URL &u) const;
	string getURL() const {
		return url; } //added this; const so it also works on const URL objects

};

#endif // !defined(AFX_URL_H__E9240792_614A_11D3_933D_0060674E1056__INCLUDED_)


Jose M. Vidal
Last modified: Thu Oct 14 19:07:50 EDT 1999