Internet Basics and HTML

CSCE 242

University of South Carolina

José M. Vidal [1] [2]

This talk provides a brief history of the Internet and distributed applications. For more information read:

1 Timeline

Year  Event
1970  Work on Unix begins. VT 05 introduced.
1971  First FTP and Telnet implementations.
1973  TCP development begins.
1978  VT 100 introduced.
1982  Sun founded (now). Sun 1.
1983  BSD Unix ships TCP/IP stack. TCP/IP becomes a government standard. Sun hires Schmidt [4].
1984  Work begins on the X Window System. X terminal. Apple Macintosh. Steves, with stache.
1985  MS Windows. Founders, Gates, modeling.
1989  Object Management Group founded.
1990  Object Linking and Embedding (OLE).
      COM provided infrastructure.
      Tim Berners-Lee develops WWW.
1991  CORBA 1.1 released.
1992  Internet opened to commercial traffic.
1993  Mosaic released.
1994  Netscape Navigator released. Yahoo.
1995  CORBA 2.0 released. IIOP standardized. Gosling introduces Java.
1996  COM is renamed ActiveX. Brin and Page develop Backrub (in a tub?). Hotmail.
1997  Java RMI (JDK 1.1).
1998  DCOM. Google. XML 1.0.
1999  RSS. Netscape dies.
2000  Microsoft announces .NET initiative.
      Chooses SOAP for distributed programming.
2004  Gmail, Firefox.
2005  Ajax coined.

The history of the Internet is interlaced with the history of Unix, since it was on Unix that all the early tools (programs) for inter-machine communication were developed. First came tools for transferring data from one machine to another, as that was the most pressing need. Once you could transfer data, it was a simple matter to make this data a text file and call it email. Programs first began to talk to each other using Remote Procedure Calls (RPC). RPCs are a simple extension of the procedural languages which, at that time, were dominant. As object-oriented languages (OOLs) became popular with the advent of C++, programmers needed a way to do what RPC did, but with an OOL. This need led to the development of CORBA. Later on, Java re-implemented the same ideas, in a simplified manner, as Java RMI.

Microsoft's OS was used only in low-end personal computers, which were rarely networked. As such, Microsoft largely ignored these developments. However, it faced another problem: it needed a way for many programs to share the same functionality without having it replicated in each program. That is, it needed shared libraries, which it named Dynamic Link Libraries (DLLs). DLLs were used to create components which were tied together by following specific guidelines, a process dubbed Object Linking and Embedding (OLE). OLE went through several revisions and was then renamed COM. COM is a component model which allows one component to be used by other programs running on the same machine. Microsoft realized that it could add some networking infrastructure and distribute the invocation of components, an idea that gave rise to Distributed COM (DCOM). DCOM is very complicated to learn and many programmers shun it. In 2000 Microsoft announced the .NET initiative, which replaces all the functionality of DCOM with a much simpler system based on open standards.

1.1 The Future

Number of Hosts [5]

2 Abstraction Layers

Application.
Transport. TCP, UDP.
Internet. IP.
Host-to-network. Ethernet, FDDI.

In this class we will focus on the application layer.



Of course, you do not need to memorize the particular bit positions. However, it is important to know what information is stored in a packet since this tells you what can and cannot be done at the network layer. For example, at the network layer one cannot filter on content (data) without also knowing how the content is represented.
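To make "what information is stored in a packet" concrete, here is a minimal sketch that decodes the fixed 20-byte part of a header, assuming the packet is an IPv4 datagram. The function name and the sample addresses are made up for illustration. Note that nothing in the header describes the content itself, which is why content filtering needs knowledge from higher layers.

```python
import socket
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,                # upper 4 bits
        "header_length": (version_ihl & 0x0F) * 4,  # lower 4 bits, converted to bytes
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,                       # 6 = TCP, 17 = UDP
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# A hand-built sample header; the addresses are hypothetical.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
fields = parse_ipv4_header(sample)
print(fields["source"], "->", fields["destination"], "protocol", fields["protocol"])
```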

3.1 TCP and UDP

4 Firewalls and Proxies


We care about firewalls mostly when they prevent us from deploying a distributed program. If you are building a large distributed system you will need to coordinate with the network administrators to make sure that all the packets can get through at all times.


5.1 Whois

6 Internet Services

7 Example

  1. Blossom wants to send a packet to buttercup at port 80.
    1. Blossom queries the local DNS server for buttercup's IP address.
    2. Blossom broadcasts an IP datagram with destination address of on the local Ethernet.
    3. No other computer on the local Ethernet reads the packet because it's not addressed to them (unless they are running a sniffer).
    4. The router recognizes that this IP will not find its destination on that subnet. It then re-broadcasts the packet on the appropriate subnet. The source and destination fields are kept the same.
    5. Buttercup sees the packet addressed to it and reads it.
  2. Blossom wants to send a packet to bubbles on port 25.
    1. Bubbles does not have a real IP number, so it does not exist on the Internet.
    2. Blossom somehow (offline) determines that it should instead send a packet to on port 25, hoping that the NAT will do the right thing.
    3. Repeat the first four steps of the previous scenario, except that this time the other router picks it up and forwards it to the Internet.
    4. The packet goes through any number of routers on the Internet until it is seen by who then changes the destination IP of the packet to and places it on the local subnet.
    5. Bubbles sees the modified packet and reads it.
NAT router

The computers "outside" an NAT see it as just one machine. The computers inside the NAT think that they are on the open Internet. The NAT has the job of remembering who on the inside is talking to whom on the outside and rewriting the address fields of the packets: the Source-IP of outgoing packets and the Destination-IP of incoming ones.

One drawback of using an NAT is that a computer on the outside cannot make first contact with a computer on the inside; computers on the outside can only reply to messages sent from the inside. That is, unless the NAT is set up to specifically forward new packets to some machine inside. For example, it could be set up to forward all new packets to port 80 to a particular machine which serves as the company's web server.
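The bookkeeping described above can be sketched as a small translation table. This is a simplified model, not how any particular router implements it: the class name, the port numbering, and all addresses are hypothetical, and a real NAT would also track the remote endpoint and expire old entries.

```python
from dataclasses import dataclass, field


@dataclass
class Nat:
    public_ip: str
    next_port: int = 40000                          # arbitrary starting port (assumption)
    table: dict = field(default_factory=dict)       # public port -> (inside ip, inside port)
    reverse: dict = field(default_factory=dict)     # (inside ip, inside port) -> public port
    forwards: dict = field(default_factory=dict)    # static rules: public port -> (inside ip, port)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the source of a packet leaving the inside network."""
        key = (src_ip, src_port)
        if key not in self.reverse:
            self.reverse[key] = self.next_port
            self.table[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.reverse[key], dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_port):
        """Rewrite the destination of an arriving packet, or return None to drop it."""
        target = self.table.get(dst_port) or self.forwards.get(dst_port)
        if target is None:
            return None  # nobody inside started this conversation
        return (src_ip, src_port, target[0], target[1])


nat = Nat(public_ip="192.0.2.1")
sent = nat.outbound("10.0.0.5", 5000, "198.51.100.7", 80)   # inside machine makes first contact
reply = nat.inbound("198.51.100.7", 80, sent[1])            # reply finds its way back
unsolicited = nat.inbound("198.51.100.7", 3333, 25)         # dropped: no mapping yet
nat.forwards[25] = ("10.0.0.9", 25)                         # explicit forwarding rule
forwarded = nat.inbound("198.51.100.7", 3333, 25)
```

The last two calls show the drawback and its workaround: the same unsolicited packet is dropped until a forwarding rule for that port exists.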

8 Internet Standards

Internet Engineering Task Force (IETF [11])
  Democratic. Open to anyone.
  After-the-fact. Request for Comments (RFC).
  1. Experimental
  2. Proposed standard
  3. Draft standard
  4. Standard
  5. Informational (not required)
  6. Historic (obsolete)

World Wide Web Consortium (W3C [12])
  Vendor organization led by dues-paying corporations.
  Before-the-fact.
  1. Note
  2. Working draft
  3. Candidate recommendation
  4. Proposed recommendation
  5. Recommendation

9 Uniform Resource Identifier

10 HTML, SGML, and XML


12 Hyper Text Transfer Protocol


The fact that HTTP is stateless was both one of the reasons for its widespread early adoption and one of the biggest headaches when using it for complex applications. Because it is stateless it is very easy to implement, so basic web servers could be written in a page of code. However, it also means that if the user is involved in an interaction that requires more than one step then some sort of cheat must be used. The first attempts extended the URL with state information. This created large, ugly URLs and caused problems if the user decided to bookmark such a URL. The next attempt was Netscape's standardization of cookies.
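As a rough illustration of the "page of code" claim, here is a minimal sketch of a stateless server. It only understands GET, serves from a hypothetical in-memory `PAGES` table, and reads a single chunk per connection; the names and the port are assumptions, and it is not meant for real use.

```python
import socket

PAGES = {"/": b"<html><body>Hello</body></html>"}  # hypothetical content


def handle_request(raw: bytes) -> bytes:
    """Build a response from the request alone; nothing is remembered between calls."""
    request_line = raw.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = request_line.split()
    if len(parts) != 3 or parts[0] != "GET":
        status, body = "501 Not Implemented", b""
    elif parts[1] in PAGES:
        status, body = "200 OK", PAGES[parts[1]]
    else:
        status, body = "404 Not Found", b""
    head = (f"HTTP/1.0 {status}\r\n"
            f"Content-Type: text/html\r\n"
            f"Content-Length: {len(body)}\r\n\r\n")
    return head.encode("ascii") + body


def serve_forever(port: int = 8080) -> None:
    """Answer one request per connection, then forget the client entirely."""
    with socket.create_server(("", port)) as server:
        while True:
            conn, _addr = server.accept()
            with conn:
                conn.sendall(handle_request(conn.recv(65536)))
```

Because `handle_request` depends only on its argument, two identical requests always get identical responses. Any multi-step interaction has to carry its state in the request itself, which is exactly the gap that URL rewriting and cookies were meant to fill.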

12.1 HTTP Request Methods

GET URI Retrieve information pointed to by URI.
HEAD URI Identical to GET but server must not return a message body.
POST URI data Request that the server accept the following data as a subordinate of the resource pointed to by URI.
PUT URI data Replace the data pointed to by URI with the following data.
DELETE URI Delete the data pointed to by URI.
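The difference between the methods is easiest to see against a toy resource store. The following sketch applies each method to a plain dictionary keyed by URI; the function name, the status codes chosen, and the way POST names its subordinate resource are illustrative assumptions, not something the HTTP specification dictates.

```python
def apply_method(store: dict, method: str, uri: str, data: bytes = b""):
    """Return (status code, body) for one request against an in-memory store."""
    if method == "GET":
        return (200, store[uri]) if uri in store else (404, b"")
    if method == "HEAD":
        # Same outcome as GET, but never a message body.
        return (200, b"") if uri in store else (404, b"")
    if method == "PUT":
        created = uri not in store
        store[uri] = data                      # replace whatever was there
        return (201 if created else 200, b"")
    if method == "POST":
        # Store the data as a new subordinate of uri (hypothetical naming scheme).
        base, n = uri.rstrip("/"), 0
        while f"{base}/{n}" in store:
            n += 1
        store[f"{base}/{n}"] = data
        return (201, f"{base}/{n}".encode("ascii"))
    if method == "DELETE":
        return (204, b"") if store.pop(uri, None) is not None else (404, b"")
    return (501, b"")


store = {}
put_result = apply_method(store, "PUT", "/notes", b"first")
get_result = apply_method(store, "GET", "/notes")
post_result = apply_method(store, "POST", "/notes", b"a comment")
delete_result = apply_method(store, "DELETE", "/notes")
```

Note that PUT names the exact resource to replace, while POST lets the server decide where the new subordinate lives.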

12.2 HTTP Request Headers

Accept: Specifies the media types that are acceptable in the response (e.g. pdf, gif, png, flash). Preferences can be ranked with the q quality value, which ranges from 0 (not acceptable) to 1 (most preferred, the default).
Accept: audio/*; q=0.2, audio/basic
Accept-Charset: Specify what charsets are acceptable for response.
Accept-Charset: iso-8859-5, unicode-1-1;q=0.8
Accept-Encoding: Restricts the content encodings that are acceptable.
Accept-Encoding: compress, gzip
Accept-Language: Restrict the language.
Accept-Language: da, en-gb;q=0.8, en;q=0.7
Authorization: Include the credentials (username and password) needed to authenticate with the server.
From: Contains the email address of the human issuing the request.
Host: Specifies the Internet host and port number of the resource being requested.
GET /pub/WWW/ HTTP/1.1
If-Modified-Since: Only return the document if it has been modified after the given date.
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
User-Agent: Specifies the user agent (web browser) being used.
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
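The q values in the Accept family of headers can be read mechanically. Here is a minimal sketch of how a server might rank the alternatives; the function name is made up, and the sketch ranks by q alone, ignoring the rule that more specific media ranges take precedence over wildcards.

```python
def parse_accept(header_value: str) -> list:
    """Return (value, q) pairs from an Accept-style header, most preferred first."""
    choices = []
    for item in header_value.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable" (assumption)
        choices.append((parts[0], q))
    # sorted() is stable, so entries with equal q keep their original order.
    return sorted(choices, key=lambda choice: choice[1], reverse=True)


media = parse_accept("audio/*; q=0.2, audio/basic")
languages = parse_accept("da, en-gb;q=0.8, en;q=0.7")
```

With the two example headers from the list above, `audio/basic` ranks ahead of `audio/*`, and Danish ranks ahead of both English variants.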

12.3 HTTP Response

Accept-Ranges: Server specifies its acceptance of range (parts of a file) requests.
Accept-Ranges: bytes
Age: Time since the response was generated by the server. Useful for changing data such as weather and stock quotes. Age is in seconds.
Age: 600
ETag: Value of the entity tag for the requested variant. ETags are meant to function as unique identifiers for documents.
Location: Used to re-direct the user to another location.
Server: Information about the server.
Server: CERN/3.0 libwww/2.17
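On the client side these headers arrive as plain text lines after the status line. The sketch below splits a raw response into its status code, headers, and body; the function name, the sample response, and the redirect path in it are invented for illustration, and repeated or folded header lines are not handled.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (status code, headers dict, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split(" ", 2)[1])        # e.g. "HTTP/1.1 302 Found" -> 302
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return status, headers, body


# Hypothetical response combining several of the headers listed above.
raw_response = (b"HTTP/1.1 302 Found\r\n"
                b"Location: /new/place\r\n"
                b"Server: CERN/3.0 libwww/2.17\r\n"
                b"Age: 600\r\n"
                b"\r\n")
status, headers, body = parse_response(raw_response)
if 300 <= status < 400 and "location" in headers:
    print("redirected to", headers["location"])
```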

12.4 Cookies

13 CGI

13.1 CGI URL

13.2 CGI Problems

14 Applets

15 References



  1. José M. Vidal
  3. RFC 2616
  4. .
  6. Comic Strip
  7. RFC 761
  8. RFC 768
  9. series of RFCs
  10. whois
  11. IETF
  12. W3C
  13. RFC 2396
  14. RFC 2141
  15. doi
  16. RFC 1738
  17. RFC 2045
  18. RFC 1945
  19. RFC 2616
  20. RFC 2109
  21. CGI Specifications
  23. wikipedia:Hypertext_Transfer_Protocol

This talk available at
Copyright © 2009 José M. Vidal. All rights reserved.

19 December 2008, 03:27PM