<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Grid Computing</title>
<link rel="stylesheet" type="text/css" href="../xslides-summary.css"></link>
<meta name="AUTHOR" content="Jose M Vidal"></meta>
<meta name="GENERATOR" content="xslides.el, written by J.M. Vidal"></meta>
</head>
<!--HEADER LEFT "http://jmvidal.cse.sc.edu/talks/gridcomputing/" --> 
<body>  
<h1 class="slidelist">Grid Computing</h1><div class="slidebody">
      We give a very high-level overview of grid computing and the
      Globus toolkit. This talk is based on:
      <ul>
	<li class="index">Ian Foster, Carl Kesselman, Jeffrey M. Nick, and Steven
	  Tuecke. <span><a href="http://www.globus.org/research/papers/ieee-cs-2.pdf">
	    Grid Services for Distributed System Integration</a> [1]</span>,
	    <em>IEEE Computer</em>, 35(6), 2002.</li>

	<li class="index">Ian Foster, Carl Kesselman, and Steven Tuecke. <span><a href="http://www.globus.org/research/papers/anatomy.pdf">The
	    Anatomy of the Grid: Enabling Scalable Virtual
	    Organizations</a> [2]</span>, 2001.</li>

	<li class="index">Ian Foster, Carl Kesselman, Jeffrey M. Nick, and Steven
	  Tuecke. <span><a href="http://www.globus.org/research/papers/ogsa.pdf">The
	    Physiology of the Grid: An Open Grid Services Architecture for
	    Distributed Systems Integration</a> [3]</span>, January, 2002.
	</li>

	<li class="index"><span><a href="http://www.globus.org/training/grids-and-globus-toolkit/index.html">Introduction
	    to Grid Computing and the Globus Toolkit</a> [4]</span></li>

	<li class="index">M. Mitchell Waldrop. <span><a href="http://jmvidal.cse.sc.edu/library/waldrop02a.pdf">Grid
	      Computing.</a> [5]</span> <i>Technology Review,</i> May; 105(4):30--37,
	    2002.</li>


      </ul>
    </div><h2 id="introduction">1 Introduction</h2><div class="slidebody">
      <ul>
	<li>Computing power as a utility. </li>
	
	<li>A grid ties many computers into a virtual
	  machine.</li>
	
	<li>A true grid must
	  <ul>
	    <li>Work across institutional barriers.</li>
	    <li>Be indifferent to platform differences.</li>
	    <li>Be robust in the face of network failures.</li>
	    <li>Provide security and authentication. </li>
	  </ul>
	</li>
      </ul>
    </div><b>Note:</b><div class="note">
      <p>
	As you read in the papers, interest in the Grid stems largely
	from scientists with computational problems too large for any
	machine they can afford to buy. The Grid lets one person or
	institution use the computational power of many others. This
	ability is alluring because many of these high-end machines sit
	idle for long periods of time, which wastes CPU power. In a
	similar way, the Grid allows the sharing of other resources,
	such as research instruments. For example, it can give a
	research group access to a particle accelerator, telescope, or
	seismic sensor that is located on the other side of the
	world.
      </p>
    </div><h2 id="use">2 Existing Grids</h2><div class="slidebody">
      <ul>
	<li>Many grids already exist, mostly for scientific computing:
	  <ul>
	    <li><span><a class="remote" href="http://www-fp.mcs.anl.gov/fl/accessgrid">Access
		Grid</a> [6]</span>. ANL.</li>

	    <li><span><a class="remote" href="http://www.ipg.nasa.gov">Information Power
		Grid</a> [7]</span>- NASA. </li>

	    <li><span><a class="remote" href="http://www.unicore.de">Unicore</a> [8]</span> </li>

	    <li><span><a class="remote" href="http://www.teragrid.org">TeraGrid</a> [9]</span>- NSF. </li>
	  </ul>
	</li>
	
	<li><em>"TeraGrid is a multi-year effort to build and deploy
	the world's largest, most comprehensive, distributed
	infrastructure for open scientific research. By 2004, the
	TeraGrid will include <b>20 teraflops</b> of computing power
	distributed at five sites, facilities capable of managing and
	storing nearly <b>1 petabyte</b> of data, high-resolution
	visualization environments, and toolkits for grid
	computing. Four new TeraGrid sites, announced in September
	2003, will add more scientific instruments, large datasets,
	and additional computing power and storage capacity to the
	system. All the components will be tightly integrated and
	connected through a network that operates at <b>40 gigabits
	per second</b>."</em></li>

	<li><span><a class="remote" href="http://www.cse.sc.edu/~kcameron/">Dr. Kirk
	Cameron</a> [10]</span> heads a <span><a class="remote" href="http://scape.cse.sc.edu/" title="SCAPE Laboratory">laboratory</a> [11]</span> which has <span><a class="remote" href="http://uscnews.sc.edu/engr211.html" title="USC joins
	Grid Announcement">recently joined the Grid</a> [12]</span>.</li>
      </ul>
    </div><h2 id="definitions">3 Definitions</h2><div class="slidebody">
      <ul>
	<li>A <dfn>resource</dfn> is an entity to be shared: computer,
	storage, data, software. Defined by its interface. </li>

	<li>A <dfn>network-enabled service</dfn> is an implementation of a
	network protocol that provides some capability, e.g., an FTP server or a Web
	server.</li>

	<li>A <dfn>virtual organization</dfn> (VO) is a dynamic
	grouping of resources and people in order to solve a
	problem. The Grid is meant to support VOs.</li>

	<li><dfn>Virtualization</dfn> enables consistent resource
	access across heterogeneous platforms. </li>

	<li>The <span><a class="remote" href="http://www.globus.org">Globus Toolkit</a> [13]</span> is
	the de-facto standard implementation of a Grid. </li>
      </ul>
    </div><h2 id="gridarchitecture">4 Grid Architecture</h2><div class="slidebody">
      <div class="floatright" style="width:320px">
	<img class="float" src="gridarch.png" alt="Grid Architecture"/>
      </div>
      <ul>
	<li>The <a href="allslides.xml#resource">resource protocol layer</a>
	facilitates sharing of resources. </li>

	<li>The <a href="allslides.xml#collective">collective</a> layer has services that involve
	the coordinated use of multiple resources. </li>

	<li>The <a href="allslides.xml#connectivity">connectivity</a> protocols enable
	connections. </li>

	<li>The <a href="allslides.xml#fabric">fabric</a> layer refers to a set
	of resource types. This layer interfaces to local
	control. </li>
      </ul>
    </div><h3 id="fabric">4.1 Fabric Layer</h3><div class="slidebody">
      <ul>
	<li>Provides the resources that are shared by the Grid: CPU
	time, storage, sensors.</li>

	<li>The fabric lets users <em>ask</em> about available
	resources and provides <em>resource management</em>
	mechanisms. For example:
	  <ul>
	    <li><em>Computational resources:</em> How fast is the
	    hardware? Start, monitor, stop.</li>

	    <li><em>Storage:</em> Can I read/write? How much? How
	    fast? Read, write. </li>

	    <li><em>Network:</em> What is the load? Reserve
	    bandwidth. </li>

	    <li><em>Code repositories:</em> CVS. </li>

	    <li><em>Catalogs:</em> relational databases.</li>
	  </ul>
	</li>

	<li><b>Globus</b> uses whatever is already on the machine, but
	provides its own software if there is none. </li>
      </ul>
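      <p>The split between enquiry and management operations for a
      computational resource can be sketched as follows. This is a toy
      stand-in written for these notes: the class name, method names, and
      attributes are hypothetical and are not part of Globus, and no
      program is actually executed.</p>
<pre>
```python
class LocalComputeResource:
    """Toy stand-in for a computational fabric resource (hypothetical).

    Offers one enquiry operation (describe) and three management
    operations (start, monitor, stop). Nothing is really executed.
    """

    def __init__(self, cpu_mhz, memory_mb):
        self._info = {"cpu_mhz": cpu_mhz, "memory_mb": memory_mb}
        self._jobs = {}      # job id -> {"executable": ..., "state": ...}
        self._next_id = 1

    def describe(self):
        """Enquiry: report hardware characteristics."""
        return dict(self._info)

    def start(self, executable):
        """Management: register a job and return its id."""
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = {"executable": executable, "state": "running"}
        return job_id

    def monitor(self, job_id):
        """Management: report the current state of a job."""
        return self._jobs[job_id]["state"]

    def stop(self, job_id):
        """Management: mark a job as stopped."""
        self._jobs[job_id]["state"] = "stopped"
```
</pre>
      <p>A caller would first call describe() to decide whether the
      machine is suitable, then start(), monitor(), and stop() a job.</p>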
    </div><h3 id="connectivity">4.2 Connectivity</h3><div class="slidebody">
      <ul>
	<li>Defines the core communication and authentication
	protocols required for Grid-specific network
	transactions. </li>

	<li>Communication uses TCP/IP, DNS. </li>

	<li>Authentication demands:
	  <ul>
	    <li>Single sign on.</li>

	    <li>Delegation: let this program access those files. </li>

	    <li>Integration: with Kerberos and Unix. </li>

	    <li>User-based trust relationships: If I can use A and B
	    then my program should be able to use both together. </li>
	  </ul>
	</li>

	<li><b>Globus</b> implements the public-key Grid Security
	Infrastructure (GSI) protocols which extend TLS. </li>
      </ul>
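      <p>Delegation can be pictured with a toy model in which a user
      hands a program a short-lived credential derived from the user's
      own. The sketch below contains no cryptography and is not the GSI
      protocol; the Credential class, the delegate and acts_for
      functions, and the subject naming scheme are all assumptions made
      for illustration.</p>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    subject: str       # who this credential speaks for
    issuer: str        # who vouched for it
    expires_at: int    # expiry time, in seconds on some shared clock


def delegate(parent, program, lifetime, now):
    """Issue a short-lived credential that lets `program` act for the
    parent's subject. It never outlives the parent credential."""
    expires_at = min(parent.expires_at, now + lifetime)
    return Credential(subject=parent.subject + "/" + program,
                      issuer=parent.subject,
                      expires_at=expires_at)


def acts_for(credential, user, now):
    """True if the credential is unexpired and derived from `user`."""
    derived = (credential.subject == user
               or credential.subject.startswith(user + "/"))
    return derived and credential.expires_at > now
```
</pre>
      <p>Because the derived credential carries the user's identity, a
      program holding it can use every resource the user can, which is
      the user-based trust relationship listed above.</p>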
    </div><h3 id="resource">4.3 Resource Layer</h3><div class="slidebody">
      <ul>
	<li>There are <dfn>information protocols</dfn> that tell us
	about the state of a resource and <dfn>management
	protocols</dfn> that negotiate access to a resource. </li>

	<li><b>Globus</b> provides: </li>
      </ul>
      <ol>
	<li>The Grid Resource Information Protocol (GRIP), based on <span><a class="remote" href="http://www.wikipedia.org/wiki/LDAP" title="wikipedia:LDAP">LDAP</a> [14]</span>, provides information about
	resources. GRRP is for resource registration.</li>

	<li>The Grid Resource Access and Management (GRAM) protocol is
	HTTP-based and is used to allocate computational resources
	and to monitor and control them. </li>

	<li>GridFTP is an extension of <span><a class="remote" href="http://www.wikipedia.org/wiki/Ftp" title="File Transfer Protocol">FTP</a> [15]</span> that allows partial file
	access and management of parallelism for high-speed
	transfers.</li>

	<li>LDAP. </li>
      </ol>
    </div><h3 id="collective">4.4 Collective Layer</h3><div class="slidebody">
      <ul>

	<li>Collective-layer services might include:
	  <ol>
	    <li>Directory services, e.g., resource-level GRRP and GRIP protocols.</li>

	    <li>Co-allocation, scheduling, and brokering
	    services, e.g., DRM broker. </li>

	    <li>Monitoring and diagnostics services. </li>

	    <li>Data replication services. </li>

	    <li>Grid-enabled programming systems, e.g., Grid-enabled
	    implementations of the Message Passing Interface and
	    CORBA.</li>

	    <li>Workload management systems and collaboration
	    frameworks. </li>

	    <li>Software discovery services. </li>

	    <li>Community authorization servers. </li>

	    <li>Community accounting and payment services. </li>

	    <li>Collaboratory services. </li>
	  </ol>
	</li>

      </ul>
    </div><h2 id="gsi">5 Grid Security Infrastructure</h2><div class="slidebody">
      <ul>
	<li>GSI provides: </li>

	<li>Uniform authentication, authorization, and message
	protection mechanisms. </li>

	<li>Single sign-on, delegation, identity mapping. </li>

	<li>Uses public key technology (SSL). </li>

	<li>Uses a supporting infrastructure of: certificate
	authorities, certificate and key management software. </li>
      </ul>
    </div><h2 id="howtoprogram">6 How to Program?</h2><div class="slidebody">
      <ul>
	<li>There is no language or simple library to link to!
	Instead you must obey all the protocols or, at least, the ones
	you need (no security?).</li>

	<li>The program must already be distributed, for example using:
	  <ul>
	    <li>Grid-enabled CORBA.</li>

	    <li>MPICH-G2: grid-enabled implementation of the Message Passing Interface.</li>

	    <li>Condor-G: workflow management. </li>

	    <li>Legion: object models for grid computing. </li>

	    <li>Cactus: grid-aware numerical solver framework. </li>
	  </ul>
	</li>

	<li>Then, use it along with the Globus Toolkit. </li>
      </ul>
    </div><h2 id="gram">7 Grid Resource Allocation Management</h2><div class="slidebody">
      <ul>
	<li>The GRAM protocol allows programs to be started on remote
	resources, even when those resources are heterogeneous. </li>

	<li>The Resource Specification Language (RSL) communicates
	requirements. RSL is similar to LDAP filters: <pre>
    &amp; (executable=myprog)
          (| (&amp;(count=5)(memory &gt;= 64))
             (&amp;(count=10)(memory &gt;= 32)))</pre>
	  Create 5 instances of myprog on a machine that has at least
	  64MB memory or 10 instances on a machine with at least 32MB.
	</li>

	<li>We can also specify multiple resource needs:<pre>
     + (&amp;(count=5)(memory&gt;=64)
             (executable=p1))
       (&amp;(network=atm)(executable=p2))</pre> Execute 5 instances
       of p1 on a machine with 64MB or more, and execute p2 on a
       machine with an ATM connection. </li>

	<li>To submit the programs you use the commands:
	  <ul>
	    <li>globus-job-run: interactive jobs.</li>

	    <li>globus-job-submit: batch/offline jobs. </li>

	    <li>globusrun: flexible scripting infrastructure. </li>
	  </ul>
	</li>

	<li>GRAM-1 used HTTP-based RPC: request that a job be
	started/stopped. Returns a job contract. </li>

	<li>GRAM-2 uses SOAP. </li>
      </ul>
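      <p>How a resource manager might pick between the two alternatives
      in the first request described above (5 instances given at least
      64MB, or 10 given at least 32MB) can be sketched in a few lines.
      The dictionary encoding and the plan function are hypothetical
      and are not the Globus RSL parser; the sketch also assumes that
      alternatives are tried in the order written.</p>
<pre>
```python
# The first RSL request above, written as plain Python data.
# Each alternative pairs an instance count with a minimum memory (MB).
REQUEST = {
    "executable": "myprog",
    "alternatives": [
        {"count": 5, "min_memory": 64},
        {"count": 10, "min_memory": 32},
    ],
}


def plan(request, machine_memory_mb):
    """Return (executable, count) for the first alternative the machine
    can satisfy, or None when no alternative fits.

    Trying alternatives in written order is an assumption of this sketch.
    """
    for alt in request["alternatives"]:
        if machine_memory_mb >= alt["min_memory"]:
            return request["executable"], alt["count"]
    return None
```
</pre>
      <p>A machine with 128MB matches the first alternative, one with
      48MB falls through to the second, and one with 16MB matches
      neither.</p>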
    </div><h2 id="informationservices">8 Information Services</h2><div class="slidebody">
      <ul>
	<li>The Grid Resource Information Service (GRIS) implements
	<abbr title="Grid Resource Information Protocol">GRIP</abbr>
	and <abbr title="Grid Resource Registration
	Protocol">GRRP</abbr>. </li>

	<li>Uses LDAP. </li>

	<li>GRIS tells us: machine load, process information, storage
	space, etc. It also lets us find specific machines, for example the
	machine with the shortest queue. </li>

	<li>You can use the command <pre>
grid-info-host-search [options] filter [attributes]</pre> to find
suitable hosts. </li>
      </ul>
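      <p>Once host information has been retrieved, finding the machine
      with the shortest queue is a simple selection. The sketch below
      works on made-up host records; the attribute names and the
      shortest_queue function are hypothetical and do not reflect the
      actual GRIS schema or any Globus API.</p>
<pre>
```python
# Hypothetical host records, shaped like what an information service
# might return; the attribute names are made up for this sketch.
HOSTS = [
    {"name": "hostA", "queue_length": 12, "free_memory_mb": 256},
    {"name": "hostB", "queue_length": 3, "free_memory_mb": 64},
    {"name": "hostC", "queue_length": 7, "free_memory_mb": 512},
]


def shortest_queue(hosts, min_memory_mb=0):
    """Return the name of the host with the shortest queue among those
    with at least min_memory_mb free, or None if none qualifies."""
    candidates = [h for h in hosts if h["free_memory_mb"] >= min_memory_mb]
    if not candidates:
        return None
    return min(candidates, key=lambda h: h["queue_length"])["name"]
```
</pre>
      <p>Adding a memory requirement narrows the candidates before the
      queue lengths are compared, much as a search filter would.</p>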
    </div><h2 id="gridftp">9 GridFTP</h2><div class="slidebody">
      <ul>
	<li>A superset of FTP. </li>

	<li>Implements often unused features of FTP: GSS binding, extended
	directory listing, simple restart. </li>

	<li>Extends FTP by adding: striped/parallel data channels,
	partial file access, automatic and manual TCP buffer setting,
	progress monitoring, extended restart. </li>

	<li>Globus also implements custom libraries, clients, and
	servers, all optimized for high performance (big files
	fast):
	  <ul>
	    <li>gsi-ncftp</li>

	    <li>gsi-wuftpd </li>
	  </ul>
	</li>
      </ul>
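      <p>Partial file access and parallel data channels both rest on
      addressing a file by byte range. A minimal sketch of dividing a
      file among several channels follows; split_ranges is a
      hypothetical helper written for these notes, not a GridFTP or
      Globus function, and the even split is an assumption.</p>
<pre>
```python
def split_ranges(file_size, channels):
    """Split file_size bytes into at most `channels` contiguous
    (offset, length) ranges that together cover the whole file."""
    if not channels > 0:
        raise ValueError("channels must be positive")
    base, extra = divmod(file_size, channels)
    ranges = []
    offset = 0
    for i in range(channels):
        # The first `extra` channels each carry one additional byte.
        length = base + (1 if extra > i else 0)
        if length == 0:
            break  # fewer bytes than channels: leave the rest unused
        ranges.append((offset, length))
        offset += length
    return ranges
```
</pre>
      <p>Each range could then be moved over its own data channel, and a
      restart only needs to re-request the ranges that did not
      finish.</p>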
    </div><h2 id="conclusion">10 Conclusion</h2><div class="slidebody">
      <ul>
	<li>The Globus Toolkit is a long way from the power grid analogy. Not
	surprising, since automatic parallelization is an unsolved
	problem.</li>

	<li>Mostly a collection of services: aggregation of existing
	technologies with some extra glue. </li>

	<li>New Open Grid Services Architecture (OGSA) hopes to snarf
	in more Web Services (SOAP, UDDI, WSDL) technologies. </li>

	<li>Joining a Grid requires personal invitation. </li>

	<li>A simple API is possible, but the Grid rejects it because of
	the need to support legacy software. <em>The Grid could be very
	simple if we could design it from scratch!</em></li>

	<li>Hard to set up, according to Kirk. </li>
      </ul>
    </div><h2>URLs</h2><ol><li>
	    Grid Services for Distributed System Integration, <a href="http://www.globus.org/research/papers/ieee-cs-2.pdf">http://www.globus.org/research/papers/ieee-cs-2.pdf</a></li><li>The
	    Anatomy of the Grid: Enabling Scalable Virtual
	    Organizations, <a href="http://www.globus.org/research/papers/anatomy.pdf">http://www.globus.org/research/papers/anatomy.pdf</a></li><li>The
	    Physiology of the Grid: An Open Grid Services Architecture for
	    Distributed Systems Integration, <a href="http://www.globus.org/research/papers/ogsa.pdf">http://www.globus.org/research/papers/ogsa.pdf</a></li><li>Introduction
	    to Grid Computing and the Globus Toolkit, <a href="http://www.globus.org/training/grids-and-globus-toolkit/index.html">http://www.globus.org/training/grids-and-globus-toolkit/index.html</a></li><li>Grid
	      Computing, <a href="http://jmvidal.cse.sc.edu/library/waldrop02a.pdf">http://jmvidal.cse.sc.edu/library/waldrop02a.pdf</a></li><li>Access
		Grid, <a href="http://www-fp.mcs.anl.gov/fl/accessgrid">http://www-fp.mcs.anl.gov/fl/accessgrid</a></li><li>Information Power
		Grid, <a href="http://www.ipg.nasa.gov">http://www.ipg.nasa.gov</a></li><li>Unicore, <a href="http://www.unicore.de">http://www.unicore.de</a></li><li>TeraGrid, <a href="http://www.teragrid.org">http://www.teragrid.org</a></li><li>Dr. Kirk
	Cameron, <a href="http://www.cse.sc.edu/~kcameron/">http://www.cse.sc.edu/~kcameron/</a></li><li>SCAPE Laboratory, <a href="http://scape.cse.sc.edu/">http://scape.cse.sc.edu/</a></li><li>USC joins
	Grid Announcement, <a href="http://uscnews.sc.edu/engr211.html">http://uscnews.sc.edu/engr211.html</a></li><li>Globus Toolkit, <a href="http://www.globus.org">http://www.globus.org</a></li><li>wikipedia:LDAP, <a href="http://www.wikipedia.org/wiki/LDAP">http://www.wikipedia.org/wiki/LDAP</a></li><li>File Transfer Protocol, <a href="http://www.wikipedia.org/wiki/Ftp">http://www.wikipedia.org/wiki/Ftp</a></li></ol><hr class="bottom"/>
<p class="author">This talk available at <a href="http://jmvidal.cse.sc.edu/talks/gridcomputing">http://jmvidal.cse.sc.edu/talks/gridcomputing/</a><br />
Copyright &copy; 2004 <a href="../../index.html">Jos&eacute; M. Vidal</a>
<a href="http://validator.w3.org/check?uri=http://jmvidal.cse.sc.edu/talks/gridcomputing/allslides.xml">.</a>
 All rights reserved.</p>
<p class="pagenumber">02 March 2004, 12:22PM</p>
</body>
</html>