ARSC HPC Users' Newsletter 259, December 6, 2002

ARSC Storage Upgrade

ARSC will soon be transitioning permanent storage from the Crays to twin Sun storage servers. One of the Suns will be dedicated to serving DoD-sensitive data to the HPCMP resources (machines requiring NACs) and the other will serve ARSC-sensitive data to systems not requiring NACs.

Watch for more announcements and details soon.

Batch Scripting Essentials 1 : Job Chaining

[Thanks to Jeff McAllister of ARSC]

The motivation for this article, and for several to follow, is the upcoming storage upgrade at ARSC. The upgrade will move us from a relatively simple mode of getting work done in the direction the rest of the world is going -- one where everyone has to be storage aware. These concepts are now part of supercomputing everywhere.

With storage "abstracted" away from individual systems (i.e., where data is stored is not where results are computed), getting work done involves at least two extra steps:

  1. copy required files (executables, inputs) from long-term storage to workspace local to the machine
  2. run the job
  3. copy outputs to long-term storage

Thus any job runnable with one script should become a multi-step job if transfer times are more than a few seconds. Of course any work that can be set up as steps, such as jobs which write restart files so the next step can pick up where the last left off, could benefit from this type of automation.
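
For example, a staged batch script might look like the following
sketch, where $ARCHIVE and $WORKDIR are hypothetical stand-ins for the
long-term storage area and the machine-local workspace. The actual
transfer commands and paths will depend on the storage configuration:


  #!/bin/ksh
  # sketch of a staged job

  # 1. copy executable and inputs from long-term storage to workspace
  cp $ARCHIVE/myrun/a.out $ARCHIVE/myrun/input.dat $WORKDIR
  cd $WORKDIR

  # 2. run the job
  ./a.out < input.dat > output.dat

  # 3. copy outputs back to long-term storage
  cp output.dat $ARCHIVE/myrun/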

The easiest, most portable way to do multistep work is by chaining jobs together. "Chaining" occurs when

Step A executes step B
Step B executes step C

and so on. (For a detailed discussion of chaining on the T3E, see issue #176: /arsc/support/news/t3enews/t3enews176/index.xml ). Many cases of runaway recursive submissions have shown that it's best to create scripts which explicitly submit the next step in the chain.
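
The pattern to avoid is a script that blindly resubmits itself. Each
script should instead name its successor explicitly, as in this
minimal sketch (the script and executable names are placeholders, and
the submission command will vary by system -- qsub here):


  #!/bin/ksh
  # stepA -- sketch of one link in an explicit chain
  ./a.out          # this step's work
  qsub stepB       # submit the next step by name; the final
                   # script in the chain simply omits this line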

I've written a batch queueing script generator, "scriptgen," to facilitate the creation of chaining scripts. Since it's a lot of Perl and will be updated when the ARSC storage upgrade comes on line, the code isn't included in this newsletter, but can be copied from:

http://www.arsc.edu/~mcallist/Examples/scriptgen

scriptgen detects which system you are running it from (currently chilkoot, yukon, icehawk, iceflyer) and creates an output file with the minimum instructions necessary for a batch script on that system. You may use it as a starting point, or as is.

Required/optional parameters:


  iceflyer 8% ./scriptgen
  usage: scriptgen stepfile ntasks extime executable [nextstep] [msize]
  
  stepfile        the name of the batch script the scriptgen will generate
  ntasks          number of processors (number of nodes on icehawk)
  extime          execution time limit, can be any time format acceptable 
                  on target system (i.e. '1800' or '00:30:00')
  executable      program to run (./a.out)
  nextstep        if included, script will add system-specific command to
                  submit a script by this name -- if not included 
                  execution ends when stepfile finishes
  msize           SV1 only -- memory allocation required

Some examples:

This creates the UNICOS/mk qsub batch script "stepA," which will run on two processors with a time limit of 3 minutes and execute ./a.out. "stepA" submits the follow-on script "stepB," and ensures (via qalter) that other users' jobs will get a chance to start between the steps:


  yukon 45% ./scriptgen stepA 2 180 ./a.out stepB
  yukon 46% cat stepA
  
  #!/bin/ksh
  #QSUB -q mpp
  #QSUB -l mpp_p=2
  #QSUB -l mpp_t=180
  #QSUB -o stepA.out
  #QSUB -e stepA.err
  
  cd $QSUB_WORKDIR
  mpprun -n 2 ./a.out
  
  qalter -l mpp_p=0
  qsub stepA.tmp
  sleep 20
  qsub stepB

The following creates the sequence of scripts "stepA" through "stepD" on chilkoot, where scriptgen requires an extra parameter for memory size. It then submits "stepA," and thus the entire chain, for execution.


  chilkoot 253% ./scriptgen stepA 6 1800 ./a.out stepB 16MW
  chilkoot 254% ./scriptgen stepB 6 1800 ./a.out stepC 16MW
  chilkoot 255% ./scriptgen stepC 6 1800 ./a.out stepD 16MW
  chilkoot 256% ./scriptgen stepD 6 1800 ./a.out "" 16MW
  chilkoot 257% qsub stepA

Chaining can also be helpful when you're getting ready to leave on vacation and don't want to leave pages and pages of jobs queued.

--

In the next article I'll expand on the concepts of storage abstraction and file staging, as well as make scriptgen a little more sophisticated to help automate this.

Many thanks to Harper Simmons for providing the original icehawk version of scriptgen.

SC2002 Review

We like to print some observations about relevant conferences, for the benefit of those who didn't get to go. SC2002 was held in Baltimore last month, and we have these notes from Guy Robinson and Jeff McAllister of ARSC:

Guy Robinson:

As usual, SC was a hectic time for all involved. What were the key points which attracted my attention?

  • The Earth Simulator was a great topic of conversation. Many papers and two panels spent a lot of time talking about why it was a success and what lessons could be learnt from the project. The statement that "with this system we get to concentrate on the science" was made by several users.

  • The GRID is coming of age. People are coming up with real projects and success stories. There was a single-day seminar on the Monday, Grid Computing (GRID2002), the proceedings of which can be found online at:

    http://link.springer.de/series/lncs/

  • "What makes a supercomputer" is always an interesting questions to address at Supercomputing. Is it fast processors, fast interconnects, an easy programming language, the necessary skills, plenty of storage? I think the answer really is a mixture of all these. The variety of technologies on show demonstrates how exciting and dynamic our field is. All papers from SC are online, and clearly demonstrate this variety:

    http://www.sc2002.org/

Jeff McAllister:

SC2002 was amazing. There were so many cool things it's hard to know where to start. Being able to wander the booths and talk to people during the tutorials, BOFs, and panels allowed a much richer level of synergy and serendipity than I could have ever found just reading or attending a meeting on a more focused issue.

Some highlights:

  • Hardware always seems to dominate anything related to supercomputing, but software is at least getting some interest. The common component architecture ( http://www.cca-forum.org ) is an intriguing approach to supporting complex application development, though I think it needs more work before it can take off and reach the critical mass required for many developers to supply and use this style of reusable components.

    I hope to do more with PAPI (performance API, http://icl.cs.utk.edu/papi/ ) and TAU (tuning and analysis utilities, http://www.cs.uoregon.edu/research/paracomp/tau ) -- these both seem to be maturing towards the goal of providing a standardized, meaningful way to compare performance.

  • Clusters are getting easier to manage. Many of the initially time-consuming ramifications of working with lots of nodes not specially designed to work together have been dealt with by the large community working on these problems. (Some say that clusters have actually become easier to set up and customize than designer systems!) OSCAR and Scyld seemed the most popular open-source approaches. Unlimited Scale, partnered with HP, has many ex-Cray employees, including big names from the T3E days. Their goal is to produce the software infrastructure for a T3E-like system built from commodity parts, where the hardware can be upgraded every few years to keep up with Moore's law at minimum cost while still retaining high usability and maintainability.

  • Myrinet and Quadrics together provide the interconnects for what is approaching 40% of the Top 500 list. Three of the top five use interconnects which, while not cheap, could be installed in any cluster. InfiniBand still needs some work, but promises to be a lower-cost, higher-bandwidth, lower-latency solution; the Ohio Supercomputer Center showed off some promising InfiniBand benchmarks (approaching 1 GB/sec MPI bandwidth). Since floating point performance and memory sizes are already quite comparable between commodity and designer systems, interconnects and latencies are just about the only things left differentiating proprietary systems from those anyone could put together from essentially "off the shelf" components. ("Off the shelf" meaning purchased separately, not necessarily inexpensive.) While there was a palpable level of FUD emanating from some vendors and panels, personally I don't see these developments as anything but good for supercomputing as a whole.

ARSC Web Site Changes

ARSC's web site has gotten a new "look and feel."

The Newsletter archive -- which includes all T3D, T3E, and HPC newsletters dating from 1994, a handy index to all Quick-Tips, and a search function -- has moved...

So... in case you have us bookmarked, here's our new address:

http://www.arsc.edu/support/news/HPCnews.shtml

Quick-Tip Q & A


A:[[ Arrrgggghhhhh!!!!  Over quota again!
  [[
  [[ Is there some easy way to locate the largest files in my account, so
  [[ I can figure out what to delete?


  # Thanks to Richard Griswold:
  
  This ought to do the trick:
  
    find ~ -type f -size +1024k -exec du -k {} \; | sort -nr
  
  This will find all files over 1MB.  You can change the size value to
  expand or limit the number of files returned, or you can eliminate it
  altogether.
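
  # Another approach:

  Assuming your du supports the -a option (report every file, not
  just directories), you can rank everything under your home
  directory at once and show the twenty largest entries, in
  kilobytes:

    du -ak ~ | sort -nr | head -20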



Q: I use a Fortran-callable C function, for example, something like this:

        void Simple (int *i) {
          (*i)++;
        }

   But Fortran is a case-insensitive language and while some Fortran
   compilers (SGI, SUN, NEC, IBM) shift everything to lower case, others
   (CRAY) shift to upper case. Some (SGI, SUN, NEC) also append an
   underscore to function names. Thus, attempting to link a Fortran
   program which has a "call Simple (I)" to the above would give an
   "unsatisfied external" on any system.

   If I change the name of the C function to "simple_", it links
   successfully on an SGI, SUN, or NEC, but I'd rather make the rotten
   thing portable!  Suggestions?

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.