ARSC HPC Users' Newsletter 216, March 23, 2001

SV1e Upgrade

During facility downtime on Mar 31 - Apr 1, 2001, the processors on chilkoot will be upgraded to Cray's new SV1e processors.

Chilkoot will be the first SV1e in production.

The SV1e will have a faster clock and enhancements to cache and scalar processing.

ARSC users, please note the extended downtime over the weekend of Mar 31-Apr 1; see "news downtime" and "news PE3.5" for more details and scheduling.

We encourage you to test your codes under PE3.5 and UNICOS 10.0.1.0 (to be installed Monday) as soon as possible. This will help verify the software upgrade prior to the processor upgrade.

As noted in "news PE3.5", to use PE3.5 effectively you MUST set the environment variable TARGET to cray-sv1.
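
Depending on your login shell, that means something like the following; you may also want to add the appropriate line to your shell startup file so it is set in batch jobs, too:

  ksh:        export TARGET=cray-sv1
  csh/tcsh:   setenv TARGET cray-sv1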

UAF Colloquium Series: Burton Smith, March 29

The UAF Department of Mathematical Sciences and ARSC are jointly sponsoring a Mathematical Modeling, Computational Science, and Supercomputing Colloquium Series. The schedule and abstracts for the '00-'01 academic year are available at:

http://www.dms.uaf.edu/dms/Colloquium.html

Next presentation:

How Shall We Program High Performance Computers?
Dr. Burton Smith
Chief Scientist, Cray Inc.

Date: Thursday, March 29, 2001
Time: 1:00-2:00 PM
Location: Butrovich 109

ABSTRACT:

Uniprocessor computer architecture has traditionally been motivated by programming languages and operating systems, with benchmarks written in the usual languages also having some influence. In high performance computing the situation is curiously reversed, with architecture determining the principal characteristics of programming languages, operating systems, and benchmarks. The result has been chaos; a "software crisis" has been declared, and better tools for the development of parallel software have been demanded. The outlook for good tools is bleak without a new approach to the problem, which should include the engineering of computer systems with both system and application software in mind and the development of programming abstractions that are both effective and efficient on hardware we can build.

THE SPEAKER:

Burton Smith is Chief Scientist of Cray Inc. He received the BSEE from the University of New Mexico in 1967 and the Sc.D. from MIT in 1972. From 1985 to 1988 he was Fellow at the Supercomputing Research Center of the Institute for Defense Analyses in Maryland. Before that, he was Vice President, Research and Development at Denelcor, Inc. and was chief architect of the HEP computer system. Dr. Smith is a Fellow of both the ACM and the IEEE, and winner of the IEEE-ACM Eckert-Mauchly award in 1991. His main interest is general purpose parallel computer architecture.

Parallel Programming Course, Next Wednesday

ARSC Training: Parallel Computing Concepts
Wednesday, March 28, 2001, 2-4pm

In this course, Jeff McAllister will introduce parallel computing concepts and message-passing algorithms (using MPI) for new and existing codes.

For details and registration:

http://www.arsc.edu/user/Classes.html

Using "segldr" to Specify Cray-Optimized Routines

While working with a user code on the SV1, we discovered, by scanning the loader's cautionary messages, that it contained hand-coded versions of several standard scientific subroutines that are also available in Cray's optimized libraries. The subroutines may have been copied from Numerical Recipes, and presumably their inclusion guarantees portability.

The reason people use Crays, however, is performance.

Once you've ported your code and it's running correctly, you can worry about performance. An important step is realizing that generically coded routines just don't beat tuned vendor libraries. We've mentioned this several times in the past. For example, T3D Newsletter issue #96 included the article:

18:1 Speedup Demonstrated: Free for Using System Libraries

Here's a result from that article, where the times are in seconds for performing an identical matrix multiply on the T3D:

  Timings for the code are as follows:

    Time for naive matrix multiplication:          46.09
    Time for structured matrix multiplication:     34.33
    Time for BLAS1 matrix multiplication:          11.18
    Time for BLAS3 SGEMM matrix multiplication:     2.52

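To make the comparison concrete, here is a minimal sketch of what the substitution looks like in source. Everything in it is hypothetical (our own array names, size, and test data, not code from that article); it computes the same product once with a naive triple loop and once with the BLAS3 routine SGEMM, which libsci provides and which, on the Cray, operates on the default 64-bit REAL:

      PROGRAM MMDEMO
C     Minimal sketch (hypothetical names and data): the same matrix
C     product computed with a naive triple loop and with the tuned
C     BLAS3 routine SGEMM from libsci.
      INTEGER N
      PARAMETER (N = 400)
      REAL A(N,N), B(N,N), C(N,N)
      INTEGER I, J, K

C     Arbitrary test data.
      DO 20 J = 1, N
         DO 10 I = 1, N
            A(I,J) = 1.0 / REAL(I+J)
            B(I,J) = REAL(I-J)
   10    CONTINUE
   20 CONTINUE

C     Portable, hand-coded version.
      DO 50 J = 1, N
         DO 40 I = 1, N
            C(I,J) = 0.0
            DO 30 K = 1, N
               C(I,J) = C(I,J) + A(I,K)*B(K,J)
   30       CONTINUE
   40    CONTINUE
   50 CONTINUE
      PRINT *, 'Triple loop: C(N,N) = ', C(N,N)

C     Library version: C = 1.0*A*B + 0.0*C.
      CALL SGEMM('N', 'N', N, N, N, 1.0, A, N, B, N, 0.0, C, N)
      PRINT *, 'SGEMM:       C(N,N) = ', C(N,N)

      END

In a program like this, SGEMM simply resolves from libsci at load time. The trouble starts when your own source also defines a routine with one of these standard names.
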
Given a large code, the practical questions become:

  1. How can I determine if subroutines are duplicated?
  2. If so, how can I tell the loader to use the vendor versions?

The Cray loader, "segldr," cautions you if a module that appears in one of your object (".o") files is duplicated in another object file or in a library (".a" file). Thus, you simply need to review the warnings the loader has already issued.

For a manageable test program, I've used a Fortran77 version of the linpacks benchmark, with the array size increased to 400x400. The source is a single file, and it includes hand-written versions of several common matrix/vector operations.

Here's the compile command:

CHILKOOT$ f90 -o linpacks400 linpacks400.f

And the warnings from segldr:


 ldr-290 f90: CAUTION 
     Duplicate entry point 'SGEFA' was encountered.
     Entry in module 'SGEFA' from file 'linpacks400.o' has been used.
     Entry in module 'SGEFA' from file '/opt/ctl/craylibs/3.5.0.1/libsci.a' has
     been ignored.
 ldr-290 f90: CAUTION 
     Duplicate entry point 'SAXPY' was encountered.
     Entry in module 'SAXPY' from file 'linpacks400.o' has been used.
     Entry in module 'SAXPY' from file '/opt/ctl/craylibs/3.5.0.1/libsci.a' has
     been ignored.
 ldr-290 f90: CAUTION 
     Duplicate entry point 'SDOT' was encountered.
     Entry in module 'SDOT' from file 'linpacks400.o' has been used.
     Entry in module 'SDOT' from file '/opt/ctl/craylibs/3.5.0.1/libsci.a' has
     been ignored.

  [[ ...4 similar warnings cut... ]]

To reload with the libsci version, you need to issue directives to segldr. The cleanest method I've found is to create a directives file. It takes two directives, LIB and MODULES.

"LIB=<library file>" seems to be required, even if "-l" is used on the command line. It tells segldr what libraries to load.

"MODULES=<module name>:<library file>" tells segldr to load <module name> from <library file> , even if it encounters duplicate copies of <module name> .

Here's the file of directives I created for the linpacks test:

File: segldr.dir

LIB=/opt/ctl/craylibs/3.5.0.1/libsci.a
MODULES=SDOT:/opt/ctl/craylibs/3.5.0.1/libsci.a
MODULES=SGEFA:/opt/ctl/craylibs/3.5.0.1/libsci.a
MODULES=SGESL:/opt/ctl/craylibs/3.5.0.1/libsci.a
MODULES=SMXPY:/opt/ctl/craylibs/3.5.0.1/libsci.a
MODULES=SSCAL:/opt/ctl/craylibs/3.5.0.1/libsci.a
MODULES=SAXPY:/opt/ctl/craylibs/3.5.0.1/libsci.a
LIB=/opt/ctl/craylibs/3.5.0.1/libu.a
MODULES=ISAMAX:/opt/ctl/craylibs/3.5.0.1/libu.a

Having said this is "clean," I realize it's pretty ugly. When processing the files specified in the MODULES directives, segldr doesn't honor search paths, so complete paths are required.

To use the segldr directives file, you must pass the option "-i <directives file name>" to segldr. Here's one approach, showing how to invoke segldr directly:


  $ f90 -c linpacks400.f
  $ segldr -i segldr.dir linpacks400.o -o linpacks400.libsci

Here's another approach, which uses f90's "-Wl..." option to pass options through to segldr:


  $ f90 -o linpacks400.libsci linpacks400.f -Wl"-i segldr.dir"

Either approach will work; you'd probably choose whichever introduces the fewest changes into an existing makefile.

When these commands are executed, segldr issues different warnings:


 ldr-162 f90: CAUTION 
     The loader has ignored duplicate module 'SGEFA' from file
     'linpacks400.o'.
 ldr-162 f90: CAUTION 
     The loader has ignored duplicate module 'SGESL' from file
     'linpacks400.o'.
 ldr-162 f90: CAUTION 
     The loader has ignored duplicate module 'SAXPY' from file
     'linpacks400.o'.
 ldr-162 f90: CAUTION 
     The loader has ignored duplicate module 'SDOT' from file 
     'linpacks400.o'.
 ldr-162 f90: CAUTION 
     The loader has ignored duplicate module 'SSCAL' from file
     'linpacks400.o'.
 ldr-162 f90: CAUTION 
     The loader has ignored duplicate module 'ISAMAX' from file
     'linpacks400.o'.
 ldr-162 f90: CAUTION 
     The loader has ignored duplicate module 'SMXPY' from file
     'linpacks400.o'.

These warnings tell us all we need to know for this simple program: the hand-coded routines from linpacks400.f have been ignored, which means the duplicated entry points were loaded from libsci, exactly as we specified in the directives.

If you want more information (and to blow your mind), tell segldr to dump a load map. The option "-M loadmap.out,epxrf" dumps entry point cross-references to the file "loadmap.out", which can be browsed or grepped for specific modules to find out from whence they came.
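
For example, sticking with the direct segldr invocation shown earlier (the grep is just an illustration; any entry point name from the warnings will do):

  $ segldr -i segldr.dir -M loadmap.out,epxrf linpacks400.o -o linpacks400.libsci
  $ grep SGEFA loadmap.out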

---

So... did it speed things up to substitute the Cray libraries? Quite a bit. Here's the hpm output for both versions:

Hand-coded matrix/vector routines (linked without segldr directives):

  Group 0:  CPU seconds   : 8.34398      CP executing     :  2503194174
  
  Million inst/sec (MIPS) :   66.21      Instructions     :   552483791
  Avg. clock periods/inst :    4.53
  % CP holding issue      :   70.92      CP holding issue :  1775227364
  Inst.buffer fetches/sec :    0.02M     Inst.buf. fetches:      166592
  Floating adds/sec       :   69.10M     F.P. adds        :   576527734
  Floating multiplies/sec :   67.73M     F.P. multiplies  :   565107088
  Floating reciprocal/sec :    0.00M     F.P. reciprocals :       20792
  Cache hits/sec          :   83.61M     Cache hits       :   697681792
  CPU mem. references/sec :  217.35M     CPU references   :  1813602060
  
  Floating ops/CPU second :  136.82M

Cray libsci routines (with segldr directives to use libsci):

  Group 0:  CPU seconds   : 3.48794      CP executing     :  1046381505
  
  Million inst/sec (MIPS) :   65.88      Instructions     :   229774910
  Avg. clock periods/inst :    4.55
  % CP holding issue      :   71.06      CP holding issue :   743524588
  Inst.buffer fetches/sec :    0.03M     Inst.buf. fetches:      114008
  Floating adds/sec       :  165.95M     F.P. adds        :   578839265
  Floating multiplies/sec :  161.54M     F.P. multiplies  :   563429095
  Floating reciprocal/sec :    0.01M     F.P. reciprocals :       30984
  Cache hits/sec          :   10.44M     Cache hits       :    36414633
  CPU mem. references/sec :   99.10M     CPU references   :   345654132
  
  Floating ops/CPU second :  327.50M
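
For the record, "hpm" is Cray's hardware performance monitor. Assuming the usual invocation, reports like the ones above come from simply prefixing the executable name:

  $ hpm ./linpacks400.libsci

See "man hpm" for details and counter-group options.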
 

More information on segldr is available in "man segldr" and in Cray's on-line documentation:

http://www.arsc.edu:40/

Quick-Tip Q & A


A:
[[ What's up with this?
  [[ 
  [[ chilkoot% rm ldat.199801.*
  [[ Arguments too long.
  [[ chilkoot% 
  [[ chilkoot% ls ldat.199801.*
  [[ Arguments too long.
Thanks to Terry Jones:
  =====================================================

  Check to see how many files with names beginning with ldat.199801:
  (do an "ls | grep ldat.199801 | wc -l" or something equivalent).

  If it is a very large number of files, the problem is likely to be
  that with the wildcard (*), the list of files that this expression
  matches is simply too large.  I cannot say how large is too large,
  but the shell will expand the wildcard and generate a list of files.

  The rm and ls commands will get called with this list attached to the
  end of the command line (e.g., "ls file.*"  might expand to "ls file.1
  file.2 file.3", and so on).  If the list is longer than the shell can
  handle (I am sure there is a max_list parameter in there somewhere!),
  then the error message above is what is displayed.  I've had this
  happen before, when manipulating large quantities of similarly named
  files located in the same directory.


  
Thanks to Richard Griswold:
  =======================================================

  When ldat.199801.* is expanded, there are too many characters for the
  shell to handle.  Instead you can do something like this and avoid the
  limitations of the shell:

    \ls | grep "^ldat\.199801\..*"

  Since I alias ls to "ls -al", I use a backslash in front of the ls command
  to bypass the alias.  This will get the default behavior instead of the
  aliased behavior.

  This command will get you a list of the files.  If you want to delete
  them, simply pipe the output to "xargs rm":

    \ls | grep "^ldat\.199801\..*" | xargs rm

  If you want a long listing, you can pipe the output to "xargs ls -l":

    \ls | grep "^ldat\.199801\..*" | xargs ls -l

  You can even use the output to rename the files:

    \ls | grep "^ldat\.199801\..*" | sed "s/\(.*\)/mv '\1' 'myfiles.\1'/" \
    | sh

  Of course, you'll need some additional code inside the sed script to
  handle single quotes in your file names.




Q: Is there a way to peek at my NQS job's stdout and stderr files 
   (.o and .e files), while the job is still running?  I'm in debug
   mode, here, and wasting a lot of CPU time because these jobs must 
   run to completion before I can see ANY output.

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.