ARSC T3D Users' Newsletter 46, August 4, 1995

The 1.2.2 Release of the Programming Environment

At ARSC, we now have the 1.2.2 PE and we are testing and timing it. One of the timing tests brought out a problem that users of the T3D should be aware of. One of the single PE tests we run is an old EISPACK regression test, where the eigenvalue routines are tested with matrices that are read in from a file. Here is a comparison of the times (in seconds) for different eigenvalue routines on the 1.2.1 and the 1.2.2 release:

  ( > = new results, 1.2.2 PE )
  ( < = old results, 1.2.1 PE )
  <    dcg passed,time =    0.701
  <    dch passed,time =    0.202
  <   drbl passed,time =    0.248
  <    drg passed,time =    0.884
  <   drgg passed,time =    0.411
  <    drl passed,time =    0.173
  <    drs passed,time =    0.377
  <   drsb passed,time =    1.011
  <   drsg passed,time =    0.669
  >    dcg passed,time =   29.501
  >    dch passed,time =   17.045
  >   drbl passed,time =    0.239
  >    drg passed,time =   18.506
  >   drgg passed,time =   35.593
  >    drl passed,time =   16.945
  >    drs passed,time =   17.205
  >   drsb passed,time =    0.961
  >   drsg passed,time =   14.822
  < drsgba passed,time =    0.640
  <   drsp passed,time =    0.371
  <   drst passed,time =    0.124
  <    drt passed,time =    0.168
  > drsgba passed,time =    0.660
  >   drsp passed,time =    0.362
  >   drst passed,time =    0.145
  >    drt passed,time =    0.169
Why are the new results so much slower than the old results? I couldn't believe the new Fortran compiler had slowed down so much! So I ran the tests again and got this diff between the 1.2.1 PE and the 1.2.2 PE:


  <    dcg passed,time =    0.701
  <    dch passed,time =    0.202
  <   drbl passed,time =    0.248
  <    drg passed,time =    0.884
  <   drgg passed,time =    0.411
  <    drl passed,time =    0.173
  <    drs passed,time =    0.377
  <   drsb passed,time =    1.011
  <   drsg passed,time =    0.669
  < drsgab passed,time =    0.640
  < drsgba passed,time =    0.640
  <   drsp passed,time =    0.371
  <   drst passed,time =    0.124
  >    dcg passed,time =    0.702
  >    dch passed,time =    0.227
  >   drbl passed,time =    0.301
  >    drg passed,time =    0.903
  >   drgg passed,time =    0.451
  >    drl passed,time =    0.155
  >    drs passed,time =    0.403
  >   drsb passed,time =    1.008
  >   drsg passed,time =    0.702
  > drsgab passed,time =    0.665
  > drsgba passed,time =    0.662
  >   drsp passed,time =    0.357
  >   drst passed,time =    0.145
This was the more reasonable result: nothing much had changed between releases. The problem was that the timing program looked something like:

         program main

         t1 = second()
         call dcg()
         t2 = second()
         call dch()
         t3 = second()
         end

         subroutine dcg()

         open( 10, FILE="FILE33", ... )

c        compute ...
         end
The reason for the disparity between the first and second runs of the same executable on the same PE was that the input files used to generate the matrices had been migrated off the user disk, and it took a while for them to be restored. The time for the Y-MP agent to restore the files was counted as wall clock time on the T3D. On the T3D, we usually have that:

  wall clock time = cpu time
because there is no multiprogramming on the T3D. So the problem can be "solved" by running the problem twice or being more careful about what you are timing.

The AC Compiler at ARSC

For the brave and resourceful, I have built and installed the AC compiler on denali. The introduction of the AC compiler was described at the Spring CUG and was summarized in T3D Newsletter #28 (03/24/95):

  >  AC and the CRAY T3D, Jesse Draper and Bill Carlson, SRC and
  >  IDA. This was a description of the port of the GNU C compiler,
  >  gcc, to the T3D. The performance was in some cases 3 times
  >  better than the CRI C compiler and the compiler has a single
  >  extension 'dist' (for distributive) which allows shared arrays
  >  in C. In one application the AC compiler produced an
  >  executable 3 times faster than the CRI C compiler. I am trying
  >  to get a copy of this compiler for use at ARSC. I have a copy
  >  of the report and the slides.
I'll have more details in future newsletters but the minimal instructions for using this compiler are:
  1. Read the documentation supplied by Bill Carlson on denali in:
      /usr/local/examples/mpp/AC/README.install (I have done this)
    I will mail copies of the slides and the report mentioned above to those who request them.
  2. Add to your search path /usr/local/examples/mpp/bin ahead of /bin
  3. Change your makefile to look something like:
      CC = /usr/local/examples/mpp/bin/ac
      CLD = $(CC)
      CFLAGS = -I/usr/local/examples/mpp/AC/include -O
      CLDFLAGS = -L/usr/local/examples/mpp/AC
      OBJS = main.o second.o

      .c.o:
              $(CC) -c $(CFLAGS) $<

      a.out:  $(OBJS)
              $(CLD) $(CLDFLAGS) $(OBJS)
This is a start, but I found the AC compiler to be much different from /mpp/bin/cc, and it took some effort to get code that ran under the CRI product to compile and run with the AC compiler. I found the speed to be better than the CRI compiler's, but never by a factor of 3 in my single-PE timing tests. More details in future newsletters.

As a sample of the problems I had, the declaration:

  float dex[ 10 ];
works as a local declaration but, as a global declaration, causes the AC compiler to abort.

Accessing the LPAC HPC Articles Archive

From Roland Piquepaille of CRI-France, I got the following hint on speeding up access to the London Parallel Applications Centre High Performance Article Archive:

  > To avoid your readers wasting some time browsing at the
  > London Parallel Application Centre, the full URL of the
  > article database (with the search box) is:
  > Roland.

Installation of a New Version of MPICH

As of July 31, 1995, I have installed the entire 1.0.10 version of MPICH in the directory:

It is much more code than was in the preliminary version "Alpha Version 0.1a" that was announced in Newsletter #34 (05/05/95). Using the installation provided, the libraries are now in:

where previously they were in:

I moved all of the old version of MPICH to:

I encourage users to try it and possibly compare it to the Edinburgh/CRI version of MPI which was described in Newsletters #39 (6/9/95) and #41 (6/23/95). If you have any problems using this new version please contact Mike Ess.

List of Differences Between T3D and Y-MP

The current list of differences between the T3D and the Y-MP is:
  1. Data type sizes are not the same (Newsletter #5)
  2. Uninitialized variables are different (Newsletter #6)
  3. The effect of the -a static compiler switch (Newsletter #7)
  4. There is no GETENV on the T3D (Newsletter #8)
  5. Missing routine SMACH on T3D (Newsletter #9)
  6. Different Arithmetics (Newsletter #9)
  7. Different clock granularities for gettimeofday (Newsletter #11)
  8. Restrictions on record length for direct I/O files (Newsletter #19)
  9. Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
  10. Missing Linpack and Eispack routines in libsci (Newsletter #25)
  11. F90 manual for Y-MP, no manual for T3D (Newsletter #31)
  12. RANF() and its manpage differ between machines (Newsletter #37)
  13. CRAY2IEG is available only on the Y-MP (Newsletter #40)
  14. Missing sort routines on the T3D (Newsletter #41)
I encourage users to e-mail in differences that they have found, so we all can benefit from each other's experience.
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.