ARSC T3D Users' Newsletter 26, March 10, 1995

ARSC T3D Future Upgrades

We are testing the upgrade to the T3D 1.2 Programming Environment (libraries, tools, and compilers). If all goes well, it will be on the system in two weeks.

We are also planning to install CF90 and C++ for the T3D. These will come after the upgrade to the 1.2 PE. I would like to hear from users who want to use the CF90 and C++ products as soon as they are available.

More Performance on Linpack from the Loader

Last week's timings on Linpack and LAPACK brought this response from Ed Anderson of CRI:

  > Mike,
  >    I'm glad you were satisfied with the performance of LAPACK on
  > the modified LINPACK benchmark with the PE 1.1 libraries on the T3D.
  > I have two comments:
  > 
  >    First, you can improve your speed when using libsci by turning
  > read-ahead on for the entire program.  Libsci BLAS routines enable
  > read-ahead for larger problems, but then have to flush the cache on exit
  > to avoid problems with cache coherence.  By turning read-ahead on at
  > link time, you get the benefits of read-ahead for smaller problem sizes,
  > and the benefits of cache reuse for the larger problem sizes, since the
  > libsci routines don't need to flush the cache.  The option to use is
  > 
  >     mppldr -D "rdahead=on"
  > 
  > This is one of the examples in the chapter I wrote for the T3D
  > optimization guide.  I found an improvement in the LINPACK 100 benchmark
  > program (with libsci BLAS 1) of 50%.
  >    Second, the PE 1.2 library includes an improved version of SGEMM,
  > among other things.  You should see an improvement in your "LAPACK
  > benchmark" with the new library.
  > 
  >    --Ed Anderson

Following Ed Anderson's suggestion, I reran my timing routines with the -D "rdahead=on" switch on the loader, mppldr. Here are the results:

  Results (in MFLOPS) for the Linpack problem at various matrix sizes

  Matrix   Linpack Source  Linpack Source    Same problem solved
  size                    with libsci blas1  with libsci's LAPACK

      default rdahead=on  default rdahead=on  default rdahead=on

     1      .19      .20      .13      .14      .04      .04
     2      .45      .45      .30      .30      .15      .15
     3      .90      .91      .30      .63      .30      .35
     4     1.50     1.52     1.04     1.06      .60      .64
     5     2.21     2.20     1.28     1.33      .90      .94
    10     5.04     5.11     2.92     3.02     2.55     2.64
    20     8.72     8.90     6.07     6.33     5.70     5.86
    40    11.04    11.35    10.72    11.32    13.48    14.00
    50    10.93    11.81    12.29    13.37    15.95    16.86
    60    11.10    12.15    13.49    15.27    20.13    20.86
    70    11.21    12.36    13.38    16.47    21.12    21.88
    80    11.43    12.67    13.31    17.92    24.89    25.61
    90    11.55    12.84    13.46    19.01    25.15    25.88
   100    11.64    12.99    13.75    19.97    27.73    28.69
   200    12.08    13.46    17.30    25.24    36.82    37.79
   300    12.03    13.28    19.66    27.19    41.51    42.26
   400    11.84    12.92    21.20    28.02    44.12    44.72
   500    11.60    12.51    22.26    28.32    45.40    45.87
   600    11.36    12.12    23.03    28.39    46.73    47.11
   700    11.21    11.89    23.61    28.46    47.80    48.11
   800    11.14    11.75    24.14    28.60    48.57    48.83
   900    10.97    11.51    24.40    28.45    48.90    49.11
  1000    10.83    11.30    24.69    28.35    49.47    49.64
  1500    10.15    10.32    25.62    27.49    51.04    51.10
  2000     9.83     9.88    26.18    27.12    51.87    51.88
  2500     9.70     9.70    24.61    26.99    52.30    52.30
  2600     9.70     9.69    26.49    27.02    52.42    52.41

Sure enough, there is a big improvement in the programs that use the libsci BLAS1 routines: about 45% at matrix sizes 100 and 200. The LAPACK versions improve only slightly because LAPACK is already written to optimize cache use. For matrix sizes of 50 and above, the Fortran versions of the BLAS are the slowest of the three, with or without read-ahead. I will report timings with the new 1.2 PE in the next few weeks.
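
For readers who want to check the relative gains themselves, here is a minimal Python sketch that computes the percentage improvement from read-ahead for three rows of the table. The MFLOPS values are copied from the table; the dictionary layout and the names TIMINGS, percent_gain and gains_for are made up for this illustration and are not part of my timing programs.

```python
# MFLOPS pairs (default, rdahead=on) copied from the table in this newsletter.
# The layout and all names here are illustrative assumptions.
TIMINGS = {
    100: {"Linpack source": (11.64, 12.99),
          "libsci BLAS1": (13.75, 19.97),
          "libsci LAPACK": (27.73, 28.69)},
    500: {"Linpack source": (11.60, 12.51),
          "libsci BLAS1": (22.26, 28.32),
          "libsci LAPACK": (45.40, 45.87)},
    1000: {"Linpack source": (10.83, 11.30),
           "libsci BLAS1": (24.69, 28.35),
           "libsci LAPACK": (49.47, 49.64)},
}


def percent_gain(default_mflops, rdahead_mflops):
    """Improvement from read-ahead as a percentage of the default rate."""
    return 100.0 * (rdahead_mflops - default_mflops) / default_mflops


def gains_for(size):
    """Map each program version to its percentage gain at one matrix size."""
    return {name: percent_gain(*pair) for name, pair in TIMINGS[size].items()}


if __name__ == "__main__":
    for size in sorted(TIMINGS):
        for name, gain in gains_for(size).items():
            print(f"n={size:5d}  {name:15s} {gain:6.1f}%")
```

At each of these sizes the libsci BLAS1 version gains the most and the LAPACK version the least, which matches Ed's point that LAPACK already reuses the cache well.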

Future Newsletters

Next week I will be at the CUG meeting in Denver and I plan to distribute what I learn about the T3D when I get back. I am still collecting the FMlib timings and I'll have those in another two weeks.

List of Differences Between T3D and Y-MP

The current list of differences between the T3D and the Y-MP is:
  1. Data type sizes are not the same (Newsletter #5)
  2. Uninitialized variables are different (Newsletter #6)
  3. The effect of the -a static compiler switch (Newsletter #7)
  4. There is no GETENV on the T3D (Newsletter #8)
  5. Missing routine SMACH on T3D (Newsletter #9)
  6. Different Arithmetics (Newsletter #9)
  7. Different clock granularities for gettimeofday (Newsletter #11)
  8. Restrictions on record length for direct I/O files (Newsletter #19)
  9. Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
  10. Missing Linpack and Eispack routines in libsci (Newsletter #25)

I encourage users to e-mail me any differences they have found, so that we can all benefit from each other's experience.

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.