ARSC T3D Users' Newsletter 37, May 26, 1995

New T3D Batch PE Limits

In the past week all active users of the ARSC T3D had their batch PE limit increased to 128. This allows these users access to the 128-PE 8-hour queues that run on the weekends. If you need your T3D UDB limits changed please contact Mike Ess.

New Fortran Compiler

An upgraded version of the cf77 compiler is available on Denali at the paths:

  /mpp/bin/cft77new and /mpp/bin/cf77new
For the default versions we have:

  /mpp/bin/cf77 -V
  Cray CF77_M   Version 6.0.4.1 (6.59)   05/25/95 13:36:39
  Cray GPP_M    Version 6.0.4.1 (6.16)   05/25/95 13:36:39
  Cray CFT77_M  Version 6.2.0.4 (227918) 05/25/95 13:36:39
and for this new version:

  /mpp/bin/cf77new -V
  Cray CF77_M   Version 6.0.4.1 (6.59)   05/25/95 13:37:26
  Cray GPP_M    Version 6.0.4.1 (6.16)   05/25/95 13:37:26
  Cray CFT77_M  Version 6.2.0.9 (259228) 05/25/95 13:37:27
This new compiler fixes a potential race condition in shared memory accesses and also fixes an inlining problem with the F90 intrinsics, MINLOC and MAXLOC.

This compiler will become the default after we finish testing it, and users will be notified before that happens. I encourage users to try this compiler before it becomes the default.

Random Number Generation on the T3D and Y-MP

In newsletter #29 (3/31/95), I announced the availability of benchlib on the ARSC T3D. The sources for these libraries are available on the ARSC ftp server in the file:

  pub/submissions/libbnch.tar.Z
The compiled libraries are also available on Denali in

  /usr/local/examples/mpp/lib/lib_32.a
  /usr/local/examples/mpp/lib/lib_scalar.a
  /usr/local/examples/mpp/lib/lib_util.a
  /usr/local/examples/mpp/lib/lib_random.a  
  /usr/local/examples/mpp/lib/lib_tri.a 
  /usr/local/examples/mpp/lib/lib_vect.a
and the sources are available in: /usr/local/examples/mpp/src. In previous newsletters, I've described the contents of some of the libraries:

  #30 (4/7/95)  - the "pref" routine of lib_util.a
  #33 (4/28/95) - the fast scalar math routines in lib_scalar.a
  #34 (5/05/95) - the fast vector math routines in lib_vect.a
  #35 (5/12/95) - the tridiagonal solvers in lib_tri.a
In this newsletter, I will describe the routines in lib_random.a and compare them to the other random number generators on the T3D and Y-MP. This is the last library from benchlib. I welcome any user comment or experience with these libraries and I will pass it on to readers of the ARSC T3D newsletter.

Random Number Generators

Of course, a 'random' number generator doesn't actually produce random numbers but a sequence of pseudorandom numbers that has the characteristics of a sequence of random numbers. These sequences are necessarily reproducible so that computer experiments can be run over and over. As in most areas of computing, there is always a tradeoff between speed and quality, and so it is with these pseudorandom number generators (RNGs). Speed is the easiest to measure, and that is what is presented here. (Analyzing the quality of their random sequences is left for some Ph.D. thesis.)

On the T3D there are 3 available random number generators:


  rand:   rand() is supplied with most implementations of C in
          libc.a. It usually produces a 16-bit integer that
          can be converted to a double in the range 0.0 to 1.0,
          i.e.:

              random_real = rand() / (double)RAND_MAX;

          where RAND_MAX is defined in <stdlib.h>. Because
          only 16 bits can change from call to call, it's
          usually not considered "random" enough. But its
          implementation is probably the same on all machines,
          and it is the same on both the Y-MP and the T3D. There
          is a man page on Denali that describes rand(). (The
          division to obtain a random real number is not the
          same on each machine.)
  ranf:   RANF is the random number generator on the Y-MP. It
          exists in both scalar and vector versions in libm.a
          and is written in highly optimized assembly language.
          This routine is described in a man page on Denali, and
          that man page contains a Fortran version that mimics
          the assembly language. That Fortran version does not
          run on the T3D because of differences in normalizing
          floating point numbers, but the T3D does have a
          version in /mpp/lib/libm.a that produces results
          similar to those on the Y-MP. It's a little
          inconsistent that the man page shared by the Y-MP
          and the T3D describes the function with a program
          that runs only on the Y-MP.
  rantom: The versions in benchlib's lib_random.a differ from
          both of the above options but have been written
          for FAST execution on the T3D processor. lib_random.a
          contains both Fortran and assembly language versions,
          and a man page describing the algorithm and
          its speed is in /usr/local/examples/mpp/src/random

Timing Routines

Below is the program I used to time the T3D routines:

  #include <stdio.h>
  #include <stdlib.h>

  double second();

  main()
  {
    /* One result array per generator; n never exceeds 1000000. */
    double a[ 1000000 ], b[ 1000000 ], c[ 1000000 ], d[ 1000000 ];
    int nlog, n, i;
    double t1, t2, t3, t4;
    fortran double RANF();
    fortran double RANTOM();
    fortran double RANTOMS();
    double denom;

    denom = (double)RAND_MAX;
    printf( " RAND_MAX = %d %f\n", RAND_MAX, denom );

    /* Time loops of length 1, 10, ..., 1000000 for each generator. */
    n = 1;
    for( nlog = 0; nlog < 7; nlog++ ) {
      t1 = second();
        for( i = 0; i < n; i++ ) {
          a[ i ] = rand() / denom;
        }
      t1 = second() - t1;
      t2 = second();
        for( i = 0; i < n; i++ ) {
          b[ i ] = RANF();
        }
      t2 = second() - t2;
      t3 = second();
        for( i = 0; i < n; i++ ) {
          c[ i ] = RANTOM();
        }
      t3 = second() - t3;
      t4 = second();
        for( i = 0; i < n; i++ ) {
          d[ i ] = RANTOMS();
        }
      t4 = second() - t4;
      /* For each generator: elapsed seconds, then millions per second. */
      printf("%3d %10d %10.6f %4.1f %10.6f %4.1f %10.6f %4.1f %10.6f %4.1f\n"
      ,nlog,n,t1,n/(t1*1000000),t2,n/(t2*1000000),
              t3,n/(t3*1000000),t4,n/(t4*1000000));
      n = n * 10;
    }
  }

  /* Convert real-time clock ticks to seconds (150000000 ticks per second). */
  double second()
  {
    fortran irtc();
    return( irtc( ) / 150000000.0 );
  }

The timing program used on the Y-MP was:

  #include <stdio.h>
  #include <stdlib.h>

  main()
  {
    /* One result array per generator; n never exceeds 1000000. */
    double a[ 1000000 ], b[ 1000000 ], c[ 1000000 ], d[ 1000000 ];
    int nlog, n, i;
    double t1, t2, t3, t4;
    fortran double RANFF();
    fortran double ranf();
    fortran double SECOND();
    double denom;
    int zero = 0;

    denom = (double)RAND_MAX;
    printf( " RAND_MAX = %d %f\n", RAND_MAX, denom );

    /* Time loops of length 1, 10, ..., 1000000 for each generator. */
    n = 1;
    for( nlog = 0; nlog < 7; nlog++ ) {
      t1 = SECOND();
        for( i = 0; i < n; i++ ) {
          a[ i ] = rand() / denom;
        }
      t1 = SECOND() - t1;
      t2 = SECOND();
        for( i = 0; i < n; i++ ) {
          b[ i ] = RANFF();
        }
      t2 = SECOND() - t2;
      /* Reset the seed so the next two loops generate the same sequence. */
      RANSET( &zero );
      t3 = SECOND();
        for( i = 0; i < n; i++ ) {
          c[ i ] = ranf();
        }
      t3 = SECOND() - t3;
      RANSET( &zero );
      t4 = SECOND();
        for( i = 0; i < n; i++ ) {
          d[ i ] = _ranf();
        }
      t4 = SECOND() - t4;
      /* Report any element where the two library versions disagree. */
      for( i = 0; i < n; i++ ) {
        if( c[ i ] != d[ i ] ) {
          printf( "diff in ranf %d %f %f\n", i, c[ i ], d[ i ] );
        }
      }
      /* For each generator: elapsed seconds, then millions per second. */
      printf("%3d %10d %10.6f %4.1f %10.6f %4.1f %10.6f %4.1f %10.6f %4.1f\n"
      ,nlog,n,t1,n/(t1*1000000),t2,n/(t2*1000000),
              t3,n/(t3*1000000),t4,n/(t4*1000000));
      n = n * 10;
    }
  }

Timing Results

Usually the speed of an RNG is given in terms of millions of random numbers per second, and that is how I've arranged the table below. I also like to time a loop's worth of results and then divide by the length of the loop. This gives some feel for the overhead of the loop compared to the work of the loop body and also shows the asymptotic speed for a large number of calls.

  The speed of random generators on the T3D and Y-MP 
  ==================================================
      (in millions of random numbers per second)
 
  
T3D routines:


       RNG ->  rand()  ranf()     rantom()
                              Fortran Assembler
               ------  ------ ------- ---------
  loops     1   0.2     0.2     0.3     0.3
           10   1.1     0.9     1.1     1.5
          100   2.3     1.2     1.5     2.3
         1000   2.6     1.3     1.6     2.4
        10000   2.6     1.3     1.6     2.5
       100000   2.6     1.3     1.6     2.4
      1000000   2.6     1.3     1.6     2.4

  
Y-MP routines:


       RNG ->  rand()  ranf()     ranf()
                      Fortran library routines
                               Scalar  Vector
               ------ -------  ------  ------
  loops     1   0.2     0.1     0.1     0.2
           10   0.6     0.3     0.6     1.5
          100   0.8     0.3     0.9    10.5
         1000   0.9     0.3     0.9    18.5
        10000   0.9     0.3     0.9    19.3
       100000   0.9     0.3     0.9    19.1
      1000000   0.9     0.3     0.9    19.4
Observations:
  1. rand() runs faster on a single PE of the T3D than on the Y-MP. The price of portability often obscures performance differences.
  2. The difference between rand() and ranf() on both machines shows that the quality of a random sequence does not come without cost.
  3. The difference between the Fortran and assembler versions of rantom() on the T3D is not as great as the difference between the ranf() versions on the Y-MP, which may be a sign that the days of assembly language writing are on the wane.
  4. The last three columns show a range of Y-MP performance, all computing the same sequence of random numbers. The performance follows the effort:
    
      Fortran -> assembler -> vectorized assembler
    
  5. The last two timing loops in the Y-MP program show: "What a difference an underscore makes!" (The underscore invokes the vectorized version.)

List of Differences Between T3D and Y-MP

The current list of differences between the T3D and the Y-MP is:
  1. Data type sizes are not the same (Newsletter #5)
  2. Uninitialized variables are different (Newsletter #6)
  3. The effect of the -a static compiler switch (Newsletter #7)
  4. There is no GETENV on the T3D (Newsletter #8)
  5. Missing routine SMACH on T3D (Newsletter #9)
  6. Different Arithmetics (Newsletter #9)
  7. Different clock granularities for gettimeofday (Newsletter #11)
  8. Restrictions on record length for direct I/O files (Newsletter #19)
  9. Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
  10. Missing Linpack and Eispack routines in libsci (Newsletter #25)
  11. F90 manual for Y-MP, no manual for T3D (Newsletter #31)
  12. RANF() and its manpage differ between machines (Newsletter #37)
I encourage users to e-mail in differences that they have found, so we all can benefit from each other's experience.
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.