ARSC T3D Users' Newsletter 77, March 8, 1996

More on SGEMV

Ed Anderson was rightly not satisfied with my SGEMV results in the last ARSC T3D newsletter. He sent in this addition:


> (Here is a better response to your last newsletter.  --ed)
> 
> 
> > For my timings, I probably didn't get to 50 MFLOPS because we
> > haven't moved up to the 2.0 PE.
> 
> The quoted single-PE performance rate for SGEMV on the CRAY T3D is a
> benchmark number, and so of course your actual performance may be
> somewhat less.  With two minor modifications to the benchmark program
> listed in Newsletter 75, I was able to attain 55 Mflops from SGEMV on
> our in-house T3D (running Programming Environment 2.0).  These changes
> were:
> 
> 1)  Inserted the line
> 
> cdir$   cache_align a, b, c, d
> 
> to ensure initial cache-line alignment of all the arrays.
> 
> 2)  Changed the leading dimension of a to 1024.  This happens to favor
> the current implementation of SGEMV, but it might not be a good idea in
> general because each successive column will map to exactly the same
> place in the cache as the previous column.  In fact, the performance of
> some of the other variants was degraded, but SGEMV improved.
> 
> Here are the results (with maxtrips = 3):
> 
> case size sgemm  SGEMV   mxma   call    call fsaxpy fsaxpy1 fsaxpy2 fsaxpy3 fsdot
>                                 saxpy   sdot
>   1    0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
>   2    1    0.1    0.1    0.1    0.1    0.1    0.2    0.3    0.3    0.2    0.3
>   3    2    0.4    0.5    0.9    0.5    0.6    1.7    1.6    1.5    1.0    1.2
>   4    3    1.0    1.3    2.0    1.2    1.2    3.4    3.3    3.1    2.1    2.2
>   5    4    1.5    1.8    3.3    2.1    1.7    4.4    4.8    4.5    2.7    3.0
>   6    5    1.7    2.1    3.6    2.5    2.2    4.7    5.5    5.1    3.4    3.6
>   7    6    2.5    2.9    5.3    3.1    2.8    6.4    6.7    6.1    4.2    4.2
>   8    7    3.5    4.2    6.2    4.0    3.3    7.0    7.7    6.9    4.9    4.5
>   9    8    3.5    5.5    7.1    4.7    3.7    8.4    8.3    7.2    4.9    4.9
>  10    9    3.6    6.0    6.7    4.7    4.1    8.0    8.1    7.1    5.4    5.1
>  11   10    4.5    7.1    8.0    5.1    4.4    3.8    8.8    7.7    5.8    5.2
>  12   16    8.7   14.8    9.4    9.5    5.8   10.8   11.0    9.3    6.7    6.0
>  13   20    9.9   14.5   10.3   11.0    6.2   11.2   11.4    9.8    6.8    5.0
>  14   30   13.6   22.7   10.6   13.0    7.0   11.0   11.3    9.8    6.4    6.4
>  15   32   17.2   32.3   10.9   14.7    7.2   11.4   11.6   10.1    7.2    6.4
>  16   40   17.6   30.7   11.2   16.0    7.4   11.4   11.7   10.1    7.2    6.4
>  17   50   19.0   34.4   11.0   15.6    7.6   10.5   11.4    9.9    7.1    6.5
>  18   60   19.6   39.2   11.0   16.7    7.7   11.0   11.4    9.9    7.2    6.5
>  19   63   20.6   37.7   10.4   16.2    7.6   10.9   10.8    9.7    7.3    6.6
>  20   64   22.1   42.2   10.9   12.4    5.1   10.6   11.3    9.5    7.1    6.6
>  21   65   20.5   38.9   10.6   12.2    5.1   10.6   10.9    9.2    7.3    6.5
>  22   70   18.7   34.8   10.6   12.5    5.1   10.7   11.0    9.5    7.1    6.5
>  23   80   22.9   40.8   10.7   14.1    5.3   10.7   11.0    9.4    7.1    6.5
>  24   90   22.1   40.9   10.5   14.6    5.3   10.3   10.5    9.4    7.2    6.6
>  25  100   22.9   40.4   10.3   15.2    5.5   10.4   10.6    9.3    7.1    6.7
>  26  128   24.0   47.5   10.2   17.2    5.6   10.2   10.6    9.2    7.1    6.8
>  27  200   24.4   51.3   10.0   19.5    5.9   10.0   10.3    9.0    7.0    6.8
>  28  256   25.2   52.0    9.9   20.7    5.9    9.9   10.2    8.9    7.0    6.9
>  29  300   25.0   53.1    9.9   21.3    6.0    9.9   10.2    8.9    7.0    6.9
>  30  400   25.2   53.5    9.8   22.6    6.1    9.8   10.1    8.8    7.0    6.9
>  31  500   25.1   53.3    9.8   23.3    6.1    9.7   10.1    8.8    7.0    6.9
>  32  512   25.3   54.5    9.7   23.4    6.1    9.7   10.0    8.8    7.0    6.9
>  33  600   25.2   54.4    9.7   23.8    6.2    9.7   10.0    8.8    7.1    6.9
>  34  700   25.3   54.8    9.7   24.0    6.2    9.7   10.0    8.7    7.1    6.9
>  35  800   25.3   55.1    9.7   24.4    6.2    9.7   10.0    8.7    7.2    6.9
>  36  900   25.2   54.6    9.7   24.7    6.2    9.7   10.0    8.7    7.2    7.0
>  37 1000   25.3   55.2    9.6   24.9    6.2    9.6    9.9    8.7    7.3    7.0
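
Ed's remark that a leading dimension of 1024 makes each column map to the same place in the cache can be checked with a little arithmetic. The sketch below is not from Ed's message or from the benchmark in Newsletter 75. It assumes an 8 KB direct-mapped data cache with 32-byte lines, 8-byte reals, and an array that starts on a cache-line boundary (the effect of the cache_align directive). The leading dimension of 1000 is only a hypothetical value for comparison, and the function names are made up for illustration.

```python
# Sketch: where do successive columns of a Fortran array land in a
# direct-mapped cache?  All three constants are assumptions, not values
# taken from the newsletter.

CACHE_BYTES = 8 * 1024   # assumed data cache size (direct-mapped)
LINE_BYTES = 32          # assumed cache line size
WORD_BYTES = 8           # assumed size of one real


def column_start_line(lda, col):
    """Cache line index that the first element of column `col` (0-based)
    maps to, for an array that begins on a cache-line boundary."""
    byte_offset = col * lda * WORD_BYTES
    return (byte_offset % CACHE_BYTES) // LINE_BYTES


def distinct_start_lines(lda, ncols):
    """Number of different cache lines used by the starts of the first
    `ncols` columns."""
    return len({column_start_line(lda, j) for j in range(ncols)})


if __name__ == "__main__":
    for lda in (1000, 1024):   # 1000 is a hypothetical comparison value
        starts = [column_start_line(lda, j) for j in range(4)]
        print(f"lda = {lda}: first four columns start at lines {starts}")
```

Under these assumptions a column of 1024 reals occupies 1024 * 8 = 8192 bytes, exactly the size of the cache, so every column starts at line 0. With a leading dimension of 1000 the column starts are spread over different lines. This only illustrates the mapping; it does not explain why the current SGEMV happens to benefit from it.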

More Collaboration Between PGI and CRI

From CRI, by way of Jay Boisseau, who now works at the San Diego Supercomputer Center, I got the following announcement. (ARSC's gratis license to use the Portland Group's High Performance Fortran compiler has not yet been renewed.)

> Cray/Media:                     Steve Conway, 612/683-7133
> The Portland Group:             Denney Cole, 503/682-2806
> The Bernhardt Agency:           Cheri Maniscalco, 503/226-6452
> 
> CRAY RESEARCH AND THE PORTLAND GROUP FINALIZE
> AGREEMENTS FOR HPF ON CRAY SYSTEMS, PGI TO DEVELOP 
> NEW MERGED PROGRAMMING PRODUCT BASED ON 
> CRAY/PGI TECHNOLOGY
> 
> Eagan, Minn., March 7, 1996 -- Cray Research, Inc. (NYSE:CYR)
> and The Portland Group, Inc. (PGI) announced today that they
> have finalized a development and reseller agreement to
> initially make PGI's pghpf(tm) High Performance Fortran (HPF)
> compiler available on Cray(R) systems and for PGI to develop
> and support a merged programming product based on Cray's
> CRAFT flexible programming model and PGI's pghpf, creating
> the most complete and powerful implicit programming model
> available on the market, according to the two companies. 
> 
> Late last year the two companies announced their intent to
> make PGI's HPF product available on Cray systems.  Today's
> announcement extends that intended relationship with a
> reseller arrangement and an important development program to
> combine Cray's programming model with PGI's technology to
> create a comprehensive programming solution for both Cray
> and PGI customers.  The combined product is expected to be
> available in late 1996. 
> 
> The agreement is important because implicit programming
> models are easiest to use on today's highly scalable,
> distributed memory systems, and the combined model
> announced today is the most capable in the world, the
> companies said. 
> 
> Under terms of the development agreement, to ensure cross-
> platform portability, the merged programming model will be
> supported by PGI on a variety of parallel systems, including all
> Cray Research shared memory systems, as well as the scalable
> CRAY T3D and CRAY T3E distributed memory supercomputers.
> Under the terms of the reseller agreement, Cray will resell
> PGI's HPF, as well as the merged programming product on all
> Cray(R) systems.  
> 
> According to Mike Booth, vice president of Cray's software
> division, "We've been working with PGI for several months
> toward a concrete plan and schedule for integrating the HPF
> and CRAFT programming models.  We are pleased to announce
> that we'll be providing Cray users with an HPF environment
> that is consistent across all of our product lines, provides
> maximum power and flexibility on both shared- and
> distributed-memory systems, and is based on PGI's industry-
> leading HPF product."
> 
> According to Douglas Miles, director of marketing at PGI,
> "We're very pleased Cray will be offering our HPF products
> directly to their customers.  Expanding the relationship to
> incorporate the power of Cray's CRAFT model will be of
> substantial benefit to many existing Cray customers, as well
> as PGI's customers.  CRAY T3D/T3E users will be able to
> migrate their applications toward HPF compatibility with the
> freedom of using the CRAFT model where appropriate and the
> assurance that their code will continue to be portable across a
> variety of parallel systems."
> 
> Under the separate development agreement, PGI will make
> enhancements to its HPF product to incorporate the
> functionality of Cray's CRAFT programming model.  Cray
> Research developed the CRAFT parallel programming model to
> provide CRAY T3D(tm) customers with a flexible model that
> combines the functionality of several programming styles --
> data parallel, work sharing, and message-passing.  The key
> components of the CRAFT model will be embedded within a new
> HPF standard-conforming extrinsic type called
> EXTRINSIC(HPF_CRAFT) to provide a multi-threaded execution
> environment.  This combined Cray/PGI solution will give
> programmers all the flexibility of CRAFT incorporated in a
> standard fashion within a portable language.
> 
> HPF is the leading implicit parallel programming model for
> shared- and distributed-memory parallel systems.  It is the
> first widely accepted programming language suitable for
> writing portable applications in the data-parallel programming
> style.  PGI provides the leading commercial implementation of
> HPF on a variety of parallel systems.
> 
> Cray Research provides the leading high-performance tools and
> services to help solve customers' most challenging problems.
> 
> PGI is a leading independent vendor of software compilers and
> tools for parallel computing.  PGI provides high performance,
> retargetable, production quality compilers and software
> development tools to the high performance and parallel
> computing industries.
>                            # # #

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.