ARSC T3D Users' Newsletter 97, July 26, 1996

IHPSTAT Provides Run-Time Stats on Available Memory

"IHPSTAT" is a UNICOS library routine that you can call from your programs (see "man IHPSTAT"). It returns statistics about the heap.

You can use IHPSTAT to check the availability of memory at run-time. You might want to do this even if your code does not itself allocate memory dynamically: it might call library routines which do. If you get unexpected out-of-memory conditions, you can instrument your code with IHPSTAT to help locate and avoid the problem.

What follows is a makefile, a sample program, and some output demonstrating IHPSTAT. The program drives the system library routine OPEN into an out-of-memory failure.

The program allocates a chunk of static memory, then dynamically allocates up to ten blocks, one at a time, stopping early if an allocation fails. At every step it prints all of IHPSTAT's available information, which lets us watch the heap approach its limits. After the allocation loop, the program attempts to open a file, write to it, and close it.

For the sample run, I have requested more memory than the system can provide. I trap this error in HPALLOC (another UNICOS library routine) and abort the allocation loop; the program keeps running, but it then crashes in the OPEN call with this message:


  mpplib-5010 ./tst0: UNRECOVERABLE library error 
     A request for more memory has failed.

The program is, of course, designed to test memory limits. In a real program, there are a couple of ways to avoid a crash of this nature. The first line of defense is to trap the error in the call to OPEN:

  Replace the dangerous code:

  100   open (unit=7, file='junk', status='unknown', access='direct',
     &      recl=128, form='unformatted')

  With an error-trapping version:

  100   open (unit=7, file='junk', status='unknown', access='direct',
     &      recl=128, form='unformatted', err=200, iostat=ios)
  ....
        goto 10000

  200   print*,"Error in open: iostat=", ios
        stop

  10000 end

A second approach is to use the output of IHPSTAT(11) to watch for critically low memory. An advantage of this approach is that it can be used to protect any library calls -- even those which, unlike OPEN and HPALLOC, do not have built-in exception handling.

  ############################################################
  # makefile
  ############################################################
  FC=/mpp/bin/cf77
  # FC=/mpp/bin/f90

  hpstats.o: hpstats.f
          TARGET=cray-t3d $(FC) -X 1 -c hpstats.f

  tst0:        tst0.F hpstats.o makefile
          TARGET=cray-t3d $(FC) -X 1 -Wp" -F -D BLKSZ=20000 -D STATSZ=7200000" \
          tst0.F hpstats.o -o tst0
          TARGET=cray-t3d MPP_NPES=1 tst0
  ############################################################

  tst0.F:
  =======
  cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
        program heap_tst

        intrinsic MY_PE

        ! Use gpp (generic pre-processor) to define STATSZ and BLKSZ.
        ! See the makefile...
        real*8 static_array (STATSZ) 
        integer addr_list(0:9)

        if (MY_PE() .NE. 0) stop

        write (6,1001) "Static memory allocated",  
       & "  (words): ",  STATSZ

        do i = 0, 9
          write (6,1002) i, "   *** Dynamic mem alloced: ", 
       &   BLKSZ * i, "  Total: ", STATSZ + BLKSZ * i

          call dump_hpstats

          ! Alloc a block, if possible. Success if ierr==0.
          call HPALLOC (addr_list (i), BLKSZ, ierr, 0)
          print*, "HPALLOC err code: ", ierr

          ! Get out of loop if alloc failed
          if (ierr .NE. 0) goto 100       
        enddo

        ! Attempt to open a file.
  100   open (unit=7, file='junk', status='unknown', access='direct',
       &      recl=128, form='unformatted')
        write (7,rec=1) 1.0
        close (unit=7)


        ! So compiler won't optimize away the array
        call dummy (static_array)


  1001  format (2a,i15)
  1002  format (i2,a,i15,a,i15)

        end
  cc
        subroutine dummy (arr)
        real*8 arr (*)
        end
  cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc


  hpstats.f:
  ==========
  cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
  c 
  c Dumps all possible output of IHPSTAT in readable form.
  c
        subroutine dump_hpstats

        write (6,1000) "IHPSTAT code=1 ",
       &  "Current heap length                   ==> ", IHPSTAT(1)
        write (6,1000) "IHPSTAT code=4 ",
       &  "Number of allocated blocks            ==> ", IHPSTAT(4)
        write (6,1000) "IHPSTAT code=10 ",
       &   "Size of the largest free block       ==> ", IHPSTAT(10)
        write (6,1000) "IHPSTAT code=11 ",
       &   "Amount by which the heap can shrink  ==> ", IHPSTAT(11)
        write (6,1000) "IHPSTAT code=12 ",
       &   "Amount by which the heap can grow    ==> ", IHPSTAT(12)
        write (6,1001) "IHPSTAT code=13 ",
       &   "First word address of the heap       ==> ", IHPSTAT(13)
        write (6,1001) "IHPSTAT code=14 ",
       &   "Last word address of the heap        ==> ", IHPSTAT(14)

        call FLUSH (6)

  1000  format (2a,i15)
  1001  format (2a,z15)

        end
  cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc


  Sample output:
  ========================

  denali$ make tst0
  TARGET=cray-t3d /mpp/bin/cf77 -X 1 -c hpstats.f
  TARGET=cray-t3d /mpp/bin/cf77 -X 1 -Wp" -F -D BLKSZ=20000 -D STATSZ=7200000" \
         tst0.F hpstats.o -o tst0
  TARGET=cray-t3d MPP_NPES=1 tst0


  Static memory allocated  (words):         7200000

   0   *** Dynamic mem alloced:               0  Total:         7200000
  IHPSTAT code=1 Current heap length                   ==>           98296
  IHPSTAT code=4 Number of allocated blocks            ==>               6
  IHPSTAT code=10 Size of the largest free block       ==>           92806
  IHPSTAT code=11 Amount by which the heap can shrink  ==>           92808
  IHPSTAT code=12 Amount by which the heap can grow    ==>           13968
  IHPSTAT code=13 First word address of the heap       ==> 000004000024BB0
  IHPSTAT code=14 Last word address of the heap        ==> 0000040000E4B78
   HPALLOC err code: 0

   1   *** Dynamic mem alloced:           20000  Total:         7220000
  IHPSTAT code=1 Current heap length                   ==>           98296
  IHPSTAT code=4 Number of allocated blocks            ==>               9
  IHPSTAT code=10 Size of the largest free block       ==>           72478
  IHPSTAT code=11 Amount by which the heap can shrink  ==>           72480
  IHPSTAT code=12 Amount by which the heap can grow    ==>           13968
  IHPSTAT code=13 First word address of the heap       ==> 000004000024BB0
  IHPSTAT code=14 Last word address of the heap        ==> 0000040000E4B78
   HPALLOC err code: 0

   2   *** Dynamic mem alloced:           40000  Total:         7240000
  IHPSTAT code=1 Current heap length                   ==>           98296
  IHPSTAT code=4 Number of allocated blocks            ==>              10
  IHPSTAT code=10 Size of the largest free block       ==>           52470
  IHPSTAT code=11 Amount by which the heap can shrink  ==>           52472
  IHPSTAT code=12 Amount by which the heap can grow    ==>           13968
  IHPSTAT code=13 First word address of the heap       ==> 000004000024BB0
  IHPSTAT code=14 Last word address of the heap        ==> 0000040000E4B78
   HPALLOC err code: 0

   3   *** Dynamic mem alloced:           60000  Total:         7260000
  IHPSTAT code=1 Current heap length                   ==>           98296
  IHPSTAT code=4 Number of allocated blocks            ==>              11
  IHPSTAT code=10 Size of the largest free block       ==>           32462
  IHPSTAT code=11 Amount by which the heap can shrink  ==>           32464
  IHPSTAT code=12 Amount by which the heap can grow    ==>           13968
  IHPSTAT code=13 First word address of the heap       ==> 000004000024BB0
  IHPSTAT code=14 Last word address of the heap        ==> 0000040000E4B78
   HPALLOC err code: 0

   4   *** Dynamic mem alloced:           80000  Total:         7280000
  IHPSTAT code=1 Current heap length                   ==>           98296
  IHPSTAT code=4 Number of allocated blocks            ==>              12
  IHPSTAT code=10 Size of the largest free block       ==>           12454
  IHPSTAT code=11 Amount by which the heap can shrink  ==>           12456
  IHPSTAT code=12 Amount by which the heap can grow    ==>           13968
  IHPSTAT code=13 First word address of the heap       ==> 000004000024BB0
  IHPSTAT code=14 Last word address of the heap        ==> 0000040000E4B78
   HPALLOC err code: 0

   5   *** Dynamic mem alloced:          100000  Total:         7300000
  IHPSTAT code=1 Current heap length                   ==>          106488
  IHPSTAT code=4 Number of allocated blocks            ==>              13
  IHPSTAT code=10 Size of the largest free block       ==>             638
  IHPSTAT code=11 Amount by which the heap can shrink  ==>             640
  IHPSTAT code=12 Amount by which the heap can grow    ==>            5776
  IHPSTAT code=13 First word address of the heap       ==> 000004000024BB0
  IHPSTAT code=14 Last word address of the heap        ==> 0000040000F4B78
   HPALLOC err code: -2

  Make: "TARGET=cray-t3d MPP_NPES=1 tst0" terminated due to signal 6
  mpplib-5010 ./tst0: UNRECOVERABLE library error 
    A request for more memory has failed.

  Encountered during an OPEN of unit 7
  Fortran unit 7 is not connected
  Error initiated at line 583 in routine '_f_open'.
  Abort

  Beginning of Traceback (PE 0):
    Started from address 0x2000035594 in routine 'NAME'.
    Called from line 18 (address 0x2000007630) in routine 'raise'.
    Called from line 92 (address 0x2000008c4c) in routine 'abort'.
    Called from line 103 (address 0x200009e474) in routine '_ferr'.
    Called from line 583 (address 0x20000b00fc) in routine '_f_open'.
    Called from line 344 (address 0x20000b1020) in routine '__OPN'.
    Called from line 25 (address 0x20000007a0) in routine 'HEAP_TST'.
    Called from line 307 (address 0x2000005040) in routine '$START$'.
  End of Traceback.

  Agent printing core file information:
  user exiting after receiving signal 6
  Exit message came from virtual PE 0, logical PE 0xc
  Register dump

    pa0: 0x00000060fc90f688       pa1: 0x0000000000005b21  pa2: 0x0000000000000001
  ...
  ... etc ...
  ...
     f30:0x0000000000000000                          fpcr:0x8900000000000000


  Agent finished printing core file information.
  User core dump completed (./mppcore)
  ############################################################
There is a man page for IHPSTAT, but its "See Also" section is pretty brief. So...

See Also:


  IHPLEN
  IHPVALID
  HPALLOC 
  HPCHECK
  HPCLMOVE
  HPDEALLC
  HPSHRINK
  HPDUMP
  HPNEWLEN

Porting Network PVM Codes to the T3D

[ Contributed by Richard Barrett of LANL. ]

(This article discusses low-level issues involved in porting network PVM codes to the T3D. It makes a nice companion to the articles in Newsletters #90 and #91 on porting heterogeneous, master-slave codes.)

The Cray T3D message passing library uses a PVM interface. (The functionality is implemented on top of the shmem library.) So, for the most part, programs written in PVM will run properly on the T3D. However, there are some exceptions, which we list below, along with some performance tips and cautions.

  1. There is no spawning on the T3D. You simply "tell it" how many processes you want, and they are started up. To make your code portable, wrap the code that does the spawning in #ifndef _CRAYMPP ... #endif. Then, to get the task ids (tids) of the participating processes, each process can execute the following code:
    
      for ( i=0; i< nprocs; i++ )
         tids[i] = pvm_gettid( NULL, i );    /* #ifdef'd for the T3D */
    
      (The NULL group is discussed below.)
    
  2. Unlike the network version of PVM, only one process per processor is allowed. The processes are numbered 0 to NPROCS-1, and there is a one to one correspondence between process ids and processor numbers. (Process 0 is the parent, and runs on processor 0.)
  3. Because processes/processors are numbered 0 to NPROCS-1 on the T3D, you can avoid the use of tids and work with processor numbers directly (returned by pvm_get_PE( taskid )). This isn't portable, though, so use tids as usual. Still, processor numbers are convenient for debugging, so we store them in a global variable. This is possible on any platform: assign processor numbers to PVM tasks according to their position in the tids array (i.e., spawning order), which makes pvm_get_PE unnecessary anyway. (Then use tids[i] to send a message to process i.)
  4. Be careful with the use of PvmDataInPlace. It is a documented T3D feature that this mode of data management may allow pvm_send/pvm_psend to return before the data is safely on its way to the target process. So it's possible to overwrite the data you think you are sending. And Cray doesn't provide a polling function to find out if/when the data is safe. Therefore, the safe way to send data is to use PvmDataRaw and incur a data copy/performance penalty.
  5. Group operations. With the exception of the default global group (designated "NULL"), the performance of group operations is poor. This is due to hardware constraints, so a software fix probably won't improve the situation.
  6. Fast parallel I/O is possible using pvm_channels. However, this functionality is specific to the T3D, so it is not portable. On the other hand, efficient parallel I/O is not yet available using a common interface, so pvm_channels is worth investigating when large amounts of data are involved.
  7. The pvm_pack/unpack functions perform poorly on this machine; the cost is especially noticeable if you make repeated calls to pvm_pack before a pvm_send. By doing our own data management (packing our own array and making a single call to pvm_pack), we got a 25-times speedup on a hydro code, even though the code is not (at least with this method) communication intensive.
  8. For small messages, the T3D-specific function pvm_fastsend transfers data in about half the time of a regular send. The "small" threshold is set by a user-definable environment variable (the default is 256 bytes). Note that this works by carrying the data in the message header (which is used in all T3D message passing) rather than in the subsequent data packets, so setting the threshold "too high" can have an adverse effect on your larger messages.
A note on MPI:

If you are starting from scratch, you should consider using MPI. Most of the above problems disappear, and you might find programming simpler and performance better than with PVM (especially in light of the PvmDataInPlace problem). Additionally, the MPI I/O group has joined more closely with the MPI Forum, so portable parallel I/O may one day come more easily. One caution, however: the MPI-2 effort (to be announced at Supercomputing '96 in November) will probably include one-sided communication (a la T3D shmem). Although this will be portable, its performance on other platforms is in question, and, while I don't want to join the flat-earth society, I believe the performance degradation will not easily go away, since it is probably due to hardware constraints.


  Richard Barrett
  rbarrett@lanl.gov
  Distributed Computing Group
  Los Alamos National Lab

Local User Group Meeting -- Next Thursday

ARSC's T3D User Group will be meeting on Thursday, August 1 at 3:00 PM. The location will be Butrovich, room 106A.

We would like to get to know our local users a little better, and seek opinions on possible configurations for the T3E. The agenda is:

  1. Attendees describe the work they are doing (or planning to do) on the T3D/E. A quick summary from each member of the group would give ARSC deeper insight into the activities of local users and give you an opportunity to find out what your peers at UAF are doing.
  2. Open discussion of possible configurations for the "E."
Please drop by!

On the Web: Current Status of ARSC's T3D

Next time you wake up with that nagging question, "I just wonder how many grayling I could catch, up there in Alaska..." Well, we wouldn't be much help. But if you're wondering how many PEs you could get, go to ARSC's welcome page:
http://www.arsc.edu/
and click the "Current Status" button at the bottom of the screen. We keep our graphics to a minimum, so it shouldn't take long at all.
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.