ARSC HPC Users' Newsletter 408, November 13, 2009

ARSC at SC09

ARSC will be exhibiting in two booths at Supercomputing 2009 in Portland, Oregon:

  • Booth 1709 - University of Alaska Fairbanks
  • Booth 1321 - DoD High Performance Computing Modernization Program

Additionally, we will have a table at the SC09 Student Job Fair.

Fifteen ARSC representatives will be attending this year. Please stop by to say hello, ask any HPC questions you may have, or inquire about the weather back in Alaska. Mention this newsletter at Booth 1709 for a free "Alaska Supercomputer Repair Kit."

Return of Letters from Santa

As I type this, it's 21 degrees Fahrenheit and snowing at the North Pole - North Pole, Alaska that is. Which reminds us of an HPC Newsletter holiday tradition. If you know someone who would be thrilled by a letter postmarked "North Pole", your editors can help make that happen.

Put that stamped, addressed letter in another envelope and mail it to us. We will mail the enclosed letter from North Pole, Alaska, whose post office applies a special Christmas postmark during the holiday season. We plan to send these letters around December 11, so mail them to us by the end of the first week of December; that should leave plenty of time for them to get here and catch the next sleigh out of town.

Send to:

    Ed Kornkven and Craig Stephenson
    Arctic Region Supercomputing Center
    University of Alaska Fairbanks
    P.O. Box 756020
    Fairbanks, AK 99775-6020

Fine-Tuning Memory for IDV

[ By Patrick Webb ]

The Integrated Data Viewer (IDV) is a useful and powerful visualization tool that is installed on all ARSC workstations. The default installation works well for many applications, but there are some tricks to achieving greater performance for large, complex data sets. The biggest limiting factor when using the IDV is the size of the Java heap space, which is capped at the amount of memory allocated to it when the IDV launches. The default is 512 MB; I will show you how to increase it, along with a couple of other neat tricks.

The most basic way of using the IDV is to load the module for the current version (currently 2.5) and then type 'runIDV' to start it:


    % module load idv-2.5
    % runIDV

What's going on behind the scenes here is that "runIDV" is not in fact the IDV executable, but a script that sets up a Java command for running the IDV. By making your own run script, you will have a good deal more control over the IDV and your work.

The first step is to create your own version of the runIDV script. It does not need to be called "runIDV"; you can name it anything you like. I will call my own version "myRunIDV". You can either copy the run script from the IDV directory (currently /usr/local/pkg/idv/idv-2.5) or, since it is rather short, copy it from this article. This is the default IDV run script:


    dirname=`dirname $0`
    command="java -Xmx512m -Didv.enableStereo=false -jar ${dirname}/idv.jar $*"

    if test -f jre/bin/java; then
        # We are in the installer directory
        ./jre/bin/${command}
    else
        # Try using the dirname of this script
        if test -f ${dirname}/jre/bin/java; then
            ${dirname}/jre/bin/${command}
        else
            if test ${JAVA_HOME}; then
                # Try using JAVA_HOME
                ${JAVA_HOME}/bin/${command}
            else
                # Try just using java
                ${command}
            fi
        fi
    fi

Only the first two lines of this script are relevant to our purposes. The first line sets the "dirname" variable to the directory where the IDV .jar file resides; by default, the script uses its own directory path. The second line is the Java command that starts the IDV. This command includes the option controlling how much memory is allocated to the IDV, along with an option that turns off stereo rendering. As written, the script allocates 512 MB, which is enough for some tasks, but we want more memory.

The first thing to do is to point the run script to the idv.jar directory. If you have your own personal installation of the IDV, you can change the dirname variable assignment from dirname=`dirname $0` to dirname="/path/to/your/IDV". To point to the ARSC installation, for example, use:


    dirname="/usr/local/pkg/idv/idv-2.5"

(NOTE: The same run script may work with other IDV versions, so it is possible to change IDV versions by merely changing the path.)

With the dirname variable now explicitly pointing to an IDV installation directory, we are free to store the script anywhere.

The second line is where we can customize our IDV session. The option "-Xmx512m" defines the amount of memory allocated; the "m" in "512m" stands for megabytes, so this option allocates 512 megabytes. Change "-Xmx512m" to something larger, like "-Xmx1024m", or even higher if you have more memory available. (NOTE: It is a bad idea to allocate more memory than your system actually has.) Most ARSC workstations have 8 GB of memory, and it is generally reasonable to allocate 75-80% of a workstation's total physical memory if you are the only user. For this example, we will use 1024 MB:


    command="java -Xmx1024m -Didv.enableStereo=false -jar ${dirname}/idv.jar $*"

Voila! Now the IDV will start with 1 GB of memory available to it. This can be very handy when working with large or numerous files.
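Putting the pieces together, here is one way a "myRunIDV" might look. (This is an editor's sketch, not part of the IDV distribution: the 75%-of-memory arithmetic, the /proc/meminfo lookup, and the 1024 MB fallback are our own additions on top of the two lines discussed above.)

```shell
# Write a personal launcher, "myRunIDV", that points at the ARSC IDV
# installation and sizes the Java heap to roughly 75% of physical memory.
cat > myRunIDV <<'EOF'
#!/bin/sh
# Fixed installation path instead of the script's own directory.
dirname="/usr/local/pkg/idv/idv-2.5"
# Read total memory (kB) from /proc/meminfo (Linux-specific).
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo 2>/dev/null)
# Take ~75% of it, converted to MB; fall back to 1024 MB if the lookup failed.
heap_mb=$(( ${mem_kb:-0} * 3 / 4 / 1024 ))
[ "$heap_mb" -gt 0 ] || heap_mb=1024
exec java -Xmx${heap_mb}m -Didv.enableStereo=false -jar "${dirname}/idv.jar" "$@"
EOF
chmod +x myRunIDV
```

Once the script is marked executable, "./myRunIDV" starts the IDV just as runIDV does, heap size and all.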

Valgrind's Massif Heap Profiler

[ By Craig Stephenson ]

The previous two articles in this series showed how to use Valgrind's Memcheck and Cachegrind tools.

Properly interpreting the output of each of those two tools required a lot of explanation. Massif, Valgrind's heap profiler, on the other hand, is fairly straightforward.

Massif records and plots a program's heap memory usage as it runs. Examples of its graphs can be seen in the Massif manual.

Although the examples in the Massif manual are written in C, Massif works just as well for profiling Fortran codes. Let's take a look at the following very simple Fortran 90 code, named "allocate.f90":


    PROGRAM allocate
    IMPLICIT NONE

      INTEGER, ALLOCATABLE :: A(:)
      ALLOCATE(A(1000000))
      DEALLOCATE(A)

    END PROGRAM allocate

As with other Valgrind tools, codes profiled with Massif need to be compiled with the -g option to include debugging information:


    % pgf90 -g -o allocate allocate.f90

Now let's run this code through Massif. Because this code is very simple, I followed the manual's suggestion of using the --time-unit=B (bytes allocated/deallocated) option, "which is useful for very short-run programs, and for testing purposes, because it is the most reproducible across different machines." The full command looks like this:


    % $PET_HOME/bin/valgrind --tool=massif --time-unit=B ./allocate

When I ran this command, it produced a file named massif.out.19254. Massif output files take the form of massif.out.#####, where ##### is the process ID for the particular run. This file can be viewed with the accompanying ms_print command:


    % $PET_HOME/bin/ms_print massif.out.19254

    --------------------------------------------------------------------------------
    Command:            ./allocate
    Massif arguments:   --time-unit=B
    ms_print arguments: massif.out.19254
    --------------------------------------------------------------------------------


        MB
    4.825^                                                                  #
         |                                                                  #
         |                                                                  #
         |                                                                  #
         |                                                                  #
         |                                                                  #
         |                                                                  #
         |                                                                  #
         |                                                                  #
         |                                                                  #
         |                                                                  #
         |                                                                  #
         |                                                                  #
         |                                                                  #
         |                                                                  #
         |                                                                  #
         |      :                                                           #
         |      :                                                           #
         |      :                                                           #
         |      :                                                           #
       0 +----------------------------------------------------------------------->MB
         0                                                                   4.825

    Number of snapshots: 7
     Detailed snapshots: [5 (peak)]

    --------------------------------------------------------------------------------
      n        time(B)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
    --------------------------------------------------------------------------------
      0              0                0                0             0            0
      1         10,248           10,248           10,240             8            0
      2      1,058,832        1,058,832        1,058,816            16            0
      3      5,058,872        5,058,872        5,058,848            24            0
      4      5,059,272        5,059,272        5,059,240            32            0
      5      5,059,272        5,059,272        5,059,240            32            0
    100.00% (5,059,240B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
    ->79.27% (4,010,664B) 0x404DAD: __hpf_malloc_without_abort (in /lustre/wrkdir/user/allocate)
    | ->79.06% (4,000,032B) 0x40260A: __hpf_alloc (in /lustre/wrkdir/user/allocate)
    | | ->79.06% (4,000,032B) 0x402B06: pgf90_alloc (in /lustre/wrkdir/user/allocate)
    | |   ->79.06% (4,000,032B) 0x401E12: MAIN_ (allocate.f90:5)
    | |     ->79.06% (4,000,032B) 0x401D4E: main (in /lustre/wrkdir/user/allocate)
    | |
    | ->00.21% (10,632B) in 1+ places, all below ms_print's threshold (01.00%)
    |
    ->20.73% (1,048,576B) 0x402198: allhdr (in /lustre/wrkdir/user/allocate)
      ->20.73% (1,048,576B) 0x402C64: pgf90_alloc (in /lustre/wrkdir/user/allocate)
        ->20.73% (1,048,576B) 0x401E12: MAIN_ (allocate.f90:5)
          ->20.73% (1,048,576B) 0x401D4E: main (in /lustre/wrkdir/user/allocate)
    --------------------------------------------------------------------------------
      n        time(B)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
    --------------------------------------------------------------------------------
      6      5,059,672        5,058,872        5,058,848            24            0

Of the two bars in the graph above, the first bar corresponds to snapshot 2, when 1,058,832 bytes of memory are allocated onto the heap. Another 4,000,440 bytes are allocated, corresponding to one million 4-byte integers (plus some overhead), bringing the total up to 5,059,272 bytes. This is represented by the second bar in the graph, which is drawn with '#' characters, indicating that it is the peak memory usage of the program. As indicated on the "detailed snapshots" line, snapshot 5 shows details of the program's peak memory usage, including a complete breakdown of memory allocation via a function callgraph.

This is all well and good, but I felt that this example was too basic to get a good feel for Massif. I wrote the next example, "multiple.f90", to see the results of a long series of memory allocations/deallocations:


    PROGRAM multiple
    IMPLICIT NONE

      INTEGER :: i
      INTEGER, ALLOCATABLE :: A(:)
      DO i = 1, 10000
        ALLOCATE(A(i*100))
        DEALLOCATE(A)
      END DO

    END PROGRAM multiple

We will follow the same steps as the previous example, except there is no longer a need for the --time-unit=B option. This code performs far more memory operations than the previous example, and in this sense may not qualify as a "very short-run program":


    % pgf90 -g -o multiple multiple.f90
    % $PET_HOME/bin/valgrind --tool=massif ./multiple
    % $PET_HOME/bin/ms_print massif.out.32322
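Since the PID suffix changes with every run, a small one-liner can always hand ms_print the newest output file. (This is a workflow convenience of our own, not part of Valgrind; the "touch" lines below create dummy files standing in for real Massif output.)

```shell
# Massif writes massif.out.<PID>; "ls -t" sorts newest-first,
# so the first entry is the output of the most recent run.
touch massif.out.19254 massif.out.32322
newest=$(ls -t massif.out.* | head -n 1)
echo "$newest"
```

In practice you would pass "$newest" straight to $PET_HOME/bin/ms_print.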

This code allocates and deallocates 100 integers, then 200, then 300, and so on, so we should expect the graph of memory allocations to look very linear. This does indeed seem to be the case, although there are various gaps in the graph where snapshots landed on deallocate statements rather than allocate statements. Whether a particular bar represents an allocation or a deallocation seems to depend entirely on timing:


    --------------------------------------------------------------------------------
    Command:            ./multiple
    Massif arguments:   (none)
    ms_print arguments: massif.out.606
    --------------------------------------------------------------------------------


        MB
    4.803^                                                                  #
         |                                                                  #
         |                                                            @     #
         |                                                            @     #
         |                                                , @         @     #
         |                                           , ,@ @ @         @     #
         |                                       ,,@ @ @@ @ @         @     #
         |                                    ,@@@@@ @ @@ @ @         @     #
         |                               ,,@ @@@@@@@ @ @@ @ @         @     #
         |                           , @ @@@ @@@@@@@ @ @@ @ @         @     #
         |                       , @ @ @ @@@ @@@@@@@ @ @@ @ @         @     #
         |                   .: @@ @ @ @ @@@ @@@@@@@ @ @@ @ @         @     #
         |               . :@:: @@ @ @ @ @@@ @@@@@@@ @ @@ @ @         @     #
         |           .:: : :@:: @@ @ @ @ @@@ @@@@@@@ @ @@ @ @         @     #
         |        :@ ::: : :@:: @@ @ @ @ @@@ @@@@@@@ @ @@ @ @         @     #
         | ,      :@ ::: : :@:: @@ @ @ @ @@@ @@@@@@@ @ @@ @ @         @     #
         | @ :: :::@ ::: : :@:: @@ @ @ @ @@@ @@@@@@@ @ @@ @ @ :::::: :@: : :#
         | @ :: :::@ ::: : :@:: @@ @ @ @ @@@ @@@@@@@ @ @@ @ @ :::::: :@: : :#
         | @ :: :::@ ::: : :@:: @@ @ @ @ @@@ @@@@@@@ @ @@ @ @ :::::: :@: : :#
         | @ :: :::@ ::: : :@:: @@ @ @ @ @@@ @@@@@@@ @ @@ @ @ :::::: :@: : :#
       0 +----------------------------------------------------------------------->Mi
         0                                                                   4.858

    Number of snapshots: 50
     Detailed snapshots: [2, 8, 14, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 
    29, 30, 31, 32, 33, 34, 35, 36, 37, 45, 49 (peak)]

    --------------------------------------------------------------------------------
      n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
    --------------------------------------------------------------------------------
    ...
     49      5,093,990        5,036,712        5,036,688            24            0
    100.00% (5,036,688B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
    ->79.18% (3,988,112B) 0x404DAD: __hpf_malloc_without_abort (in /lustre/wrkdir/user/multiple)
    | ->78.98% (3,977,872B) 0x40260A: __hpf_alloc (in /lustre/wrkdir/user/multiple)
    | | ->78.98% (3,977,872B) 0x402B06: pgf90_alloc (in /lustre/wrkdir/user/multiple)
    | |   ->78.98% (3,977,872B) 0x401E22: MAIN_ (multiple.f90:7)
    | |     ->78.98% (3,977,872B) 0x401D4E: main (in /lustre/wrkdir/user/multiple)
    | |
    | ->00.20% (10,240B) in 1+ places, all below ms_print's threshold (01.00%)
    |
    ->20.82% (1,048,576B) 0x402198: allhdr (in /lustre/wrkdir/user/multiple)
      ->20.82% (1,048,576B) 0x402C64: pgf90_alloc (in /lustre/wrkdir/user/multiple)
        ->20.82% (1,048,576B) 0x401E22: MAIN_ (multiple.f90:7)
          ->20.82% (1,048,576B) 0x401D4E: main (in /lustre/wrkdir/user/multiple)

As in the previous example, the bar drawn with '#' characters indicates peak memory usage. Bars drawn with '@' characters are "detailed snapshots," meaning the ms_print output shows a function callgraph for these particular snapshots similar to the function callgraph for the peak snapshot (49) above.

Massif takes an interesting approach: a visual profiling tool that stops short of being X based. Even so, it strikes a good balance between ease of use and highlighting the information a user is probably most interested in. A handful of options also make Massif fairly flexible, including options to increase the number of snapshots, the frequency of detailed snapshots, and the size of the graph, as well as an option to enable profiling of the stack.

Quick-Tip Q & A


A:[[ At the start of each semester, I need to show a new wave of students
  [[ the basics of using Linux.  I usually start with commands like cd,
  [[ cat, cp, vi, etc., but I know they would benefit from learning dozens
  [[ of other commands I use frequently but can never remember off the top
  [[ of my head.  Do you have any tips or tools I could use to help
  [[ brainstorm what commands to show them?

#
# The most fitting suggestions might be a single command away, as 
# Rahul Nabar points out:
#

    Use the history feature to keep track of your commands.  Then run a sort 
    to see what commands you've used most often.

    Perhaps make an alias "addlast" that adds the last command from the 
    history onto a special stack.  Use addlast whenever you find yourself 
    typing something good that you'd like to pass on.

    Other commands that come to mind:

    man, more, less, apropos, various pipes, sed, awk, tr, sort, cut
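#
# Rahul's history-sorting idea can be sketched as a short pipeline.
# (Editors' sketch: the sample_history file below is made up for
# illustration; point the pipeline at your real shell history,
# e.g. ~/.bash_history, in practice.)
#

```shell
# Build a small sample history to demonstrate the pipeline.
printf 'ls -l\ncd src\nls\ngrep -r TODO .\nls\n' > sample_history

# Tally the first word of each line (the command) and sort by frequency.
awk '{print $1}' sample_history | sort | uniq -c | sort -rn
```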

#
# Rich Griswold provided the following book recommendation:
#

    I found Unix Power Tools (http://oreilly.com/catalog/9780596003302) to
    be really useful in getting beyond the basics.  It opened my eyes to
    the power of the shell and standard Unix commands.

#
# And Greg Newby's thorough response is nearly a book of its own:
#

    0. Mention that Unix/Linux is case sensitive.  That the "correct"
    response to most successful commands is nothing.  That there is a
    command search PATH that might not include your current directory.

    1. Talk about files.  How to access, view, remove, etc.  mv, cp, rm,
    cat, less, more, touch

    2. man; navigation of man pages, search with /; apropos or "man -k"

    3. Talk about basic username characteristics and files: UID, GID,
    /etc/passwd, "whoami".  Also, "w"

    4. Understanding elements of "ls -l" output, cover "ls -a" and "ls -d"
    and "ls -R", basics of chmod.  pwd, cd

    5. Shells and environment variables.  .cshrc, .profile and others.
    Your $PATH, prompt.

    By the way: I help people set a prompt that shows the current
    directory.  It avoids many of the problems people have knowing which
    directory they're in.

    6. Filesystems and filesystem hierarchy.  $HOME, /, "cd ..", df, du

    7. The very basics of vi, and that you sometimes find yourself
    in vi by surprise.  How to get in and out of vi.

    8. Shell job control.  ps, jobs, bg, fg, ^z, kill

    9. Customization.  An alias.  Shell prompt customization or other
    settable shell behaviors.  Writing a simple shell script and adding it
    to your $PATH

    I prefer if people can actually try it while you are teaching it.
    Provide a link to a written command reference.  Mention differences or
    characteristics of the particular systems the students are likely to
    use.  Find out whether people have different shells, and make sure
    they know what might be different.


Q: I have an input file with space and newline delimited ASCII input.  
   The first few lines have 1 to 5 values each, but the next several
   thousand lines should all have 20 values per line.  I recently found
   a file that had the wrong number of values on one of those lines so I
   need to start checking these files.  Obviously, visual inspection is
   not my preferred option.  How can I do this check and find any lines
   that don't have 20 values?  There's got to be an easy way to do this!
 

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.