## ARSC HPC Users' Newsletter 262, January 24, 2003

### User's Introduction to ARSC Supercomputers

Upcoming training:

Date:       Weds., Feb 12th, 2pm
Location:   Butrovich 109
Instructor: Kate Hedstrom, ARSC

Topics:
=======

Architectures and capabilities of the Cray SV1ex, Cray SX-6,
Cray T3E, IBM SP cluster, IBM Regatta, and Linux cluster

Programming models

Programming environments: compilers, debuggers, and performance
analysis tools

Running jobs: interactive and batch, submitting batch jobs, and
checking job status

Model output: storing files and visualizing results

For now, e-mail training@arsc.edu to register or to ask questions. Our training web pages and regular registration forms are not up yet, but we'll send another announcement later.

### A Note On Vectorization and Inlining On The SX-6

[ Thanks to Ed Kornkven of ARSC. ]

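Here's a pair of Fortran routines in fixed-form source, file ex2.f. The name of the first routine, sumarea, is a hypothetical stand-in.

          subroutine sumarea(radii, n, colsum)
    c     (the subroutine name sumarea is hypothetical)
          implicit none
          integer j, n
          real radii(n), colsum, area

          colsum = 0.0
          do 50 j=1, n
             colsum = colsum + area(radii(j))
       50 continue
          return
          end

          real function area(r)
          implicit none
          real r, pi
          data pi /3.141592653589/

          area = pi * r * r
          return
          end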

We compiled them on the SX-6 front end, using the sxf90 cross-compiler, as follows:

sxf90 -c -Cvopt -Wf,'-pvctl fullmsg' ex2.f

The compiler responded with these messages:

f90: vec(3): ex2.f, line 8: Unvectorized loop.
f90: opt(1025): ex2.f, line 9: Reference to this function
inhibits optimization.
f90: vec(10): ex2.f, line 9: Vectorization obstructive
procedure reference.  :area

The function call in the middle of the DO loop prevents it from being vectorized. The solution is to have the compiler "inline" the function, thus removing the function call and the obstacle to vectorization. Adding the option "-pi exp=area" requests inline expansion of "area," as follows:

sxf90 -c -Cvopt -pi exp=area -Wf,'-pvctl fullmsg' ex2.f

And we get:

f90: vec(3): ex2.f, line 8: Unvectorized loop.
f90: opt(1025): ex2.f, line 9: Reference to this function
inhibits optimization.
f90: vec(10): ex2.f, line 9: Vectorization obstructive
procedure reference.  :area

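What happened? It turns out that the DATA statement inside the function prevents the optimizer from inlining it. In Fortran, a local variable initialized by a DATA statement implicitly has the SAVE attribute. That means there is a single static copy that persists between calls, which is presumably what the inliner declines to handle. A simple solution is to replace the DATA statement with an assignment: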

          real function area(r)
          implicit none
          real r, pi
    !      data pi /3.141592653589/

          pi = 3.141592653589
          area = pi * r * r
          return
          end

Recompiling with the same command gives both inlining and vectorization, as desired:

f90: vec(1): ex2.f, line 8: Vectorized loop.
f90: vec(24): ex2.f, line 8: Iteration count is assumed.
Iteration count=5000
f90: opt(1222): ex2.f, line 9: Procedure expanded inline.
f90: vec(26): ex2.f, line 9: Macro operation Sum/InnerProd.

### Don't forget stdlib.h

Declarations (prototypes) for many utility functions used in C programs, including "malloc," appear in stdlib.h. If you call any of them, be sure to include stdlib.h in every file that does.

This example shows a problem we encountered this week at ARSC. We were porting a program to the SX-6, and the program had failed to include stdlib.h:

    /*------------------------------*/
    /* Program: ptrsz.c             */
    /*------------------------------*/

    #include <stdio.h>

    /* #include <stdlib.h> */     /*** COMMENTED OUT FOR ERROR ***/

    main () {
      char *p;

      printf ("sizeof(p): %ld\n", (long) sizeof (p));

      if ((p = (char*) malloc (sizeof(char) * 1)) == NULL) {
        printf ("malloc failed\n");
        exit (1);
      }
      else {
        printf ("malloc succeeded\n");
      }

      *p = 'x';
      printf ("p= \'%c\'\n", *p);
    }
    /*------------------------------*/
    /* end of ptrsz.c               */
    /*------------------------------*/

This won't work on the SX-6:

rime$ cc -o ptrsz ptrsz.c
"ptrsz.c", line 9: warning: improper pointer/integer precision : op "CAST"
rime$ ./ptrsz
sizeof(p): 8
malloc succeeded
core dumping
Bus error(coredump)

We can create a similar situation on the IBM p690:

iceflyer 293% cc -q32 -o ptrsz ptrsz.c
iceflyer 294% ./ptrsz
sizeof(p): 4
malloc succeeded
p= 'x'

iceflyer 295% cc -q64 -o ptrsz ptrsz.c
iceflyer 296% ./ptrsz
sizeof(p): 8
malloc succeeded
Segmentation fault(coredump)

The p690 can issue a warning like the SX-6, if we ask for it:

iceflyer 297% cc -qwarn64 -q64 -o ptrsz ptrsz.c
"ptrsz.c", line 9.12: 1506-745 (I) 64-bit portability: possible
incorrect pointer through conversion of int type into pointer.

iceflyer 298% ./ptrsz
sizeof(p): 8
malloc succeeded
Segmentation fault(coredump)

We'll take our explanation from CSIRO's HPCCC Users' FAQ:

http://www.hpccc.gov.au/External/faq/

======================================================================
What is the meaning of the compiler message: warning: improper
pointer/integer precision : op "CAST"?

If you find the warning associated with "calloc" or "malloc"... the
memory allocation routines are defined in stdlib.h and if this is not
included, the return type of the alloc routines automatically becomes
int (default). NEC's cc IS ANSI compliant.  However the sizes of data
types are NOT prescribed by the standard. See C Programmer's Guide,
Chapter1, 1.2.7 Data Types and Sizes showing (SX5 has IEEE - float0
only):

Type        Size(in bits)
----        -------------
char        8
short       16
int         32
long        64
long long   64
pointer     64
float       32
double      64
long double 128
enum        32

On machines/compilers where sizeof(int) == sizeof(void*), it will of
course work. But ANSI C does not require this. In fact many references
on ANSI C specifically warn against this assumption. The NEC warning
is because it may or may not work, but since cc has found a sizeof
problem (smaller to larger) it puts out the message. Up to the
programmer to see if it matters.  ;-(

The moral: .. add #include <stdlib.h> to the .c files or appropriate
======================================================================

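Following this advice, we restored "#include <stdlib.h>" in the test program, and the core dump went away. The p690 case is slightly more interesting. With the prototype in scope, both builds run correctly, and the -qwarn64 -q64 build compiles without the portability warning: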

iceflyer 288% cc -q32 -o ptrsz ptrsz.c
iceflyer 289% ./ptrsz
sizeof(p): 4
malloc succeeded
p= 'x'

iceflyer 290% cc -qwarn64 -q64 -o ptrsz ptrsz.c
iceflyer 291% ./ptrsz
sizeof(p): 8
malloc succeeded
p= 'x'

You'll generally find these routines (at a minimum) declared in stdlib.h:

abs        labs
div        ldiv
atof       atoi
atol       strtod
strtol     strtoul
calloc     malloc
realloc    free
abort      exit
atexit
system     getenv
bsearch    qsort
rand       srand

### 9,984 Miles Per Gallon

An ARSC staffer stumbled on this while looking around at CFD examples. Gasoline-powered "cars" approaching 10,000 mpg (on short, flat test tracks, but still...):

http://www.eco-marathon.net/

Come on car buyers... let's demand better efficiency!

### Quick-Tip Q & A

A:[[ Perl should make this easy... but it's driving me nuts!
[[
[[ In this example, I want to use search and replace to eliminate
[[ the bold html tags from some lyrics I've been working on, replacing
[[ the formerly emboldened text with the same text, prefaced by the
[[ word "really".  E.g.:
[[
[[        The weather is here, I <b>wish</b>
[[        you were mine, the sky is <b>so
[[        cloudy</b>, I sleep and I pine.
[[
[[ Here's my perl script:
[[
[[    #!/usr/local/bin/perl -w
[[
[[    $all = join '', <>;
[[    $all =~ s{<b>(.*)</b>}{really $1}gsi;
[[    print "$all";
[[
[[ The script puts the entire file into one string so it can search
[[ across line breaks. The modifiers to "s" are:
[[
[[    g : match every occurrence (not just the first)
[[    s : match newline characters with "."
[[    i : ignore case
[[
[[ Here's the output:
[[
[[        The weather is here, I really wish</b>
[[        you were mine, the sky is <b>so
[[        cloudy, I sleep and I pine.
[[
[[ You can see for yourself what happened.  Has anyone else ever had
[[ this problem?  What can I do?

Thanks to Olivier Golinelli, Rich Griswold, and Steve Deitz. Here
are two of the three explanations.

The symbol * matches the MAXIMAL number of characters. To match the
MINIMAL number of characters, use *? instead.

Then:

    $all =~ s{<b>(.*?)</b>}{really $1}gsi;

#####
#####
#####

The problem is that .* is greedy.  From the perlre manpage:

By default, a quantified subpattern is "greedy", that is, it will match
as many times as possible (given a particular starting location) while
still allowing the rest of the pattern to match.  If you want it to
match the minimum number of times possible, follow the quantifier with
a "?".  Note that the meanings don't change, just the "greediness":

*?     Match 0 or more times
+?     Match 1 or more times
??     Match 0 or 1 time
{n}?   Match exactly n times
{n,}?  Match at least n times
{n,m}? Match at least n but not more than m times

Simply replacing .* with .*? in your regex will do the trick.

Q: What's an easier way to get the value of pi into my C/C++ or Fortran
programs?

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.