ARSC HPC Users' Newsletter 406, August 19, 2009

Challenges 2009

With the ability to perform 30 trillion arithmetic calculations a second, the newest supercomputer at the Arctic Region Supercomputing Center is helping researchers develop advanced tools for arctic-specific, high-resolution weather forecasting, including models of smoke dispersion from wildfires.

Supercomputers at ARSC are also being used to build models that risk managers in coastal communities can use to better understand how melting ice sheets affect sea levels worldwide.

These stories and more are featured in the 2009 edition of Challenges, the annual magazine of scientific discovery, analysis and prediction published by the Arctic Region Supercomputing Center at the University of Alaska Fairbanks. The magazine is online at:

link no longer available

and features web extras, including a time-lapse movie of the installation of Pingo, a 3,456-processor Cray XT5 supercomputer.

Supercomputers are an essential tool in addressing, understanding and solving some of Alaska's and the nation's most important challenges.

ARSC is the sole provider of open research computing capabilities for the Defense Department's High Performance Computing Modernization Program. There are six DoD Supercomputing Resource Centers throughout the country: two in Mississippi and one each in Maryland, Ohio, Alaska and Hawaii. ARSC distinguishes itself by conducting computational scientific analysis and research in polar regions.

High-latitude environmental modeling seeks to better understand and predict phenomena ranging in size from microns to thousands of miles, and ranging in time from fractions of a second to millennia.

Geophysical phenomena of interest include oceans, the atmosphere, hydrology and ice. Within ecosystems, fisheries and other living things are examined, as well as the complex changes to and interactions among ecosystems over time.

Adventures in Running a Script from Java

[ By Brys Sepulveda ]

As an intern at ARSC, I have found myself applying knowledge learned at my home university in new ways that challenge me to search for answers to new questions. But in this case, the answer turned out to be more confusing than the question.

Java, a programming language developed by Sun, is well suited to multi-person projects while being fairly simple to use. For almost any operation you want to perform, there is probably a class somewhere in a Java library that will make it straightforward.

People have even created whole languages based on Java such as Processing, a language geared towards easy data visualization. Processing acts as a huge Java module, providing a comprehensive API that allows Java to be interpreted. Previously lengthy code segments can be reduced to one or two lines in Processing. Also, anything available in Java is available in Processing, which makes it rather powerful. When I found myself needing to call a Python script using a Processing script I had written, I turned to the Java API to help me out. What I hadn't realized is that the documentation for this is sketchy at best.

To start, the approach is convoluted, requiring several helper classes before even attempting to call the script. A Runtime object is obtained by calling the static method Runtime.getRuntime(), which returns the current runtime environment, and a Process variable is declared to hold the process that the Runtime object will later start.


  Runtime runMe = Runtime.getRuntime();
  Process py;

If all of that wasn't confusing enough, the documentation lists several overloaded exec() methods, some accepting a single string and others a string array. The documentation for the string array version says:


  // Executes the specified command and arguments in a separate process.
  exec(String[] cmdarray);

That, together with everything I read in every online forum I could possibly find, told me that exec() simply executes its argument as if it were a shell command. A full two hours of testing various inputs, formats, and ways of calling the method revealed that claim to be incorrect. When I passed exec() the path to my Python script, it simply opened the script in an editor, IDLE, instead of running it:


  py = runMe.exec("path/pythonscript.py");  // Opens IDLE :(

So, how do I call this script from Java? I needed the script to execute, not just open. I discovered that for exec() to work, I had to run a shell and pass it arguments. I also discovered that if I didn't parse the standard output and standard error of the program called with exec(), it would appear to do nothing. If an error occurred, it would not return an error message.

I tried passing the following string array to exec():


  String[] temp = new String[3];
  temp[0] = "/bin/bash"; 
  temp[1] = "python"; 
  temp[2] = "path/pythonscript.py";
  py = runMe.exec(temp);

It almost worked how I intended, but instead of instructing bash to run my Python script, this string array tried to pass "python" and "path/pythonscript.py" as parameters to bash itself, not as a command for bash to run. After more research, I learned that the -c flag is needed to pass a command to bash. With this, I was finally able to run my Python script inside an exec() call:


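  // Needs java.io.* (BufferedReader, InputStream, InputStreamReader,
  // IOException)
  Runtime runMe = Runtime.getRuntime();
  Process py;
  try {
    // Send command to the command line and execute
    String[] temp = new String[3];
    temp[0] = "/bin/bash"; // Set up terminal
    temp[1] = "-c"; // Prepare for command input
    temp[2] = "python path/pythonscript.py"; // Pass this command
    py = runMe.exec(temp);

    // Handle error output if any
    InputStream err = py.getErrorStream();
    InputStreamReader isr = new InputStreamReader(err);
    BufferedReader buff = new BufferedReader(isr);
    String line;
    while ((line = buff.readLine()) != null) {
      System.err.println(line); // Show what the script reported
    }
  } catch (IOException e) {
    // exec() and readLine() can both throw IOException
    e.printStackTrace();
  }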
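
The same two lessons (hand the shell one complete command string after -c, and always read the output streams) are not specific to Java. As a point of comparison only, here is a minimal Python sketch of the idea. It is not part of the Processing code above; it assumes a POSIX system with /bin/sh, and the helper name run_in_shell and the demo command are made up for illustration.

```python
import subprocess

def run_in_shell(command, shell="/bin/sh"):
    """Run `command` through `shell -c`; return (exit code, stdout, stderr)."""
    # The argument list mirrors the Java String[]: the shell, the -c flag,
    # and the whole command line as ONE string.
    result = subprocess.run(
        [shell, "-c", command],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the captured bytes to str
    )
    return result.returncode, result.stdout, result.stderr

# Hypothetical demo command: writes to both streams, then fails on purpose
code, out, err = run_in_shell("echo hello; echo oops 1>&2; exit 3")
print("exit code:", code)
print("stdout:", out.strip())
print("stderr:", err.strip())
```

Because both streams and the exit code are returned, a failing command cannot silently appear to "do nothing".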

I hope this helps anyone trying to program something similar in Java.

Introduction to Gnuplot, Part I

[ By Anton Kulchitsky ]

This article is the first part in a series of articles about gnuplot, an excellent tool for data visualization. To make this series of articles more fun for both readers and, no less important, the author, I decided to cover some advanced topics.

Getting Started

If given the choice, use the CVS version of gnuplot. Despite being more than 20 years old, gnuplot has a very fast release cycle. Moreover, some very interesting tools were added recently that are not in version 4.2, the default version on most new systems. On many HPC systems, however, the default version is 4.0 or even 3.7.

The following command will show what version of gnuplot is installed:


  $ gnuplot --version
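
If the check has to be repeated on several systems, the version string can be compared programmatically. The following is a minimal Python sketch, assuming the command prints a line of the form "gnuplot 4.2 patchlevel 6"; the sample string and the function name parse_gnuplot_version are illustrative assumptions, so adjust the pattern if your installation prints something different.

```python
import re

def parse_gnuplot_version(text):
    """Return (major, minor) from a string such as 'gnuplot 4.2 patchlevel 6'."""
    match = re.search(r"gnuplot\s+(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    return int(match.group(1)), int(match.group(2))

# Assumed example of the command's output, not captured from a real system
sample = "gnuplot 4.2 patchlevel 6"
version = parse_gnuplot_version(sample)
print(version, "older than 4.2" if version < (4, 2) else "4.2 or newer")
```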

Let's start by generating some data to work with. I wrote a simple Python script, "datamaker.py", that will create the text file "data.dat", which will be used by the examples in this article. The data.dat file contains data in a gnuplot-compatible 3D format of an interesting looking function in the [0:1]x[0:1] interval.

The contents of data.dat are of this form, where sets are separated by an empty line:


  x0  y0  z00
  x1  y0  z10
  x2  y0  z20
  ...
  xN  y0  zN0

  x0  y1  z01

The source code of datamaker.py is as follows:


#!/usr/bin/env python
from math import *

def f(x,y):
   '''interesting looking function'''
   return sin(2*x)*cos(3*y) + x*y

fin = open( 'data.dat', 'w' )

for j in range(101):
   for i in range(101):
       x, y = i/100.0, j/100.0
       fin.write( "%f %f %f\n" % ( x, y, f(x,y) ) )
   fin.write("\n")

fin.close()

Run this script with the following command to create data.dat:


  $ python datamaker.py
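
To check that a file really follows the blank-line-separated layout described above, the blocks can be read back and counted. This is a minimal sketch; the function name read_blocks and the tiny in-memory sample are assumptions made for illustration, not part of datamaker.py.

```python
def read_blocks(text):
    """Split gnuplot-style 3D data into blocks of (x, y, z) tuples.

    Blocks are separated by one or more empty lines, as in data.dat.
    """
    blocks, current = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            # An empty line closes the current block (one row of the grid)
            if current:
                blocks.append(current)
                current = []
            continue
        current.append(tuple(float(v) for v in fields))
    if current:
        blocks.append(current)
    return blocks

# A tiny hand-made sample in the same layout: 2 blocks of 3 points each
sample = "0 0 0.0\n0.5 0 0.1\n1 0 0.2\n\n0 1 0.3\n0.5 1 0.4\n1 1 0.5\n\n"
blocks = read_blocks(sample)
print(len(blocks), "blocks of", len(blocks[0]), "points")
```

Applied to the real file, for example read_blocks(open('data.dat').read()), the result should contain 101 blocks of 101 points each, matching the two range(101) loops in datamaker.py.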

PostScript

All the examples in this article produce PostScript images. They are good for both publications and visualization. I personally use either gv (a favorite tool of many academic wizards like Donald Knuth) or evince (a Gnome application for visualizing many different formats) as front-ends to Ghostscript.

Example #1 - PM3D

Use your favorite text editor to create the following file, named "example01.gp":


#
# example01.gp, pm3d plotting of data.dat; 
#

### Terminal/output options

# first we set the "eps" output format; "enhanced" because we would
# like to use special commands for fonts, "color" because we want color
set terminal postscript eps enhanced color

# set the output file name
set output 'data.eps'

### PM3D options

# we would like a nice color map, not a 3D plot
set view map

# we do not want the surface itself to be plotted
unset surface

# pm3d is the color plot; pm stands for "palette-mapped";
# both data and functions are plotted using pm3d in this example
set style data pm3d
set style function pm3d

# put labels over the graphs
set ticslevel 0
set pm3d implicit at b

### Grid and Key

# we unset grid and key for plots ready for publication
unset grid
unset key

### Labeling the axes
set xlabel "x"
set ylabel "y"

### PLOT: use 'splot' for 3D and 'plot' for 2D
splot 'data.dat'

Now, let us just run gnuplot with these commands:


  $ gnuplot example01.gp

We get the file "data.eps", which can be viewed with either evince or gv:


  $ evince data.eps

or


  $ gv data.eps

The plot lacks a title and good labels, and the fonts are unsatisfactory. However, it is good enough for a quick visualization of the data. Now, let us add these missing elements to make the plot ready for publication.

High Quality Fonts

Example #2 - Fonts

The "Blue Sky" type 1 PostScript fonts are public domain and of great quality. You can read about them at the following web page:

http://www.math.utah.edu/~beebe/fonts/bluesky.html

These are Computer Modern fonts designed by Blue Sky Research and improved by Y&Y. They were commercial products, but thanks to the efforts of Y&Y and the American Mathematical Society, they were released into the public domain in 1997. You might have them already installed on your system. They usually come with TeX Live, the default TeX distribution on Linux or Mac systems.

If you do not have TeX installed, or the distribution you use does not have these PostScript fonts, you can simply download them from CTAN. Because they belong to the American Mathematical Society, they are located together with all the other free AMS fonts. You can download them from this URL:

http://tug.ctan.org/get/fonts/amsfonts/amsfonts.zip

For this article, we will extract amsfonts.zip into:


  ~/usr/share/fonts/amsfonts

The following directory now contains the fonts we need:


  ~/usr/share/fonts/amsfonts/fonts/type1/public/amsfonts/cm

We will also need the cm-super font family, which is a very good alternative to standard fonts like Helvetica. This can be downloaded here:

http://www.ctan.org/get/fonts/ps-type1/cm-super.zip

Let's unzip this file into the following directory:


  ~/usr/share/fonts 

Now we can modify example01.gp to use these fonts.

In the script below, please replace "DIR" with your home directory, where the "usr" subdirectory is located. This should be an absolute path. You can determine the absolute path of your home directory with the following command:


  echo $HOME

This step is necessary because gnuplot is unable to reference the $HOME environment variable itself.


#
# example02.gp, pm3d plotting of data.dat:
#
# improved with PostScript Type 1 free fonts
#

# first we tell gnuplot where to search for the font files
set fontpath \
"DIR/usr/share/fonts/amsfonts/fonts/type1/public/amsfonts/cm" \
"DIR/usr/share/fonts/cm-super/pfb"

### Terminal/output options

# we set the "eps" output format; "enhanced" because we would like to
# use special commands for fonts, "color" because we want color; each
# "fontfile" option embeds one of the Type 1 fonts in the output
set terminal postscript eps enhanced color \
fontfile 'cmmi10.pfb' \
fontfile 'cmti10.pfb' \
fontfile 'sfrm1000.pfb'

# set the output file name
set output 'data.eps'

### PM3D options

# we would like a nice color map, not a 3D plot.
set view map

# we do not want the surface itself to be plotted
unset surface

# pm3d is the color plot; pm stands for "palette-mapped"; both
# data and functions are plotted using pm3d in this example

set pm3d
set style data pm3d
set style function pm3d

# These are a couple of tricks to put labels over the graph. We do
# not use them in this particular example, but that does not matter:
# we usually want labels to be over the graph
set ticslevel 0
set pm3d implicit at b

### Grid and Key

# we unset grid and key for publication ready plots.
unset grid
unset key

### Default fonts: big enough for professors
set ytics   font "SFRM1000,34" 
set xtics   font "SFRM1000,34"
set ylabel  font "SFRM1000,34"
set xlabel  font "SFRM1000,34"
set cblabel font "SFRM1000,34"
set cbtics  font "SFRM1000,34"

### Labeling the axes
set xlabel "{/CMMI10 \013}"
set ylabel "{/CMMI10 \014}"

## Labeling the plot
set label "T" at graph 0.8, graph 0.2 font "CMTI10,46" front

### PLOT:
splot 'data.dat'

First, we set the path where gnuplot should search for the fonts. For PostScript fonts, these are the directories where the pfb files are located. Then we load the fonts we want to use. Font names are the same as their file names, except capitalized and without the ".pfb" extension. The SFRM1000 font is set as the default for all labels and legends. We use the fonts CMMI10 and CMTI10 for alpha, beta, and T. We also put the label "T" on the plot using graph coordinates, which run from 0 to 1 across the plot area; in this example they coincide with the coordinates alpha and beta in our notation.
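
The two manual steps in this example, replacing "DIR" with the home directory and deriving font names from file names, can also be scripted. Below is a minimal Python sketch; the helper names font_name and fill_dir and the sample path /home/user are hypothetical, and the naming rule is simply the one stated in this article.

```python
import os

def font_name(pfb_file):
    """Derive the font name used in gnuplot labels from a .pfb file name."""
    base = os.path.basename(pfb_file)
    stem, _ext = os.path.splitext(base)
    return stem.upper()

def fill_dir(script_text, home=None):
    """Replace the DIR placeholder at the start of each quoted font path."""
    if home is None:
        # Same value that "echo $HOME" normally prints
        home = os.path.expanduser("~")
    return script_text.replace('"DIR/', '"' + home + '/')

print(font_name("sfrm1000.pfb"))
# /home/user is a made-up home directory used only for this demonstration
print(fill_dir('set fontpath "DIR/usr/share/fonts/cm-super/pfb"', "/home/user"))
```

Only the exact sequence "DIR/ (opening quote, DIR, slash) is replaced, so other occurrences of the letters DIR in comments are left alone.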

Generally, titles and labels on the plot should be the same font size used in the paper or bigger. In this example, we used size 34.

Run example02.gp the same way as in the previous example:


  $ gnuplot example02.gp

And again, to view the plot using gv:


  $ gv data.eps

(Note: TrueType fonts can also be used with gnuplot.)

As an additional reference on this topic, I suggest the following links:

Gnuplot documentation about fontfile option:

http://www.gnuplot.info/docs/node413.html

A small article about using PostScript in gnuplot by Harald Harders. This article also contains all of the octal codes for mathematical symbols from Computer Modern Blue Sky PostScript fonts and describes the basic usage of these fonts with gnuplot. It comes with the gnuplot distribution in:


  docs/psdoc/ps_fontfile_doc.tex

In the next article, I will discuss more PostScript features and different aspects of color representation, contours, and legend customization. After this, I will cover the usage of gnuplot from Python scripts, plotting data from NetCDF files, and a comparison of gnuplot and ncl.

Quick-Tip Q & A


A:[[ I am developing a script that reads in a formula from an input file
  [[ to be used with a series of values (qp and et).
  [[ 
  [[ E.g.
  [[ val=qp + 0.000277777 * et
  [[ 
  [[ Is there a way to use this formula within my script without writing
  [[ my own formula parser?  Here is an example of how I want to use this.
  [[ 
  [[ % cat formula
  [[ val=qp + 0.000277777 * et
  [[ 
  [[ % cat myscript
  [[  ...
  [[ for all qp
  [[    for all et
  [[       <execute formula and print val>
  [[  ...
  [[ 
  [[ Then the output would be something like:
  [[ 
  [[ qp    et    val
  [[ 0.0   0.0   0.0
  [[ 0.0   1.0   0.000277777
  [[  ...
  [[ 1.0   0.0   1.0
  [[ 1.0   1.0   1.000277777
  [[  ...
  [[ 
  [[ The script is currently written in Perl, but I wouldn't be opposed to
  [[ using Python.


#
# Greg Newby shows how the bc command can be used to evaluate this
# formula from a shell:
#

The bc command might be a good choice for this type of formula.  It
can operate on standard input, or you can write a little script ("man
bc" has some detailed usage guidelines).  Based on your question (and
ignoring the setup of data for the loop, which may be accomplished a few
different ways):

qp=10
et=20
val=`echo "$qp + 0.000277777 * $et" | bc -l`

However, if you already have a little Perl or Python to do this, I don't
see much advantage in using a shell command.  If you're doing
computation on many values of qp and et, the shell script will likely be
quite a bit shorter.  But if your goal is to work with flexible
formulas, even formulas that are read from a file or standard input,
without writing your own parser, then bc could be quite helpful.

A few related standard Unix/Linux commands are dc (reverse polish
notation) and expr (doesn't handle floating point).

#
# In his response, Tom Baring explains how this can be achieved with a
# clever combination of Perl's eval command and regular expressions ...
#

Here's a perl answer...

The magic is in the "e" modifier of the "s///" command which enables
evaluation of the substitution string.

%  cat formula
$val = $qp + 0.000277777 * $et

%  cat eval_it.prl
#!/usr/bin/perl -w

open IN, "< formula";
$formula = <IN>;
chomp $formula;
close IN;

print "\nUsing formula: $formula\n";
print "\$qp\t\$et\t\$val\n";

foreach $qp (0 .. 2) {
 foreach $et (0 .. 2) {

   $val = $formula;
   $val =~ s/.*=(.*)/eval($1)/e;

   print "$qp\t$et\t$val\n";
 }
}

%  ./eval_it.prl

Using formula: $val = $qp + 0.000277777 * $et
$qp     $et     $val
0       0       0
0       1       0.000277777
0       2       0.000555554
1       0       1
1       1       1.000277777
1       2       1.000555554
2       0       2
2       1       2.000277777
2       2       2.000555554

#
# ... and Don Bahls' response proves that former ARSC HPC Users'
# Newsletter editors think alike:
#

In perl you can do something like this with a combination of regular
expressions and the eval command.

pingo1 % more formula
val = qp + 0.000277777 * et

#!/usr/bin/perl

use strict;
# read the formula from the file.
my $form=`cat formula | cut -d "=" -f 2`;
chomp($form);
my $qp;
my $et;

printf("Formula is $form\n\n");
printf("%10s %10s %10s\n", "qp", "et", "val");
foreach $qp (1,2,3,4,5)
{
   foreach $et (5,6,7,8)
   {   # copy the formula
       my $tform=$form;
       # replace et with $et and qp with $qp
       $tform=~s/qp/$qp/g;
       $tform=~s/et/$et/g;
       # use eval to generate the result.
       my $res=eval "$tform";
       # print $qp, $et and $res
       printf("%10f %10f %10f\n", $qp, $et, $res);
   }
}
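
Since the question mentions Python as an option, here is a minimal sketch of the same idea in that language. It is not taken from any of the responses above. Instead of calling eval() directly, it walks the parsed expression with the standard ast module and accepts only numbers, variable names, and basic arithmetic, so a formula file cannot run arbitrary code. The function name evaluate is an assumption, and the sketch requires Python 3.8 or later (for ast.Constant).

```python
import ast
import operator

# Arithmetic operators the sketch allows in a formula
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.Pow: operator.pow}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def evaluate(formula, **variables):
    """Evaluate the right-hand side of 'name = expression'."""
    expression = formula.split("=", 1)[1]
    tree = ast.parse(expression.strip(), mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError("unsupported element in formula: %r" % node)

    return walk(tree)

# In practice this line would be read from the "formula" file
formula = "val=qp + 0.000277777 * et"
print("qp\tet\tval")
for qp in (0.0, 1.0):
    for et in (0.0, 1.0):
        print("%s\t%s\t%s" % (qp, et, evaluate(formula, qp=qp, et=et)))
```

Function calls, attribute access, and anything else outside the allowed node types raise ValueError, which is the main design difference from a plain eval().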


Q: I have a cron job that invokes a script, update.pl, every hour.
   Sometimes this script takes more than an hour to run, causing two
   of these script processes to overlap, which makes them interfere with
   each other.  Is there a way I can tell this update.pl cron job to run
   ONLY if there is not already an update.pl process running, to prevent
   the processes from overlapping?
 

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions: Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.