ARSC HPC Users' Newsletter 410, January 29, 2010

esLogin Nodes Available on Pingo

In an effort to provide users with a more robust pingo login environment, ARSC has installed two new external service login (eslogin) nodes, pingob.arsc.edu and pingoc.arsc.edu. Each of these nodes contains four quad-core processors, 128GB of shared memory, and a 10 Gigabit Ethernet Interconnect. Individual process memory limits on these eslogin nodes are currently set to 4GB for soft limits, and 16GB for hard limits.
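
For users who want to see which limits a process actually inherits, here is a minimal Python sketch. It assumes the limit of interest is the address-space limit (RLIMIT_AS); the article does not say which limit is being set, so treat that choice, and the helper names describe() and clamp_soft_limit(), as illustrative only.

```python
import resource

GIB = 1024 ** 3

def describe(limit):
    """Format a limit in GB, or 'unlimited' if no limit is set."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return "%.1f GB" % (limit / float(GIB))

def clamp_soft_limit(wanted, hard):
    """Return the largest soft limit <= wanted that the hard limit allows."""
    if hard == resource.RLIM_INFINITY:
        return wanted
    return min(wanted, hard)

# Assumption: RLIMIT_AS is the relevant limit on the eslogin nodes.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("soft: %s  hard: %s" % (describe(soft), describe(hard)))

# A process may raise its own soft limit up to the hard limit, e.g.:
# resource.setrlimit(resource.RLIMIT_AS, (clamp_soft_limit(8 * GIB, hard), hard))
```

The setrlimit call is left commented out so that running the sketch only reports the limits and changes nothing.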

ARSC is encouraging all pingo users to log in to the new eslogin nodes and take advantage of the increase in available resources.

On March 10, 2010 we plan to change the default "pingo.arsc.edu" login to point to the eslogin node. Please review the pingo eslogin news item for specific details about the programming environment on the eslogin nodes:

http://www.arsc.edu/support/news/systemnews/news.xml?system=pingo#pingo_eslogin

As always, please contact the ARSC Help Desk with any questions.

More About Git

[ By Kate Hedstrom ]

Back in issue 404, I wrote an introduction to git, a new open source version control package. That article is available here:

/arsc/support/news/hpcnews/hpcnews404/index.xml#article1

I've since been learning more about it, especially reading much of the O'Reilly book "Version Control with Git" by Jon Loeliger over the holiday break. There are a few things which I've found just a little confusing or surprising coming from a cvs/svn background, in addition to the concept of a distributed repository system. I'd like to talk about one just a little, but first, I'd like to apologize for this incorrect statement from last time:

It can then be pointed to a different SVN server - from the same sandbox! Magic!

It turns out that's not true - a git sandbox can point to any number of remote git repositories, but to at most one remote svn server. All is not lost, however, since you can have two git sandboxes, each pointing to one svn server, but as git remotes of each other. Thanks to Brian Powell for sending me this link:

http://labs.trolltech.com/blogs/2009/04/03/two-kde-svn-branches-and-git/

I actually think you can point to two branches in the same svn site from one git directory, but you have to tell it how the branches are laid out in the "git svn clone" operation. I haven't tried it yet, though.

The Index

The concept of the index is new to me, and something I didn't come to appreciate until working with git for a bit. It is a staging area for building up your next commit, between the working directory and the repository. You can have changes going on which should logically be checked in separately - you add the first set to the index with "git add file1", then commit with "git commit", leaving changes to file2 to be checked in later, assuming the changes aren't all in the same file.

Also, "git diff" is a diff between the working directory and the index, not a diff between the working directory and the latest commit (HEAD), as is the case with cvs and svn. To get the diff from the last commit, use instead "git diff HEAD". Likewise, "git diff --cached" is the diff between the HEAD and the index, showing what would be checked in with a "git commit".
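
The three comparisons can be pictured with a toy model. The sketch below is not git itself: HEAD, the index, and the working directory are plain Python dictionaries, and the helper changed() is a made-up name that just lists the files that differ between two of them.

```python
def changed(a, b):
    """Return the sorted names of files whose content differs between two trees."""
    return sorted(name for name in set(a) | set(b) if a.get(name) != b.get(name))

head = {"file1": "v1", "file2": "v1"}     # the last commit
index = dict(head)                        # the staging area starts out equal to HEAD
working = {"file1": "v2", "file2": "v2"}  # both files have been edited

index["file1"] = working["file1"]         # models "git add file1"

diff_plain = changed(index, working)      # models "git diff"
diff_cached = changed(head, index)        # models "git diff --cached"
diff_head = changed(head, working)        # models "git diff HEAD"

print("git diff          ->", diff_plain)
print("git diff --cached ->", diff_cached)
print("git diff HEAD     ->", diff_head)

head = dict(index)                        # models "git commit"
after_commit = changed(head, working)     # only file2 is still uncommitted
```

In this model the plain diff shows only file2 once file1 has been added, which is the surprise for cvs/svn users described above.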

Also, when doing a "git pull" or a "git merge", only the conflicts show up with "git diff". I've been caught by this one, thinking I knew what all had arrived in a "git pull" from a colleague! I think the advice of sticking to "git fetch" rather than "git pull" is one I'll try. By the way, "git pull" is a "git fetch" followed by a "git merge".

I'm still committed to getting more comfortable with git and using it more intelligently. We got a proposal funded which will involve me having to work with svn at two remote sites in two different states, one of which is now woefully out of date. Call it a New Year's resolution to get that fixed with git!

Paraview on Pingo

[ By Patrick Webb ]

ARSC is pleased to announce the availability of Paraview 3.6.2 on Pingo, available for all users. A news article that describes the connection process is posted on Pingo and is accessible by typing "news paraview". The process is automated enough that a basic client/server session can be created. This article will go through the options available to a user in order to allow that user to get the most out of their Paraview session.

The process for running Paraview on Pingo demonstrated in the news article allows the user to modify and customize their session starting at step 5. Steps 1-4 consist of guiding the user through downloading the connection script that does all the heavy lifting of creating the connection to Pingo. Step 5 is where the user can set up options that will define the resources needed, the task time, and the connection parameters. A good configuration will increase the amount of work that a user can get done. It is also possible to select options that will cause errors and kill a connection, so let's go through the options and determine the best way to set them up.

Paraview Version: This item currently contains only one option (version 3.6.2). Future upgrades to Paraview may change this menu if an older version is needed.

SSH executable: Path to your local ssh executable. The default setting is for the ARSC Linux workstations. If your system is different, put the path here.

Username: Enter the username that you would use to ssh to an ARSC system.

Server name: This is the name of the login node you will connect to. Pingo1-6 are listed in this menu. For the most part, any will do.

Queue name: Select the Pingo PBS queue you will use.

Client port number: This is the port that the Paraview client running on your machine will use to connect to Pingo. If you are running multiple instances of Paraview, this number will need to be different to avoid conflicts. This option is available if, for some reason, it is important to define which port on your local system is opened. The default value is usually fine.

Remote port number: This is the port that will be opened on Pingo. Again, multiple instances of Paraview can't use the same ports so changing this number may prevent conflict. The default is usually just fine.

Connection ID: Built in security measure for Paraview. The server side of the Paraview connection will only accept a connection from a client that has this ID number (that is, yours).

Number of Processes: This is how many instances of Paraview you will be running. This is equivalent to aprun -n<some#> where <some#> is the number of Paraview processes. Keep in mind that this number is closely related to the next option, Processor Tiling. It MUST be greater than or equal to the number selected in the Processor Tiling option.

Processor Tiling: This option controls how many instances of Paraview will run on each node. This value MUST be less than or equal to the Number of Processes option.

The previous two options give the user control over how the Paraview application is spread across nodes. The math to figure out how many nodes you require is straightforward, but should not be ignored. 40 Paraview processes tiled 8 per node require only 5 nodes, but reducing the tiling number can mean requesting many more nodes than intended. The default Processor Tiling is 8 in order to fit as many instances as possible onto one node, but the script will remember previous selections, so if you change the values, double-check them before you click "Connect"!
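
The node arithmetic can be sketched in a few lines of Python. The function name nodes_required() is hypothetical, and the sketch assumes that a partly filled node still counts as a whole node in the request.

```python
def nodes_required(num_processes, tiling):
    """Nodes needed to host num_processes Paraview processes at `tiling` per node."""
    if tiling < 1 or tiling > num_processes:
        raise ValueError("Processor Tiling must be between 1 and Number of Processes")
    # Ceiling division: a partly filled node still counts as a whole node.
    return (num_processes + tiling - 1) // tiling

for tiling in (8, 4, 1):
    print("%d processes, %d per node -> %d nodes"
          % (40, tiling, nodes_required(40, tiling)))
```

With 40 processes, dropping the tiling from 8 to 4 doubles the request from 5 to 10 nodes, and a tiling of 1 asks for 40.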

Wall time: This option is simple, but be aware that it is counted in minutes and the default is only 5. While working, keep your eye on the clock and don't forget to save your work early and often!

That's all the options! Depending on the usage of the system, Paraview will take as little as 5-10 seconds to establish the connection. You will have access to the Pingo file systems and all of your data therein.

Once you are using Paraview with Pingo, there are some behaviors that might seem unusual for Paraview. For example:

Any interruption of the ssh connection between the Paraview client and server will terminate both processes.

If the wall time expires, the server process will terminate but the client process will not. The reverse does not hold: if the client terminates (crash, user close, etc.), the server is killed as well.

Your user settings in the connection script will be remembered if the Paraview client terminates 'nicely', but if it crashes it will revert to defaults.

Every session will produce a pvserver.<processID> file in your home directory. It contains all the error and log output from the Paraview session.

Happy visualizing!

Brain Teaser

We were sent this puzzle by a certain esteemed reader who thought our audience might enjoy it. We will put a little twist on it and say, send us your answer and your algorithm for computing it and we will publish all the solutions that we deem "reasonable" in the next issue. We know one solution. Are there others?

If these conditions apply:

    2 + 3 = 10
    7 + 2 = 63
    6 + 5 = 66
    8 + 4 = 96

Then, what is the result of:

    9 + 7 = ????

Quick-Tip Q & A


A:[[ I need to generate some test data for a program I am writing.
 [[ The input to the program is a list of integers.  How can I generate
 [[ all the permutations of a list of n integers?  For example, for
 [[ n=3, my generator would have an input of 3 and it would output
 [[ something like:
 [[ 
 [[    1 2 3
 [[    1 3 2
 [[    2 1 3
 [[    2 3 1
 [[    3 1 2
 [[    3 2 1
 [[

#
# Having already solved this problem years ago for an unrelated project,
# Dale Clark was quick to submit the script he had on hand:
#

This script prompts for a word, then generates all its permutations. Of 
course, this would also work for integers compressed into a word without 
spaces. But it doesn't squash repeated instances. Anyway, it's a kind of 
solution I generated some time ago for a separate word problem.

#!/usr/local/bin/perl
# Permutation generator.

# 1998-03-05 dvc First working version.
# 1998-10-29 dvc Now loops 'til done.

# todo: print permutations that are dictionary words.

until ($Done)
{
     &GetWord(*Done,*Stack,*Word)
  && &Permute(*Stack,*Word);
}

sub GetWord
{
  local(*Done,*Stack,*Word) = @_;

  ($Done,@Stack,%Word,$i) = ();
  print "Enter the word you wish to permute: ";
  ($_ = <STDIN>) =~ s|^\s*(.*\S)?\s*$|$1|;
  if (m|\S|)
  {
    for (split //)
    {
      $Word{$i} = $_;
      push @Stack,$i;
      $i++;
    }
  }
  else
  {
    $Done = 1;
  }
  return not $Done;
}

sub Permute
{
  local(*Stack,*Word) = @_;
  local($Done,$Perms,$StackPtr,@Temp,$i,$j) = ();
  
  while (not $Done)
  {
    $Perms++;
    &PrintStack(*Stack,*Word);
    $StackPtr = $#Stack;
    $StackPtr-- while $StackPtr and $Stack[$StackPtr - 1] > $Stack[$StackPtr];
    if (0 == $StackPtr)
    {
      $Done = 1;
    }
    else
    {
      $StackPtr--;
      $i = $Stack[$StackPtr];
      @Temp = sort { $a <=> $b } splice @Stack,$StackPtr;
      for $j (0 .. $#Temp)
      {
        if ($Temp[$j] > $i)
        {
          push @Stack,splice @Temp,$j,1;
          last;
        }
      }
      push @Stack,@Temp;
    }
  }
  printf "Total permutations = %d.\n",$Perms;
}

sub PrintStack
{
  local(*Stack,*Word) = @_;

  for (@Stack) { print $Word{$_} }
  print "\n";
}
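
The Perl script above appears to step through the index positions in lexicographic order. For readers who prefer Python, here is a minimal sketch of that same idea applied directly to the integers 1..n from the question. It uses the textbook swap-and-reverse form rather than the sort-and-splice used above, and the function names are illustrative.

```python
def next_permutation(seq):
    """Return the next permutation of seq in lexicographic order, or None after the last."""
    items = list(seq)
    # Find the rightmost position whose element is smaller than its successor.
    pivot = len(items) - 2
    while pivot >= 0 and items[pivot] >= items[pivot + 1]:
        pivot -= 1
    if pivot < 0:
        return None  # items is in descending order: no more permutations
    # Find the rightmost element larger than the pivot element and swap the two.
    swap = len(items) - 1
    while items[swap] <= items[pivot]:
        swap -= 1
    items[pivot], items[swap] = items[swap], items[pivot]
    # The tail is in descending order; reverse it to make it ascending.
    items[pivot + 1:] = reversed(items[pivot + 1:])
    return items

def all_permutations(n):
    """Collect every permutation of 1..n, starting from the ascending order."""
    current = list(range(1, n + 1))
    result = []
    while current is not None:
        result.append(current)
        current = next_permutation(current)
    return result

for perm in all_permutations(3):
    print(" ".join(str(x) for x in perm))
```

Only the current permutation is needed to compute the next one, so a caller that prints each result instead of collecting them keeps memory use proportional to n.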

#
# Greg Newby shows how this can be done in many different languages:
#

For a less-than-quick tip, see:

 Knuth, Donald E.  2005.  "The Art of Computer Programming, Volume 4,
 Fascicle 2: Generating All Tuples and Permutations."
 Addison-Wesley.  ISBN: 0201853930

For one quick solution, install Algorithm::Permute then "perldoc
Algorithm::Permute" for details.  Here is an example that takes
command-line arguments and permutes them (it does not check for
duplicates or ranges, and will permute letters or strings):

 #!/usr/bin/perl -w
 use strict; use Algorithm::Permute;
 die "nothing to permute, exiting\n" unless @ARGV;
 Algorithm::Permute::permute { print "@ARGV\n" } @ARGV;
 exit;

Python is quick, too, and comes with the itertools module built in:

 #!/usr/bin/python
 import itertools
 import sys
 sys.argv.pop(0)  # argv[0] is the program name
 print(list(itertools.permutations(sys.argv)))

Ruby is quick if you install the Ruby library, Permutation.  See:
 http://permutation.rubyforge.org/

 #!/usr/bin/ruby
 require 'rubygems'
 require 'permutation'
 perm = Permutation.new(ARGV.length)
 print perm.map { |q| q.project(ARGV) }

The iteration methods used for these built-in functions might not be
particularly efficient, and could require lots of memory for large lists of
items to permute.  See Knuth for details.
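
In the Python case the memory concern comes mostly from wrapping the result in list(). itertools.permutations() itself returns an iterator, so the permutations can be consumed one at a time. A minimal sketch, with permutation_lines() as an illustrative name, that produces the output format shown in the question:

```python
import itertools

def permutation_lines(n):
    """Yield each permutation of 1..n as a space-separated string, one at a time."""
    for perm in itertools.permutations(range(1, n + 1)):
        yield " ".join(str(x) for x in perm)

for line in permutation_lines(3):
    print(line)
```

Because the input range is already sorted, the permutations come out in lexicographic order.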

For C/C++, see the gsl_permutation_next() function in the GNU Scientific
Library.  MATLAB and Octave have a built-in "perms" function.  I did not find a
comparable built-in function for Fortran.  Of the examples above, only the
Python example works without first installing additional software
components.

[ Editor's Note: The permutations() method was added to the itertools 
module in version 2.6 of Python, which is available on Midnight by loading 
the module python-2.6.2 and on Pingo by loading python/2.6.2. ]

#
# Jed Brown's solution demonstrates how easy this is to do using the 
# Glasgow Haskell Compiler (ghc):
#

This is mostly formatting

 ghc -e 'mapM_ (putStrLn . Data.List.unwords) . Data.List.permutations . map (:[]) $ "abc"'

[ Editor's Note: ghc is not currently installed on any ARSC systems. ]

#
# Scott Kajihara's solution takes a memory-conscious approach:
#

All right, here is a recursive Perl solution, which is irksome since an
iterative one is always preferable. Oh, and it uses lists-of-lists (or arrays-of-arrays)
which always makes a Perl script fun.

I have tried to keep the memory usage down as much as possible by
printing out a topmost iteration's permutations when they are known,
but as the number of permutations goes like \Gamma(n+1), no guarantees
about scaling.

========================================================================
my @list = (1 .. 5);
my @result = ();

permute(\@list, \@result, 1);

sub permute {
# Takes $input list and returns all permutations in $output list-of-lists
# $first flags the outermost loop 

 my ($input, $output, $first) = @_;

 my $count = $#{$input};
 my ($subset, $head);
 my @temp;

 if ($count > 0) {
   for ( ; $count > -1; --$count) {
     $head = shift @$input; # removes head of list
     @temp = ();

     permute($input, \@temp, 0);

     push @$input, ($head); # puts original "head" at tail

     unless ($first) {
       for ($subset = $#temp; $subset > -1; --$subset) {
# puts "head" at head of each permutation of sublist

         unshift @$output, [$head, @{$temp[$subset]}];
       }
     }

     else { # prints out current permutations
       for ($subset = $#temp; $subset > -1; --$subset) {
         print join(" ", ($head, @{$temp[$subset]})), "\n";
       }

       @$output = ();
     }
   }
 }

 else {
   unshift @$output, [@$input];
 }
}
========================================================================



Q: One of the programs I use to preprocess my data occasionally hangs 
   when I give it a bad input file.   Is there an easy way to set a 
   timeout for a command?

   Usually the program runs in under a minute, but it's embarrassing 
   when it hangs on the input file.

   Yes I'm trying to fix the problem, but I'm hoping a timeout will help 
   me avoid using all of my CPU time.

   My script looks like this:

   #!/bin/bash
   ...

   # I'd like to be able to specify the following command not run longer 
   # than 90 seconds.
   ./preprocess input

   mpirun ./myjob
 

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.