Long Term Storage Commands and Utilities

Long Term Storage Introduction

Bigdipper is a Sun SPARC Enterprise T5440 Server providing long term, backed up data storage for ARSC resources. Bigdipper hosts the /archive filesystem for the ARSC HPC resources. The server utilizes SAM-QFS for managing user files stored in the corresponding archive file system. SAM-QFS consists of "online" disk storage, "offline" tape storage, and a set of daemons managing file status.

When a file is saved to the archive filesystem, the file is initially "online". Shortly thereafter, one copy of the file is automatically written to an archive tape. The file remains "online" while sufficient disk space is available; otherwise it is taken "offline" and removed from disk storage, leaving only the tape copy. An "online" file is immediately accessible to the user. For an "offline" file, the user must request that SAM-QFS bring a copy back "online". See the stage and batch_stage commands for how to make this request. Staging files in advance is highly encouraged, especially when working with thousands of "offline" files.

SAM-QFS File Management Utilities

Users of pacman and fish have an account on bigdipper, where they can more efficiently manage their files. The SAM-QFS commands listed below can be used to manage data from bigdipper.

The following commands must be executed on bigdipper where your files are stored. For more information, please read the man page for the particular command (e.g. man sls).

archive

The "archive" command issues a request for a file, or group of files matching wildcards (when * or ? are specified), to be copied to tape. Please note that the return of this command does not mean the copy to tape has finished. Check for file transfer completion with the sfind or sls commands. Also, disk space is not released unless the release command is executed.

SAM-QFS will archive files automatically. This normally occurs within approximately four hours of creation or modification.

Usage:

archive filename

release

After a file has been archived, one copy exists on tape and one copy exists on disk. The "release" command removes the "online" (disk) copy of a file if the "offline" (tape) copy has already been written. A directory listing will remain for the now "offline" file. To access the contents of a file after issuing the "release" command, the file must be copied from tape back to disk using the "stage" or "batch_stage" command.

SAM-QFS may release files automatically over time depending on the level of activity on the file system.

Usage:

release filename

sfind

The "sfind" command finds files with the requested attributes. Searchable attributes include:

  • -offline (file copied to tape and disk space released)
  • -online (copy exists on disk)
  • -archdone (all archive/stage steps completed)
  • -copies 2 (both copies exist on tape)

In most cases, "sfind" uses the same keywords as sls -DK.

Usage:

sfind -offline

Examples:

Search for offline files within $ARCHIVE:

bigdipper % sfind $ARCHIVE -offline

Search for files that have two copies on tape:

bigdipper % sfind $ARCHIVE -copies 2
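
The -online and -offline tests can also be combined into a quick tally of how much of a directory tree is on disk versus on tape only. The following is a minimal sketch rather than a supported utility: the function name count_by_state is hypothetical, and it assumes that sfind accepts the standard find test -type f alongside the SAM-QFS tests.

```shell
# count_by_state DIR
# Print the number of regular files under DIR in each SAM-QFS state.
# Assumption: sfind accepts "-type f" together with -online / -offline.
count_by_state() {
    dir=$1
    for state in online offline; do
        # One path per output line, so the line count is the file count.
        n=$(sfind "$dir" -type f "-$state" | wc -l | tr -d ' ')
        echo "$state: $n"
    done
}
```

Once the function is defined, "count_by_state $ARCHIVE" prints one line per state. Filenames containing newlines would be miscounted.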

stage

The "stage" command requests that an "offline" (tape copy only) file be copied back "online" (to disk). Only online files may be read, copied, etc. Staging a file in advance of when it is needed helps ensure the file is "online" and ready to use when requested. Otherwise, an attempt to read the file will automatically stage it, although it may take some time for the tape to mount and for a new "online" copy of the file to be placed on disk.

Please note that the return of this command does not mean the copy to disk has finished. Check for "online" file completion with the sfind or sls commands.

When staging large numbers of files, the "batch_stage" command may perform better than "stage". Consider using "batch_stage" in this situation.

Usage:

stage filename

Examples:

The following sfind command finds all offline f90 files under the current directory, issues a stage request for each one, and echoes each filename to the terminal:

bigdipper % sfind . -name "*.f90" -offline -exec stage {} \; -exec echo {} \;
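
Because "stage" returns before the copy to disk has finished, a script that needs the file contents may have to wait for the file to come "online". The sketch below polls with sfind until the file is no longer reported as offline. The function names is_offline and wait_online, the polling interval, and the retry limit are all hypothetical, and the check assumes that "sfind FILE -offline" prints the path only while the file is offline.

```shell
# is_offline FILE: succeed while FILE is still offline.
# Assumption: "sfind FILE -offline" prints FILE only if it is offline.
is_offline() {
    [ -n "$(sfind "$1" -offline 2>/dev/null)" ]
}

# wait_online FILE [INTERVAL_SECONDS] [MAX_TRIES]
# Poll until FILE is online, or give up after MAX_TRIES checks.
wait_online() {
    file=$1
    interval=${2:-60}
    tries=${3:-30}
    n=0
    while is_offline "$file"; do
        n=$((n + 1))
        if [ "$n" -ge "$tries" ]; then
            echo "still offline: $file" >&2
            return 1
        fi
        sleep "$interval"
    done
    echo "online: $file"
}
```

A typical use would be to issue "stage myfile" first and then call "wait_online myfile" before reading the file. A long polling interval keeps the load on the file system low while the tape is mounted.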

batch_stage

The "batch_stage" command brings a set of files "online". Files are staged in the order in which they were written to tape. This minimizes tape seeks and typically reduces the amount of time it takes to bring multiple files back "online".

Usage:

batch_stage filenames

Examples:

To stage several files by name, simply list all the files that need to be staged following the "batch_stage" command:

bigdipper % batch_stage fileone filetwo filethree

Wildcards may also be used. The following command will stage all the files in the $ARCHIVE/data directory:

bigdipper % batch_stage $ARCHIVE/data/*

A list of files to be staged can also be supplied through stdin, making "batch_stage" ideal to use in conjunction with the "find" command:

bigdipper % find $ARCHIVE/somedirectory/ -name \*.nc | batch_stage -i
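
The same pipeline can be narrowed to files that are actually "offline" by using sfind in place of find, so that no stage requests are issued for files already on disk. This is a minimal sketch under stated assumptions: the function name stage_offline is hypothetical, and it assumes "batch_stage -i" reads one path per line from stdin, as in the find example.

```shell
# stage_offline DIR PATTERN
# Request staging only for offline files under DIR whose names match PATTERN.
stage_offline() {
    dir=$1
    pattern=$2
    # Collect the offline files first so an empty result can be reported.
    list=$(sfind "$dir" -name "$pattern" -offline)
    if [ -z "$list" ]; then
        echo "nothing to stage"
        return 0
    fi
    # Hand the whole list to batch_stage in one request.
    printf '%s\n' "$list" | batch_stage -i
    count=$(printf '%s\n' "$list" | wc -l | tr -d ' ')
    echo "requested staging of $count file(s)"
}
```

For example, "stage_offline $ARCHIVE/somedirectory '*.nc'" requests every offline NetCDF file in that tree. The pattern must be quoted so the shell does not expand it, and filenames containing newlines are not handled.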

sls

The "sls" command is an extended version of the "ls" command which lists files including their SAM-QFS attributes. The -D option will show most of the SAM-QFS attributes.

Usage:

sls filename

Examples:

Show a detailed description of SAM information for all files in $ARCHIVE:

bigdipper % sls -D $ARCHIVE

Show two lines of output with SAM information:

bigdipper % sls -2 $ARCHIVE/myfile

sdu

The "sdu" command is a SAM version of "du". The command reports the sum of offline and online disk usage.

Usage:

sdu directory

Examples:

Show summary (-s) of usage for each subdirectory in $ARCHIVE in kilobytes (-k):

bigdipper % sdu -sk $ARCHIVE/*

Show summary (-s) of usage for $ARCHIVE directory in human readable form (-h):

bigdipper % sdu -sh $ARCHIVE
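
The per-subdirectory summary can be sorted to show where most of the data lives. The sketch below assumes that "sdu -sk" prints a size in kilobytes followed by the path on each line, in the same layout as du; the function name largest_dirs and its default of five entries are hypothetical.

```shell
# largest_dirs DIR [TOP]
# List the TOP largest entries directly under DIR, biggest first.
# Assumption: sdu -sk prints "<kilobytes> <path>" per line, like du -sk.
largest_dirs() {
    dir=$1
    top=${2:-5}
    sdu -sk "$dir"/* | sort -rn | head -n "$top"
}
```

For example, "largest_dirs $ARCHIVE 10" lists the ten largest subdirectories. Because sdu counts both online and offline usage, the totals include data that currently exists only on tape.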
