Data Mover Service

The Data Mover is based on Nodeum, a software product developed by MT-C. Its goal is to move data quickly and securely between the HPC file systems (POSIX-compliant) and the object store (via the Swift API). The service is programmable and can be extended with scriptlets that run your own scripts. The CLI and many other features were developed during an ICEI-funded project. The same software is meant to be used at all five project member sites (BSC, CEA, JSC, CINECA, CSCS), but it only moves data locally within a site. Authentication is handled by the central FENIX AAI. The service runs on dedicated nodes. For user access, such as triggering a data movement, a CLI called nd is provided. This document describes how to use the CLI.

Technical Concept

[Figure: Data Mover technical concept (Data-Mover.png)]

  • Authentication: The Data Mover requires a security token from the FENIX AAI infrastructure. The FENIX AAI is connected to the site-local authentication services, so every Jülich HPC user can use it. The CLI asks the user to create a security token via a web page. After successful authentication, a token is created that is valid for one hour.

  • Storage Connectors: The Data Mover can move data between different storage repositories, which are either POSIX file systems or object stores. In the nd CLI each repository is defined as a “connector”. The following connectors are available in the Jülich Data Mover service:

    Connector        Type               Description
    largedata_pool   POSIX File System  /p/largedata, see file systems
    largedata2_pool  POSIX File System  /p/largedata2, see file systems
    object_pool      Object Store       see object store

  • Data Mover Cluster: A dedicated cluster performs the data transfers between the storage repositories.
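
The mapping between POSIX mount points and connector names listed above can be expressed as a small shell helper. This is only a sketch: it assumes the `nod://<connector>/<path>` notation used in the command examples later in this document, and the function name `posix_to_nod` is hypothetical and not part of nd.

```shell
# Hypothetical helper (not part of nd): translate a POSIX path below
# /p/largedata or /p/largedata2 into the connector notation.
posix_to_nod() {
    case "$1" in
        /p/largedata2/*) printf 'nod://largedata2_pool/%s\n' "${1#/p/largedata2/}" ;;
        /p/largedata/*)  printf 'nod://largedata_pool/%s\n'  "${1#/p/largedata/}" ;;
        *) echo "posix_to_nod: no connector known for $1" >&2; return 1 ;;
    esac
}
```

For example, `posix_to_nod /p/largedata2/storagetestdata/mypath/` prints `nod://largedata2_pool/storagetestdata/mypath/`. Paths outside the two listed file systems are rejected with a non-zero return code.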


Data Mover Command Line Interface (CLI)

The Nodeum Tool

The nd client is installed on JUDAC and can be used by all users.

$> nd
NAME:
   nd - Nodeum CLI

USAGE:
   nd [global options] command [command options] [arguments...]

VERSION:
   2.0.5

COMMANDS:
   admin
   config    configure the Nodeum Client
   copy, cp  create copy task
   move, mv  create move task
   task
   help, h   Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --json                          output as JSON (default: false)
   --config value                  path to configuration file (default: <config-dir>/config.json) [$ND_CONFIG]
   --config-dir value, -C value    path to configuration folder (default: "/p/home/jusers/lischewski1/judac/.config/.nd") [$ND_CONFIG_DIR]
   --alias value                   alias in configuration file for authentication (default: "default") [$ND_ALIAS]
   --url value                     URL of Nodeum [$ND_URL]
   --access-token value            for API authentication (1st authentication method) [$ND_ACCESS_TOKEN]
   --refresh-token value           for API authentication (1st authentication method, not saved in config) [$ND_REFRESH_TOKEN]
   --authorization-endpoint value  for Device Authorization Flow (2nd authentication method)
   --token-endpoint value          for Device Authorization Flow (2nd authentication method)
   --client-id value               for Device Authorization Flow (2nd authentication method)
   --scopes value                  for Device Authorization Flow (2nd authentication method)
   --persist-session               persist Device Authorization session on disk for 1 hour (default: true)
   --persist-session-renew         if persist session is enabled, renew the token (default: false)
   --username value                for API authentication (3rd authentication method) [$ND_USERNAME]
   --password value                for API authentication (3rd authentication method) [$ND_PASSWORD]
   --anonymous                     no login (default: false)
   --help, -h                      show help (default: false)
   --version, -v                   print the version (default: false)

Task Handling

A task is one data transfer triggered by the nd client. The tool saves information about every task in its database.

List all created tasks

This command lists all tasks created by the user in the Data Mover service. The columns are:

  • TASK ID: ID of the task

  • TASK NAME: Name of the task, defined at creation

  • COMMENT: Associated comment

  • CREATED BY: User who created the task

  • LAST EXECUTION STATUS: Status of the most recent execution of the task


$> nd task list

+--------------------------+----------------------------------------------------------+---------+------------+-----------------------+
| TASK ID                  | TASK NAME                                                | COMMENT | CREATED BY | LAST EXECUTION STATUS |
+--------------------------+----------------------------------------------------------+---------+------------+-----------------------+
| 6287a726a91db0b194e97d8a | From /largedata2_pool/test_data to pool1                 |         | John Doe   | done                  |
| 6287a774a91db0b194e97d8d | From /largedata2_pool/storagetestdata/test_data to pool1 |         | John Doe   | done                  |
| 6331d52ba91db02d6797e6ae | From nod://largedata2_pool/storagetestdata/ to vg--1598  |         | John Doe   | stopped_by_user       |
| 6331d5c5a91db02d6797e6b4 | From nod://largedata2_pool/storagetestdata/ to vg--1590  |         | John Doe   | stopped_by_user       |
| 6331d677a91db02d6797e6b7 | From nod://largedata2_pool/storagetestdata/ to vg--1598  |         | John Doe   | done                  |
| 6331d692a91db02d6797e6ba | From nod://largedata2_pool/storagetestdata/ to vg--1598  |         | John Doe   | stopped_by_user       |
| 6333ff2ea91db091264b68a2 | From nod://largedata2_pool/storagetestdata/ to vg--1500  |         | John Doe   | finished with warning |
| 63358216a91db0397f128dcb | From nod://largedata2_pool/storagetestdata/ to vg--1500  |         | John Doe   | finished with warning |
| 6335822da91db0397f128dce | From nod://largedata2_pool/storagetestdata/ to vg--1502  |         | John Doe   | done                  |
| 633584d5a91db0397f128dd1 | From nod://largedata2_pool/storagetestdata/ to vg--1502  |         | John Doe   | finished with warning |
| 6336b341a91db0397f128dd4 | From nod://largedata2_pool/storagetestdata/ to vg--1500  |         | John Doe   | done                  |
+--------------------------+----------------------------------------------------------+---------+------------+-----------------------+
| NUMBER OF TASK(S)        | 11                                                       |         |            |                       |
+--------------------------+----------------------------------------------------------+---------+------------+-----------------------+
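
To pick out the tasks that need attention, the table can be filtered with awk. The sketch below assumes the column layout shown above (status in the fifth column, hexadecimal task IDs in the first) and that task names contain no `|` character. The function name `unfinished_tasks` is hypothetical. The `--json` global option might be a more robust basis for scripting, but its output format is not described in this document.

```shell
# Hypothetical helper: read `nd task list` table output on stdin and print
# "TASK_ID<TAB>STATUS" for every task whose last execution status is not "done".
unfinished_tasks() {
    awk -F'|' '
        # Only table rows start with "|"; borders and INFO lines are skipped.
        /^\|/ {
            id = $2; status = $6
            gsub(/^ +| +$/, "", id)
            gsub(/^ +| +$/, "", status)
            # Header and summary rows have no hexadecimal ID in column 1.
            if (id ~ /^[0-9a-f]+$/ && status != "done")
                printf "%s\t%s\n", id, status
        }'
}
```

Usage: `nd task list | unfinished_tasks`.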

Execute a new task

This command sends a copy request to the Data Mover service.

nd copy \
   --md project_name=<my project name> \
   nod://<largedata[2]_pool>/<my project name>/mypath/ \
   nod-cloud://object_pool/<my_new_container>
  • <my project name> is the name of your data project

  • nod://largedata[2]_pool/storagetestdata/mypath/ is the source (connector notation)
    • largedata[2]_pool stands for the logical name of the POSIX file system, either largedata_pool or largedata2_pool. The examples use largedata2_pool.

    • storagetestdata is the name of a user project folder

    • mypath is a sub folder

  • nod-cloud://object_pool/my_new_container is the destination (connector notation)
    • object_pool is the name of the OpenStack Swift storage in the JSC object store

    • my_new_container is the name of the container into which the files are copied

  • Optional arguments (standard):

    Option name        Alternative  Description                                                             Value (type)      Default
    --help             -h           Show help
    --no-run                        Create the task but do not run it                                                         false
    --name value       -n value     Name of the task                                                        string            auto generated
    --comment value                 Additional comment for the task                                         string
    --priority value                Priority of the task, between 0 and 9 (0 is the highest priority)       0-9               0
    --recursive        -R           Copy the folder recursively, including the contents of all sub folders                    false
    --working-dir      --wd         Defines the working directory and the path kept at the destination      string            ‘.’
    --ignore-hidden                 The task does not handle hidden files                                                     false
    --progress                      Display live progress while the task is running                                           true
    --processed-nodes               With --progress: select which processed nodes are displayed             none, error, all  error

  • Optional arguments (advanced):

    Option name             Alternative     Description                                            Value (type)    Default
    --parallel value                        Defines the number of movers that handle the movement  integer         1
    --callback type                         Execute a custom script when the task is finalized     ./path/to/file
    --trigger-md key=value  --md key=value  Set metadata on the trigger                            key=value
    --task-md key=value                     Set metadata on the task                               key=value
    --files-md key=value                    Set metadata on the files                              key=value

Example with minimal parameters: Copy all data from the POSIX file system path /p/largedata2/storagetestdata/mypath recursively to the container my_new_container in the Jülich object store.

$> nd copy --md project_name=storagetestdata \
   --recursive  \
   nod://largedata2_pool/storagetestdata/mypath/  nod-cloud://object_pool/my_new_container
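
For repeated transfers it can be convenient to wrap the call in a shell function that always sets the project metadata and the recursive flag. The following is a sketch, not an official interface. The function name `nd_copy_project` and the variable `ND_BIN` are hypothetical, and nd itself does not read `ND_BIN`. Only options listed in the tables above are used.

```shell
# Hypothetical wrapper around `nd copy` (not part of nd).
# ND_BIN is an illustrative variable that nd itself does not read;
# set ND_BIN=echo to print the arguments instead of starting a transfer.
nd_copy_project() {
    if [ "$#" -lt 3 ]; then
        echo "usage: nd_copy_project <project> <source> <destination> [nd copy options...]" >&2
        return 2
    fi
    project=$1
    src=$2
    dst=$3
    shift 3
    # Any remaining arguments (for example --ignore-hidden or --priority 5)
    # are passed through unchanged, ahead of source and destination.
    "${ND_BIN:-nd}" copy \
        --md "project_name=$project" \
        --recursive \
        "$@" \
        "$src" "$dst"
}
```

Usage: `nd_copy_project storagetestdata nod://largedata2_pool/storagetestdata/mypath/ nod-cloud://object_pool/my_new_container --ignore-hidden`. With `ND_BIN=echo` set, the same call only prints the argument list, which is a cheap way to check the command before running it.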

Types of Data Transfer Tasks

  • Execute a copy from POSIX to Object Store

$> nd copy \
      --md project_name=<my project name> \
      --recursive --ignore-hidden \
      nod://largedata2_pool/storagetestdata/mypath/ \
      nod-cloud://object_pool/my_new_container

  • Execute a copy from Object Store to POSIX

$> nd copy \
      --md project_name=<my project name> \
      --recursive --ignore-hidden \
      nod-cloud://object_pool/my_new_container/my_path/ \
      nod://largedata2_pool/storagetestdata/

  • Move data from POSIX to Object Store

$> nd move \
      --md project_name=<my project name> \
      --recursive --ignore-hidden \
      nod://largedata2_pool/storagetestdata/mypath/ \
      nod-cloud://object_pool/my_new_container

  • Move data from Object Store to POSIX

$> nd move \
      --md project_name=<my project name> \
      --recursive --ignore-hidden \
      nod-cloud://object_pool/my_new_container/my_path/ \
      nod://largedata2_pool/storagetestdata/

Run copy task and display task status

Run the copy task:

$>  nd copy \
       --md project_name=storagetestdata \
       nod://largedata2_pool/storagetestdata/doe1/j.jpg \
       nod-cloud://object_pool/my_new_jpg_container/
INFO Connecting with device flow...
INFO Connected with user John Doe
Processed size ... done! [38.28KB in 23s]
Processed items ... done! [1 in 23s]
          ID: 63b7fd126368e8888df23c49
     Task ID: 63b7fd12a91db02194549f2a
        Name: From nod://largedata2_pool/storagetestdata/doe1/j.jpg to my_new_jpg_container
     Comment:
  Created by: John Doe
       Nodes: 1 / 1
        Size: 38.28 kB / 38.28 kB
      Status: done

To display the status of this task use the “Task ID”:

$>  nd task status 63b7fd12a91db02194549f2a
INFO Connecting with device flow...
INFO Connected with user John Doe
          ID: 63b7fd126368e8888df23c49
     Task ID: 63b7fd12a91db02194549f2a
        Name: From nod://largedata2_pool/storagetestdata/doe1/j.jpg to my_new_jpg_container
     Comment:
  Created by: John Doe
       Nodes: 1 / 1
        Size: 38.28 kB / 38.28 kB
      Status: done
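
In scripts, the status line can be extracted from this output and turned into an exit code. The sketch below assumes the `Status:` field layout shown above and treats only `done` as success. The function names `task_status_field` and `task_succeeded` are hypothetical and not part of nd.

```shell
# Hypothetical helpers: both read `nd task status <task-id>` output on stdin.

# Print the value of the "Status:" field, with surrounding blanks removed.
task_status_field() {
    sed -n -e 's/ *$//' -e 's/^ *Status: *//p'
}

# Return 0 only if the reported status is exactly "done".
task_succeeded() {
    status=$(task_status_field)
    [ "$status" = "done" ]
}
```

Usage: `nd task status 63b7fd12a91db02194549f2a | task_succeeded && echo "transfer complete"`. Other values seen in the task list, such as stopped_by_user or finished with warning, yield a non-zero return code.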

Miscellaneous

Use --working-dir to adjust destination tree structure

With the parameter --working-dir (short: --wd) you decide how much of the complete SOURCE path is dropped at the DESTINATION: the part of the path up to the working directory is removed, and the remainder is kept.

Example:

working-dir                                 short  source file (POSIX)                       destination file (Object Store)
default                                     ‘.’    /p/largedata2/storagetestdata/doe1/j.jpg  my_new_jpg_container:j.jpg
nod://largedata2_pool/storagetestdata/doe1  ‘.’    /p/largedata2/storagetestdata/doe1/j.jpg  my_new_jpg_container:j.jpg
nod://largedata2_pool/storagetestdata       ‘..’   /p/largedata2/storagetestdata/doe1/j.jpg  my_new_jpg_container:doe1:j.jpg
nod://largedata2_pool                              /p/largedata2/storagetestdata/doe1/j.jpg  my_new_jpg_container:storagetestdata:doe1:j.jpg

Use relative paths with --working-dir

The nd CLI offers an abbreviation for paths relative to the source directory. If the source is nod://largedata2_pool/storagetestdata/doe1/, you can use --working-dir . which is equivalent to --working-dir nod://largedata2_pool/storagetestdata/doe1/. Also available is --working-dir .. which is equivalent to --working-dir nod://largedata2_pool/storagetestdata/.
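
The effect of --working-dir can be reproduced with plain string operations. The sketch below is one reading of the rule illustrated in the table above and is not code taken from nd. The function name `kept_path` is hypothetical, and the result is printed with `/` separators where the table writes `:`.

```shell
# Hypothetical helper (not part of nd): print the part of the source path
# that is kept at the destination for a given --working-dir value.
kept_path() {
    src=$1
    wd=${2:-.}
    # Directory of the source: the source itself if it ends in "/",
    # otherwise the directory that contains the file.
    case "$src" in
        */) dir=${src%/} ;;
        *)  dir=${src%/*} ;;
    esac
    # Resolve the relative abbreviations "." and ".." against that directory.
    case "$wd" in
        .)  base=$dir ;;
        ..) base=${dir%/*} ;;
        *)  base=${wd%/} ;;
    esac
    # Strip the working directory prefix; fail if the source is not below it.
    case "$src" in
        "$base"/*) printf '%s\n' "${src#"$base"/}" ;;
        *) echo "kept_path: $src is not below $base" >&2; return 1 ;;
    esac
}
```

For the source nod://largedata2_pool/storagetestdata/doe1/j.jpg, `kept_path <source> ..` prints `doe1/j.jpg`, and `kept_path <source> nod://largedata2_pool` prints `storagetestdata/doe1/j.jpg`, matching the last two rows of the table.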