PoINT Data Replicator

Version 1.1 with Service Pack 1 (1.1.40)

Overview

This application copies objects from a file system or an S3 compatible object storage to another S3 compatible object storage. The path names of the source files are used as object identifiers on the target storage, so that the generated object structure matches the source file system.
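
The mapping from source path to object identifier can be pictured with a short Python sketch. It assumes that the object key is the task's "TargetPath" followed by the file's path relative to "SourcePath", joined with forward slashes. The function name `object_key` is hypothetical, and the exact rules applied by PoINT Data Replicator may differ.

```python
from pathlib import PurePosixPath, PureWindowsPath


def object_key(source_root: str, file_path: str, target_path: str,
               windows: bool = False) -> str:
    """Derive an object key from a source file path (illustrative assumption only)."""
    path_cls = PureWindowsPath if windows else PurePosixPath
    relative = path_cls(file_path).relative_to(path_cls(source_root))
    # Object keys use forward slashes, regardless of the source platform.
    parts = [target_path.strip("/")] + list(relative.parts)
    return "/".join(part for part in parts if part)


print(object_key("/mnt/sourcefs1", "/mnt/sourcefs1/docs/report.pdf", "fs1"))
print(object_key("C:\\data", "C:\\data\\a\\b.txt", "fs1", windows=True))
```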

Key features:

Limitations

System Requirements

Operating Systems

The PoINT Data Replicator core program (command line tool) can be used on all platforms which are supported by the .NET Core 3.1 runtime. These include Microsoft Windows 7 SP1 and later, several Linux distributions, and macOS version 10.13 and later. Refer to the .NET Core 3.1 documentation for the full list of supported operating systems.

The Graphical Configuration Editor can be used on the same operating systems as the core program. On Linux it additionally requires a window manager.

This version of PoINT Data Replicator has been tested on these operating systems:

System Resources

Installation

Core Components

Extract the archive and copy the content to an empty directory on the computer which shall run the application.

The .NET Core 3.1 Runtime is required to run the application. Refer to the .NET Core 3.1 download page for information on how to download and install the .NET Core Runtime on your operating system. On that page, select the .NET Core Runtime matching your operating system. The ASP.NET Core Runtime and the Desktop Runtime are not required.

Before using the application, please request a license from your vendor. The license will be sent as a file named 'pdr.lic' which must be copied to the executable directory.

Graphical Configuration Editor

The Graphical Configuration Editor comes in a separate archive file for Windows, Linux and macOS (each running on an x64 CPU). The archive contains a directory 'gui' which must be copied into the directory that contains the core components.

Command Line

If a valid license is present, the application can be started on the command line as described below:

All platforms:
dotnet pdrcmd.dll [global options] [command] [options]

Microsoft Windows only:
pdrcmd.exe [global options] [command] [options]

Global options:
--config <config file name>
--task <task ID>
--alias <task alias>

Commands:
show [--all]
copy
retry
replicate
verify [--update]
protocol --file <protocol file> --type <type>
remove

The global options '--config', '--task' and '--alias' are optional, with one exception: the commands 'protocol' and 'remove' always require a task to be specified by either its task ID or its alias. Either '--task' or '--alias' can be used, but not both. A specified alias must match the alias configured for one of the tasks. A specified ID must be the one reported by the 'show' command.
If no task is specified, the command is performed for all tasks which are specified in the configuration and are not marked as disabled.

The option '--config' can be used to specify the name of the configuration file. If this parameter is not present, a file named 'config.json' will be expected in the current directory.
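
The rules for the global options can be summarised in a small Python helper that assembles an argument list for the cross-platform invocation. This is only a sketch: the function `build_command` is hypothetical and not part of the product, and it checks only the rules stated above (known command, '--task' and '--alias' mutually exclusive, task mandatory for 'protocol' and 'remove').

```python
COMMANDS = {"show", "copy", "retry", "replicate", "verify", "protocol", "remove"}
TASK_REQUIRED = {"protocol", "remove"}


def build_command(command, options=(), config=None, task=None, alias=None):
    """Return the argv list for 'dotnet pdrcmd.dll' (hypothetical helper)."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    if task is not None and alias is not None:
        raise ValueError("use either --task or --alias, not both")
    if command in TASK_REQUIRED and task is None and alias is None:
        raise ValueError(f"'{command}' requires --task or --alias")

    argv = ["dotnet", "pdrcmd.dll"]
    # Global options precede the command.
    if config is not None:
        argv += ["--config", config]
    if task is not None:
        argv += ["--task", str(task)]
    if alias is not None:
        argv += ["--alias", alias]
    argv.append(command)
    # Command-specific options follow the command.
    argv += list(options)
    return argv


print(" ".join(build_command("verify", ["--update"], alias="MyTask")))
```

The resulting list can be passed to `subprocess.run` on a system where the application is installed.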

Command 'show'
This command displays all configured tasks, including their assigned ID and alias and the current status. By default, the command displays only tasks which are present in the configuration file. Specify the option '--all' to also display tasks which have been removed from the configuration file, but are still present in the database.

Command 'copy'
The copy command shall be used to actually copy data from the source to the target storage. This command scans the source storage and uses the database to check if the file has already been copied. New or changed files/objects will then be copied to the target storage and remembered in the database. Objects will be treated as changed, and uploaded again, if either the size or the modification time stamp has changed. If reading of a source directory or copying of a file fails, it will also be remembered in the database. This information can later be used to retry copying the failed files or to list them using the protocol command.
You can specify additional options and filters using the section 'CopyOptions' as documented in the configuration file description below.
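
The change detection described above can be illustrated with a few lines of Python. The sketch keeps the "database" as a plain dictionary and treats an object as changed when its size or modification time stamp differs from the recorded value. The names `Recorded` and `needs_copy` are hypothetical, and the real database layout is not documented here.

```python
from typing import Dict, NamedTuple


class Recorded(NamedTuple):
    size: int      # size in bytes at the time of the last successful copy
    mtime: float   # modification time stamp at the time of the last copy


def needs_copy(database: Dict[str, Recorded], key: str, size: int, mtime: float) -> bool:
    """Return True for new objects and for objects whose size or time stamp changed."""
    known = database.get(key)
    if known is None:
        return True  # never copied before
    return known.size != size or known.mtime != mtime


database = {"docs/report.pdf": Recorded(size=1024, mtime=1600000000.0)}
print(needs_copy(database, "docs/report.pdf", 1024, 1600000000.0))  # unchanged
print(needs_copy(database, "docs/report.pdf", 2048, 1600000000.0))  # size changed
print(needs_copy(database, "docs/new.txt", 10, 1600000500.0))       # new object
```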

Command 'retry'
If the initial copy command failed to read some source directories or it failed to copy files, you can use the retry command to copy only the files which failed in the previous run. This command does not scan the source storage for other changes.

Command 'replicate'
This command uses S3 notifications to monitor the source storage for changes and to replicate new files immediately to the target storage. The command is only available if both source and target are S3 based storage systems and if the source storage provides one of the supported notification mechanisms. Refer to section 'NotificationSettings' in the configuration file description below.

Command 'verify'
Use this command to find identical, missing and different objects between source and target storage. Using the optional parameter '--update', the command will remember identical files in the database so that they can be skipped by a further copy or retry command.
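
The three result categories can be pictured with a simple comparison of two listings. The following Python sketch compares objects by size only. That criterion is an assumption made for illustration, since the attributes actually compared by 'verify' are not listed in this section, and the function name `classify` is hypothetical.

```python
def classify(source: dict, target: dict) -> dict:
    """Sort source keys into identical, missing and different (size-only comparison)."""
    result = {"identical": [], "missing": [], "different": []}
    for key, size in sorted(source.items()):
        if key not in target:
            result["missing"].append(key)      # not present on the target
        elif target[key] == size:
            result["identical"].append(key)    # present with the same size
        else:
            result["different"].append(key)    # present, but the size differs
    return result


source = {"fs1/a.txt": 10, "fs1/b.txt": 20, "fs1/c.txt": 30}
target = {"fs1/a.txt": 10, "fs1/b.txt": 25}
print(classify(source, target))
```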

Command 'protocol'
Use this command to create a list of all files which have been copied or failed to copy. This command requires the specification of a task ID or a task alias and a file name for the protocol to be generated. Optionally use '--type' to specify whether to list only FAILURES, SUCCESS or ALL. All files will be listed if this option is not specified.

Command 'remove'
Use this command to remove a configured task from the database. This command requires the specification of a task ID or alias. If you remove a task which still exists in the configuration file, PoINT Data Replicator will create a new task with a new ID and an empty database when any of the other commands is executed.

Configuration File Format

The following section describes the layout of the configuration file. It is recommended to create and modify configuration files using the Graphical Configuration Editor. In this case the following sections are here for reference only. A configuration file which has been modified by the Configuration Editor will not be well formatted like the example below. If you want to view or modify such a file you'll need to use an application which supports the JSON format.

The configuration file is a text file which contains global and task specific settings in JSON format. You may open the example configuration file ('config.example.json') with a text editor, preferably one which supports the JSON format, and save the modified file under another file name, e.g. 'config.json'. Unless specified otherwise on the command line, 'config.json' is the expected file name for the configuration file.

Note: Windows uses backslashes to separate directory and file names. Unfortunately, this character requires special handling in the JSON format and must be 'escaped' by specifying an additional backslash. E.g. the path "C:\dir" must be specified as "C:\\dir" and "\\SERVER\SHARE" as "\\\\SERVER\\SHARE". The S3 target path must always be specified using forward slashes.
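
One way to avoid escaping mistakes is to let a JSON library write the value. The following Python sketch serialises a UNC path with the standard `json` module. The task fragment is shortened for illustration and is not a complete task definition.

```python
import json

# A raw string holds the path exactly as Windows displays it.
task = {
    "SourceType": "FileSystem",
    "SourcePath": r"\\SERVER\SHARE",
    "TargetPath": "fs1",
}

text = json.dumps(task, indent=4)
print(text)  # every backslash of the path is doubled in the JSON text

# Reading the JSON text back yields the original, unescaped path.
assert json.loads(text)["SourcePath"] == r"\\SERVER\SHARE"
```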

The structure of the configuration file is as follows:


        {
            "Settings": {
                "Threads": 16,
                "DatabasePath": "/var/pdrcmd",
                "LogLevel": 0,
                "MaxErrors": 100,
                "BufferSizeKB": 64
            },
            "Tasks": [
                {
                    "Alias": "MyTask",
                    "Status": "enabled",
                    "SourceType": "FileSystem" or "S3",
                    "SourcePath": "/mnt/sourcefs1",
                    "SourceOptions": {
                        "ServerURL": "http://127.0.0.1:80",
                        "PathStyle": false,
                        "AccessKey": "ACCESS_KEY",
                        "SecretKey": "SECRET_KEY",
                        "BucketName": "bucket",
                        "SignatureVersion": 2,
                        "Timeout": 300,
                        "MaxErrorRetry": 2
                    },

                    "TargetType": "S3",
                    "TargetPath": "fs1",
                    "TargetOptions": {
                        "ServerURL": "http://127.0.0.1:80",
                        "PathStyle": false,
                        "AccessKey": "ACCESS_KEY",
                        "SecretKey": "SECRET_KEY",
                        "BucketName": "bucket",
                        "MPUPartSizeMB": 500,
                        "SignatureVersion": 2,
                        "Timeout": 300,
                        "MaxErrorRetry": 2
                    },

                    "CopyOptions": {
                        "Metadata": true,
                        "ACL": false,
                        "Tags": true,
                        "ObjectLock": true,
                        "MinTimeStamp": "2000-01-01 00:00:00",
                        "MaxTimeStamp": ""
                    },

                    "NotificationSettings": {
                        "DeviceType": "MINIO" or "IBMCOS",
                        "Hosts": "kafka1.company.org:9020,kafka2.company.org:9020",
                        "Topic": "MyTopic"
                    }
                }
            ]
        }
        
To specify more than one task, use a comma to separate the tasks:

        {
            "Settings": {
                ...
            },
            "Tasks": [
                {
                    ...
                },
                {
                    ...
                },
                {
                    ...
                }
            ]
        }
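
A configuration with several tasks can also be generated by a script, which takes care of commas and brackets automatically. The Python sketch below builds two file system tasks that share the same S3 target settings. All values are copied from the example above, the helper `make_task` is hypothetical, and the sketch makes no statement about which settings are mandatory.

```python
import json


def make_task(alias, source_path, target_path, target_options):
    """Build one entry for the "Tasks" list (file system source, S3 target)."""
    return {
        "Alias": alias,
        "Status": "enabled",
        "SourceType": "FileSystem",
        "SourcePath": source_path,
        # "SourceOptions" is omitted because the source is a file system.
        "TargetType": "S3",
        "TargetPath": target_path,
        "TargetOptions": dict(target_options),
    }


target_options = {
    "ServerURL": "http://127.0.0.1:80",
    "PathStyle": False,
    "AccessKey": "ACCESS_KEY",
    "SecretKey": "SECRET_KEY",
    "BucketName": "bucket",
    "MPUPartSizeMB": 500,
    "SignatureVersion": 2,
    "Timeout": 300,
    "MaxErrorRetry": 2,
}

config = {
    "Settings": {"Threads": 16, "LogLevel": 0, "MaxErrors": 100, "BufferSizeKB": 64},
    "Tasks": [
        make_task("Task1", "/mnt/sourcefs1", "fs1", target_options),
        make_task("Task2", "/mnt/sourcefs2", "fs2", target_options),
    ],
}

text = json.dumps(config, indent=4)
print(text)
```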
        

Settings
The section "Settings" contains some global options:
Tasks
The section "Tasks" is actually a list of one or more (comma separated) tasks wrapped in square brackets. Each of the tasks must be wrapped in curly braces.
SourceOptions
This section provides additional parameters which are required for the source storage. In case of S3, the following settings must be specified. For file system source storage, you can remove the complete section from the configuration file.
TargetOptions
This section provides additional parameters which are required for the target storage. For S3 the following settings must be specified:
CopyOptions
This section provides additional parameters for the 'copy', 'retry' and 'verify' commands. These are optional parameters. The default value for all boolean properties is false and time stamps are ignored if not specified. The options 'Metadata', 'ACL', 'Tags' and 'ObjectLock' are not used for file system sources.
NotificationSettings
This section is only required when using the 'replicate' command.
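
Before running 'replicate', a task entry can be checked against the stated preconditions. The Python sketch below reports obvious problems: source and target must both be S3, and a "NotificationSettings" section must be present. The device types are taken from the example configuration. Treating "Hosts" and "Topic" as mandatory is an assumption, and the function `replicate_problems` is hypothetical.

```python
SUPPORTED_DEVICE_TYPES = {"MINIO", "IBMCOS"}  # values shown in the example configuration


def replicate_problems(task: dict) -> list:
    """Return a list of reasons why 'replicate' would not be usable for this task."""
    problems = []
    if task.get("SourceType") != "S3":
        problems.append("SourceType must be S3")
    if task.get("TargetType") != "S3":
        problems.append("TargetType must be S3")

    settings = task.get("NotificationSettings")
    if not settings:
        problems.append("NotificationSettings section is missing")
        return problems

    if settings.get("DeviceType") not in SUPPORTED_DEVICE_TYPES:
        problems.append("DeviceType is not one of the supported values")
    for key in ("Hosts", "Topic"):  # assumed to be mandatory
        if not settings.get(key):
            problems.append(f"{key} is not set")
    return problems


task = {"SourceType": "FileSystem", "TargetType": "S3"}
print(replicate_problems(task))
```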

Usage Hints

Configuration File

PoINT Data Replicator allows working with multiple configuration files so that it is possible to work on different scenarios at the same time. To select the active configuration file, you can either specify it using the '--config' option or change to the directory where the file is located and start the command line there. The database and log files will be created in the same directory as the configuration file or in the directory which is specified as "DatabasePath" in the configuration file. The database paths must be unique when using multiple configuration files. A safe way to ensure the database paths are unique is to omit the "DatabasePath" setting or specify a path relative to the configuration file.
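
The uniqueness requirement can be checked with a short script before several configuration files are used side by side. The Python sketch below resolves the effective database directory of each configuration and reports collisions. It assumes that a relative "DatabasePath" is resolved against the directory of the configuration file. The function names and the example paths are hypothetical.

```python
from pathlib import Path


def database_dir(config_file: Path, config: dict) -> Path:
    """Return the directory in which database and log files would be created."""
    base = config_file.resolve().parent
    db_path = config.get("Settings", {}).get("DatabasePath")
    if not db_path:
        return base  # no "DatabasePath": directory of the configuration file
    # An absolute "DatabasePath" replaces base, a relative one is appended to it.
    return (base / db_path).resolve()


def find_conflicts(configs: dict) -> list:
    """Return pairs of configuration files that share a database directory."""
    seen = {}
    conflicts = []
    for config_file, config in configs.items():
        directory = database_dir(Path(config_file), config)
        if directory in seen:
            conflicts.append((seen[directory], config_file))
        else:
            seen[directory] = config_file
    return conflicts


configs = {
    "/srv/pdr/a/config.json": {"Settings": {"DatabasePath": "/var/pdrcmd"}},
    "/srv/pdr/b/config.json": {"Settings": {"DatabasePath": "/var/pdrcmd"}},
    "/srv/pdr/c/config.json": {"Settings": {}},
}
print(find_conflicts(configs))
```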

However, the easiest way is to work with only one configuration file in the directory where the 'pdrcmd' executable is located and always omit the '--config' option.

Working with Tasks

Tasks must be configured using the configuration file. Each task consists of a specification of a source, a target and additional options. After starting the command line program, it reads the configuration file and checks if the database already contains a task with the same source and target specification (Path, URL and Bucket). If no such task exists, it will create one with a unique task ID. Consequently, after configuring a task in the configuration file, use the command 'show' to display the configured tasks and their assigned IDs. The ID can then be used to execute commands for a specific task by means of the '--task' option.

Because PoINT Data Replicator uses the source and target specifications (Path, URL and Bucket) to map configuration file entries to tasks in the database, there must not be multiple tasks using the same source and target specifications. Also, if the source or target specification (either of Path, URL or Bucket) changes, PoINT Data Replicator will treat the task as a new task.
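
A configuration can be checked for such duplicates before it is used. The following Python sketch derives an identity tuple from Path, URL and Bucket of source and target and reports tasks that share it. The exact comparison performed by PoINT Data Replicator is not documented here, so the composition of the tuple is an assumption and both function names are hypothetical.

```python
def task_identity(task: dict) -> tuple:
    """Combine Path, URL and Bucket of source and target into one identity tuple."""
    source = task.get("SourceOptions", {})
    target = task.get("TargetOptions", {})
    return (
        task.get("SourcePath"), source.get("ServerURL"), source.get("BucketName"),
        task.get("TargetPath"), target.get("ServerURL"), target.get("BucketName"),
    )


def duplicate_tasks(tasks: list) -> list:
    """Return (first_index, duplicate_index) pairs for tasks with equal identity."""
    seen = {}
    duplicates = []
    for index, task in enumerate(tasks):
        identity = task_identity(task)
        if identity in seen:
            duplicates.append((seen[identity], index))
        else:
            seen[identity] = index
    return duplicates


target = {"ServerURL": "http://127.0.0.1:80", "BucketName": "bucket"}
tasks = [
    {"Alias": "A", "SourcePath": "/mnt/sourcefs1", "TargetPath": "fs1", "TargetOptions": target},
    {"Alias": "B", "SourcePath": "/mnt/sourcefs2", "TargetPath": "fs2", "TargetOptions": target},
    {"Alias": "C", "SourcePath": "/mnt/sourcefs1", "TargetPath": "fs1", "TargetOptions": target},
]
print(duplicate_tasks(tasks))
```

Note that the alias is not part of the identity: tasks "A" and "C" are reported as duplicates although their aliases differ.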

Instead of using the task ID you can specify an alias for the task in the configuration file and use that to select the task on the command line using the '--alias' option. Please note that aliases and task IDs are not fully interchangeable. A task ID specifies a task in PoINT Data Replicator's database, while the alias refers to a task in the configuration file. In most cases those are the same. However, when the source or the target specification of a task is changed in the configuration file, PoINT Data Replicator will handle this as another task and will assign a new ID to that task. The alias now refers to the modified task entry in the configuration file, but the previous task ID still refers to the original task in the database, which does not have a corresponding configuration file entry anymore.

Network Paths

When running on Windows, CIFS network paths can be specified using the UNC format (e.g. "\\Server\Share"). Note: When editing the configuration file using a text editor, you must enter all backslashes twice (e.g. "\\\\Server\\Share"). This is not necessary when using the Configuration Editor.

When copying NFS exports it is recommended to run the application on a Linux based operating system. On Linux, NFS exports and CIFS network shares must be mounted locally and the mount point must be specified as "SourcePath" in the configuration.

Workflow

After adding a task to the configuration file, use the 'show' command to verify the configured parameters and to display the assigned task ID.

You may now run the 'verify' command to find out how many objects will be copied. If the target already contains objects which have been copied from the source, you should additionally specify the option '--update' to avoid copying these files again.

Now start copying the data using the 'copy' command. Specify the task ID or the alias on the command line. The specification of the task is not necessary if only one task is configured. PoINT Data Replicator will create a log file for each task. By default, this log will receive error messages and statistics for each executed command. Log files will be stored in the database directory, if specified, or in the directory of the configuration file.

The task is complete if the 'copy' command completed without any failures. If it finished enumerating the data source but failed to copy some objects, use the 'retry' command to retry the failed objects.

If some directories could not be read or if files have been added or modified on the source storage, you need to use the 'copy' command to scan the source tree for new files and copy them. Please note that files which have been removed from the source will not be removed from the target storage, and files which have been removed from the target will not be copied again.

You can verify data on the source and target storage by using the 'verify' command. Once the 'verify' command has finished, statistics will be displayed with the number of matched, missing, different and failed objects. This command also checks ACLs, Tags, ObjectLocks and Custom Metadata if enabled in CopyOptions.

Once the task is considered as complete, you can use the 'protocol' command to create a list of the copied and/or failed files.
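
The workflow described above can be written down as an ordered list of command lines. The Python sketch below only assembles and prints the commands and does not execute them. The alias "MyTask" and the protocol file name "protocol.txt" are placeholders. 'retry' is listed unconditionally for simplicity, although it is only needed when 'copy' reported failures.

```python
def workflow(alias: str, protocol_file: str, target_prepopulated: bool = False) -> list:
    """Return the command lines of the described workflow in execution order."""
    base = ["dotnet", "pdrcmd.dll", "--alias", alias]
    verify = base + ["verify"]
    if target_prepopulated:
        # Remember identical objects so that 'copy' can skip them.
        verify.append("--update")
    return [
        base + ["show"],    # check parameters and the assigned task ID
        verify,             # find out how many objects will be copied
        base + ["copy"],    # scan the source and copy new or changed objects
        base + ["retry"],   # copy only the objects which failed before
        base + ["protocol", "--file", protocol_file, "--type", "ALL"],
    ]


for argv in workflow("MyTask", "protocol.txt", target_prepopulated=True):
    print(" ".join(argv))
```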

Running Multiple Instances in Parallel

It is not recommended to run multiple instances of the application in parallel on the same computer. However, it is possible to run multiple instances when specifying different configuration files and a different database directory for each instance.

Graphical Configuration Editor

Overview

PoINT Data Replicator - Configuration Editor is a cross platform application which helps to create and modify PoINT Data Replicator configuration files. It need not be run on the same system as PoINT Data Replicator itself. You can use it to create the configuration file on a system which has a graphical desktop and then copy the file to the system where PoINT Data Replicator shall be executed.

Usage

The Configuration Editor executable is named 'pdrgui.exe' on Windows and 'pdrgui' on Linux and macOS. If installed as described above, it is located in the directory 'gui' inside the directory where PoINT Data Replicator has been installed. To execute it on Windows and macOS, just double click on the executable. On Linux you may need to mark the file as executable first and then start it, e.g. from a command line using './pdrgui'.

After starting the Configuration Editor executable, the welcome screen appears and provides the options to either browse for a configuration file or to open/create it at the default location. The default location is a file named 'config.json' in the same directory where the 'pdrcmd' executable is located. If that location is not writable in your environment, use the browse function to create the file in another location.

After the config file location has been selected, and if no file exists there yet, the application will show a section with default global settings and no tasks. Click on Create Task to add a new task to the configuration and fill in the required fields for source and target. Specification of an alias is optional, but it must be unique if it is specified.

Every field has a small information button and on hover you will get more information about that field. Also, there are validators in place and wrongly entered input will be highlighted with a red border. If you try to save the file and there are any errors present, an error will be displayed and the first field with invalid content will be selected.

Individual tasks can be enabled or disabled. If a task is disabled, it will be ignored by the PoINT Data Replicator command line program. Thus, if you disable all but one task, you can omit the task specification on the command line because only that task will be considered.

In the header of a task there are buttons to clone and to remove the task. Click on 'Clone' to create a copy of this task. After creating the copy, make sure to change the source or target specifications and the alias, if present. Another button allows cloning of a task with source and target storage exchanged. This function is only available if source and target both specify S3 storage. Click on 'Remove' to remove a task from the configuration file. Please note that this will not remove the task from the database. Refer to the command line commands 'show' and 'remove' for more information.

Finally, click on Save to save the modified configuration file to disk. This operation may report an error if any field has invalid content or if two or more tasks with the same source and target specifications exist.

Once the file has been saved, use the 'pdrcmd' executable to execute tasks. If a location other than the default file location has been specified, you need to specify that file name on the command line using the '--config' option or change to that directory first.

License Information
Possession, use, duplication or dissemination of this documentation as well as the software described in this documentation is authorised only pursuant to a valid written license from PoINT Software & Systems GmbH or an authorised sublicensor.

Full License Terms for Software Products of PoINT Software & Systems GmbH

PoINT Data Replicator uses open source software. Please refer to the following documents for the related licenses:

Angular framework and modules
Chromium framework
Electron framework and modules
AWS SDK for .NET (Apache License 2.0)
.NET Core (The MIT License)

Disclaimer
PoINT Software & Systems GmbH believes the information included in this publication is accurate as of the date of publication; it is subject to change without notice. PoINT Software & Systems GmbH is not responsible for any inadvertent errors. PoINT Software & Systems GmbH makes no representations that the use of its products in the manner described in this document will not infringe on existing or future patent rights, nor do the descriptions contained in this document imply the granting of licenses to make, use, or sell equipment or software in accordance with the description.

A functioning storage workflow requires a well configured system environment with devices working free of faults and, if applicable, flawless storage media. Therefore it is of the essence that the user does backup all data by functions offered by PoINT software and/or (if required) by supplementary software products at adequate intervals (i.e. in accordance with the scope and frequency of changes), and thereby to facilitate the reinstatement of these data even in exceptional situations (i.e. in case of hardware malfunction).