
WIPP deployment: General information

Version 2.0.0

Disclaimer

This software was developed at the National Institute of Standards and Technology by employees of the Federal Government in the course of their official duties. Pursuant to title 17 Section 105 of the United States Code this software is not subject to copyright protection and is in the public domain. This software is an experimental system. NIST assumes no responsibility whatsoever for its use by other parties, and makes no guarantees, expressed or implied, about its quality, reliability, or any other characteristic. We would appreciate acknowledgement if the software is used.

Important security information

YOU SHOULD NOT DEPLOY THIS SYSTEM ON A PUBLIC SERVER SINCE THE SOFTWARE DOES NOT INCLUDE ANY ACCOUNT AND UPLOAD ACCESS MANAGEMENT. THE CURRENT IMAGE UPLOAD IS COMPLETELY UNRESTRICTED AND COULD BE USED TO UPLOAD MALWARE, VIRUSES OR INAPPROPRIATE CONTENT.

The Web Image Processing Pipelines (WIPP) system version 2.0.0 does not include any web security management. WIPP 2.0.0 allows unrestricted file uploads through the web browser interface, and the uploaded files interact with the file system as well as with a MongoDB database instance.
WIPP 2.0.0 is intended for deployment on private networks behind a firewall. Future releases will include account and upload access management.

Deployment options

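Three options are currently available for deploying WIPP:
  • - Docker on a single host,
  • - Docker on multiple hosts (for advanced users and system administrators), and
  • - Docker Machine (personal or test instance only).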

WIPP deployment using Docker

We recommend using Docker containers to deploy the WIPP system.
Because WIPP relies on many libraries, each with its own configuration, we pre-installed and packaged all of the software in Docker containers. The containers simplify WIPP deployment and make it more reproducible, with consistent configurations.

What is Docker?

Docker is a container software platform: software, libraries, and settings are packaged together in containers. Multiple instances of a Docker container can be deployed to a set of machines running the Docker Engine.
Docker containers can form a cluster of nodes using Docker Swarm. A Docker Swarm consists of a manager node and worker nodes providing services, as well as an overlay network for multi-host networking. The manager node assigns tasks to the worker nodes in the form of Docker containers that perform specific services. For more information about Docker Swarm, visit the Docker Swarm documentation page.

Requirements

Please make sure that the system(s) hosting WIPP meet the following requirements:
  • - A Unix operating system meeting the requirements for running the Docker Engine (we tested on Ubuntu 16.04 LTS x86_64 with Docker version 17.03.1-ce).
  • - At least 8 GB (per host, for multi-host deployments) or 16 GB (single host) of available RAM (some algorithms require up to ten times the size of the input image in RAM).
  • - At least 50 GB of available disk space (scale this according to the expected total amount of uploaded and computed data).
The WIPP system deployed using Docker on a single host takes up 6 GB of disk space.
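These thresholds can be checked with a short shell sketch before installing anything. It assumes a Linux host (it reads MemAvailable from /proc/meminfo and rounds down to whole gigabytes) and simply encodes the minimums listed above; the function names are hypothetical helpers, not part of the WIPP Zip file.

```sh
# Hypothetical pre-flight check for the WIPP host requirements (not part of WIPP).

# Print the available RAM in whole GB, read from MemAvailable (value in kB).
available_ram_gb() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Print the free disk space in whole GB on the filesystem holding the given path.
available_disk_gb() {
  df -Pk "${1:-/}" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# meets_requirements MODE RAM_GB DISK_GB
# MODE is "single" (16 GB of RAM) or "multi" (8 GB of RAM per host); 50 GB of disk in both cases.
meets_requirements() {
  if [ "$1" = "single" ]; then min_ram=16; else min_ram=8; fi
  [ "$2" -ge "$min_ram" ] && [ "$3" -ge 50 ]
}

# Example:
# meets_requirements single "$(available_ram_gb)" "$(available_disk_gb /)" && echo "host OK"
```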

WIPP deployment on a single host

Deploying WIPP on a single host consists of six steps:
  • 1. installing Docker,
  • 2. configuring a Docker Swarm,
  • 3. configuring Docker volumes for database and file system storage,
  • 4. deploying WIPP using the provided script,
  • 5. opening firewall ports for the WIPP system, and
  • 6. accessing WIPP system web interface.

Step-by-step instructions for Linux

Download the WIPP Zip file containing the installation script and a README file here.

1. Installing Docker

To install Docker, follow the instructions on the Docker web site (menu "Get Docker") for the operating system the host is running.

Example: installing Docker Community Edition (CE) on Ubuntu Xenial 16.04

The official Docker instructions are available here.

a. Update the apt package index and upgrade the installed packages:
sudo apt-get update
sudo apt-get upgrade
b. Set up the Docker CE apt-get repository:
- Install packages to allow apt to use a repository over HTTPS:
sudo apt-get install \
	apt-transport-https \
	ca-certificates \
	curl \
	software-properties-common
- Add Docker’s official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
- Set up the Docker stable repository (for architecture amd64):
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
c. Install Docker:
- Update the apt package index:
sudo apt-get update
- Install the latest version of Docker:
sudo apt-get install docker-ce
- Verify that Docker CE is properly installed (will print an informational message):
sudo docker run hello-world

Running Docker as non-root user

After installing Docker, you will have to use sudo for all Docker commands. To avoid that, follow the instructions from the Docker Linux postinstall web page:

- Create the docker group:
sudo groupadd docker
- Add your user to the docker group:
sudo usermod -aG docker $USER
- Log out and log back in so that your group membership is re-evaluated, then verify that you can run Docker commands without sudo:
docker run hello-world
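To confirm that the new group membership is active, the sketch below (hypothetical helper name) checks the groups of the current session: `id -nG` without a user name reports the session's groups, so it only lists docker after you have logged out and back in.

```sh
# Hypothetical helper: succeed if the current session already belongs to the docker group.
session_has_docker_group() {
  id -nG | tr ' ' '\n' | grep -qx docker
}

# Example:
# if session_has_docker_group; then docker run hello-world; else echo "log out and log back in first"; fi
```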

Configure Docker to start on boot

To have the Docker service start automatically when the system boots, follow the instructions from the Docker Linux postinstall web page:
- For Ubuntu 16.04 and higher, run:

sudo systemctl enable docker

2. Configuring a Docker Swarm for WIPP

The WIPP Docker deployment uses Docker Swarm to create a cluster of Docker nodes, on either a single host or multiple hosts.

- Initialize the swarm:
IP_MGR=129.1.2.3   # replace 129.1.2.3 with the IP address of the host
docker swarm init --advertise-addr ${IP_MGR} --listen-addr ${IP_MGR}
- Configure the swarm network overlay:
docker network create --driver overlay wippnet
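Re-running this step on a host that already belongs to a swarm makes `docker swarm init` fail, and the wippnet network only needs to be created once. The sketch below (hypothetical function name) makes both operations conditional; it relies on `docker info --format '{{.Swarm.LocalNodeState}}'`, which prints `active` on a swarm node.

```sh
# Hypothetical helper: initialize the swarm and the wippnet overlay network only when needed.
init_wipp_swarm() {
  ip_mgr="$1"
  # "active" means this host is already a swarm node.
  if [ "$(docker info --format '{{.Swarm.LocalNodeState}}')" != "active" ]; then
    docker swarm init --advertise-addr "$ip_mgr" --listen-addr "$ip_mgr" || return 1
  fi
  # "docker network inspect" exits with a non-zero status when no such network exists.
  if ! docker network inspect wippnet >/dev/null 2>&1; then
    docker network create --driver overlay wippnet || return 1
  fi
}

# Example: init_wipp_swarm "$IP_MGR"
```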

3. Configuring Docker volumes for WIPP and Pegasus databases and file system storage

The WIPP Docker containers use Docker volumes to store data on the host (see the Docker volumes web page for more information about Docker volumes).

- Create a Docker volume for the WIPP database:
docker volume create --name wippdbvolume
- Create a Docker volume for the WIPP data storage:
docker volume create --name wippdatavolume
- Create a Docker volume for the Pegasus database:
docker volume create --name pegasusdbvolume
- Create a Docker volume for the Pegasus data storage:
docker volume create --name pegasusdatavolume
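The same four volumes can be created in one loop that skips volumes that already exist, which keeps the step safe to repeat. This is a hypothetical helper; it relies on `docker volume inspect` exiting with a non-zero status for an unknown volume.

```sh
# Hypothetical helper: create the WIPP and Pegasus volumes, skipping existing ones.
create_wipp_volumes() {
  for vol in wippdbvolume wippdatavolume pegasusdbvolume pegasusdatavolume; do
    if docker volume inspect "$vol" >/dev/null 2>&1; then
      echo "volume $vol already exists"
    else
      docker volume create --name "$vol" >/dev/null || return 1
      echo "created volume $vol"
    fi
  done
}
```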

4. Deploying WIPP

A deployment script setup.sh is provided in the WIPP Zip file.

- Make sure that the execution permissions are set for the script:
chmod a+x setup.sh
- Run the script to deploy WIPP on the Docker Swarm:
./setup.sh wippnet wippdbvolume wippdatavolume pegasusdbvolume pegasusdatavolume 4G
Replace "4G" with the maximum amount of RAM you want to allow for the image processing algorithms (format: size[g|G|m|M|k|K]). Scale this amount according to the RAM available on the host and the size of the data you want to process: algorithms may require up to 10 times the size of a single image in RAM, i.e., a 1 GB image may need up to 10 GB of RAM.
This script will pull the WIPP Docker images from the WIPP Docker Hub repository, as well as the public MongoDB (database) image, and deploy them as services in the Docker Swarm.
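The memory argument can be checked against the format above before running setup.sh, for instance to compare it with the RAM of the host. In the hypothetical helper below, suffixes are binary multiples (1G = 1024^3 bytes) and a number without a suffix is treated as bytes; both are assumptions, not documented behavior of setup.sh.

```sh
# Hypothetical helper: validate a size[g|G|m|M|k|K] value and print it in bytes.
mem_to_bytes() {
  size="$1"
  case "$size" in
    *[0-9]) num=$size;      mult=1 ;;           # no suffix: assumed to be bytes
    *[kK])  num=${size%?};  mult=1024 ;;
    *[mM])  num=${size%?};  mult=1048576 ;;
    *[gG])  num=${size%?};  mult=1073741824 ;;
    *)      num="";         mult=0 ;;
  esac
  # The part before the suffix must be a non-empty integer.
  case "$num" in
    ''|*[!0-9]*) echo "invalid size: $size (expected size[g|G|m|M|k|K])" >&2; return 1 ;;
  esac
  echo $((num * mult))
}

# Example: mem_to_bytes 4G && ./setup.sh wippnet wippdbvolume wippdatavolume pegasusdbvolume pegasusdatavolume 4G
```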
After running the script, the services will start their initialization. To get the status of the WIPP services, run:
docker service ls
The output should be similar to the following:
ID            NAME     MODE        REPLICAS  IMAGE
1x4y8ee96kn2  master   replicated  0/1       wipp/wipp_master:2.0.0
3wx2zpmbj6om  wipp     replicated  0/1       wipp/wipp:2.0.0
iyyrjoln0t0t  mongodb  replicated  0/1       mongo:3.4
n3bgifaet7vf  exec     replicated  0/1       wipp/wipp_executor:2.0.0
A REPLICAS value of 0/1 means that a service has not started yet (it is either still loading or failing to start).

For detailed information about a service:

docker service ps $NAME
For example, to check the state of the mongodb (database) service:
docker service ps mongodb
ID            NAME       IMAGE         NODE            DESIRED STATE  CURRENT STATE           ERROR  PORTS
02zpjbxq42a7  mongodb.1  mongo:3.4     vm-itl-ssd-063  Running        Running 44 seconds ago

WIPP system services will start in this order: mongodb, master, wipp, exec.
Once everything is started (after a few minutes), the state of the services should be similar to the following:
docker service ls

ID            NAME     MODE        REPLICAS  IMAGE
1x4y8ee96kn2  master   replicated  1/1       wipp/wipp_master:2.0.0
3wx2zpmbj6om  wipp     replicated  1/1       wipp/wipp:2.0.0
iyyrjoln0t0t  mongodb  replicated  1/1       mongo:3.4
n3bgifaet7vf  exec     replicated  1/1       wipp/wipp_executor:2.0.0
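Instead of re-running `docker service ls` by hand, a polling loop can wait until every service reports all of its replicas running. This hypothetical helper parses the default table output shown above, assuming REPLICAS is the fourth column; the 10-minute default timeout is an arbitrary choice.

```sh
# Hypothetical helper: wait until every WIPP service reports all replicas running (e.g. 1/1).
# Assumes the default "docker service ls" table, where REPLICAS is the 4th column.
wait_for_wipp() {
  timeout="${1:-600}"    # maximum total wait, in seconds
  interval="${2:-10}"    # delay between two checks, in seconds
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Count services whose running/desired replica numbers differ; an empty list counts as pending.
    pending=$(docker service ls | awk '
      NR > 1 { split($4, r, "/"); if (r[1] != r[2]) n++ }
      END { if (NR <= 1) print 1; else print n + 0 }')
    if [ "$pending" -eq 0 ]; then
      echo "all services are running"
      return 0
    fi
    echo "$pending service(s) not started yet"
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out; inspect a service with: docker service ps <name>" >&2
  return 1
}
```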

5. Opening firewall ports for the WIPP system

WIPP uses ports 18080 (web interface) and 15005 (Pegasus dashboard). These ports must be open to access the WIPP system.

- On Ubuntu 16.04 and up, one option is to use the uncomplicated firewall (ufw):
sudo ufw allow 18080
sudo ufw allow 15005
- Check that the ports are open:
sudo ufw status verbose
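Both ports can be opened in one loop that also checks that each rule shows up in `ufw status`. This is a sketch with a hypothetical function name, to be run as root; note that when ufw is inactive, `ufw status` lists no rules, so the check reports a failure even though ufw is not blocking anything.

```sh
# Hypothetical helper (run as root): open the WIPP ports and confirm the ufw rules exist.
# 18080: WIPP web interface, 15005: Pegasus dashboard.
open_wipp_ports() {
  for port in 18080 15005; do
    ufw allow "$port" >/dev/null || return 1
    if ufw status | grep -q "^$port[ /]"; then
      echo "port $port open"
    else
      echo "port $port not listed in ufw status" >&2
      return 1
    fi
  done
}
```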

6. Accessing WIPP system web interface

Once deployed, the WIPP web interface will be accessible at http://host-ip:18080, and the Pegasus dashboard (for troubleshooting failing jobs) at https://host-ip:15005 (replace host-ip with the IP address of the host).

The Pegasus dashboard is served over HTTPS but does not ship with an SSL certificate. Some web browsers will complain about the lack of a valid certificate and ask the user to add a security exception, or to confirm "Proceed to the website (unsafe)", before the dashboard can be accessed. The dashboard is set up to be accessible with the credentials "wipp/zaq123".
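A quick way to verify both endpoints from another machine on the network is `curl`. In the hypothetical helper below, `-k` skips certificate verification because the dashboard has no valid certificate, and any HTTP status (including an authentication challenge) counts as reachable; curl reports `000` when it gets no HTTP response at all.

```sh
# Hypothetical helper: report whether the WIPP web interface and the Pegasus dashboard respond.
check_wipp_endpoints() {
  host="$1"; status=0
  for url in "http://$host:18080" "https://$host:15005"; do
    # -k: no certificate check; -s: silent; -o: discard body; -w: print only the status code.
    code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
    if [ -z "$code" ] || [ "$code" = "000" ]; then
      echo "$url: no response" >&2
      status=1
    else
      echo "$url: HTTP $code"
    fi
  done
  return "$status"
}

# Example: check_wipp_endpoints 129.1.2.3
```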

Upgrading from WIPP 1.1.0 to 2.0.0

1. Stopping and removing the running version of WIPP

To stop and remove the WIPP Docker services that are currently running on the machine, use the following command:

docker service rm exec wipp master mongodb
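`docker service rm` reports an error for service names that do not exist, for example if part of the previous deployment was already removed. A hypothetical helper that only removes the services that are present (Docker volumes are not touched):

```sh
# Hypothetical helper: remove the WIPP services that exist, in the same order as above.
remove_wipp_services() {
  for svc in exec wipp master mongodb; do
    if docker service inspect "$svc" >/dev/null 2>&1; then
      docker service rm "$svc" >/dev/null && echo "removed $svc"
    else
      echo "service $svc not found, skipping"
    fi
  done
}
```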

2. Deploying WIPP

A deployment script setup.sh is provided in the WIPP Zip file.

- Make sure that the execution permissions are set for the script:
chmod a+x setup.sh
- Run the script to deploy WIPP on the Docker Swarm:
./setup.sh wippnet wippdbvolume wippdatavolume pegasusdbvolume pegasusdatavolume 4G
Replace "4G" with the maximum amount of RAM you want to allow for the image processing algorithms (format: size[g|G|m|M|k|K]). Scale this amount according to the RAM available on the host and the size of the data you want to process: algorithms may require up to 10 times the size of a single image in RAM, i.e., a 1 GB image may need up to 10 GB of RAM.
This script will pull the WIPP Docker images from the WIPP Docker Hub repository, as well as the public MongoDB (database) image, and deploy them as services in the Docker Swarm.
After running the script, the services will start their initialization. To get the status of the WIPP services, run:
docker service ls
The output should be similar to the following:
ID            NAME     MODE        REPLICAS  IMAGE
1x4y8ee96kn2  master   replicated  0/1       wipp/wipp_master:2.0.0
3wx2zpmbj6om  wipp     replicated  0/1       wipp/wipp:2.0.0
iyyrjoln0t0t  mongodb  replicated  0/1       mongo:3.4
n3bgifaet7vf  exec     replicated  0/1       wipp/wipp_executor:2.0.0
A REPLICAS value of 0/1 means that a service has not started yet (it is either still loading or failing to start).

For detailed information about a service:

docker service ps $NAME
For example, to check the state of the mongodb (database) service:
docker service ps mongodb
ID            NAME       IMAGE         NODE            DESIRED STATE  CURRENT STATE           ERROR  PORTS
02zpjbxq42a7  mongodb.1  mongo:3.4     vm-itl-ssd-063  Running        Running 44 seconds ago

WIPP system services will start in this order: mongodb, master, wipp, exec.
Once everything is started (after a few minutes), the state of the services should be similar to the following:
docker service ls

ID            NAME     MODE        REPLICAS  IMAGE
1x4y8ee96kn2  master   replicated  1/1       wipp/wipp_master:2.0.0
3wx2zpmbj6om  wipp     replicated  1/1       wipp/wipp:2.0.0
iyyrjoln0t0t  mongodb  replicated  1/1       mongo:3.4
n3bgifaet7vf  exec     replicated  1/1       wipp/wipp_executor:2.0.0

3. Accessing the WIPP system web interface

Once deployed, the WIPP web interface will be accessible at http://host-ip:18080, and the Pegasus dashboard (for troubleshooting failing jobs) at https://host-ip:15005 (replace host-ip with the IP address of the host). Any data and jobs created in version 1.1.0 should be present in the new version.

The Pegasus dashboard is served over HTTPS but does not ship with an SSL certificate. Some web browsers will complain about the lack of a valid certificate and ask the user to add a security exception, or to confirm "Proceed to the website (unsafe)", before the dashboard can be accessed. The dashboard is set up to be accessible with the credentials "wipp/zaq123".

Upgrading from WIPP 1.0.0 to 2.0.0

1. Stopping and removing the running version of WIPP

To stop and remove the WIPP Docker services that are currently running on the machine, use the following command:

docker service rm exec wipp master mongodb

The WIPP system will be deleted, but the WIPP database and file system remain intact in the Docker volumes. However, the Pegasus database that populates the Pegasus WMS monitoring dashboard will be deleted, due to a bug present in version 1.0.0 (fixed in 1.1.0).
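Before redeploying, it can be worth confirming that the volumes holding the WIPP database and data survived the removal of the services. A hypothetical check, assuming the 1.0.0 deployment used the volume names wippdbvolume and wippdatavolume:

```sh
# Hypothetical check: confirm the WIPP database and data volumes still exist.
check_wipp_volumes() {
  missing=0
  for vol in wippdbvolume wippdatavolume; do
    if docker volume inspect "$vol" >/dev/null 2>&1; then
      echo "$vol: present"
    else
      echo "$vol: MISSING" >&2
      missing=1
    fi
  done
  return "$missing"
}
```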

2. Configuring additional Docker volumes for Pegasus databases and file system storage

The Pegasus WMS database and file system are now stored in Docker volumes so that they are not deleted when the WIPP Docker containers are restarted.

- Create a Docker volume for the Pegasus database:
docker volume create --name pegasusdbvolume
- Create a Docker volume for the Pegasus data storage:
docker volume create --name pegasusdatavolume

3. Deploying WIPP

A deployment script setup.sh is provided in the WIPP Zip file.

- Make sure that the execution permissions are set for the script:
chmod a+x setup.sh
- Run the script to deploy WIPP on the Docker Swarm:
./setup.sh wippnet wippdbvolume wippdatavolume pegasusdbvolume pegasusdatavolume 4G
Replace "4G" with the maximum amount of RAM you want to allow for the image processing algorithms (format: size[g|G|m|M|k|K]). Scale this amount according to the RAM available on the host and the size of the data you want to process: algorithms may require up to 10 times the size of a single image in RAM, i.e., a 1 GB image may need up to 10 GB of RAM.
This script will pull the WIPP Docker images from the WIPP Docker Hub repository, as well as the public MongoDB (database) image, and deploy them as services in the Docker Swarm.
After running the script, the services will start their initialization. To get the status of the WIPP services, run:
docker service ls
The output should be similar to the following:
ID            NAME     MODE        REPLICAS  IMAGE
1x4y8ee96kn2  master   replicated  0/1       wipp/wipp_master:2.0.0
3wx2zpmbj6om  wipp     replicated  0/1       wipp/wipp:2.0.0
iyyrjoln0t0t  mongodb  replicated  0/1       mongo:3.4
n3bgifaet7vf  exec     replicated  0/1       wipp/wipp_executor:2.0.0
A REPLICAS value of 0/1 means that a service has not started yet (it is either still loading or failing to start).

For detailed information about a service:

docker service ps $NAME
For example, to check the state of the mongodb (database) service:
docker service ps mongodb
ID            NAME       IMAGE         NODE            DESIRED STATE  CURRENT STATE           ERROR  PORTS
02zpjbxq42a7  mongodb.1  mongo:3.4     vm-itl-ssd-063  Running        Running 44 seconds ago

WIPP system services will start in this order: mongodb, master, wipp, exec.
Once everything is started (after a few minutes), the state of the services should be similar to the following:
docker service ls

ID            NAME     MODE        REPLICAS  IMAGE
1x4y8ee96kn2  master   replicated  1/1       wipp/wipp_master:2.0.0
3wx2zpmbj6om  wipp     replicated  1/1       wipp/wipp:2.0.0
iyyrjoln0t0t  mongodb  replicated  1/1       mongo:3.4
n3bgifaet7vf  exec     replicated  1/1       wipp/wipp_executor:2.0.0

4. Accessing the WIPP system web interface

Once deployed, the WIPP web interface will be accessible at http://host-ip:18080, and the Pegasus dashboard (for troubleshooting failing jobs) at https://host-ip:15005 (replace host-ip with the IP address of the host). Any data and jobs created in version 1.0.0 should be present in the new version.

The Pegasus dashboard is served over HTTPS but does not ship with an SSL certificate. Some web browsers will complain about the lack of a valid certificate and ask the user to add a security exception, or to confirm "Proceed to the website (unsafe)", before the dashboard can be accessed. The dashboard is set up to be accessible with the credentials "wipp/zaq123".

WIPP deployment on multiple hosts

This set of instructions is intended for advanced users and system administrators. Only Unix operating systems are supported for the multi-host deployment; we tested it on three Ubuntu 16.04 servers.

Deploying WIPP on multiple hosts consists of six steps:
  • 1. installing Docker on each host,
  • 2. configuring a Docker Swarm,
  • 3. configuring shared data folders for database and file system storage,
  • 4. deploying WIPP using the provided script,
  • 5. opening firewall ports for the WIPP system, and
  • 6. accessing the WIPP system web interface.

Configuration and deployment instructions

Download the WIPP Zip file containing the installation script and a README file here.

1. Installing Docker on each host

To install Docker on each host, follow the instructions on the Docker web site (menu "Get Docker") for the operating system the host is running.

2. Configuring a Docker Swarm for WIPP

The WIPP Docker deployment uses Docker Swarm to create a cluster of Docker nodes, on either a single host or multiple hosts.

- Choose one of the hosts to be the Swarm manager and initialize the swarm on this manager host:
IP_MGR=129.1.2.3   # replace 129.1.2.3 with the IP address of the manager host
docker swarm init --advertise-addr ${IP_MGR} --listen-addr ${IP_MGR}
The output should be similar to:
Swarm initialized: current node (xxxxxxxxx) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join \
    --token xxx-yyyy-zzzz \
    129.1.2.3:2377

Save this docker swarm join command; it will be run later on the worker hosts to make them join the swarm.

- Open port 2377 for Docker Swarm communications (using ufw for example):
sudo ufw allow 2377
- Open ports 4789 and 7946 for Swarm overlay network communications on every host (using ufw, for example):
sudo ufw allow 4789
sudo ufw allow 7946
- Configure the swarm network overlay:
docker network create --driver overlay wippnet
- Join the Swarm from the other hosts using the saved docker swarm command similar to the following:
docker swarm join \
    --token xxx-yyyy-zzzz \
    129.1.2.3:2377
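If the join command printed by `docker swarm init` was not saved, it can be printed again on the manager with `docker swarm join-token worker` (or only the token, with `docker swarm join-token -q worker`). A hypothetical helper for the worker side, taking the token and the manager IP address as arguments:

```sh
# Hypothetical helper, run on each worker host: join the WIPP swarm managed at MANAGER_IP.
# The token comes from "docker swarm join-token -q worker" on the manager.
join_wipp_swarm() {
  token="$1"; ip_mgr="$2"
  if [ -z "$token" ] || [ -z "$ip_mgr" ]; then
    echo "usage: join_wipp_swarm TOKEN MANAGER_IP" >&2
    return 2
  fi
  docker swarm join --token "$token" "$ip_mgr:2377"
}

# Afterwards, "docker node ls" on the manager should list every host.
```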

3. Configuring shared folders for WIPP and Pegasus databases and file system storage

On a multi-host deployment, the WIPP Docker containers use bind mounts of folders that are shared across all the hosts (see the Docker bind mounts web page for more information about bind mounts).

Four shared folders need to be available from all hosts:
- Shared folder for the WIPP database (wippdbvolume)
- Shared folder for the WIPP data storage (wippdatavolume)
- Shared folder for the Pegasus database (pegasusdbvolume)
- Shared folder for the Pegasus data storage (pegasusdatavolume)

Notes: One solution for sharing folders across multiple hosts is to use GlusterFS to mount shared volumes on all hosts. Please refer to the GlusterFS documentation for configuring shared gluster volumes.
The wipp user inside the Docker containers has UID 1000, so you may need to change the ownership of the shared folders to UID 1000 and set read and execute permissions for group and others.
Please note that wippdbvolume, pegasusdbvolume and pegasusdatavolume will contain database files, which may cause issues if the shared folders are hosted on a NAS (Network Attached Storage), for example through a CIFS mount.
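Once a shared mount point is available on every host, the four folders can be created and handed over to UID 1000 in one go. A sketch to run as root on one host; the mount point (/mnt/wipp-shared in the example) and the function name are hypothetical:

```sh
# Hypothetical helper (run as root): create the shared folders and give them to UID 1000.
prepare_shared_folders() {
  root="$1"   # shared mount point visible from all hosts
  for dir in wippdbvolume wippdatavolume pegasusdbvolume pegasusdatavolume; do
    mkdir -p "$root/$dir" || return 1
    chown 1000 "$root/$dir" || return 1      # owner: the wipp user inside the containers
    chmod 755 "$root/$dir" || return 1       # rwx for owner, r-x for group and others
  done
}

# Example: prepare_shared_folders /mnt/wipp-shared
# then pass /mnt/wipp-shared/wippdbvolume, etc. to setup.sh.
```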

4. Deploying WIPP

A deployment script setup.sh is provided in the WIPP Zip file. This script must be executed from the Swarm manager host.

- Make sure that the execution permissions are set for the script:
chmod a+x setup.sh
- Run the script from the Swarm manager host to deploy WIPP on the Docker Swarm:
./setup.sh wippnet path/to/wippdbvolume path/to/wippdatavolume path/to/pegasusdbvolume path/to/pegasusdatavolume 4G
Replace "4G" with the maximum amount of RAM you want to allow for the image processing algorithms (format: size[g|G|m|M|k|K]). Scale this amount according to the RAM available on the hosts and the size of the data you want to process: algorithms may require up to 10 times the size of a single image in RAM, i.e., a 1 GB image may need up to 10 GB of RAM.
This script will pull the WIPP Docker images from the WIPP Docker Hub repository, as well as the public MongoDB (database) image, and deploy them as services in the Docker Swarm.
After running the script, the services will start their initialization. To get the status of the WIPP services, run:
docker service ls
The output should be similar to the following:
ID            NAME     MODE        REPLICAS  IMAGE
1x4y8ee96kn2  master   replicated  0/1       wipp/wipp_master:2.0.0
3wx2zpmbj6om  wipp     replicated  0/1       wipp/wipp:2.0.0
iyyrjoln0t0t  mongodb  replicated  0/1       mongo:3.4
n3bgifaet7vf  exec     replicated  0/1       wipp/wipp_executor:2.0.0
A REPLICAS value of 0/1 means that a service has not started yet (it is either still loading or failing to start).

For detailed information about a service:

docker service ps $NAME
For example, to check the state of the mongodb (database) service:
docker service ps mongodb
ID            NAME       IMAGE         NODE            DESIRED STATE  CURRENT STATE           ERROR  PORTS
02zpjbxq42a7  mongodb.1  mongo:3.4     vm-itl-ssd-063  Running        Running 44 seconds ago

WIPP system services will start in this order: mongodb, master, wipp, exec.
Once everything is started (after a few minutes), the state of the services should be similar to the following:
docker service ls

ID            NAME     MODE        REPLICAS  IMAGE
1x4y8ee96kn2  master   replicated  1/1       wipp/wipp_master:2.0.0
3wx2zpmbj6om  wipp     replicated  1/1       wipp/wipp:2.0.0
iyyrjoln0t0t  mongodb  replicated  1/1       mongo:3.4
n3bgifaet7vf  exec     replicated  1/1       wipp/wipp_executor:2.0.0

5. Opening firewall ports for the WIPP system

WIPP uses ports 18080 (web interface) and 15005 (Pegasus dashboard). These ports must be open to access the WIPP system.

- For example, using ufw:
sudo ufw allow 18080
sudo ufw allow 15005

6. Accessing WIPP system web interface

Once deployed, the WIPP web interface will be accessible at http://host-ip:18080, and the Pegasus dashboard (for troubleshooting failing jobs) at https://host-ip:15005 (replace host-ip with the IP address of the host).

The Pegasus dashboard is served over HTTPS but does not ship with an SSL certificate. Some web browsers will complain about the lack of a valid certificate and ask the user to add a security exception, or to confirm "Proceed to the website (unsafe)", before the dashboard can be accessed. The dashboard is set up to be accessible with the credentials "wipp/zaq123".

WIPP deployment using Docker Machine (personal or test instance only)

For testing the WIPP system or setting up a personal instance on a personal computer, we provide a pre-configured, automatic single-host installation of WIPP using Docker Machine.
This installation has been tested on:
  • - Ubuntu 16.04 LTS
  • - MacOS X Sierra
  • - Windows 7

Prerequisites

  • - Download and unzip the WIPP Zip file containing the WIPP installation files here
  • - Install Docker
  • - Install Docker Compose
  • - Install VirtualBox

Notes: For Windows and MacOS, these tools are installed automatically with Docker Toolbox. A step-by-step guide on installing Docker Toolbox on Windows 7 is provided in the documentation PDF "WIPP-docker-machine-deployment.pdf" under the "doc" folder of the downloaded WIPP Zip file.
If using Docker for Mac (for MacOS X El Capitan 10.11 and newer), VirtualBox 4.3.30 or newer has to be installed separately.
Docker for Windows (for Windows 10) is not supported for this WIPP installation because it uses VirtualBox; Docker Toolbox must be used instead.

Please check the documentation PDF "WIPP-docker-machine-deployment.pdf" under the "doc" folder, containing step-by-step instructions for installing Docker and deploying WIPP on a Windows 7 computer.

Configuration

The installation script (file wipp.sh) is pre-configured for:
  • - Creating a Docker machine (Virtual Machine) using 2 CPUs, 8GB of RAM and 80GB of disk space,
  • - Deploying the WIPP system on this Docker machine, allowing 4GB of RAM for computations.
You may change this configuration by modifying the following values at the top of the script wipp.sh:
#!/bin/bash
export cpus=2               # 2 CPUs - number of CPUs allocated to VM
export ram=8192             # 8 GB - amount of RAM allocated to VM in MB
export disk=80000           # ~80GB - disk space allocated to VM (hosting the WIPP system + data)
export backend_memory="4G"  # 4 GB - maximum amount of RAM in GB allocated to computations, should be lower than amount of RAM allocated to the VM
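The comment on backend_memory states that it should be lower than the RAM of the VM. A hypothetical check of that constraint, handling only the G/g and M/m suffixes for brevity:

```sh
# Hypothetical check: backend_memory (e.g. "4G") must be lower than the VM RAM (ram, in MB).
backend_fits_in_vm() {
  backend="$1"; vm_ram_mb="$2"
  num=${backend%?}                     # value without its one-letter suffix
  case "$num" in
    ''|*[!0-9]*) echo "unsupported backend_memory value: $backend" >&2; return 2 ;;
  esac
  case "$backend" in
    *[gG]) backend_mb=$((num * 1024)) ;;
    *[mM]) backend_mb=$num ;;
    *) echo "unsupported backend_memory value: $backend" >&2; return 2 ;;
  esac
  [ "$backend_mb" -lt "$vm_ram_mb" ]
}

# Example, with the values from the top of wipp.sh:
# backend_fits_in_vm "$backend_memory" "$ram" || echo "lower backend_memory or raise ram"
```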

Installation

Run the installation script (file wipp.sh) using the setup option:
sh wipp.sh setup

Accessing the WIPP interface

The installation script will provide the URL for accessing WIPP and the Pegasus dashboard.
The IP address is that of the created Docker machine, and can be retrieved using:
docker-machine ip wipp
  • - Accessing WIPP: http://displayed_ip:18080
  • - Accessing the Pegasus dashboard: https://displayed_ip:15005
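The two URLs can also be printed from the machine's IP address with a small helper (the function name is hypothetical; the default machine name wipp matches the docker-machine command above):

```sh
# Hypothetical helper: print the WIPP and Pegasus dashboard URLs of a Docker machine.
wipp_urls() {
  ip=$(docker-machine ip "${1:-wipp}") || return 1
  echo "WIPP: http://$ip:18080"
  echo "Pegasus dashboard: https://$ip:15005"
}
```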