Highly Scalable Data Service (HSDS)

Note: If you only want to use the dataset, this is unnecessary. See instructions on how to use the

This documentation has been modified from HOBO-request documentation by LF Murillo.

Accompanying files can be found in the docs/k3s-conf directory after cloning Prop3D (git clone https://github.com/bouralab/Prop3D)

You can set up HSDS on any cloud platform or a single machine using Docker or on a cluster using Kubernetes (or AKS on Microsoft Azure). If you are not using a single machine, please follow the official Kubernetes instruction instructions.

Single Machine Setup

For single machine setup, please clone the HSDS repository.

Running HSDS (Highly Scalable Data Service) on K3s

This document describes the installation of HSDS on K3s, a certified Kubernetes distribution. In our example, we will install HSDS on a single host running K3s for testing purposes, but you can run it on as many hosts as you need.

Requirements

K3s is pre-packaged for various Linux distributions. Before getting started, make sure you satisfy the requirements for k3s with your distro which include cgroups, legacy iptables, and more.

Our example is based on k3s version 1.21.5+k3s2.

Installing K3s

If your distro does not have K3s packaged, you can run the installation script yourself with the following command:

curl -sfL https://get.k3s.io | sh -

After it finishes the script, you should have k3s up and running as a systemd service:

k3s.service - Lightweight Kubernetes
Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enab
Active: active (running) since Wed 2021-11-03 23:41:55 EDT; 2min 21s ago
    Docs: https://k3s.io
Process: 2446 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet
Process: 2448 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/
Process: 2449 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCE
Main PID: 2451 (k3s-server)
    Tasks: 99
Memory: 1.3G
CGroup: /system.slice/k3s.service
...

You can check if everything is ready for hsds installation with kubectl, as one would for the official distribution of Kubernetes:

sudo kubectl get nodes

In our case, we will only get one node listed as we are just running a single-node install for testing purposes:

NAME             STATUS   ROLES                  AGE     VERSION
$YOUR_HOSTNAME   Ready    control-plane,master   5m55s   v1.21.5+k3s2

Looking good! Time to continue on to deploy HSDS.

Deploying HSDS (Highly Scalable Data Service)

For our test, we will deploy HSDS with 8 data nodes (pods) on a regular filesystem (described as a POSIX storage install in the HSDS documentation).

Instead of using the yml distributed with HSDS, you will need use to deployment files that were modified to work with K3s. You will find these files in the directory doc/k3s-conf in the root of the hobo-connect repository on Gitlab.

HSDS has been ‘containerized’ for deployment with Kubernetes, so we can skip the step of preparing an application to be deployed.

Now, we can proceed to the first step, which is to apply the configuration for ‘role-based access control’ (RBAC):

sudo kubectl apply -f k8s_rbac.yml

This will enable HSDS pods to “find each other” so that requests can be accelerated by distributing reads and writes across multiple pods.

Next, edit the file override.yml with the parameters that you need and proceed to run a command that will create a ‘configuration map’ that can be used by HSDS. This is the approach that Kubernetes uses to separate configurations that are specific to your use-case and environment from your container images (which have the standalone application only):

sudo kubectl create configmap hsds-config --from-file=config.yml --from-file=override.yml

HSDS can use a password file to authenticate users using the HTTP Basic Auth protocol (authentication using OpenID or Azure Active Directory is also supported). To construct a password file, create a text file like the following (hopefully using more secure passwords!):

admin:admin
test_user1:test
test_user2:test

Each line in the file is in the format <username>:<password>. You’ll need the admin user for performing certain tasks like setting up top-level folders.

Once you have the password file, run:

kubectl create secret generic user-password --from-file=<passwd_file>

You can always check to see if everything has been loaded properly, but you really do not have to. Using kubectl you can get (request) info about configmap:

sudo kubectl get configmap

The output should include the configmap we just loaded:

NAME               DATA   AGE
kube-root-ca.crt   1      61m
hsds-config        2      21s

Awesome, just a few more commands and we are done!

We need to configure storage by ‘claiming a persistent volume’ (PVC). Before running the command, edit the file hsds_pvc.yml and set the disk space that you want to use. Then, proceed with the following command:

sudo kubectl apply -f hsds_pvc.yml

K3s will output persistentvolumeclaim/hsds-pvc created, so you know you are onto something good.

Now, let’s run two more commands for deploying the HSDS container and expose the HSDS service in the cluster, respectively:

sudo kubectl apply -f hsds_deployment.yml
sudo kubectl apply -f hsds_service.yml

You may want to check if everything is good on the K3s side:

You will see that it will change its STATUS from ContainerCreating to Running:

NAME                    READY   STATUS    RESTARTS   AGE
hsds-857754bf58-p2n8b   2/2     Running   0          103s

You may want to look into the service that has been provisioned as well:

sudo kubectl get services

And see something along the lines of this output:

NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
kubernetes   ClusterIP   10.43.0.1       <none>        443/TCP          78m
hsds         NodePort    10.43.173.154   <none>        5101:32613/TCP   3m

You can try sending a request to the service like so:

$ curl http://10.43.173.154:5101/about

Replace the IP above with the cluster IP value for the HSDS service.

If you see a JSON response with a “status” key of “READY”, congratulations, you have HSDS up and running on K3s!

If not, you can review the HSDS logs to see what the problem is. Each pod has two containers named “sn” and “dn” which support frontend and backend aspects of the HSDS service.

To display the log for the sn container you can run:

kubectl logs hsds-857754bf58-p2n8b sn

Where hsds-8577… is the pod id. Similarly use “dn” to see the dn logs.

Since HSDS can be rather chatty, it can be useful to filter by just ERROR or WARN entries:

kubectl logs hsds-857754bf58-p2n8b sn | grep ERROR

You can tweak the number of pods (instances) of HSDS, scaling it up or down as needed with the following command. For our tests, we will run 8 replicas:

sudo kubectl scale --replicas=8 deployment/hsds

The number of HSDS pods you will be able to create depends on the amount of memory available on the machine. If you see one or more pods that stay in “Pending” status, it’s likely there’s not sufficient system resources to support that number of pods and you’ll need to scale down a bit.

But… before we say good-bye, make sure you configure the users of your test instance. Below, we will create the directory for the ‘admin’ user:

pip install h5pyd
hstouch -u admin -p admin_password -e http://<ip>:5101 /home/
# run the following for each user who will need a "home" folder:
hstouch -u admin -p admin_password -e http://<ip>:5101 -o <username> /home/<username>

Now each user who will be interacting with the system can run: hsconfigure. They will be prompted for server endpoint, username, and password. Information will be stored in a file .hscfg in their home directory. This will be used to authenticate with the server when using tools like: hsinfo, hsls, hsload, etc., and also when using h5pyd in Python scripts.

Happy K3s + HSDS testing!

– Sign-off: LF Murilo, 11-03-2021

Recommended HSDS changes

dn_ram: 6g
sn_ram: 6g
max_tcp_connections: 1000
max_task_count: 1000
aio_max_pool_connections: 264
metadata_mem_cache_size: 1g
chunk_mem_cache_size: 1g
data_cache_size: 1g
timeout: 120