Create Spark cluster using Terraform

Revision as of 13:23, 31 October 2022

OpenStack

Install OpenStack

On your local machine, create an exercise-5 folder, for example a subfolder of info319-exercises, and cd into it.

Install openstackclient. There are several ways to do this. On Ubuntu Linux, you can do:

sudo apt install python3-openstackclient

(You may have to reinstall python3-six and python3-urllib3. You may also need python3-dev.)

Other guides suggest you install it as a Python package (using a virtual environment if you want):

pip install python-openstackclient

Configure OpenStack for command line

If you want to run OpenStack from the command line, create the file keystonerc.sh on your local machine:

touch keystonerc.sh
chmod 0600 keystonerc.sh

Use this as a template:

export OS_USERNAME=YOUR_USER_NAME@uib.no
export OS_PROJECT_NAME=uib-info-YOUR_NREC_PROJECT
export OS_PASSWORD=g3...YOUR_PASSWORD...Qb
export OS_AUTH_URL=https://api.nrec.no:5000/v3
export OS_IDENTITY_API_VERSION=3
export OS_USER_DOMAIN_NAME=dataporten
export OS_PROJECT_DOMAIN_NAME=dataporten
export OS_REGION_NAME=YOUR_REGION_EITHER_bgo_OR_osl
export OS_INTERFACE=public
export OS_NO_CACHE=1
export OS_TENANT_NAME=$OS_PROJECT_NAME

Test run OpenStack from command line

Test with:

. keystonerc.sh
openstack server list

(It can be quite slow.)

Other test commands:

openstack image list
openstack flavor list
openstack network list
openstack keypair list
openstack security group list

openstack --help lists all the possible commands.

Task: Try to create an instance with openstack server create ..., then delete it. You can use the NREC Overview to see the results.

Terraform

Install Terraform

Install Terraform along the lines of this guide:

sudo apt update
sudo apt install software-properties-common gnupg2 curl
curl https://apt.releases.hashicorp.com/gpg | gpg --dearmor > hashicorp.gpg
sudo install -o root -g root -m 644 hashicorp.gpg /etc/apt/trusted.gpg.d/
sudo apt-add-repository "deb [arch=$(dpkg --print-architecture)] https://apt.releases.hashicorp.com focal main"
sudo apt update
sudo apt install terraform

Configure Terraform

Create a configuration file info319-cluster.tf:

# configure the OpenStack provider
terraform {
  required_providers {
    openstack = {
      source = "terraform-provider-openstack/openstack"
    }
  }
}

provider "openstack" {
}

Initialise Terraform with:

terraform init

This should install the provider that connects Terraform with OpenStack. You may sometimes need to re-initialise Terraform, for example with:

terraform init -upgrade

This can happen if you run Terraform in the same shared folder from different local machines and is usually unproblematic.

Test run Terraform

Append something like this to info319-cluster.tf:

# test instance
resource "openstack_compute_instance_v2" "terraform-test" {
  name            = "terraform-test"
  image_name      = "GOLD Ubuntu 22.04 LTS"
  flavor_name     = "m1.large"
  security_groups = ["default", "info319-spark-cluster"]
  network {
    name = "dualStack"
  }
}

Try to run Terraform with:

terraform plan

terraform plan lists everything terraform will do if you apply it. This list is important to check so you do not permanently delete something critical, like disks/volumes with important data on them...

To create the new test instance:

terraform apply  # answer 'yes' to approve

Check Compute -> Instances in the NREC Overview to see the new instance appear.

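(You can also run terraform plan -out plan.tfplan to save the plan to a file and then apply exactly that plan, without a new planning step, using terraform apply plan.tfplan. Do not give the plan file a .tf extension: Terraform reads every .tf file in the folder as configuration.)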

user-data.cfg

Later, we will use a tool called Ansible to install software and otherwise configure the new instances. But it is sometimes useful to do a few basic initialisation steps as soon as the instance is created. To do this, you can create an initialisation script called something like user-data.cfg, for example:

#cloud-config

apt_upgrade: true

packages:
- emacs

power_state:
  delay: "+3"
  mode: reboot

This script upgrades Ubuntu, waits 3 minutes to give the upgrade time, and restarts the new instance. (It also installs the emacs text editor.)

To run this script when a new instance is created, add the line

  user_data       = file("user-data.cfg")

to the terraform-test resource in info319-cluster.tf and run terraform apply. After a few minutes, log in to check that the new instance now has emacs installed. (If you try to connect too early, you will receive a Connection closed by UNKNOWN port 65535 message.)
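As a sketch of where the line goes, the test resource from above could look like this after the change. It is the same block as before with one added attribute, so it replaces the existing terraform-test block rather than being appended to the file. It assumes that user-data.cfg lies in the folder you run Terraform from.

```hcl
# test instance, now with a cloud-init script
resource "openstack_compute_instance_v2" "terraform-test" {
  name            = "terraform-test"
  image_name      = "GOLD Ubuntu 22.04 LTS"
  flavor_name     = "m1.large"
  security_groups = ["default", "info319-spark-cluster"]

  # file() reads the script as a string; a relative path is resolved
  # against the current working directory (assumption: user-data.cfg is there)
  user_data = file("user-data.cfg")

  network {
    name = "dualStack"
  }
}
```

Expect terraform plan to report that the instance must be replaced: user data is only passed to an instance when it is launched.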

~/.config/openstack/clouds.yaml (optional)

Instead of keeping the OpenStack configuration (which Terraform also needs) as environment variables (OS_*) defined in keystonerc.sh, you can keep them in a file.

On your local machine, create ~/.config/openstack/clouds.yaml with 0600 (or -rw-------) access, for example:

clouds:
  info319-cluster:
    auth:
      auth_url: https://api.nrec.no:5000/v3
      project_name: uib-info-YOUR_NREC_PROJECT
      username: YOUR_USER_NAME@uib.no
      password: g3...YOUR_PASSWORD...Qb
      user_domain_name: dataporten
      project_domain_name: dataporten
    identity_api_version: 3
    region_name: YOUR_REGION_EITHER_bgo_OR_osl
    interface: public
    operation_log:
      logging: TRUE
      file: openstackclient_admin.log
      level: info

You need to change your info319-cluster.tf file from

provider "openstack" {
}

to

provider "openstack" {
  cloud  = "info319-cluster"  # defined in a clouds.yaml file
}

The advantage of this setup is that it is easier to manage multiple clusters from the same local machine. You do not need to run the keystonerc.sh script, and you avoid problems with changed environment variables.

Note: There are now two ways to run openstack. One assumes that all the OS_* environment variables defined in keystonerc.sh are set. The other assumes that there is a ~/.config/openstack/clouds.yaml file and that OS_CLOUD is set, for example:

OS_CLOUD=info319-cluster openstack server list

Spark cluster

Use Terraform to create a Spark cluster, similar to the one in Exercise 4. The Resources menu in the left pane of this page lists all the available resources, but it is easier to list the most relevant ones with openstack resource_type list.

Create or import a key pair

Your terraform-test instance still has no keypair. To add a keypair to info319-cluster.tf, use the documentation here to import the public ~/.ssh/info319-spark-cluster.pub SSH key from before.

Add the keypair resource to the terraform-test resource with a line like this:

  key_pair        = "info319-spark-cluster"

Rerun terraform plan, terraform apply, and sometimes terraform destroy continuously as you build the cluster to check that things work.
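A minimal sketch of the key pair import, assuming the public key from before is stored in ~/.ssh/info319-spark-cluster.pub. The Terraform resource label is a free choice; here it simply repeats the key pair name.

```hcl
# Register an existing public SSH key as an OpenStack key pair
resource "openstack_compute_keypair_v2" "info319-spark-cluster" {
  name = "info319-spark-cluster"

  # file() does not expand "~", so pathexpand() is applied first
  public_key = file(pathexpand("~/.ssh/info319-spark-cluster.pub"))
}
```

Inside the instance resource, you can then write key_pair = openstack_compute_keypair_v2.info319-spark-cluster.name instead of the plain string, so that Terraform creates the key pair before the instance. If a key pair with this name is already registered in your project, creating it again will probably fail; in that case, pick another name or remove the old key pair first.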

Test login

Use

openstack server list  # or OS_CLOUD=info319-cluster openstack ...

to find the IPv4 and IPv6 addresses of terraform-test. Log in with, for example:

ssh -i ~/.ssh/info319-spark-cluster ubuntu@158.37.165.222

and

ssh -i ~/.ssh/info319-spark-cluster -J YOUR_USERNAME@login.uib.no ubuntu@2001:700:2:8300::21f4

Create the Spark driver

The terraform-test resource can be renamed to for example terraform-driver and modified as you like according to the documentation here.

Create the Spark workers

You can create multiple terraform-worker- resources in a similar way, but adding a line

count            = 6

and using ${count.index} as a variable inside the worker name. Use IPv6 instead of dualStack.

Check that you can log in to the new instances using ssh and an IPv6 jump host.
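As a sketch, a worker resource along these lines could look as follows. The image, flavor, and security groups are copied from the test instance and may not be what you want for workers; the network name IPv6 follows the hint above, and the resource label terraform-workers is an assumption chosen to match the volume example further down.

```hcl
# Spark workers: one resource block, several instances
resource "openstack_compute_instance_v2" "terraform-workers" {
  count = 6

  # count.index runs from 0 to count-1, giving terraform-worker-0 ... terraform-worker-5
  name            = "terraform-worker-${count.index}"
  image_name      = "GOLD Ubuntu 22.04 LTS"
  flavor_name     = "m1.large"
  key_pair        = "info319-spark-cluster"
  security_groups = ["default", "info319-spark-cluster"]

  network {
    name = "IPv6"
  }
}
```

Because the resource uses count, individual workers are addressed by index, for example openstack_compute_instance_v2.terraform-workers[0].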

As info319-cluster.tf grows larger, you can ensure consistency by defining local variables in the beginning of the file, for example:

locals {
    cluster_prefix = "terraform-"
    num_workers = 6
    driver_name = "${local.cluster_prefix}driver"
    worker_prefix = "${local.cluster_prefix}worker-"
    worker_names = [
        for idx in range(local.num_workers) : 
            "${local.worker_prefix}${idx}"]
    all_names = concat([local.driver_name], local.worker_names)
}

Inside strings, you can refer to the variables using ${local.var_name}. Outside strings, you can write just local.var_name.
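One way to check such locals is to expose them as an output. This is only an illustration; the output name all-names is made up for this example.

```hcl
# Show the computed instance names after `terraform apply`
output "all-names" {
  # outside a string, the bare reference is enough
  value = local.all_names
}

# With cluster_prefix = "terraform-" and num_workers = 6 as defined above,
# the value is a list of seven names: "terraform-driver" first, followed by
# "terraform-worker-0" up to "terraform-worker-5".
```

The same expression can be typed directly into terraform console if you prefer not to add an output.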

Create and attach volumes

Follow the guide here to attach volumes to the instances.

You can use local.num_workers and count= both to create multiple volumes and to attach them. For example, an expression like this can be used to connect all the worker volumes (after you have created them) to their respective workers:

resource "openstack_compute_volume_attach_v2" "attached-to-workers" {
  count       = local.num_workers
  instance_id = "${openstack_compute_instance_v2.terraform-workers[count.index].id}"
  volume_id   = "${openstack_blockstorage_volume_v2.terraform-worker-volumes[count.index].id}"
}

Log in with SSH and run ls /dev to see that the volumes are attached where they should be (but not yet partitioned, formatted, and mounted).
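The attachment example above presupposes a counted volume resource called terraform-worker-volumes. A sketch of such a resource is given here; the volume size and the naming scheme are placeholders, and it relies on the locals block shown earlier.

```hcl
# One volume per worker
resource "openstack_blockstorage_volume_v2" "terraform-worker-volumes" {
  count = local.num_workers

  # hypothetical naming scheme: terraform-worker-volume-0, -1, ...
  name = "${local.worker_prefix}volume-${count.index}"

  # size in gigabytes; 100 is a placeholder, adjust to your quota
  size = 100
}
```

The resource type is kept as openstack_blockstorage_volume_v2 to agree with the attachment example; depending on your provider version, the corresponding v3 resource may be the one to use.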

Create hosts files

Make Terraform write out the file ipv4-hosts looking like this:

158.37.65.58 terraform-driver
10.1.2.234 terraform-worker-0
...
10.1.2.63 terraform-worker-5

and ipv6-hosts looking like this:

2001:700:2:8300::28c4 terraform-driver
2001:700:2:8301::14a1 terraform-worker-0
...
2001:700:2:8301::120d terraform-worker-5

Tips:

you can define outputs from terraform like this:

output "terraform-driver-ipv4" {
    value = openstack_compute_instance_v2.terraform-driver.access_ip_v4
}

you can use the terraform console to explore expressions you can use to define outputs and locals, for example:

openstack_compute_instance_v2.terraform-workers[5]

when an output works as expected, you can redefine it as a local variable and use it to define more complex outputs
a few useful functions are:

    concat([local.element], local.list)
    length(string)
    substr(string, offset, length)
    zipmap(list1, list2)

you can create lists using this syntax:

    terraform-workers-ipv6 = [
        for idx in range(number) :
            list[idx].attribute
    ]

you can output to a file like this:

resource "local_file" "ipv4-hosts-file" {
    content = "${local.ipv4-hosts-string}\n"
    filename = "ipv4-hosts"
}
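Putting the tips together, one possible way to build the string used by the ipv4-hosts-file resource above is sketched here. It is not the only solution: it iterates directly over the instances instead of using range() and zipmap(). The local name hosts-instances is made up, and the resource labels terraform-driver and terraform-workers are assumed to match your own.

```hcl
locals {
  # hypothetical helper: the driver followed by all (counted) workers
  hosts-instances = concat(
    [openstack_compute_instance_v2.terraform-driver],
    openstack_compute_instance_v2.terraform-workers
  )

  # one "ADDRESS NAME" line per instance, joined with newlines
  ipv4-hosts-string = join("\n", [
    for inst in local.hosts-instances :
    "${inst.access_ip_v4} ${inst.name}"
  ])
}
```

The local_file resource comes from the hashicorp/local provider, so you will probably have to run terraform init again before the next terraform apply. An ipv6-hosts file can be built the same way from access_ip_v6.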

Configure SSH

On your local host, change info319-cluster.tf so it also writes a file like this to ~/.ssh/config.terraform-hosts:

Host terraform-driver
    Hostname 2001:700:2:8300::28c4

Host terraform-worker-0
    Hostname 2001:700:2:8301::14a1

...

Host terraform-worker-5
    Hostname 2001:700:2:8301::120d
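A sketch of how such a file could be generated is shown here. The names ssh-instances, ssh-hosts-string, and ssh-hosts-file are invented for this example, and the resource labels terraform-driver and terraform-workers are assumed. The trim() call is a precaution in case the provider returns the IPv6 address wrapped in square brackets; it does nothing otherwise.

```hcl
locals {
  # hypothetical helper: the driver followed by all (counted) workers
  ssh-instances = concat(
    [openstack_compute_instance_v2.terraform-driver],
    openstack_compute_instance_v2.terraform-workers
  )

  # one "Host ... / Hostname ..." entry per instance, separated by blank lines
  ssh-hosts-string = join("\n", [
    for inst in local.ssh-instances :
    "Host ${inst.name}\n    Hostname ${trim(inst.access_ip_v6, "[]")}\n"
  ])
}

resource "local_file" "ssh-hosts-file" {
  content = local.ssh-hosts-string

  # pathexpand() resolves "~" to your home directory
  filename        = pathexpand("~/.ssh/config.terraform-hosts")
  file_permission = "0600"
}
```

Terraform rewrites the file on each terraform apply in which the addresses or the number of instances change, and removes it on terraform destroy.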

On your local host, add these lines to ~/.ssh/config:

Host terraform-*
     User ubuntu
     IdentityFile ~/.ssh/info319-spark-cluster
     ProxyJump YOUR_USERNAME@login.uib.no
     StrictHostKeyChecking no
     UserKnownHostsFile /dev/null

Include ./config.terraform-hosts

This is a similar wildcard entry to the one in Exercise 4, but this time the host names and IPv6 addresses are included from an external file (you need a recent SSH version to do this).

You can now log into all the cluster machines by name, even after you increase the number of worker nodes:

ssh terraform-worker-4
