Create Spark cluster using Terraform

From info319
 
Latest revision as of 13:38, 31 October 2022

OpenStack

Install OpenStack

On your local machine, create an exercise-5 folder, for example a subfolder of info319-exercises, and cd into it.

Install openstackclient. There are several ways to do this. On Ubuntu Linux, you can do:

sudo apt install python3-openstackclient

(You may have to reinstall python3-six and python3-urllib3. You may also need python3-dev.)

Other guides suggest you install it as a Python package (using a virtual environment if you want):

pip install python-openstackclient

Configure OpenStack for command line

If you want to run OpenStack from the command line, create the file keystonerc.sh on your local machine:

touch keystonerc.sh
chmod 0600 keystonerc.sh

Use this as a template:

export OS_USERNAME=YOUR_USER_NAME@uib.no
export OS_PROJECT_NAME=uib-info-YOUR_NREC_PROJECT
export OS_PASSWORD=g3...YOUR_PASSWORD...Qb
export OS_AUTH_URL=https://api.nrec.no:5000/v3
export OS_IDENTITY_API_VERSION=3
export OS_USER_DOMAIN_NAME=dataporten
export OS_PROJECT_DOMAIN_NAME=dataporten
export OS_REGION_NAME=YOUR_REGION_EITHER_bgo_OR_osl
export OS_INTERFACE=public
export OS_NO_CACHE=1
export OS_TENANT_NAME=$OS_PROJECT_NAME

Test run OpenStack from command line

Test with:

. keystonerc.sh
openstack server list

(It can be quite slow.)

Other test commands:

openstack image list
openstack flavor list
openstack network list
openstack keypair list
openstack security group list

openstack --help lists all the possible commands.

Task: Try to create an instance with openstack server create ..., then delete it. You can use the NREC Overview to see the results.

Terraform

Install Terraform

Install Terraform along the lines of this guide:

sudo apt update
sudo apt install software-properties-common gnupg2 curl
curl https://apt.releases.hashicorp.com/gpg | gpg --dearmor > hashicorp.gpg
sudo install -o root -g root -m 644 hashicorp.gpg /etc/apt/trusted.gpg.d/
sudo apt-add-repository "deb [arch=$(dpkg --print-architecture)] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt update
sudo apt install terraform

Configure Terraform

Create a configuration file info319-cluster.tf:

# configure the OpenStack provider
terraform {
  required_providers {
    openstack = {
      source = "terraform-provider-openstack/openstack"
    }
  }
}

provider "openstack" {
}

Initialise Terraform with:

terraform init

This should install the provider that connects Terraform with OpenStack. You may sometimes need to re-initialise Terraform, for example with:

terraform init -upgrade

This can happen if you run Terraform in the same shared folder from different local machines and is usually unproblematic.
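If you run terraform init -upgrade from several machines, it can help to pin the provider to a known major version. Below is a sketch of the terraform block with a version constraint. The constraint ~> 1.48 (any 1.x release from 1.48 upwards) is an assumption. Check which version terraform init actually installed, which is recorded in .terraform.lock.hcl, and adjust the constraint to match.

```hcl
# Replaces the terraform block at the top of info319-cluster.tf.
terraform {
  required_providers {
    openstack = {
      source  = "terraform-provider-openstack/openstack"
      version = "~> 1.48" # assumption: allows 1.48 and later 1.x releases, never 2.x
    }
  }
}
```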

Test run Terraform

Append something like this to info319-cluster.tf:

# test instance
resource "openstack_compute_instance_v2" "terraform-test" {
  name            = "terraform-test"
  image_name      = "GOLD Ubuntu 22.04 LTS"
  flavor_name     = "m1.large"
  security_groups = ["default", "info319-spark-cluster"]
  network {
    name = "dualStack"
  }
}

Try to run Terraform with:

terraform plan

terraform plan lists everything Terraform will do if you apply it. Check this list carefully so that you do not permanently delete something critical, such as volumes with important data on them.

To create the new test instance:

terraform apply  # answer 'yes' to approve

Check Compute -> Instances in the NREC Overview to see the new instance appear.

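(You can also run terraform plan -out plan.tfplan to save the plan to a file and then apply exactly that plan, without planning and approving again, with terraform apply plan.tfplan. Do not give the plan file a .tf extension: Terraform reads every .tf file in the folder as configuration, so a binary plan file called plan.tf breaks later runs.)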

user-data.cfg

Later, we will use a tool called Ansible to install software and otherwise configure the new instances. But it is sometimes useful to do a few basic initialisation steps already when an instance is created. To do this, you can create an initialisation script called something like user-data.cfg, for example:

#cloud-config

apt_upgrade: true

packages:
- emacs

power_state:
  delay: "+3"
  mode: reboot

This script upgrades Ubuntu, installs the emacs text editor, and then reboots the new instance after a 3-minute delay.

To run this script when a new instance is created, add the line

  user_data       = file("user-data.cfg")

inside the terraform-test resource in info319-cluster.tf and run terraform apply. After a few minutes, log in to check that the new instance now has emacs installed. (If you try to connect too early, you will receive a Connection closed by UNKNOWN port 65535 message.)
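The terraform-test resource from above might look like this with the line added. This is a sketch, and only the user_data line is new. According to the provider documentation, changing user_data creates a new server. Expect terraform plan to report that terraform-test will be replaced rather than updated in place.

```hcl
resource "openstack_compute_instance_v2" "terraform-test" {
  name            = "terraform-test"
  image_name      = "GOLD Ubuntu 22.04 LTS"
  flavor_name     = "m1.large"
  security_groups = ["default", "info319-spark-cluster"]
  # cloud-init script; the path is relative to the folder you run terraform in
  user_data = file("user-data.cfg")
  network {
    name = "dualStack"
  }
}
```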

~/.config/openstack/clouds.yaml (optional)

Instead of keeping the OpenStack configuration (which Terraform also needs) in environment variables (OS_*) defined in keystonerc.sh, you can keep it in a file.

On your local machine, create ~/.config/openstack/clouds.yaml with 0600 (or -rw-------) access, for example:

clouds:
  info319-cluster:
    auth:
      auth_url: https://api.nrec.no:5000/v3
      project_name: uib-info-YOUR_NREC_PROJECT
      username: YOUR_USER_NAME@uib.no
      password: g3...YOUR_PASSWORD...Qb
      user_domain_name: dataporten
      project_domain_name: dataporten
    identity_api_version: 3
    region_name: YOUR_REGION_EITHER_bgo_OR_osl
    interface: public
    operation_log:
      logging: TRUE
      file: openstackclient_admin.log
      level: info

You need to change your info319-cluster.tf file from

provider "openstack" {
}

to

provider "openstack" {
  cloud  = "info319-cluster"  # defined in a clouds.yaml file
}

The advantage of this setup is that it is easier to manage multiple clusters from the same local machine. You do not need to run the keystonerc.sh script, and you avoid problems with changed environment variables.

Note: There are now two ways to run openstack. One assumes that all the OS_* environment variables defined in keystonerc.sh are set. The other assumes that there is a ~/.config/openstack/clouds.yaml file and that OS_CLOUD is set, for example:

OS_CLOUD=info319-cluster openstack server list

Spark cluster

Use Terraform to create a Spark cluster, similar to the one in Exercise 4. The Resources menu in the left pane of this page lists all the available resources, but it is easier to list the most relevant ones with openstack resource_type list.

Create or import a key pair

Your terraform-test instance still has no keypair. To add a keypair to info319-cluster.tf, use the documentation here to import the public ~/.ssh/info319-spark-cluster.pub SSH key from before.

Add the keypair resource to the terraform-test resource with a line like this:

  key_pair        = "info319-spark-cluster"
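Below is a minimal sketch of the imported key pair and the instance that uses it. It assumes the public key is still in ~/.ssh/info319-spark-cluster.pub from Exercise 4. It refers to the key pair resource instead of the literal name, which makes Terraform create the key pair before the instance.

If a key pair with this name already exists in your NREC project, OpenStack will refuse to create a second one. You can do one of the following:
* delete the old key pair
* give the Terraform key pair a different name
* bring the existing key pair under Terraform's control with terraform import

```hcl
# Uploads the existing public key to OpenStack as a key pair.
resource "openstack_compute_keypair_v2" "info319-spark-cluster" {
  name       = "info319-spark-cluster"
  public_key = file(pathexpand("~/.ssh/info319-spark-cluster.pub")) # pathexpand turns ~ into your home folder
}

resource "openstack_compute_instance_v2" "terraform-test" {
  name        = "terraform-test"
  image_name  = "GOLD Ubuntu 22.04 LTS"
  flavor_name = "m1.large"
  # Referring to the resource creates an implicit dependency on the key pair.
  key_pair        = openstack_compute_keypair_v2.info319-spark-cluster.name
  security_groups = ["default", "info319-spark-cluster"]
  user_data       = file("user-data.cfg")
  network {
    name = "dualStack"
  }
}
```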

Rerun terraform plan and terraform apply (and sometimes terraform destroy) regularly as you build the cluster, to check that things work.

Test login

Use

openstack server list  # or OS_CLOUD=info319-cluster openstack ...

to find the IPv4 and IPv6 addresses of terraform-test. Log in with for example:

ssh -i ~/.ssh/info319-spark-cluster ubuntu@158.37.165.222

and

ssh -i ~/.ssh/info319-spark-cluster -J YOUR_USERNAME@login.uib.no ubuntu@2001:700:2:8300::21f4
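Instead of looking up the addresses with openstack server list each time, you can let Terraform print them. Below is a sketch using two outputs; the output names are hypothetical.

After terraform apply, terraform output shows the values again, and terraform output -raw terraform-test-ipv4 prints just the address. The sketch assumes that access_ip_v6 holds a bare IPv6 address, so check the printed value before pasting it into ssh.

```hcl
# Printed at the end of every terraform apply.
output "terraform-test-ipv4" {
  value = openstack_compute_instance_v2.terraform-test.access_ip_v4
}

output "terraform-test-ipv6" {
  value = openstack_compute_instance_v2.terraform-test.access_ip_v6
}
```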

Create the Spark driver and workers

The terraform-test resource can be renamed, for example to terraform-driver, and modified as you like according to the documentation here.

You can create multiple terraform-worker- instances from a single resource in a similar way, by adding a line

count            = 6

and using ${count.index} inside the worker name. Connect the workers to the IPv6 network instead of dualStack.

Check that you can log in to the new instances using ssh and an IPv6 jump host.
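Below is a sketch of the worker resource. It makes these assumptions:
* The workers use the same image, flavor, key pair, security groups and user_data as terraform-test. Pick a smaller flavor if seven m1.large instances exceed your project quota.
* The IPv6-only network is called IPv6 in your project. Check this with openstack network list.

The resource name terraform-workers matches the one used in the volume and console examples further down. Individual workers are addressed as openstack_compute_instance_v2.terraform-workers[0], and so on.

```hcl
# Six workers, named terraform-worker-0 ... terraform-worker-5.
resource "openstack_compute_instance_v2" "terraform-workers" {
  count           = 6
  name            = "terraform-worker-${count.index}"
  image_name      = "GOLD Ubuntu 22.04 LTS"
  flavor_name     = "m1.large"
  key_pair        = "info319-spark-cluster"
  security_groups = ["default", "info319-spark-cluster"]
  user_data       = file("user-data.cfg")
  network {
    name = "IPv6" # assumption: the name of the IPv6-only network in NREC
  }
}
```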

Local Terraform variables

As info319-cluster.tf grows larger, you can ensure consistency by defining local variables in the beginning of the file, for example:

locals {
    cluster_prefix = "terraform-"
    num_workers = 6
    driver_name = "${local.cluster_prefix}driver"
    worker_prefix = "${local.cluster_prefix}worker-"
    worker_names = [
        for idx in range(local.num_workers) : 
            "${local.worker_prefix}${idx}"]
    all_names = concat([local.driver_name], local.worker_names)
}

Inside strings, you can refer to a variable as ${local.var_name}. Outside of strings, you can write just local.var_name.
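Below is a sketch of how the locals might be used. It assumes the driver keeps the settings of terraform-test, and the output name cluster-names is hypothetical. In the worker resource, count = local.num_workers and name = local.worker_names[count.index] play the same role. After terraform apply, the output should list terraform-driver followed by terraform-worker-0 to terraform-worker-5.

```hcl
# The driver, named through the locals instead of a literal string.
resource "openstack_compute_instance_v2" "terraform-driver" {
  name            = local.driver_name
  image_name      = "GOLD Ubuntu 22.04 LTS"
  flavor_name     = "m1.large"
  key_pair        = "info319-spark-cluster"
  security_groups = ["default", "info319-spark-cluster"]
  user_data       = file("user-data.cfg")
  network {
    name = "dualStack"
  }
}

# Shows all instance names, driver first, after terraform apply.
output "cluster-names" {
  value = local.all_names
}
```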

Create and attach volumes

Follow the guide here to attach volumes to the instances.

You can use local.num_workers and count= both to create multiple volumes and to attach them. For example, an expression like this can be used to connect all the worker volumes (after you have created them) to their respective workers:

resource "openstack_compute_volume_attach_v2" "attached-to-workers" {
  count       = local.num_workers
  instance_id = openstack_compute_instance_v2.terraform-workers[count.index].id
  volume_id   = openstack_blockstorage_volume_v2.terraform-worker-volumes[count.index].id
}
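The attachment above assumes that the worker volumes already exist. Below is a sketch of how they, and a driver volume, might be created. These parts are assumptions:
* the 10 GB size
* the resource names terraform-driver-volume and attached-to-driver, which are hypothetical

```hcl
# One volume per worker: terraform-worker-0-volume, terraform-worker-1-volume, ...
resource "openstack_blockstorage_volume_v2" "terraform-worker-volumes" {
  count = local.num_workers
  name  = "${local.worker_names[count.index]}-volume"
  size  = 10 # size in GB; an assumption, choose what your quota allows
}

# A single volume for the driver, attached without count.
resource "openstack_blockstorage_volume_v2" "terraform-driver-volume" {
  name = "${local.driver_name}-volume"
  size = 10 # GB, assumption as above
}

resource "openstack_compute_volume_attach_v2" "attached-to-driver" {
  instance_id = openstack_compute_instance_v2.terraform-driver.id
  volume_id   = openstack_blockstorage_volume_v2.terraform-driver-volume.id
}
```

terraform destroy deletes volumes together with their data. For volumes that hold data you cannot recreate, you may want to add lifecycle { prevent_destroy = true } inside the resource. Terraform will then refuse any plan that destroys them until you remove that block.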

Log in with SSH and do ls /dev to see that the volumes are attached where they should (but they are not yet partitioned, formatted, and mounted).

Create hosts files

Make Terraform write out a file ipv4-hosts that looks like this:

158.37.65.59 terraform-driver
10.1.2.233 terraform-worker-0
...
10.1.2.63 terraform-worker-5

and a file ipv6-hosts that looks like this:

2001:700:2:8300::27c4 terraform-driver
2001:700:2:8301::13a1 terraform-worker-0
...
2001:700:2:8301::110d terraform-worker-5

Tips:

  • you can define outputs from terraform like this:
output "terraform-driver-ipv4" {
    value = openstack_compute_instance_v2.terraform-driver.access_ip_v4
}
  • you can use the terraform console to explore expressions you can use to define outputs and locals, for example (inside the console):
openstack_compute_instance_v2.terraform-workers[5]
  • when an output works as expected, you can redefine it as a local variable and use it to define more complex outputs
  • a few useful functions (you may not need all of them) are:
    length(string)
    join(string, list)
    concat([element], list)
    substr(string, offset, length)
    zipmap(list1, list2)
  • you can create lists using this syntax:
    terraform-workers-ipv6 = [
        for idx in range(number) :
            list[idx].attribute
    ]
  • you can output to a file like this:
resource "local_file" "ipv4-hosts-file" {
    content = "${local.ipv4-hosts-string}\n"
    filename = "ipv4-hosts"
}
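Below is a sketch of one way to build the host strings. It assumes the terraform-driver and terraform-workers resources and the locals defined earlier. ipv4-hosts-string is the name the local_file tip already uses; ipv6-hosts-string and ipv6-hosts-file are hypothetical names that follow the same pattern.

The sketch also assumes that access_ip_v6 holds the bare IPv6 address; check that in terraform console first. Because local_file comes from a separate provider (hashicorp/local), run terraform init again before the next terraform plan.

```hcl
locals {
  # "ADDRESS NAME" lines: the driver first, then one line per worker.
  ipv4-hosts-string = join("\n", concat(
    ["${openstack_compute_instance_v2.terraform-driver.access_ip_v4} ${local.driver_name}"],
    [for idx in range(local.num_workers) :
      "${openstack_compute_instance_v2.terraform-workers[idx].access_ip_v4} ${local.worker_names[idx]}"]
  ))
  # The same lines with the IPv6 addresses.
  ipv6-hosts-string = join("\n", concat(
    ["${openstack_compute_instance_v2.terraform-driver.access_ip_v6} ${local.driver_name}"],
    [for idx in range(local.num_workers) :
      "${openstack_compute_instance_v2.terraform-workers[idx].access_ip_v6} ${local.worker_names[idx]}"]
  ))
}

# Written to the folder you run terraform in, like ipv4-hosts.
resource "local_file" "ipv6-hosts-file" {
  content  = "${local.ipv6-hosts-string}\n"
  filename = "ipv6-hosts"
}
```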

Configure SSH

On your local host, change info319-cluster.tf so it also writes a file like this to ~/.ssh/config.terraform-hosts:

Host terraform-driver
    Hostname 2001:700:2:8300::27c4

Host terraform-worker-0
    Hostname 2001:700:2:8301::13a1

...

Host terraform-worker-5
    Hostname 2001:700:2:8301::110d
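Below is a sketch of one way to generate this file. It assumes the locals and the terraform-driver and terraform-workers resources from the previous sections. The names all-ipv6, ssh-config-string and ssh-config-file are hypothetical. As with the hosts files, it assumes that access_ip_v6 is a bare address, and local_file requires terraform init to have installed the hashicorp/local provider.

```hcl
locals {
  # IPv6 addresses in the same order as local.all_names: driver first, then workers.
  all-ipv6 = concat(
    [openstack_compute_instance_v2.terraform-driver.access_ip_v6],
    openstack_compute_instance_v2.terraform-workers[*].access_ip_v6
  )
  # One "Host / Hostname" entry per instance, separated by blank lines.
  ssh-config-string = join("\n", [
    for idx, name in local.all_names :
    "Host ${name}\n    Hostname ${local.all-ipv6[idx]}\n"
  ])
}

resource "local_file" "ssh-config-file" {
  content         = local.ssh-config-string
  filename        = pathexpand("~/.ssh/config.terraform-hosts") # pathexpand turns ~ into your home folder
  file_permission = "0600"                                      # keep it private, like other files in ~/.ssh
}
```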

On your local host, add these lines to ~/.ssh/config:

Host terraform-* localhost
     User ubuntu
     IdentityFile ~/.ssh/info319-spark-cluster
     ProxyJump YOUR_USERNAME@login.uib.no
     StrictHostKeyChecking no
     UserKnownHostsFile /dev/null
Include ./config.terraform-hosts

This is a wildcard entry similar to the one in Exercise 4, but this time the host names and IPv6 addresses are included from an external file. The Include directive requires OpenSSH 7.3 or later; you can check your version with ssh -V.

You can now log into all the cluster machines by name, even after you increase the number of worker nodes:

ssh terraform-worker-4