Create a Spark cluster using Terraform
OpenStack
Install the OpenStack client
On your local machine, create an exercise-5 folder, for example as a subfolder of info319-exercises, and cd into it.
Install openstackclient. There are several ways to do this. On Ubuntu Linux, you can do:
sudo apt install python3-openstackclient
(You may have to reinstall python3-six and python3-urllib3. You may also need python3-dev.)
Other guides suggest you install it as a Python package (using a virtual environment if you want):
pip install python-openstackclient
Configure OpenStack for the command line
If you want to run OpenStack from the command line, create the file keystonerc.sh on your local machine:
touch keystonerc.sh
chmod 0600 keystonerc.sh
Use this as a template:
export OS_USERNAME=YOUR_USER_NAME@uib.no
export OS_PROJECT_NAME=uib-info-YOUR_NREC_PROJECT
export OS_PASSWORD=g3...YOUR_PASSWORD...Qb
export OS_AUTH_URL=https://api.nrec.no:5000/v3
export OS_IDENTITY_API_VERSION=3
export OS_USER_DOMAIN_NAME=dataporten
export OS_PROJECT_DOMAIN_NAME=dataporten
export OS_REGION_NAME=YOUR_REGION_EITHER_bgo_OR_osl
export OS_INTERFACE=public
export OS_NO_CACHE=1
export OS_TENANT_NAME=$OS_PROJECT_NAME
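After sourcing the file, it can save time to check that every variable is actually set and that none still holds a YOUR_... placeholder from the template. The sketch below assumes bash (it uses ${!name} indirect expansion); the function name check_os_env is hypothetical, and the variable list simply mirrors the template above.

```shell
#!/usr/bin/env bash
# Check that the OpenStack variables from keystonerc.sh are set and no longer
# contain the YOUR_... placeholders from the template. Returns 1 on any problem.
check_os_env() {
  local status=0 name
  for name in OS_USERNAME OS_PROJECT_NAME OS_PASSWORD OS_AUTH_URL \
              OS_IDENTITY_API_VERSION OS_USER_DOMAIN_NAME \
              OS_PROJECT_DOMAIN_NAME OS_REGION_NAME OS_INTERFACE; do
    if [ -z "${!name:-}" ]; then
      echo "not set: $name" >&2
      status=1
    elif [[ "${!name}" == *YOUR_* ]]; then
      echo "still a template placeholder: $name" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: . keystonerc.sh && check_os_env && openstack server list
```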
Test run OpenStack from command line
Test with:
. keystonerc.sh
openstack server list
(It can be quite slow.)
Other test commands:
openstack image list
openstack flavor list
openstack network list
openstack keypair list
openstack security group list
openstack --help lists all the possible commands.
Task: Try to create an instance with openstack server create ..., then delete it. You can use the NREC Overview to see the results.
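One possible approach to the task is sketched below. It assumes the image, flavor and network names that the Terraform test instance uses later in this guide; the default server name cli-test and the helper functions run and create_and_delete_test_server are hypothetical. With DRY_RUN=1 the commands are only printed, so you can inspect them before running them against NREC.

```shell
#!/usr/bin/env bash
# Print the command (DRY_RUN=1) or execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%q ' "$@"
    echo
  else
    "$@"
  fi
}

# Create a test server, wait until it is active, then delete it again.
create_and_delete_test_server() {
  local name="${1:-cli-test}"
  run openstack server create \
    --image "GOLD Ubuntu 22.04 LTS" \
    --flavor m1.large \
    --network dualStack \
    --wait \
    "$name"
  run openstack server delete --wait "$name"
}

# Usage: DRY_RUN=1 create_and_delete_test_server demo
```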
Terraform
Install Terraform
Install Terraform along the lines of this guide:
sudo apt update
sudo apt install software-properties-common gnupg2 curl
curl https://apt.releases.hashicorp.com/gpg | gpg --dearmor > hashicorp.gpg
sudo install -o root -g root -m 644 hashicorp.gpg /etc/apt/trusted.gpg.d/
sudo apt-add-repository "deb [arch=$(dpkg --print-architecture)] https://apt.releases.hashicorp.com focal main"
sudo apt update
sudo apt install terraform
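To confirm that the installations worked, a small helper can check that the tools are on your PATH before you continue. The function name require_cmds is hypothetical; command -v is the portable shell way to look up a command.

```shell
#!/usr/bin/env bash
# Return 1 (and name the culprits) if any of the given commands is not on PATH.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "not found: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: require_cmds openstack terraform ssh && terraform -version
```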
Configure Terraform
Create a configuration file info319-cluster.tf:
# configure the OpenStack provider
terraform {
  required_providers {
    openstack = {
      source = "terraform-provider-openstack/openstack"
    }
  }
}

provider "openstack" {
}
Initialise Terraform with:
terraform init
This should install the provider that connects Terraform with OpenStack. You may sometimes need to re-initialise Terraform, for example with:
terraform init -upgrade
This is usually unproblematic.
Test run Terraform
Append something like this to info319-cluster.tf:
# test instance
resource "openstack_compute_instance_v2" "terraform-test" {
  name            = "terraform-test"
  image_name      = "GOLD Ubuntu 22.04 LTS"
  flavor_name     = "m1.large"
  security_groups = ["default", "info319-spark-cluster"]
  network {
    name = "dualStack"
  }
}
Try to run Terraform with:
terraform plan
terraform plan lists every change Terraform will make if you apply the configuration. Always check this list, so that you do not permanently delete something critical, such as disks or volumes.
To create the new test instance:
terraform apply # answer 'yes' to approve
Check Compute -> Instances in the NREC Overview to see the new instance appear.
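To act on the warning about terraform plan, one option is to save the plan, scan it for resources that would be destroyed or replaced, and only then apply exactly that saved plan. This is a sketch: the function name warn_destructive is hypothetical, and it relies on the "# ... will be destroyed" and "# ... must be replaced" lines that terraform plan prints for such resources.

```shell
#!/usr/bin/env bash
# Read terraform plan output on stdin and print any resource that would be
# destroyed or replaced. Returns 1 if there is at least one such resource.
warn_destructive() {
  local hits
  hits=$(grep -E '# .* (will be destroyed|must be replaced)' || true)
  if [ -n "$hits" ]; then
    echo "$hits"
    return 1
  fi
  return 0
}

# Usage (-no-color keeps ANSI colour codes out of the saved text):
#   terraform plan -no-color -out=tfplan > plan.txt
#   warn_destructive < plan.txt && terraform apply tfplan
```

Applying a saved plan file does not ask for a 'yes', so this check is only a safety net; reading plan.txt yourself is still the safer habit.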
user-data.cfg
Later, we will use a tool called Ansible to install software and otherwise configure the new instances. But it is useful to do a few initialisation steps already when an instance is created. To do this, you can create an initialisation script called something like user-data.cfg:
#cloud-config
apt_upgrade: true
packages:
  - emacs
power_state:
  delay: "+3"
  mode: reboot
This script upgrades the installed Ubuntu packages, installs the emacs text editor, and reboots the new instance 3 minutes after cloud-init has finished, so that the upgrade takes full effect.
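If you prefer to create the file from the shell, a quoted here-document writes it verbatim. The check at the end reflects that cloud-init only parses user data as cloud-config when the very first line is the #cloud-config header. As an aside, newer cloud-init documentation names the upgrade key package_upgrade and treats apt_upgrade as an older alias, and recent versions can validate the file more thoroughly with cloud-init schema --config-file user-data.cfg on a machine that has cloud-init installed.

```shell
#!/usr/bin/env bash
# Write user-data.cfg next to info319-cluster.tf. The quoted 'EOF' stops the
# shell from expanding anything inside the here-document.
cat > user-data.cfg <<'EOF'
#cloud-config
apt_upgrade: true
packages:
  - emacs
power_state:
  delay: "+3"
  mode: reboot
EOF

# cloud-init only treats the file as cloud-config if line 1 is exactly this header.
if [ "$(head -n 1 user-data.cfg)" != "#cloud-config" ]; then
  echo "user-data.cfg must start with #cloud-config" >&2
fi
```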
Add the line
user_data = file("user-data.cfg")
inside the terraform-test resource in info319-cluster.tf and run terraform apply. After a few minutes, log in to check that the new instance now has emacs installed. (If you try to connect too early, you will receive a Connection closed by UNKNOWN port 65535 message.)
~/.config/openstack/clouds.yaml (optional)
Instead of keeping the OpenStack configuration (which Terraform also needs) as environment variables (OS_*) defined in keystonerc.sh, you can keep it in a file.
On your local machine, create ~/.config/openstack/clouds.yaml with 0600 (or -rw-------) access, for example:
clouds:
  info319-cluster:
    auth:
      auth_url: https://api.nrec.no:5000/v3
      project_name: uib-info-YOUR_NREC_PROJECT
      username: YOUR_USER_NAME@uib.no
      password: g3...YOUR_PASSWORD...Qb
      user_domain_name: dataporten
      project_domain_name: dataporten
    identity_api_version: 3
    region_name: YOUR_REGION_EITHER_bgo_OR_osl
    interface: public
    operation_log:
      logging: TRUE
      file: openstackclient_admin.log
      level: info
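Because clouds.yaml holds your password, it helps to create it with owner-only permissions before pasting anything into it. In this sketch, init_clouds_yaml is a hypothetical helper name and the directory argument defaults to ~/.config/openstack.

```shell
#!/usr/bin/env bash
# Create an empty clouds.yaml that only its owner can read or write (0600)
# and print its path. Pass another directory to try it out elsewhere.
init_clouds_yaml() {
  local dir="${1:-$HOME/.config/openstack}"
  mkdir -p "$dir"
  # umask 077 in a subshell: a newly created file gets mode 0600
  ( umask 077 && touch "$dir/clouds.yaml" )
  # also tighten the permissions if the file already existed
  chmod 0600 "$dir/clouds.yaml"
  echo "$dir/clouds.yaml"
}

# Usage: init_clouds_yaml, paste in the clouds: entry, then
#   OS_CLOUD=info319-cluster openstack server list
```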
You need to change your info319-cluster.tf file from
provider "openstack" { }
to
provider "openstack" {
  cloud = "info319-cluster" # defined in a clouds.yaml file
}
The advantage of this setup is that it is easier to manage multiple clusters from the same local machine, and you avoid problems with changed environment variables.
Spark cluster
Use Terraform to create a Spark cluster, similar to the one in Exercise 4. The Resources menu in the left pane of this page gives you details.
Create or import a key pair
Your terraform-test instance still has no keypair. To add a keypair to info319-cluster.tf, use the documentation here to import the public ~/.ssh/info319-spark-cluster.pub SSH key from before.
Add the keypair resource to the terraform-test resource with a line like this:
key_pair = "info319-spark-cluster"
Rerun terraform plan and terraform apply after each change to check that things work.
Test login
Use
openstack server list
to find the IPv4 and IPv6 addresses of terraform-test. Log in with for example:
ssh -i ~/.ssh/info319-spark-cluster ubuntu@158.37.65.222
and
ssh -i ~/.ssh/info319-spark-cluster -J YOUR_USERNAME@login.uib.no ubuntu@2001:700:2:8300::22f4
Note: There are two ways to run openstack. One assumes that all the OS_* environment variables defined in keystonerc.sh are set. The other assumes that there is a ~/.config/openstack/clouds.yaml file and that OS_CLOUD is set, for example:
OS_CLOUD=info319-cluster openstack server list
Create the Spark driver
The terraform-test resource can be renamed, for example to terraform-driver, and modified as you like according to the documentation here.
Create the Spark workers
You can create multiple terraform-worker- instances in a similar way, by adding a line
count = 6
and using ${count.index} as a variable inside the worker name. Use IPv6 instead of dualStack.
Check that you can log in to the new instances using ssh and an IPv6 jump host.
As info319-cluster.tf grows larger, you can ensure consistency by defining local variables in the beginning of the file, for example:
locals {
  cluster_prefix = "terraform-"
  num_workers    = 6
  driver_name    = "${local.cluster_prefix}driver"
  worker_prefix  = "${local.cluster_prefix}worker-"
  worker_names   = [for idx in range(local.num_workers) : "${local.worker_prefix}${idx}"]
  all_names      = concat([local.driver_name], local.worker_names)
}
Inside strings, you can refer to the variables using ${local.var_name}. Outside strings, you can write just local.var_name.
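To see which names these locals produce, the same expansion can be mirrored in shell. This is only an illustration of what Terraform computes (range(6) yields 0 to 5, just like count.index for count = 6); the shell variable names are chosen to match the locals and are not used by Terraform.

```shell
#!/usr/bin/env bash
# Mirror the Terraform locals above to preview the resulting instance names.
cluster_prefix="terraform-"
num_workers=6
driver_name="${cluster_prefix}driver"
worker_prefix="${cluster_prefix}worker-"

# all_names = concat([driver_name], worker_names)
all_names=("$driver_name")
for ((idx = 0; idx < num_workers; idx++)); do
  all_names+=("${worker_prefix}${idx}")
done
printf '%s\n' "${all_names[@]}"
```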
Create and attach volumes
Follow the guide here to attach volumes to the instances.
You can use count = local.num_workers both to create multiple volumes and to attach them. For example, a resource like this connects all the worker volumes (after you have created them) to their respective workers:
resource "openstack_compute_volume_attach_v2" "attached-to-workers" {
  count       = local.num_workers
  instance_id = openstack_compute_instance_v2.terraform-workers[count.index].id
  volume_id   = openstack_blockstorage_volume_v2.terraform-worker-volumes[count.index].id
}
Log in with SSH and run ls /dev to see that the volumes are attached where they should be (but not yet partitioned, formatted, or mounted).
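Instead of reading all of /dev, you can list only whole disks with lsblk. The sketch below filters lsblk -dn -o NAME,TYPE output and assumes that the root disk is vda, which is typical for virtio disks but worth checking on your instances (ROOT_DISK overrides it); data_disks is a hypothetical name.

```shell
#!/usr/bin/env bash
# Read "NAME TYPE" lines (from lsblk -dn -o NAME,TYPE) on stdin and print the
# device path of every whole disk except the root disk.
data_disks() {
  awk -v root="${ROOT_DISK:-vda}" '$2 == "disk" && $1 != root { print "/dev/" $1 }'
}

# Usage on an instance: lsblk -dn -o NAME,TYPE | data_disks
```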
Create hosts files
Make Terraform write out the file ipv4-hosts, looking like this:
158.37.65.58 terraform-driver
10.1.2.234 terraform-worker-0
...
10.1.2.63 terraform-worker-5
and ipv6-hosts looking like this:
2001:700:2:8300::28c4 terraform-driver
2001:700:2:8301::14a1 terraform-worker-0
...
2001:700:2:8310::120d terraform-worker-5
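Once Terraform writes these files, a quick sanity check can catch formatting slips before you rely on them. This sketch assumes the layout shown above (one "ADDRESS NAME" pair per line, driver first, then the workers); check_hosts_file is a hypothetical name.

```shell
#!/usr/bin/env bash
# Check a generated hosts file: exactly 1 driver + N workers, each line
# "ADDRESS NAME", with the driver on the first line.
# Usage: check_hosts_file ipv4-hosts 6
check_hosts_file() {
  local file="$1" workers="$2"
  awk -v want=$((workers + 1)) '
    NF != 2 { printf "line %d: expected 2 fields, got %d\n", NR, NF; bad = 1 }
    NR == 1 && $2 !~ /driver$/ { print "first line should be the driver"; bad = 1 }
    END {
      if (NR != want) { printf "expected %d lines, got %d\n", want, NR; bad = 1 }
      exit bad
    }' "$file"
}
```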
Tips:
- you can define outputs from terraform like this:
output "terraform-driver-ipv4" {
  value = openstack_compute_instance_v2.terraform-driver.access_ip_v4
}
- you can use the terraform console to explore expressions you can use to define outputs and locals, for example:
openstack_compute_instance_v2.terraform-workers[5]
- when an output works as expected, you can redefine it as a local variable and use it to define more complex outputs
- a few useful functions are:
concat([local.element], local.list)
length(string)
substr(string, offset, length)
zipmap(list1, list2)
- you can create lists using this syntax:
terraform-workers-ipv6 = [ for idx in range(number): list[idx].attribute ]
- you can output to a file like this (local_file comes from the hashicorp/local provider, so run terraform init again after adding it):
resource "local_file" "ipv4-hosts-file" {
  content  = "${local.ipv4-hosts-string}\n"
  filename = "ipv4-hosts"
}
Configure SSH
On your local host, change info319-cluster.tf so it also writes a file like this to ~/.ssh/config.terraform-hosts:
Host terraform-driver
  Hostname 2001:700:2:8300::28c4
Host terraform-worker-0
  Hostname 2001:700:2:8301::14a1
...
Host terraform-worker-5
  Hostname 2001:700:2:8301::120d
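While you work out the Terraform expressions, it can help to have a reference file to compare against. The sketch below derives the same Host/Hostname entries from an ipv6-hosts file in the format shown earlier; hosts_to_ssh_config is a hypothetical name, and the exercise still asks you to generate the file from Terraform.

```shell
#!/usr/bin/env bash
# Turn "ADDRESS NAME" lines into ssh_config entries, e.g.
#   2001:700:2:8300::28c4 terraform-driver
# becomes
#   Host terraform-driver
#     Hostname 2001:700:2:8300::28c4
# Reads the files given as arguments, or stdin if there are none.
hosts_to_ssh_config() {
  awk 'NF == 2 { printf "Host %s\n  Hostname %s\n", $2, $1 }' "$@"
}

# Usage: hosts_to_ssh_config ipv6-hosts | diff - ~/.ssh/config.terraform-hosts
```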
On your local host, add these lines to ~/.ssh/config:
Host terraform-*
  User ubuntu
  IdentityFile ~/.ssh/info319-spark-cluster
  ProxyJump YOUR_USERNAME@login.uib.no
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null
Include ./config.terraform-hosts
This is similar to the wildcard entry in Exercise 4, but this time the host names and IPv6 addresses are included from an external file (the Include directive requires OpenSSH 7.3 or newer).
You can now log into all the cluster machines by name, even after you increase the number of worker nodes:
ssh terraform-worker-4