[LINUX] Let's make a supercomputer with xCAT

Beowulf with xCAT

The world has moved to the cloud. Depending on the application, however, you may still want to keep a machine at hand and run it flat out. Indeed, it only feels like supercomputing when the machines are roaring right in front of you. A High Performance Computing (HPC) cluster is used for this purpose. One way to implement an HPC cluster (a term I have not heard much recently) is Beowulf. Beowulf means "an HPC cluster that is based on a free Unix and is built by connecting a group of nodes dedicated to computation with a high-speed network." By this definition, most of the "supercomputers" in the bioinformatics field that I am familiar with are Beowulf clusters.

Here's a little more detail on why an on-premises HPC environment is (still) needed in the bioinformatics industry.

First, and here I have work on next-generation sequencer data in mind, many of the calculations are so-called embarrassingly parallel [^1]. You can expect the calculation time to drop to 1/10 just by dividing the input into 10 equal parts, without any further thought. It is therefore easy to benefit from a Beowulf-style HPC cluster. Second, many tasks are tolerant of computation speed and can simply wait even when CPU power is modest, but the input data are huge, so storage and network transfer volumes tend to be large. This is a poor fit for the cloud, where CPU is cheap but storage and network transfer are expensive. In addition, because in the worst case you only need to recompute from the raw data, the storage does not need absolute reliability or high-speed access. What you need is cheap, large storage. Third, there are privacy issues in human-related research. The situation is changing, but keeping data outside the organization or abroad is still a challenge. For these reasons, choosing to have an on-premises HPC cluster (and in some cases building your own) is not such a bad decision in the bioinformatics field in 2020.

[^1]: There seem to be translations such as "obvious parallelism", "astonishing parallelism", and "stupid parallelism". According to English Wikipedia, the term comes from "an embarrassment of riches", an ironic phrase meaning roughly "so much wealth that it becomes a nuisance". Of course, although we say embarrassingly parallel, in reality we are grateful for the ease of parallelization.

There are several software packages for building an on-premises, Beowulf-style homebrew HPC cluster. Among those that are open source, highly flexible, based on a reasonably recent Linux distribution, and still actively developed, perhaps the best known is the IBM-sponsored project eXtreme Cloud Administration Toolkit (xCAT) https://xcat.org/.

I built my own HPC cluster with xCAT a few years ago. Recently I needed to revive an old Scyld cluster (Penguin Computing's proprietary HPC system) with a new xCAT, so I decided to write this article as a memo to myself. The xCAT documentation is better than it used to be, but it is still difficult. I feel that you tend to get stuck in the configuration unless you read and understand everything before starting the installation. It is not straightforward, because the necessary information is scattered and because high-level commands that set things implicitly conflict with low-level commands. This article is the result of trial and error while struggling with the documentation, so it is likely to contain errors and misunderstandings. Even so, I hope these notes are useful as a reference.

With xCAT you are free to choose the distribution, for example RedHat/CentOS-based or PowerPC-based Linux, as long as it is supported. For my bioinformatics work I decided to use Ubuntu, which is relatively recent, because I need to build open source software from source, and that sometimes requires a fairly new environment even on a "server". If proprietary tools are your main focus, a CentOS-based system may be more appropriate.

Install xCAT

Reference URI

- http://xcat.org/ xCAT project official page
- https://xcat-docs.readthedocs.io/en/stable/ Official documentation

Hardware and networking

Set up Management Node

I installed Ubuntu 18.04.4 LTS in the usual way, since it is the latest Ubuntu version supported by xCAT. To be precise, I installed the derivative Lubuntu 18.04.4 in order to get the lightest possible UI. As described later, Ubuntu Server 18.04.**2** had to be used for the compute nodes.

On the target machine this time, the mouse cursor alone was not displayed in the Lubuntu installer; this can be avoided by specifying nomodeset as a kernel option, which appears to disable the Intel graphics function. After installation, finish the basic setup: host name (scyld.cluster), a fixed IP address on the WAN side, NTP, apt upgrade, minimal user and group settings, and so on.

Don't forget to set up forwarding between the network inside the cluster and the network on the WAN side. The article "Creating a NAT server with ufw on Ubuntu" is helpful.
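
As a rough illustration of what such forwarding involves, the sketch below only prints the commands for review instead of running them. It uses plain sysctl and iptables rather than ufw, and it assumes that bond0 faces the cluster, enp1s0f1 faces the WAN, and the cluster subnet is 10.54.0.0/16 (the values that appear in the networks table later in this article). nat_commands is a hypothetical helper name.

```shell
#!/bin/sh
# Print (do not execute) the commands for simple NAT forwarding.
# Assumptions: $1 = cluster-side interface, $2 = WAN-side interface,
# $3 = cluster subnet in CIDR notation.
nat_commands() {
    lan_if=$1
    wan_if=$2
    lan_net=$3
    # Let the kernel forward packets between interfaces
    echo "sysctl -w net.ipv4.ip_forward=1"
    # Rewrite the source address of cluster traffic leaving via the WAN
    echo "iptables -t nat -A POSTROUTING -s $lan_net -o $wan_if -j MASQUERADE"
    # Allow outgoing traffic and the replies that belong to it
    echo "iptables -A FORWARD -i $lan_if -o $wan_if -j ACCEPT"
    echo "iptables -A FORWARD -i $wan_if -o $lan_if -m state --state RELATED,ESTABLISHED -j ACCEPT"
}

nat_commands bond0 enp1s0f1 10.54.0.0/16
```

After checking the printed lines, they could be run as root; rules added this way do not survive a reboot, which is one reason to prefer the ufw-based setup in the article mentioned above.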

Install xCAT stable version

Become root with sudo su - before starting. I use xCAT 2.15.1 (released March 6, 2020), the latest stable version at the time of writing.

# mkdir /root/xCATsetup   # Unless otherwise stated, all work is done in this directory
# cd /root/xCATsetup
# wget \
  https://raw.githubusercontent.com/xcat2/xcat-core/master/xCAT-server/share/xcat/tools/go-xcat \
  -O - > /tmp/go-xcat
# chmod +x /tmp/go-xcat
# /tmp/go-xcat install            # Installs the stable version of xCAT

The default installation location for the xCAT system is /opt/xcat.

Installation verification

# source /etc/profile.d/xcat.sh #PATH setting
# lsxcatd -a #xCAT version display
Version 2.15.1 (git commit ab6295bdc1deb1827922d47755bfc68adec12389, built Wed Mar  4 16:45:39 EST 2020)
This is a Management Node
dbengine=SQLite

# tabdump site #Display the site table of the xCAT internal database
#key,value,comments,disable
"blademaxp","64",,
"fsptimeout","0",,
"installdir","/install",,
(Omitted below)

xCAT settings are managed through two concepts: xCAT objects and the xCAT database. The former are manipulated with commands such as mkdef, lsdef, chdef, and rmdef. The latter is manipulated with commands such as tabdump, tabedit, and chtab. The xCAT database is actually a set of tables managed by an RDB (SQLite by default), and objects and tables appear to have a many-to-many relationship. Which of the two you work with seems to depend on which is more convenient for the setting at hand.
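
Because tabdump prints plain CSV, a single setting can be checked from a script. The following is a minimal sketch that assumes the output format shown above (key in the first column, values in double quotes); get_site_value is a hypothetical helper, and it does not handle values that themselves contain commas.

```shell
#!/bin/sh
# Hypothetical helper: read `tabdump site`-style CSV on stdin
# and print the value stored for one key.
get_site_value() {
    awk -F, -v key="\"$1\"" '$1 == key { gsub(/"/, "", $2); print $2 }'
}

# Sample rows copied from the tabdump output above.
printf '%s\n' \
    '#key,value,comments,disable' \
    '"blademaxp","64",,' \
    '"installdir","/install",,' |
    get_site_value installdir
```

On a management node the same helper would be fed with tabdump site | get_site_value installdir.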

site object settings

Proceed with the settings according to "Set attributes in the site table" in the official documentation.

# chdef -t site domain="cluster"          # Domain name of the nodes, e.g. "cluster" for scyld.cluster
# chdef -t site forwarders="10.54.0.1"    # Default gateway as seen from the compute nodes
# chdef -t site master="10.54.0.1"        # IP address of the management node as seen from the compute nodes
# chdef -t site nameservers="10.54.0.1"   # Comma-separated DNS server IP addresses as seen from the compute nodes
# makedns -s                # Initialize the DNS running on the management node. An IPv6-related warning appears, but ignore it.
# nslookup scyld.cluster    # Check that DNS works

networks table settings

Follow "Set attributes in the networks table" in the official documentation to proceed with the network settings. First, check what was set up automatically.

# tabdump networks
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,mtu,comments,disable
"10_0_0_0-255_255_0_0","10.0.0.0","255.255.0.0","bond0","<xcatmaster>",,"<xcatmaster>",,,,,,,,,,,"1500",,
"192_168_0_0-255_255_0_0","192.168.0.0","255.255.0.0","enp1s0f1","192.168.0.1",,"<xcatmaster>",,,,,,,,,,,"1500",,

This time, I want DHCP to assign freshly booted compute nodes addresses in the range 10.54.3.1 to 10.54.3.254 on the internal network. The internal network is connected to bond0, which was configured beforehand (using Ethernet bonding / aggregation). So make the following settings and start the DHCP service.

First, the netname (object name) entries in the networks table are too long, so change them as follows in the editor launched by tabedit networks.

# tabdump networks
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,mtu,comments,disable
"clusternet","10.54.0.0","255.255.0.0","bond0","<xcatmaster>",,"<xcatmaster>",,,,,,,,,,,"1500",,
"wan","192.168.0.0","255.255.0.0","enp1s0f1","192.168.0.1",,"<xcatmaster>",,,,,,,,,,,"1500",,

# makehosts -n

The objects can now be referred to by the names clusternet and wan. Next, modify the object of type network named clusternet so that DHCP hands out 10.54.3.1 to 10.54.3.254.

# chdef -t network -o clusternet dynamicrange="10.54.3.1-10.54.3.254"
# makedhcp -n

passwd table settings

Set password for root account

# tabdump passwd
# chtab key=system passwd.username=root passwd.password=root
# chtab ip

Recognition and setting of Compute Nodes

Proceed by following Docs » Admin Guide » Manage Clusters » IBM POWER LE / OpenPOWER in the official documentation. Because the xCAT project is backed by IBM, the documentation is written around PowerPC-based IBM machines, but the work is almost the same in a Linux on x86_64 environment, which is what most users have.

First, see Hardware Discovery & Define Node. The available methods are manual definition and, as automatic methods, discovery by MTMS (Machine Type and Machine Serial), discovery by the switch port a node is connected to, and order-dependent (sequential) discovery.

This time there are 10 compute nodes, so I followed the documentation's recommendation and chose [MTMS-based discovery](https://xcat-docs.readthedocs.io/en/stable/guides/admin-guides/manage_clusters/ppc64le/discovery/mtms/index.html). The recommended procedure is "let the BMC / IPMI first receive an IP address from DHCP, then set a static IP address on another subnet". In general, a BMC / IPMI often obtains its IP address dynamically from DHCP. In my environment, however, the static IP addresses of the BMC / IPMI had already been set to 10.54.0.1-10, so I use them as they are, although this differs a little from the documentation.

# bmcdiscover --range 10.54.0.1-254 -u ADMIN -p ADMIN -z -w
# bmcdiscover --range 10.54.0.1-254 -u admin -p admin -z -w
Writing node-0025901d9c29 (10.54.150.10,Winbond,,admin,admin,mp,bmc,0025901d9c29,,) to database...
node-0025901d9c29:
        objtype=node
        groups=all
        bmc=10.54.150.10
        cons=ipmi
        mgt=ipmi
        mtm=Winbond
        bmcusername=admin
        bmcpassword=admin
(Omitted)
# lsdef /node-.*
Object name: node-0025901d8c1e
    bmc=10.54.150.3
    bmcpassword=admin
    bmcusername=admin
(Omitted)

By default, bmcdiscover refers to key=ipmi in the passwd table, but some machines had different or multiple user/password combinations (on machines from 2020 onward this apparently varies from unit to unit), so I ran bmcdiscover twice with different combinations. In any case, the BMC / IPMI of every machine appears to have been recognized and written to the xCAT internal database. Next, create a static configuration file for each machine based on the information obtained this way.

# bmcdiscover --range 10.54.0.1-254 -u ADMIN -p ADMIN -z > predefined.stanza
# bmcdiscover --range 10.54.0.1-254 -u admin -p admin -z >> predefined.stanza

Edit predefined.stanza with an editor.

node-0025901d9e23:
        objtype=node
        groups=all
        bmc=10.54.150.1
        cons=ipmi
        mgt=ipmi
        mtm=Winbond
        bmcusername=admin
        bmcpassword=admin

to

cn02:
        ip=10.54.2.2
        netboot=xnba
        mac=XX:YY:ZZ:XX:YY:ZZ
        objtype=node
        groups=compute2,all
        chain="runcmd=bmcsetup"
        bmc=10.54.150.1
        cons=ipmi
        mgt=ipmi
        mtm=Winbond
        bmcusername=admin
        bmcpassword=admin

Make this kind of change for each compute node. ip=10.54.2.2 is the static IP address of this compute node. Write the entries in the same format as the internal database. In my case, however, the serial number information could not be captured properly. I also wrote in the MAC address of the NIC (not of the IPMI) by hand, and added groups=compute2,all and chain="runcmd=bmcsetup". All compute nodes can now be referred to as compute2. The "2" is there because I am also operating another xCAT cluster.
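
Editing ten entries by hand is error-prone, so the stanzas can also be generated. The following is a minimal sketch, not the method used for this article: make_stanza is a hypothetical helper, and the cnNN naming, the 10.54.2.N / 10.54.150.N numbering, and the admin/admin BMC credentials are assumptions modeled on the example entry above.

```shell
#!/bin/sh
# Emit one xCAT node stanza in the format shown above.
# $1 = node index (plain decimal, e.g. 2), $2 = MAC address of the main NIC (not the BMC).
make_stanza() {
    n=$1
    mac=$2
    printf 'cn%02d:\n' "$n"
    printf '        ip=10.54.2.%d\n' "$n"
    printf '        netboot=xnba\n'
    printf '        mac=%s\n' "$mac"
    printf '        objtype=node\n'
    printf '        groups=compute2,all\n'
    printf '        chain="runcmd=bmcsetup"\n'
    printf '        bmc=10.54.150.%d\n' "$n"
    printf '        cons=ipmi\n'
    printf '        mgt=ipmi\n'
    printf '        mtm=Winbond\n'
    printf '        bmcusername=admin\n'
    printf '        bmcpassword=admin\n'
    printf '\n'
}

# Placeholder MAC address; replace it with the real one for each node.
make_stanza 2 XX:YY:ZZ:XX:YY:ZZ
```

Calling make_stanza once per node and redirecting the combined output to a file gives something that can be reviewed before it is fed to mkdef -z.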

# cat predefined.stanza | mkdef -z

When there is a change, you can overwrite the setting with mkdef -f -z.

It is OK if the power status of all compute nodes can be displayed with rpower compute2 status.

# tabedit hosts #Edit with an editor as follows
# tabdump hosts
#node,ip,hostnames,otherinterfaces,comments,disable
"compute2","|\D+(\d+)|10.54.2.($1+0)|",,"|\D+(\d+)|cn($1)-ipmi:10.54.150.($1+0)|",,

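The pattern in the ip column reads as follows: \D+(\d+) captures the digits that follow the non-digit part of the node name, and ($1+0) is evaluated as arithmetic, which drops any leading zero. The sketch below mimics that computation in shell to show what address each node name maps to; node_to_ip is a hypothetical helper for illustration only and is not part of xCAT.

```shell
#!/bin/sh
# Mimic what the hosts-table pattern |\D+(\d+)|10.54.2.($1+0)| computes.
node_to_ip() {
    # \D+(\d+): drop the leading non-digits, keep the digits that follow
    digits=$(printf '%s' "$1" | sed -E 's/^[^0-9]+([0-9]+).*$/\1/')
    # Strip leading zeros so the shell does not treat "08" as octal
    n=$(printf '%s' "$digits" | sed -E 's/^0+([0-9])/\1/')
    # ($1+0): the arithmetic turns the captured text into a plain number
    echo "10.54.2.$((n + 0))"
}

node_to_ip cn02   # prints 10.54.2.2
node_to_ip cn10   # prints 10.54.2.10
```

The otherinterfaces column uses the same capture to derive the 10.54.150.x address of each node's IPMI interface.
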
Choosing an OS boot method for Compute Nodes

There are two ways to deliver the OS to the compute nodes:

  1. Diskful Installation
  2. Diskless Installation (Stateless Installation)

The former installs an OS on the local storage of each compute node and boots from there. The latter sends an OS image (a "netboot" osimage) to the compute nodes every time they boot and starts them from it. Even with the latter, the local storage attached to each compute node can still be used (for example, for temporary directories and frequently accessed files). In this sense, "stateless" may be the more accurate term.

If your network has enough bandwidth and each compute node has enough memory, Diskless Installation, which lets you handle everything from the management node in one place, is probably your first choice.

Creating an OS image

It seems that, to be supported by the current version of xCAT, an OS has to be included in the package lists shown by ls -L /opt/xcat/netboot/*/*.pkglist. For Ubuntu, ls -L /opt/xcat/netboot/ubuntu/*.pkglist suggests that 18.04 and 18.04.2 are supported. Also, the Lubuntu iso file does not seem to be recognized properly, so here I simply download ubuntu-18.04.2-server-amd64.iso and create the osimage from it.

# mkdir -p /install/iso
# cd /install/iso
# wget http://releases.ubuntu.com/18.04.2/ubuntu-18.04.2-server-amd64.iso
# copycds ubuntu-18.04.2-server-amd64.iso
# lsdef -t osimage
ubuntu18.04.2-x86_64-install-compute  (osimage)
ubuntu18.04.2-x86_64-install-service  (osimage)
ubuntu18.04.2-x86_64-netboot-compute  (osimage)

ubuntu18.04.2-x86_64-netboot-compute is the OS image for Diskless Installation. Changes such as adding packages for your purpose can be made to this OS image. This time, I decided to make the necessary changes with a shell script (postscript) after the OS boots, and moved on.

# genimage ubuntu18.04.2-x86_64-netboot-compute #It takes a while
# packimage ubuntu18.04.2-x86_64-netboot-compute
# mknb x86_64

The documentation reads as if the last command, mknb, does not need to be run manually. However, if you do not run it, /tftpboot/xcat/xnba/nets/10.54.0.0_16 (and the corresponding *.elilo and *.uefi files), which a compute node reads first after booting, does not seem to be generated properly.

Set OS image in compute node group

# chdef compute2 -p chain="bmcsetup,osimage=ubuntu18.04.2-x86_64-netboot-compute"
# makehosts -n compute2
# makedns -s ; makedhcp -n ; makedhcp -a
# rpower compute2 boot

It is safer to run makedns -s; makedhcp -n; makedhcp -a again before starting the compute nodes. If you can then log in with ssh cn01, there is no problem, but it did not work in my environment.

Troubleshooting

Make the remote console of the compute nodes available for debugging. Set it up with makegocons and open a console with rcons <node>. To leave rcons, press ctrl-e, then c, then . (period).

# makegocons 
Starting goconserver service ...
cn10: Created
cn01: Created
(Omitted)
cn06: Created
 
# rcons cn01
[Enter `^Ec?' for help]
goconserver(2020-06-19T09:47:14+09:00): Hello 192.168.0.21:41898, welcome to the session of cn01

A compute node may get stuck in getdestiny, with syslog showing, for example:

Jun 18 17:10:50 10.54.3.2 [localhost] xcat.genesis.doxcat: Getting initial certificate --> 10.54.0.1:3001
Jun 18 17:11:00 10.54.3.2 [localhost] xcat.genesis.doxcat: Running getdestiny --> 10.54.0.1:3001
Jun 18 17:11:10 10.54.3.2 [localhost] xcat.genesis.doxcat: Received destiny=
Jun 18 17:11:10 10.54.3.2 [localhost] xcat.genesis.doxcat: The destiny=, destiny parameters=
Jun 18 17:11:10 10.54.3.2 [localhost] xcat.genesis.doxcat: Unrecognized directive (dest=)
Jun 18 17:11:19 10.54.3.2 [localhost] xcat.genesis.doxcat: ... Will retry xCAT in 70 seconds

When a node cannot proceed for this kind of reason, rechecking the DHCP and DNS settings for mistakes may solve the problem. Check for typos with tabedit site and so on, then run makedns -s; makedhcp -n; makedhcp -a; rpower compute2 boot again.

In another case, a compute node uses an IP address assigned dynamically by DHCP instead of the IP address specified for its main NIC (not the BMC / IPMI) through cat predefined.stanza | mkdef -z.

Jun 19 10:20:10 10.54.3.21 [localhost] xcat.genesis.dodiscovery: Beginning echo information to discovery packet file...
Jun 19 10:20:11 10.54.3.21 [localhost] xcat.genesis.dodiscovery: Discovery packet file is ready.
Jun 19 10:20:11 10.54.3.21 [localhost] xcat.genesis.dodiscovery: Sending the discovery packet to xCAT (10.54.0.1:3001)...
Jun 19 10:20:11 10.54.3.21 [localhost] xcat.genesis.dodiscovery: Sleeping 5 seconds...
Jun 19 10:20:11 10.54.3.21 [localhost] xcat.genesis.minixcatd: The request is processing by xCAT master...
Jun 19 10:20:12 scyld xcat[17879]: xcatd: Processing discovery request from 10.54.3.21
Jun 19 10:20:12 scyld xcat[17879]: xcat.discovery.aaadiscovery: (00:25:90:1d:6e:ae) Got a discovery request, attempting to d
Jun 19 10:20:12 scyld xcat[17879]: xcat.discovery.blade: (00:25:90:1d:6e:ae) Warning: Could not find any nodes using blade-b
Jun 19 10:20:12 scyld xcat[17879]: xcat.discovery.switch: (00:25:90:1d:6e:ae) Warning: Could not find any nodes using switch
Jun 19 10:20:12 scyld xcat[17879]: xcat.discovery.mtms: (00:25:90:1d:6e:ae) Warning: Could not find any node for Super Micro
Jun 19 10:20:12 scyld xcat[17879]: xcat.discovery.zzzdiscovery: (00:25:90:1d:6e:ae) Failed for node discovery.
Jun 19 10:20:12 scyld xcat[17879]: xcat.discovery.zzzdiscovery: Notify 10.54.3.21 that its findme request has been processed
Jun 19 10:20:11 10.54.3.21 [localhost] xcat.genesis.minixcatd: The request is already processed by xCAT master, but not matc

This is the situation in which a compute node asks the management node to discover it, but the request is rejected because its IP address is not that of a known compute node. In that case, set the MAC address again as follows.

# chdef -t node -o cn01 mac="00:11:22:DD:EE:FF"
# makedns -s ; makedhcp -n ; makedhcp -a
# rinstall compute2 runcmd=bmcsetup,osimage=ubuntu18.04.2-x86_64-netboot-compute
# rpower cn01 boot

Specifying the MAC address again lets DHCP match cn01 to its MAC address and assign the correct IP address. After that I ran rpower compute2 boot and it worked.
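
A quick way to recognize this situation in the logs is to check whether the address a node reports lies in the DHCP dynamic range (10.54.3.1 to 10.54.3.254 in this article) rather than in the static 10.54.2.x block. The following is a minimal sketch with that range hard-coded; in_dynamic_range is a hypothetical helper.

```shell
#!/bin/sh
# Succeed if the address lies in the dynamic range 10.54.3.1-10.54.3.254,
# i.e. the node was not matched to its static definition.
in_dynamic_range() {
    case $1 in
        10.54.3.*) last=${1#10.54.3.} ;;
        *) return 1 ;;
    esac
    # The remainder must be a plain number
    case $last in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$last" -ge 1 ] && [ "$last" -le 254 ]
}

for ip in 10.54.3.21 10.54.2.1; do
    if in_dynamic_range "$ip"; then
        echo "$ip: dynamic address (node not matched to a definition)"
    else
        echo "$ip: outside the dynamic range"
    fi
done
```

In the log excerpt above, 10.54.3.21 falls in the dynamic range, which is consistent with the node not having been matched by its MAC address.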

To be continued

With this, I was able to get the HPC cluster set up for the time being. xCAT takes care of controlling the compute nodes through BMC / IPMI, OS deployment, network management, and management of the compute nodes including user accounts. There is still a lot to prepare and configure, such as Ganglia for displaying the operating status of the cluster and the job scheduler SGE, so I will cover that in another article. If you find any mistakes or unclear points in this article, I would appreciate it if you could point them out.
