[PYTHON] A story that stopped my heart after upgrading OpenStack

This article is the December 3rd entry in the OpenStack Advent Calendar 2015.

Introduction

The other day, I upgraded our OpenStack environment from Juno to Liberty in one go.

I did it because, at OpenStack Summit Tokyo the other day, I saw a session about PayPal upgrading from Folsom to Kilo and thought I could do the same myself.

Well, in the end, I did manage to upgrade.

Still, I think I ended up as a guinea pig for everyone else, so I decided to write down what happened.

Here is PayPal's session:

PayPal's Cloud Journey From Folsom to Kilo -- What We Learned in the Upgrade

Pre-upgrade configuration

Component        Type
OS               Ubuntu 14.04
OpenStack        Juno
Hypervisor       KVM
Neutron Driver   VLAN + OVS (L3 Agent not running)
Cinder           Dunno

Upgrade

I think the usual order is Keystone -> Glance -> Nova -> Neutron. At the Design Summit, some companies said they run a different version of each component, so the order is only a guideline, but if you are upgrading every component, this order works well.

By the way, a non-stop upgrade is absolutely impossible, so give up on it. That's what the veteran hackers at the Design Summit said.

Figures.

(Back to the main topic.)

Detach from the load balancer

Remove the real servers from port 5000 of the load balancer in front of Keystone, and completely stop traffic from the outside. Port 35357 seems to be fine left as it is.

Adding the Liberty repository

First, put the Liberty release repository on every server.

$ echo "deb http://ubuntu-cloud.archive.canonical.com/ubuntu trusty-updates/liberty main" | sudo tee /etc/apt/sources.list.d/cloudarchive-liberty.list
$ sudo apt-get update

Keystone preparation

Delete the tokens that have accumulated in the Keystone DB. In Liberty, the standard setup stores tokens in Memcached, so I decided to go with that. In other words, you need to prepare a Memcached server in advance.

$ sudo su -s /bin/sh -c "keystone-manage token_flush" keystone

After flushing the tokens from the DB, the next step is to stop the cron job that had been running periodically.

$ sudo crontab -e -u keystone

Delete the following line.

@hourly /usr/bin/keystone-manage token_flush > /var/log/keystone/keystone-tokenflush.log 2>&1
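
If you would rather not edit the crontab by hand, the same removal could be sketched as a filter. This is only a sketch under assumptions: `drop_token_flush` is a hypothetical helper name, and it assumes the entry above is the only crontab line that mentions `keystone-manage token_flush`.

```shell
# Hypothetical helper: read a crontab on stdin and print it back
# without any line that runs "keystone-manage token_flush".
drop_token_flush() {
    # grep -v exits non-zero when nothing is left to print, so ignore that.
    grep -v 'keystone-manage token_flush' || true
}

# Usage (not executed here; needs root on the Keystone server):
#   sudo crontab -l -u keystone | drop_token_flush | sudo crontab -u keystone -
```

Check the filtered output once before piping it back into crontab, since the last step overwrites the keystone user's whole crontab.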

Then stop Keystone without hesitation.

$ sudo service keystone stop

Database backup

Log in to the DB server and take a backup of the DB with the following command.

$ sudo mysqldump -uroot -p --opt --add-drop-database --single-transaction --master-data=2 keystone > liberty-keystone-db-backup.sql
$ sudo mysqldump -uroot -p --opt --add-drop-database --single-transaction --master-data=2 glance > liberty-glance-db-backup.sql
$ sudo mysqldump -uroot -p --opt --add-drop-database --single-transaction --master-data=2 nova > liberty-nova-db-backup.sql
$ sudo mysqldump -uroot -p --opt --add-drop-database --single-transaction --master-data=2 neutron > liberty-neutron-db-backup.sql
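
For reference, restoring one of these dumps later could look like the dry run below. It is a sketch under assumptions: `print_restore_commands` is a hypothetical helper, and it only prints the commands. It drops and recreates the database explicitly because, as far as I know, mysqldump writes DROP DATABASE / CREATE DATABASE statements only when --databases or --all-databases is given, which the commands above do not use.

```shell
# Dry run: print the commands that would restore one database from its dump.
# Nothing is executed against MySQL here.
print_restore_commands() {
    db="$1"
    dump="liberty-${db}-db-backup.sql"   # file naming follows the backups above
    echo "mysql -uroot -p -e 'DROP DATABASE IF EXISTS ${db}; CREATE DATABASE ${db};'"
    echo "mysql -uroot -p ${db} < ${dump}"
}

print_restore_commands nova
```

Stop the services that use the database before running the printed commands for real.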

Keystone upgrade

Now upgrade Keystone. Various parameters have been added or deprecated, but I can't cover them all here, so please refer to the Configuration Reference. The upgrade itself is as simple as running `apt-get install XXXX`, rewriting `keystone.conf`, and starting the process. Basically, you'll have no problems if you follow the installation guide.

By the way, if you enable the DEBUG option in the conf file, the values set for every item in the conf are written to the log right after the process starts. Helpfully, it also points out items that are scheduled for removal. This mechanism is common to all components, so it's worth remembering. Speaking of which, for Nova's `username` item, the log said that `username` is scheduled for removal and to use `user-name` instead, but when I stopped using it and switched to `user-name`, Nova died with an error. Is it a tsundere?
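
To pull just those warnings out of a noisy DEBUG log, something like the following might do. It is only a sketch: `show_deprecations` is a made-up helper name, the log path in the usage comment is an assumption, and it simply assumes the warnings contain the word "deprecated".

```shell
# Hypothetical helper: print the lines of a log file that mention deprecation.
# Assumes the warning text contains "deprecated" (case-insensitive).
show_deprecations() {
    # grep exits non-zero when there is no match, so ignore that.
    grep -i 'deprecated' "$1" || true
}

# Usage (the path is an assumption; adjust it to where your service logs):
#   show_deprecations /var/log/keystone/keystone.log
```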

But I digress.

After placing keystone.conf, migrate the DB schema up to the Liberty release with the following commands. Since Kilo, Keystone runs under Apache, so make sure the Keystone process is stopped beforehand, and then start Apache.

$ sudo service keystone stop
$ sudo service apache2 restart
$ sudo su -s /bin/sh -c "keystone-manage db_sync" keystone

If there are no errors, the Keystone upgrade is complete.

For now, let's check that the openstack command runs.

$ ADMIN_TOKEN=${admin_token string set in keystone.conf}
$ export OS_TOKEN=${ADMIN_TOKEN}
$ export OS_URL=http://${IP address of any server running Keystone}:35357/v3
$ export OS_IDENTITY_API_VERSION=3
$ openstack service list

Glance upgrade

As with Keystone, check the Configuration Reference and the official installation guide, and place the glance-api.conf and glance-registry.conf that match your environment.

Then do Glance's db sync.

$ sudo service glance-api restart
$ sudo service glance-registry restart
$ sudo su -s /bin/sh -c "glance-manage db_sync" glance

If there are no errors, the Glance upgrade is complete.

nova-api upgrade (Nova db sync failure)

This is the biggest hurdle. If you can successfully migrate Nova's DB to Liberty, the upgrade is as good as done. But reality was not so kind.

So in this section I describe what happened when the upgrade did not go well.

First attempt: upgrading straight to Liberty

First, I upgraded nova-api to Liberty on the server running nova-api.

$ sudo apt-get install nova-api python-novaclient

After that, run db sync, and if there are no problems, you can go on to upgrade the other components and be done (or so it should be).

So I ran the following command:

$ sudo su -s /bin/sh -c "nova-manage db sync" nova

and got an error :(

The mysterious nova-manage option

The error said, in essence, to run nova-manage db migrate_flavor_data first.

However, the nova-manage command in the Liberty release does not actually have this option. I read the source code to check, and it really wasn't there. "No way," I thought, but by this point the DB schema had already been left in a half-changed state.

From this state, I went back to the Juno release and dropped and restored the DB (the first time).

The next thing I suspected was the Kilo release, so I added the Kilo release repository and installed nova-api from it.

That nova-api did have migrate_flavor_data as a nova-manage option. "This is it!" I thought, and immediately ran nova-manage db migrate_flavor_data. It succeeded, so I decided to go up to the Kilo release first and then to the Liberty release, and ran db sync up to the Kilo release.

Then I got another error :(

The error was that the schema of the flavor-related table was wrong. "You're kidding," I thought, but it was serious. Putting the nature of the error and the situation together: nova-manage db migrate_flavor_data has to be run after the Kilo release db sync and before the Liberty release db sync.

In other words, the correct order is: Kilo release db sync, then nova-manage db migrate_flavor_data, then Liberty release db sync.

apt breaks

At this point the DB schema had been left half-changed again, so I went back to the Juno release and dropped and restored the DB (for the second time). After that I tried to put nova-api back to the Juno release as well, but this time apt broke...

I had gone back and forth between Juno, Kilo and Liberty so many times that the dependent packages were in a strange state and could be neither installed nor removed. Having no choice, I used the dpkg command to remove every Nova dependency package from all three versions (Juno, Kilo, Liberty). Then apt started working again, so I went back to the Juno release to start over.
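
To see which Nova-related packages are installed, and in what state, before removing anything, a read-only listing like this may help. A sketch only: `list_nova_packages` is a hypothetical helper, and matching on "nova" in the package name is my assumption about which packages are relevant.

```shell
# Hypothetical helper: from "dpkg -l" output on stdin, print the status flags,
# name and version of every package whose name contains "nova".
list_nova_packages() {
    awk '$2 ~ /nova/ { print $1, $2, $3 }'
}

# Usage (read-only; run on the affected server):
#   dpkg -l | list_nova_packages
```

A status other than "ii" in the first column is a hint that the package is only half installed or half removed.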

Actually, while I was doing all this, the Nanshiki globe song from "Let's go to school!" (Gakkou e Ikou!), which had been on TV the other day, kept looping in my head.

"It's stupid" "That's right, it's stupid"

It is a masterpiece.

(Back to the main topic.)

nova-api upgrade (Nova db sync success)

Based on the above, I will describe the successful procedure.

Install the Kilo release.

$ echo "deb http://ubuntu-cloud.archive.canonical.com/ubuntu trusty-updates/kilo main" | sudo tee /etc/apt/sources.list.d/cloudarchive-kilo.list
$ sudo apt-get update
$ sudo apt-get install nova-api python-novaclient

Migrate to the schema at the time of Kilo release.

$ sudo service nova-api restart
$ sudo su -s /bin/sh -c "nova-manage db sync" nova
$ sudo su -s /bin/sh -c "nova-manage db migrate_flavor_data" nova

Then move up to the Liberty release.

$ echo "deb http://ubuntu-cloud.archive.canonical.com/ubuntu trusty-updates/liberty main" | sudo tee /etc/apt/sources.list.d/cloudarchive-liberty.list
$ sudo apt-get update
$ sudo apt-get install nova-api python-novaclient

Migrate to the schema at the time of Liberty release.

$ sudo service nova-api restart
$ sudo su -s /bin/sh -c "nova-manage db sync" nova

This completes the nova-api upgrade and Nova DB migration.

By the way, this DB migration procedure was in fact described, in a single line, in the Kilo release notes. Still, that is not very helpful, and I think I would have gotten stuck on it even if I had checked beforehand.

neutron-server upgrade

Once you have upgraded nova-api, then upgrade neutron-server.

As with Keystone, check the Configuration Reference and the official installation guide, and place neutron.conf and other things that suit your environment. Note that Open vSwitch has a separate ml2 configuration file.

Then do a Neutron db sync.

$ sudo service neutron-server restart
$ sudo su -s /bin/sh -c "neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini upgrade head" neutron

If there are no errors, Neutron's DB has been migrated to the schema at the time of Liberty release.

Upgrade the remaining Nova / Neutron packages

Once you've come this far, upgrade all the remaining packages. From here on there should be no particular problems (probably). What to upgrade and how depends on each environment, so I won't describe it here.

Re-attach to the load balancer

Add the real servers back to port 5000 of the load balancer in front of Keystone.

Horizon upgrade

Horizon works fine even if its version doesn't match the other components, so it's okay to upgrade it with every OpenStack release.

But with the Liberty release, or rather with the Liberty release's deb package, the upgrade fails.

I don't know the details of why it fails. For now, I'll just write down what happened. Sorry.

The deb package mystery

This time I ran into a case where a deb package behaves differently on a fresh install and on an upgrade. Normally that shouldn't happen (probably), but at least for the Liberty release of Horizon's deb package, the behavior differs.

On a Horizon server running the Juno release, if I upgrade Horizon with `apt-get upgrade` or `apt-get install openstack-dashboard`, the CSS doesn't load and Horizon ends up looking like a bare-bones web page from around the year 2000.

Horizon had been getting cluttered as its components grew, so for a moment I thought they had redesigned it under the banner of "Simple is Best", but that was not the case at all. That would be going against the times a bit too much.

(Back to the main topic.)

I wondered whether something was wrong with the upgrade process, so I added the deb-src line for the Liberty release to the repository and downloaded the source package of openstack-dashboard to investigate.

$ echo "deb-src http://ubuntu-cloud.archive.canonical.com/ubuntu trusty-updates/liberty main" | sudo tee -a /etc/apt/sources.list.d/cloudarchive-liberty.list
$ sudo apt-get update
$ sudo apt-get source openstack-dashboard
$ tar zxvf horizon_8.0.0-0ubuntu2~cloud0.debian.tar.gz
$ cd debian

When I opened README.source, the following was immediately written.

During the Juno/14.10 development cycle, use of xstatic packages was introduced
so that CSS, JS and other static assets did not have to be embedded in the
horizon source tree.

Suspicious, very suspicious.

By the way, when I went back to the Icehouse release and downloaded its source, I found the following in the file README.compression.

Until this can be scripted and integrated into package build, updating the
pre-compressed static CSS and JS requires a some manual steps:

   sudo apt-get install python-lesscpy python-openstack-auth python-compressor
   quilt pop top
   ./debian/rules refresh-static-assets

Hmm.

If you read the rules file properly, you would probably find the real cause.

But I was completely exhausted by this point.

The only thing I know is that if you remove all the Juno release Horizon packages (including Django) and then reinstall the Liberty release packages, it works fine. Note that this does not happen when upgrading from Juno to Kilo. It only happens when upgrading from Juno to Liberty or from Kilo to Liberty.

So, if anyone knows the details, please let me know;)

Also, if you're in Canonical, please do something about this deb package:-(

End of upgrade

This completes the upgrade of the OpenStack environment. Of course, doing this kind of work by hand will eventually give you tenosynovitis. If you don't want that, you should use Chef, Puppet, Ansible, or something like that. Probably.

Thank you for your hard work:-)

Aftermath

RFC3442 (classless static route) problem

This is a story about neutron-dhcp-agent. Do you all know RFC3442? I didn't until recently (sorry). Basically, the distribution of addresses and routes relies on DHCP. So the DHCP specification was extended with a mechanism that distributes static routes at the same time an IP address is assigned. That is the "classless static route" option (RFC3442).

RFC 3442 - The Classless Static Route Option for Dynamic Host Configuration Protocol (DHCP) version 4
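
To make the option a little more concrete, here is a small sketch of how a single route is laid out inside it, as I understand the RFC: the prefix length, then only the significant octets of the destination, then the router address. `rfc3442_encode` is a hypothetical helper written for this illustration, not something Neutron or dnsmasq provides, and the addresses are examples.

```shell
# Encode one route as an RFC 3442 (DHCP option 121) entry, printed as
# space-separated decimal bytes:
#   <prefix length> <significant destination octets> <router octets>
# The number of significant octets is ceil(prefix length / 8).
rfc3442_encode() {
    cidr="$1"; router="$2"
    dest="${cidr%/*}"               # e.g. 10.17.0.0
    plen="${cidr#*/}"               # e.g. 16
    nbytes=$(( (plen + 7) / 8 ))
    out="$plen"
    i=0
    for octet in $(echo "$dest" | tr '.' ' '); do
        # Keep only the leading octets covered by the prefix.
        if [ "$i" -lt "$nbytes" ]; then
            out="$out $octet"
        fi
        i=$((i + 1))
    done
    echo "$out $(echo "$router" | tr '.' ' ')"
}

rfc3442_encode 10.17.0.0/16 10.0.0.1   # prints: 16 10 17 10 0 0 1
rfc3442_encode 0.0.0.0/0 10.0.0.1      # prints: 0 10 0 0 1
```

The second call shows that a default route is just a zero prefix length followed by the router, which is why a client that honors this option can take its default gateway from it.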

It seems that the RFC3442 implementation was supported at the time of Juno's release, but apparently it wasn't working properly. It was fixed at the time of the Kilo release, and it worked fine with the Liberty release.

The trouble was that, now that the implementation worked properly, it exposed that the subnets of the External Network I was using were misconfigured. I won't go into how I got it wrong, but one thing I can say is that it was painful.

As for the subnets, I found that changing the network's CIDR would fix the problem, but I also found that the CIDR cannot be changed through the Neutron API. Having no choice, I solved it with the last resort of running an UPDATE statement directly against the subnets table of the Neutron database (yikes).

Lessons learned from the upgrade process

Prior verification is a must

Before upgrading the production environment, we first built a fresh Liberty environment with the same configuration as production, then built a verification environment on Juno with the same configuration and ran a trial upgrade from there. By the way, everything in this article happened in that verification environment. Thanks to that, the upgrade in production went smoothly.

If you don't really need an upgrade, it's best not to do it

Each OpenStack release reaches EOL about 14 months after its release. In other words, bug fixes are backported for about 14 months, so you may be able to get by on that. Alternatively, you can upgrade only Keystone with every release, and for the other components, define Regions properly and build small OpenStack clusters. Naturally, you then need to define an EOL for each OpenStack cluster you build.

Problems still occur after upgrade

There are also many problems you only notice after upgrading. It's a good idea to check the bugs and Known Issues on Launchpad in advance and use them to decide whether to upgrade. However, if you upgrade shortly after a release, you are simply a guinea pig, and sometimes the bugs haven't even been filed on Launchpad yet.

Finally

By the way, today is my birthday.
