I spent most of today upgrading NeoPhi from OpenBSD 3.4 to OpenBSD 3.8. When I had the box in the colocation facility, I wasn't too keen on attempting a remote upgrade, and based on today's experiences that turned out to be a good call. The first problem I encountered was that my server doesn't have a CD-ROM drive. It's only a 1U rack unit, and the CD-ROM bay is taken up by a second hard drive and a RAID-1 controller.
I popped the case, hooked up a CD-ROM drive, and attempted to boot from the OpenBSD 3.5 CD. The first time, the boot hung. On every subsequent try it went into a reboot loop: it would start to boot from the CD and then reboot the machine. I also tried the 3.6 CD and didn't have any luck. At that point I switched to the less recommended in-place upgrade. After booting into single-user mode, I removed all of the packages I had previously installed, since one of the upgrades I was planning changed gcc versions, which was sure to introduce some incompatibilities. After removing about 60 packages, I started the upgrade process.
I was very pleased with the instructions. Despite all of the warnings they give you, the process is straightforward. Really, the only tricky part is merging changes to /etc, which isn't that bad if you haven't done much customization. I've tried to follow good practice by keeping my changes in .local files or linked in from somewhere else entirely. As a result, I only had a few changes that I had to fix by hand in /etc.
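Merging /etc mostly comes down to seeing what the new release changed in a file you have also touched. Here is a minimal sketch, assuming the new release's etc files have been unpacked somewhere other than /etc and read in as text. It uses Python's difflib to produce a unified diff for one file. The function name and the sample file contents in the usage note are hypothetical.

```python
import difflib


def etc_diff(installed: str, shipped: str, name: str) -> list[str]:
    """Unified diff from the installed copy of an /etc file to the copy
    shipped with the new release. Both arguments are file contents as text.
    Returns an empty list when the two copies are identical."""
    return list(
        difflib.unified_diff(
            installed.splitlines(keepends=True),
            shipped.splitlines(keepends=True),
            fromfile=f"/etc/{name} (installed)",
            tofile=f"/etc/{name} (new release)",
        )
    )
```

For example, `print("".join(etc_diff(open("/etc/rc.conf").read(), open("/tmp/newetc/rc.conf").read(), "rc.conf")))` would show the differences for one file. The /tmp/newetc location is only an illustration. Lines starting with "-" are local edits or removed defaults, and lines starting with "+" are new defaults worth carrying over.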
Since I had originally planned on using the boot images and downloading the install files on the fly, I had to sidetrack for a bit to pull down the files for OpenBSD 3.6, 3.7, and 3.8. I already had 3.5 from when I previously looked at what was involved in upgrading. Thanks to pair.com for having a fast OpenBSD mirror. Each version upgrade required two reboots and a bunch of waiting for files to unpack. Besides that it was pretty smooth (minus the /etc merging), and I think it only took about two and a half hours to do the four upgrades.
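Going from 3.4 to 3.8 meant four hops because OpenBSD's upgrade notes describe moving from the immediately preceding release. Here is a small sketch of that chain, along with the per-release set file names, which follow the baseXY.tgz pattern. The default list of sets is an assumption for illustration. Check each release's own upgrade notes for which sets to unpack and how to treat the etc set.

```python
def upgrade_path(start: str, target: str) -> list[str]:
    """Releases to install, one at a time, to get from start to target.

    Assumes 3.x-style numbering where each release bumps the minor number
    by one and each upgrade goes from one release to the next."""
    major_s, minor_s = map(int, start.split("."))
    major_t, minor_t = map(int, target.split("."))
    if major_s != major_t or minor_t <= minor_s:
        raise ValueError("sketch only handles forward hops within one major version")
    return [f"{major_s}.{m}" for m in range(minor_s + 1, minor_t + 1)]


def set_files(release: str, sets=("base", "comp", "man", "misc", "etc")) -> list[str]:
    """Install set file names for a release, e.g. base38.tgz for 3.8.
    The default set list is an assumption, not a required selection."""
    tag = release.replace(".", "")
    return [f"{s}{tag}.tgz" for s in sets]
```

Looping over `upgrade_path("3.4", "3.8")` and printing `set_files(release)` for each hop gives a checklist of what to download before starting.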
At this point the machine was in a state where I could get core services like sshd up and running. Considering that the machine was reporting a system temperature of 60 degrees, I was happy to get out of the basement and back up to my normal computer. That was when I had my first scare: I couldn't connect via ssh. Back downstairs, I quickly found out that the problem was that I had uninstalled the tcsh package, so my login shell no longer existed. I grabbed the package and added it back. Now that I could actually log in remotely, it was back up to the warm part of the house.
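The tcsh scare suggests a quick check to run before deleting packages: list any account whose login shell path doesn't exist. Here is a minimal sketch that parses /etc/passwd-format text, where the shell is the last colon-separated field. The function name is hypothetical, and the `exists` parameter is there so the check can be tested without touching the real filesystem.

```python
import os
from typing import Callable


def users_without_shell(
    passwd_text: str, exists: Callable[[str], bool] = os.path.exists
) -> list[tuple[str, str]]:
    """Return (user, shell) pairs whose login shell path does not exist.

    passwd_text is in /etc/passwd format; the shell is the last field.
    Blank lines, comments, and entries with an empty shell field are skipped."""
    missing = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        user, shell = fields[0], fields[-1]
        if shell and not exists(shell):
            missing.append((user, shell))
    return missing
```

On a live system, `users_without_shell(open("/etc/passwd").read())` after a package removal would have flagged the missing tcsh before I walked back upstairs.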
Since the versions of many of the packages I previously had installed had changed, and I like having a local copy, I spent the next hour just downloading updated packages. Some of the dependencies had changed (yes, I should use the automatic dependency handling, but I'm still leery of it for some reason), which meant downloading additional packages to complete the install of the ones I already had. That ended up taking another couple of hours. At this point I had all of the software I was supposed to have and just needed to make sure everything still worked.
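Doing dependencies by hand amounts to a topological sort: every package has to be installed after the packages it depends on. Here is a minimal sketch using Python's graphlib, assuming you have already written down each package's direct dependencies. The package names in the test are hypothetical, and this is not how the OpenBSD package tools resolve dependencies internally.

```python
from graphlib import TopologicalSorter


def install_order(deps: dict[str, set[str]]) -> list[str]:
    """Order packages so each one comes after everything it depends on.

    deps maps a package name to the set of packages it directly needs.
    Raises graphlib.CycleError if the dependencies form a cycle."""
    return list(TopologicalSorter(deps).static_order())


def to_fetch(deps: dict[str, set[str]], installed: set[str]) -> list[str]:
    """Packages still missing, in an order that is safe to install."""
    return [pkg for pkg in install_order(deps) if pkg not in installed]
```

The same graph also tells you which extra packages to download: anything that shows up in `to_fetch` but not in your original list.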
Happily, most things just did. Jumping all the way from 3.4 to 3.8 did produce these problems:
- /usr/local/bin/safe_mysqld became /usr/local/bin/mysqld_safe
- ntpdate handling in /etc/rc.local changed
- /usr/lib/apache/modules/libphp4.so changed to /usr/local/lib/php/libphp4.so
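Two of the problems above are plain path renames, so they can be found by scanning the files that reference them, such as /etc/rc.local and the Apache configuration. Here is a minimal sketch with the renames from the list hard-coded. The ntpdate change isn't a simple rename and isn't covered. The function name and sample text are hypothetical.

```python
# Old path -> new path, taken from the 3.4 -> 3.8 problems listed above.
RENAMED = {
    "/usr/local/bin/safe_mysqld": "/usr/local/bin/mysqld_safe",
    "/usr/lib/apache/modules/libphp4.so": "/usr/local/lib/php/libphp4.so",
}


def stale_references(
    text: str, renamed: dict[str, str] = RENAMED
) -> list[tuple[int, str, str]]:
    """Return (line_number, old_path, new_path) for every old path in text.
    Line numbers start at 1 so they match what an editor shows."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for old, new in renamed.items():
            if old in line:
                hits.append((lineno, old, new))
    return hits
```

Running it over the contents of each startup and config file gives a short to-do list instead of finding the breakage one failed service at a time.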
The next big problem was this nasty error message when trying to send email to a Mailman-controlled list:
"/usr/local/lib/mailman/mail/mailman post test". Command output: Group
mismatch error. Mailman expected the mail wrapper script to be executed as
group "_mailman", but the system's mail server executed the mail script as
group "nobody". Try tweaking the mail server to run the script as group
"_mailman", or re-run configure, providing the command line option
I double-checked that I had grabbed the correct packages; in this case it was mailman-2.1.6p1-postfix.tgz. Searching didn't turn up anything of interest. The only thing I ran across was this comment in /usr/local/share/doc/mailman/README.OpenBSD:
Problem: I use Postfix for my MTA and the mail wrapper programs
are logging complaints about the wrong GID.
Solution: Install mailman with the following command:
% FLAVOR=postfix make install
I pulled down the ports tree and could see that the mailman Makefile sets up "--with-mail-gid=nobody" when the flavor is postfix. Since there wasn't a non-postfix mailman-2.1.6p1 package, I decided to pull down the 2.1.6p0 package. That installed and ran fine. I now need to look into what changed going from p0 to p1 and also figure out why the gid stuff changed. Maybe I missed a change to /etc along the way?
Update: It turns out I'm not completely following the standard install, probably because I upgraded instead of doing a fresh install. My mailman aliases were merged in with my regular aliases. Since Postfix looks at the ownership of the aliases file to determine how the wrapper script gets run, this was causing the problem. I must have missed where the docs spelled this out. Anyway, I moved my mailman aliases into a separate file, followed the setup instructions, and everything is now working with the latest mailman package.
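Here is a simplified model of why merging the aliases broke things. For a "|command" alias, Postfix's local delivery agent runs the command with the privileges of the alias database's owner. If root owns that file, it falls back to the default_privs user, which defaults to nobody. The Mailman wrapper then refuses to run unless its group matches the group it was built for. The function names and the primary-group table below are hypothetical, and the wrapper check is a stand-in for Mailman's real one.

```python
def delivery_identity(
    alias_owner: str, primary_group: dict[str, str], default_privs: str = "nobody"
) -> tuple[str, str]:
    """(user, group) used to run a command from an alias file, per the model:
    the file's owner, or default_privs when root owns the file. The group is
    that user's primary group, looked up in primary_group."""
    user = default_privs if alias_owner == "root" else alias_owner
    return user, primary_group[user]


def wrapper_accepts(group: str, expected_group: str = "_mailman") -> bool:
    """Stand-in for the Mailman wrapper's group check."""
    return group == expected_group


if __name__ == "__main__":
    # Hypothetical primary-group table; a real system gets this from /etc/passwd.
    groups = {"nobody": "nobody", "_mailman": "_mailman"}
    for label, owner in [("merged into root-owned aliases", "root"),
                         ("separate _mailman-owned aliases", "_mailman")]:
        user, group = delivery_identity(owner, groups)
        print(f"{label}: runs as {user}:{group}, accepted={wrapper_accepts(group)}")
```

The first case reproduces the "group nobody" in the error message. The second is what a separate, mailman-owned alias file gives you.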
With mailman now working, I had one last issue: a custom-compiled httpd was failing to start. A configtest against it said everything was okay. There were no error messages when starting it or in the logs, yet nothing was running. Finally, when I tried to run it in single-process mode, it core dumped. At this point I figured that some of the libraries had probably drifted far enough out of sync that it just needed to be recompiled. Since it also needed SSL added, now was as good a time as any.
After pulling down apache, mod_ssl, and mod_perl, I went through the standard combined build: patch apache with mod_ssl, then build it all from mod_perl. No dice. Trying to build it shared gave me unresolved symbols in mod_perl, and the resulting httpd also core dumped. I ended up having to do a mod_ssl and apache build first and then an APXS build of mod_perl. Weird stuff. One thing I did run across was the need to set SSL_BASE when building. In a Bourne-style shell such as bash, something like "SSL_BASE=/usr ./configure (...)" should do the trick.
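The `SSL_BASE=/usr ./configure` prefix sets the variable only for that one command, not for the rest of the session. If the build is driven from a script, the equivalent is to pass a modified copy of the environment to that single child process. Here is a minimal sketch, with a hypothetical helper name. The demo runs a child Python process instead of a real ./configure, whose options are left out here as in the original.

```python
import os
import subprocess
import sys


def run_with_env(cmd: list[str], **extra: str) -> subprocess.CompletedProcess:
    """Run cmd with extra environment variables set for that child only,
    the equivalent of the shell's `VAR=value command` prefix."""
    env = dict(os.environ)  # copy, so the parent's environment is untouched
    env.update(extra)
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in for ./configure: a child process that reports what it sees.
    result = run_with_env(
        [sys.executable, "-c", "import os; print(os.environ['SSL_BASE'])"],
        SSL_BASE="/usr",
    )
    print(result.stdout.strip())
```

For the real build, the call would be `run_with_env(["./configure", ...], SSL_BASE="/usr")` with whatever configure options the build needs filled in.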
Needless to say, the upgrade followed the usual 80/20 rule: what I thought was going to be 80% of the pain ended up taking only 20% of the time, while the last 20% took 80% of the time. The end result is that the system is upgraded and everything appears to be working fine. Yay!