March 11, 2007

NeoPhi Upgrade 2

Most of this weekend was spent upgrading NeoPhi to OpenBSD 4.0. Since I was going to patch my machine for the timezone fix anyway, I figured I might as well just patch everything. As last time, I took the less recommended approach of upgrading without install media. It really isn't that bad. While I did learn a few tricks last time, I still managed to make some new mistakes...

First, I forgot one of the userland packages. At some point I had apparently added xbase to the set of userland packages I had installed. Luckily the problem only showed up when one of the optional packages failed to upgrade due to a missing dependency. I found a post that addressed the issue, installed both revisions of it, and things started working again. I guess I just need to keep better track of what I have installed.

I again had postfix upgrade issues. Thankfully this time I had shut off incoming requests at the firewall, so no messages were put into limbo. The problem is that the preferred mail server gets reset during the upgrade of either the userland packages or the optional packages. As a result, you need to rerun the postfix-enable script.

Mailman didn't upgrade properly, and I also forgot to test it after the upgrade, so some messages bounced. I finally tracked it down to a combination of a configuration issue and a permission/ownership issue, which means I should be fine for future upgrades. I added an update to my previous upgrade post, since that post includes the error message and other details.

I have to say the new package update scheme rocks. Using both PKG_PATH and PKG_CACHE, I was able to replicate what I had previously done manually. One problem is that the pair.com OpenBSD mirror doesn't play well with pkg_add; I had to switch to another mirror to actually get it to work. I thought I was being clever by teeing the output of the pkg_add command. Alas, it uses some funky screen redraws, and my output file contained almost no useful information. This is kind of bad, since some information scrolls by that you will miss unless you are paying attention. Next time I'll have to look at other pkg_add switches that might change its behavior. Probably another reason to do a fresh install.

Once everything was updated, I applied the most recent patches from the errata list. For some reason, though, I didn't think to reboot. As a result, running applications didn't pick up the new timezone information, which was the main reason I went through the entire exercise to begin with...

During the course of the many reboots I also checked in on my RAID disks. It turns out they have probably never been doing the right thing! While OpenBSD does support 3Ware, the drivers don't do everything a 3Ware setup really needs. There is a funky RAID array rebuild operation (which my disks are currently undergoing) that has to be initiated by the driver, and 3Ware has never supplied the OpenBSD developers with the documentation needed to add that support. So, should your machine ever experience a power outage, you'll need to reboot into some other OS that 3Ware fully supports in order to get your RAID array working again. Ludicrous!

Tags: neophi openbsd

February 28, 2006

NeoPhi Upgrade

I spent most of today upgrading NeoPhi from OpenBSD 3.4 to OpenBSD 3.8. When the box was in the colocation facility I wasn't too keen on attempting a remote upgrade, and based on today's experiences that turned out to be a good call. The first problem I encountered was that my server doesn't have a cdrom drive. It's only a 1U rack unit, and the cdrom space is taken up by a second hard drive and a RAID-1 controller.

I popped the case, hooked up a cdrom drive, and attempted to boot off the OpenBSD 3.5 cdrom. The first time, the boot hung. On every subsequent try it went into a reboot loop: it would start to boot from the cdrom and then reboot the machine. I also tried the 3.6 cdrom without any luck. At that point I switched to the less recommended in-place upgrade. After booting into single-user mode, I removed all of the packages I had previously installed, since one of the upgrades I was planning changed gcc versions, which was sure to introduce some incompatibilities. After removing about 60 packages I started the upgrade process.

I was very pleased with the instructions. Despite all of the warnings they give you, the process is straightforward. Really the only tricky part is merging changes to /etc, which isn't that bad if you haven't done much customization. I've tried to follow good practice by keeping my changes in .local files or linked somewhere else entirely. As a result, I only had a few changes in /etc that I had to fix manually.

Since I was planning on using the boot images and downloading the install files on the fly, I had to sidetrack for a bit to pull down the files for OpenBSD 3.6, 3.7, and 3.8. I already had 3.5 from when I had previously looked at what upgrading involved. Thanks to pair.com for having a fast OpenBSD mirror. Each version upgrade required two reboots and a bunch of waiting for files to unpack. Besides that it was pretty smooth (minus the /etc merging), and I think it only took about two and a half hours to do the four upgrades.

At this point the machine was in a state where I could get core services like sshd up and running. Considering that my machine was reporting a system temperature of 60 degrees, I was happy to get out of the basement and back up to my normal computer. That was when I had my first scare: I couldn't connect via ssh. Back downstairs, I quickly found out that the problem was that I had uninstalled the tcsh package, so I had no shell. I grabbed the package and added it. Now that I could actually log in remotely, it was back up to the warm part of the house.

Since the versions had changed on many of the packages I previously had installed, and I like having a local copy, I spent the next hour just downloading updated packages. Some of the dependencies had changed (yes, I should use the automatic dependency handler, but I'm still leery of those for some reason), which meant downloading additional packages to complete the install of the ones I already had. That ended up taking another couple of hours. At this point I had all of the software I was supposed to have and just needed to make sure everything still worked.

Happily most things just did. Jumping all the way from 3.4 to 3.8 did produce these problems:

  • /usr/local/bin/safe_mysqld became /usr/local/bin/mysqld_safe
  • ntpdate handling in /etc/rc.local changed
  • /usr/lib/apache/modules/libphp4.so changed to /usr/local/lib/php/libphp4.so
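When jumping several releases at once, renamed paths like the ones above are easy to miss in startup scripts and config files. The Python sketch below is a hypothetical helper, not an OpenBSD tool: it scans files for the old paths and reports the new location, without rewriting anything. The mapping comes straight from the list above; the default files it checks (/etc/rc.local and /var/www/conf/httpd.conf) are assumptions, so pass your own as arguments. The ntpdate change isn't a simple rename, so it isn't covered.

```python
import sys

# Old -> new locations, taken from the 3.4 -> 3.8 list above.
# The ntpdate change in /etc/rc.local is not a plain rename, so it is not checked.
RENAMED_PATHS = {
    "/usr/local/bin/safe_mysqld": "/usr/local/bin/mysqld_safe",
    "/usr/lib/apache/modules/libphp4.so": "/usr/local/lib/php/libphp4.so",
}


def find_stale_refs(lines):
    """Return (line_number, old_path, new_path) for each old path still referenced."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for old, new in RENAMED_PATHS.items():
            if old in line:
                hits.append((lineno, old, new))
    return hits


def scan_file(path):
    """Scan one file on disk; report only, never rewrite."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return find_stale_refs(f)


if __name__ == "__main__":
    # Hypothetical default file list; pass the files you actually use as arguments.
    files = sys.argv[1:] or ["/etc/rc.local", "/var/www/conf/httpd.conf"]
    for name in files:
        try:
            for lineno, old, new in scan_file(name):
                print(f"{name}:{lineno}: {old} -> {new}")
        except OSError as exc:
            print(f"{name}: {exc}")
```

Reporting rather than editing in place keeps the actual /etc merge decisions in your hands, which matters when some of those files carry local customizations.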

The next big problem I had was this nasty error message trying to send email to a mailman controlled list:

"/usr/local/lib/mailman/mail/mailman post test". Command output: Group mismatch error. Mailman expected the mail wrapper script to be executed as group "_mailman", but the system's mail server executed the mail script as group "nobody". Try tweaking the mail server to run the script as group "_mailman", or re-run configure, providing the command line option `--with-mail-gid=nobody'.

I double-checked that I had grabbed the correct package, in this case mailman-2.1.6p1-postfix.tgz. Searching didn't turn up anything of interest. The only thing I ran across was a comment in /usr/local/share/doc/mailman/README.OpenBSD:

Problem: I use Postfix for my MTA and the mail wrapper programs are logging complaints about the wrong GID.
Solution: Install mailman with the following command:
% FLAVOR=postfix make install

I pulled down the ports package and can see that in the mailman Makefile, if the flavor is postfix, it sets up "--with-mail-gid=nobody". Since there wasn't a non-postfix mailman-2.1.6p1 package, I decided to pull down the 2.1.6p0 package. That installed and ran fine. I now need to look into what changed going from p0 to p1 and also figure out why the gid stuff changed. Maybe I missed a mod to /etc along the way?

Update: It turns out I wasn't completely following the standard install, probably because I upgraded instead of doing a fresh install. My mailman aliases had been merged in with my regular aliases. Since postfix looks at the owner/group of the aliases file to determine how the wrapper script gets run, this was causing the problem. I must have missed where the docs spelled this out. Anyway, I put my mailman aliases in a separate file, followed the setup instructions, and everything is working with the latest mailman package.
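Since postfix picks how the wrapper runs from who owns the aliases file, a quick ownership check can catch this before any mail bounces. This is a minimal Python sketch, not part of mailman or postfix: it prints the owner and group of each file given and flags any whose group isn't the one mailman expects. The expected group "_mailman" comes from the error message above; the default path /usr/local/lib/mailman/data/aliases is a hypothetical location inferred from the wrapper path in that message, and you may also want to check the compiled .db file next to it.

```python
import grp
import os
import pwd
import sys

# Group the mailman wrapper expects, taken from the "Group mismatch" error above.
EXPECTED_GROUP = "_mailman"


def owner_and_group(path):
    """Return (user, group) names for path, falling back to numeric ids."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return user, group


def check_alias_file(path, expected_group=EXPECTED_GROUP):
    """Print the ownership of an aliases file; return True if its group matches."""
    user, group = owner_and_group(path)
    ok = group == expected_group
    print(f"{path}: owner={user} group={group} [{'ok' if ok else 'MISMATCH'}]")
    return ok


if __name__ == "__main__":
    # Hypothetical default location; pass the real aliases file(s) as arguments.
    paths = sys.argv[1:] or ["/usr/local/lib/mailman/data/aliases"]
    for p in paths:
        if os.path.exists(p):
            check_alias_file(p)
        else:
            print(f"{p}: not found")
```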

With mailman now working, I had one last issue: a custom-compiled httpd was failing to start. A configtest against it said everything was okay. There were no error messages when trying to start it up or in the logs, yet nothing was running. Finally, when I tried to run it in single-threaded mode, it core dumped. At that point I figured some of the libraries had probably gotten far enough out of sync that it just needed to be recompiled. Since it also needed SSL added, now was as good a time as any.

After pulling down apache, mod_ssl, and mod_perl, I went through the standard combined build: patch apache with mod_ssl, then build it all from mod_perl. No dice. Trying to build it shared gave me unresolved symbols in mod_perl, and the resulting httpd also core dumped. I ended up having to do a mod_ssl and apache build and then an APXS build of mod_perl. Weird stuff. One thing I did run across was the need to set SSL_BASE when building. In a bash shell, something like "SSL_BASE=/usr ./configure (...)" should do the trick.

Needless to say, the upgrade followed the usual 80/20 rule. What I thought was going to be 80% of the pain ended up taking only 20% of the time, while the last 20% took 80% of the time. The end result is that the system is upgraded and everything appears to be working fine. Yay!

Tags: neophi openbsd