On Fedora updates

March 14th, 2010

There’s been a lot of talk recently about Fedora’s policy (or lack thereof) for shipping updates to existing stable releases. Rather than keep repeating the same points on the mailing lists ad nauseam, let me give my own $0.02 here. Keep in mind that these are my own personal opinions, and nobody else’s.

First off, I know in using Fedora that I am not using an “Enterprise” distribution that is intended to remain rock solid and stable for a long time without substantive changes. I’m ok with having to upgrade every 6 or 12 months, and I’m willing to deal with fixing breakage when that happens (though obviously in a perfect world the upgrade would be entirely seamless). What I am not ok with is updates shipping that cause any breakage or behavioral changes to my perfectly working system when I have not asked to perform a major upgrade. I expect that, if I update my laptop system ten minutes before a meeting, then it will still work exactly as it did before. I don’t want to have to delay doing any updates – as I do now – for fear of the result.

Since time doesn’t stand still following a release, and bugs and regressions are found – and security issues are raised – a flow of updates to a “stable” release is both necessary and healthy to any distribution. But updates should be just that: updates. And also “necessary”. To me, an “update” is not shipping a major version bump on an existing piece of software, or replacing an entire stack (complete with all manner of behavioral changes – no matter how “small”: it does matter if that menu item moves around mid-release) – after the release. That’s called a new release. Or rawhide. Or whatever. The point is that a release has to have some kind of meaning for it to even be worth having a release. Otherwise, you may as well just call it “Forever Rawhide”.

Now I’m not saying there can’t be flexibility. For example, I don’t personally care at all about KDE update frequency. I’m sure the people who work on it (many of whom I have met over the years) are very nice people, and I know they do good work. But I don’t use KDE (other than a few specific pieces of software, such as k3b), and I haven’t for years. So if KDE is updated ten times a day, I’m not going to even notice. I’d rather, for the sake of the users, have a consistent policy, but perhaps that stack could be excluded since its maintainers are quite vocal about being able to make a lot of updates. I would prefer that an exemption were made for their specific stack rather than have the rest of the distribution need to follow the same rolling update trend.

What I am going to notice, however, is if any of the critical path components that I rely upon is broken, has a behavioral change, or is needlessly updated way too often to make any real sense. Needless includes pulling in some minor upstream bits that aren’t materially warranted by actual or likely bug reports. Those things are best done in rawhide where they belong, and where those who are more than willing to test as they go will happily help shake out issues. I myself run rawhide also, but on dedicated real or virtual machines that are only for testing and not intended to be used or relied upon for daily work. Even in the case of rawhide, I think things should be at least reasonably tested on a standalone system (more than just compile tested) before pushing if they stand a chance of breaking something fundamental in the distribution.

Think about it this way. The Fedora development cycle is about 6 months. If you are a user and really, badly need some major new feature, you might have to wait an average of 3 months. Even if that’s hardware enablement that makes the distribution otherwise inaccessible to you, I would rather that you have to wait 3 months for the new version (during which time you are free to try the pre-releases, alphas, betas, etc.) than ship an intrusive update that may negatively affect other users who already have working systems that can already make full use of what they have available. It’s simply not worth inconveniencing existing users of a stable release for the possible benefit of those who are not already using it and can wait until the next time.

Anyway. I think Fedora needs an update policy, and it needs a strong one. If you know me, you know I am far from a conservative guy, but I do think that stable Fedora updates should have a fairly conservative update policy for at least all critical path components. Those should never be updated unless necessary to fix specific bugs, and only in a fashion not likely to cause regressions for other users not affected by those bugs, or who rely upon specific behavior not to change (i.e. not a whole major version bump). The components not in the critical path can have more wiggle room if necessary, but I would still like to see far fewer updates in the stable releases.

Jon.

Remote kgdb target debugging via the Cyclades TS-3000 Terminal Server

February 12th, 2010

So I’ve been poking at Jason Wessel’s kgdb patches recently (specifically, the ones in kgdb-next – you do believe in kernel debuggers, right? Good). They came in very handy when trying to track down an obscure netfilter brokenness last week that was causing Fedora kernels to fall over reproducibly when running KVM. That particular issue was caused by libvirt’s namespace code that attempts to create additional network namespaces on startup, just to see if it’s possible (for optional containers support). After a very long weekend, I pointed out a number of bugs that got fixed. But it got me thinking about kgdb and being able to easily debug stuff that rolls over and plays dead.

Traditionally, I have used a (somewhat loud, and sometimes therefore unfortunately annoying) PC attached to my debugging target via a serial crossover cable. Actually, it’s the inverse of the usual setup in which said other PC is intended to be the target of experimental test kernels, with my desktop generally not being anticipated to fall over with kernel bugs (as it has been doing increasingly of late). In any case, it’s not optimal to leave that PC running and I prefer it being used for evil test experiments. An opportunity to buy random crap on eBay presented itself in the form of an awesome Cyclades (now some other random company) terminal server. I bought a TS-3000 for $115, which is less than a tenth of what they used to go for retail. 48 ports of serial terminal server goodness for the home.

Photo: My Cyclades TS-3000 sitting atop an APC Masterswitch Plus

I was never very good at waiting for Santa. I was tracking this damned thing several times a day for the two days it was in transit. And when it arrived – shock! – it might not have the latest firmware! Quick! Time to fix that. I hadn’t even used it in anger before I managed to brick the thing with an update not intended for this model. Cursing myself, I figured I would just rescue it via TFTP. But that requires a special console cable (not quite the same as some others) in order to interrupt the standard boot. Obviously I had none of these cables, and all of the ones here were useless. And I wasn’t prepared to wait ten minutes to order another one. So I went to Microcenter, and bought two generic RJ45-DB9 converters that you can click together to wire yourself.

I followed a diagram online to make the RJ45-DB9 cable for the Cyclades – twice. But all of the posted diagrams were incorrect (this is nothing like a Cisco cable, even if you’re a moron and think that it is when you incorrectly make a website with the wrong pinouts, especially if you’re Cyclades and write a manual with the wrong information contained within it…thanks a bunch!). Not to be discouraged, the soldering iron came out, and I rummaged around in a box of parts to find some serial connectors. Fortunately, I had a female DB9 and plenty of old crappyish network cables. I soldered, desoldered, and resoldered this thing about 4 times before finding the correct Cyclades console cable pinout (ADB0036 female DB9) (repeated below, for the benefit of others who read this). Finally, I reflashed the unit with the same firmware it had had when it arrived (zImage_ts_140-3.bin) – the “new” firmware was only for specific other units, of which mine was not one; there is a newer “GPL” kit I will poke at sometime – and booted it up.

Photo: A homebrew Cyclades ADB0036 Cable

RJ45 pin    DB9 pin
1           8 (CTS)
2           1 (DCD) and 6 (DSR)
3           2 (RD)
4           5 (SGND)
5           7 (RTS)
6           3 (TD)
7           4 (DTR)
8           4 (DTR)

Figure: The correct pinout for a Cyclades ADB0036 console cable (RJ45 to Female DB9 connector)

Cyclades made good (fanless) hardware, but they were hardly the most adept at making configuration straightforward. Sure, you can configure the network easily (this one is called “morse” after the inventor – in the US – of the coding used for telegraphs, which are an ancient precursor to the RS232 standard used on modern serial ports), but when it comes to the port setup…what you want to know is that you’re looking for the “Socket SSH” option, set to increment (e.g. from “1” – no need to use the “7001” example, you’re not directly sshing into the port anyway, as with telnet), and based upon a simple “CAS profile” with local authentication (make sure you add a new “system” user for those SSH logins), unless you want to use RADIUS (I have home KRB5, but haven’t deployed RADIUS at the moment). Always make sure you “Run Configuration” before flashing – it seems the former writes to the actual config files that the latter will use, so you cannot necessarily flash and then “Run Configuration” that way around, depending upon the particular operation you are performing.

Once you have the terminal server running, you can talk to it:

$ ssh user_name:port_number@terminal_server.address

More importantly perhaps, you can use the gdb remote target:

(gdb) target remote | ssh -t -t user_name:port_number@terminal_server.address

Remember to tell ssh not to ruin the day (fail to allocate a pty for your friendly conversation) by specifying the “-t -t”, then you can talk to Jason’s kgdb stub.
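Since the target string is easy to mistype, here is a minimal sketch of a helper that builds it. The user name "jcm" and the host "morse.example.com" are placeholders I made up for the example, not values from a real setup.

```sh
#!/bin/sh
# Sketch: build the Cyclades "user:port@host" SSH destination and the
# matching gdb command. All names used here are placeholders.

# Print the SSH destination for a user ($1), serial port ($2) and host ($3).
ts_target() {
    printf '%s:%s@%s\n' "$1" "$2" "$3"
}

# Print the gdb command that attaches to kgdb through that port.
# "-t -t" forces ssh to allocate a pty even without a local terminal.
kgdb_remote_cmd() {
    printf 'target remote | ssh -t -t %s\n' "$(ts_target "$1" "$2" "$3")"
}

kgdb_remote_cmd jcm 1 morse.example.com
```

One way to use it would be to hand the result to gdb at startup, along the lines of gdb ./vmlinux -ex "$(kgdb_remote_cmd jcm 1 morse.example.com)".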

Next steps? I need to make some more of these damned ADB0036 cables (or find some more on eBay – anyone want some useless Cisco cables I bought thinking they were the same?) and hook them up to all of my systems at home. They will then constantly log via the awesomeness of GNU screen to a remote VM, and I can jump in if something rolls over and catch it so I won’t miss panic/debug opportunities.
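As a rough sketch of that logging plan (I have not deployed it yet), something like the following could start one detached, logging screen session per port. TS_USER, TS_HOST and the port list are hypothetical placeholders; the function only prints the commands so they can be inspected before being run.

```sh
#!/bin/sh
# Sketch: one detached, logging screen session per terminal-server port.
# TS_USER, TS_HOST and the port numbers are hypothetical placeholders.
TS_USER="jcm"
TS_HOST="morse.example.com"

# Print the screen command for one port; pipe the output to "sh" to run it.
# -L turns on output logging; -dmS starts a detached, named session.
console_log_cmd() {
    port=$1
    printf 'screen -L -dmS console-%s ssh -t -t %s:%s@%s\n' \
        "$port" "$TS_USER" "$port" "$TS_HOST"
}

for port in 1 2 3; do
    console_log_cmd "$port"
done
```

By default screen writes each log to a screenlog.N file in the directory it was started from, so running this from a dedicated directory on the logging VM keeps things tidy.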

Kernel debuggers FTW.

Jon.

(Not) flying with American Airlines…

December 21st, 2009

So we’re flying to the UK for the holidays, and had booked with American as Virgin’s prices had gone up a little too much at the last minute. We left Cambridge on Saturday afternoon and got to the airport in time for our flight down to New York, from where we would fly out to London. There was some bad weather coming, so they were keen to leave promptly.

We left the gate at Logan, began taxiing, and then returned to the gate as we were “overweight” (the plane was half full) and needed to shed some weight “due to the bad weather” (translation: we don’t know how we’re going to get there so just ignore the fuel truck outside discreetly putting more fuel into the plane). We did take off not much later than we were supposed to, and then played a game of Turbulence!(TM) as we flew towards a nor’easter storm coming up the Eastern seaboard. Arriving in New York before much of the bad weather hit, we waited around for a couple of hours for a flight. So far American’s staff had behaved reasonably, and had even understood the words “courteous” and “professional”.

The next flight was the fun part. We were supposed to leave at 19:10 for London. But the huge frigging snow storm getting going outside, tending toward blizzard conditions, had other ideas. We almost left the gate, but then sat (on the plane) at the gate for a total of around 5 hours while various airport officials closed down parts of the airport and the situation went from bad to worse. Airlines can’t do anything about the weather, but American also offered the bonus of the most surly, sassy New York flight crew you’re likely to meet.

Not only did this flight crew not care (being rude from the moment I got on board, including a look of obvious disdain when I asked after the veggie meal they had forgotten to provide yet again) but they regularly failed to make any announcements as to the flight’s progress (or lack thereof) leaving us to use various phone “apps” to know what was happening. Every 30 minutes or so, various people would get text alerts about another delay, culminating in notifications of a canceled flight about an hour before they decided to officially tell us. Meanwhile, we’re sat on the plane. The cabin crew did decide to feed us eventually (for which I am grateful), although they of course had no veggie options and weren’t willing to find any alternative to a couple of pieces of bread and a dessert.

After the flight was officially canceled a bit before midnight, we got off the plane and headed back into the terminal, where a total of one person from American Airlines met us and held up a handwritten sign with an “800” number on it. They took some enjoyment in not being able to offer anything other than this, saying we should call or stand in line in the ticket hall. We decided, like most others, to do both of these things, and joined a line of literally a thousand people in the ticket hall of JFK terminal 8, to speak with the one (or perhaps two) members of AA staff able to help. I felt sorry for some of the foreign travelers who didn’t really understand that AA has no legal liability under bad weather to provide you with any level of service at all, and that they particularly enjoy rubbing this in your face as much as possible when it happens. They also had no idea what to do about the large numbers of people who had handed in I94 forms, or the person whose visa was expiring (or had expired) and who was en route to an embassy. Bad weather happens, and it’s nobody’s fault, but the worst possible customer service from bitter airline staff isn’t a requirement.

A little over an hour after making my second call to American Airlines (all the while standing in the same line) I got through to a human being. I was told there were no flights to London we could rebook on until about 4 days later, although I could fly on a flight from Boston in the morning. We met some other travelers in the same situation, and for a while were thinking of renting an SUV and driving all night to get back to Boston for the only flight. Even the one or two AA people wandering around the airport (who could not help, but seemed to be mostly surveying the little people) said this was a suicidal idea and that we should call back and rebook on the flight 4 days later. I waited on the phone for 2 more hours (while still standing in the same line, which had gotten a little shorter, but which was now waiting for a closed AA counter that wouldn’t re-open for several more hours) to speak with another human being about my options.

As we stood in line with thousands of others, it seemed apparent that either JFK or AA – or both perhaps – had little in the way of any real emergency planning in place ahead of time. People were sleeping on the ticket stands, on the baggage belts, anywhere they could lie down (the baggage was ten feet from the gate inside the plane but would take a number of hours to retrieve). There was very little in the way of any food available and no hot beverages. There were kids screaming, there were people watching their holiday plans become ruined around them, and all the while there were too few staff even for a regular non-busy day. Meanwhile, I was holding with American Airlines’ “800” number, listening to the music and being told I could also go online [hint to AA: when there is some giant problem and thousands are holding, why not remove the most demoralizing messages?] to get various useless information.

That’s when Andre answered. He was the only person at American Airlines who was courteous and pleasant to us in the whole time of our experience (and for which I told him I will write personally to the management at AA praising him). Not only was he sympathetic (after likely working with countless others for the past few hours), but he went and checked for real alternative options we could actually use, and used them. He found a flight from Boston on the Monday morning for both us and the British Virgin Islands family with two young children we had offered to help, and although there was no way they could fly us to Boston, there was a sense of a possibility to still make it over for the holidays. We went on a game of bag hunting (we found ours, but our travel friends didn’t manage to find one of theirs) amongst the large number of people wandering around or sleeping in the baggage hall and then figured out a plan for getting the train back up to Boston for the alternative flight.

Not only had the blizzard knocked out the airport for a while, but it had also taken out a number of the Airtrain cars in their system at JFK, resulting in a very reduced service where the doors could become stuck (and for which a staff member was carrying a WD-40-like product). We had a fun time getting to Jamaica station and had hoped to take the Long Island Railroad into Penn Station for a train to Boston, but had to change to the subway instead as the former wasn’t really operating. We got to Penn Station a couple of hours after leaving JFK, and booked tickets for the 9am train to Boston. Amtrak was canceling trains, but they did manage to run the 9am train – only an hour later than that. We got back to Boston South Station around 14:30 and headed home for the first shower and real sleep in a day. We’ll try the whole thing again in another few hours.

I don’t blame the blizzard conditions for causing havoc. Most of the passengers understood that these things happen and we can only do our best in such circumstances. But American Airlines seem to have no procedure in place to handle emergencies at JFK. They also seem to have a training program intended to suck the humanity and common decency out of their New York based staff. As much as possible, they tried at every turn to be surly and rude (or a combination) rather than being helpful. I’m sure it was a stressful situation, but they get worse than a failing grade for even basic customer service in a situation for which they should have had some contingency plan in place, and the utter contempt and rudeness of their staff towards most of their passengers is some of the most disgracefully disgusting stuff I have seen in years of travelling. I will write to their CEO in a few days, demanding an explanation.

Jon.

Cloning a Fedora rawhide virtual machine

August 8th, 2009

Setting up a clone of a Fedora rawhide virtual machine is so simple…

  • Create a new virtual machine instance
  • Stop and then copy the disk image file for the previous VM
  • Boot the new VM in single user mode
  • Edit the /etc/sysconfig/network file to change the hostname
  • Edit the /etc/sysconfig/network-scripts/ifcfg-eth0 file to change the networking
  • Do exactly the same thing in /etc/udev/rules.d/70-persistent-net.rules
  • grep through the filesystem to see where else network data is duplicated.
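The edit-and-grep steps above can be sketched roughly as follows. This is an illustration rather than the exact commands I ran: the function names are invented for the example, it only handles the hostname, and it leaves the address and MAC edits to you while listing the files that still mention the old values.

```sh
#!/bin/sh
# Sketch: fix up a cloned guest's identity under a given root directory.
# Pass "/" inside the booted clone, or a mount point for an offline image.
# Function names and all example values are made up for illustration.

# Rewrite HOSTNAME= in etc/sysconfig/network under root $1 to $2.
set_clone_hostname() {
    file="$1/etc/sysconfig/network"
    sed "s/^HOSTNAME=.*/HOSTNAME=$2/" "$file" > "$file.new" &&
        mv "$file.new" "$file"
}

# List files under etc/ that still mention any of the old values
# (old hostname, IP address, MAC address, ...) given after the root.
# Matching is case-insensitive because MAC addresses vary in case.
find_stale_identity() {
    root=$1
    shift
    for pattern in "$@"; do
        grep -rliF -- "$pattern" "$root/etc" 2>/dev/null
    done | sort -u
}
```

A typical run inside the clone might be set_clone_hostname / new.example.com followed by find_stale_identity / old.example.com 52:54:00:aa:bb:cc, then editing whatever files the second command reports.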

Notice how more and more abstraction of network configuration does not a simpler system make. At least I don’t care about sound on my virtual machines, so to avoid that fun I simply delete the sound device whenever I create a new VM. I never use NetworkManager on boxes with fixed IPs – somehow I don’t think cloning would get any easier (unless I used DHCP, which does work here but I prefer being certain the box has a fixed configuration when used for testing) with that turned on.

Jon.

Fencing your entire apartment with X10

August 3rd, 2009

So I’ve begun using X10 for some non-essential systems at home (AC, lights, etc.). I am using a firecracker and the following script, which I will put a nice little iPhone-compatible interface around:

#!/bin/bash
#
# The X10 devices are controlled via a "firecracker" on ttyS1
#
# Copyright (C) 2009 Jon Masters 
#
# Distributed under GNU General Public License version 2.
#

X10_FC_PORT="/dev/ttyS1"
X10_BR_COMMAND="/usr/bin/br --port $X10_FC_PORT"

# e.g. TARGETS=("device1" "device2"...)
TARGETS=("device1")
# e.g. MODULES=("A1" "A2"...)
MODULES=("A1")

TARGET="$1"
COMMAND="$2"

x10_set_module_state() {
        module=$1
        state=$2

        if [ "xon" == "x$state" ]
        then
                $X10_BR_COMMAND $module on
        else
                if [ "xoff" == "x$state" ]
                then
                        $X10_BR_COMMAND $module off
                fi
        fi
}

x10_set_module_state_all() {
        state=$1

        for ((i=0;i< $((${#TARGETS[@]}));i++)); do
                echo "setting module ${TARGETS[${i}]} to $state"
                x10_set_module_state ${MODULES[${i}]} $state
                echo "waiting for module ${TARGETS[${i}]} to settle"
                usleep 500000
        done
}

in_array() {
        haystack=( "$@" )
        haystack_size=( "${#haystack[@]}" )
        needle=${haystack[$((${haystack_size}-1))]}
        for ((i=0;i<$(($haystack_size-1));i++)); do
                h=${haystack[${i}]};
                [ "x$h" == "x$needle" ] && return $i
        done
        return 255
}

if [ "x$TARGET" == "x" ] || [ "x$COMMAND" == "x" ];
then
        echo "Usage: ippower <target>|all on|off"
        echo ""
        echo "TARGETS: ${TARGETS[@]}"
        echo "COMMANDS: on off"
        echo ""
        exit 1
fi

if [ "xall" == "x$TARGET" ]
then
        if [ "xon" == "x$COMMAND" ]
        then
                echo "turning on all modules"
                x10_set_module_state_all on
        elif [ "xoff" == "x$COMMAND" ]
        then
                echo "turning off all modules"
                x10_set_module_state_all off
        fi
else
        in_array "${TARGETS[@]}" "$TARGET"; item=$?

        if [ 255 -ne $item ]
        then
                MODULE="${MODULES[$item]}"
                #echo "TARGET: $TARGET"
                #echo "MODULE: $MODULE"
                if [ "xon" == "x$COMMAND" ]
                then
                        x10_set_module_state $MODULE on
                        echo "requested module $TARGET on"
                elif [ "xoff" == "x$COMMAND" ]
                then
                        x10_set_module_state $MODULE off
                        echo "requested module $TARGET off"
                fi
        else
                echo "Unknown target: $TARGET"
        fi
fi
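One quirk of the script above is that in_array reports the matching index through its exit status, which caps the list at 255 entries and reserves 255 for "not found". A minimal alternative sketch, assuming the targets and modules are kept as two space-separated lists in the same order (the names "lamp" and "fan" are invented), prints the module instead:

```sh
#!/bin/sh
# Sketch: map a target name to its X10 module without using the exit status
# as an index. The names "lamp" and "fan" are made up for illustration.
TARGETS="lamp fan"
MODULES="A1 A2"

# Print the module for target $1; return 1 if the target is unknown.
# TARGETS and MODULES must list the same number of entries, in the same order.
module_for_target() {
    target=$1
    set -- $MODULES          # positional parameters now walk the module list
    for t in $TARGETS; do
        if [ "$t" = "$target" ]; then
            printf '%s\n' "$1"
            return 0
        fi
        shift
    done
    return 1
}

module_for_target fan    # prints A2
```

The caller can then write MODULE=$(module_for_target "$TARGET") and check the return status, rather than comparing against a magic 255.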

Jon.

Remote fencing with an APC Masterswitch Plus (with an AP9606)

June 28th, 2009

Photo: APC Masterswitch Plus (with an AP9606)

As I mentioned before, I’ve been fencing most of my home/office systems (and even lights) these days. The problem is that cheaper power switches like the IP Power 9258 can be damaged quite easily. Two of mine have failed under a particular load element and I’m not saying in that case that it’s not my fault (I still like those units), but it’s clear that having something more “household name” can be a good idea. So I looked on eBay and discovered that old APC Masterswitches now often go for similar money to other more expensive kit.

I bought an 8-port Masterswitch Plus (with an AP9606) this week. Previously these went for up to $1000, but can now be had for even a tenth of that much. And they do telnet/SNMP (and ssh, if you upgrade them – not so much of a concern in this particular out-of-band configuration). I looked around for fencing scripts and obviously found the Red Hat Cluster Suite fence_apc stuff but I don’t want to install lots of stuff, and I don’t want to talk over telnet if I’ve got a private SNMP community configured and am reasonably comfortable with that. So I updated my previous script to talk to APC Masterswitch units.
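For a flavor of what talking SNMP to the unit looks like, here is a hedged sketch that only composes a Net-SNMP snmpset command. It is not my fencing script. APC_HOST, APC_COMMUNITY and OUTLET_CTL_OID are placeholders: the outlet-control OID and the integer values that mean on and off have to come from APC's PowerNet MIB for your particular unit, so I deliberately leave the OID unfilled.

```sh
#!/bin/sh
# Sketch: compose an snmpset call that switches one outlet.
# APC_HOST, APC_COMMUNITY and OUTLET_CTL_OID are placeholders. The OID must
# be the outlet-control column for your unit, taken from APC's PowerNet MIB.
APC_HOST="apc.example.com"
APC_COMMUNITY="private"
OUTLET_CTL_OID="${OUTLET_CTL_OID:-REPLACE-WITH-OID-FROM-MIB}"

# Print the command for outlet number $1 and integer state value $2.
# The meaning of the state value is defined by the MIB, not by this script.
apc_outlet_cmd() {
    printf 'snmpset -v1 -c %s %s %s.%s i %s\n' \
        "$APC_COMMUNITY" "$APC_HOST" "$OUTLET_CTL_OID" "$1" "$2"
}
```

Printing the command rather than running it makes it easy to check the OID and value by eye before anything gets powered off.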

APC Masterswitch Plus (with an AP9606) fencing script.

Jon.

ZNC awayping plugin (now with improved “antiping”)

June 22nd, 2009

Code: http://jonmasters.org/pub/util/awayping/awayping.txt

Do you constantly get harassed on IRC with “ping?” (insert no context whatsoever here)? Of course you do. And then you come back later to a bunch of “ping” and no idea what the person wanted.

For those who just bought a computer ten minutes ago (I know there are still a few people out there), here’s an example of fail:

<someone> jonmasters: ping

That is utterly useless. It results in a ping/pong/ping cycle that can go on at some length, and then probably an accompanying email cycle, and maybe worse. Multiply that by a half-dozen to a dozen different pings and you’ve wasted a fair chunk of time just to find out what someone wants – and have no ability to prioritize or even know if the issue is still an issue when you read a ping even a few minutes later. Here’s an example of non-fail:

<someone> jonmasters: some useful contextual message here?

I know many of you gave up even listening to these contextless “ping” messages years ago (because we’ve spoken about it at some length), or you don’t bother to leave anything connected to IRC if you’re not in front of it, or you just don’t care (hoping that people will learn how to use a computer and try again). But in case you still do care, I would like to share a plugin I wrote for ZNC called “awayping”. Awayping texts you (a single line), emails you (full IRC transcripts), and tweets you (by private message) when you are detached or after a configurable idle period. It’s better than simply “autoaway”.

Awayping is getting slightly more clever over time, and the new “antiping” feature enhances awayping by also politely educating those who “ping” you (by private message) that leaving a message is infinitely more helpful later than simply 5 “ping”s on the screen. It might also encourage a few people to consider that they could send you email instead.

Here’s an example “antiping” reply:

<jonmasters> *********************************************************
<jonmasters> *** This user is marked as busy. A text message just  ***
<jonmasters> *** got sent with your 'ping'. But 'ping' alone isn't ***
<jonmasters> *** useful in a text/log message. Can you let me know ***
<jonmasters> *** what your ping was about? Your reply will be sent ***
<jonmasters> *** along so I can respond appropriately upon return. ***
<jonmasters> *********************************************************

With “awayping”, you can get email or text alerts of pending “ping” messages, and encourage people to use the internet responsibly, so you don’t have to constantly check IRC and can do something more useful instead. Because, let’s face it, they’re just going to email you anyway.
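To make the idea of a "contextless ping" concrete, here is a small illustrative check. It is a sketch of the concept only and is not taken from the awayping source; the function name is_bare_ping is invented for the example.

```sh
#!/bin/sh
# Sketch: decide whether an IRC message is a contextless "ping".
# Illustrative only; not taken from the awayping source.
is_bare_ping() {
    # Lower-case the message, strip all whitespace, then strip any
    # trailing "?", "!" or "." characters.
    msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' |
        sed 's/[[:space:]]//g; s/[?!.]*$//')
    # Drop a leading "nick:" or "nick," prefix if there is one.
    msg=${msg#*[:,]}
    [ "$msg" = "ping" ]
}
```

With this, is_bare_ping "jonmasters: ping" succeeds, while is_bare_ping "jonmasters: ping, the build is broken" fails because the message carries some context.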

Jon.