So COMSTAR is this great COMmon Multiprotocol SCSI TARget subsystem in Illumos that allows you to turn the box into a true SAN array. It has interconnect support for iSCSI, FC, SRP and iSER, but for our purposes I'm just going to focus on iSCSI, since that's the one I'm most familiar with.
iSCSI is really just a method of sending SCSI commands over TCP/IP, allowing you to provide storage services to other devices on a TCP/IP network. This article isn't primarily intended to teach you all the ins and outs of iSCSI, so if you want to know more, I suggest you head over to your friendly professor Wikipedia and learn all about iSCSI.
The primary problem with COMSTAR is that its configuration is kind of, well, let's say "clumsy". The configuration store is part of the SMF service database (which is stored in SQLite), and even if we could get at it by using the svccfg(1M) command, the contents themselves are a bunch of packed nvlists and various binary blobs. This is further complicated by the fact that we can't just write out a slightly modified configuration to the SMF service store and have the kernel pick up the differences easily. What the COMSTAR administration commands actually do is tell the kernel to set up each portion of the stored configuration via specific ioctl() calls. This makes programmatic modification of only portions of the running configuration on a system very complicated.
To circumvent this, I've taken a different approach. Instead of keeping the stored COMSTAR configuration authoritative, programmatically modifying it and then hoping to get the run-time reconfiguration right via the tons of undocumented or poorly documented ioctl() interfaces, I simply ignore the stored configuration entirely. Luckily COMSTAR supports a "no persistence" option in its service configuration, so that any configuration commands issued don't actually modify the persistent configuration in the SMF configuration store. This pretty much means that any time the machine is rebooted, the COMSTAR configuration will be entirely empty and the machine won't try to do anything on its own. That's good and exactly what we want, because in the next step we're going to tell it what to do from our cluster control software. This is similar to what we do in the Heartbeat resource script for ZFS, which explicitly ignores the ZFS cache file to avoid machines auto-importing pools at boot up.
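If memory serves, the no-persistence mode is selected via an SMF property on the stmf service; the property name below (stmf/persist_method) is my recollection of the stock Illumos setting, so verify it against your distribution's stmf service before relying on it:

```shell
# ASSUMPTION: "stmf/persist_method = none" is the property that disables
# persistence on your stmf service; check with "svccfg -s stmf listprop"
svccfg -s stmf setprop stmf/persist_method = astring: none
svcadm refresh stmf
```

After a reboot, COMSTAR should then come up with a completely empty configuration, ready for the cluster software to populate.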
The next step involved writing a program that is capable of using the standard COMSTAR administration commands to set up a running state in COMSTAR to our liking. Naturally, we need to store the desired SCSI target and LU configuration some place, and what better place to choose than the ZFS pool from which we'll be exporting volumes and which we'll be migrating between the clustered machines. That's why I wrote a simple(ish) shell script called stmf-ha that can be invoked by cluster control software to reconstruct the running state of COMSTAR when we want to import the pool and tear it down when we want to export the pool.
Integrating stmf-ha into the cluster
In part 1 of this guide we've set up Heartbeat and Pacemaker to provide clustering services to our storage array. We've installed the custom ZFS resource script from stmf-ha into Heartbeat to teach our clustering software how to import & export ZFS pools, and then we've set up one or more ZFS pools to work on. For NFS this would have been enough, since the NFS configuration is stored on the pool itself and Illumos automatically restores it at pool import, but for COMSTAR we need to do this ourselves.
The stmf-ha package includes a script called zfs-helper. Copy this file into the /opt/ha/lib/ocf/lib/heartbeat/helpers directory (create it if necessary), and of course copy the stmf-ha script itself into some place where it can be invoked with a standard PATH environment variable for root (e.g. /usr/sbin) - alternatively, you can modify the STMF_HA variable in the zfs-helper script to point to where you've placed stmf-ha. The helper script is invoked by the ZFS resource script in Heartbeat to perform additional setup and teardown operations before and after pool import and export. The helper script then invokes stmf-ha after import has succeeded and just prior to export, passing it the name of the pool we're manipulating.
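Concretely, the installation boils down to something like this (run from wherever you unpacked stmf-ha; adjust /usr/sbin if you prefer a different directory on root's PATH):

```shell
# Create the helper directory Heartbeat's ZFS resource script looks in
mkdir -p /opt/ha/lib/ocf/lib/heartbeat/helpers

# Install the helper script and stmf-ha itself
cp zfs-helper /opt/ha/lib/ocf/lib/heartbeat/helpers/
cp stmf-ha /usr/sbin/

# Make sure both are executable
chmod 0755 /opt/ha/lib/ocf/lib/heartbeat/helpers/zfs-helper /usr/sbin/stmf-ha
```

Repeat this on every node in the cluster, since any node may end up hosting the pool.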
Configuring iSCSI resources in stmf-ha
So now that we've got stmf-ha installed and integrated into the clustering software, we can begin creating ZFS volumes and exporting them via iSCSI to initiators. I will assume you are familiar with general iSCSI nomenclature and the principles of how to configure iSCSI in COMSTAR.
The simplest way to start testing is to create an empty "stmf-ha.conf" file in the root of the ZFS pool. This tells stmf-ha that you want to export all of the ZFS volumes on that pool as iSCSI LUs under a default iSCSI target without any access restrictions. This is good for testing, but once you get things going, you'll probably want to lock the setup down a little bit better.
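For example, assuming a pool named "tank" (the pool name and volume size here are just placeholders), the quick-start looks like this:

```shell
# An empty config file means "export every volume on this pool under a
# default target with no access restrictions" -- fine for testing
touch /tank/stmf-ha.conf

# Create a volume to export (name and size are arbitrary examples)
zfs create -V 10G tank/testvol

# Bring up the COMSTAR running state for this pool; in normal operation
# the cluster software does this for you via the zfs-helper script
stmf-ha start tank
```

At this point the volume should be visible to initiators on the default target, which is enough to verify the plumbing end to end.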
See the manpage of stmf-ha(1M) (copy stmf-ha.1m to /usr/share/man/man1m on your machine and then type "man stmf-ha") - it explains all the special cases and methods of how to configure your pool, your target portal groups and various other access criteria. Also have a look at the sample configuration file which will help you get started fairly quickly.
Once a pool is imported, you can also make changes to both the stmf-ha configuration and to the list of exported ZFS volumes. To have the script pick up configuration changes, or e.g. when you've created a new ZFS volume you want to export, simply issue the "stmf-ha start <poolname>" command again. The stmf-ha script will re-read the configuration file, the running COMSTAR state and the pool state, and reapply things so that everything that should be exported is exported. Again, please read the manpage, there's lots of info there on what stmf-ha can do and where you'll have to nurse it a bit.
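So, again with a hypothetical pool named "tank", adding a new volume to a live configuration is just:

```shell
# Create the new volume on the already-imported pool
# (name and size are arbitrary examples)
zfs create -V 100G tank/vm-disk1

# Re-run stmf-ha on the pool; it re-reads the config and running state
# and exports the new LU without disturbing the existing exports
stmf-ha start tank
```

The same two-step pattern applies after editing /tank/stmf-ha.conf itself.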
Please keep in mind that stmf-ha and COMSTAR configuration take some time. This is especially evident when trying to fail over a pool that's under a lot of load, since offlining a heavily loaded LU takes a few seconds while we wait for I/O to the LU to cease. In most cases this shouldn't be an issue, especially if your initiators know how to handle targets that go away for a while to do some cluster fail-over (e.g. VMware will hold VM I/O for up to ~120s before declaring the datastore inaccessible), but remember to test, test, test prior to deployment in production - try pulling power cords, network links and hard drives, and killing the odd process on the box to simulate out-of-memory conditions. Ultimately there's nothing you can do to prepare yourself for every eventuality out there, but you at least want to understand and verify how the system behaves in the most common failure scenarios. In clustering, predictability is the name of the game, so when you're unsure what's going on, don't change anything.