Copyright © 2001, 2002, 2003, 2004 Douglas Gilbert
2003-08-24
Revision History

| Revision | Date | Revised by | Changes |
|---|---|---|---|
| 2.1 | 2004-08-24 | dpg | scsihosts change -> run mkinitrd, lk 2.4.21,22 |
| 2.0 | 2003-05-04 | dpg | lk2.4.20, linuxdoc->tldp, sATA and SAS, last sector on raw devs, blockdev |
| 1.9 | 2002-11-20 | dpg | convert to xml, lk2.4.19, spelling |
| 1.8 | 2002-05-05 | dpg | scsihosts comma delimiter, grub+lilo |
| 1.7 | 2002-04-27 | dpg | mkinitrd, scsi_debug, 2.4.18, more ATAPI |
| 1.6 | 2002-01-26 | dpg | ATAPI cdrom selection |
| 1.5 | 2001-12-21 | dpg | 16 byte SCSI commands, SCSI_IOCTL_GET_PCI |
| 1.4 | 2001-08-26 | dpg | spelling, dd_rescue, mkinitrd example, lk 2.4 changes, 1394. |
| 1.3 | 2001-08-26 | dpg | ATAPI CDROM section, alter title, U320, iSCSI. |
| 1.2 | 2001-03-25 | dpg | Information about scu, dt, "Alt" sequences, more notes. |
| 1.1 | 2001-01-22 | dpg | Add osst description, _EXTRA_DEVS limitations. |
This document describes the SCSI subsystem as the Linux kernel enters the 2.4 production series. An external view of the SCSI subsystem is the main theme. Material is included to help the system administration of the Linux SCSI subsystem. There are also brief descriptions of ioctl()s and interfaces that may be relevant to those writing applications that use this subsystem.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.
For an online copy of the license see www.fsf.org/copyleft/fdl.html.
Internal data structures and design issues are not addressed here [see reference W2]. To unclutter the presentation, compile options and system calls (including ioctl()s) have been placed in Appendix E. Although not strictly part of the SCSI subsystem, raw devices are also described in Chapter 11.
For those who have no interest in the SCSI subsystem and just want to get their ATAPI cd writer going, see Section 9.2.4. It may also be useful to browse Chapter 2.
This document follows on from one written five years ago by Drew Eckhardt called the SCSI-HOWTO [see reference W7]. That document described the SCSI subsystem in Linux kernel 1.2 and 1.3 series. It is still available from the Linux Documentation Project [LDP, see reference W8] in its "unmaintained" section. Both documents have roughly similar structures although Drew's document has a lot of information on the adapter drivers.
This document can be found in electronic form at www.tldp.org/HOWTO/SCSI-2.4-HOWTO. The home site, which probably has the most up to date version of this document, is www.torque.net/scsi/SCSI-2.4-HOWTO (the multi-page html version). At that location this document is rendered in txt, pdf, ps and single (long) page html as well as multi-page html. For example, a pdf version is at www.torque.net/scsi/SCSI-2.4-HOWTO.pdf.
This document was last altered on 24th August 2004.
The SCSI subsystem has a 3 level architecture with the "upper" level being closest to the user/kernel interface, while the "lower" level is closest to the hardware. The upper level drivers are commonly known by a terse two letter abbreviation (e.g. "sd" for SCSI disk driver). The names of the corresponding module drivers, which for historical reasons sometimes differ from the built-in driver names, are shown in braces in the following diagram.

[Figure: The 3 level driver architecture of the SCSI subsystem.]
The upper level supports the user-kernel interface. In the case of sd and sr this is a block device interface while for st and sg this is a character device interface. Any operation using the SCSI subsystem (e.g. reading a sector from a disk) involves one driver at each of the 3 levels (e.g. sd, SCSI mid level and aic7xxx drivers).
As can be seen from the diagram, the SCSI mid level is common to all operations. The SCSI mid level defines internal interfaces and provides common services to the upper and lower level drivers. Ioctls provided by the mid level are available to the file descriptors belonging to any of the 4 upper level drivers.
The most common operation on a block device is to "mount" a file system. For a sd device typically a partition is mounted (e.g. mount -t ext2 /dev/sda6 /home). For a sr device usually the whole device is mounted (e.g. mount -t iso9660 /dev/sr0 /mnt/cdrom). The dd command can be used to read from or write to block devices. In this case the block size argument ("bs") needs to be set to the block size of the device (e.g. 512 bytes for most disks) or an integral multiple of that device block size (e.g. 8192 bytes). A recent addition to the block subsystem allows a device (or partition) to be mounted more than once, at different mount points.
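A program that drives a block device directly can apply the same block size rule before issuing reads or writes. The following C sketch is one way to do that, assuming the BLKSSZGET ioctl from <linux/fs.h> is available to report a device's logical sector size. The helper names bs_is_valid() and get_dev_block_size() are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKSSZGET */

/* Nonzero if bs is a positive integral multiple of the device block size
 * (e.g. 8192 is valid for a 512 byte sector disk, 1000 is not). */
int bs_is_valid(size_t bs, size_t dev_block_size)
{
    assert(dev_block_size > 0);
    return bs > 0 && bs % dev_block_size == 0;
}

/* Ask the block layer for the logical sector size of a block device
 * such as /dev/sda. Returns the size in bytes, or -1 on error. */
int get_dev_block_size(const char *path)
{
    int size = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (ioctl(fd, BLKSSZGET, &size) < 0)
        size = -1;
    close(fd);
    return size;
}
```

For example, a caller could compute get_dev_block_size("/dev/sda") and check that it accepts the proposed bs value before copying any data.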
Sd is a member of the generic disk family, as is the hd device from the IDE subsystem. Apart from mounting sd devices, the fdisk command is available to view or modify a disk's partition table. Although the hdparm command is primarily intended for ATA disks (also known as IDE or EIDE disks), some options work on SCSI disks.
Sr is a member of the CD-ROM subsystem. Apart from mounting file systems (e.g. iso9660), sr can also be used to read audio CDs. Reading audio CDs does not involve mounting a file system; it typically involves invoking some ioctls. General purpose Linux commands such as dd cannot be used on audio CDs.
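One such ioctl is CDROMREADTOCHDR, which is declared in <linux/cdrom.h>. It reads the first and last track numbers from a disc's table of contents without mounting anything. The sketch below is minimal, with error handling reduced to a -1 return. The helper names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/cdrom.h>  /* CDROMREADTOCHDR, struct cdrom_tochdr */

/* Number of tracks described by a table of contents header. */
int toc_track_count(const struct cdrom_tochdr *hdr)
{
    assert(hdr != NULL);
    if (hdr->cdth_trk1 < hdr->cdth_trk0)
        return -1;                      /* inconsistent header */
    return hdr->cdth_trk1 - hdr->cdth_trk0 + 1;
}

/* Read the TOC header of the disc in path (e.g. /dev/scd0) and return
 * its track count, or -1 on error. No file system is mounted. */
int cd_track_count(const char *path)
{
    struct cdrom_tochdr hdr;
    int n = -1;
    /* O_NONBLOCK lets the open succeed for ioctl use even when the
     * drive reports no medium; the ioctl itself then fails. */
    int fd = open(path, O_RDONLY | O_NONBLOCK);

    if (fd < 0)
        return -1;
    if (ioctl(fd, CDROMREADTOCHDR, &hdr) == 0)
        n = toc_track_count(&hdr);
    close(fd);
    return n;
}
```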
St is a char device driver for reading and writing tapes. Typically the mt command is used to perform control functions (e.g. rewinding and positioning the tape), while programs such as tar and dd perform the data transfers.
Sg is a SCSI command pass through device that uses a char device interface. General purpose Linux commands should not be used on sg devices. Applications such as SANE (for scanners), cdrecord and cdrdao (for cd writers) and cdparanoia (for reading audio CDs digitally) use sg.
This section covers the various naming schemes that exist in the Linux and SCSI worlds and how they interact.
Linux has a four level hierarchical addressing scheme for SCSI devices:
SCSI adapter number [host]
channel number [bus]
id number [target]
lun [lun]
"Lun" is the common SCSI abbreviation of Logical Unit Number. The terms in brackets are the naming conventions used by the device pseudo file system (devfs). "Bus" is used in preference to "channel" in the description below.
The SCSI adapter number is typically an arbitrary numbering of the adapter cards on the internal IO buses (e.g. PCI, PCMCIA, ISA etc) of the computer. Such adapters are sometimes termed HBAs (host bus adapters). SCSI adapter numbers are issued by the kernel in ascending order starting with 0.
Each HBA may control one or more SCSI buses. The various types of SCSI buses are listed in Appendix A.
Each SCSI bus can have multiple SCSI devices connected to it. In SCSI parlance the HBA is called the "initiator" and takes up one SCSI id number (typically 7). The initiator [1] talks to targets which are commonly known as SCSI devices (e.g. disks). On SCSI parallel buses the number of ids is related to the width. 8 bit buses (sometimes called "narrow") can have 8 SCSI ids of which 1 is taken by the HBA leaving 7 for SCSI devices. Wide SCSI buses are 16 bits wide and can have a maximum of 15 SCSI devices (targets) attached. The SCSI 3 draft standard allows a large number of ids to be present on a SCSI bus.
Each SCSI device can contain multiple Logical Unit Numbers (LUNs). These are typically used by sophisticated tape and cdrom units that support multiple media.
So Linux's flavour of SCSI addressing is a four level hierarchy:
<scsi(_adapter_number), channel, id, lun>
<host, bus, target, lun>
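Mid level ioctls work on the file descriptor of any upper level device. One of them, SCSI_IOCTL_GET_IDLUN, returns this four level address packed into a single 32 bit word. The C sketch below decodes that word. It assumes the 2.4 packing, with the id in the lowest byte followed by the lun, the channel and the host number, each truncated to 8 bits. It also assumes the ioctl value is 0x5382. Check both assumptions against your kernel's scsi_ioctl.h. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed value of the mid level ioctl (from Linux's scsi_ioctl.h);
 * defined here so the sketch does not depend on installed headers. */
#ifndef SCSI_IOCTL_GET_IDLUN
#define SCSI_IOCTL_GET_IDLUN 0x5382
#endif

/* Linux's four level SCSI address: <host, bus, target, lun>. */
struct scsi_address {
    unsigned host;      /* SCSI adapter number */
    unsigned channel;   /* bus */
    unsigned id;        /* target */
    unsigned lun;
};

/* Buffer filled in by SCSI_IOCTL_GET_IDLUN. */
struct scsi_idlun {
    uint32_t dev_id;          /* packed id, lun, channel and host */
    uint32_t host_unique_id;
};

/* Unpack dev_id: id in bits 0-7, lun in 8-15, channel in 16-23,
 * host number in 24-31. */
struct scsi_address decode_idlun(uint32_t dev_id)
{
    struct scsi_address a;

    a.id      = dev_id & 0xff;
    a.lun     = (dev_id >> 8) & 0xff;
    a.channel = (dev_id >> 16) & 0xff;
    a.host    = (dev_id >> 24) & 0xff;
    return a;
}

/* Query the address of the device behind path (e.g. /dev/sg0 or
 * /dev/sda). Returns 0 on success, -1 on failure with errno set. */
int get_scsi_address(const char *path, struct scsi_address *out)
{
    struct scsi_idlun idlun;
    int fd;

    assert(out != NULL);
    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;
    if (ioctl(fd, SCSI_IOCTL_GET_IDLUN, &idlun) < 0) {
        int saved = errno;    /* close() must not clobber the ioctl error */
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);
    *out = decode_idlun(idlun.dev_id);
    return 0;
}
```

For example, a dev_id of 0x01020304 decodes to host 1, channel 2, lun 3 and id 4.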
A device name can be thought of as a gateway to a kernel driver that controls a device rather than the device itself. Hence there can be multiple device names, some of which may offer slightly different characteristics, all mapping to the same actual device.
The device names of the various SCSI devices are found within the /dev directory. Traditionally in Linux, SCSI devices have been identified by their major and minor device number rather than their SCSI bus addresses (e.g. SCSI target id and LUN). The device pseudo file system (devfs) moves away from the major and minor device number scheme and for the SCSI subsystem uses device names based on the SCSI bus addresses [discussed later in Section 3.3 and see reference: W5]. Alternatively, there is a utility called scsidev which addresses this issue within the scope of the Linux SCSI subsystem and thus does not have the same system wide impact as devfs. Scsidev is discussed later in Section 3.4 and ref: W6.
Eight block major numbers are reserved for SCSI disks: 8, 65, 66, 67, 68, 69, 70 and 71. Each major can accommodate 256 minor numbers which, in the case of SCSI disks, are subdivided as follows:
[b,8,0] /dev/sda
[b,8,1] /dev/sda1
....
[b,8,15] /dev/sda15
[b,8,16] /dev/sdb
[b,8,17] /dev/sdb1
....
[b,8,255] /dev/sdp15
The disk device names without a trailing digit refer to the whole disk (e.g. /dev/sda) while those with a trailing digit refer to one of the 15 allowable partitions [2] within that disk.
The remaining 7 SCSI disk block major numbers follow a similar pattern:
[b,65,0] /dev/sdq
[b,65,1] /dev/sdq1
....
[b,65,159] /dev/sdz15
[b,65,160] /dev/sdaa
[b,65,161] /dev/sdaa1
....
[b,65,255] /dev/sdaf15
[b,66,0] /dev/sdag
[b,66,1] /dev/sdag1
....
[b,66,255] /dev/sdav15
....
[b,71,255] /dev/sddx15
So there are 128 possible disks (i.e. /dev/sda to /dev/sddx) each having up to 15 partitions. By way of contrast, the IDE subsystem allows 20 disks (10 controllers each with 1 master and 1 slave) which can have up to 63 partitions each.
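The minor number arithmetic above can be captured in a few lines. The following C sketch maps a block major/minor pair to the traditional disk name. It assumes exactly the allocation shown: majors 8 and 65 to 71 each hold 16 disks, each disk uses 16 minors, and the first minor of each group is the whole disk. The function name sd_name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map a SCSI disk block device number to its traditional name,
 * e.g. (8,17) -> "sdb1". Returns 0 on success, -1 if major/minor is
 * not a SCSI disk number. buf must hold at least 16 bytes. */
int sd_name(unsigned major, unsigned minor, char *buf, size_t len)
{
    unsigned major_index, disk, part;
    char letters[3] = { 0 };

    assert(buf != NULL && len >= 16);
    if (major == 8)
        major_index = 0;
    else if (major >= 65 && major <= 71)
        major_index = major - 64;           /* 65 -> 1 ... 71 -> 7 */
    else
        return -1;
    if (minor > 255)
        return -1;

    disk = major_index * 16 + minor / 16;   /* 16 minors per disk */
    part = minor % 16;                      /* 0 means whole disk */

    if (disk < 26) {                        /* sda .. sdz */
        letters[0] = (char)('a' + disk);
    } else {                                /* sdaa .. sddx */
        letters[0] = (char)('a' + disk / 26 - 1);
        letters[1] = (char)('a' + disk % 26);
    }
    if (part == 0)
        snprintf(buf, len, "sd%s", letters);
    else
        snprintf(buf, len, "sd%s%u", letters, part);
    return 0;
}
```

For example, sd_name(65, 160, buf, sizeof buf) yields "sdaa", which matches the list above.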
SCSI CD-ROM devices are allocated the block major number of 11. Traditionally sr has been the device name but scd probably is more recognizable and is favoured by several recent distributions. 256 SCSI CD-ROM devices are allowed:
[b,11,0] /dev/scd0 [or /dev/sr0]
....
[b,11,255] /dev/scd255 [or /dev/sr255]
SCSI tape devices are allocated the char major number of 9. Up to 32 tape devices are supported each of which can be accessed in one of four modes (0, 1, 2 and 3), with or without rewind. The devices are allocated as follows:
[c,9,0] /dev/st0 [tape 0, mode 0, rewind]
[c,9,1] /dev/st1 [tape 1, mode 0, rewind]
....
[c,9,31] /dev/st31 [tape 31, mode 0, rewind]
[c,9,32] /dev/st0l [tape 0, mode 1, rewind]
....
[c,9,63] /dev/st31l [tape 31, mode 1, rewind]
[c,9,64] /dev/st0m [tape 0, mode 2, rewind]
....
[c,9,96] /dev/st0a [tape 0, mode 3, rewind]
....
[c,9,127] /dev/st31a [tape 31, mode 3, rewind]
[c,9,128] /dev/nst0 [tape 0, mode 0, no rewind]
....
[c,9,160] /dev/nst0l [tape 0, mode 1, no rewind]
....
[c,9,192] /dev/nst0m [tape 0, mode 2, no rewind]
....
[c,9,224] /dev/nst0a [tape 0, mode 3, no rewind]
....
[c,9,255] /dev/nst31a [tape 31, mode 3, no rewind]
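The tape minor number therefore behaves as a bit field. The low 5 bits select the tape, the next 2 bits select the mode, and the top bit selects the no rewind variant. The C sketch below decodes it. The layout is inferred from the allocation table above rather than taken from the st driver source, and the function name st_name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decode a SCSI tape minor number (char major 9) into its /dev name:
 *   bits 0-4  tape number (0..31)
 *   bits 5-6  mode (0..3), shown as suffix "", "l", "m", "a"
 *   bit  7    set for the no rewind ("n" prefix) variant */
void st_name(unsigned minor, char *buf, size_t len)
{
    static const char *const mode_suffix[4] = { "", "l", "m", "a" };
    unsigned tape = minor & 0x1f;
    unsigned mode = (minor >> 5) & 0x3;
    int no_rewind = (minor & 0x80) != 0;

    assert(minor <= 255 && buf != NULL);
    snprintf(buf, len, "%sst%u%s", no_rewind ? "n" : "",
             tape, mode_suffix[mode]);
}
```

For example, minor 224 decodes to "nst0a" (tape 0, mode 3, no rewind).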
The SCSI generic (sg) devices are allocated the char major number of 21. There are 256 possible SCSI generic (sg) devices:
[c,21,0] /dev/sg0
[c,21,1] /dev/sg1
....
[c,21,255] /dev/sg255
Note that the SCSI generic device name's use of a trailing letter (e.g. /dev/sgc) is deprecated.
Each SCSI disk (but not each partition), each SCSI CD-ROM and each SCSI tape is mapped to an sg device. SCSI devices that don't fit into these three categories (e.g. scanners) also appear as sg devices.
Pseudo devices [see Section 10.1] can cause devices that are usually not considered as SCSI to appear as SCSI device names. For example an ATAPI CD-ROM may be picked up by the ide-scsi pseudo driver and mapped to /dev/scd0.
The linux/Documentation/devices.txt file supplied within the kernel source is the definitive reference for Linux device names and their corresponding major and minor number allocations.
The device pseudo file system can be mounted as /dev in which case it replaces the traditional Linux device subdirectory. Alternatively it can be mounted elsewhere (e.g. /devfs) and supplement the existing device structure.
Without devfs, device names are typically maintained in the /dev directory of the root partition. Hence the device names (and their associated permissions) persist in the file system. The existence of a device name does not necessarily imply that such a device (or even its driver) is present. To save users from having to create device name entries (with the mknod command), most Linux distributions come with thousands of device names defined in the /dev directory. When an application tries to open() such a device name, an errno value of ENODEV indicates that there is no corresponding device (or driver) currently available.
Devfs takes a different approach in which the existence of the device name is directly related to the presence of the corresponding device (and its driver).
Assuming devfs is mounted on /dev then SCSI devices have primary device names that might look like this:
/dev/scsi/host0/bus0/target1/lun0/disc [whole disk]
/dev/scsi/host0/bus0/target1/lun0/part6 [partition 6]
/dev/scsi/host0/bus0/target1/lun0/generic [sg device for disk]
/dev/scsi/host1/bus0/target2/lun0/cd [CD reader or writer]
/dev/scsi/host1/bus0/target2/lun0/generic [sg device for cd]
/dev/scsi/host2/bus0/target0/lun0/mt [tape mode 0 rewind]
/dev/scsi/host2/bus0/target0/lun0/mtan [tape mode 3 no rewind]
/dev/scsi/host2/bus0/target0/lun0/generic [sg device for tape]
[Notice the spelling of "disc" as the devfs author favours English spelling over the American variant.] It can be seen that devfs's naming scheme closely matches the SCSI addressing discussed in Section 3.1. It is worth noting that the IDE subsystem uses a similar devfs device naming scheme with the word "scsi" replaced with "ide". Devfs is discussed further in Chapter 12.
A utility program called scsidev adds device names to the /dev/scsi directory that reflect the SCSI address of each device. The first 2 letters of the name are the upper level SCSI driver name (i.e. sd, sr, st or sg). The number following the "h" is the host number, while the number following the "-" is meant for host identification purposes. For PCI adapters this seems always to be 0, while for ISA adapters it is their IO address. [Perhaps this field could be made more informative or dropped.] The numbers following the "c", "i" and "l" are the channel (bus), target id and lun values respectively. Whole disks are shown without a trailing partition number, while partitions contained within them are shown with the partition number following a "p".
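The naming rule just described can be reproduced with a single formatted print. The C sketch below assumes every field is printed in decimal. How scsidev prints an ISA IO address in the field after the "-" is not specified here, so treat that field's format as an assumption. The function name scsidev_name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a scsidev style name such as "sdh0-0c0i1l0p1".
 * upper:   upper level driver name ("sd", "sr", "st" or "sg")
 * host_id: the field after "-" (0 for PCI; decimal is assumed)
 * part:    partition number, or 0 for a whole disk (no "p" suffix) */
void scsidev_name(char *buf, size_t len, const char *upper,
                  unsigned host, unsigned host_id, unsigned channel,
                  unsigned id, unsigned lun, unsigned part)
{
    int n;

    assert(buf != NULL && upper != NULL);
    n = snprintf(buf, len, "%sh%u-%uc%ui%ul%u",
                 upper, host, host_id, channel, id, lun);
    if (part > 0 && n >= 0 && (size_t)n < len)
        snprintf(buf + n, len - n, "p%u", part);
}
```

With host 0, id 1 and partition 1, this gives "sdh0-0c0i1l0p1". That name matches one of the entries in the listing that follows.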
Scsidev would typically be run as part of the boot up sequence. It may also be useful to run it after the SCSI configuration has changed (e.g. after adding or removing lower level driver modules, or after using the add/remove-single-device command). After scsidev was run on my system, which contains 2 disks, a cd reader, a cd writer and a scanner, the following names were added to the /dev/scsi directory:
$ ls -l /dev/scsi/ # abridged
total 0
brw------- 8, 0 Sep 2 11:56 sdh0-0c0i0l0
brw------- 8, 1 Sep 2 11:56 sdh0-0c0i0l0p1
...
brw------- 8, 8 Sep 2 11:56 sdh0-0c0i0l0p8
brw------- 8, 16 Sep 2 11:56 sdh0-0c0i1l0
brw------- 8, 17 Sep 2 11:56 sdh0-0c0i1l0p1
...
brw------- 8, 24 Sep 2 11:56 sdh0-0c0i1l0p8
crw------- 21, 0 Sep 2 11:56 sgh0-0c0i0l0
crw------- 21, 1 Sep 2 11:56 sgh0-0c0i1l0
crw------- 21, 2 Sep 2 11:56 sgh1-0c0i2l0
crw------- 21, 3 Sep 2 11:56 sgh1-0c0i5l0
crw------- 21, 4 Sep 2 11:56 sgh1-0c0i6l0
br-------- 11, 0 Sep 2 11:56 srh1-0c0i2l0
br-------- 11, 1 Sep 2 11:56 srh1-0c0i6l0
The scsidev package also includes the ability to introduce names like /dev/scsi/scanner by manipulating the /etc/scsi.alias configuration file. The package also includes the useful rescan-scsi-bus.sh utility. For further information about scsidev see W6. On my system, both devfs and scsidev co-exist happily.
The Linux kernel configuration is usually found in the kernel source in the file /usr/src/linux/.config. Rather than editing this file directly, it is recommended that one of these configuration commands be used:
make config - starts a character based question and answer session
make menuconfig - starts a terminal-oriented configuration tool (using ncurses)
make xconfig - starts an X based configuration tool
The descriptions of these selections that are displayed by the associated help button can be found in the flat ASCII file: /usr/src/linux/Documentation/Configure.help
Ultimately these configuration tools edit the .config file. An option will either indicate some driver is built into the kernel ("=y") or will be built as a module ("=m") or is not selected. The unselected state can either be indicated by a line starting with "#" (e.g. "# CONFIG_SCSI is not set") or by the absence of the relevant line from the .config file.
The 3 states of the main selection option for the SCSI subsystem (which actually selects the SCSI mid level driver) follow. Only one of these should appear in an actual .config file:
CONFIG_SCSI=y
CONFIG_SCSI=m
# CONFIG_SCSI is not set
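A script or build tool can tell these three states apart mechanically. The minimal C sketch below is intended for the y/m/unset style options shown here. It does not handle options with numeric values, such as CONFIG_SD_EXTRA_DEVS. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum cfg_state { CFG_UNSET, CFG_BUILTIN, CFG_MODULE };

/* State of option name (e.g. "CONFIG_SCSI") in one .config line, or -1
 * if the line says nothing about it:
 *   "NAME=y" -> built in, "NAME=m" -> module,
 *   "# NAME is not set" -> unset.
 * Longer names such as CONFIG_SCSI_DEBUG do not match CONFIG_SCSI. */
int cfg_line_state(const char *line, const char *name)
{
    size_t n;

    assert(line != NULL && name != NULL);
    n = strlen(name);
    if (strncmp(line, name, n) == 0 && line[n] == '=') {
        if (line[n + 1] == 'y')
            return CFG_BUILTIN;
        if (line[n + 1] == 'm')
            return CFG_MODULE;
        return -1;                  /* some other value, e.g. a number */
    }
    if (strncmp(line, "# ", 2) == 0 && strncmp(line + 2, name, n) == 0
        && strncmp(line + 2 + n, " is not set", 11) == 0)
        return CFG_UNSET;
    return -1;
}

/* Scan a whole .config stream; a missing line also means unset. */
enum cfg_state cfg_state(FILE *f, const char *name)
{
    char line[256];
    int s;

    assert(f != NULL);
    while (fgets(line, sizeof line, f) != NULL)
        if ((s = cfg_line_state(line, name)) >= 0)
            return (enum cfg_state)s;
    return CFG_UNSET;
}
```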
Some other common SCSI configuration options are:
CONFIG_BLK_DEV_SD [disk (sd) driver]
CONFIG_SD_EXTRA_DEVS [extra slots for disks added later]
CONFIG_BLK_DEV_SR [SCSI cdrom (sr) driver]
CONFIG_BLK_DEV_SR_VENDOR [allow vendor specific cdrom commands]
CONFIG_SR_EXTRA_DEVS [extra slots for cdroms added later]
CONFIG_CHR_DEV_ST [tape (st) driver]
CONFIG_CHR_DEV_OSST [OnStream tape (osst) driver]
CONFIG_CHR_DEV_SG [SCSI generic (sg) driver]
CONFIG_DEBUG_QUEUES [for debugging multiple queues]
CONFIG_SCSI_MULTI_LUN [allow probes above lun 0]
CONFIG_SCSI_CONSTANTS [symbolic decode of SCSI errors]
CONFIG_SCSI_LOGGING [allow logging to be runtime selected]
CONFIG_SCSI_<ll_driver> [numerous lower level adapter drivers]
CONFIG_SCSI_DEBUG [lower level driver for debugging]
CONFIG_SCSI_PPA [older parallel port zip drives]
CONFIG_SCSI_IMM [newer parallel port zip drives]
CONFIG_BLK_DEV_IDESCSI [ide-scsi pseudo adapter]
CONFIG_I2O_SCSI [scsi command set over i2o bus]
CONFIG_SCSI_PCMCIA [for SCSI HBAs on PCMCIA bus]
CONFIG_USB_STORAGE [usb "mass storage" type]
CONFIG_MAGIC_SYSRQ [Alt+SysRq+S for emergency sync]
[Alt+SysRq+U for emergency remount ro]
If the root file system is on a SCSI disk, then it makes sense to build the SCSI mid level, the sd driver and the host adapter driver that the disk is connected to into the kernel. In most cases it is safe to build the sr, st and sg drivers as modules so that they are loaded as required. If a device like a scanner is on a separate adapter, then that adapter's driver may well be built as a module. In this case, that adapter driver will need to be loaded before the scanner will be recognized.
Linux distributions have many of the SCSI subsystem drivers built as modules since building all of them in would lead to a very large kernel that would exceed the capabilities of the boot loader. This leads to a "chicken and the egg" problem in which the SCSI drivers are needed to load the root file system and vice versa. The 2 phase load used by the initrd device addresses this problem (see Chapter 6 for more details).
On a PC the motherboard's BIOS together with the SCSI BIOS provided by most SCSI host adapters takes care of loading the boot loader's image from a SCSI disk into memory and executing it. This may require some settings to be changed in the motherboard's BIOS. When more than one SCSI adapter is involved, the SCSI BIOS settings may need to change to indicate which one contains the disk with the boot image. The boot image may also come from an ATA (IDE) disk, a bootable CD-ROM or a floppy.
Both lilo and grub are commonly used boot loaders with Linux. Their configuration files are /etc/lilo.conf and /etc/grub.conf [3] respectively. One difference is that after changing lilo's configuration the lilo command must be executed for the changes to take effect (and there is no equivalent requirement for grub). See their "man" pages for usage information. An excellent paper on lilo and the Linux bootup sequence can be found at ftp://icaftp.epfl.ch/pub/people/almesber/booting/bootinglinux-0.ps.gz. For further information on grub see www.gnu.org/software/grub.
Some boot parameters related to the SCSI subsystem:
single [enter single user mode]
<n> [enter run level <n> {0..6}]
root=/dev/sda6 [*]
root=/dev/scsi/host0/bus0/target0/lun0/part6 [*]
root=/dev/sd/c0b0t0u0p6 [*]
devfs=mount [overrides CONFIG_DEVFS_MOUNT=n]
devfs=nomount [overrides CONFIG_DEVFS_MOUNT=y]
init=<command> [executes <command> rather than init]
quiet [reduce output to console during boot]
debug [increase output to console during boot]
nmi_watchdog=0 [turn off NMI watchdog on a SMP machine]
max_scsi_luns=1 [limits SCSI bus scans to lun==0]
scsi_allow_ghost_devices=<n>
The "root=" argument may also be a hex number. For example, if the root partition is on /dev/sda3 (block major 8, minor 3), then "root=803" is appropriate. The last two hex digits are the minor device number and the leading digit(s) the major device number, both as discussed in an earlier section.
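The hex form is the major number shifted left by 8 bits, plus the minor number. The C sketch below shows the arithmetic. It assumes an 8 bit minor number, as in 2.4 kernels, and the helper name root_hex is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a block device number as a hex "root=" value: the major in
 * the upper byte, the 8 bit minor in the low two hex digits. */
void root_hex(unsigned major, unsigned minor, char *buf, size_t len)
{
    assert(minor <= 0xff && buf != NULL);
    snprintf(buf, len, "%x", (major << 8) | minor);
}
```

For example, root_hex(8, 3, buf, sizeof buf) gives "803" for /dev/sda3. For /dev/sdq (major 65, minor 0), it gives "4100".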
The default argument to the "init" parameter is /sbin/init (see man 8 init). If files such as /etc/fstab have incorrect entries, it may be useful to drop directly into a shell with "init=/bin/bash". However, if shared library files or their paths are inappropriate, this may also fail. That leaves "init=/sbin/sash", which is a statically linked shell with many useful commands (for repairing a system) built in (see man 8 sash).
When Linux fails to boot after reporting a message like:
VFS: Cannot open root device 08:02
Lilo's configuration file /etc/lilo.conf can take the "root=" option in two ways. The normal way is a line like: 'root=/dev/sda2'. In this case /dev/sda2 is converted into major and minor numbers based on the state of the system when the lilo command is executed. This can be a nuisance, especially if hardware is going to be re-arranged. The other way is a line of the form: 'append="root=/dev/sda2"'. In this case /dev/sda2 is passed through to the kernel the next time it is started. This is the same as giving the "root=/dev/sda2" string at the kernel boot time prompt. It is interpreted by the kernel at startup (once the HBAs and their attached devices have been recognized) and thus is more flexible.
There are many SCSI related modules. The mid and upper level modules are listed below:
scsi_mod.o
sd_mod.o
sr_mod.o
st.o [osst.o]
sg.o
Notice that the first 3 have "_mod" appended to their normal driver names. Lower level drivers tend to use the name (or an abbreviation) of the HBA's manufacturer (e.g. advansys) plus optionally the chip number of the major controller chip (e.g. sym53c8xx for symbios controllers based on the NCR 53c8?? family of chips).
All SCSI modules depend on the mid level. This means if the SCSI mid level is not built into the kernel and if scsi_mod.o has not already been loaded then a command like modprobe st will cause the scsi_mod.o module to be loaded. There could well be other dependencies, for example modprobe sr_mod will also cause the cdrom module to be loaded if it hasn't been already. Also if the SCSI mid level is a module, then all other SCSI subsystem drivers must be modules (this is enforced by the kernel build configuration tools).
Modules can be loaded with the modprobe <module_name> command, which will try to load any modules that the nominated <module_name> depends on. Also <module_name> does not need the trailing ".o" extension, which is assumed if not given. The insmod <module_name> command will also try to load <module_name>, but without first loading the modules it depends on. Rules for how modules can cause other modules to be loaded (with appropriate parameters appended) are usually placed in the file /etc/modules.conf. [Note that in earlier Linux kernels this file was often called /etc/conf.modules.] For further information about the format of this file try man modules.conf.
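The difference between modprobe and insmod comes down to a dependency walk. The C sketch below is a toy model of that walk. It is not modprobe's actual implementation, which reads dependency information prepared by depmod. The dependency table is hypothetical and is built only from the scsi_mod and cdrom dependencies mentioned above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical dependency table: every SCSI module needs scsi_mod,
 * and sr_mod also needs cdrom. */
struct module_dep {
    const char *name;
    const char *deps[3];              /* NULL terminated */
};

static const struct module_dep table[] = {
    { "scsi_mod", { NULL } },
    { "cdrom",    { NULL } },
    { "sd_mod",   { "scsi_mod", NULL } },
    { "sr_mod",   { "scsi_mod", "cdrom", NULL } },
    { "st",       { "scsi_mod", NULL } },
    { "sg",       { "scsi_mod", NULL } },
};

static const struct module_dep *find_module(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* modprobe-like: append name to order[] after its dependencies (depth
 * first), skipping modules already "loaded". insmod would append only
 * name itself. order[] must have room for every module in the table.
 * Returns the new count. There is no cycle detection. */
size_t load_order(const char *name, const char **order, size_t count)
{
    const struct module_dep *m = find_module(name);
    size_t i;

    assert(m != NULL);
    for (i = 0; i < count; i++)
        if (strcmp(order[i], name) == 0)
            return count;             /* already loaded */
    for (i = 0; m->deps[i] != NULL; i++)
        count = load_order(m->deps[i], order, count);
    order[count++] = m->name;
    return count;
}
```

Starting from nothing loaded, load_order("sr_mod", ...) yields scsi_mod, cdrom and sr_mod, in that order. A later load_order("st", ...) then adds only st.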
Any module can have its allowable command line parameters queried with this command: modinfo -p <module_name>.
When upper level drivers are initialized and if there are no hosts active then the mid level will attempt to load a module called "scsi_hostadapter". An "alias" can then be used to associate "scsi_hostadapter" with the actual name of the lower level (adapter) driver. For example, a line like "alias scsi_hostadapter aic7xxx" in the /etc/modules.conf file would cause the aic7xxx module to be loaded (if there were no lower level drivers already active). [4]
There is a special relationship between the module parameter "scsi_hostadapter" and the initrd file system. For more information see man initrd and man mkinitrd. [5]
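To see which lower level driver such an alias would select, a program can scan /etc/modules.conf. The C sketch below is deliberately simple and understands only the "alias <name> <module>" line form shown above. The real modules.conf syntax is richer (see man modules.conf). The function name parse_alias is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* If line is "alias <name> <module>" for the requested name, copy the
 * module name into out and return 1; otherwise return 0. Comment lines
 * and all other directives are ignored. */
int parse_alias(const char *line, const char *name, char *out, size_t len)
{
    char alias[64], module[64];

    assert(line != NULL && name != NULL && out != NULL && len > 0);
    if (sscanf(line, " alias %63s %63s", alias, module) != 2)
        return 0;
    if (strcmp(alias, name) != 0)
        return 0;
    snprintf(out, len, "%s", module);
    return 1;
}
```

Applied to the line "alias scsi_hostadapter aic7xxx", the function reports aic7xxx as the module that the mid level's request for "scsi_hostadapter" would load.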
The proc pseudo file system provides some useful information about the SCSI subsystem. The kernel configuration option that selects "proc_fs" is CONFIG_PROC_FS, and in almost all cases it should be selected. SCSI specific information is found under the directory /proc/scsi. Probably the most commonly accessed entry is /proc/scsi/scsi (e.g. cat /proc/scsi/scsi), which lists the attached SCSI devices. See Section 8.3 for more details.
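A program can get the same address information by reading this file. The C sketch below assumes the "Host: scsiN Channel: NN Id: NN Lun: NN" line layout used by 2.4 kernels. Each such line is followed by Vendor/Model and Type lines, which this sketch skips. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a line assumed to look like
 *   Host: scsi0 Channel: 00 Id: 01 Lun: 00
 * Returns 1 and fills in the address on success, else 0. */
int parse_host_line(const char *line, unsigned *host, unsigned *channel,
                    unsigned *id, unsigned *lun)
{
    assert(line && host && channel && id && lun);
    return sscanf(line, "Host: scsi%u Channel: %u Id: %u Lun: %u",
                  host, channel, id, lun) == 4;
}

/* Print the <host, bus, target, lun> address of each attached device.
 * Returns 0, or -1 if /proc/scsi/scsi cannot be opened. */
int list_scsi_devices(void)
{
    char line[256];
    unsigned h, c, i, l;
    FILE *f = fopen("/proc/scsi/scsi", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL)
        if (parse_host_line(line, &h, &c, &i, &l))
            printf("<%u, %u, %u, %u>\n", h, c, i, l);
    fclose(f);
    return 0;
}
```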
The lower level drivers are allocated proc_fs entries of the form:
/proc/scsi/<driver_name>/<scsi_adapter_number>