Some hot-swap issues

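One R510 with 12 hard disks. During the SAS check at boot only 11 disks were detected; the disk in slot 7 was missing (slots are numbered from 0 to 11). I ignored the problem at first and installed the system anyway. After installation the disk devices were numbered as follows: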
$ ls /dev/sd*
sda   sdc   sdc2  sdd   sde   sdf   sdg   sdh   sdi   sdj   sdk   sdl   sdm
sdb   sdc1  sdc5  sdd1  sde1  sdf1  sdg1  sdh1  sdi1  sdj1  sdk1  sdl1  sdm1

Disk attach messages from system boot:
$ dmesg  | grep disk
[    9.008580] sd 1:0:0:0: [sda] Attached SCSI removable disk
[    9.133147] sd 2:0:0:1: [sdb] Attached SCSI removable disk
[   13.128908] sd 0:0:1:0: [sdd] Attached SCSI disk
[   13.131742] sd 0:0:2:0: [sde] Attached SCSI disk
[   13.133006] sd 0:0:0:0: [sdc] Attached SCSI disk
[   13.133979] sd 0:0:3:0: [sdf] Attached SCSI disk
[   13.137656] sd 0:0:4:0: [sdg] Attached SCSI disk
[   13.141043] sd 0:0:5:0: [sdh] Attached SCSI disk
[   13.142132] sd 0:0:6:0: [sdi] Attached SCSI disk
[   13.145329] sd 0:0:7:0: [sdj] Attached SCSI disk
[   13.150285] sd 0:0:8:0: [sdk] Attached SCSI disk
[   13.150904] sd 0:0:9:0: [sdl] Attached SCSI disk
[   13.153535] sd 0:0:10:0: [sdm] Attached SCSI disk
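
The boot messages above already pair each SCSI address with a device name. As a minimal sketch (the function name dmesg_disk_map is made up for illustration, and it assumes the exact message format shown above), the pairs can be pulled out like this:

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "<device> <host:channel:id:lun>" for every attached sd disk.
# Assumes lines of the form: sd 0:0:7:0: [sdj] Attached SCSI disk
dmesg_disk_map() {
    sed -n 's/.*sd \([0-9]*:[0-9]*:[0-9]*:[0-9]*\): \[\(sd[a-z]*\)] Attached SCSI.*/\2 \1/p' | sort
}

# Usage: dmesg | dmesg_disk_map
```

Sorting by device name makes it easy to see that sdc through sdm sit on host 0 with Ids 0 through 10, while sda and sdb belong to other hosts.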

/proc/scsi/scsi, the SCSI device list generated dynamically by the kernel, shows the disks with Ids running from 0 to 10:
$ cat /proc/scsi/scsi
Attached devices:
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: iDRAC    Model: LCDRIVE          Rev: 0323
  Type:   Direct-Access                    ANSI  SCSI revision: 00
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: iDRAC    Model: Virtual CD       Rev: 0323
  Type:   CD-ROM                           ANSI  SCSI revision: 00
Host: scsi2 Channel: 00 Id: 00 Lun: 01
  Vendor: iDRAC    Model: Virtual Floppy   Rev: 0323
  Type:   Direct-Access                    ANSI  SCSI revision: 00
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 03 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 04 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 05 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 07 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 08 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 09 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 10 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
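
To check which Ids are present on the SAS controller without reading the whole listing by eye, something like the following can help. This is only a sketch: both function names are hypothetical, and the parsing assumes the "Host: ... Channel: ... Id: ... Lun: ..." line format shown above.

```shell
# Hypothetical helpers; both read stdin so they can be tried on saved output.

# Print the target Ids found on one host (e.g. scsi0) from text in the
# /proc/scsi/scsi format: "Host: scsi0 Channel: 00 Id: 07 Lun: 00".
scsi_ids_on_host() {
    # "+ 0" turns "07" into 7
    awk -v host="$1" '$1 == "Host:" && $2 == host { print $6 + 0 }'
}

# Print every Id in 0..max that does not appear on stdin.
missing_ids() {
    awk -v max="$1" '{ seen[$1 + 0] = 1 }
        END { for (i = 0; i <= max; i++) if (!(i in seen)) print i }'
}

# Usage: scsi_ids_on_host scsi0 < /proc/scsi/scsi | missing_ids 11
```

For the listing above this reports 11 as the only missing Id: the Ids were handed out contiguously from 0 to 10 even though the physical gap was at slot 7.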

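Later I took the disk out of slot 7 and put in a new one. SCSI then reported the new disk with Id 11, and its device name became sdn. Obviously, what we want is slot 11 mapped to sdn, slot 10 mapped to sdm, and so on, with slot 7 mapped to sdj. So I ran the following experiment. The system disk is in slot 0; slots 1 through 11 all hold data, with no RAID. I reformatted all 11 data disks and laid them out again according to their slots, that is: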
slot 0     – sdc
slot 1     – sdd
slot 2     – sde
slot 3     – sdf
slot 4     – sdg
slot 5     – sdh
slot 6     – sdi
slot 7     – sdj *
slot 8     – sdk
slot 9     – sdl
slot 10    – sdm
slot 11    – sdn
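
The table is a fixed offset from slot number to drive letter. A small sketch of that arithmetic follows; the function name is hypothetical, and the offset is specific to this machine, where sda and sdb are taken by the iDRAC virtual devices:

```shell
# Hypothetical helper: map a slot number to the device name expected in
# this particular setup. Slot 0 is sdc, so the letter sits at position
# slot + 3 in the alphabet (cut counts characters from 1).
slot_to_dev() {
    letter=$(printf 'abcdefghijklmnopqrstuvwxyz' | cut -c "$(( $1 + 3 ))")
    printf 'sd%s\n' "$letter"
}

# Usage: slot_to_dev 7    (prints sdj)
```

The mapping only holds while the kernel happens to enumerate the disks in slot order, which is exactly what the rest of this experiment puts to the test.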

Then I ran mount -a, and everything looked perfect: /proc/partitions and /proc/scsi/scsi both showed the ideal mapping. Next, I pulled the disk in slot 7 straight out, and scsi showed the following:
$ cat /proc/scsi/scsi
Attached devices:
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: iDRAC    Model: LCDRIVE          Rev: 0323
  Type:   Direct-Access                    ANSI  SCSI revision: 00
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: iDRAC    Model: Virtual CD       Rev: 0323
  Type:   CD-ROM                           ANSI  SCSI revision: 00
Host: scsi2 Channel: 00 Id: 00 Lun: 01
  Vendor: iDRAC    Model: Virtual Floppy   Rev: 0323
  Type:   Direct-Access                    ANSI  SCSI revision: 00
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 03 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 04 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 05 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 08 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 09 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi0 Channel: 00 Id: 10 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05

$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdc5            576671280 149679332 397698740  28% /
none                  24769492       304  24769188   1% /dev
none                  24773936         0  24773936   0% /dev/shm
none                  24773936        72  24773864   1% /var/run
none                  24773936         0  24773936   0% /var/lock
none                  24773936         0  24773936   0% /lib/init/rw
/dev/sdc1               188403     36940    141735  21% /boot
/dev/sdd1            576862176    202304 547356912   1% /data/data1
/dev/sde1            576862176    202304 547356912   1% /data/data2
/dev/sdf1            576862176    202304 547356912   1% /data/data3
/dev/sdg1            576862176    202304 547356912   1% /data/data4
/dev/sdh1            576862176    202304 547356912   1% /data/data5
/dev/sdi1            576862176    202304 547356912   1% /data/data6
/dev/sdj1            576862176    202304 547356912   1% /data/data7
/dev/sdk1            576862176    202304 547356912   1% /data/data8
/dev/sdl1            576862176    202304 547356912   1% /data/data9
/dev/sdm1            576862176    202304 547356912   1% /data/data10
/dev/sdn1            576862176    202304 547356912   1% /data/data11

You can see that the disk in slot 7 really has been pulled:
$ cd /data/data11
$ ll
ls: reading directory .: Input/output error
total 0
$ sudo umount /data/data11

Next I was going to plug it back in, and of course I hoped the system would recognize it again with the following information:
Host: scsi0 Channel: 00 Id: 07 Lun: 00
  Vendor: SEAGATE  Model: ST3600057SS      Rev: ES65
  Type:   Direct-Access                    ANSI  SCSI revision: 05

So I googled it, and the advice was to proceed like this:
1. Insert the new disk into the machine
2. echo "scsi add-single-device 0 0 7 0" > /proc/scsi/scsi

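The numbers stand for Host, Channel, Id, and Lun respectively; the first one, Host, is the SAS controller. The pattern is clear from the other disks listed in scsi. I did as described, but strangely the disk in slot 7 did not land on Id 07 as I had imagined; it showed up as 12 instead.
Since I had just yanked the disk out without any preparation, I first ran the following and then pulled the slot 7 disk again: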
# echo "scsi remove-single-device 0 0 12 0" > /proc/scsi/scsi
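
Since the add and remove commands differ only in the action word and the four numbers, they can be built by one small helper. This is a sketch with a hypothetical function name; it only prints the string, and writing it to /proc/scsi/scsi still has to be done as root:

```shell
# Hypothetical helper: build the command string for the /proc/scsi/scsi
# interface. Arguments: add|remove HOST CHANNEL ID LUN.
scsi_proc_cmd() {
    action="$1"
    case "$action" in
        add|remove) shift ;;
        *) echo "usage: scsi_proc_cmd add|remove HOST CHANNEL ID LUN" >&2
           return 1 ;;
    esac
    if [ "$#" -ne 4 ]; then
        echo "need exactly HOST CHANNEL ID LUN" >&2
        return 1
    fi
    printf 'scsi %s-single-device %s %s %s %s\n' "$action" "$1" "$2" "$3" "$4"
}

# Usage (as root): scsi_proc_cmd remove 0 0 12 0 > /proc/scsi/scsi
```

Printing instead of writing makes it possible to check the Id against the current listing before anything is sent to the kernel.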

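Then I plugged it back in, but this time the Id became 13. Finally I rebooted, and everything returned to the ideal state: the Id for slot 7 was 7 again. Although this has no effect on the running system, having the Id change with every pull is really uncomfortable to look at.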
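
The kernel also exposes removal and rescan through sysfs: the delete attribute under /sys/block/<dev>/device and the scan attribute under /sys/class/scsi_host/host<N>. The sketch below only prints the commands for one swap rather than running them. The function name is hypothetical, and whether this sequence keeps the Id stable depends on the controller and driver; it was not part of the experiment above.

```shell
# Hypothetical dry-run helper: print the steps for swapping one disk.
# Arguments: device name, mount point, SCSI host number.
plan_hot_swap() {
    dev="$1"; mnt="$2"; host="$3"
    # 1. Stop using the filesystem before the disk goes away.
    printf 'umount %s\n' "$mnt"
    # 2. Ask the kernel to drop the device before it is physically pulled.
    printf 'echo 1 > /sys/block/%s/device/delete\n' "$dev"
    # 3. After inserting the new disk, rescan every channel/target/lun on the host.
    printf 'echo "- - -" > /sys/class/scsi_host/host%s/scan\n' "$host"
}

# Usage: plan_hot_swap sdj /data/data7 0
```

The printed commands need root when run by hand, and the device and mount point should be checked against df first.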

ref:
https://raid.wiki.kernel.org/articles/h/a/r/Hardware_issues.html
http://www-uxsup.csx.cam.ac.uk/pub/doc/suse/suse9.3/suselinux-adminguide_en/cha.hotplug.html
http://rackerhacker.com/2009/04/23/re-scan-the-scsi-bus-in-linux-after-hot-swapping-a-drive/
http://serverfault.com/questions/5336/how-do-i-make-linux-recognize-a-new-sata-dev-sda-drive-i-hot-swapped-in-without