Thursday, 1 December 2011

Solaris - Replace a faulty mirrored root disk


Before shutting down the machine and physically replacing the disk

1. Back up the current configuration files and save the SVM status output:
# cp /etc/vfstab /etc/vfstab.before_mirror
# metastat -c > /var/crash/metastatC.out
# metastat > /var/crash/metastat.out
# metadb > /var/crash/metadb.out
# cp /etc/system /var/crash

This procedure assumes that the failed disk is /dev/dsk/c0t1d0 and that the surviving half of the root mirror is on c0t0d0.

2. Check the defective drive:

# iostat -En /dev/dsk/c0t1d0 [The failed disk shows non-zero error counters]

c0t1d0 Soft Errors: 0 Hard Errors: 102 Transport Errors: 231
Vendor: SEAGATE Product: ST914602SSUN146G Revision: 0400 Serial No: 070490N5V8
Size: 146.80GB <146800115712 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 102 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

# cfgadm -al

Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 disk connected configured unknown
c0::dsk/c0t1d0 disk connected configured unknown

# metastat -c [The submirrors on the failed disk are shown in the maintenance (maint) state]

d7 m 10GB d17 d27 (maint)
d17 s 10GB c0t0d0s7
d27 s 10GB c0t1d0s7 (maint)
d4 m 40GB d14 d24 (maint)
d14 s 40GB c0t0d0s4
d24 s 40GB c0t1d0s4 (maint)
d3 m 40GB d13 d23 (maint)
d13 s 40GB c0t0d0s3
d23 s 40GB c0t1d0s3 (maint)
d1 m 2.0GB d11 d21 (maint)
d11 s 2.0GB c0t0d0s1
d21 s 2.0GB c0t1d0s1 (maint)
d0 m 3.0GB d10 d20 (maint)
d10 s 3.0GB c0t0d0s0
d20 s 3.0GB c0t1d0s0 (maint)
d5 m 40GB d15 d25 (maint)
d15 s 40GB c0t0d0s5
d25 s 40GB c0t1d0s5 (maint)
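The listing above can also be read mechanically. This is a minimal sketch, assuming the `metastat -c` layout shown here: mirror lines carry `m` in the second column followed by the size and the submirror names, and submirror lines carry `s` with the component slice in the fourth column. The function name `submirrors_on_disk` is hypothetical. It prints one `mirror submirror slice` line per submirror on the given disk, which is the information step 3 needs. On Solaris, `nawk` may be needed if `/usr/bin/awk` rejects `-v`.

```shell
# Hypothetical helper: list the submirrors that sit on one disk.
# Reads "metastat -c" output on stdin; $1 is the disk, e.g. c0t1d0.
# Prints "mirror submirror slice" for each matching submirror.
submirrors_on_disk() {
    awk -v disk="$1" '
        # Mirror line, e.g. "d7 m 10GB d17 d27 (maint)":
        # remember which mirror owns each submirror name.
        $2 == "m" {
            for (i = 4; i <= NF; i++)
                if ($i ~ /^d[0-9]+$/) owner[$i] = $1
        }
        # Submirror line, e.g. "d27 s 10GB c0t1d0s7 (maint)":
        # keep it if the component slice is on the requested disk.
        $2 == "s" && index($4, disk "s") == 1 {
            n++
            name[n] = $1
            slice[n] = $4
        }
        END {
            for (i = 1; i <= n; i++)
                print owner[name[i]], name[i], slice[i]
        }
    '
}
```

Run as `metastat -c | submirrors_on_disk c0t1d0`. For the output above it would print `d7 d27 c0t1d0s7`, `d4 d24 c0t1d0s4`, `d3 d23 c0t1d0s3`, `d1 d21 c0t1d0s1`, `d0 d20 c0t1d0s0` and `d5 d25 c0t1d0s5`, one per line.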

3. Remove the state database replicas and submirrors that are on the failed disk

# metadb -d /dev/dsk/c0t1d0s6
# metadetach -f d5 d25
# metadetach -f d0 d20
# metadetach -f d1 d21
# metadetach -f d3 d23
# metadetach -f d4 d24
# metadetach -f d7 d27
# metaclear d25
# metaclear d20
# metaclear d21
# metaclear d23
# metaclear d24
# metaclear d27
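The detach and clear commands follow a fixed pattern, so they can be generated as a dry run. A minimal sketch, assuming the pairs are typed in (or taken from the `metastat -c` listing in step 2) as `mirror submirror` lines such as `d5 d25`; the function name `print_detach_clear` is hypothetical. It only prints the commands, all detaches first and then all clears as in the list above, so they can be reviewed before being piped to `sh`. The `metadb -d` command is not included and still has to be run separately.

```shell
# Hypothetical helper: print (not run) the step 3 commands for a list
# of "mirror submirror" pairs on stdin, e.g. "d5 d25".
# Every metadetach is emitted before any metaclear.
print_detach_clear() {
    awk 'NF >= 2 { m[++n] = $1; s[n] = $2 }
         END {
             for (i = 1; i <= n; i++) print "metadetach -f", m[i], s[i]
             for (i = 1; i <= n; i++) print "metaclear", s[i]
         }'
}
```

For example, `printf 'd5 d25\nd0 d20\n' | print_detach_clear` prints the two `metadetach -f` lines followed by `metaclear d25` and `metaclear d20`.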

4. Check that the mirror reduction succeeded (no c0t1d0 submirrors or replicas should remain):

# metastat -c
# metadb
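Both outputs can be checked in one pass. This is a small sketch with a hypothetical function name, `still_references_disk`; it simply looks for any slice name of the form `c0t1d0s...` in whatever text it is given, which covers both the `metastat -c` component column and the `/dev/dsk/...` paths printed by `metadb`.

```shell
# Hypothetical helper: succeed (exit 0) if any line on stdin still
# mentions a slice of the disk given as $1, e.g. c0t1d0 -> "c0t1d0s".
still_references_disk() {
    grep "$1"s >/dev/null
}
```

On the host: `{ metastat -c; metadb; } | still_references_disk c0t1d0 && echo "c0t1d0 is still in use"`. No output means the reduction is complete.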

5. Unconfigure the disk in Solaris

# cfgadm -c unconfigure c0::dsk/c0t1d0
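To confirm the result, the disk's row in `cfgadm -al` can be picked out. A minimal sketch with a hypothetical function name, `disk_ap_state`, assuming the column order shown in step 2 (Ap_Id, Type, Receptacle, Occupant, Condition). After step 5 the Occupant column should read `unconfigured`, and after step 7 `configured` again. On Solaris, `nawk` may be needed if `/usr/bin/awk` rejects `-v`.

```shell
# Hypothetical helper: print the attachment point and Occupant state of
# one disk from "cfgadm -al" output on stdin. Columns are assumed to be
# Ap_Id Type Receptacle Occupant Condition, as in the listing above.
disk_ap_state() {
    awk -v disk="$1" '$1 ~ ("::dsk/" disk "$") { print $1, $4 }'
}
```

Usage: `cfgadm -al | disk_ap_state c0t1d0`.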

Shut down the machine and physically replace the disk

6. Shut down the machine and physically replace the faulty disk. [Some servers have hot-swappable disks and do not need to be shut down; consult the machine documentation.]
# init 5

7. Configure the new disk

# cfgadm -c configure c0::dsk/c0t1d0 

8. Verify that the disk is visible and reports no errors

# echo | format [c0t1d0 should appear in the disk list]
# iostat -En /dev/dsk/c0t1d0
# cfgadm -al
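The `iostat -En` check can be reduced to a pass/fail test. A minimal sketch with a hypothetical function name, `iostat_errors_clear`, assuming the summary line has the form shown in step 2 (`Soft Errors: N Hard Errors: N Transport Errors: N`); it returns 1 if any of those counters is non-zero and ignores the other lines.

```shell
# Hypothetical helper: read "iostat -En" output for one disk on stdin and
# return 1 if any "... Errors: N" counter on the summary line is non-zero.
iostat_errors_clear() {
    awk '
        BEGIN { bad = 0 }
        /Soft Errors:/ {
            # Fields look like: c0t1d0 Soft Errors: 0 Hard Errors: 102 ...
            for (i = 1; i < NF; i++)
                if ($i == "Errors:" && $(i + 1) + 0 != 0) bad = 1
        }
        END { exit bad }
    '
}
```

Usage: `iostat -En /dev/dsk/c0t1d0 | iostat_errors_clear && echo "no errors reported"`. The failed disk from step 2 would fail this check because of its Hard and Transport error counts.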

9. Copy the partition table from the surviving root disk [assumed here to be /dev/dsk/c0t0d0]

# prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2
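`prtvtoc` prints a header of comment lines starting with `*` (the first one names the device) followed by one row per slice. Comparing only the slice rows of both disks is a quick way to confirm the copy. This is a sketch under the assumption that both disks are the same model, so the rows should be identical; the function name `vtoc_rows` is hypothetical.

```shell
# Hypothetical helper: keep only the slice rows of "prtvtoc" output,
# dropping the "*" comment header, which names the device and so
# always differs between the two disks.
vtoc_rows() {
    grep -v '^\*'
}
```

On the host:

    prtvtoc /dev/rdsk/c0t0d0s2 | vtoc_rows > /tmp/vtoc.root
    prtvtoc /dev/rdsk/c0t1d0s2 | vtoc_rows > /tmp/vtoc.new
    cmp /tmp/vtoc.root /tmp/vtoc.new && echo "partition tables match"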

10. Install the boot block on the new disk's root slice (SPARC, UFS root)

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0
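The `installboot` command above is the SPARC form. GRUB-based x86 releases of Solaris 10 use `installgrub` with the GRUB stage files instead. The sketch below, with the hypothetical function name `bootblock_cmd`, only prints the appropriate command for the processor type reported by `uname -p`. It assumes a UFS root on the raw root slice and the standard `/boot/grub` stage file locations; check the platform documentation before running the printed command.

```shell
# Hypothetical helper: print (not run) the boot-block command for a new
# root-mirror disk. $1 is the processor type as reported by "uname -p"
# (sparc or i386), $2 the raw root slice, e.g. /dev/rdsk/c0t1d0s0.
bootblock_cmd() {
    case "$1" in
        sparc)
            # UFS root on SPARC: platform-specific bootblk, as in step 10.
            # The backquotes are printed literally and expand when run.
            echo 'installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk' "$2"
            ;;
        i386)
            # GRUB-based x86 releases use installgrub instead.
            echo "installgrub /boot/grub/stage1 /boot/grub/stage2 $2"
            ;;
        *)
            echo "unsupported processor type: $1" >&2
            return 1
            ;;
    esac
}
```

Usage: ``bootblock_cmd `uname -p` /dev/rdsk/c0t1d0s0``, then run the printed line after checking it.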

11. Create state database replicas on the new disk

# metadb -a -c 2 c0t1d0s6 [Check the saved /var/crash/metadb.out to see how many replicas the failed disk held and on which slice; for three replicas, use -c 3 instead]

12. Check that the replicas were created

# metadb [Both disks should show the same number of replicas on the same slice]
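Counting replicas by eye is error-prone when there are several per slice. A minimal sketch with a hypothetical function name, `replicas_per_slice`, assuming each replica line of `metadb` output ends with the device path (for example `/dev/dsk/c0t0d0s6`) and that the header line does not. It prints each slice with its replica count, sorted by slice name.

```shell
# Hypothetical helper: count state database replicas per slice from
# "metadb" output on stdin. The device is assumed to be the last field
# of each replica line; lines not ending in /dev/dsk/... are skipped.
replicas_per_slice() {
    awk '$NF ~ /^\/dev\/dsk\// { count[$NF]++ }
         END { for (d in count) print d, count[d] }' | sort
}
```

Running `metadb | replicas_per_slice` on the live system and `replicas_per_slice < /var/crash/metadb.out` on the copy saved in step 1 should give the same count for c0t1d0s6.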

13. Recreate the submirrors (metadevices) on the new disk
# metainit -f d20 1 1 c0t1d0s0
# metainit -f d21 1 1 c0t1d0s1
# metainit -f d23 1 1 c0t1d0s3
# metainit -f d24 1 1 c0t1d0s4
# metainit -f d25 1 1 c0t1d0s5
# metainit -f d27 1 1 c0t1d0s7

14. Attach the submirrors to their mirrors, which starts synchronizing data onto the new disk
# metattach d0 d20
# metattach d5 d25
# metattach d1 d21
# metattach d3 d23
# metattach d4 d24
# metattach d7 d27
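Steps 13 and 14 mirror step 3, so the same `mirror submirror slice` triples that `metastat -c` showed in step 2 (for example `d0 d20 c0t1d0s0`) are enough to generate them. A minimal dry-run sketch; the function name `print_reattach` is hypothetical, and it only prints the commands: every `metainit` first, then every `metattach`, as in the lists above.

```shell
# Hypothetical helper: print (not run) the step 13 and 14 commands for a
# list of "mirror submirror slice" lines on stdin, e.g. "d0 d20 c0t1d0s0".
# All one-way concats are created first, then attached to their mirrors.
print_reattach() {
    awk 'NF >= 3 { m[++n] = $1; s[n] = $2; sl[n] = $3 }
         END {
             for (i = 1; i <= n; i++) print "metainit -f", s[i], "1 1", sl[i]
             for (i = 1; i <= n; i++) print "metattach", m[i], s[i]
         }'
}
```

For example, `printf 'd0 d20 c0t1d0s0\n' | print_reattach` prints `metainit -f d20 1 1 c0t1d0s0` followed by `metattach d0 d20`.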

15. Check that the mirrors are resynchronizing

# metastat -c [It shows how much data has been resynchronized on each slice]
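To wait for the resync without re-running the command by hand, the full (non-compact) `metastat` output can be polled. This is a sketch under an assumption: that mirrors still syncing are reported with a line such as `Resync in progress: 45 % done`. The function name `resync_running` is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) while the full "metastat" output
# on stdin still reports a resync. Assumes lines such as
# "Resync in progress: 45 % done" appear for mirrors still syncing.
resync_running() {
    grep "Resync in progress" >/dev/null
}
```

A possible loop on the host, checking every five minutes: `while metastat | resync_running; do sleep 300; done`.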