DRAFT, not yet done...

Dynamic Reconfiguration (DR) of a Sun Server

Foreword

I recently bought a used Sun E3000 so that I could play around with its AP (Alternate Pathing) and DR (Dynamic Reconfiguration) capabilities.  AP and DR are not well known, so this is a record of what I have found out (starting with DR; I'll work on AP later).

General Info

Sun mid-range servers can hot swap their CPU/Memory and I/O boards because of their architecture.  They are chassis-type servers with a backplane (the Gigaplane) into which the major components (system boards) plug.  There are several types of system boards: Clock, CPU/Memory, SBus I/O, PCI I/O, and Graphics I/O.  For example, a CPU/Memory board can hold up to 2 CPUs and 2 banks of memory, and an SBus I/O board has 3 SBus slots (for Sun expansion cards such as a quad fast ethernet or SCSI controller), a Fast-Wide SCSI controller, and a 10/100 ethernet port.

What servers can do DR?

The entry level server that can do DR is the E3000.  Everything above it in the range (E3000 - E6500) can also do DR.  The E10000 (Starfire) can do DR too, but it works differently since that machine is almost a supercomputer.

Front and back views of the Sun E3000


A pdf file on DR from Sun is located here.

First, the system that this documentation is based on is a Sun E3000 (it is a 4 slot server).  It has:

Obviously, since this machine only has one CPU/Memory board I won't be able to DR a CPU/Memory board.  Instead, I will focus on the I/O boards (since the machine has two of them).

Step 1:  Make sure that the PROM (firmware) version on all of the boards supports DR.  You can do this by typing ".version" at the OK prompt or by looking at "dmesg".  My dmesg says:

May 12 13:28:48 bigmac sysctrl: [ID 979883 kern.info] NOTICE: Firmware supports Dynamic Reconfiguration of CPU/Memory boards.
May 12 13:28:48 bigmac sysctrl: [ID 787141 kern.info] NOTICE: Firmware supports Dynamic Reconfiguration of I/O board types 1, 4.
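
On a busy machine the two notices are easy to miss in a long dmesg.  Here is a minimal sketch of a filter for them, assuming the sysctrl notices are worded as in my output (the function name dr_firmware_support is just something I made up for illustration):

```shell
# Print only the log lines that announce firmware support for DR.
# Reads log text on stdin, so it works on dmesg output or a saved log file.
# A notice that was wrapped onto a second line is joined back together
# first; a line counts as a new entry when it starts with a syslog-style
# timestamp such as "May 12 13:28:48 ".
dr_firmware_support() {
    awk '
        /^[A-Z][a-z][a-z] +[0-9]+ [0-9:]+ / {
            if (line != "") print line
            line = $0
            next
        }
        { line = line " " $0 }
        END { if (line != "") print line }
    ' | grep 'Firmware supports Dynamic Reconfiguration'
}

# Usage on the live system:
#   dmesg | dr_firmware_support
```

If nothing is printed, either the firmware does not support DR or the boot messages have already rolled out of the log.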

Now, you need to make sure that the kernel is set for DR:

From /etc/system:
set soc:soc_enable_detach_suspend=1
set pln:pln_enable_detach_suspend=1
set kernel_cage_enable=1
Note: the kernel cage only needs to be enabled if you are going to DR CPU/Memory boards.
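
A quick way to confirm that the three lines really made it into /etc/system is sketched below.  The function name check_dr_settings is my own invention, and the check only looks for the exact "set name=1" form shown above:

```shell
# Report which of the DR-related settings are present in an
# /etc/system-style file (defaults to /etc/system itself).
# Comment lines in /etc/system start with "*", so anchoring the match
# at "set" means commented-out settings are reported as missing.
check_dr_settings() {
    file=${1:-/etc/system}
    for setting in soc:soc_enable_detach_suspend \
                   pln:pln_enable_detach_suspend \
                   kernel_cage_enable; do
        if grep "^[[:space:]]*set[[:space:]][[:space:]]*$setting[[:space:]]*=[[:space:]]*1[[:space:]]*\$" "$file" >/dev/null; then
            echo "$setting: enabled"
        else
            echo "$setting: missing"
        fi
    done
}

# Usage:
#   check_dr_settings              # checks /etc/system
#   check_dr_settings /tmp/system  # checks a copy
```

Remember that changes to /etc/system only take effect after a reboot.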

If you were going to DR CPU/Memory boards, you would also have to make sure that memory interleaving was set to "min".  However, I won't be DR'ing a CPU/Memory board, so I will leave mine set to max.  The interleave is set in the eeprom; my eeprom shows:

# eeprom
disabled-memory-list: data not available.
disabled-board-list: data not available.
memory-interleave=max
configuration-policy=component
scsi-initiator-id=7
keyboard-click?=false
keymap: data not available.
ttyb-rts-dtr-off=false
ttyb-ignore-cd=true
ttya-rts-dtr-off=false
ttya-ignore-cd=true
ttyb-mode=9600,8,n,1,-
ttya-mode=9600,8,n,1,-
sbus-specific-probe: data not available.
sbus-probe-default=d3120
mfg-mode=off
diag-level=min
powerfail-time=0
#power-cycles=1426063405
fcode-debug?=false
output-device=screen
input-device=keyboard
load-base=16384
boot-command=boot
auto-boot?=true
watchdog-reboot?=false
diag-file: data not available.
diag-device=disk diskbrd diskisp disksoc net
boot-file: data not available.
boot-device=diskbrd:a disk diskbrd diskisp disksoc net
local-mac-address?=false
ansi-terminal?=true
screen-#columns=80
screen-#rows=34
silent-mode?=false
use-nvramrc?=false
nvramrc: data not available.
security-mode=none
security-password: data not available.
security-#badlogins=2684354560
oem-logo: data not available.
oem-logo?=false
oem-banner: data not available.
oem-banner?=false
hardware-revision: data not available.
last-hardware-update=
diag-switch?=false
#
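
Rather than scanning the whole eeprom listing, the one variable that matters can be checked on its own.  This is only a sketch: the function name interleave_ok_for_cpu_dr and its messages are made up, and it assumes the "name=value" output format shown above:

```shell
# Read eeprom-style "name=value" lines on stdin and report whether the
# memory interleave setting would allow DR of CPU/Memory boards.
# Returns 0 for "min", 1 for any other value, 2 if the variable is absent.
interleave_ok_for_cpu_dr() {
    value=$(sed -n 's/^memory-interleave=//p')
    case $value in
        min) echo "memory-interleave=min: CPU/Memory DR possible" ;;
        "")  echo "memory-interleave not found"; return 2 ;;
        *)   echo "memory-interleave=$value: set to min before CPU/Memory DR"; return 1 ;;
    esac
}

# Usage:
#   eeprom memory-interleave | interleave_ok_for_cpu_dr
# To change the value (takes effect at the next reset):
#   eeprom memory-interleave=min
```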

And prtdiag shows:
System Configuration:  Sun Microsystems  sun4u 4-slot Sun Enterprise 3000
System clock frequency: 82 MHz
Memory size:  512Mb

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 7    14     0      248     1.0   US-II    1.1
 7    15     1      248     1.0   US-II    1.1
 

========================= Memory =========================

                                              Intrlv.  Intrlv.
Brd   Bank   MB    Status   Condition  Speed   Factor   With
---  -----  ----  -------  ----------  -----  -------  -------
 7     0     256   Active      OK       60ns    2-way     A
 7     1     256   Active      OK       60ns    2-way     A

========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot  Name                              Model
---  ----  ----  ----  --------------------------------  ----------------------
 1   SBus   25     2   cgsix                             SUNW,501-2325
 1   SBus   25     3   SUNW,hme
 1   SBus   25     3   SUNW,fas/sd (block)
 1   SBus   25    13   SUNW,soc                          501-2069
 3   SBus   25     3   SUNW,hme
 3   SBus   25     3   SUNW,fas/sd (block)
 3   SBus   25    13   SUNW,soc                          501-2069

No failures found in System
===========================

No System Faults found
======================

And, cfgadm shows:
# cfgadm -l
Ap_Id                          Type         Receptacle   Occupant     Condition
ac0:bank0                      memory       connected    configured   ok
ac0:bank1                      memory       connected    configured   ok
c0                             scsi-bus     connected    configured   unknown
c1                             scsi-bus     connected    unconfigured unknown
sysctrl0:slot1                 dual-sbus    connected    configured   ok
sysctrl0:slot3                 dual-sbus    connected    configured   ok
sysctrl0:slot5                 unknown      empty        unconfigured unknown
sysctrl0:slot7                 cpu/mem      connected    configured   ok
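
For DR only the sysctrl0:slotN attachment points matter.  A small sketch that pulls just those rows out of the listing (list_board_slots is a hypothetical helper name, and the column layout is assumed to match the output above):

```shell
# Read "cfgadm -l" output on stdin and print, for each board slot,
# the attachment point, the board type, and receptacle/occupant state.
# The header line and non-board entries (memory banks, SCSI buses)
# are skipped because they do not match the sysctrlN:slotN pattern.
list_board_slots() {
    awk '$1 ~ /^sysctrl[0-9]+:slot[0-9]+$/ { print $1, $2, $3 "/" $4 }'
}

# Usage:
#   cfgadm -l | list_board_slots
```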

A quiesce test shows that the machine can actually do DR:
# cfgadm -x quiesce-test sysctrl0:slot1

NOTE: The machine will freeze for up to 1 minute when you do this.  However, since it came back I know that quiesce works!

Now, a little hardware reconfiguration.  There are several restrictions (until I get AP working):

I need to move my primary interface to hme0 ....  OK, done...

Ok, the main network connection (hostname: bigmac) has been moved to hme0 on the first I/O board (slot1).  The second I/O board is now free (slot3).

Now, unconfigure slot3:
bigmac# cfgadm -c unconfigure sysctrl0:slot3

And, dmesg shows that slot3 is unconfigured:
May 15 20:38:15 bigmac pseudo: [ID 129642 kern.info] pseudo-device: devinfo0
May 15 20:38:15 bigmac genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo@0
May 15 20:38:15 bigmac sysctrl: [ID 523642 kern.notice] NOTICE: unconfiguring dual-sbus board in slot 3
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,hme@3,8c00000 (hme1) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@0,0 (sd15) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@1,0 (sd16) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@2,0 (sd17) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@3,0 (sd18) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@4,0 (sd19) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@5,0 (sd20) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@6,0 (sd21) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@8,0 (sd22) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@9,0 (sd23) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@a,0 (sd24) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@b,0 (sd25) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@c,0 (sd26) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@d,0 (sd27) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@e,0 (sd28) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000/sd@f,0 (sd29) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000 (fas1) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@7,0 (sbus3) offline
May 15 20:38:15 bigmac genunix: [ID 408114 kern.info] /sbus@6,0 (sbus2) offline
May 15 20:38:15 bigmac sysctrl: [ID 549876 kern.notice] NOTICE: dual-sbus board in slot 3 is unconfigured
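
That is a lot of log lines for one command.  To boil them down to just the device instances that went away, something like the following sketch works, assuming the "(name) offline" wording shown above (offlined_devices is a made-up name):

```shell
# Read dmesg text on stdin and print the instance name (the part in
# parentheses) of every device reported as going offline, one per line.
offlined_devices() {
    sed -n 's/.*(\([a-z0-9]*\)) offline$/\1/p'
}

# Usage:
#   dmesg | offlined_devices
```

Saving this list makes it easy to compare against what comes back online when the board is configured again.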

Now, detach slot3:
bigmac# cfgadm -c disconnect sysctrl0:slot3

And, dmesg shows that slot3 is disconnected:
May 15 20:41:36 bigmac sysctrl: [ID 523642 kern.notice] NOTICE: disconnecting dual-sbus board in slot 3
May 15 20:41:36 bigmac genunix: [ID 408114 kern.info] /fhc@6,f8800000/ac@0,1000000 (ac2) offline
May 15 20:41:36 bigmac genunix: [ID 408114 kern.info] /fhc@6,f8800000/environment@0,400000 (environ2) offline
May 15 20:41:36 bigmac genunix: [ID 408114 kern.info] /fhc@6,f8800000 (fhc2) offline
May 15 20:41:37 bigmac sysctrl: [ID 549876 kern.notice] NOTICE: dual-sbus board in slot 3 is disconnected
May 15 20:41:37 bigmac sysctrl: [ID 258214 kern.notice] NOTICE: board 3 is ready to remove
May 15 20:41:37 bigmac sysctrl: [ID 404430 kern.notice] NOTICE: Redundant power available

And, cfgadm -l now shows:
bigmac# cfgadm -l
Ap_Id                          Type         Receptacle   Occupant     Condition
ac0:bank0                      memory       connected    configured   ok
ac0:bank1                      memory       connected    configured   ok
c0                             scsi-bus     connected    configured   unknown
sysctrl0:slot1                 dual-sbus    connected    configured   ok
sysctrl0:slot3                 dual-sbus    disconnected unconfigured unknown
sysctrl0:slot5                 unknown      empty        unconfigured unknown
sysctrl0:slot7                 cpu/mem      connected    configured   ok
bigmac#
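
Before pulling a board it is worth double-checking that the slot really is in the "disconnected unconfigured" state.  A minimal sketch of such a guard follows; slot_safe_to_pull is a hypothetical name, and it assumes a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris) because it uses -v:

```shell
# Succeed (exit status 0) only if the given attachment point appears in
# "cfgadm -l" output on stdin with receptacle "disconnected" and
# occupant "unconfigured".  Any other state, or a missing slot, fails.
slot_safe_to_pull() {
    awk -v slot="$1" '
        $1 == slot {
            found = 1
            safe = ($3 == "disconnected" && $4 == "unconfigured")
        }
        END { exit ((found && safe) ? 0 : 1) }
    '
}

# Usage:
#   if cfgadm -l | slot_safe_to_pull sysctrl0:slot3; then
#       echo "slot3 can be removed"
#   fi
```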

The status LEDs on board 3 (slot3) have now changed to off, on (orange), off; a lit orange LED would normally indicate a hardware fault.  The status LEDs on the other boards continue normally (on, off, on (blink)).  The board in slot 3 can now safely be removed from the system WHILE THE SYSTEM CONTINUES TO RUN!!!!  This is SOOO COOL!

Now, it's time to bring board 3 back online.

First,  reconnect it:
bigmac# cfgadm -c connect sysctrl0:slot3
system will be temporarily suspended to connect a board: proceed (yes/no)? yes
bigmac#

And, dmesg shows:
May 15 20:53:24 bigmac sysctrl: [ID 523642 kern.notice] NOTICE: connecting dual-sbus board in slot 3
May 15 20:53:36 bigmac rootnex: [ID 349649 kern.info] fhc2 at root: UPA 0x6 0xf8800000
May 15 20:53:36 bigmac genunix: [ID 936769 kern.info] fhc2 is /fhc@6,f8800000
May 15 20:53:36 bigmac genunix: [ID 408114 kern.info] /fhc@6,f8800000 (fhc2) online
May 15 20:53:36 bigmac genunix: [ID 936769 kern.info] ac2 is /fhc@6,f8800000/ac@0,1000000
May 15 20:53:36 bigmac genunix: [ID 408114 kern.info] /fhc@6,f8800000/ac@0,1000000 (ac2) online
May 15 20:53:36 bigmac genunix: [ID 936769 kern.info] environ2 is /fhc@6,f8800000/environment@0,400000
May 15 20:53:36 bigmac genunix: [ID 408114 kern.info] /fhc@6,f8800000/environment@0,400000 (environ2) online
May 15 20:53:36 bigmac sysctrl: [ID 549876 kern.notice] NOTICE: dual-sbus board in slot 3 is connected
May 15 20:53:37 bigmac sysctrl: [ID 459609 kern.warning] WARNING: Redundant power lost
May 15 20:53:38 bigmac hme: [ID 517527 kern.info] SUNW,hme0 : Internal Transceiver Selected.
May 15 20:53:38 bigmac hme: [ID 517527 kern.info] SUNW,hme0 : Auto-Negotiated  100 Mbps Half-Duplex Link Up

Now,  configure the board:
bigmac# cfgadm -c configure sysctrl0:slot3
bigmac#

And, dmesg shows:
May 15 20:55:37 bigmac sysctrl: [ID 523642 kern.notice] NOTICE: configuring dual-sbus board in slot 3
May 15 20:55:37 bigmac rootnex: [ID 349649 kern.info] sbus2 at root: UPA 0x6 0x0 ...
May 15 20:55:37 bigmac genunix: [ID 936769 kern.info] sbus2 is /sbus@6,0
May 15 20:55:37 bigmac genunix: [ID 408114 kern.info] /sbus@6,0 (sbus2) online
May 15 20:55:37 bigmac sbus: [ID 349649 kern.info] sbusmem0 at sbus0: SBus0 slot 0x1 offset 0x0
May 15 20:55:37 bigmac genunix: [ID 936769 kern.info] sbusmem0 is /sbus@2,0/sbusmem@1,0
May 15 20:55:37 bigmac sbus: [ID 349649 kern.info] sbusmem1 at sbus0: SBus0 slot 0x2 offset 0x0
May 15 20:55:37 bigmac genunix: [ID 936769 kern.info] sbusmem1 is /sbus@2,0/sbusmem@2,0
May 15 20:55:37 bigmac sbus: [ID 349649 kern.info] sbusmem2 at sbus0: SBus0 slot 0xd offset 0x0
May 15 20:55:37 bigmac genunix: [ID 936769 kern.info] sbusmem2 is /sbus@2,0/sbusmem@d,0
May 15 20:55:37 bigmac sbus: [ID 349649 kern.info] sbusmem3 at sbus1: SBus1 slot 0x0 offset 0x0
May 15 20:55:37 bigmac genunix: [ID 936769 kern.info] sbusmem3 is /sbus@3,0/sbusmem@0,0
May 15 20:55:37 bigmac sbus: [ID 349649 kern.info] sbusmem4 at sbus1: SBus1 slot 0x3 offset 0x0
May 15 20:55:37 bigmac genunix: [ID 936769 kern.info] sbusmem4 is /sbus@3,0/sbusmem@3,0
May 15 20:55:37 bigmac sbus: [ID 349649 kern.info] sbusmem5 at sbus2: SBus2 slot 0x1 offset 0x0
May 15 20:55:37 bigmac genunix: [ID 936769 kern.info] sbusmem5 is /sbus@6,0/sbusmem@1,0
May 15 20:55:37 bigmac sbus: [ID 349649 kern.info] sbusmem6 at sbus2: SBus2 slot 0x2 offset 0x0
May 15 20:55:37 bigmac genunix: [ID 936769 kern.info] sbusmem6 is /sbus@6,0/sbusmem@2,0
May 15 20:55:37 bigmac sbus: [ID 349649 kern.info] sbusmem7 at sbus2: SBus2 slot 0xd offset 0x0
May 15 20:55:37 bigmac genunix: [ID 936769 kern.info] sbusmem7 is /sbus@6,0/sbusmem@d,0
May 15 20:55:37 bigmac genunix: [ID 408114 kern.info] /sbus@2,0/sbusmem@1,0 (sbusmem0) online
May 15 20:55:37 bigmac genunix: [ID 408114 kern.info] /sbus@2,0/sbusmem@2,0 (sbusmem1) online
May 15 20:55:37 bigmac genunix: [ID 408114 kern.info] /sbus@2,0/sbusmem@d,0 (sbusmem2) online
May 15 20:55:37 bigmac genunix: [ID 408114 kern.info] /sbus@3,0/sbusmem@0,0 (sbusmem3) online
May 15 20:55:37 bigmac genunix: [ID 408114 kern.info] /sbus@3,0/sbusmem@3,0 (sbusmem4) online
May 15 20:55:37 bigmac genunix: [ID 408114 kern.info] /sbus@6,0/sbusmem@1,0 (sbusmem5) online
May 15 20:55:37 bigmac genunix: [ID 408114 kern.info] /sbus@6,0/sbusmem@2,0 (sbusmem6) online
May 15 20:55:37 bigmac genunix: [ID 408114 kern.info] /sbus@6,0/sbusmem@d,0 (sbusmem7) online
May 15 20:55:37 bigmac soc: [ID 854183 kern.info] ID[SUNWssa.soc.driver.1010] soc0:: host adapter fw date code: Wed Jan 17 20:34:59 1996
May 15 20:55:37 bigmac
May 15 20:55:37 bigmac sbus: [ID 349649 kern.info] soc0 at sbus0: SBus0 slot 0xd offset 0x10000 Onboard device sparc9 ipl 5
May 15 20:55:37 bigmac genunix: [ID 936769 kern.info] soc0 is /sbus@2,0/SUNW,soc@d,10000
May 15 20:55:37 bigmac soc: [ID 854183 kern.info] ID[SUNWssa.soc.driver.1010] soc1:: host adapter fw date code: Wed Jan 17 20:34:59 1996
May 15 20:55:37 bigmac
May 15 20:55:37 bigmac sbus: [ID 349649 kern.info] soc1 at sbus2: SBus2 slot 0xd offset 0x10000 Onboard device sparc9 ipl 5
May 15 20:55:37 bigmac genunix: [ID 936769 kern.info] soc1 is /sbus@6,0/SUNW,soc@d,10000
May 15 20:55:37 bigmac genunix: [ID 408114 kern.info] /sbus@6,0/SUNW,soc@d,10000 (soc1) online
May 15 20:55:48 bigmac rootnex: [ID 349649 kern.info] sbus3 at root: UPA 0x7 0x0 ...
May 15 20:55:48 bigmac genunix: [ID 936769 kern.info] sbus3 is /sbus@7,0
May 15 20:55:48 bigmac genunix: [ID 408114 kern.info] /sbus@7,0 (sbus3) online
May 15 20:55:48 bigmac sbus: [ID 349649 kern.info] sbusmem8 at sbus3: SBus3 slot 0x0 offset 0x0
May 15 20:55:48 bigmac genunix: [ID 936769 kern.info] sbusmem8 is /sbus@7,0/sbusmem@0,0
May 15 20:55:48 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/sbusmem@0,0 (sbusmem8) online
May 15 20:55:48 bigmac sbus: [ID 349649 kern.info] sbusmem9 at sbus3: SBus3 slot 0x3 offset 0x0
May 15 20:55:48 bigmac genunix: [ID 936769 kern.info] sbusmem9 is /sbus@7,0/sbusmem@3,0
May 15 20:55:48 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/sbusmem@3,0 (sbusmem9) online
May 15 20:55:48 bigmac hme: [ID 517527 kern.info] SUNW,hme1 : Sbus (Rev Id = 22) Found
May 15 20:55:48 bigmac sbus: [ID 349649 kern.info] hme1 at sbus3: SBus3 slot 0x3 offset 0x8c00000 and slot 0x3 offset 0x8c02000 and slot 0x3 offset 0x8c04000 and slot 0x3 offset 0x8c06000 and slot 0x3 offset 0x8c07000 SBus level 4 sparc9 ipl 7
May 15 20:55:48 bigmac genunix: [ID 936769 kern.info] hme1 is /sbus@7,0/SUNW,hme@3,8c00000
May 15 20:55:48 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,hme@3,8c00000 (hme1) online
May 15 20:55:48 bigmac scsi: [ID 365881 kern.info] /sbus@7,0/SUNW,fas@3,8800000 (fas1):
May 15 20:55:48 bigmac  rev 2.2 FEPS chip
May 15 20:55:48 bigmac sbus: [ID 349649 kern.info] fas1 at sbus3: SBus3 slot 0x3 offset 0x8800000 and slot 0x3 offset 0x8810000 SBus level 3 sparc9 ipl 5
May 15 20:55:48 bigmac genunix: [ID 936769 kern.info] fas1 is /sbus@7,0/SUNW,fas@3,8800000
May 15 20:55:48 bigmac genunix: [ID 408114 kern.info] /sbus@7,0/SUNW,fas@3,8800000 (fas1) online
May 15 20:55:58 bigmac sysctrl: [ID 549876 kern.notice] NOTICE: dual-sbus board in slot 3 is configured

And, cfgadm -l shows:
bigmac# cfgadm -l
Ap_Id                          Type         Receptacle   Occupant     Condition
ac0:bank0                      memory       connected    configured   ok
ac0:bank1                      memory       connected    configured   ok
c0                             scsi-bus     connected    configured   unknown
c1                             scsi-bus     connected    unconfigured unknown
sysctrl0:slot1                 dual-sbus    connected    configured   ok
sysctrl0:slot3                 dual-sbus    connected    configured   ok
sysctrl0:slot5                 unknown      empty        unconfigured unknown
sysctrl0:slot7                 cpu/mem      connected    configured   ok
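
To keep the order of operations straight, the whole round trip can be written down as a tiny helper that prints the commands without running them.  This is just a sketch of the sequence used above; dr_plan and its "remove"/"restore" keywords are my own invention:

```shell
# Print (not run) the cfgadm commands for taking a board out of service
# or bringing it back, in the order used in this walkthrough.
#   $1 = remove | restore
#   $2 = attachment point, e.g. sysctrl0:slot3
dr_plan() {
    action=$1
    ap_id=$2
    case $action in
        remove)
            echo "cfgadm -c unconfigure $ap_id"
            echo "cfgadm -c disconnect $ap_id"
            ;;
        restore)
            echo "cfgadm -c connect $ap_id"
            echo "cfgadm -c configure $ap_id"
            ;;
        *)
            echo "usage: dr_plan remove|restore ap_id" >&2
            return 1
            ;;
    esac
}

# Usage:
#   dr_plan remove sysctrl0:slot3
#   dr_plan restore sysctrl0:slot3
```

Printing rather than executing is deliberate: the connect step pauses the whole system and asks for confirmation, so I would rather type each command by hand and watch dmesg in between.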

The LEDs on all of the boards have returned to normal (on, off, on (blink)) and the system is ready for business!

Next up, AP (Alternate Pathing) configuration.....  On another day...
 
 


Copyright © 1993-2001 by Robert Barnes
