Exadata (Hardware Replacement Documentation Excerpts)

Posted 2020-09-20 mfmdaoyou


Maintaining Flash Disks

Replacing a Flash Disk Due to Flash Disk Failure

Each Exadata Storage Server is equipped with four F20 PCIe cards. Each card has four flash disks (FDOMs) for a total of 16 flash disks. The four F20 PCIe cards are present in PCI slot numbers 1, 2, 4, and 5. The F20 PCIe cards are not hot-pluggable, so Exadata Storage Server must be powered down before replacing the flash disks or cards.

To identify a failed flash disk, use the following command:

CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS=critical DETAIL

         name:                   [9:0:2:0]
         diskType:               FlashDisk
         id:                     508002000092e70FMOD2
         luns:                   1_2
         makeModel:              "MARVELL SD88SA02"
         physicalFirmware:       D20R
         physicalInsertTime:     2009-10-27T13:11:16-07:00
         physicalInterface:      sas
         physicalSerial:         508002000092e70FMOD2
         physicalSize:           22.8880615234375G
         slotNumber:             "PCI Slot: 1; FDOM: 2"
         status:                 critical

The slotNumber attribute shows the PCI slot and the FDOM number.
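When scripting around this output, the slotNumber value can be split into its two numbers. The following is a minimal sketch that assumes the value always has the form shown in the sample output above; the function name `parse_slot_number` is hypothetical and not part of CellCLI.

```python
import re

# Matches a slotNumber value such as "PCI Slot: 1; FDOM: 2".
# The surrounding double quotes seen in the DETAIL output are optional here.
_SLOT_RE = re.compile(r"PCI Slot:\s*(\d+);\s*FDOM:\s*(\d+)")


def parse_slot_number(value: str) -> tuple[int, int]:
    """Return (pci_slot, fdom) parsed from a slotNumber attribute value."""
    match = _SLOT_RE.search(value)
    if match is None:
        raise ValueError(f"unrecognized slotNumber value: {value!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(parse_slot_number('"PCI Slot: 1; FDOM: 2"'))
```

The two integers identify the card to pull (PCI slot) and the module on that card (FDOM).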

If a flash disk is detected to have failed, then an alert is generated indicating that the flash disk, as well as the LUN on it, has failed. The alert message includes the PCI slot number of the flash card, and the exact FDOM number. These numbers uniquely identify the field replaceable unit (FRU). If you have configured the system for alert notification, then the alert will be sent by e-mail message to the designated address.

A flash disk outage can cause reduction in performance and data redundancy. The failed disk should be replaced with a new flash disk at the earliest opportunity. If the flash disk is used for flash cache, then the effective cache size for the cell is reduced. If the flash disk is used for grid disks, then the Oracle ASM disks associated with these grid disks are automatically dropped with the FORCE option from the Oracle ASM disk group and an Oracle ASM rebalance will ensue to restore the data redundancy.

The following procedure describes how to replace an FDOM due to disk failure:

Inactivate all grid disks on the cell.
Shut down the cell.
Replace the failed flash disk based on the PCI number and FDOM number.
Power up the cell. The cell services are started automatically.
Bring all grid disks online using the following command:
```
CellCLI> ALTER GRIDDISK ALL ACTIVE
```
Verify that all grid disks have been successfully put online using the following command:
```
CellCLI> LIST GRIDDISK ATTRIBUTES asmmodestatus
```
Wait until asmmodestatus shows ONLINE for all grid disks.
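The final wait can be automated by checking captured command output. The sketch below assumes the output contains one grid disk per line with asmmodestatus as the last whitespace-separated field; that layout is an assumption and should be verified on your cell. The function name is hypothetical.

```python
def all_grid_disks_online(cellcli_output: str) -> bool:
    """Return True if every non-blank line ends with the status ONLINE.

    Assumes one grid disk per line, with asmmodestatus as the last
    whitespace-separated field (assumed layout, not confirmed by the source).
    """
    statuses = [
        line.split()[-1]
        for line in cellcli_output.splitlines()
        if line.strip()
    ]
    # An empty listing is treated as "not online" so that a failed
    # command is never mistaken for success.
    return bool(statuses) and all(status == "ONLINE" for status in statuses)


if __name__ == "__main__":
    sample = "         ONLINE\n         SYNCING\n"
    print(all_grid_disks_online(sample))
```

A caller would rerun the LIST GRIDDISK command and this check at intervals until it returns True.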

The new flash disk is automatically used by the system. If the flash disk is used for flash cache, then the effective cache size will increase. If the flash disk is used for grid disks, then the grid disks will be re-created on the new flash disk. If those grid disks were part of an Oracle ASM disk group, then they will be added back to the disk group and the data will be rebalanced on them based on the disk group redundancy and ASM_POWER_LIMIT parameter.

Oracle ASM rebalance occurs when dropping or adding a disk. To check the status of the rebalance, do the following:

The rebalance operation may have been successfully run. Check the Oracle ASM alert logs to confirm.
The rebalance operation may be currently running. Check the GV$ASM_OPERATION view to determine if the rebalance operation is still running.
The rebalance operation may have failed. Check the ERROR_CODE column of the V$ASM_OPERATION view to determine if the rebalance operation failed.
Rebalance operations from multiple disk groups can be done on different Oracle ASM instances in the same cluster if the physical disk being replaced contains ASM disks from multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If all Oracle ASM instances are busy, then rebalance operations are queued.
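The three cases above can be sketched as a small classifier over rows already fetched from GV$ASM_OPERATION. This is an illustration only: it assumes each row is a dict with the keys OPERATION, STATE and ERROR_CODE, that a rebalance appears with OPERATION set to REBAL, and that a failed operation shows STATE set to ERRORS or a non-empty ERROR_CODE. How the rows are fetched is left out.

```python
def rebalance_status(rows: list[dict]) -> str:
    """Classify rebalance progress from rows of GV$ASM_OPERATION.

    Each row is a dict with the keys OPERATION, STATE and ERROR_CODE
    (assumed column names, see the lead-in).
    """
    rebal = [row for row in rows if row.get("OPERATION") == "REBAL"]
    if not rebal:
        # Nothing in the view: the rebalance has finished or never started.
        # The Oracle ASM alert log is the place to confirm which.
        return "not running"
    if any(row.get("ERROR_CODE") or row.get("STATE") == "ERRORS" for row in rebal):
        return "failed"
    return "running"


if __name__ == "__main__":
    print(rebalance_status([{"OPERATION": "REBAL", "STATE": "RUN", "ERROR_CODE": None}]))
```

A result of "not running" is deliberately not reported as success, because only the alert log confirms that a rebalance completed.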

Replacing a Flash Disk Due to Flash Disk Problems

Exadata Storage Server is equipped with four F20 PCIe cards. Each card has four flash disks (FDOMs) for a total of 16 flash disks. The four F20 PCIe cards are present in PCI slot numbers 1, 2, 4, and 5. The F20 PCIe cards are not hot-pluggable, so Exadata Storage Server must be powered down before replacing the flash disks or cards.

You may need to replace a flash disk because the disk is in predictive failure status or poor performance status.

To identify a predictive failure flash disk, use the following command:

CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS='predictive failure' DETAIL

         name: [9:0:2:0]
         diskType: FlashDisk
         id: 508002000092e70FMOD2
         luns: 1_2
         makeModel: "MARVELL SD88SA02"
         physicalFirmware: D20R
         physicalInsertTime: 2009-10-27T13:11:16-07:00
         physicalInterface: sas
         physicalSerial: 508002000092e70FMOD2
         physicalSize: 22.8880615234375G
         slotNumber: "PCI Slot: 1; FDOM: 2"
         status: predictive failure

To identify a poor performance flash disk, use the following command:

CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS='poor performance' DETAIL

         name: [9:0:2:0]
         diskType: FlashDisk
         id: 508002000092e70FMOD2
         luns: 1_2
         makeModel: "MARVELL SD88SA02"
         physicalFirmware: D20R
         physicalInsertTime: 2009-10-27T13:11:16-07:00
         physicalInterface: sas
         physicalSerial: 508002000092e70FMOD2
         physicalSize: 22.8880615234375G
         slotNumber: "PCI Slot: 1; FDOM: 2"
         status: poor performance

An alert is generated when a flash disk is in predictive failure or poor performance status. The alert includes specific instructions for replacing the flash disk. If you have configured the system for alert notifications, then the alerts are sent by e-mail message to the designated address.

Flash disk predictive failure status indicates that the flash disk will fail soon, and should be replaced at the earliest opportunity. If the flash disk is used for flash cache, then it will continue to be used as flash cache. If the flash disk is used for grid disks, then the Oracle ASM disks associated with these grid disks are automatically dropped and Oracle ASM rebalance will relocate the data from the predictively failed disk to other disks. Wait until the Oracle ASM disks have been successfully dropped by querying the V$ASM_DISK_STAT view before proceeding with the flash disk replacement. If the normal drop did not complete before the flash disk fails, then the Oracle ASM disks will be automatically dropped with FORCE option from the Oracle ASM disk group.

If the DROP command did not complete before the flash disk fails, then refer to "Replacing a Flash Disk Due to Flash Disk Failure".

Flash disk poor performance status indicates that the flash disk demonstrates extremely poor performance, and should be replaced at the earliest opportunity. If the flash disk is used for flash cache, then flash cache will be dropped from this disk thus reducing the effective flash cache size for Exadata Storage Server. If the flash disk is used for grid disks, then the Oracle ASM disks associated with the grid disks on this flash disk are automatically dropped with FORCE option if possible. If DROP...FORCE cannot succeed due to offline partners, then the grid disks will be automatically dropped normally and Oracle ASM rebalance will relocate the data from the poor performance disk to other disks.

The following procedure describes how to replace a flash disk due to disk problems:

Shut down the cell.
Replace the failed flash disk based on the PCI number and FDOM number.
Power up the cell. The cell services are started automatically.
Bring all grid disks online using the following command:
```
CellCLI> ALTER GRIDDISK ALL ACTIVE
```
Verify that all grid disks have been successfully put online using the following command:
```
CellCLI> LIST GRIDDISK ATTRIBUTES asmmodestatus
```
Wait until asmmodestatus shows ONLINE for all grid disks.

The new flash disk is automatically used by the system. If the flash disk is used for flash cache, then the effective cache size will increase. If the flash disk is used for grid disks, then the grid disks will be re-created on the new flash disk. If those grid disks were part of an Oracle ASM disk group, then they will be added back to the disk group and the data will be rebalanced on them based on the disk group redundancy and ASM_POWER_LIMIT parameter.

Oracle ASM rebalance occurs when dropping or adding a disk. To check the status of the rebalance, do the following:

The rebalance operation may have been successfully run. Check the Oracle ASM alert logs to confirm.
The rebalance operation may be currently running. Check the GV$ASM_OPERATION view to determine if the rebalance operation is still running.
The rebalance operation may have failed. Check the ERROR_CODE column of the V$ASM_OPERATION view to determine if the rebalance operation failed.
Rebalance operations from multiple disk groups can be done on different Oracle ASM instances in the same cluster if the physical disk being replaced contains Oracle ASM disks from multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If all Oracle ASM instances are busy, then rebalance operations are queued.

Removing Flash Disk Due to Bad Performance

A single bad flash disk can degrade the performance of other good flash disks. It is better to remove the bad flash disk from the system than let it remain. To identify a bad flash disk, use the CALIBRATE command, and look for very low throughput and IOPS for each flash disk.
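One way to spot a disk with very low throughput is to compare each disk against the median of its peers. The sketch below takes per-disk throughput figures that have already been read from the CALIBRATE report. The disk names, the sample numbers, and the cutoff of half the median are all assumptions chosen for illustration, not values from the Exadata documentation.

```python
from statistics import median


def find_slow_disks(throughput_mbps: dict[str, float], ratio: float = 0.5) -> list[str]:
    """Return the disks whose throughput is below ratio * median, sorted by name."""
    if not throughput_mbps:
        return []
    cutoff = ratio * median(throughput_mbps.values())
    return sorted(name for name, mbps in throughput_mbps.items() if mbps < cutoff)


if __name__ == "__main__":
    # Hypothetical per-disk throughput in MB/s.
    measured = {"FD_00": 270.0, "FD_01": 268.0, "FD_02": 95.0, "FD_03": 271.0}
    print(find_slow_disks(measured))
```

The median is used instead of the mean so that a single very slow disk does not pull the cutoff down with it.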

If a flash disk exhibits extremely poor performance, then it will be marked as poor performance. The flash cache on that flash disk will be automatically disabled, and the grid disks on that flash disk will be automatically dropped from the Oracle ASM disk group.

The following procedure describes how to remove a flash drive once the bad flash disk has been identified:

If the flash disk is used for flash cache, then disable flash cache that is part of the flash disk using the following commands:
```
CellCLI> DROP FLASHCACHE
CellCLI> CREATE FLASHCACHE CELLDISK='fd1,fd2,fd3,fd4, ...'
```
Note:
Do not include the bad flash disk when creating the new flash cache.
If the flash disk is used for grid disks, then use the following command to direct Oracle ASM to stop using the bad disk at once:
```
SQL> ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name FORCE 
```
It is possible that the DROP command with FORCE option could fail due to offline partners. Either restore the Oracle ASM data redundancy by correcting other cell or disk failures and retry DROP...FORCE, or use the following command to direct Oracle ASM to rebalance the data out of the bad disk:
```
SQL> ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name NOFORCE
```
Wait until the Oracle ASM disks associated with the grid disks on this bad flash disk have been successfully dropped. Oracle Exadata Storage Server Software will automatically send an alert when it is safe to replace the flash disk.
Shut down the cell.

See Also:
"Shutting Down Exadata Storage Server"
Remove the bad flash disk and replace it with a new flash disk.
Power up the cell. The cell services are started automatically.
Add the new flash disk to flash cache using the following commands:
```
CellCLI> DROP FLASHCACHE
CellCLI> CREATE FLASHCACHE ALL
```
Bring all grid disks online using the following command:
```
CellCLI> ALTER GRIDDISK ALL ACTIVE
```
Verify that all grid disks have been successfully put online using the following command:
```
CellCLI> LIST GRIDDISK ATTRIBUTES asmmodestatus
```
Wait until asmmodestatus shows ONLINE for all grid disks.

The flash disks are added as follows:

If the flash disk is used for grid disks, then the grid disks will be re-created on the new flash disk.
If these grid disks were part of an Oracle ASM disk group and DROP...FORCE was used in Step 2, then they will be added back to the disk group and the data will be rebalanced based on disk group redundancy and the ASM_POWER_LIMIT parameter.
If DROP...NOFORCE was used in Step 2, then you must manually add the grid disks back to the Oracle ASM disk group.

Oracle ASM rebalance occurs when dropping or adding a disk. To check the status of the rebalance, do the following:

The rebalance operation may have been successfully run. Check the Oracle ASM alert logs to confirm.
The rebalance operation may be currently running. Check the GV$ASM_OPERATION view to determine if the rebalance operation is still running.
The rebalance operation may have failed. Check the ERROR_CODE column of the V$ASM_OPERATION view to determine if the rebalance operation failed.
Rebalance operations from multiple disk groups can be done on different Oracle ASM instances in the same cluster if the physical disk being replaced contains Oracle ASM disks from multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If all Oracle ASM instances are busy, then rebalance operations are queued.

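Summary: Replacing a flash cache card requires shutting down the storage cell, and the affected compute nodes must be handled as well.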

Maintaining the Physical Disks of Exadata Storage Servers

The first two disks of Exadata Storage Server are system disks. Oracle Exadata Storage Server Software system software resides on a portion of each of the system disks. These portions on both system disks are referred to as the system area. The nonsystem area of the system disks, referred to as data partitions, is used for normal data storage. All other disks in the cell are called data disks.

You can monitor a physical disk by checking its attributes with the CellCLI LIST PHYSICALDISK command. For example, a physical disk with a status of critical or predictive failure probably has a problem and needs to be replaced. The disk firmware maintains the error counters, and marks a drive with Predictive Failure when internal thresholds are exceeded. The drive, not the cell software, determines if it needs replacement.
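As a rough illustration of this kind of monitoring, the sketch below scans captured LIST PHYSICALDISK output for disks in either status. It assumes the default three-column listing of name, serial number, and status; the function name and the sample serial numbers used when running it are made up.

```python
NEEDS_REPLACEMENT = {"critical", "predictive failure"}


def disks_needing_replacement(list_output: str) -> list[tuple[str, str]]:
    """Return (name, status) for disks whose status suggests replacement."""
    flagged = []
    for line in list_output.splitlines():
        # Columns: name, serial, status. The status may contain a space,
        # so split at most twice and keep the remainder together.
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, _serial, status = fields
        status = status.strip()
        if status in NEEDS_REPLACEMENT:
            flagged.append((name, status))
    return flagged


if __name__ == "__main__":
    sample = (
        "         20:0   D174LX    normal\n"
        "         20:1   D149R0    predictive failure\n"
    )
    print(disks_needing_replacement(sample))
```

The script only reports what the drive firmware has already decided; it does not second-guess the status.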

When disk I/O errors occur, Oracle ASM performs bad extent repair for read errors due to media errors. The disks will stay online, and no alerts are sent. When Oracle ASM gets a read error on a physically-addressed metadata block, it does not have mirroring for the blocks, and takes the disk offline. Oracle ASM force drops the disk.

Note:

Oracle Exadata Rack is online and available while replacing the physical disks of Exadata Storage Server.


Replacing a Physical Disk Due to Disk Failure

A physical disk outage can cause a reduction in performance and data redundancy. Therefore, the disk should be replaced with a new disk as soon as possible. When the disk fails, the Oracle ASM disks associated with the grid disks on the physical disk are automatically dropped with FORCE option, and an Oracle ASM rebalance follows to restore the data redundancy.

An Exadata alert is generated when a disk fails. The alert includes specific instructions for replacing the disk. If you have configured the system for alert notifications, then the alert is sent by e-mail to the designated address.

After the physical disk is replaced, the grid disks and cell disks that existed on the previous disk in that slot are re-created on the new physical disk. If those grid disks were part of an Oracle ASM disk group, then they will be added back to the disk group and the data will be rebalanced on them, based on the disk group redundancy and ASM_POWER_LIMIT parameter.

Oracle ASM rebalance occurs when dropping or adding a disk. To check the status of the rebalance, do the following:

The rebalance operation may have been successfully run. Check the Oracle ASM alert logs to confirm.
The rebalance operation may be currently running. Check the GV$ASM_OPERATION view to determine if the rebalance operation is still running.
The rebalance operation may have failed. Check the ERROR_CODE column of the V$ASM_OPERATION view to determine if the rebalance operation failed.
Rebalance operations from multiple disk groups can be done on different Oracle ASM instances in the same cluster if the physical disk being replaced contains ASM disks from multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If all Oracle ASM instances are busy, then rebalance operations will be queued.

The following procedure describes how to replace a disk due to disk failure:

Determine the failed disk using the following command:

CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status=critical DETAIL

The following is an example of the output from the command. The slot number shows the location of the disk, and the status shows that the disk has failed.

CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status=critical DETAIL

         name:                   28:5
         deviceId:               21
         diskType:               HardDisk
         enclosureDeviceId:      28
         errMediaCount:          0
         errOtherCount:          0
         foreignState:           false
         luns:                   0_5
         makeModel:              "SEAGATE ST360057SSUN600G"
         physicalFirmware:       0705
         physicalInterface:      sas
         physicalSerial:         A01BC2
         physicalSize:           558.9109999993816G
         slotNumber:             5
         status:                 critical

Replace the physical disk on Exadata Storage Server and wait for three minutes. The physical disk is hot-pluggable, and can be replaced when the power is on.
Confirm the disk is online.

When you replace a physical disk, the disk must be acknowledged by the RAID controller before you can use it. This does not take long. Use a LIST PHYSICALDISK command similar to the following to ensure the status is NORMAL.
```
CellCLI> LIST PHYSICALDISK WHERE name=28:5 ATTRIBUTES status
```
Verify the firmware is correct using the ALTER CELL VALIDATE CONFIGURATION command.

In rare cases, the automatic firmware update may not work, and the LUN will not be rebuilt. This can be confirmed by checking the ms-odl.trc file.

See Also:

"Parts for Exadata Storage Servers"
Oracle Database Reference for information about the V$ASM_OPERATION view
Oracle Automatic Storage Management Administrator's Guide

Removing a Physical Disk Due to Bad Performance

A single bad physical disk can degrade the performance of other good disks. It is better to remove the bad disk from the system than let it remain. To identify a bad physical disk, use the CALIBRATE command, and look for very low throughput and IOPS for each physical disk.

The following procedure describes how to remove a physical disk once the bad disk has been identified:

Illuminate the physical drive service LED to identify the drive to be replaced using the following command:
```
cellcli -e 'alter physicaldisk disk_name serviceled on'
```
In the preceding command, disk_name is the name of the physical disk to be replaced, such as 20:2.
Find all the grid disks on the bad disk. Use the following command to direct Oracle ASM to stop using the bad disk immediately:
```
ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name
```
Wait until the Oracle ASM disks associated with the grid disks on the bad disk have been successfully dropped by querying the V$ASM_DISK_STAT view.
Remove the badly-performing disk. When you remove the disk, you get an alert.
When a new disk is available, install the new disk in the system. The cell disks and grid disks are automatically created on the new physical disk.

Note:
When you replace a physical disk, the disk must be acknowledged by the RAID controller before you can use it. This does not take long, but you should use the LIST PHYSICALDISK command to ensure the status is NORMAL.


Moving All Drives from One Exadata Storage Server to Another Exadata Storage Server

You may need to move all drives from one Exadata Storage Server to another Exadata Storage Server. This may be necessary when there is a chassis-level component failure, such as a motherboard or ILOM failure, or when troubleshooting a hardware problem.

The following procedure describes how to move the drives:

Back up the following files and directories:
- /etc/hosts
- /etc/modprobe.conf
- /etc/sysconfig/network
- /etc/sysconfig/network-scripts
Refer to "Shutting Down Exadata Storage Server" to safely inactivate all grid disks and shut down Exadata Storage Server. Make sure the Oracle ASM disk_repair_time attribute is set long enough that Oracle ASM does not drop the disks before the grid disks can be activated in another Exadata Storage Server.
Move the physical disks, flash disks, disk controller and USB flash drive from the original Exadata Storage Server to the new Exadata Storage Server.

Caution:
Ensure the first two disks, which are the system disks, are in the same first two slots. Failure to do so causes the Exadata Storage Server to function improperly.
Ensure the flash cards are installed in the same PCIe slots as the original Exadata Storage Server.
Power on the new Exadata Storage Server using either the service processor interface or by pressing the power button.
Log in to the console using the service processor or the KVM switch.
Check the following files and directories. If they are corrupted, then restore them from the backups.
- /etc/hosts
- /etc/modprobe.conf
- /etc/sysconfig/network
- /etc/sysconfig/network-scripts

Use the ifconfig command to retrieve the new MAC address for eth0, eth1, eth2, and eth3. The following is an example of retrieving the eth0 MAC address:

# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:14:4F:CA:D9:AE
          inet addr:10.204.74.184  Bcast:10.204.75.255  Mask:255.255.252.0
          inet6 addr: fe80::214:4fff:feca:d9ae/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:141455 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6340 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9578692 (9.1 MiB)  TX bytes:1042156 (1017.7 KiB)
          Memory:f8c60000-f8c80000

Edit the ifcfg-eth0 file, ifcfg-eth1 file, ifcfg-eth2 file, and ifcfg-eth3 file in the /etc/sysconfig/network-scripts directory to change the HWADDR value based on the output from step 7. The following is an example of the ifcfg-eth0 file:

#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=10.204.74.184
NETMASK=255.255.252.0
NETWORK=10.204.72.0
BROADCAST=10.204.75.255
GATEWAY=10.204.72.1
HOTPLUG=no
IPV6INIT=no
HWADDR=00:14:4F:CA:D9:AE

Restart Exadata Storage Server.
Activate the grid disks using the following command:
```
CellCLI> ALTER GRIDDISK ALL ACTIVE
```
If the Oracle ASM disks on this cell have not been dropped, then they are brought online automatically and put back into use.
Validate the configuration using the following command:
```
ALTER CELL VALIDATE CONFIGURATION
```
Activate the ILOM for ASR as described in "Task 4 Activating ASR Assets".
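The MAC address steps in this procedure (reading HWaddr from ifconfig and writing it to HWADDR in each ifcfg file) can be sketched as two text helpers. This is an illustration that works on strings only: it assumes the ifconfig layout and the HWADDR= line shown in the examples above, and the function names are hypothetical. Reading and writing the real files under /etc/sysconfig/network-scripts is left to the caller.

```python
import re

# Matches the HWaddr field of the older net-tools ifconfig layout,
# e.g. "HWaddr 00:14:4F:CA:D9:AE".
_HWADDR_RE = re.compile(r"HWaddr\s+((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})")


def extract_hwaddr(ifconfig_output: str) -> str:
    """Return the MAC address found in the output of `ifconfig <interface>`."""
    match = _HWADDR_RE.search(ifconfig_output)
    if match is None:
        raise ValueError("no HWaddr field found")
    return match.group(1).upper()


def set_hwaddr(ifcfg_text: str, mac: str) -> str:
    """Return ifcfg_text with its HWADDR= line replaced, or appended if absent."""
    lines = ifcfg_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        if line.startswith("HWADDR="):
            lines[index] = f"HWADDR={mac}"
            replaced = True
    if not replaced:
        lines.append(f"HWADDR={mac}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    output = "eth0      Link encap:Ethernet  HWaddr 00:14:4F:CA:D9:AE\n"
    config = "DEVICE=eth0\nONBOOT=yes\nHWADDR=00:00:00:00:00:00\n"
    print(set_hwaddr(config, extract_hwaddr(output)))
```

All other lines of the ifcfg file, including the generated header comments, pass through unchanged.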


Repurposing a Physical Disk

You may want to delete all data on a disk, and then use the disk for another purpose. Before doing so, ensure that you have copies of the data that is on the disk.

The following procedure describes how to repurpose a disk:

Use the CellCLI LIST command to display the Exadata Storage Server objects. You must identify the grid disks and cell disks on the physical drive. For example:
```
CellCLI> LIST PHYSICALDISK
         20:0   D174LX    normal
         20:1   D149R0    normal
         ...
```
Determine the cell disks and grid disks on the LUN, using a command similar to the following:
```
CellCLI> LIST LUN ATTRIBUTES name, cellDisk WHERE physicalDrives='20:0'
```
From Oracle ASM, drop the Oracle ASM disks on the physical disk using the following command:
```
ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name
```
From Exadata Storage Server, drop the cell disks and grid disks on the physical disk using the following command:
```
DROP CELLDISK celldisk_on_this_lun FORCE 
```
Note:
To overwrite all data on the cell disk, use the ERASE option with the DROP CELLDISK command. The following is an example of the command:
```
CellCLI> DROP CELLDISK CD_03_cell01 ERASE=1pass NOWAIT
CellDisk CD_03_cell01 erase is in progress
```
See Oracle Exadata Storage Server Software User's Guide for additional information.
Remove the disk to be repurposed, and insert a new disk.
Wait for the new physical disk to be added as a LUN.
```
CellCLI> LIST LUN
```
The cell disks and grid disks are automatically created on the new physical disk, and the grid disks are added to the Oracle ASM disk group.

Removing and Replacing the Same Physical Disk

If you inadvertently remove the wrong physical disk, then put the disk back. It will automatically be added back in the Oracle ASM disk group, and its data will be resynchronized.

Note:

When replacing a disk due to disk failure or disk problems, the LED is lit on the disk for identification.

Summary: Hard disks can be replaced while the system is online.

Replacing a Power Distribution Unit

A power distribution unit (PDU) can be replaced while Oracle Exadata Rack is online. The second PDU in the rack maintains the power to all components in the rack except for the KVM. The KVM is a non-critical item that is powered from the PDU-B side of the rack. PDU-A is on the left, and PDU-B is on the right when viewing the rack from the rear.

Reviewing the PDU Replacement Guidelines

Before replacing a PDU, the following guidelines should be reviewed to ensure the procedure is safe and does not disrupt availability:

Unlatching the InfiniBand cables while removing or inserting PDU-A may cause a loss of service due to nodes being removed from the cluster. This could cause the rack to be unavailable. Care should be taken when handling the InfiniBand cables, which are normally latched securely. Do not place excessive tension on the InfiniBand cables by pulling them.
Unhooking the wrong power feeds causes the rack to shut down. Trace the power cables running from the PDU that will be replaced to the power source, and only unplug those feeds.
Allow time to unpack and repack the PDU replacement parts. Note how the power cords are coiled in the packaging so the failed unit can be repacked the same way.
Removal of the side panel lessens the amount of time needed to replace the PDU. It is not necessary to remove the side panel to replace the PDU.
Use of a cordless drill or power screwdriver lessens the amount of time needed to replace the PDU. Allow more time for the replacement if using the hand wrench tool provided with the replacement rack. If using your own screwdriver, ensure you have Torx T30 and T25 bits.
It may be necessary to remove the server cable arms to move the power cables. If that is the case, then twist the plug connection and flex the cable arm connector to avoid having to unclip the cable arm. If it is necessary to unclip the cable arm, then support the cables with one hand, remove the power cord, and then clip the cable arm. Do not leave the cable arm hanging.
When removing the T30 screws from the L-bracket, do not remove the T25 screws or nuts that attach the PDU to the bracket until the PDU is out of the rack.

Replacing a PDU

The following procedure describes how to replace a PDU:

If the PDU monitor is not the reason for the PDU replacement, then use it to identify its network settings as follows:
1. Press the reset button until it starts to count from 5 to 0. While it is counting down, release the button, and then press it once.
  
  Note:
  You must press the reset button for 20 seconds in order for the countdown to begin.
2. Record the network settings, firmware version, and so on, displayed on the LCD screen as the monitor restarts.
  
  Note:
  If the PDU monitor is not working, then retrieve the network settings by connecting to the PDU over the network, or from the network administrator.
Turn off all the PDU breakers.
Unplug the PDU power plugs from the AC outlets.
Note:
- If the power cords use overhead routing, then put the power plugs in a location where they will not fall or hit anyone.
- If the rack is on a raised floor, then move the power cords out through the floor cutout. It may be necessary to maneuver the rack over the cutout in order to move the power cords out.
Do the following procedure for a PDU-B replacement when there is no side panel access, and the rack does not have an InfiniBand cable harness:

Note:
Do not unstrap any cables attached to the cable arms.
1. Unscrew the T25 screws holding the square cable arms to the rack.
2. Move the InfiniBand cables to the middle, out of the way.
Unplug all power cables going from the servers and switches to the PDU. Keep the power cables together in group bundles.
Remove the T30 screws from the top and bottom of the L-bracket, and note where the screws go.
Note where the PDU sits in the rack frame. It is usually 1 inch back from the rack frame to allow access to the breaker switches.
Angle and maneuver the PDU out of the rack.
Hold the PDU or lay it down, if there is enough room, while maneuvering the AC power cords through the rack. It may be necessary to cut the cable ties that hold the AC cord flush with the bottom side of the PDU.
Pull the cords as near to the bottom or top of the rack as possible where there is more room between the servers to get the outlet plug through the routing hole.
Remove the smaller Torx T25 screws, and loosen the nut on the top and bottom to remove the PDU from the L-bracket. The nut does not have to be removed.
Attach the L-bracket to the new PDU.
Lay the new PDU next to the rack.
Route the AC cords through the rack, and to where the outlets are.

Note:
Do not cable tie the AC cord to the new PDU at this time.
Place the new PDU in the rack by angling and maneuvering it until the L-brackets sit on the top and bottom rails.
Line up the holes and slots so that the PDU sits about 1 inch back from the rack frame.
Attach the power cords using the labels on the cords as a guide. For example, G5-0 indicates PDU group 5 outlet 0 on the PDU.
Attach the InfiniBand cable holders if they were removed in step 4. Oracle recommends screwing the holders in by hand at first to avoid stripping the screws.
Attach the AC power cords to the outlets.
Turn on the breakers.
Cable and program the PDU monitor for the network, as needed.
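When recording which cord goes to which outlet before unplugging, the cord labels mentioned in the procedure can be decoded mechanically. The sketch below assumes every label follows the single pattern given in the example (G5-0 for PDU group 5, outlet 0); the function name is hypothetical.

```python
import re


def parse_cord_label(label: str) -> tuple[int, int]:
    """Return (group, outlet) from a power cord label such as 'G5-0'."""
    match = re.fullmatch(r"G(\d+)-(\d+)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized cord label: {label!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(parse_cord_label("G5-0"))
```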

See Also:
Oracle Sun Rack II Power Distribution Units User's Guide for information about programming the PDU monitor at
http://docs.oracle.com/cd/E19657-01/E23956-01/z40002661017674.html#scrolltoc
Summary: A PDU can generally be replaced while the rack is online.
