Showing posts with label ACI leaf replacement.

Tuesday, 28 April 2026

How to Safely Decommission a Leaf Switch in Cisco ACI — Step-by-Step Guide with Checklist

 Decommissioning a leaf switch in a Cisco ACI fabric is a high‑impact operational activity. Doing it correctly avoids outages, policy corruption, ghost nodes, and rediscovery issues. Doing it incorrectly can result in traffic loss, unresolved faults, and extended downtime.

Many engineers think decommissioning a leaf is as simple as clicking a button in APIC. In reality, Cisco ACI follows a strict dependency and lifecycle model, and a leaf must be carefully prepared before removal.

This blog explains how to decommission a leaf switch from an ACI fabric safely, with real‑world checks, APIC steps, CLI verification, and best practices. It is written for production environments, troubleshooting scenarios, and interviews.


What Does “Decommissioning a Leaf Switch” Mean in ACI?

In Cisco ACI, decommissioning a leaf switch means:

  • Removing the switch from fabric membership
  • Withdrawing all policies and certificates
  • Detaching the switch logically from the fabric control plane
  • Preparing it for power‑off, replacement, or reuse

Decommissioning is a logical operation, not just a hardware action.


Why Proper Decommissioning Is Critical

Incorrect decommissioning can lead to:

  • Traffic outages due to active endpoints
  • Broken vPC or port‑channel configurations
  • L3Out or BGP/OSPF failures
  • “Ghost nodes” still visible in APIC
  • Problems when re‑adding or repurposing the switch

In production data centers, a clean decommission is often part of:

  • Hardware refresh
  • RMA replacement
  • Capacity rebalancing
  • Fabric redesign

Pre‑Decommission Checklist (Very Important)

Before you touch APIC, validate all dependencies. This is where most failures occur.

1. Confirm No Active Endpoints on the Leaf

A leaf with active endpoints must not be decommissioned.

APIC GUI path:

Fabric → Inventory → Pod → Node → Leaf → Endpoints

✅ Endpoint count should be 0

Optional APIC CLI:

Shell
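# List endpoint attachments on the leaf (single-homed or vPC); no output means no endpoints were found
moquery -c fvRsCEpToPathEp | grep tDn | grep -E "/(paths-<leaf-id>|protpaths-<leaf-id>-[0-9]+|protpaths-[0-9]+-<leaf-id>)/"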

If endpoints are present:

  • Migrate workloads
  • Shut down ports
  • Move static bindings
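You can also script the endpoint check so it runs the same way in every change window. The function below is a minimal sketch. It assumes moquery prints one "attribute : value" pair per line. It also assumes each learned endpoint's attachment appears as a tDn, either topology/pod-1/paths-101/pathep-[<interface>] for a single-homed port or topology/pod-1/protpaths-101-102/pathep-[<interface>] for a vPC. The name leaf_path_refs is hypothetical; it is not an APIC command.

```shell
# leaf_path_refs NODE_ID
# Hypothetical helper: reads moquery output on stdin and prints every tDn that
# points at NODE_ID, either single-homed (.../paths-NODE_ID/...) or as one
# member of a vPC pair (.../protpaths-A-B/...).
# Assumes moquery prints one "attribute : value" pair per line.
leaf_path_refs() {
    awk -v node="$1" '
    {
        idx = index($0, ":")                 # first colon separates name and value
        if (idx == 0) next
        key = substr($0, 1, idx - 1)
        gsub(/[ \t]+$/, "", key)             # strip padding after the attribute name
        if (key != "tDn") next
        val = substr($0, idx + 1)
        sub(/^[ \t]+/, "", val)              # strip padding before the value
        n = split(val, part, "/")
        for (i = 1; i <= n; i++) {
            if (part[i] == ("paths-" node)) { print val; break }
            if (part[i] ~ /^protpaths-[0-9]+-[0-9]+$/) {
                split(part[i], ids, "-")     # ids[2] and ids[3] are the vPC members
                if (ids[2] == node || ids[3] == node) { print val; break }
            }
        }
    }'
}
```

On the APIC, a pipeline such as moquery -c fvRsCEpToPathEp | leaf_path_refs 101 | sort -u should list the interfaces that still carry endpoints. You want empty output before you continue. The function compares whole path components instead of plain substrings, so a leaf such as 1010 does not produce a false match for leaf 101.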

2. Remove Static EPG Bindings

Static path bindings directly tie an EPG to a leaf port.

APIC GUI path:

Tenant → Application Profile → EPG → Static Ports

Remove:

  • Access ports
  • Port‑channels
  • vPC bindings

⚠️ A leaf with static bindings will fail decommission.
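Before you remove bindings by hand, it helps to know which EPGs still reference the leaf. This sketch assumes static bindings are fvRsPathAtt objects with a DN of the form <epg-dn>/rspathAtt-[<path-dn>]. The function name epgs_with_static_paths is hypothetical.

```shell
# epgs_with_static_paths NODE_ID
# Hypothetical helper: reads "moquery -c fvRsPathAtt" output on stdin and prints,
# once each, the DN of every EPG that still has a static path binding on NODE_ID
# (single-homed or vPC). Assumes binding DNs look like
# <epg-dn>/rspathAtt-[<path-dn>].
epgs_with_static_paths() {
    awk -v node="$1" '
    {
        idx = index($0, ":")
        if (idx == 0) next
        key = substr($0, 1, idx - 1)
        gsub(/[ \t]+$/, "", key)
        if (key != "dn") next                # the DN carries both EPG and path
        dn = substr($0, idx + 1)
        sub(/^[ \t]+/, "", dn)
        cut = index(dn, "/rspathAtt-[")
        if (cut == 0) next
        epg = substr(dn, 1, cut - 1)         # everything before the binding RN
        path = substr(dn, cut + length("/rspathAtt-["))
        n = split(path, part, "/")
        hit = 0
        for (i = 1; i <= n; i++) {
            if (part[i] == ("paths-" node)) hit = 1
            if (part[i] ~ /^protpaths-[0-9]+-[0-9]+$/) {
                split(part[i], ids, "-")
                if (ids[2] == node || ids[3] == node) hit = 1
            }
        }
        if (hit && !(epg in seen)) { seen[epg] = 1; print epg }
    }'
}
```

For example, run moquery -c fvRsPathAtt | epgs_with_static_paths 101. A vPC binding is reported for both members of the pair, because both sides reference the same protpaths path.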


3. Handle vPC and Port‑Channels Properly

If the leaf is part of a vPC pair:

  • Remove vPC associations
  • Remove port‑channels
  • Ensure services are moved to the peer leaf

Never decommission one side of an active vPC without cleanup.
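Before you clean up the vPC, confirm which leaf is the surviving peer. This is a minimal sketch. It assumes the members of an explicit vPC protection group appear as fabricNodePEp objects with DNs like uni/fabric/protpol/expgep-<group>/nodepep-<node-id>. The function name vpc_peer is hypothetical.

```shell
# vpc_peer NODE_ID
# Hypothetical helper: reads "moquery -c fabricNodePEp" output on stdin and
# prints the node ID(s) that share an explicit vPC protection group with NODE_ID.
# Assumes member DNs look like uni/fabric/protpol/expgep-<group>/nodepep-<id>.
vpc_peer() {
    awk -v node="$1" '
    {
        idx = index($0, ":")
        if (idx == 0) next
        key = substr($0, 1, idx - 1)
        gsub(/[ \t]+$/, "", key)
        if (key != "dn") next
        dn = substr($0, idx + 1)
        sub(/^[ \t]+/, "", dn)
        cut = index(dn, "/nodepep-")
        if (cut == 0) next
        grp = substr(dn, 1, cut - 1)                  # protection group DN
        id = substr(dn, cut + length("/nodepep-"))    # member node ID
        members[grp] = members[grp] " " id            # space-separated members
    }
    END {
        for (g in members) {
            n = split(members[g], ids, " ")
            found = 0
            for (i = 1; i <= n; i++) if (ids[i] == node) found = 1
            if (!found) continue
            for (i = 1; i <= n; i++) if (ids[i] != node) print ids[i]
        }
    }'
}
```

For example, run moquery -c fabricNodePEp | vpc_peer 101. Empty output suggests the leaf is not in an explicit vPC protection group.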


4. Check if the Leaf Is a Border Leaf (L3Out)

If the leaf is used for L3Out:

  • Remove it from:
    • Logical Node Profile
    • Logical Interface Profile
  • Ensure routing is functional on alternate border leaves
  • Verify BGP/OSPF is stable

Decommissioning a border leaf without migration can cause external connectivity outages.
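You can check border-leaf membership in a similar way. This sketch assumes the Logical Node Profile attachment is an l3extRsNodeL3OutAtt object whose DN ends in rsnodeL3OutAtt-[topology/pod-<pod>/node-<node-id>]. The function name l3outs_on_leaf is hypothetical. It only covers node profiles, so Logical Interface Profile paths still need their own check.

```shell
# l3outs_on_leaf NODE_ID
# Hypothetical helper: reads "moquery -c l3extRsNodeL3OutAtt" output on stdin
# and prints, once each, every L3Out DN whose node profile still includes NODE_ID.
# Assumes DNs look like
# uni/tn-<tenant>/out-<l3out>/lnodep-<profile>/rsnodeL3OutAtt-[topology/pod-<pod>/node-<id>]
l3outs_on_leaf() {
    awk -v node="$1" '
    {
        idx = index($0, ":")
        if (idx == 0) next
        key = substr($0, 1, idx - 1)
        gsub(/[ \t]+$/, "", key)
        if (key != "dn") next
        dn = substr($0, idx + 1)
        sub(/^[ \t]+/, "", dn)
        suffix = "/node-" node "]"            # exact node match, so 1010 is not 101
        if (substr(dn, length(dn) - length(suffix) + 1) != suffix) next
        cut = index(dn, "/lnodep-")
        if (cut == 0) next
        l3out = substr(dn, 1, cut - 1)        # uni/tn-<tenant>/out-<l3out>
        if (!(l3out in seen)) { seen[l3out] = 1; print l3out }
    }'
}
```

For example, moquery -c l3extRsNodeL3OutAtt | l3outs_on_leaf 101 lists every L3Out that still has the leaf in a node profile.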


Step‑by‑Step: Decommissioning a Leaf Switch from ACI Fabric

Step 1: Verify Fabric Health (Recommended)

Before removing infrastructure components, ensure fabric health is stable.

Fabric → Inventory → Fabric Membership
  • No critical fabric‑wide faults
  • Controllers and spines healthy

This reduces unexpected behavior during changes.


Step 2: Decommission the Leaf from APIC

This is the main and officially supported step.

APIC GUI path:

Fabric → Inventory → Fabric Membership
  1. Select the leaf switch
  2. Click Actions
  3. Choose Decommission
  4. Confirm the action

APIC will:

  • Withdraw policies
  • Remove certificates
  • Update fabric membership state

⏱️ This usually takes 1–2 minutes.
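If your team scripts changes, you can also send the decommission as a REST call. The sketch below only builds the JSON body. The class fabricRsDecommissionNode, the target uni/fabric/outofsvc, and the removeFromController attribute are assumptions based on how the APIC object model is commonly scripted. Verify them against the API reference for your APIC release before use. The function name decommission_payload is hypothetical.

```shell
# decommission_payload POD_ID NODE_ID [REMOVE_FROM_CONTROLLER]
# Hypothetical helper: prints the JSON body for a decommission request.
# ASSUMPTION: class fabricRsDecommissionNode posted to uni/fabric/outofsvc,
# with removeFromController "true" or "false" (default "false").
decommission_payload() {
    pod="$1"; node="$2"; remove="${3:-false}"
    for v in "$pod" "$node"; do
        case "$v" in
            ''|*[!0-9]*) echo "decommission_payload: numeric POD_ID and NODE_ID required" >&2; return 1 ;;
        esac
    done
    case "$remove" in
        true|false) ;;
        *) echo "decommission_payload: REMOVE_FROM_CONTROLLER must be true or false" >&2; return 1 ;;
    esac
    printf '{"fabricRsDecommissionNode":{"attributes":{"tDn":"topology/pod-%s/node-%s","status":"created,modified","removeFromController":"%s"}}}\n' \
        "$pod" "$node" "$remove"
}

# Example use (not run here; <apic>, <user>, and <password> are placeholders):
#   curl -s -c cookie.txt -X POST "https://<apic>/api/aaaLogin.json" \
#        -d '{"aaaUser":{"attributes":{"name":"<user>","pwd":"<password>"}}}'
#   curl -s -b cookie.txt -X POST "https://<apic>/api/node/mo/uni/fabric/outofsvc.json" \
#        -d "$(decommission_payload 1 101)"
```

A cheap way to confirm the exact body for your release is to watch the APIC API Inspector while you decommission a node once through the GUI.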


Step 3: Verify Decommission Status

After completion, confirm the state.

APIC GUI:

Fabric → Inventory → Fabric Membership

Leaf should show:

  • Decommissioned or Removed

Optional APIC CLI:

Shell
moquery -c fabricNode | grep <leaf-id>
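A plain grep for the node ID also matches any other attribute that happens to contain the same digits. A stricter sketch reads whole fabricNode objects and reports only the matching node's name and state. It assumes each object in the moquery output begins with a "# fabric.Node" header and that the fabric state is held in the fabricSt attribute. The function name node_state is hypothetical.

```shell
# node_state NODE_ID
# Hypothetical helper: reads "moquery -c fabricNode" output on stdin and prints
# "<name> <fabricSt>" for the object whose id attribute equals NODE_ID.
# Prints nothing if no object matches.
node_state() {
    awk -v node="$1" '
    function emit_record() {
        if (id == node) print name, st
        id = ""; name = ""; st = ""
    }
    /^# / { emit_record(); next }            # a header line starts a new object
    {
        idx = index($0, ":")
        if (idx == 0) next
        key = substr($0, 1, idx - 1)
        gsub(/[ \t]+$/, "", key)
        val = substr($0, idx + 1)
        sub(/^[ \t]+/, "", val)
        if (key == "id") id = val
        else if (key == "name") name = val
        else if (key == "fabricSt") st = val
    }
    END { emit_record() }'
}
```

For example, run moquery -c fabricNode | node_state 101. No output means no fabricNode object with that ID was found.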

Step 4 (Optional but Strongly Recommended): Clean the Switch

Once decommissioned, APIC no longer manages the switch.
If the switch will be reused or re‑added, you must clean it locally.

How to Clean the Decommissioned Leaf

Access via:

  • Console
  • OOB management
  • CIMC / KVM (if available)

Run:

Shell
acidiag touch clean
reload

This removes:

  • Fabric certificates
  • Node ID
  • ACI state information

After reboot, the switch will be ready for fresh discovery.


What NOT to Do (Common Mistakes)

Mistake | Impact
Decommission with live endpoints | Traffic outage
Skip static path cleanup | Decommission failure
Forget L3Out dependencies | External routing outage
Power off without decommission | Ghost node in APIC
Skip cleaning before reuse | Rediscovery failures

Leaf vs Spine Decommissioning (Quick Comparison)

Item | Leaf | Spine
Endpoint check required | ✅ Yes | ❌ No
Policy dependency cleanup | ✅ Mandatory | ❌ Minimal
L3Out impact | ✅ Possible | ❌ None
Redundancy consideration | ✅ Workloads | ✅ Fabric
Last node restriction | ❌ Leaf allowed | ❌ Never remove last spine

Real‑World Decommission Scenarios

Scenario 1: Hardware Refresh

A leaf is replaced due to lifecycle expiry.
Decommission old leaf → clean → rack new leaf → approve membership.

Scenario 2: RMA Replacement

Failed leaf is decommissioned logically, replaced physically, and re‑added with the same or new node ID.

Scenario 3: Fabric Re‑design

Leaf removed as part of capacity reshaping or topology optimization.


Troubleshooting Decommission Failures

If decommission fails:

  • Check for:
    • Active endpoints
    • Static bindings
    • vPC remnants
  • Look at Faults under the leaf
  • Verify no L3Out or service graph references remain

The faults raised for the leaf usually identify which dependency is blocking the operation.
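To see which faults belong to the leaf itself, you can summarize the fault list by code. This is a minimal sketch. It assumes faultInst DNs for switch faults start with topology/pod-<pod>/node-<node-id>/ and end with fault-<code>. The function name node_fault_codes is hypothetical. Faults raised on tenant objects, such as an EPG that still binds to the leaf, live under uni/ and are not counted here.

```shell
# node_fault_codes NODE_ID
# Hypothetical helper: reads "moquery -c faultInst" output on stdin and prints
# "<code> <count>" for every fault whose DN sits under NODE_ID, sorted by code.
node_fault_codes() {
    awk -v node="$1" '
    {
        idx = index($0, ":")
        if (idx == 0) next
        key = substr($0, 1, idx - 1)
        gsub(/[ \t]+$/, "", key)
        if (key != "dn") next
        dn = substr($0, idx + 1)
        sub(/^[ \t]+/, "", dn)
        if (dn !~ ("^topology/pod-[0-9]+/node-" node "/")) next
        n = split(dn, part, "/")
        code = part[n]                        # last RN, for example fault-F1234
        sub(/^fault-/, "", code)
        count[code]++
    }
    END { for (c in count) print c, count[c] }' | sort
}
```

For example, run moquery -c faultInst | node_fault_codes 101, then look up each code in the APIC fault reference.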


Interview‑Ready Questions and Answers

Q: How do you decommission a leaf switch in Cisco ACI?
A: Remove all endpoints and policy dependencies, then decommission the leaf from Fabric Membership in APIC.

Q: Can you decommission a leaf with active endpoints?
A: No, endpoints must be removed first.

Q: Why run acidiag touch clean after decommission?
A: To remove fabric identity and prepare the switch for reuse or rediscovery.


Best Practices Summary

  • Always validate endpoints and bindings first
  • Treat border leaves with extra caution
  • Confirm the vPC peer or other redundant leaves can carry the traffic before removal
  • Clean switches before reuse
  • Document node IDs and reasons for decommission

Final One‑Line Summary

In Cisco ACI, a leaf switch must be carefully prepared, cleaned of dependencies, and decommissioned through Fabric Membership in APIC to ensure a safe and outage‑free fabric operation.

Tuesday, 26 August 2025

ACI Leaf Switch Replacement

 To replace a Cisco ACI leaf switch, follow these step-by-step instructions to ensure a smooth transition without disrupting your fabric:


🛠️ Preparation

  1. Document the existing switch details:
    • POD ID
    • Node ID
    • Node Name
    • Serial Number
  2. Ensure the replacement switch is in ACI mode:
    • Connect via console and run show version.
    • If it is in NX-OS mode, convert it to ACI mode using Cisco's documented procedure.
    • Before adding the new leaf switch to the fabric, manually upgrade it to the target image or to one with a direct upgrade path. Avoid intermediate images that require multiple upgrade steps, as they can trigger issues and impact your production environment. A final upgrade via policy helps ensure BIOS and FPGA components are properly updated.
  3. Clean up the replacement switch:
    • Run setup-clean-config.sh and then reload to remove any existing configuration.
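You can capture the switch details in a single line before you remove the old switch. This sketch assumes each fabricNode object starts with a "# fabric.Node" header. It also assumes the object carries dn (topology/pod-<pod>/node-<node-id>), id, name, and serial attributes. The function name node_record is hypothetical.

```shell
# node_record NODE_ID
# Hypothetical helper: reads "moquery -c fabricNode" output on stdin and prints
# "<pod-id>,<node-id>,<node-name>,<serial>" for the matching node.
node_record() {
    awk -v node="$1" '
    function emit_record() {
        if (id == node) {
            pod = dn
            sub(/^topology\/pod-/, "", pod)   # keep only the pod number
            sub(/\/.*$/, "", pod)
            print pod "," id "," name "," serial
        }
        dn = ""; id = ""; name = ""; serial = ""
    }
    /^# / { emit_record(); next }            # a header line starts a new object
    {
        idx = index($0, ":")
        if (idx == 0) next
        key = substr($0, 1, idx - 1)
        gsub(/[ \t]+$/, "", key)
        val = substr($0, idx + 1)
        sub(/^[ \t]+/, "", val)
        if (key == "dn") dn = val
        else if (key == "id") id = val
        else if (key == "name") name = val
        else if (key == "serial") serial = val
    }
    END { emit_record() }'
}
```

For example, moquery -c fabricNode | node_record 101 >> leaf-inventory.csv saves the POD ID, Node ID, Node Name, and Serial Number for the re-registration step.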

🔄 Decommission the Faulty Leaf Switch

  1. Go to APIC GUI:
    Fabric > Inventory > Fabric Membership
  2. Right-click the faulty switch → Select Decommission.
  3. Once decommissioned, select Remove from Controller and confirm the action.
  4. Physically disconnect and unmount the old switch.

🔌 Install and Connect the New Leaf Switch

  1. Mount the new switch and connect the uplinks to the spine switches. Do not connect downlinks at this stage.
  2. Power on the switch.
  3. In APIC GUI, go to:
    Fabric > Inventory > Fabric Membership > Nodes Pending Registration
  4. Verify the serial number, then Register the switch:
    • Use the same POD ID, Node ID, and Node Name as the old switch.
  5. Once registered, go to:
    Fabric > Inventory > Fabric Membership > Registered Nodes
    → Right-click → Select Commission.
  6. Wait for the switch to reach Active state.
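You can script the wait for Active state with a simple polling loop. The helper below is generic shell, and the name wait_for_output is hypothetical. The moquery filter in the example comment (-f 'fabric.Node.id=="101"') and the fabricSt attribute are assumptions; verify them on your APIC release.

```shell
# wait_for_output TRIES INTERVAL_SECONDS PATTERN COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND up to TRIES times, sleeping INTERVAL_SECONDS
# between attempts, and returns 0 as soon as its output matches the grep
# pattern PATTERN. Returns 1 if the pattern never appears.
wait_for_output() {
    tries="$1"; interval="$2"; pattern="$3"
    shift 3
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if "$@" 2>/dev/null | grep -q -- "$pattern"; then
            return 0
        fi
        if [ "$attempt" -lt "$tries" ]; then
            sleep "$interval"              # no sleep after the final attempt
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example use on the APIC (ASSUMPTION: fabricSt reports "active" and the
# -f filter syntax matches your release): poll every 30 s for up to 30 minutes.
#   wait_for_output 60 30 'fabricSt *: active' \
#       moquery -c fabricNode -f 'fabric.Node.id=="101"'
```

Keeping the command as separate arguments, instead of one string passed to sh -c, avoids a second layer of quoting around the moquery filter.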

🔍 Post-Replacement Validation

  1. Connect downlink cables (after the switch is active).
  2. Go to:
    Fabric > Inventory > Topology
    → Verify the switch is visible and operational.
  3. SSH into APIC and run:

→ Confirm switch status is active.

  4. If you get SSH warnings (e.g., DNS spoofing), update the known_hosts file:

🧩 Troubleshooting Tips

  • Switch not discovered: Check LLDP neighbors and cable connections.
  • Switch shows "Not Supported": Upgrade the APIC firmware to a release that supports the switch model.
  • No TEP IP assigned: This may be a DHCP issue; contact Cisco TAC.
  • SSL issues: Check for established sessions on port 12215.