
Friday, 8 May 2026

Cisco ACI Decommission Only vs Remove vs Secure Remove — Complete Guide with Management IP Behavior

Introduction

In Cisco ACI environments, switch lifecycle management is a critical operational task. Whether you are performing a node ID change, replacing hardware, or decommissioning a switch permanently, understanding the differences between Decommission Only, Decommission & Remove, and Decommission & Secure Remove is essential.

Many engineers misunderstand these options, especially when it comes to switch behavior, APIC interaction, and management IP handling, which can lead to unexpected outages or onboarding failures.

In this detailed guide, we will break down each option with real-world behavior, management IP impact, and best-use scenarios.

What Happens During Decommission in Cisco ACI

Decommissioning in ACI involves three key components:

  1. Switch (Leaf/Spine) behavior
  2. APIC database behavior
  3. Fabric identity and configuration handling

Each decommission option treats these components differently.

1. Decommission Only (Reset but Keep Identity)

Behavior

  • Switch is wiped and reloaded
  • Node ID and serial number mapping is retained in APIC
  • APIC retains configuration for that node

After Reload

  • Switch boots in clean state
  • Automatically rejoins the fabric
  • No manual intervention required

Management IP Behavior

  • OOB Management IP (mgmt0):
    • Retained and reused
    • No need to reconfigure
  • TEP IP:
    • Re-established automatically

Key Advantage

  • Fast recovery without reconfiguration

Use Case

  • Fixing switch issues without changing node identity
  • Restarting node cleanly
  • Troubleshooting fabric inconsistencies

Important Insight

This is the only mode where the switch auto-rejoins the fabric without manual setup.

2. Decommission & Remove (Reset + Remove Identity)

Behavior

  • Switch is wiped and reloaded
  • Node registration is removed from APIC
  • APIC retains logical configuration (policies, tenants)

After Reload

  • Switch appears as:
    • Unregistered node
  • Requires manual onboarding via:
    • Fabric → Inventory → Node Setup

Management IP Behavior

  • OOB Management IP (mgmt0):
    • Not retained logically in APIC
    • Needs to be re-entered or reconfigured
  • TEP IP:
    • Reassigned during recommission

Key Advantage

  • Allows fresh onboarding of the same hardware

Use Case

  • Node ID change
  • Recommissioning switch
  • Hardware replacement (same fabric)
  • Moving switch within fabric design

Critical Step

You must perform:

run bash
setup-clean-config.sh
reload

Practical Risk

If you skip cleanup:

  • Residual configs may remain
  • Fabric join issues can occur

3. Decommission & Secure Remove (Full Secure Wipe)

Behavior

  • Switch is:
    • Securely wiped (deep erase)
    • Reloaded
  • Removes:
    • Configuration
    • Certificates
    • Encryption keys
    • Fabric identity

After Reload

  • Switch becomes:
    • Factory-like device
  • Cannot join fabric directly

Management IP Behavior

  • OOB Management IP:
    • Completely erased
  • TEP IP:
    • Fully removed

Key Requirement

  • Requires:
    • ACI image validation or reinstallation (if the image was removed)
    • Complete day-0 onboarding

Key Advantage

  • Ensures zero residual data

Use Case

  • Device disposal
  • RMA return
  • Moving switch to a different customer/fabric
  • Security compliance requirements

Side-by-Side Comparison

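Feature             | Decommission Only | Decommission & Remove | Secure Remove
--------------------+-------------------+-----------------------+-----------------
Switch Reload       | Yes               | Yes                   | Yes
Config Wipe         | Yes (normal)      | Yes (normal)          | Full secure wipe
Node ID Retained    | Yes               | No                    | No
Auto Rejoin Fabric  | Yes               | No                    | No
Manual Recommission | No                | Yes                   | Yes
Mgmt IP (OOB)       | Retained          | Needs reconfig        | Fully erased
TEP IP              | Auto restored     | Reassigned            | Removed
Secure Data Erase   | No                | No                    | Yes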
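
For runbooks or pre-change scripts, the comparison above can be captured as data. The following is a minimal sketch: the class name, dictionary keys, and helper function are illustrative names chosen here, not part of any Cisco tool or API, and the values simply mirror the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecommissionMode:
    """One column of the comparison table (illustrative structure only)."""
    name: str
    node_id_retained: bool
    auto_rejoin: bool
    oob_mgmt_ip: str
    tep_ip: str
    secure_erase: bool


# Values copied from the side-by-side comparison; keys are hypothetical.
MODES = {
    "decommission_only": DecommissionMode(
        "Decommission Only", True, True, "retained", "auto restored", False),
    "decommission_remove": DecommissionMode(
        "Decommission & Remove", False, False, "needs reconfig", "reassigned", False),
    "secure_remove": DecommissionMode(
        "Decommission & Secure Remove", False, False, "fully erased", "removed", True),
}


def needs_manual_recommission(mode_key: str) -> bool:
    """A mode needs manual recommissioning when the switch does not auto-rejoin."""
    return not MODES[mode_key].auto_rejoin


def mgmt_ip_must_be_planned(mode_key: str) -> bool:
    """The OOB management IP has to be planned again unless it is retained."""
    return MODES[mode_key].oob_mgmt_ip != "retained"
```

A change script could call `mgmt_ip_must_be_planned("decommission_remove")` to decide whether to prompt the operator for the management IP before proceeding.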

Real-World Decision Guide

Use Decommission Only When:

  • You want a quick reset
  • You are not changing Node ID
  • You want automatic fabric rejoin

Use Decommission & Remove When:

  • You are:
    • Changing Node ID
    • Rebuilding switch
    • Replacing hardware
  • You are okay with manual recommission

Use Secure Remove When:

  • The device is leaving the environment
  • A security wipe is required
  • The device is moving to a new fabric or customer
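
The decision guide above can be expressed as a small function. This is only a sketch of the rules as stated in this post: the function name and parameter names are made up for illustration, and the ordering (security requirements checked first) is an assumption about which rule should win when several apply.

```python
def choose_decommission_mode(
    leaving_environment: bool = False,
    secure_wipe_required: bool = False,
    changing_node_id: bool = False,
    replacing_hardware: bool = False,
) -> str:
    """Pick a decommission option from the decision guide.

    Assumption: security-driven cases take priority over identity changes,
    and a plain reset is the default when nothing else applies.
    """
    if leaving_environment or secure_wipe_required:
        return "Decommission & Secure Remove"
    if changing_node_id or replacing_hardware:
        return "Decommission & Remove"
    return "Decommission Only"
```

For example, `choose_decommission_mode(changing_node_id=True)` selects Decommission & Remove, matching the guidance that a node ID change requires removing the existing identity first.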

Common Mistakes to Avoid

1. Assuming “Remove” wipes everything

It does not remove all residual configuration from the switch. Always run the cleanup script (setup-clean-config.sh) before re-onboarding.

2. Forgetting Management IP

  • After Decommission & Remove:
    • Mgmt IP must be planned and reconfigured
  • After Secure Remove:
    • Mgmt IP is completely erased and must be configured from scratch

3. Using Decommission Only for Node ID Change

This will fail because:

  • Node identity is preserved
  • APIC will not allow new ID

Pro Tip (Very Important for Production)

Before any decommission:

✅ Note down:

  • Node ID
  • Serial number
  • Management IP
  • TEP pool

✅ Ensure:

  • Console access is available

✅ Plan:

  • A maintenance window long enough for multiple reloads
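
One way to make the "note down" step harder to skip is to record the values in a small structure and validate them before the change. The sketch below uses only the Python standard library; the class name, field names, and the sample values in the usage note are hypothetical, and the checks are basic sanity checks rather than ACI-specific rules.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeRecord:
    """Pre-decommission snapshot of a switch's identity (illustrative)."""
    node_id: int
    serial: str
    mgmt_ip: str
    tep_pool: str

    def problems(self) -> List[str]:
        """Return a list of human-readable issues; empty means the record looks sane."""
        issues = []
        if self.node_id <= 0:
            issues.append("node ID must be a positive integer")
        if not self.serial.strip():
            issues.append("serial number is empty")
        try:
            ipaddress.ip_address(self.mgmt_ip)
        except ValueError:
            issues.append(f"management IP {self.mgmt_ip!r} is not a valid address")
        try:
            ipaddress.ip_network(self.tep_pool)
        except ValueError:
            issues.append(f"TEP pool {self.tep_pool!r} is not a valid network")
        return issues
```

A record such as `NodeRecord(101, "FDO00000000", "192.0.2.10", "10.0.0.0/16")` (placeholder values) returns an empty list from `problems()`, while a record with a missing serial or a malformed address is flagged before the maintenance window starts.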

Conclusion

Understanding the differences between Decommission Only, Decommission & Remove, and Secure Remove is crucial for smooth Cisco ACI operations.

  • Decommission Only keeps identity and allows auto-rejoin
  • Decommission & Remove resets switch and requires manual setup
  • Secure Remove completely wipes the device for disposal or reuse

The biggest differentiator in real environments is management IP behavior and node identity retention, which directly impacts how the switch rejoins the fabric.


For the Leaf Node ID swap procedure, please refer to the following post:

Networklearner: Leaf Node ID Swap in Cisco ACI: Risks, Precautions, and Steps

Sunday, 26 April 2026

Cisco ACI “Unknown” Leaf State Explained: Certificates, LLDP, Software, and Hardware Issues

In a Cisco ACI fabric, one of the most frustrating issues during initial fabric bring‑up, expansion, or node replacement is seeing a leaf switch stuck in an “Unknown” state. When a leaf is in an unknown state, it means the APIC cannot fully discover, authenticate, or manage the node, preventing it from joining the fabric and participating in traffic forwarding.

This issue can occur during initial fabric deployment, adding a new leaf to an existing fabric, replacing failed hardware, performing software upgrades, or moving switches between fabrics.

Understanding why a leaf enters the “Unknown” state is critical for fast recovery. In most cases, the root cause is not a single configuration mistake but a failure in communication, authentication, compatibility, or initialization.

This article explains the most common causes of the “Unknown” leaf state in Cisco ACI, why they happen, and how to systematically troubleshoot them in real‑world environments.

1. What Does “Unknown” Leaf State Mean in Cisco ACI?

When a leaf is shown as “Unknown” in the APIC GUI, it indicates that the APIC can see the node attempting discovery, but either the node cannot complete secure authentication or critical control‑plane messaging has failed.

At this stage, the leaf is not operational, not programmable, and cannot forward production traffic.

2. Certificate Issues Between Leaf and APIC

Cisco ACI uses mutual certificate‑based authentication between the APIC controllers and fabric nodes. Every leaf switch must present a valid certificate chain that is signed and trusted by the APIC.

If the certificate exchange fails, the leaf cannot authenticate correctly, and APIC marks it as Unknown.

Common certificate‑related problems include an invalid or corrupted certificate on the leaf, the leaf previously belonging to another ACI fabric, expired or mismatched certificates due to time drift, or incomplete cleanup after node replacement.

These issues are often seen when hardware is reused without full re‑initialization.

The most reliable resolution is to completely wipe and reinitialize the leaf switch, ensure it boots in ACI mode, and allow APIC to generate and install a fresh certificate.

3. LLDP Mismatch or LLDP Failure

Cisco ACI relies heavily on LLDP for fabric discovery and adjacency validation. LLDP is mandatory in ACI for identifying correct topological relationships between leaf and spine switches.

If LLDP is not exchanged correctly, discovery fails and the leaf remains in an Unknown state.

Typical LLDP problems include LLDP being disabled on connected devices, LLDP filtered due to security policies, incorrect cabling such as connecting a leaf to something other than a spine, or the switch running in NX‑OS mode instead of ACI mode.

Symptoms include missing neighbor information, partial discovery, or interfaces appearing operationally down.

To resolve LLDP issues, ensure LLDP is enabled end‑to‑end, verify correct cabling from leaf to spine only, confirm the switch is running in ACI mode, and check optics and interfaces on both ends.
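
The cabling rule above (leaf fabric uplinks should face spines only) can be checked mechanically once LLDP neighbor data has been collected. This is a minimal sketch: the input format, a list of dictionaries with `local_port` and `neighbor_role` keys, is an assumption made for illustration and does not correspond to any specific APIC or switch output.

```python
from typing import Dict, List, Optional


def find_miscabled_uplinks(uplink_neighbors: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return leaf fabric uplink ports whose LLDP neighbor is not a spine.

    Each entry is expected to look like:
        {"local_port": "eth1/49", "neighbor_role": "spine"}
    A missing or None role means no LLDP neighbor was learned on that port,
    which is also reported because discovery cannot succeed over it.
    """
    return [
        entry["local_port"]
        for entry in uplink_neighbors
        if entry.get("neighbor_role") != "spine"
    ]
```

Only fabric uplinks should be passed in; downlinks toward servers or APICs legitimately have non-spine neighbors and would otherwise be flagged.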

4. Firmware or Software Incompatibility

ACI fabric components are designed to work within a compatible software matrix. Significant software mismatches between the APIC, leaf, and spine can prevent successful node onboarding.

This often occurs when a leaf is running an unsupported ACI version, the APIC has been upgraded but the leaf image was not updated, or an incorrect software image is installed on the switch.

Typical symptoms include the leaf being detected but never transitioning from Unknown to Active, along with compatibility or image‑related faults.

Resolution requires verifying Cisco’s supported version matrix and ensuring that the leaf software version is compatible with both the APIC and spine versions.

5. Hardware Problems

Physical layer issues are a common but frequently overlooked cause of Unknown leaf state. Even a simple faulty optic can completely prevent discovery.

Common hardware causes include defective or unsupported transceivers, damaged fiber or copper cables, faulty ports on the leaf or spine, or mismatched speed or media types.

Indicators include interfaces staying down, intermittent connectivity, missing LLDP information, or hardware‑related faults in APIC.

Troubleshooting involves replacing suspect cables and optics, using only Cisco‑supported transceivers, testing alternate ports, and validating interface status on both leaf and spine.

6. Time Synchronization Issues

Certificate validation in ACI is time‑sensitive. If the system time on the leaf is significantly out of sync with the APIC, certificate authentication can fail even if the configuration and connectivity are correct.

This is common in environments where NTP is misconfigured or unavailable, or where the device has been powered off for an extended period.

Symptoms include authentication failures and persistent Unknown leaf state with no obvious physical or configuration issues.

Resolution involves verifying NTP configuration on APIC, ensuring the leaf can synchronize time, and reinitiating discovery after time correction.
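
To illustrate why clock drift breaks certificate validation, the sketch below compares a leaf's clock against the APIC's clock and against a certificate's validity window. The function name and the five-minute skew threshold are assumptions chosen for the example, not values documented by Cisco; the timestamps would have to be gathered by whatever means is available in your environment.

```python
from datetime import datetime, timedelta
from typing import List


def clock_problems(
    leaf_time: datetime,
    apic_time: datetime,
    cert_not_before: datetime,
    cert_not_after: datetime,
    max_skew: timedelta = timedelta(minutes=5),  # assumed threshold
) -> List[str]:
    """Report time-related reasons a certificate check could fail on the leaf."""
    problems = []
    if abs(leaf_time - apic_time) > max_skew:
        problems.append("leaf clock differs from APIC clock by more than the allowed skew")
    # A certificate is judged against the validating device's own clock,
    # so a leaf with a stale clock can see a valid certificate as invalid.
    if leaf_time < cert_not_before:
        problems.append("certificate is not yet valid according to the leaf clock")
    if leaf_time > cert_not_after:
        problems.append("certificate has expired according to the leaf clock")
    return problems
```

A leaf that boots with a clock several years in the past would trigger both the skew finding and the "not yet valid" finding, even though nothing is wrong with the certificate itself.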

7. Incorrect Node ID or Serial Number Issues

ACI uniquely identifies nodes using a combination of node ID, serial number, and certificates. If these identifiers do not match what APIC expects, the leaf will fail authentication.

This commonly occurs when a switch was previously part of another fabric, reused after RMA without proper cleanup, or when a node ID conflict exists.

Symptoms include the leaf appearing with unexpected identity information or being rejected during registration.

The safest resolution is to fully wipe the leaf configuration, reboot the device, and allow APIC to assign a fresh node identity.
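
Before wiping anything, it can help to confirm whether an identity conflict actually exists. The following sketch scans a list of (node ID, serial number) pairs, for example exported from your own inventory documentation, and reports IDs or serials that appear with more than one counterpart. The function name and input shape are illustrative assumptions.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def find_identity_conflicts(registrations: Iterable[Tuple[int, str]]) -> List[str]:
    """Find node IDs mapped to several serials, and serials mapped to several node IDs."""
    serials_by_id = defaultdict(set)
    ids_by_serial = defaultdict(set)
    for node_id, serial in registrations:
        serials_by_id[node_id].add(serial)
        ids_by_serial[serial].add(node_id)

    conflicts = []
    for node_id, serials in sorted(serials_by_id.items()):
        if len(serials) > 1:
            conflicts.append(f"node ID {node_id} is claimed by serials {sorted(serials)}")
    for serial, node_ids in sorted(ids_by_serial.items()):
        if len(node_ids) > 1:
            conflicts.append(f"serial {serial} is registered under node IDs {sorted(node_ids)}")
    return conflicts
```

An empty result suggests the "Unknown" state has another cause, so the other checks in this article should be tried before a full wipe.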

8. Recommended Troubleshooting Sequence

When a leaf is stuck in Unknown state, follow this sequence:

First, verify physical connectivity and optics.
Second, confirm LLDP adjacency and cabling.
Third, check software compatibility.
Fourth, validate certificates and authentication.
Fifth, ensure correct time synchronization.
Finally, reinitialize the leaf if needed.

Following this order avoids unnecessary configuration changes and reduces downtime.
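
The sequence above can be sketched as an ordered list of checks that stops at the first failure, so that later and more disruptive steps are only reached when the earlier ones pass. The check functions here are placeholders supplied by the caller; nothing in this example talks to a real APIC or switch.

```python
from typing import Callable, List, Optional, Tuple

# Order taken from the troubleshooting sequence in this article.
CHECK_ORDER = [
    "physical connectivity and optics",
    "LLDP adjacency and cabling",
    "software compatibility",
    "certificates and authentication",
    "time synchronization",
]


def first_failing_check(checks: List[Tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run checks in order and return the name of the first one that fails.

    Returns None when every check passes, in which case reinitializing
    the leaf is the remaining step.
    """
    for name, check in checks:
        if not check():
            return name
    return None
```

A caller would pair each name in `CHECK_ORDER` with its own verification function, for example `list(zip(CHECK_ORDER, my_check_functions))`, where `my_check_functions` is a hypothetical list defined by the operator.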

9. Best Practices to Prevent Unknown Leaf State

Always wipe reused hardware before deployment.
Keep APIC, spine, and leaf software versions compatible.
Use supported Cisco optics and cables.
Ensure stable NTP configuration.
Verify LLDP connectivity during installation.
Document node IDs and serial numbers carefully.

Most Unknown leaf issues are preventable with proper procedures.

10. Conclusion

An Unknown leaf state in Cisco ACI is always a symptom of a failed discovery, authentication, compatibility, or initialization process. Certificate issues, LLDP failures, firmware incompatibility, hardware problems, time synchronization issues, and incorrect node identity are the most common causes.

By understanding these root causes and following a structured troubleshooting approach, engineers can resolve Unknown leaf issues quickly and avoid prolonged deployment delays.

A clean initialization and methodical verification remain the most effective solution in Cisco ACI environments.