
Friday, 8 May 2026

Cisco ACI Decommission Only vs Remove vs Secure Remove — Complete Guide with Management IP Behavior

Introduction

In Cisco ACI environments, switch lifecycle management is a critical operational task. Whether you are performing a node ID change, replacing hardware, or decommissioning a switch permanently, understanding the differences between Decommission Only, Decommission & Remove, and Decommission & Secure Remove is essential.

Many engineers misunderstand these options, especially when it comes to switch behavior, APIC interaction, and management IP handling, which can lead to unexpected outages or onboarding failures.

In this detailed guide, we will break down each option with real-world behavior, management IP impact, and best-use scenarios.

What Happens During Decommission in Cisco ACI

Decommissioning in ACI involves three key components:

  1. Switch (Leaf/Spine) behavior
  2. APIC database behavior
  3. Fabric identity and configuration handling

Each decommission option treats these components differently.

1. Decommission Only (Reset but Keep Identity)

Behavior

  • Switch is wiped and reloaded
  • Node ID and serial number mapping is retained in APIC
  • APIC retains configuration for that node

After Reload

  • Switch boots in clean state
  • Automatically rejoins the fabric
  • No manual intervention required

Management IP Behavior

  • OOB Management IP (mgmt0):
    • Retained and reused
    • No need to reconfigure
  • TEP IP:
    • Re-established automatically

Key Advantage

  • Fast recovery without reconfiguration

Use Case

  • Fixing switch issues without changing node identity
  • Restarting node cleanly
  • Troubleshooting fabric inconsistencies

Important Insight

This is the only mode where the switch auto-rejoins the fabric without manual setup.

2. Decommission & Remove (Reset + Remove Identity)

Behavior

  • Switch is wiped and reloaded
  • Node registration is removed from APIC
  • APIC retains logical configuration (policies, tenants)

After Reload

  • Switch appears as:
    • Unregistered node
  • Requires manual onboarding via:
    • Fabric → Inventory → Node Setup

Management IP Behavior

  • OOB Management IP (mgmt0):
    • Not retained logically in APIC
    • Needs to be re-entered or reconfigured
  • TEP IP:
    • Reassigned during recommission

Key Advantage

  • Allows fresh onboarding of the same hardware

Use Case

  • Node ID change
  • Recommissioning switch
  • Hardware replacement (same fabric)
  • Moving switch within fabric design

Critical Step

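On the switch CLI, you must run the following. ACI-mode switches already present a bash-style shell, so the NX-OS run bash step is not needed:
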
setup-clean-config.sh
reload

Practical Risk

If you skip cleanup:

  • Residual configs may remain
  • Fabric join issues can occur

3. Decommission & Secure Remove (Full Secure Wipe)

Behavior

  • Switch is:
    • Securely wiped (deep erase)
    • Reloaded
  • Removes:
    • Configuration
    • Certificates
    • Encryption keys
    • Fabric identity

After Reload

  • Switch becomes:
    • Factory-like device
  • Cannot join fabric directly

Management IP Behavior

  • OOB Management IP:
    • Completely erased
  • TEP IP:
    • Fully removed

Key Requirement

  • Requires:
    • ACI image validation or reinstall (if the secure wipe removed the image)
    • Complete day-0 onboarding

Key Advantage

  • Ensures zero residual data

Use Case

  • Device disposal
  • RMA return
  • Moving switch to a different customer/fabric
  • Security compliance requirements

Side-by-Side Comparison

Feature | Decommission Only | Decommission & Remove | Secure Remove
Switch Reload | Yes | Yes | Yes
Config Wipe | Yes (normal) | Yes (normal) | Full secure wipe
Node ID Retained | Yes | No | No
Auto Rejoin Fabric | Yes | No | No
Manual Recommission | No | Yes | Yes
Mgmt IP (OOB) | Retained | Needs reconfig | Fully erased
TEP IP | Auto restored | Reassigned | Removed
Secure Data Erase | No | No | Yes

Real-World Decision Guide

Use Decommission Only When:

  • You want quick reset
  • You are not changing Node ID
  • You want automatic fabric rejoin

Use Decommission & Remove When:

  • You are:
    • Changing Node ID
    • Rebuilding switch
    • Replacing hardware
  • You are okay with manual recommission

Use Secure Remove When:

  • Device is leaving environment
  • Security wipe is required
  • Moving to new fabric/customer
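
The decision guide above can be condensed into a small lookup for runbooks. This is only a sketch: the function name and the intent keywords are made up for illustration and are not APIC inputs.

```shell
# Hypothetical helper: maps an operational intent to the decommission option
# suggested in the decision guide. The keywords are assumptions of this sketch.
choose_decommission_option() {
  case "$1" in
    reset|rejoin)
      echo "Decommission Only" ;;
    node-id-change|rebuild|replace)
      echo "Decommission & Remove" ;;
    dispose|rma|new-fabric|compliance)
      echo "Decommission & Secure Remove" ;;
    *)
      echo "unknown intent: $1" >&2
      return 1 ;;
  esac
}

choose_decommission_option node-id-change
```

Keeping the mapping in one place makes it easier to review the chosen option in a change request before anyone clicks the action in APIC.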

Common Mistakes to Avoid

1. Assuming “Remove” wipes everything

It does not remove all residual configs. Always run the cleanup script (setup-clean-config.sh, then reload) on the switch.

2. Forgetting Management IP

  • After Decommission & Remove:
    • Mgmt IP must be planned and reconfigured
  • After Secure Remove:
    • Completely lost

3. Using Decommission Only for Node ID Change

This will fail because:

  • Node identity is preserved
  • APIC will not allow new ID

Pro Tip (Very Important for Production)

Before any decommission:

✅ Note down:

  • Node ID
  • Serial number
  • Management IP
  • TEP pool

✅ Ensure:

  • Console access is available

✅ Plan:

  • Multiple reload windows
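
One way to capture the node details before the change is to parse the fabricNode output on the APIC. The sketch below assumes the usual moquery text layout (one "key : value" pair per line, each object introduced by a line starting with "# "); the function name node_snapshot is hypothetical. Note that fabricNode carries the TEP address, not the OOB management IP, so record the management IP separately.

```shell
# Reads `moquery -c fabricNode` text on stdin and prints
# "id serial name tep-address" for the requested node ID.
# Assumption: attributes appear as "key : value", objects start with "# ".
node_snapshot() {
  awk -v want="$1" '
    function emit() {
      if (v["id"] == want)
        print v["id"], v["serial"], v["name"], v["address"]
      split("", v)                 # clear attributes for the next object
    }
    /^# /     { emit(); next }     # a new object begins
    $2 == ":" { v[$1] = $3 }       # remember each attribute value
    END       { emit() }           # flush the last object
  '
}
```

Typical use on the APIC would be: moquery -c fabricNode | node_snapshot 101 > node-101-before.txt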

Conclusion

Understanding the differences between Decommission Only, Decommission & Remove, and Secure Remove is crucial for smooth Cisco ACI operations.

  • Decommission Only keeps identity and allows auto-rejoin
  • Decommission & Remove resets switch and requires manual setup
  • Secure Remove completely wipes the device for disposal or reuse

The biggest differentiator in real environments is management IP behavior and node identity retention, which directly impacts how the switch rejoins the fabric.


For the leaf node ID swap procedure, see this post:

Networklearner: Leaf Node ID Swap in Cisco ACI: Risks, Precautions, and Steps

Tuesday, 28 April 2026

How to Safely Decommission a Leaf Switch in Cisco ACI — Step-by-Step Guide with Checklist

Decommissioning a leaf switch in a Cisco ACI fabric is a high‑impact operational activity. Doing it correctly avoids outages, policy corruption, ghost nodes, and rediscovery issues. Doing it incorrectly can result in traffic loss, unresolved faults, and extended downtime.

Many engineers think decommissioning a leaf is as simple as clicking a button in APIC. In reality, Cisco ACI follows a strict dependency and lifecycle model, and a leaf must be carefully prepared before removal.

This blog explains how to decommission a leaf switch from an ACI fabric safely, with real‑world checks, APIC steps, CLI verification, and best practices. It is written for production environments, troubleshooting scenarios, and interviews.


What Does “Decommissioning a Leaf Switch” Mean in ACI?

In Cisco ACI, decommissioning a leaf switch means:

  • Removing the switch from fabric membership
  • Withdrawing all policies and certificates
  • Detaching the switch logically from the fabric control plane
  • Preparing it for power‑off, replacement, or reuse

Decommissioning is a logical operation, not just a hardware action.


Why Proper Decommissioning Is Critical

Incorrect decommissioning can lead to:

  • Traffic outages due to active endpoints
  • Broken vPC or port‑channel configurations
  • L3Out or BGP/OSPF failures
  • “Ghost nodes” still visible in APIC
  • Problems when re‑adding or repurposing the switch

In production data centers, a clean decommission is often part of:

  • Hardware refresh
  • RMA replacement
  • Capacity rebalancing
  • Fabric redesign

Pre‑Decommission Checklist (Very Important)

Before you touch APIC, validate all dependencies. This is where most failures occur.

1. Confirm No Active Endpoints on the Leaf

A leaf with active endpoints must not be decommissioned.

APIC GUI path:

Fabric → Inventory → Pod → Node → Leaf → Endpoints

✅ Endpoint count should be 0

Optional APIC CLI:

moquery -c fvCEp | grep paths-<leaf-id>

If endpoints are present:

  • Migrate workloads
  • Shut down ports
  • Move static bindings
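
To turn the endpoint check into a number you can paste into a change ticket, the moquery output can be counted per leaf. This is a sketch under two assumptions: that your release prints a fabricPathDn attribute on fvCEp objects, and that vPC-learned endpoints appear under a protpaths-<id>-<id> path. The function name is hypothetical.

```shell
# Reads `moquery -c fvCEp` text on stdin and prints how many endpoints are
# learned on the given leaf, including vPC paths the leaf belongs to.
# Assumption: each endpoint carries a "fabricPathDn : topology/..." line.
count_endpoints_on_leaf() {
  grep -E '^fabricPathDn' |
    grep -cE "/(paths-$1|protpaths-($1-[0-9]+|[0-9]+-$1))/" || true
}
```

Typical use on the APIC would be: moquery -c fvCEp | count_endpoints_on_leaf 101. The trailing slash in the pattern keeps leaf 101 from also matching leaf 1011.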

2. Remove Static EPG Bindings

Static path bindings directly tie an EPG to a leaf port.

APIC GUI path:

Tenant → Application Profile → EPG → Static Ports

Remove:

  • Access ports
  • Port‑channels
  • vPC bindings

⚠️ A leaf with static bindings will fail decommission.


3. Handle vPC and Port‑Channels Properly

If the leaf is part of a vPC pair:

  • Remove vPC associations
  • Remove port‑channels
  • Ensure services are moved to the peer leaf

Never decommission one side of an active vPC without cleanup.


4. Check if the Leaf Is a Border Leaf (L3Out)

If the leaf is used for L3Out:

  • Remove it from:
    • Logical Node Profile
    • Logical Interface Profile
  • Ensure routing is functional on alternate border leaves
  • Verify BGP/OSPF is stable

Decommissioning a border leaf without migration can cause external connectivity outages.


Step‑by‑Step: Decommissioning a Leaf Switch from ACI Fabric

Step 1: Verify Fabric Health (Recommended)

Before removing infrastructure components, ensure fabric health is stable.

Fabric → Inventory → Fabric Membership
  • No critical fabric‑wide faults
  • Controllers and spines healthy

This reduces unexpected behavior during changes.


Step 2: Decommission the Leaf from APIC

This is the main and officially supported step.

APIC GUI path:

Fabric → Inventory → Fabric Membership
  1. Select the leaf switch
  2. Click Actions
  3. Choose Decommission
  4. Confirm the action

APIC will:

  • Withdraw policies
  • Remove certificates
  • Update fabric membership state

⏱️ This usually takes 1–2 minutes.


Step 3: Verify Decommission Status

After completion, confirm the state.

APIC GUI:

Fabric → Inventory → Fabric Membership

Leaf should show:

  • Decommissioned or Removed

Optional APIC CLI:

moquery -c fabricNode | grep <leaf-id>
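
A plain grep on the node ID can also match serial numbers or addresses that happen to contain the same digits. A slightly stricter check, sketched below, reads the fabricSt attribute of the exact node. It assumes the usual "key : value" moquery layout; the function name and the "absent" marker are choices made for this example.

```shell
# Reads `moquery -c fabricNode` text on stdin and prints the fabricSt value
# of the requested node ID, or "absent" if no such node object exists.
node_fabric_state() {
  awk -v want="$1" '
    function emit() {
      if (v["id"] == want) { print v["fabricSt"]; found = 1 }
      split("", v)                 # clear attributes for the next object
    }
    /^# /     { emit(); next }
    $2 == ":" { v[$1] = $3 }
    END       { emit(); if (!found) print "absent" }
  '
}
```

Typical use on the APIC would be: moquery -c fabricNode | node_fabric_state 101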

Step 4 (Optional but Strongly Recommended): Clean the Switch

Once decommissioned, APIC no longer manages the switch.
If the switch will be reused or re‑added, you must clean it locally.

How to Clean the Decommissioned Leaf

Access via:

  • Console
  • OOB management
  • CIMC / KVM (if available)

Run:

acidiag touch clean
reload

This removes:

  • Fabric certificates
  • Node ID
  • ACI state information

After reboot, the switch will be ready for fresh discovery.


What NOT to Do (Common Mistakes)

Mistake | Impact
Decommission with live endpoints | Traffic outage
Skip static path cleanup | Decommission failure
Forget L3Out dependencies | External routing outage
Power off without decommission | Ghost node in APIC
Skip cleaning before reuse | Rediscovery failures

Leaf vs Spine Decommissioning (Quick Comparison)

Item | Leaf | Spine
Endpoint check required | ✅ Yes | ❌ No
Policy dependency cleanup | ✅ Mandatory | ❌ Minimal
L3Out impact | ✅ Possible | ❌ None
Redundancy consideration | ✅ Workloads | ✅ Fabric
Last node restriction | ❌ Leaf allowed | ❌ Never remove last spine

Real‑World Decommission Scenarios

Scenario 1: Hardware Refresh

A leaf is replaced due to lifecycle expiry.
Decommission old leaf → clean → rack new leaf → approve membership.

Scenario 2: RMA Replacement

Failed leaf is decommissioned logically, replaced physically, and re‑added with the same or new node ID.

Scenario 3: Fabric Re‑design

Leaf removed as part of capacity reshaping or topology optimization.


Troubleshooting Decommission Failures

If decommission fails:

  • Check for:
    • Active endpoints
    • Static bindings
    • vPC remnants
  • Look at Faults under the leaf
  • Verify no L3Out or service graph references remain

The faults raised in APIC usually indicate which dependency is blocking the operation.


Interview‑Ready Questions and Answers

Q: How do you decommission a leaf switch in Cisco ACI?
A: Remove all endpoints and policy dependencies, then decommission the leaf from Fabric Membership in APIC.

Q: Can you decommission a leaf with active endpoints?
A: No, endpoints must be removed first.

Q: Why run acidiag touch clean after decommission?
A: To remove fabric identity and prepare the switch for reuse or rediscovery.


Best Practices Summary

  • Always validate endpoints and bindings first
  • Treat border leaves with extra caution
  • Use vPC and redundancy wisely
  • Clean switches before reuse
  • Document node IDs and reasons for decommission

Final One‑Line Summary

In Cisco ACI, a leaf switch must be carefully prepared, cleaned of dependencies, and decommissioned through Fabric Membership in APIC to ensure a safe and outage‑free fabric operation.

Sunday, 15 March 2026

Cisco ACI MoQuery – Advanced Commands for Day‑to‑Day Operations

Cisco ACI provides a powerful graphical interface through APIC, but experienced ACI engineers rarely rely only on the GUI during daily operations. In real production environments, engineers prefer moquery because it offers fast, accurate, and read‑only access to the Cisco ACI Management Information Tree (MIT).

Moquery is safe to use in production, does not impact traffic, and does not program hardware. It exposes the real‑time state of the fabric and eliminates guesswork during troubleshooting. For day‑to‑day ACI operations, moquery is often the first tool engineers reach for.


What Is MoQuery in Cisco ACI?

Moquery is a command‑line utility available directly on the APIC that allows engineers to query managed objects (MOs) stored in the ACI database. Unlike the APIC GUI, moquery does not hide relationships or simplify outputs. It shows raw and authoritative information exactly as it exists in the fabric.

Moquery is commonly used for:

  • Endpoint troubleshooting
  • Contract and policy validation
  • VRF and bridge domain verification
  • Fault analysis
  • Fabric and node health checks

Endpoint Troubleshooting Using MoQuery

Endpoint‑related issues are the most common problems in Cisco ACI environments. When endpoints are not reachable or behave unexpectedly, moquery provides immediate visibility.

To display all learned endpoints:

moquery -c fvCEp

This command shows:

  • MAC address
  • IP address
  • EPG association
  • Bridge Domain
  • Leaf and interface where the endpoint is learned

To find a specific IP address:

moquery -c fvCEp | grep 10.10.10.25

To find a specific MAC address:

moquery -c fvCEp | grep 00:50:56

These commands are used daily to identify incorrect endpoint learning, endpoint mobility events, duplicate IPs, and static path misconfigurations.
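
One limitation of piping to grep is that only the matching line is printed, not the rest of the endpoint object, and a pattern like 10.10.10.25 also matches 10.10.10.250. The sketch below prints the whole object whenever any attribute value equals the search term exactly. It assumes the usual moquery layout ("key : value" lines, objects introduced by a line starting with "# "); the function name is hypothetical.

```shell
# Reads moquery text on stdin and prints every complete object in which
# some attribute value is exactly equal to the search term.
mo_records_with_value() {
  awk -v needle="$1" '
    /^# /                     { if (hit) printf "%s", rec; rec = ""; hit = 0 }
                              { rec = rec $0 "\n" }     # buffer current object
    $2 == ":" && $3 == needle { hit = 1 }               # exact value match
    END                       { if (hit) printf "%s", rec }
  '
}
```

Typical use on the APIC would be: moquery -c fvCEp | mo_records_with_value 10.10.10.25. Because the match is exact, a partial MAC such as 00:50:56 still needs the plain grep shown above.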


Validating Application Profiles and EPGs

To list all Endpoint Groups (EPGs) across all tenants (the tenant and application profile are visible in each object's dn):

moquery -c fvAEPg

This command is helpful when:

  • EPGs do not appear in the GUI
  • Verifying naming conventions
  • Confirming EPG existence during migrations

To identify which application profile an EPG belongs to:

moquery -c fvAEPg | grep dn

This is especially useful in environments with many application profiles and similarly named EPGs.


Contract Troubleshooting Using MoQuery

Contracts are one of the most frequent causes of traffic drops in Cisco ACI. Moquery allows engineers to validate contract relationships without relying on GUI assumptions.

To list all contracts:

moquery -c vzBrCP

To check which EPGs are providers of a contract:

moquery -c fvRsProv

To check which EPGs are consumers of a contract:

moquery -c fvRsCons

These commands confirm whether the correct EPGs are actually providing and consuming the intended contracts.
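
The raw fvRsProv and fvRsCons output lists every relation in the fabric, so it helps to narrow it to one contract. The sketch below prints the dn of each relation that points at a given contract name. It assumes the relation objects print a tnVzBrCPName attribute and a dn in the usual "key : value" layout; the function name is hypothetical.

```shell
# Reads `moquery -c fvRsProv` (or fvRsCons) text on stdin and prints the dn
# of every relation whose target contract name matches the argument.
# The EPG is the parent part of each printed dn.
relations_for_contract() {
  awk -v contract="$1" '
    function emit() {
      if (v["tnVzBrCPName"] == contract) print v["dn"]
      split("", v)                 # clear attributes for the next object
    }
    /^# /     { emit(); next }
    $2 == ":" { v[$1] = $3 }
    END       { emit() }
  '
}
```

Typical use on the APIC would be: moquery -c fvRsProv | relations_for_contract web-to-db, then the same with fvRsCons to compare both sides.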


Validating Contract Subjects and Filters

Many contract issues occur not because the contract is missing, but because the filter is wrong.

To inspect contract subjects:

moquery -c vzSubj

To list filters:

moquery -c vzFilter

To validate filter entries (ports, protocol, and direction):

moquery -c vzEntry

These commands remove ambiguity and clearly show whether the contract allows the required traffic.


Taboo Contract Verification

Taboo Contracts explicitly deny traffic and override permit contracts. They should be used sparingly, as misconfiguration can cause outages.

To list all Taboo Contracts:

moquery -c vzTaboo

To inspect Taboo contract subjects:

moquery -c vzTSubj

If traffic is unexpectedly denied, these commands should always be checked early in troubleshooting.


Validating vzAny and VRF‑Level Policies

vzAny represents all EPGs within a single VRF and is commonly used for shared services or broad policy application.

To list all VRFs:

moquery -c fvCtx

To confirm vzAny configuration:

moquery -c vzAny

This is critical in environments using:

  • Shared‑services architectures
  • Permit‑all designs
  • Contract Preferred Groups

Many production incidents occur because engineers are unaware of an existing vzAny contract.


Bridge Domain Troubleshooting

Bridge Domain issues can silently break connectivity.

To list all bridge domains:

moquery -c fvBD

To display bridge domain subnets:

moquery -c fvSubnet

To validate Bridge Domain to VRF mapping:

moquery -c fvRsCtx

These commands help identify:

  • Missing gateways
  • Incorrect VRF bindings
  • Wrong subnet scope

L3Out and External Connectivity Validation

To list all Layer‑3 Outs:

moquery -c l3extOut

To view external EPGs:

moquery -c l3extInstP

To check external subnets:

moquery -c l3extSubnet

These are essential when troubleshooting:

  • North‑south traffic issues
  • Firewall integration
  • Route advertisement problems

Fault and Fabric Health Troubleshooting

To display all active faults:

moquery -c faultInst

To see only critical faults:

moquery -c faultInst | grep critical

To find operational faults:

moquery -c faultInst | grep oper

These commands are faster and often more actionable than navigating the APIC fault dashboard.
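
A plain grep for "critical" also hits other lines that contain the word, such as previous or original severity values and fault descriptions. For a quick before/after comparison, a per-severity count based only on the severity attribute is more reliable. The sketch below assumes the usual "key : value" moquery layout; the function name is hypothetical.

```shell
# Reads `moquery -c faultInst` text on stdin and prints one line per
# severity with the number of faults, sorted by severity name.
# Only the attribute named exactly "severity" is counted.
fault_severity_counts() {
  awk '$1 == "severity" && $2 == ":" { count[$3]++ }
       END { for (s in count) print s, count[s] }' | sort
}
```

Typical use on the APIC would be: moquery -c faultInst | fault_severity_counts, run once before the change and once after.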


Fabric and Node Health Validation

To list all fabric nodes:

moquery -c fabricNode

To check fabric health scores:

moquery -c fabricHealthTotal

These commands are commonly used before and after production changes to ensure stability.


Interface and Path Troubleshooting

To list physical interfaces:

moquery -c ethpmPhysIf

To check interface operational state:

moquery -c ethpmPhysIf | grep operSt

To validate static path bindings:

moquery -c fvRsPathAtt

These commands explain many partial connectivity issues, link‑state problems, and unexpected traffic drops.


Best Practices for Daily MoQuery Usage

  • Use moquery during incidents, not after
  • Save outputs for RCA and audits
  • Combine moquery with grep for faster analysis
  • Learn common managed object classes such as fvCEp, fvAEPg, fvBD, fvCtx, and faultInst

Why Every ACI Engineer Should Master MoQuery

Moquery significantly reduces MTTR, increases confidence during incidents, and exposes the actual state of the fabric. Engineers who master moquery troubleshoot faster, avoid mistakes, and operate more effectively in large ACI environments.


Conclusion

Moquery is one of the most powerful yet underutilized tools in Cisco ACI. While the APIC GUI is excellent for visualization, moquery provides the facts. For serious ACI operations, moquery should be part of every engineer’s daily workflow.