Networklearner: aci leaf switch

Tuesday, 28 April 2026

Cisco ACI Leaf Node ID Swap on vPC Pairs: Risks, Precautions, and Step-by-Step Process

Cisco ACI Leaf Node ID Swap Steps When Leaves Are Part of vPC

Step 0 – Preconditions
Confirm maintenance window is approved. Ensure alternate connectivity or downtime is acceptable. Make sure you have console or OOB access to both leaf switches.

Step 1 – Drain Traffic and Clear Endpoints
Shut down or migrate all server-facing interfaces connected to the vPC pair.
From APIC, navigate to Fabric → Inventory → Pod → Node → Leaf → Endpoints.
Verify endpoint count is zero on both leaves.

Step 2 – Remove vPC and Port-Channel Configuration
Delete vPC protection group policies.
Delete all vPC port-channels.
Remove interface policy associations.
Remove all static EPG bindings that reference the vPC or either leaf.
At this stage, the leaves must have no access policy dependencies.

Step 3 – Remove L3Out (If Leaves Are Border Leaves)
If the vPC pair is used for L3Out, remove both leaves from the L3Out logical node profile and logical interface profile.
Confirm external routing is stable via remaining border leaves.

Step 4 – Decommission First Leaf (Leaf A)
In APIC, go to Fabric → Inventory → Fabric Membership.
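Select Leaf A and perform Decommission, choosing the Remove from controller option so that APIC releases the node ID bound to the switch's serial number.
Wait until Leaf A no longer appears as a registered fabric member.
Do not power off the switch yet; it still has to be cleaned in the next step.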

Step 5 – Clean Leaf A
Connect to Leaf A using console or OOB.
Run acidiag touch clean and then reload the switch.
This removes old node ID, certificates, and fabric identity.

Step 6 – Decommission Second Leaf (Leaf B)
In APIC, again go to Fabric → Inventory → Fabric Membership.
Select Leaf B and perform Decommission, again choosing the Remove from controller option.
Wait until Leaf B no longer appears as a registered fabric member.

Step 7 – Clean Leaf B
Connect to Leaf B using console or OOB.
Run acidiag touch clean and reload the switch.
Both leaves are now clean and discovery-ready.

Step 8 – Re-add Leaf A with New Node ID
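Bring Leaf A back into the fabric first. If Leaf B has already rebooted after its clean, leave it unregistered until Leaf A is stable.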
Ensure fabric uplinks to spines are connected.
From APIC Fabric Membership, approve the switch and assign the new desired node ID.
Wait until Leaf A is fully discovered and stable.

Step 9 – Re-add Leaf B with New Node ID
Bring up Leaf B only after Leaf A is stable.
From APIC Fabric Membership, approve it and assign the other node ID.
Wait until Leaf B is fully discovered and stable.
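
Before approving anything in Fabric Membership, it can help to write the swap down as data so that the serial-to-node-ID mapping is unambiguous. The sketch below is illustrative only: the serial numbers, node IDs, and leaf names are hypothetical, and the `fabricNodeIdentP` body reflects my understanding of the APIC registration object, which should be checked against the object model of your release before any use.

```python
import json


def swap_node_ids(current):
    """Return serial -> new node ID with the two IDs exchanged.

    `current` maps serial number -> node ID for exactly two vPC peers.
    """
    if len(current) != 2:
        raise ValueError("expected exactly two leaves")
    (serial_a, id_a), (serial_b, id_b) = current.items()
    if id_a == id_b:
        raise ValueError("both leaves report the same node ID")
    return {serial_a: id_b, serial_b: id_a}


def registration_payload(serial, node_id, name):
    """Build a fabricNodeIdentP body (assumed shape, verify on your release)."""
    return {
        "fabricNodeIdentP": {
            "attributes": {
                "dn": f"uni/controller/nodeidentpol/nodep-{serial}",
                "serial": serial,
                "nodeId": str(node_id),
                "name": name,
            }
        }
    }


# Hypothetical serial numbers and node IDs, for illustration only.
before = {"FDO00000001": 101, "FDO00000002": 102}
after = swap_node_ids(before)

for serial, node_id in after.items():
    print(json.dumps(registration_payload(serial, node_id, f"leaf-{node_id}")))
```

Keeping the mapping in one place makes it easy to review in the change ticket, whichever way the registration is finally performed (GUI or API).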

Step 10 – Rebuild vPC Configuration
After both leaves are healthy, recreate the vPC protection group.
Recreate vPC port-channels and interface policies.
Reapply static EPG bindings to the vPC.
Do not rebuild vPC if only one leaf is active.

Step 11 – Validation
Verify fabric health is green.
Ensure no vPC, access, or infra faults exist.
Confirm port-channels are up on both leaves.

Step 12 – Restore Traffic
Enable server-facing interfaces.
Bring servers or upstream devices back online.
Verify endpoint learning and confirm no MAC flapping or faults.

Final Rule

Never attempt a live node ID swap. Always decommission, clean, and re-add both vPC peer leaves in a controlled sequence.

Precautions

Swapping node IDs between Cisco ACI leaf switches is a sensitive operation, especially when the leaves are configured as a vPC pair. Unlike traditional networks, Cisco ACI tightly binds policies, forwarding state, and infrastructure objects to node IDs, making a node ID swap a planned maintenance activity, not a live change. When vPC is involved, the risk multiplies because both leaves act as a single logical endpoint for servers and network devices.

This article explains the critical precautions you must follow when performing a Cisco ACI leaf node ID swap in a vPC environment, based on real production experience and Cisco‑accepted operational practices.

Why Node ID Swap Is Risky in vPC‑Based ACI Fabrics

In Cisco ACI, a leaf’s node ID is not just an identifier; it is embedded into multiple internal constructs such as vPC identifiers, static EPG bindings, endpoint tables, and forwarding databases. In a vPC pair, both leaves jointly provide forwarding for a single logical port‑channel. Swapping node IDs without proper preparation can cause MAC flapping, endpoint blackholing, broken port‑channels, and fabric faults.

There is no supported in‑place node ID change in Cisco ACI. The only supported method to swap node IDs is to decommission, clean, and re‑add the leaf switches with the desired node IDs.

Precaution 1: Treat the vPC Pair as a Single Failure Domain

The most important rule is to treat both vPC peers as a single unit, even though they are two physical switches. Never attempt a node ID swap on only one vPC peer while the other peer is actively forwarding traffic. ACI vPC forwarding relies on consistent node information across both leaves. Any mismatch can result in unpredictable traffic loss.

Before starting, ensure:

All connected servers or upstream devices are drained or shut down.
No single‑homed devices depend on the vPC pair.
Maintenance is scheduled during a proper change window.

Precaution 2: Ensure Zero Active Endpoints on Both Leaves

A node ID swap must never be performed while endpoints are active. In ACI, endpoints can be learned dynamically through traffic, and their state is tied to the leaf node ID. If endpoints remain on either vPC peer, swapping node IDs will cause immediate disruption.

From APIC, verify that both leaves show zero endpoints before proceeding. If endpoints are present, migrate workloads, shut down interfaces, or disconnect cables until endpoint learning is cleared.

Precaution 3: Remove vPC and Port‑Channel Policies Before Decommissioning

ACI does not automatically clean up vPC policies during decommissioning. All vPC‑related constructs must be removed manually. This includes:

vPC protection group
Port‑channel policies
Interface policy associations
Static EPG bindings referencing the vPC

Leaving these objects in place can block decommissioning or result in orphaned configuration that causes faults after the swap. A clean policy removal ensures that the fabric does not retain references to the old node IDs.
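
One way to make this cleanup auditable is to keep a small inventory of the objects that reference the pair and confirm it is empty before decommissioning. The following is a minimal sketch, assuming the operator has already collected such an inventory by hand or by script; the record layout, category names, and object names are all hypothetical.

```python
from collections import defaultdict


def leftover_references(records, pair_nodes):
    """Group the names of objects that still reference either vPC peer.

    `records` is an iterable of (category, object_name, node_ids) tuples.
    """
    pair = set(pair_nodes)
    blocking = defaultdict(list)
    for category, name, nodes in records:
        if pair & set(nodes):
            blocking[category].append(name)
    return dict(blocking)


# Hypothetical inventory for illustration only.
inventory = [
    ("vpc_protection_group", "vpc-101-102", [101, 102]),
    ("port_channel_policy_group", "vpc-srv1", [101, 102]),
    ("static_epg_binding", "prod/web/vpc-srv1", [101, 102]),
    ("static_epg_binding", "prod/db/eth1-7", [103]),
]

for category, names in leftover_references(inventory, [101, 102]).items():
    print(category, "->", ", ".join(names))
```

An empty result for the pair is the condition the precaution asks for; anything listed is a reference to the old node IDs that still has to be removed.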

Precaution 4: If the vPC Pair Is Also a Border Leaf, Remove L3Out First

When a vPC pair is serving as a border leaf for L3Out, the risk is even higher. External routing protocols such as BGP or OSPF depend on stable leaf identities. Before any node ID swap:

Remove the leaves from all L3Out logical node profiles.
Ensure routing is fully operational on alternate border leaves.
Validate external reachability before continuing.

Failure to do this can result in complete north‑south traffic outages.
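
The "alternate border leaves" check can be expressed as a simple set comparison: for every L3Out that uses either leaf, at least one other node must remain. The sketch below assumes the node attachments are available as DN strings in the `rsnodeL3OutAtt-[topology/pod-N/node-ID]` form; the tenant, L3Out, and profile names are hypothetical sample data.

```python
import re

# DN of a node attached to an L3Out logical node profile (assumed format).
_NODE_ATT_RE = re.compile(
    r"^uni/tn-(?P<tenant>[^/]+)/out-(?P<l3out>[^/]+)/lnodep-(?P<profile>[^/]+)"
    r"/rsnodeL3OutAtt-\[topology/pod-\d+/node-(?P<node>\d+)\]$"
)


def parse_node_attachment(dn):
    """Return (tenant, l3out, profile, node_id), or None if the DN does not match."""
    match = _NODE_ATT_RE.match(dn)
    if match is None:
        return None
    return (match["tenant"], match["l3out"], match["profile"], int(match["node"]))


def l3outs_losing_all_border_leaves(dns, removing):
    """Return (tenant, l3out) pairs that would have no border leaf left."""
    removing = set(removing)
    nodes_by_l3out = {}
    for dn in dns:
        parsed = parse_node_attachment(dn)
        if parsed is None:
            continue
        tenant, l3out, _profile, node = parsed
        nodes_by_l3out.setdefault((tenant, l3out), set()).add(node)
    return sorted(
        key
        for key, nodes in nodes_by_l3out.items()
        if nodes & removing and not nodes - removing
    )


# Hypothetical sample data for illustration only.
attachments = [
    "uni/tn-prod/out-wan/lnodep-border/rsnodeL3OutAtt-[topology/pod-1/node-101]",
    "uni/tn-prod/out-wan/lnodep-border/rsnodeL3OutAtt-[topology/pod-1/node-102]",
    "uni/tn-prod/out-wan/lnodep-border/rsnodeL3OutAtt-[topology/pod-1/node-103]",
    "uni/tn-dev/out-lab/lnodep-np1/rsnodeL3OutAtt-[topology/pod-1/node-101]",
    "uni/tn-dev/out-lab/lnodep-np1/rsnodeL3OutAtt-[topology/pod-1/node-102]",
]

print(l3outs_losing_all_border_leaves(attachments, [101, 102]))
```

Any L3Out returned here has no alternate border leaf, so removing the pair would take its external connectivity down for the duration of the swap.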

Precaution 5: Always Clean Both Leaves Using acidiag

After decommissioning each leaf, it is mandatory to run the following on both vPC peers:

acidiag touch clean
reload

Cleaning only one switch is a common and dangerous mistake. If one leaf still retains fabric identity or certificates, the fabric may encounter node ID conflicts, discovery failures, or inconsistent vPC behavior when the switches are re‑added.

Cleaning ensures that the switch boots in a discovery‑ready state with no residual ACI identity.

Precaution 6: Re‑Add Leaves Sequentially, Not in Parallel

When re‑adding switches with swapped node IDs, never power up or approve both leaves at the same time. Always follow a controlled order:

Bring up the first leaf and assign its new node ID.
Wait for full fabric stability and health.
Bring up the second leaf and assign its new node ID.

This approach avoids node ID collisions, partial vPC instantiation, and confusing APIC fault scenarios.
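
The ordering rule above amounts to a small gate: the next leaf may be approved only when every leaf before it is active. Here is a minimal sketch of that gate, assuming the operator has read each node's fabric state from APIC (for example the `fabricSt` attribute of `fabricNode`) into a dictionary; the node IDs are hypothetical and a node that is not registered yet is simply absent from the dictionary.

```python
def next_step(planned_order, fabric_state):
    """Decide what to do next when re-adding leaves one at a time.

    planned_order: new node IDs in the order they should be approved.
    fabric_state:  node ID -> state string for nodes already registered.
    Returns ("approve", id), ("wait", id) or ("done", None).
    """
    for node_id in planned_order:
        state = fabric_state.get(node_id)
        if state == "active":
            continue  # this leaf is healthy, look at the next one
        if state is None:
            return ("approve", node_id)  # not registered yet
        return ("wait", node_id)  # registered but not yet active
    return ("done", None)


# Hypothetical walk-through of a two-leaf re-addition.
order = [102, 101]
print(next_step(order, {}))
print(next_step(order, {102: "discovering"}))
print(next_step(order, {102: "active"}))
print(next_step(order, {102: "active", 101: "active"}))
```

Because the function never returns an approve action while an earlier leaf is still settling, it cannot suggest registering both peers in parallel.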

Precaution 7: Rebuild vPC Only After Both Leaves Are Fully Healthy

Do not recreate vPC configurations until both leaves are fully discovered, healthy, and visible in the fabric. Building vPC with only one peer active leads to port‑channel inconsistencies and deployment failures.

Once both leaves are stable:

Recreate vPC protection groups.
Recreate port‑channels.
Reapply static EPG bindings.
Validate that both leaves appear in all bindings.

Only after this should server ports or network devices be reconnected.
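
The last check in that list, that both leaves appear in every binding, is easy to get wrong by eye when there are many EPGs. A minimal sketch of the comparison follows, assuming the rebuilt bindings have been collected into a dictionary of binding name to node IDs; the names and IDs are hypothetical.

```python
def bindings_missing_a_peer(bindings, pair):
    """Return {binding_name: [missing node IDs]} for incomplete vPC bindings."""
    pair = set(pair)
    incomplete = {}
    for name, nodes in bindings.items():
        missing = pair - set(nodes)
        if missing:
            incomplete[name] = sorted(missing)
    return incomplete


# Hypothetical rebuilt bindings for illustration only.
rebuilt = {
    "prod/web/vpc-srv1": [101, 102],
    "prod/db/vpc-srv2": [101],
}

print(bindings_missing_a_peer(rebuilt, [101, 102]))
```

An empty dictionary means every binding references both peers; any entry names the leaf that still has to be added before ports are reconnected.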

Precaution 8: Validate vPC Health Before Allowing Traffic

Before reintroducing traffic, perform strict validation:

No vPC‑related faults in APIC.
Port‑channels show operational status.
No access, fabric, or infra faults.
Leaf interfaces are up and error‑free.

Once validation is complete, gradually restore server or upstream connectivity and observe endpoint learning behavior.
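
These checks can be folded into a single go/no-go decision. The sketch below is one possible shape for that decision, not a Cisco tool: it assumes faults and port-channel states have already been collected from APIC, treats only critical and major faults as blocking (an assumption you may want to tighten), and uses hypothetical fault codes and port-channel names.

```python
# Assumption: only these severities block traffic restoration.
BLOCKING_SEVERITIES = {"critical", "major"}


def validation_failures(faults, port_channels, expected_leaves):
    """Return a list of human-readable reasons not to restore traffic.

    faults:         list of dicts with "code" and "severity" keys.
    port_channels:  name -> {leaf_id: operational state}.
    """
    problems = []
    for fault in faults:
        if fault["severity"] in BLOCKING_SEVERITIES:
            problems.append(f"fault {fault['code']} ({fault['severity']})")
    for name, state_by_leaf in port_channels.items():
        for leaf in expected_leaves:
            if state_by_leaf.get(leaf) != "up":
                problems.append(f"port-channel {name} not up on leaf {leaf}")
    return problems


# Hypothetical collected state for illustration only.
faults = [
    {"code": "F-EXAMPLE-1", "severity": "warning"},
    {"code": "F-EXAMPLE-2", "severity": "major"},
]
port_channels = {
    "vpc-srv1": {101: "up", 102: "up"},
    "vpc-srv2": {101: "up", 102: "down"},
}

for reason in validation_failures(faults, port_channels, [101, 102]):
    print("NO-GO:", reason)
```

Traffic is restored only when the returned list is empty, which keeps the decision tied to explicit evidence rather than a quick look at the dashboard.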

Common Mistakes to Avoid

The most common mistakes during node ID swap in vPC environments include attempting a live swap, forgetting to remove vPC policies, cleaning only one leaf, or restoring traffic before full validation. Each of these can result in extended outages and complex recovery procedures.

Final Takeaway

A Cisco ACI leaf node ID swap in a vPC environment is a full teardown and rebuild operation, not a minor change. Success depends on treating both leaves as a single unit, removing all dependencies, cleaning both switches, and performing a controlled re‑addition process. When executed correctly, the swap is safe and fully supported, but shortcuts almost always lead to problems.

One‑Line Summary

In Cisco ACI, swapping node IDs on vPC‑connected leaf switches requires full vPC teardown, clean decommissioning of both leaves, and a controlled rebuild to avoid traffic loss and fabric instability.

How to Safely Decommission a Leaf Switch in Cisco ACI — Step-by-Step Guide with Checklist

Decommissioning a leaf switch in a Cisco ACI fabric is a high‑impact operational activity. Doing it correctly avoids outages, policy corruption, ghost nodes, and rediscovery issues. Doing it incorrectly can result in traffic loss, unresolved faults, and extended downtime.

Many engineers think decommissioning a leaf is as simple as clicking a button in APIC. In reality, Cisco ACI follows a strict dependency and lifecycle model, and a leaf must be carefully prepared before removal.

This blog explains how to decommission a leaf switch from an ACI fabric safely, with real‑world checks, APIC steps, CLI verification, and best practices. It is written for production environments, troubleshooting scenarios, and interviews.

What Does “Decommissioning a Leaf Switch” Mean in ACI?

In Cisco ACI, decommissioning a leaf switch means:

Removing the switch from fabric membership
Withdrawing all policies and certificates
Detaching the switch logically from the fabric control plane
Preparing it for power‑off, replacement, or reuse

Decommissioning is a logical operation, not just a hardware action.

Why Proper Decommissioning Is Critical

Incorrect decommissioning can lead to:

Traffic outages due to active endpoints
Broken vPC or port‑channel configurations
L3Out or BGP/OSPF failures
“Ghost nodes” still visible in APIC
Problems when re‑adding or repurposing the switch

In production data centers, a clean decommission is often part of:

Hardware refresh
RMA replacement
Capacity rebalancing
Fabric redesign

Pre‑Decommission Checklist (Very Important)

Before you touch APIC, validate all dependencies. This is where most failures occur.

1. Confirm No Active Endpoints on the Leaf

A leaf with active endpoints must not be decommissioned.

APIC GUI path:

Fabric → Inventory → Pod → Node → Leaf → Endpoints

✅ Endpoint count should be 0

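Optional APIC CLI:

moquery -c fvRsCEpToPathEp | grep -E "paths-([0-9]+-)?<leaf-id>[-/]"

The DN of an endpoint object (fvCEp) does not carry a node ID, so this check filters the endpoint-to-path relations instead. They name the leaf as paths-<leaf-id> or, for a vPC, as protpaths-<id>-<id>. No output means no endpoints are attached through that leaf.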

If endpoints are present:

Migrate workloads
Shut down ports
Move static bindings
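
When the endpoint list is long, counting by eye is error-prone, especially because vPC-attached endpoints name both peers in a single path. The following sketch shows one way to filter path DNs for a given leaf, assuming the paths follow the `paths-<id>` and `protpaths-<id>-<id>` naming; the sample DNs and interface names are hypothetical.

```python
import re

# Matches the node part of a fabric path DN:
#   topology/pod-1/paths-101/pathep-[eth1/5]            (single leaf)
#   topology/pod-1/protpaths-101-102/pathep-[vpc-srv1]  (vPC pair)
_PATH_RE = re.compile(r"/(?:prot)?paths-(\d+)(?:-(\d+))?/")


def nodes_in_path(path_dn):
    """Return the set of node IDs named in a fabric path DN."""
    match = _PATH_RE.search(path_dn)
    if match is None:
        return set()
    return {int(group) for group in match.groups() if group is not None}


def endpoints_on_leaf(path_dns, leaf_id):
    """Return the path DNs that still attach an endpoint to leaf_id."""
    return [dn for dn in path_dns if leaf_id in nodes_in_path(dn)]


# Hypothetical sample data for illustration only.
sample_paths = [
    "topology/pod-1/paths-101/pathep-[eth1/5]",
    "topology/pod-1/protpaths-101-102/pathep-[vpc-srv1]",
    "topology/pod-1/paths-103/pathep-[eth1/7]",
]

remaining = endpoints_on_leaf(sample_paths, 102)
print(len(remaining), "endpoint path(s) still on leaf 102")
```

The leaf is ready for the next step only when the returned list is empty.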

2. Remove Static EPG Bindings

Static path bindings directly tie an EPG to a leaf port.

APIC GUI path:

Tenant → Application Profile → EPG → Static Ports

Remove:

Access ports
Port‑channels
vPC bindings

⚠️ Static bindings left on a leaf can block decommissioning or leave orphaned configuration behind.

3. Handle vPC and Port‑Channels Properly

If the leaf is part of a vPC pair:

Remove vPC associations
Remove port‑channels
Ensure services are moved to the peer leaf

Never decommission one side of an active vPC without cleanup.

4. Check if the Leaf Is a Border Leaf (L3Out)

If the leaf is used for L3Out:

Remove it from:
- Logical Node Profile
- Logical Interface Profile
Ensure routing is functional on alternate border leaves
Verify BGP/OSPF is stable

Decommissioning a border leaf without migration can cause external connectivity outages.

Step‑by‑Step: Decommissioning a Leaf Switch from ACI Fabric

Step 1: Verify Fabric Health (Recommended)

Before removing infrastructure components, ensure fabric health is stable.

Fabric → Inventory → Fabric Membership

No critical fabric‑wide faults
Controllers and spines healthy

This reduces unexpected behavior during changes.

Step 2: Decommission the Leaf from APIC

This is the main and officially supported step.

APIC GUI path:

Fabric → Inventory → Fabric Membership

Select the leaf switch
Click Actions
Choose Decommission
Confirm the action

APIC will:

Withdraw policies
Remove certificates
Update fabric membership state

⏱️ This usually takes 1–2 minutes.

Step 3: Verify Decommission Status

After completion, confirm the state.

APIC GUI:

Fabric → Inventory → Fabric Membership

Leaf should show:

Decommissioned or Removed

Optional APIC CLI:

moquery -c fabricNode | grep <leaf-id>
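
The same check can be scripted against a saved class query result. This is a minimal sketch, assuming the response uses the usual APIC JSON envelope (an `imdata` list of objects keyed by class name) and that `fabricNode` exposes `id` and `fabricSt`; the sample response, node names, and states below are made up for illustration, and the exact state strings can vary by release.

```python
import json


def node_fabric_state(response_text, node_id):
    """Return the fabricSt of node_id, or None if the node is not listed."""
    data = json.loads(response_text)
    for item in data.get("imdata", []):
        attrs = item.get("fabricNode", {}).get("attributes", {})
        if attrs.get("id") == str(node_id):
            return attrs.get("fabricSt")
    return None


# Hypothetical saved response for illustration only.
sample_response = json.dumps({
    "totalCount": "2",
    "imdata": [
        {"fabricNode": {"attributes": {
            "id": "102", "name": "leaf-102", "role": "leaf", "fabricSt": "inactive"}}},
        {"fabricNode": {"attributes": {
            "id": "201", "name": "spine-201", "role": "spine", "fabricSt": "active"}}},
    ],
})

for node in (101, 102, 201):
    print(node, node_fabric_state(sample_response, node))
```

A result of None means the node is no longer known to the controller, while any state other than active shows that it is still listed but out of service.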

Step 4 (Optional but Strongly Recommended): Clean the Switch

Once decommissioned, APIC no longer manages the switch.
If the switch will be reused or re‑added, you must clean it locally.

How to Clean the Decommissioned Leaf

Access via:

Console
OOB management
CIMC / KVM (if available)

Run:

acidiag touch clean

reload

This removes:

Fabric certificates
Node ID
ACI state information

After reboot, the switch will be ready for fresh discovery.

What NOT to Do (Common Mistakes)

Mistake	Impact
Decommission with live endpoints	Traffic outage
Skip static path cleanup	Decommission failure
Forget L3Out dependencies	External routing outage
Power off without decommission	Ghost node in APIC
Skip cleaning before reuse	Rediscovery failures

Leaf vs Spine Decommissioning (Quick Comparison)

Item	Leaf	Spine
Endpoint check required	✅ Yes	❌ No
Policy dependency cleanup	✅ Mandatory	❌ Minimal
L3Out impact	✅ Possible	❌ None
Redundancy consideration	✅ Workloads	✅ Fabric
Last node restriction	❌ None	✅ Never remove the last spine

Real‑World Decommission Scenarios

Scenario 1: Hardware Refresh

A leaf is replaced due to lifecycle expiry.
Decommission old leaf → clean → rack new leaf → approve membership.

Scenario 2: RMA Replacement

Failed leaf is decommissioned logically, replaced physically, and re‑added with the same or new node ID.

Scenario 3: Fabric Re‑design

Leaf removed as part of capacity reshaping or topology optimization.

Troubleshooting Decommission Failures

If decommission fails:

Check for:
- Active endpoints
- Static bindings
- vPC remnants
Look at Faults under the leaf
Verify no L3Out or service graph references remain

The faults raised in APIC usually identify the dependency that is blocking the operation.

Interview‑Ready Questions and Answers

Q: How do you decommission a leaf switch in Cisco ACI?
A: Remove all endpoints and policy dependencies, then decommission the leaf from Fabric Membership in APIC.

Q: Can you decommission a leaf with active endpoints?
A: No, endpoints must be removed first.

Q: Why run acidiag touch clean after decommission?
A: To remove fabric identity and prepare the switch for reuse or rediscovery.

Best Practices Summary

Always validate endpoints and bindings first
Treat border leaves with extra caution
Rely on vPC peers and redundant border leaves to carry traffic during the change
Clean switches before reuse
Document node IDs and reasons for decommission

Final One‑Line Summary

In Cisco ACI, a leaf switch must be carefully prepared, cleaned of dependencies, and decommissioned through Fabric Membership in APIC to ensure a safe and outage‑free fabric operation.