Sunday, 7 September 2025

Rogue Endpoint Detection in Cisco ACI


⚠️ Problem Addressed

Rogue endpoints or misconfigured devices can cause frequent MAC/IP moves across leaf switches, leading to:

  • Network instability
  • High CPU usage
  • Crashes in endpoint mapper (EPM) and client (EPMC)
  • Rapid log rollover, making debugging difficult

🛡️ How Rogue Endpoint Control Works

The feature helps mitigate these issues by:

  • Detecting rapidly moving endpoints (MAC/IP)
  • Quarantining them by making their entries static
  • Deleting the unauthorized MAC/IP after a set interval
  • Raising a fault for visibility
  • Generating a host tracking packet to re-learn the endpoint

🔄 Behavior Based on Software Version

| Version | Quarantine Behavior | Traffic Handling | Final Action |
| --- | --- | --- | --- |
| Before 3.2(6) | Endpoint is made static | Traffic is dropped during quarantine | MAC/IP is deleted after the interval |
| 3.2(6) and later | Endpoint is made static | Traffic is allowed during quarantine | MAC/IP is deleted after the interval |

 Improvement: From 3.2(6) onwards, the system is less disruptive, allowing traffic to continue while still monitoring rogue behavior.


📝 Rogue/COOP Exception List

 Purpose

Allows a higher tolerance for endpoint movement before an endpoint is marked as rogue.

📋 Behavior

  • Endpoints in the list are marked rogue only after 3,000 moves in 10 minutes
  • Once marked:
    • Endpoint is made static
    • Deleted after 30 seconds
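
The exception-list thresholds above can be pictured as a sliding-window counter. The sketch below is only a toy model, not how the leaf switch implements it: the class name `MoveTracker` and its methods are hypothetical, and it assumes the endpoint turns rogue when the 3,000th move lands inside the 10-minute window.

```python
from collections import deque


class MoveTracker:
    """Toy sliding-window counter for one endpoint's moves (illustration only)."""

    def __init__(self, max_moves=3000, window_s=600, hold_s=30):
        self.max_moves = max_moves   # moves tolerated inside the window
        self.window_s = window_s     # 10 minutes, per the exception list
        self.hold_s = hold_s         # static entry is deleted after 30 seconds
        self.moves = deque()         # timestamps of recent moves
        self.rogue_since = None      # time the endpoint was marked rogue

    def record_move(self, now):
        """Record a move at time `now` (seconds); return True if the endpoint is rogue."""
        if self.rogue_since is not None:
            return True              # entry is static, further moves are ignored
        self.moves.append(now)
        # Forget moves that have fallen out of the window.
        while self.moves and now - self.moves[0] >= self.window_s:
            self.moves.popleft()
        if len(self.moves) >= self.max_moves:
            self.rogue_since = now
        return self.rogue_since is not None

    def expire(self, now):
        """Delete the static entry once the hold time has passed; return True if deleted."""
        if self.rogue_since is not None and now - self.rogue_since >= self.hold_s:
            self.rogue_since = None
            self.moves.clear()
            return True
        return False
```

With the defaults, an endpoint has to move 3,000 times within 600 seconds before `record_move` returns True, and `expire` clears it 30 seconds later so it can be re-learned.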

🆕 From APIC 6.0(3) Onwards

  • You can:
    • Create global exception lists
    • Exclude MACs from rogue detection across all bridge domains or L3Outs
    • Exclude all MACs for a specific bridge domain or L3Out

 

Cisco ACI Mis-Cabling Protocol (MCP) – Loop Detection Simplified

Cisco ACI uses Mis-Cabling Protocol (MCP) to detect and mitigate Layer 2 loops, replacing traditional STP participation. MCP sends special Layer 2 packets across access ports, vPCs, and virtual ports. If the fabric receives its own MCP packet, it identifies a loop and can either log the event or error-disable the port.

 Key Highlights:

  • Global MCP policies are disabled by default; port-level policies are enabled.
  • Global MCP Policy:
    This is the master switch that controls whether MCP is active across the entire fabric.
    • Disabled by default: Even though individual ports may be configured to support MCP, no MCP packets are sent unless this global policy is explicitly enabled.
  • Port-Level MCP Policy:
    These are the interface-specific settings that determine how each port behaves when MCP is active.
    • Enabled by default: Ports are ready to participate in MCP loop detection, but they won’t actually send or process MCP packets unless the global policy is turned on.
  • MCP works complementarily with STP on external switches.
  • BPDU filtering or disabling loopguard on external switches helps prevent loop-related issues.
  • Endpoint move loop detection is available but disabled by default.
  • MCP supports native VLAN mode and per-VLAN mode (from APIC 2.0(2)) for granular loop detection.
  • Faster detection introduced in APIC 3.2(1) with transmission intervals as low as 100 ms.
  • Scalability limits: 256 VLANs per interface and 2000 logical ports per leaf switch. Per-VLAN MCP runs on at most 256 VLANs per interface; if more are configured, the 256 numerically lowest VLANs are chosen.
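
To make the 256-VLAN limit concrete, here is a small sketch of which VLANs per-VLAN MCP would cover on one interface. The function name `mcp_vlans` is hypothetical; the only rule it encodes is the one stated above, that the numerically lowest 256 VLANs are chosen.

```python
def mcp_vlans(configured_vlans, limit=256):
    """Return the VLAN IDs per-VLAN MCP would cover: the `limit` lowest ones."""
    return sorted(set(configured_vlans))[:limit]


# Example: an interface carrying VLANs 100-399 (300 VLANs in total).
covered = mcp_vlans(range(100, 400))
print(len(covered), covered[0], covered[-1])
```

In this example VLANs 356 to 399 fall outside the limit, so a loop that exists only in those VLANs would not be detected by per-VLAN MCP on that interface.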

🔐 MCP Modes:

  • Non-Strict Mode: Allows traffic while monitoring for loops; default detection time is 7 seconds.
  • Strict Mode (from APIC 5.2(4)):
    • Performs early loop detection before allowing data traffic.
    • Uses initial delay and grace period timers for STP convergence and aggressive MCP checks.
    • Requires port flap to activate on already-up ports.

⚠️ Strict Mode Guidelines:

  • Not supported on FEX or QinQ edge ports.
  • Requires APIC 5.2(4) or later on all participating leaf switches.
  • May impact vPC convergence time.
  • Must be disabled before downgrading the fabric.
  • Can cause both ports to error-disable if loops are detected simultaneously.

MCP Mode Comparison Table

| Feature | Non-Strict Mode | Strict Mode |
| --- | --- | --- |
| Traffic Acceptance | Accepts data and control traffic immediately | Initially blocks data traffic; only control packets allowed |
| Loop Detection Timing | MCP packets sent every 2 seconds; loop detection in ~7 seconds | Aggressive MCP packet transmission during grace period (default 3 sec) |
| Early Loop Detection | Not performed | Performed before allowing data traffic |
| Port Behavior on Loop Detection | Port is error-disabled | Port is error-disabled and shut down |
| Activation Requirement | Active immediately | Requires port flap to activate if port is already up |
| Timers Used | Global MCP instance policy | Initial delay timer + grace period timer |
| Default Initial Delay | Not applicable | 0 seconds (can be set to 45–60 sec for STP convergence) |
| Default Grace Period | Not applicable | 3 seconds |
| STP Compatibility | Works with STP | Accepts STP BPDUs even if VLAN is not enabled |
| Use Case | General loop detection | Early and aggressive loop prevention before traffic forwarding |
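
The two strict-mode timers can be read as a simple timeline: the initial delay runs first, then the grace period, and data traffic is accepted only afterwards if no loop was found. The sketch below assumes exactly that ordering; the function name `strict_mode_timeline` is hypothetical, and the defaults are the ones from the table.

```python
def strict_mode_timeline(link_up_at, initial_delay_s=0, grace_period_s=3):
    """Return (aggressive_mcp_start, data_forwarding_start) in seconds.

    Assumes no loop is detected; if one is, the port is error-disabled instead.
    """
    mcp_start = link_up_at + initial_delay_s        # initial delay lets STP converge
    forwarding_start = mcp_start + grace_period_s   # aggressive MCP checks run here
    return mcp_start, forwarding_start


print(strict_mode_timeline(0))                      # defaults from the table
print(strict_mode_timeline(0, initial_delay_s=45))  # delay raised for STP convergence
```

With the defaults a port starts forwarding data 3 seconds after link-up; raising the initial delay to 45 seconds pushes that out to 48 seconds.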

 

Saturday, 6 September 2025

Cisco ACI, Storm Control : Drop and Shutdown

 In Cisco ACI, Storm Control is a feature used to mitigate traffic storms caused by excessive broadcast, multicast, or unknown unicast traffic. It can be configured with two types of actions: Drop and Shutdown.


⚙️ Storm Control Actions in Cisco ACI

1. Drop (Default Action)

  • When traffic exceeds the configured threshold (either in packets per second or percentage of bandwidth), the excess traffic is dropped.
  • The port remains up and operational.
  • This is a non-disruptive method to suppress storm traffic.
  • Suitable for most environments where you want to limit traffic without affecting port availability.

2. Shutdown

  • When traffic exceeds the threshold:
    • Traffic is dropped for a soaking interval (default: 3 seconds).
    • If the storm persists, the port is administratively shut down at the end of the interval.
  • You can configure the soaking interval from 3 to 10 seconds.
  • This action is more aggressive and is used when dropping traffic alone is insufficient to protect the network.
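
The difference between the two actions is easiest to see in a small simulation. This is a toy model with one traffic sample per second, not the hardware rate limiter: the function name `storm_control` and the sampling scheme are assumptions, while the soaking interval default and range come from the text above.

```python
def storm_control(rates, threshold, action="drop", soak_s=3):
    """Simulate one port with one traffic sample per second.

    Returns (forwarded_per_second, port_up). Excess traffic is always dropped;
    with action="shutdown" the port goes down once the storm has lasted for
    `soak_s` consecutive seconds.
    """
    if action not in ("drop", "shutdown"):
        raise ValueError("action must be 'drop' or 'shutdown'")
    if action == "shutdown" and not 3 <= soak_s <= 10:
        raise ValueError("soaking interval must be 3 to 10 seconds")
    forwarded = []
    seconds_over = 0  # consecutive seconds above the threshold
    for rate in rates:
        forwarded.append(min(rate, threshold))
        seconds_over = seconds_over + 1 if rate > threshold else 0
        if action == "shutdown" and seconds_over >= soak_s:
            return forwarded, False  # port is shut, nothing more is forwarded
    return forwarded, True


samples = [50, 200, 200, 200, 50]
print(storm_control(samples, threshold=100))
print(storm_control(samples, threshold=100, action="shutdown"))
```

With Drop the port keeps forwarding up to the threshold for all five seconds. With Shutdown the same samples take the port down after the third consecutive second of storm, so the last sample is never forwarded.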

🔍 Behavioral Differences

| Feature | Drop | Shutdown |
| --- | --- | --- |
| Traffic Handling | Drops excess traffic | Drops traffic, then shuts down port |
| Port Status | Remains up | Goes down if storm persists |
| Faults/Traps | Can raise SNMP traps | Interface traps raised; storm traps may be unreliable |
| Use Case | Mild suppression | Severe storm mitigation |


🛠️ Configuration Notes

  • Storm Control is configured via Access Policies in ACI:
    • Fabric > Access Policies > Interface > Storm Control
  • You can apply it to:
    • Physical interfaces
    • Port channels
  • Monitoring policies can be added to raise alerts when thresholds are exceeded 

 

Sunday, 31 August 2025

Cisco ACI Port Security

  Cisco ACI Port Security – Summary

Purpose:
Controls the number of MAC addresses that can be learned on an interface to prevent unauthorized access and MAC flooding.


⚙️ Key Features

  • MAC Limit: Set a maximum number of MAC addresses per interface (0–12000).
  • Protect Mode: Only supported violation action.
    • Excess MAC addresses are dropped.
    • MAC learning is disabled temporarily.
    • Learning resumes after a timeout (default: 60 seconds).
  • Supported Interfaces: Physical ports, port channels, and vPCs.
  • Monitoring: Faults and syslogs are generated when limits are exceeded.
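
Protect mode can be sketched as a small state machine. The class below is a toy model, not the switch implementation: the name `PortSecurity` and its methods are hypothetical, and it assumes learning stays disabled for the full timeout after a violation.

```python
class PortSecurity:
    """Toy model of a MAC limit with the protect violation action."""

    def __init__(self, mac_limit, timeout_s=60):
        self.mac_limit = mac_limit
        self.timeout_s = timeout_s            # default timeout from the text
        self.learned = set()
        self.learning_disabled_until = None   # set when the limit is exceeded

    def see_mac(self, mac, now):
        """Return True if `mac` is already learned or gets learned at time `now`."""
        if mac in self.learned:
            return True
        if self.learning_disabled_until is not None:
            if now < self.learning_disabled_until:
                return False                      # learning is still disabled
            self.learning_disabled_until = None   # timeout elapsed, learning resumes
        if len(self.learned) >= self.mac_limit:
            # Violation: drop the new MAC and pause learning for the timeout.
            self.learning_disabled_until = now + self.timeout_s
            return False
        self.learned.add(mac)
        return True
```

For example, with `PortSecurity(2)` the third distinct MAC seen at t=2 is rejected and learning stays off until t=62, while traffic from the two MACs already learned is unaffected.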

🚫 Restrictions

  • Not supported on Fabric Extender (FEX) ports.
  • Only MAC address limits are enforced (not MAC+IP).

🛠️ Configuration Path in APIC GUI

  1. Fabric → Access Policies → Interface Policies → Port Security
  2. Create and attach the policy to an Interface Policy Group
  3. Bind the group to a Switch Profile

 

Saturday, 30 August 2025

Cisco ACI - Fabric Secure Mode Overview


Fabric Secure Mode is a security feature in Cisco ACI that safeguards the infrastructure from unauthorized additions. It ensures that only verified switches and APIC controllers can join the fabric, even if someone has physical access to the equipment.

Starting from release 1.2(1x), Cisco ACI performs a validation check during installation or upgrade. This check confirms that each device has a valid serial number and a Cisco-signed digital certificate.

By default, the system operates in Permissive Mode, allowing existing setups to continue functioning even if some devices lack valid certificates. However, administrators can enable Strict Mode for enhanced security, requiring manual approval for any new device joining the fabric.


⚙️ Modes of Operation

| Mode | Permissive Mode (Default) | Strict Mode |
| --- | --- | --- |
| Device Validation | Serial-number-based authorization is not enforced | Enforces serial number and certificate validation |
| Existing Fabric | Continues operating even with invalid certificates | Requires all devices to be validated |
| Authorization | Auto-discovers and allows devices without manual approval | Manual authorization needed for each new device |
| Security Level | Basic security | Enhanced security and control |


To change the Fabric Secure Mode in Cisco ACI (e.g., from Permissive to Strict), follow these steps using the Cisco APIC GUI:

🔧 Steps to Change Fabric Secure Mode

  1. Log in to the APIC GUI.
  2. Navigate to:
    System → System Settings → Fabric Security
  3. In the Properties pane, locate the Fabric Secure Mode setting.
  4. Select Strict Mode from the available options.
  5. Save the configuration.
  6. Reboot the APIC and affected switches to apply the change.

⚠️ Important: Changing the mode requires a reboot for the configuration to take effect.

Cisco ACI - Node Stateful Vs Stateless reload

 

| Aspect | Stateful Reload | Stateless Reload |
| --- | --- | --- |
| Definition | Reload where process state is preserved using checkpoints | Reload where process starts fresh without any prior state |
| State Preservation | Yes – runtime state is saved to Persistent Storage Services (PSS) | No – process is restarted without retaining previous state |
| Recovery Speed | Faster – resumes from last known state | Slower – requires full reinitialization |
| System Impact | Minimal – seamless continuation of operations | Higher – may cause temporary disruption or delay |
| Use Case | Preferred for critical services needing quick recovery | Used when state cannot be preserved or process needs a clean start |
| Managed By | Persistent Storage Services (PSS) | System Manager |
| Example Scenario | Restarting a service with session data intact | Replacing a crashed process with a new instance |
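
The checkpoint idea behind the two reload types can be shown with a few lines of Python. This is only an analogy, not ACI code: `checkpoint` and `reload_process` are hypothetical helpers, and a JSON string stands in for the persistent store.

```python
import json


def checkpoint(state):
    """Serialize the runtime state, standing in for a write to persistent storage."""
    return json.dumps(state)


def reload_process(saved, stateful):
    """Return the state a process starts with after a reload."""
    if stateful and saved is not None:
        return json.loads(saved)  # stateful: resume from the last checkpoint
    return {}                     # stateless: start from scratch


saved = checkpoint({"sessions": 3})
print(reload_process(saved, stateful=True))
print(reload_process(saved, stateful=False))
```

The stateful reload comes back with its session data intact, while the stateless reload has to rebuild everything, which is why it is the slower of the two to recover.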

 

Thursday, 28 August 2025

ACI Node state - undiscovered Vs Unknown

 

  Undiscovered:

  • You’ve manually added a node ID in APIC.
  • The switch is not yet connected or powered on.
  • Could be due to cabling issues or incorrect port configuration.

  Unknown:

  • The switch is physically connected and sending LLDP packets.
  • APIC detects it but doesn’t have a matching Node ID policy.
  • You need to assign a Node ID to complete discovery.
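
The two states come down to which side knows about the switch. A minimal sketch of that decision, assuming we only track serial numbers: the function name `node_state` is hypothetical, and the labels "discovered" and "absent" are placeholders for the cases this post does not cover.

```python
def node_state(registered_serials, lldp_seen_serials, serial):
    """Classify a switch by whether APIC has a Node ID for it and whether it is seen via LLDP."""
    registered = serial in registered_serials   # Node ID was added in APIC
    seen = serial in lldp_seen_serials          # switch is connected and sending LLDP
    if registered and not seen:
        return "undiscovered"
    if seen and not registered:
        return "unknown"
    return "discovered" if registered else "absent"  # placeholder labels


registered = {"SERIAL-A"}   # made-up serial numbers
seen = {"SERIAL-B"}
print(node_state(registered, seen, "SERIAL-A"))
print(node_state(registered, seen, "SERIAL-B"))
```

Here SERIAL-A has a Node ID but is not yet seen on the wire, so it is undiscovered, while SERIAL-B is seen but has no Node ID, so it is unknown.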

 

Tuesday, 26 August 2025

What is a Contract Preferred Group in ACI?


In Cisco ACI, Endpoint Groups (EPGs) typically require contracts to communicate with each other. This follows the “allow list” model, where communication is explicitly permitted only if a contract exists.

The Preferred Group (PG) feature simplifies this by allowing certain EPGs within the same VRF to communicate freely without contracts.


Key Concepts

| Term | Description |
| --- | --- |
| Included EPGs | EPGs that are part of the preferred group and can communicate with each other without contracts. |
| Excluded EPGs | EPGs outside the preferred group that still require contracts to communicate. |
| VRF PG Setting | Must be enabled for the preferred group to work. Without this, even included EPGs won’t communicate freely. |
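
The three concepts above combine into one rule for whether two EPGs in the same VRF can talk. The sketch below is a simplified model: the function name `can_communicate` is hypothetical, contracts are treated as unordered EPG pairs (provider and consumer direction is ignored), and filters are not modelled.

```python
def can_communicate(epg_a, epg_b, vrf_pg_enabled, pg_members, contracts):
    """Decide whether two different EPGs in one VRF may communicate."""
    # Both EPGs included in the preferred group, and the VRF setting is on.
    if vrf_pg_enabled and epg_a in pg_members and epg_b in pg_members:
        return True
    # Otherwise the allow-list model applies: a contract must exist.
    return frozenset((epg_a, epg_b)) in contracts


members = {"web", "app"}
contracts = {frozenset(("app", "db"))}
print(can_communicate("web", "app", True, members, contracts))   # both included
print(can_communicate("web", "db", True, members, contracts))    # db is excluded
print(can_communicate("web", "app", False, members, contracts))  # VRF setting off
```

Only the first call is allowed without a contract. The third shows why the VRF setting matters: with it disabled, even two included EPGs fall back to needing a contract.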


🛠️ Configuration Steps

  1. Enable Preferred Group on VRF:
    • Go to the VRF settings in APIC or Nexus Dashboard Orchestrator (NDO).
    • Check the Preferred Group box.
  2. Add EPGs to the Preferred Group:
    • In the EPG properties, check Include in Preferred Group.
    • Save the configuration.
  3. Verify Membership:
    • You can view all EPGs in the preferred group under the VRF’s properties.

🌐 Multi-Site Considerations

  • In a stretched VRF across multiple sites, preferred group EPGs are shadowed in other sites to enable inter-site communication.
  • This allows, for example, a web EPG in Site 1 to communicate with an app EPG in Site 2 without contracts.

⚠️ Limitations

  • Preferred Groups are not supported for L3Out external EPGs.
  • If vzAny is already consuming/providing a contract in the VRF, you should not configure preferred groups.
  • All EPGs in a preferred group must be managed consistently (either all via APIC or all via NDO).

 

ACI Leaf Switch Replacement

 To replace a Cisco ACI leaf switch, follow these step-by-step instructions to ensure a smooth transition without disrupting your fabric:


🛠️ Preparation

  1. Document the existing switch details:
    • POD ID
    • Node ID
    • Node Name
    • Serial Number
  2. Ensure the replacement switch is in ACI mode:
    • Connect via console and run show version.
    • If in NX-OS mode, convert to ACI mode using Cisco's documented procedure.
    • Before adding the new leaf switch to the fabric, ensure it's manually upgraded to the target image or one with a direct upgrade path. Avoid using intermediate images that require multiple upgrade steps, as they can trigger issues and impact your production environment. A final upgrade via policy helps ensure BIOS and FPGA components are properly updated.
  3. Clean up the replacement switch:
    • Run setup-clean-config.sh and then reload to remove any existing configuration.
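
The details from step 1 can also be pulled out of an APIC REST class query for fabricNode instead of being copied by hand. The sketch below only parses a response and does no login or HTTP call; it assumes the usual `imdata` response shape, the helper name `node_details` is hypothetical, and the sample values (including the serial number) are made up.

```python
import json
import re


def node_details(response_text, node_id):
    """Extract POD ID, Node ID, Node Name and Serial Number for one node."""
    data = json.loads(response_text)
    for item in data["imdata"]:
        attrs = item["fabricNode"]["attributes"]
        if attrs["id"] == str(node_id):
            # The pod number is part of the dn, e.g. topology/pod-1/node-101.
            pod = re.search(r"pod-(\d+)", attrs["dn"]).group(1)
            return {
                "pod_id": pod,
                "node_id": attrs["id"],
                "node_name": attrs["name"],
                "serial": attrs["serial"],
            }
    return None


# Made-up sample in the shape of a fabricNode class query response.
sample = json.dumps({"imdata": [{"fabricNode": {"attributes": {
    "dn": "topology/pod-1/node-101", "id": "101",
    "name": "leaf101", "serial": "SAMPLE12345", "role": "leaf"}}}]})
print(node_details(sample, 101))
```

Keeping this output with the change record gives you the exact values to re-enter when the replacement switch is registered.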

🔄 Decommission the Faulty Leaf Switch

  1. Go to APIC GUI:
    Fabric > Inventory > Fabric Membership
  2. Right-click the faulty switch → Select Decommission.
  3. Once decommissioned, Remove from Controller and confirm the action 
  4. Physically disconnect and unmount the old switch.

🔌 Install and Connect the New Leaf Switch

  1. Mount the new switch and connect uplinks to spine switches. Do not connect the downlinks at this stage.
  2. Power on the switch.
  3. In APIC GUI, go to:
    Fabric > Inventory > Fabric Membership > Nodes Pending Registration
  4. Verify serial number, then Register the switch:
    • Use the same POD ID, Node ID, and Node Name as the old switch.
  5. Once registered, go to:
    Fabric > Inventory > Fabric Membership > Registered Nodes
    → Right-click → Select Commission.
  6. Wait for the switch to reach Active state.

🔍 Post-Replacement Validation

  1. Connect downlink cables (after switch is active).
  2. Go to:
    Fabric > Inventory > Topology
    → Verify the switch is visible and operational.
  3. SSH into APIC and run:

→ Confirm switch status is active 

  4. If you get SSH warnings (e.g., DNS spoofing), update the known_hosts file:

🧩 Troubleshooting Tips

  • Switch not discovered: Check LLDP neighbors and cable connections.
  • Switch shows "Not Supported": Upgrade APIC firmware to a release that supports the switch model.
  • No TEP IP assigned: May be a DHCP issue—contact Cisco TAC.
  • SSL issues: Check for established sessions on port 12215