ACS 5.2.0.26 Upgrade Hangs

I had a problem when upgrading from 5.1.0.44 to 5.2.0.26.

I’d see the following:

Saved the running configuration to startup successfully

Then the CLI would just hang. After an hour I CTRL+C’d out of it.

Stopping the ACS process and re-running the install fixed the issue.

ACS/user# application stop acs

Stopping ACS.
Stopping Management and View............................................................
Stopping Runtime........................
Stopping Database..............
Cleanup....

ACS/user# application upgrade ACS_5.2.0.26.tar.gz repo
Do you want to save the current configuration ? (yes/no) [yes] ?
Generating configuration...
Saved the running configuration to startup successfully

     <Still hung for a very long time here>

% CARS Install application required post install reboot...

Broadcast message from root (pts/0) (Tue May 29 13:48:54 2016):

The system is going down for reboot NOW!






Application upgrade successful
ACS/user#

Running a show process from another terminal after starting the upgrade showed that the install was working fine this time. I could see processes like tar, gzip and rpm running.

VPLS Unicast Flooding

Unicast flooding problems, usually associated with switched networks, can also impact VPLS.

If traffic is forwarded asymmetrically through a VPLS instance, unicast flooding of unknown frames can occur. I’ll step through a scenario where this could happen.

I set up a lab with two CSR 1000V routers acting as PE routers, providing a VPLS instance. GNS3 was used to run the IOS routers acting as CE and C routers.

Base VPLS

In this scenario we could imagine that CE3 and CE4 are Internet routers that run HSRP on their VPLS facing interfaces for redundancy. CE3 is the Active HSRP device, with an HSRP address of 10.0.0.5. The Internet side of CE3 and CE4 would typically be a BGP connection, although we use static routes in our scenario to easily cause asymmetric routing. With regular BGP connections to an ISP, it’s easy to imagine how asymmetric forwarding could arise.

CE1 and CE2 have static default routes pointing to the HSRP address to provide connectivity. CE3 points to C1 for the address 4.2.2.2. C1 has a static route for 10.0.0.0/24 pointing to CE4 to cause asymmetric routing.

This configuration forces traffic between CE1 (10.0.0.1) and 4.2.2.2 to be forwarded along the following path.

VPLS Loop

This initially works fine. Looking at the MAC address table associated with the bridge domain connected to the VPLS instance on PE2, we can see that an entry for CE1 (CA02.02A0.0008) exists. Any traffic PE2 receives from CE4 destined for CE1 is forwarded across the pseudowire towards the neighbouring PE1 router.

PE2#show bridge-domain 200
Bridge-domain 200 (3 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet2 service instance 20
    GigabitEthernet3 service instance 20
    vfi LAB neighbor 172.16.0.1 300
   MAC address    Policy  Tag       Age  Pseudoport
   0000.0C07.AC01 forward dynamic   297  LAB.1001010
   CA02.02A0.0008 forward dynamic   252  LAB.1001010
   FFFF.FFFF.FFFF flood   static    0    OLIST_PTR:0xe8e8bc00
   CA00.016C.0008 forward dynamic   300  GigabitEthernet2.EFP20
   CA01.01F9.0008 forward dynamic   253  LAB.1001010
   CA02.016C.0008 forward dynamic   253  GigabitEthernet3.EFP20

To test for any flooding, we apply an access list in the egress direction on the PE2 to CE2 link.

ICMP ACL

We then send 100 pings from CE1 to 4.2.2.2.

CE1#ping 4.2.2.2 rep 100
Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 4.2.2.2, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (100/100), round-trip min/avg/max = 16/40/172 ms

The ICMP echo reply packets are routed back asymmetrically via PE2. The ICMP entry in the icmptest ACL does not increment, which shows that the unicast ICMP packets are not flooded out of the PE2 to CE2 link.

PE2#show access-lists     
Extended IP access list icmptest
    10 permit icmp any any
    20 permit ip any any (3881 matches)

By default, the MAC address entries are only in place for 300 seconds. As with a switch, this is updated based on the source address of frames received by the PE router.

As traffic is being forwarded asymmetrically and no communication occurs between CE1 and CE2, PE2 does not receive any frames to update the MAC address entry for CE1 (CA02.02A0.0008).

We see that the timer for the CE1 (CA02.02A0.0008) entry is decremented.

PE2#show bridge-domain 200
Bridge-domain 200 (3 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet2 service instance 20
    GigabitEthernet3 service instance 20
    vfi LAB neighbor 172.16.0.1 300
   MAC address    Policy  Tag       Age  Pseudoport
   0000.0C07.AC01 forward dynamic   299  LAB.1001010
   CA02.02A0.0008 forward dynamic   3    LAB.1001010
   FFFF.FFFF.FFFF flood   static    0    OLIST_PTR:0xe8e8bc00
   CA00.016C.0008 forward dynamic   298  GigabitEthernet2.EFP20
   CA01.01F9.0008 forward dynamic   238  LAB.1001010
   CA02.016C.0008 forward dynamic   238  GigabitEthernet3.EFP20

It then times out and is removed.

PE2#show bridge-domain 200
Bridge-domain 200 (3 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet2 service instance 20
    GigabitEthernet3 service instance 20
    vfi LAB neighbor 172.16.0.1 300
   MAC address    Policy  Tag       Age  Pseudoport
   0000.0C07.AC01 forward dynamic   298  LAB.1001010
   FFFF.FFFF.FFFF flood   static    0    OLIST_PTR:0xe8e8bc00
   CA00.016C.0008 forward dynamic   300  GigabitEthernet2.EFP20
   CA01.01F9.0008 forward dynamic   229  LAB.1001010
   CA02.016C.0008 forward dynamic   229  GigabitEthernet3.EFP20

This results in a problem: traffic received by PE2 and destined for CE1 has no entry in the MAC address table of PE2. The frames are therefore flooded by PE2.

ICMP Flooded
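
The learning, aging and flooding behaviour described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how the CSR 1000V implements its bridge domain: the port names and the `ce4-mac` placeholder are hypothetical, and only the CE1 MAC address and the 300 second aging timer come from the output above.

```python
from dataclasses import dataclass, field

AGING_TIME = 300  # seconds, matching the Aging-Timer in the bridge-domain output


@dataclass
class BridgeDomain:
    """Toy model of MAC learning with aging (illustrative only)."""
    ports: list
    aging_time: int = AGING_TIME
    table: dict = field(default_factory=dict)  # mac -> (port, last_seen)

    def receive(self, now, src, dst, in_port):
        """Learn the source address, then return the ports the frame is sent out of."""
        # Drop entries that have not been refreshed within the aging time.
        self.table = {mac: (port, seen) for mac, (port, seen) in self.table.items()
                      if now - seen < self.aging_time}
        # Learning is driven only by the source address of received frames.
        self.table[src] = (in_port, now)
        if dst in self.table:
            return [self.table[dst][0]]  # known unicast: forwarded to one port
        # Unknown unicast (or broadcast): flooded to every other port.
        return [port for port in self.ports if port != in_port]


# Hypothetical port names for PE2; "ce4-mac" is a placeholder address.
pe2 = BridgeDomain(ports=["to-CE2", "to-CE4", "pw-to-PE1"])
CE1 = "CA02.02A0.0008"

pe2.receive(0, CE1, "FFFF.FFFF.FFFF", "pw-to-PE1")   # PE2 last hears from CE1 at t=0
print(pe2.receive(10, "ce4-mac", CE1, "to-CE4"))     # ['pw-to-PE1']
print(pe2.receive(310, "ce4-mac", CE1, "to-CE4"))    # ['to-CE2', 'pw-to-PE1']
```

The second call is forwarded only across the pseudowire. The third call arrives after the CE1 entry has aged out, so the frame is also sent out of the port towards CE2, which is what the ACL counter is there to detect.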

To prove that the frames are being flooded, 100 more echo requests are sent to 4.2.2.2.

CE1#ping 4.2.2.2 rep 100
Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 4.2.2.2, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (100/100), round-trip min/avg/max = 12/59/288 ms

The asymmetrically forwarded echo reply packets can now be seen on the ACL on the PE2 to CE2 link.

PE2#show access-lists     
Extended IP access list icmptest
    10 permit icmp any any (100 matches)
    20 permit ip any any (4114 matches)

Needless to say, this is at best very inefficient. Bandwidth on all links connected to this VPLS instance on PE2 is wasted.

If we generate any traffic from CE1 that reaches PE2 directly across the VPLS instance, the problem is resolved for a further 300 seconds.

A permanent solution can be put in place by lowering the ARP timeout on the VPLS facing interface on CE4 (applying the same change to the VPLS facing interface on CE3 is also recommended).

Lowering the ARP timeout to 300 seconds on CE4 causes CE4 to send out an ARP request for any IP address it is forwarding traffic towards before its 300 second timer expires. The ARP reply that comes from CE1 towards CE4 causes the MAC address table on PE2 to be updated, preventing unicast flooding of frames destined for CE1.
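
To get a feel for why the ARP timeout matters, here is a rough Python sketch comparing two timeout values against the 300 second MAC aging timer. It is a deliberately simplified model and rests on assumptions: the only frames PE2 ever sees from CE1 are ARP replies, one per `arp_timeout` interval starting at t=0, and timing jitter is ignored. The function name and the 14400 second comparison value (the usual IOS default ARP timeout of four hours) are not taken from the lab itself.

```python
def flooding_seconds(arp_timeout, mac_aging=300, horizon=14400):
    """Seconds within `horizon` during which PE2 holds no MAC entry for CE1.

    Simplified model: the MAC entry is refreshed only by an ARP reply,
    which is seen once every `arp_timeout` seconds starting at t=0.
    """
    flooded = 0
    for t in range(horizon):
        since_refresh = t % arp_timeout
        if since_refresh >= mac_aging:
            flooded += 1  # entry has aged out, frames to CE1 are flooded
    return flooded


print(flooding_seconds(14400))  # 14100
print(flooding_seconds(300))    # 0
```

In this model a four hour ARP timeout leaves the entry missing for all but the first 300 seconds of each cycle, while a 300 second ARP timeout never lets it age out.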

Configuration applied as follows.

CE4#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
CE4(config)#int fa0/0
CE4(config-if)#arp timeout 300

We see the MAC address entry timeout for CE1 decrement down to 89 seconds.

PE2#show bridge-domain 200
Bridge-domain 200 (3 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet2 service instance 20
    GigabitEthernet3 service instance 20
    vfi LAB neighbor 172.16.0.1 300
   MAC address    Policy  Tag       Age  Pseudoport
   0000.0C07.AC01 forward dynamic   299  LAB.1001010
   CA02.02A0.0008 forward dynamic   89   LAB.1001010
   FFFF.FFFF.FFFF flood   static    0    OLIST_PTR:0xe8e8bc00
   CA00.016C.0008 forward dynamic   300  GigabitEthernet2.EFP20
   CA01.01F9.0008 forward dynamic   246  LAB.1001010
   CA02.016C.0008 forward dynamic   278  GigabitEthernet3.EFP20

An ARP request and reply are then exchanged between CE4 and CE1, and the timer is renewed.

PE2#show bridge-domain 200
Bridge-domain 200 (3 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet2 service instance 20
    GigabitEthernet3 service instance 20
    vfi LAB neighbor 172.16.0.1 300
   MAC address    Policy  Tag       Age  Pseudoport
   0000.0C07.AC01 forward dynamic   299  LAB.1001010
   CA02.02A0.0008 forward dynamic   300  LAB.1001010
   FFFF.FFFF.FFFF flood   static    0    OLIST_PTR:0xe8e8bc00
   CA00.016C.0008 forward dynamic   300  GigabitEthernet2.EFP20
   CA01.01F9.0008 forward dynamic   245  LAB.1001010
   CA02.016C.0008 forward dynamic   277  GigabitEthernet3.EFP20

When thousands of ICMP messages are sent to 4.2.2.2 from CE1, the icmptest ACL shows no unicast flooding.

PE2#show access-lists 
Extended IP access list icmptest
    10 permit icmp any any
    20 permit ip any any (179 matches)

MPLS MRU

The purpose of the MPLS MRU (Maximum Receive Unit) is to indicate the maximum size of a packet, including MPLS labels, that the local router can forward without fragmenting. MRU is only locally significant.

If an incoming packet belonging to a particular FEC (Forwarding Equivalence Class) exceeds the MRU calculated for that FEC, the packet will require fragmentation prior to it being transmitted on the outgoing interface.

The MRU for each FEC varies depending on the MTU of the outgoing interface as well as the number of labels that are added / removed by the local router.

As an example please see the diagram below. R1 is sending an MPLS labeled packet destined to the loopback interface of R3 (172.16.0.3/32).

MPLS MRU

R1 has a local label of 18 for 172.16.0.3/32 and calculates an MRU of 1500 bytes, based on a swap operation to R2’s advertised label of 16.

R2 pops a label off, due to penultimate hop popping, before the IP packet is sent to R3. Based on the knowledge that the outgoing packet to R3 will be 4 bytes shorter, R2 calculates an MRU of 1504.

In this example the maximum sized unlabelled IP packet that could be sent without fragmentation would be 1496 bytes.
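
The arithmetic behind these values can be written out as a short Python sketch. It assumes, as in this example, that the 1500 byte interface MTU is also the limit for labelled packets and that each label adds 4 bytes; the function and parameter names are made up for illustration.

```python
LABEL_BYTES = 4  # size of one MPLS label stack entry


def mpls_mru(egress_mtu, labels_pushed=0, labels_popped=0):
    """Largest received packet that still fits the egress MTU after the label operation."""
    return egress_mtu - LABEL_BYTES * (labels_pushed - labels_popped)


print(mpls_mru(1500))                   # R1, swap 18 -> 16: 1500
print(mpls_mru(1500, labels_popped=1))  # R2, pop: 1504
print(mpls_mru(1500, labels_pushed=1))  # unlabelled IP packet gaining one label: 1496
```

A swap leaves the packet size unchanged, a pop gives 4 bytes back, and a push costs 4 bytes, which is where the 1500, 1504 and 1496 byte figures come from.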

The MRU values along a path can be seen by performing an MPLS traceroute. The below traceroute is taken from R1 in the above example.

R1#traceroute mpls ipv4 172.16.0.3/32
Tracing MPLS Label Switched Path to 172.16.0.3/32, timeout is 2 seconds

Codes: '!' - success, 'Q' - request not sent, '.' - timeout,
  'L' - labeled output interface, 'B' - unlabeled output interface, 
  'D' - DS Map mismatch, 'F' - no FEC mapping, 'f' - FEC mismatch,
  'M' - malformed request, 'm' - unsupported tlvs, 'N' - no label entry, 
  'P' - no rx intf label prot, 'p' - premature termination of LSP, 
  'R' - transit router, 'I' - unknown upstream index,
  'X' - unknown return code, 'x' - return code 0

Type escape sequence to abort.
  0 10.0.0.0 MRU 1500 [Labels: 16 Exp: 0]
L 1 10.0.0.1 MRU 1504 [Labels: implicit-null Exp: 0] 16 ms
! 2 10.0.0.3 232 ms

In addition to this, the LFIB can also be checked on each of the routers, showing the MRU for FEC 172.16.0.3/32.

R1#show mpls forwarding-table 172.16.0.3 32 detail 
Local  Outgoing      Prefix            Bytes Label   Outgoing   Next Hop
Label  Label or VC   or Tunnel Id      Switched      interface          
18     16            172.16.0.3/32     0             Fa1/0      10.0.0.1
    MAC/Encaps=14/18, MRU=1500, Label Stack{16}
    CA0428E00006CA0328E0001C8847 00010000
    No output feature configured


R2#show mpls forwarding-table 172.16.0.3 32 detail 
Local  Outgoing      Prefix            Bytes Label   Outgoing   Next Hop
Label  Label or VC   or Tunnel Id      Switched      interface          
16     Pop Label     172.16.0.3/32     36886         Fa0/0      10.0.0.3
    MAC/Encaps=14/14, MRU=1504, Label Stack{}
    CA0628E10008CA0428E000088847 
    No output feature configured


R3#show mpls forwarding-table 172.16.0.3 32 detail 
Local  Outgoing      Prefix            Bytes Label   Outgoing   Next Hop
Label  Label or VC   or Tunnel Id      Switched      interface          
None   No Label      172.16.0.3/32     0             aggr-punt
    MAC/Encaps=0/0, MRU=0, Label Stack{}
    No output feature configured

If a push operation occurred along the forwarding path, this would cause a lower MRU on that router. If R2 were to push a label onto the stack for 172.16.0.3/32 rather than pop one off, the MRU on R2 would be 1496 bytes. An ingress packet larger than 1496 bytes would, once the additional 4 byte label was added, exceed the 1500 byte interface MTU and therefore require fragmentation.
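
The same reasoning can be extended along a whole path. The sketch below is a hypothetical helper, not something IOS provides: it takes each router's egress MTU and the net number of labels it adds (negative for a pop), and returns the largest packet the first router can receive without any hop needing to fragment. It assumes 4 bytes per label and that the egress MTU applies to labelled packets.

```python
LABEL_BYTES = 4  # size of one MPLS label stack entry


def max_ingress_size(hops):
    """hops: list of (egress_mtu, net_labels_added) for each router along the path."""
    limit = None
    added = 0  # running total of labels added since the packet entered the path
    for egress_mtu, net_labels in hops:
        added += net_labels
        allowed = egress_mtu - LABEL_BYTES * added
        limit = allowed if limit is None else min(limit, allowed)
    return limit


print(max_ingress_size([(1500, 0), (1500, -1)]))  # R1 swap, R2 pop: 1500
print(max_ingress_size([(1500, 0), (1500, 1)]))   # R1 swap, R2 push: 1496
```

With the pop at R2 the path is limited by R1’s MRU of 1500 bytes. With a push at R2 instead, the limit drops to 1496 bytes, matching the MRU R2 would calculate.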