{"id":1330,"date":"2014-06-27T10:50:36","date_gmt":"2014-06-27T09:50:36","guid":{"rendered":"http:\/\/blogs-new.it.ox.ac.uk\/networks\/?p=1330"},"modified":"2014-06-27T10:50:36","modified_gmt":"2014-06-27T09:50:36","slug":"cisco-networking-and-eduroam-routing","status":"publish","type":"post","link":"https:\/\/blogs-new.it.ox.ac.uk\/networks\/2014\/06\/27\/cisco-networking-and-eduroam-routing\/","title":{"rendered":"Cisco networking and eduroam: Routing"},"content":{"rendered":"<p>This is the first post in a series discussing some of the finer details of the networking setup for the new eduroam infrastructure that went into production last month.<\/p>\n<p>In this post, I will be covering the IP routing setup of the new networking infrastructure. This uses static routing &amp; Virtual Routing &amp; Forwarding instances (VRF) to get traffic from clients using the eduroam service out on to the Internet. Following on from this, I\u2019ll explain the associated failover setup we opted for which uses the IOS &#8216;object-state tracking&#8217; feature in a somewhat clever way for our active\/standby setup.<\/p>\n<p>What I won\u2019t be covering here is how the traffic traverses the university backbone (from the FroDos) and is aggregated at a nominated egress (C) router within the backbone. This is because the mechanism for achieving this hasn\u2019t actually changed much. It still uses the cleverness of the \u2018Location Independent Network\u2019 (LIN) system. I will mention briefly though that this makes use of VRFs, Multi-Protocol Label Switching (MPLS) and Multi-Protocol extensions to the Border Gateway Protocol (MP-BGP) to achieve this task. This allows us to provide LIN services (of which eduroam is one service) to many buildings around the collegiate university in a scalable way, whilst isolating these networks from others on the backbone.<\/p>\n<p>Also omitted from this post are the details on how traffic from the Internet reaches our eduroam clients. 
Again, this is achieved in much the same way as before, using a combination of an advertising statement in our BGP configuration and some light static routing at the border for the new external eduroam IP range to get traffic to the new infrastructure.<\/p>\n<h2>So what are we working with?<\/h2>\n<p>We procured two Cisco Catalyst 4500-X switches which run the IOS-XE operating system. For those not familiar with this platform, these are all-SFP\/SFP+ switches in a 1U fixed-configuration form factor. As well as delivering the base L2\/L3 features you\u2019d normally expect from a switch, this platform also delivers some other cool features you might otherwise expect to find only in a more advanced chassis-based form factor (at least in Cisco&#8217;s offerings anyway).<\/p>\n<p>Specifically, in the context of the new eduroam infrastructure, we\u2019re using the Virtual Switching System (VSS) to pair these switches up to act as one logical router, and microflow policing for User Based Rate Limiting (UBRL). The latter of these features will be discussed at length in a later post. There are of course other noteworthy features available on this platform, but I won\u2019t be discussing them here.<\/p>\n<p>Running VSS in any scenario has some obvious benefits, not least of which is negating the need for any First-Hop Redundancy Protocol (FHRP) or Spanning-Tree Protocol (STP). It also allows us to use Multi-chassis EtherChannels (MECs) for our infrastructure interconnects. In non-Cisco speak, these are link aggregations whose member ports each connect to a different 4500-X switch in our VSS pair. For more information on the L1\/L2 side of things, please see my previous post \u2018Building the eduroam networking infrastructure\u2019. All MECs have been configured in routed (no switchport) mode rather than in switching (switchport) mode. 
This makes the configuration far simpler in my opinion.<\/p>\n<p>So with all this in mind, the diagram below illustrates how this looks from a logical point-of-view including some IP addressing we defined for the routed links in our new infrastructure:<\/p>\n<h2><a href=\"http:\/\/blogs-new.it.ox.ac.uk\/networks\/files\/2014\/06\/Eduroam-backend-refresh-L3-routing-2.0.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-1453\" alt=\"Eduroam-backend-refresh-L3-routing-2.0\" src=\"http:\/\/blogs-new.it.ox.ac.uk\/networks\/files\/2014\/06\/Eduroam-backend-refresh-L3-routing-2.0-1024x938.jpg\" width=\"640\" height=\"586\" srcset=\"https:\/\/blogs-new.it.ox.ac.uk\/networks\/files\/2014\/06\/Eduroam-backend-refresh-L3-routing-2.0-1024x938.jpg 1024w, https:\/\/blogs-new.it.ox.ac.uk\/networks\/files\/2014\/06\/Eduroam-backend-refresh-L3-routing-2.0-300x274.jpg 300w, https:\/\/blogs-new.it.ox.ac.uk\/networks\/files\/2014\/06\/Eduroam-backend-refresh-L3-routing-2.0.jpg 1733w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/a><\/h2>\n<h2>Considering &amp; applying the routing basics<\/h2>\n<p>OK, so with our network foundations built, we needed to configure the routing to get everything talking nicely.<\/p>\n<p>Before I went gung ho configuring boxes, I thought it would be best to stand back and have a think about our general requirements for the routing configuration. 
At this point, it is worth mentioning that all Network Address Translation (NAT) in the design is handled externally by the Linux hosts in our infrastructure (my colleague Christopher has written an excellent post covering the finer points of NAT on Linux for those interested).<\/p>\n<p>I summarised our requirements for the routing configuration as follows:<\/p>\n<ol>\n<li>Traffic from clients egressing the university backbone (addressed within the internal eduroam LIN service IP range 10.16.0.0\/12) should have one default route through the currently active Linux host firewall. This is pre-NAT of course, and the routing for replies back to the clients should also be configured;<\/li>\n<li>Traffic from clients that makes it through the Linux host firewall egressing towards the Internet (NAT&#8217;d to addresses within the external eduroam IP range 192.76.8.0\/26) should have one default route through the currently active border router. Once again, the routing for replies back to the clients should also be configured;<\/li>\n<li>Routing via direct paths (bypassing our Linux firewalls) should not be allowed;<\/li>\n<li>Ideally, the routing of management traffic should be kept isolated from normal data traffic.<\/li>\n<\/ol>\n<p>With these requirements in mind, I started to consider technical options.<\/p>\n<p>First of all, we decided to meet requirements 3 &amp; 4 using VRFs. More specifically, we would use what is known as a VRF \u2018lite\u2019 configuration &#8211; that is, separate routing table instances but without the MPLS\/MP-BGP extensions. I would highlight here that on the 4500-X platform, creating additional VRFs required the &#8216;Enterprise Services&#8217; licence to be purchased and applied to each switch. 
This may not be the case with other platforms, so if it\u2019s a feature you ever intend to use, do check the licensing level required &#8211; of course, I\u2019m sure everyone checks these things first, right?<\/p>\n<p>To fulfil requirement 4, we would make use of the stock \u2018mgmtVrf\u2019 VRF built into many Cisco platforms (including the 4500-X) for the purpose of Out-Of-Band (OOB) management via a dedicated management port. This port is locked to this VRF by default anyway (so you can&#8217;t change its assignment even if you wanted to). We were forced down this route because there are no other built-in copper (BASE-T) Ethernet ports on these switches to connect to our local OOB network &#8211; OK, we could have installed a copper gigabit SFP transceiver in one of the front-facing ports, but that would have been a waste considering the presence of a dedicated management port! I&#8217;ll avoid further discussion of this here as it&#8217;s outside the scope of this post. However, I do intend to cover this topic in a later post, as setting this up really wasn\u2019t as easy as it should have been in my honest opinion.<\/p>\n<p>So, I started with the following configuration to split the infrastructure into two \u2018zones\u2019: one VRF for an \u2018inside\u2019 zone (the university internal side) and another for an \u2018outside\u2019 zone (the Internet-facing side):<\/p>\n<pre>vrf definition inside\r\n\u00a0 address-family ipv4\r\n\u00a0 exit-address-family\r\nexit\r\n\r\nvrf definition outside\r\n\u00a0 address-family ipv4\r\n\u00a0 exit-address-family\r\nexit<\/pre>\n<p>Note that the syntax to create VRFs on IOS-XE is quite different to that of its IOS counterparts. In IOS-XE, it is necessary to define an address family configuration for each routed protocol you wish to run (in a similar way to how you would with a BGP configuration, for example). 
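<\/p>\n<p>For comparison, the older IPv4-only VRF syntax found on many classic IOS releases looks roughly like the sketch below. It is shown only to illustrate the difference and was not part of our configuration (the VRF and interface names are simply reused from this post). Note that there is no address-family block, and that the interface-level command gains an \u2018ip\u2019 prefix:<\/p>\n<pre>ip vrf inside\r\nexit\r\n\r\ninterface Port-channel50\r\n\u00a0ip vrf forwarding inside\r\n\u00a0exit<\/pre>\n<p>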
In this scenario, we are only running unicast IPv4 (for now at least), so that\u2019s what was configured. With our new VRFs established, it was then necessary to assign the appropriate interfaces to each VRF and give them some IP addressing. The example below shows this process for two of the interfaces \u2013 I simply rinsed and repeated as necessary for the others in the topology:<\/p>\n<pre>interface Port-channel50\r\n\u00a0description to COUCS1\r\n\u00a0no switchport\r\n\u00a0vrf forwarding inside\r\n\u00a0ip address 192.76.34.30 255.255.255.252\r\n\u00a0no shut\r\n\u00a0exit\r\n\r\ninterface Port-channel60\r\n\u00a0description to JOUCS1\r\n\u00a0no switchport\r\n\u00a0vrf forwarding outside\r\n\u00a0ip address 192.76.34.194 255.255.255.252\r\n\u00a0no shut\r\n\u00a0exit<\/pre>\n<p>With this completed for all interfaces, I verified that the routing tables had been populated like so:<\/p>\n<pre><b>Global table:<\/b>\r\nlin-router#sh ip route\r\n&lt;snip&gt;\r\nGateway of last resort is not set\r\n\r\n<b>\u2018Inside\u2019 VRF table:<\/b>\r\nlin-router#sh ip route vrf inside\r\n&lt;snip&gt;\r\n\r\nGateway of last resort is not set\r\n\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.0\/24 is variably subnetted, 8 subnets, 2 masks\r\nC\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.28\/30 is directly connected, Port-channel50\r\nL\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.30\/32 is directly connected, Port-channel50\r\nC\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.56\/30 is directly connected, Port-channel51\r\nL\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.58\/32 is directly connected, Port-channel51\r\nC\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.92\/30 is directly connected, Port-channel10\r\nL\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.94\/32 is directly connected, Port-channel10\r\nC\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.96\/30 is directly connected, 
Port-channel11\r\nL\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.98\/32 is directly connected, Port-channel11\r\n\r\n<b>\u2018Outside\u2019 VRF table:<\/b>\r\nlin-router#sh ip route vrf outside\r\n&lt;snip&gt;\r\n\r\nGateway of last resort is not set\r\n\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 163.1.0.0\/16 is variably subnetted, 4 subnets, 2 masks\r\nC\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 163.1.120.0\/30 is directly connected, Port-channel20\r\nL\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 163.1.120.2\/32 is directly connected, Port-channel20\r\nC\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 163.1.120.4\/30 is directly connected, Port-channel21\r\nL\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 163.1.120.6\/32 is directly connected, Port-channel21\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.0\/24 is variably subnetted, 4 subnets, 2 masks\r\nC\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.192\/30 is directly connected, Port-channel60\r\nL\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.194\/32 is directly connected, Port-channel60\r\nC\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.208\/30 is directly connected, Port-channel61\r\nL\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.210\/32 is directly connected, Port-channel61<\/pre>\n<p>This output confirms that I had addressed the interfaces properly, assigned them to the correct VRFs and that they were operational (i.e. capable of forwarding). It also confirmed that there were <b>no<\/b> routes in the global routing table, which is what we wanted &#8211; isolation!<\/p>\n<p>At this point, though, it would still be possible to \u2018leak\u2019 routes between VRFs using inter-VRF static routes, so to eliminate this concern, I applied the following command:<\/p>\n<pre>no ip route static inter-vrf<\/pre>\n<p>So we now have some routing-capable interfaces isolated within our defined VRFs. 
Next, we need to make things talk to each other!<\/p>\n<h2>Considering static routing vs dynamic routing<\/h2>\n<p>We needed a routing configuration to get some end-to-end connectivity between our internal eduroam clients and the outside world. This basically boiled down to one major question and fundamental design decision &#8211; \u2018<em>Shall I define static routes or use a routing protocol to learn them?<\/em>\u2019 There are always pros and cons to either choice in my honest opinion.<\/p>\n<p>Why? Well, static routing is great in its simplicity and for the fact that it doesn\u2019t suck up valuable resources on networking platforms. It does, however, have the potential for laborious administrative overhead \u2013 especially if used excessively! In other words, it doesn\u2019t scale well in some large deployments.<\/p>\n<p>Dynamic routing via an Interior Gateway Protocol (IGP) can be a great choice depending on the situation and which IGP you choose. IGPs reduce the need for manual administration when changes occur, but this does come at a price. Routing protocols consume resources such as CPU cycles and require administrators to have a sound knowledge of their internal mechanisms and their intricacies when things go wrong. This can get interesting (or painful) depending on the problem scenario!<\/p>\n<p>So I would suggest this decision comes down to picking the \u2018<em>right tool for the right job<\/em>\u2019. As a general rule of thumb, I tend to work on the basis that large environments with many routes that change frequently probably need an IGP configuration. Everything else can usually be done with static routing.<\/p>\n<h3>Some history<\/h3>\n<p>Previously, with the old infrastructure, we used the Routing Information Protocol version 2 (RIPv2) IGP to learn and propagate routes. I believe this was a design decision based on two main factors &#8211; though I leave room for being wrong here, as it was admittedly before my time. 
I summarised these as:<\/p>\n<ol>\n<li>The need for two physical switches performing the routing for internal and external zones &#8211; this in itself would have mandated a larger number of static routes, so an IGP configuration probably seemed like a more logical choice at the time;<\/li>\n<li>RIPv2 was the only IGP available with the IP Base licence on the Catalyst 3560 switches.<\/li>\n<\/ol>\n<p>There could have been other reasons too, of course. For those that don\u2019t know, RIPv2 is a \u2018distance-vector\u2019 routing protocol that uses \u2018hop count\u2019 as its metric.<\/p>\n<p>RIPv2 communicated routes between the separate internal and external switches in the old topology through the active Linux firewall host. In production, this meant that the loss of a link, or of the Linux host running the firewall, resulted in a re-convergence of the routed topology to use the standby path. RIPv2 is quite slow to converge, and initiating a failover manually (say you wanted to pull the Linux host offline to perform some maintenance, for example) meant re-configuring an \u2018offset list\u2019 to manipulate the hop count of the routes to reflect your desired topology. Granted, this all worked, but it felt a little clunky at times!<\/p>\n<h3>Static routing simplicity<\/h3>\n<p>For the new infrastructure, we don\u2019t have two switches performing the routing separately (there are two switches, but these are logically arranged as one with VSS). Instead, we have logical separation with VRFs, which equates to having two logical routers. With this design, there is no requirement for direct inter-VRF communication \u2013 instead, our firewalls provide inter-VRF communication as required. 
This, coupled with the considerations above, ultimately led to a decision to use a static routing configuration over one based on dynamic routing with an IGP.<\/p>\n<p>To elaborate further, the routing configuration in this new design really only requires two routes per VRF per path (ignoring the mgmtVrf). For the active path, for example, these are:<\/p>\n<pre><strong>#From eduroam clients to Linux firewall host:<\/strong>\r\nip route vrf inside 0.0.0.0 0.0.0.0 192.76.34.93\r\n\r\n<strong>#From Linux firewall host to eduroam clients:<\/strong>\r\nip route vrf inside 10.16.0.0 255.240.0.0 192.76.34.29\r\n\r\n<strong>#From eduroam clients (post-NAT) to the Internet:<\/strong>\r\nip route vrf outside 0.0.0.0 0.0.0.0 192.76.34.193\r\n\r\n<strong>#From the Internet to eduroam clients (post-NAT):<\/strong>\r\nip route vrf outside 192.76.8.0 255.255.255.192 163.1.120.1<\/pre>\n<p>So this is a very simple and lightweight static routing configuration really. OK, so it does get a little larger and more complicated with the failover mechanism and the standby path routes included, but not by much, as you&#8217;ll see shortly. In total, there are only ever likely to be a handful of routes in this configuration, and they are unlikely to change very often, so the administrative overhead is negligible.<\/p>\n<h2>How shall we handle failures?<\/h2>\n<p>At this point, assuming we\u2019d configured the routing as described and had added our standby routes in exactly the same fashion, what we\u2019d have actually ended up with is an active\/active type setup &#8211; at least from the networking point of view. 
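<\/p>\n<p>To illustrate, the inside VRF default routes in such a setup would look something like the sketch below (using the active and standby next hops from this post; this is not a configuration we deployed). Both routes carry the default static AD of 1, so both would be installed in the routing table side by side:<\/p>\n<pre><strong>#Active path default route:<\/strong>\r\nip route vrf inside 0.0.0.0 0.0.0.0 192.76.34.93\r\n\r\n<strong>#Standby path default route, added in exactly the same fashion:<\/strong>\r\nip route vrf inside 0.0.0.0 0.0.0.0 192.76.34.97<\/pre>\n<p>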
This would have resulted in traffic through our infrastructure being load-balanced across all available routes via both firewall hosts.<\/p>\n<p>Configuring the additional routes in this way might have been OK had these general caveats not been true of our firewall\/NAT setup:<\/p>\n<ul>\n<li>The NAT rules on both firewall hosts translate traffic sourced from internal (RFC1918) IP addresses into the same external IP address range;<\/li>\n<li>The firewall hosts do not work together to keep track of the state of their NAT translation tables.<\/li>\n<\/ul>\n<p>With load-balancing across both hosts, the outbound and return packets of a single connection could pass through different firewall hosts, and a host that did not create the NAT translation for a flow has no state with which to handle its replies. So at this point, my work clearly wasn\u2019t done yet. In our scenario, we were most certainly going to carry on with an active\/standby setup (at least in the short term).<\/p>\n<p>I reached the conclusion that what was needed was a way to track the state of the active path, so that if a full or partial path failure occurred, a failover mechanism would ensure all traffic used the secondary path instead.<\/p>\n<h3>Standby path routes<\/h3>\n<p>When I added the standby path routes, I in fact configured them slightly differently. Specifically, I configured them with a higher Administrative Distance (AD) value.<\/p>\n<p>To explain briefly, AD is assigned based on the source of the route; in this context, for example, a route might have been statically configured or learned via an IGP. IOS &amp; IOS-XE assign a default AD to each route source (for example, 0 for directly connected routes, 1 for static routes and 120 for RIP). AD only comes into play if more than one exactly matching candidate route to a destination (of the same prefix length) is offered to the routing table, whether from different sources or, as here, from static routes configured with different distances. The one with the lowest AD in this situation wins and is installed in the routing table.<\/p>\n<p>You can view the AD value currently assigned to a route by interrogating the routing table. 
For example, let&#8217;s look at the static routes in the inside VRF routing table:<\/p>\n<pre>lin-router#sh ip route vrf inside static\r\n\r\n&lt;snip&gt;\r\n\r\nGateway of last resort is 192.76.34.93 to network 0.0.0.0\r\n\r\nS*\u00a0\u00a0\u00a0 0.0.0.0\/0 [<strong>1<\/strong>\/0] via 192.76.34.93\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 10.0.0.0\/12 is subnetted, 1 subnets\r\nS\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10.16.0.0 [<strong>1<\/strong>\/0] via 192.76.34.29<\/pre>\n<p>I&#8217;ve highlighted the AD values in bold in the output for illustration purposes. You can see that the default AD value of &#8216;1&#8217; is applied to these routes. The second value is the &#8216;metric&#8217; of the route, which is always &#8216;0&#8217; for static routes such as the two shown here.<\/p>\n<p>For our standby routes, I assigned an AD value of \u2018254\u2019 instead, using the following commands:<\/p>\n<pre><strong>#From eduroam clients to Linux firewall host:<\/strong>\r\nip route vrf inside 0.0.0.0 0.0.0.0 192.76.34.97 <strong>254<\/strong>\r\n\r\n<strong>#From Linux firewall host to eduroam clients:<\/strong>\r\nip route vrf inside 10.16.0.0 255.240.0.0 192.76.34.57 <strong>254<\/strong>\r\n\r\n<strong>#From eduroam clients (post-NAT) to the Internet:<\/strong>\r\nip route vrf outside 0.0.0.0 0.0.0.0 192.76.34.209 <strong>254<\/strong>\r\n\r\n<strong>#From the Internet to eduroam clients (post-NAT):<\/strong>\r\nip route vrf outside 192.76.8.0 255.255.255.192 163.1.120.5 <strong>254<\/strong><\/pre>\n<p>Static routes with an artificially high AD value like these are sometimes referred to as \u2018floating\u2019 routes. They can be considered to float because they will not be installed in the routing table (or sink, if you will) as long as matching routes with a better (lower) AD value are already installed. 
So our standby path routes will now be installed in the routing table if the active ones disappear for any reason.<\/p>\n<p>At this point, I noted that we could still end up with a path made up of a hybrid of both active and standby links. Each floating route only replaces its own active counterpart, so if a single active link failed (say, the inside link to the active firewall host), only the route via that link would be swapped for its standby equivalent, while the other routes stayed on the active path. In our scenario, I feared this could result in undesired asymmetric routing and make traffic paths harder to predict. What I really wanted was an easily predictable path every time, regardless of where a failure occurred or the nature of such a failure.<\/p>\n<h2>Introducing IOS &#8216;object-state tracking&#8217;<\/h2>\n<p>The object-state tracking feature does pretty much what the name implies. You configure a tracking object to check the state of something \u2013 be it an interface\u2019s line protocol status or a static route\u2019s next-hop reachability, for instance. An object has two possible states, \u2018up\u2019 or \u2018down\u2019, and depending on the configuration you apply, a change in state can trigger some form of action.<\/p>\n<h3>What to track and how to track it<\/h3>\n<p>It was clear that we needed a way to track each of the directly connected links making up our active path. 
To recap, these are:<\/p>\n<p><b>\u2018Inside VRF\u2019<\/b><\/p>\n<ul>\n<li>C\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.28\/30 is directly connected, Port-channel50<\/li>\n<li>C\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.92\/30 is directly connected, Port-channel10<\/li>\n<\/ul>\n<p><b>\u2018Outside VRF\u2019<\/b><\/p>\n<ul>\n<li>C\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 163.1.120.0\/30 is directly connected, Port-channel20<\/li>\n<li>C\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 192.76.34.192\/30 is directly connected, Port-channel60<\/li>\n<\/ul>\n<p>To start with, I decided to map these to separate tracking objects using the following configuration:<\/p>\n<pre>track 2 ip route 192.76.34.92 255.255.255.252 reachability\r\n\u00a0ip vrf inside\r\n\u00a0delay down 2 up 2\r\n\r\ntrack 3 ip route 192.76.34.28 255.255.255.252 reachability\r\n\u00a0ip vrf inside\r\n\u00a0delay down 2 up 2\r\n\r\ntrack 4 ip route 163.1.120.0 255.255.255.252 reachability\r\n\u00a0ip vrf outside\r\n\u00a0delay down 2 up 2\r\n\r\ntrack 5 ip route 192.76.34.192 255.255.255.252 reachability\r\n\u00a0ip vrf outside\r\n\u00a0delay down 2 up 2<\/pre>\n<p>One potential gotcha to watch for when configuring tracking objects for routes\/interfaces assigned within VRFs is that it is also necessary to define the VRF in the object itself. If you don\u2019t, you&#8217;ll likely find that your object will never reach an up state (because the entity being tracked doesn\u2019t exist as far as the global routing table is concerned). I admit, I got caught out by this the first time around!<\/p>\n<p>Note that an alternative strategy would have been to monitor the line protocol of the interfaces involved. There is a good reason I didn&#8217;t configure the objects this way: it&#8217;s entirely possible for the line protocol of an interface to stay up while other issues make an IP address unreachable. 
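<\/p>\n<p>For comparison, an interface-based object for the inside link to the active firewall host might have looked something like the following sketch (object number 6 is hypothetical, and this was not part of our configuration). It follows only the line protocol of Port-channel10, so it would stay up for as long as the port-channel itself does:<\/p>\n<pre>track 6 interface Port-channel10 line-protocol\r\n\u00a0delay down 2 up 2<\/pre>\n<p>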
I therefore figured tracking reachability would be the safest and most reliable option for our scenario.<\/p>\n<p>Delay up\/down values (in seconds) have also been defined. These just add a delay of 2 seconds whenever the state of one of the objects changes from up-&gt;down or down-&gt;up. I&#8217;ll explain this further in the context of our failover mechanism shortly.<\/p>\n<h3>Tying the tracking configuration together with the other elements<\/h3>\n<p>At this point, the configuration gets a bit more interesting (at least in my view). What I wasn\u2019t originally aware of is that it\u2019s possible to, in effect, \u2018nest\u2019 a list of tracking objects within another tracking object. Therefore, to meet our requirements, I created another tracking object (the \u2018parent\u2019) to track the objects I created earlier (the \u2018daughters\u2019):<\/p>\n<pre>track 1 list boolean and\r\n\u00a0object 2\r\n\u00a0object 3\r\n\u00a0object 4\r\n\u00a0object 5\r\n\u00a0delay down 2 up 2<\/pre>\n<p>This configuration allows us to track the state of many daughter objects. 
If any one of these ever reaches the \u2018down\u2019 state, the parent tracking object follows suit because of the \u2018boolean and\u2019 logic: the parent is only up while all of its daughters are up.<\/p>\n<p>With the object-tracking configuration completed, I proceeded to amend the static route configuration for the active path to make use of the parent tracking object:<\/p>\n<pre><strong>#Removing previous static routes for active path:<\/strong>\r\nno ip route vrf inside 0.0.0.0 0.0.0.0 192.76.34.93\r\nno ip route vrf inside 10.16.0.0 255.240.0.0 192.76.34.29\r\nno ip route vrf outside 0.0.0.0 0.0.0.0 192.76.34.193\r\nno ip route vrf outside 192.76.8.0 255.255.255.192 163.1.120.1\r\n\r\n<strong>#Re-adding static routes with reference to parent tracking object:<\/strong>\r\nip route vrf inside 0.0.0.0 0.0.0.0 192.76.34.93 track 1\r\nip route vrf inside 10.16.0.0 255.240.0.0 192.76.34.29 track 1\r\nip route vrf outside 0.0.0.0 0.0.0.0 192.76.34.193 track 1\r\nip route vrf outside 192.76.8.0 255.255.255.192 163.1.120.1 track 1<\/pre>\n<p>What this gives us is a mechanism that will remove <em>all<\/em> the active path static routes if any one, several or all of the directly connected active links fail. In our scenario, the cumulative delay between a daughter object changing state and the resulting routing table change should be:<\/p>\n<p>daughter_object_delay + parent_object_delay = total delay time.<\/p>\n<p>So that&#8217;s:<\/p>\n<p>2 + 2 = <strong>4<\/strong> seconds of total delay time.<\/p>\n<p>You might be wondering why I configured these particular delay values on the objects, or even why I bothered with delay times at all. Well, I did so to guard against the possibility of an object\u2019s state transitioning rapidly.<\/p>\n<p>Why could this be an issue? Well, in our scenario, it could result in routing table &#8216;churn&#8217; (routes rapidly being installed in and withdrawn from the routing table), which in turn could have a negative impact on the performance of the switches. 
Frankly, I don&#8217;t see this being a likely occurrence, and even if it did happen, I&#8217;m not sure it would be enough to drastically impact the performance of the switches (especially given their relatively high hardware specification). Rapid state transitions are possible, though, for instance if a link were to flap (go up and down rapidly) because of an odd interface or transceiver fault. It&#8217;s probably best to think of these values and their configuration as a kind of insurance policy.<\/p>\n<p>Generally, I think the resulting failover time of roughly 4 to 5 seconds is acceptable in this scenario and is certainly going to be an improvement over what we would have experienced with the old infrastructure using RIPv2. Bear in mind that this is measured from the moment the tracking process notices a change: tracked objects are polled periodically rather than updated instantly, and in the lab output below, Track 2 only registered the failure roughly 13 seconds after Port-channel10 went down.<\/p>\n<h2>Does it work?<\/h2>\n<p>Yes, it does, and to prove the point, I\u2019ll demonstrate this using an identical configuration I \u2018labbed up earlier\u2019 in our development environment. Rest assured, it\u2019s been tested in our production environment too, and we\u2019re confident it works in exactly the same way as what&#8217;s shown below.<\/p>\n<p>Here\u2019s some output from the \u2018show track\u2019 command illustrating everything in a working, happy state:<\/p>\n<pre>Rack1SW3#show track\r\nTrack 1\r\n\u00a0 List boolean and\r\n\u00a0 Boolean AND is Up\r\n\u00a0\u00a0\u00a0 112 changes, last change 2w5d\r\n\u00a0\u00a0\u00a0 object 2 Up\r\n\u00a0\u00a0\u00a0 object 3 Up\r\n\u00a0\u00a0\u00a0 object 4 Up\r\n\u00a0\u00a0\u00a0 object 5 Up\r\n\u00a0 Delay up 2 secs, down 2 secs\r\n\u00a0 Tracked by:\r\n\u00a0\u00a0\u00a0 STATIC-IP-ROUTINGTrack-list 0\r\nTrack 2\r\n\u00a0 IP route 192.76.34.92 255.255.255.252 reachability\r\n\u00a0 Reachability is Up (connected)\r\n\u00a0\u00a0\u00a0 106 changes, last change 2w5d\r\n\u00a0 Delay up 2 secs, down 2 secs\r\n\u00a0 VPN Routing\/Forwarding table \"inside\"\r\n\u00a0 First-hop interface is Port-channel10\r\nTrack 3\r\n\u00a0 IP route 192.76.34.28 255.255.255.252 
reachability\r\n\u00a0 Reachability is Up (connected)\r\n\u00a0\u00a0\u00a0 2 changes, last change 12w0d\r\n\u00a0 Delay up 2 secs, down 2 secs\r\n\u00a0 VPN Routing\/Forwarding table \"inside\"\r\n\u00a0 First-hop interface is Port-channel48\r\nTrack 4\r\n\u00a0 IP route 163.1.120.0 255.255.255.252 reachability\r\n\u00a0 Reachability is Up (connected)\r\n\u00a0\u00a0\u00a0 96 changes, last change 2w5d\r\n\u00a0 Delay up 2 secs, down 2 secs\r\n\u00a0 VPN Routing\/Forwarding table \"outside\"\r\n\u00a0 First-hop interface is Port-channel20\r\nTrack 5\r\n\u00a0 IP route 192.76.34.192 255.255.255.252 reachability\r\n\u00a0 Reachability is Up (connected)\r\n\u00a0\u00a0\u00a0 4 changes, last change 12w0d\r\n\u00a0 Delay up 2 secs, down 2 secs\r\n\u00a0 VPN Routing\/Forwarding table \"outside\"\r\n\u00a0 First-hop interface is Port-channel47<\/pre>\n<p>So you can see that aside from the interface numbering used in the development environment, the configuration used is the same.<\/p>\n<p>I\u2019ll simulate a failure of the inside link between the router and our active Linux firewall host by shutting down the associated interface (Port-channel10). 
I\u2019ve also enabled debugging of tracking objects using the &#8216;debug track&#8217; command which simplifies the demonstration and saves me the effort of manually interrogating the routing table or the tracking object to verify that the change took place:<\/p>\n<pre>Rack1SW3#conf t<strong>\r\nRack1SW3(config)#int po10\r\nRack1SW3(config-if)#shut<\/strong>\r\nRack1SW3(config-if)#\r\n^Z\r\nRack1SW3#\r\n<strong>*May 24 04:35:39.488: %LINEPROTO-5-UPDOWN: Line protocol on \r\nInterface Port-channel10, changed state to down<\/strong>\r\nRack1SW3#\r\n*May 24 04:35:40.452: %LINK-5-CHANGED: Interface FastEthernet1\/0\/9, \r\nchanged state to administratively down\r\n*May 24 04:35:40.469: %LINK-5-CHANGED: Interface FastEthernet1\/0\/10, \r\nchanged state to administratively down\r\n<strong>*May 24 04:35:40.478: %LINK-5-CHANGED: Interface Port-channel10, \r\nchanged state to administratively down<\/strong>\r\n*May 24 04:35:41.459: %LINEPROTO-5-UPDOWN: Line protocol on \r\nInterface FastEthernet1\/0\/9, changed state to down\r\nRack1SW3#\r\n*May 24 04:35:41.476: %LINEPROTO-5-UPDOWN: Line protocol on \r\nInterface FastEthernet1\/0\/10, changed state to down\r\nRack1SW3#\r\n<strong>*May 24 04:35:52.364: Track: 2 Down change delayed for 2 secs\r\n<\/strong>Rack1SW3#<strong>\r\n*May 24 04:35:54.369: Track: 2 Down change delay expired\r\n*May 24 04:35:54.369: Track: 2 Change #109 IP route 192.76.34.92\/30, \r\nconnected-&gt;no route, reachability Up-&gt;Down\r\n*May 24 04:35:54.797: Track: 1 Down change delayed for 2 secs\r\n<\/strong>Rack1SW3#<strong>\r\n*May 24 04:35:56.802: Track: 1 Down change delay expired\r\n*May 24 04:35:56.802: Track: 1 Change #115 list, boolean and \r\nUp-&gt;Down(-&gt;30)<\/strong><\/pre>\n<p>OK, so we can see above that the Port-channel went down. I\u2019m representing the backup path in my development scenario using loopback interfaces and floating routes have been configured using these pretend links. 
These routes should now have been installed in the routing table, so to verify this I checked which next-hop interface was being selected for some example destinations within each of the VRFs using the \u2018show ip cef\u2019 command:<\/p>\n<pre>Rack1SW3#sh ip cef vrf inside 10.16.136.1\r\n10.16.0.0\/12\r\n<strong>\u00a0 nexthop 192.76.34.57 Loopback20<\/strong>\r\n\r\nRack1SW3#sh ip cef vrf inside 8.8.8.8\r\n0.0.0.0\/0\r\n<strong>\u00a0 nexthop 192.76.34.97 Loopback10<\/strong>\r\n\r\nRack1SW3#sh ip cef vrf outside 192.76.8.1\r\n192.76.8.0\/26\r\n<strong>\u00a0 nexthop 163.1.120.5 Loopback40<\/strong>\r\n\r\nRack1SW3#sh ip cef vrf outside 8.8.8.8\r\n0.0.0.0\/0\r\n<strong>\u00a0 nexthop 192.76.34.209 Loopback30<\/strong><\/pre>\n<p>So this appears to work for our simulated failure scenario, but will it recover? 
To find out, I brought interface Port-channel10 back up:<\/p>\n<pre><strong>Rack1SW3(config)#int po10\r\nRack1SW3(config-if)#no shut<\/strong>\r\nRack1SW3(config-if)#\r\n^Z\r\nRack1SW3#\r\n*May 24 04:37:39.411: %LINK-3-UPDOWN: Interface Port-channel10, \r\nchanged state to down\r\n*May 24 04:37:39.411: %LINK-3-UPDOWN: Interface FastEthernet1\/0\/9, \r\nchanged state to up\r\n*May 24 04:37:39.411: %LINK-3-UPDOWN: Interface FastEthernet1\/0\/10, \r\nchanged state to up\r\nRack1SW3#\r\n*May 24 04:37:43.832: %LINEPROTO-5-UPDOWN: Line protocol on \r\nInterface FastEthernet1\/0\/9, changed state to up\r\n*May 24 04:37:44.075: %LINEPROTO-5-UPDOWN: Line protocol on \r\nInterface FastEthernet1\/0\/10, changed state to up\r\nRack1SW3#\r\n<strong>*May 24 04:37:44.830: %LINK-3-UPDOWN: Interface Port-channel10, \r\nchanged state to up<\/strong>\r\n<strong>*May 24 04:37:45.837: %LINEPROTO-5-UPDOWN: Line protocol on \r\nInterface Port-channel10, changed state to up<\/strong>\r\nRack1SW3#\r\n<strong>*May 24 04:37:52.422: Track: 2 Up change delayed for 2 secs\r\n<\/strong>Rack1SW3#<strong>\r\n*May 24 04:37:54.427: Track: 2 Up change delay expired\r\n*May 24 04:37:54.427: Track: 2 Change #110 IP route 192.76.34.92\/30, \r\nno route-&gt;connected, reachability Down-&gt;Up\r\n*May 24 04:37:54.720: Track: 1 Up change delayed for 2 secs\r\n<\/strong>Rack1SW3#<strong>\r\n*May 24 04:37:56.725: Track: 1 Up change delay expired\r\n*May 24 04:37:56.725: Track: 1 Change #116 list, boolean and \r\nDown-&gt;Up(-&gt;40)<\/strong><\/pre>\n<p>I then repeated my previous \u2018show ip cef\u2019 tests:<\/p>\n<pre>Rack1SW3#sh ip cef vrf inside 10.16.136.1\r\n10.16.0.0\/12\r\n<strong>\u00a0 nexthop 192.76.34.29 Port-channel48<\/strong>\r\n\r\nRack1SW3#sh ip cef vrf inside 8.8.8.8\r\n0.0.0.0\/0\r\n<strong>\u00a0 nexthop 192.76.34.93 Port-channel10<\/strong>\r\n\r\nRack1SW3#sh ip cef vrf outside 192.76.8.1\r\n192.76.8.0\/26\r\n<strong>\u00a0 nexthop 163.1.120.1 Port-channel20<\/strong>\r\n\r\nRack1SW3#sh ip cef vrf outside 8.8.8.8\r\n0.0.0.0\/0\r\n<strong>\u00a0 nexthop 192.76.34.193 Port-channel47<\/strong><\/pre>\n<p>Great! Both the failure and recovery scenarios tested successfully.<\/p>\n<h2>Final thoughts<\/h2>\n<p>I am generally very pleased with the routing and failover solution that&#8217;s been built for the new infrastructure. Of particular benefit, I think, is its relative simplicity, especially when compared with the mechanisms used in the previous infrastructure.<\/p>\n<p>It\u2019s also much easier to initiate a failover with this new mechanism, for example if you specifically want the standby path to be used instead of the active one. This can be useful when carrying out configuration changes or maintenance work on one of the Linux hosts, for instance. The failover can be triggered by shutting down an interface within the active path, either on the host or on the switch. Then in around 5 seconds, hey presto! Traffic starts to flow over the other path!<\/p>\n<p>In the longer term, configuring an active\/active scenario may ultimately be a better way forward. 
I\u2019ve had some thoughts on using Policy-Based Routing (PBR) on the networking side to set the next hop of routing decisions based on the internal client source IP address. When used in conjunction with two distinct external NAT pool IP ranges (one per firewall host), this could be just the ticket to achieve a workable active\/active scenario. Time permitting, I&#8217;ll be looking to test this within our development environment before considering it for the production service. Assuming it works well in testing, I think it would also be worth weighing up the time and effort that this configuration would involve against the relative benefits and risks to the service.<\/p>\n<p>That concludes my coverage of the routing\/failover setup for the networking side of the new eduroam back-end infrastructure. Thanks for reading!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is the first post in a series discussing some of the finer details of the networking setup for the new eduroam infrastructure that went into production last month. 
In this post, I will be covering the IP routing setup &hellip; <a href=\"https:\/\/blogs-new.it.ox.ac.uk\/networks\/2014\/06\/27\/cisco-networking-and-eduroam-routing\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":207,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13,13465],"tags":[],"class_list":["post-1330","post","type-post","status-publish","format-standard","hentry","category-cisco-networks","category-eduroam"],"_links":{"self":[{"href":"https:\/\/blogs-new.it.ox.ac.uk\/networks\/wp-json\/wp\/v2\/posts\/1330","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs-new.it.ox.ac.uk\/networks\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs-new.it.ox.ac.uk\/networks\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs-new.it.ox.ac.uk\/networks\/wp-json\/wp\/v2\/users\/207"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs-new.it.ox.ac.uk\/networks\/wp-json\/wp\/v2\/comments?post=1330"}],"version-history":[{"count":181,"href":"https:\/\/blogs-new.it.ox.ac.uk\/networks\/wp-json\/wp\/v2\/posts\/1330\/revisions"}],"predecessor-version":[{"id":1545,"href":"https:\/\/blogs-new.it.ox.ac.uk\/networks\/wp-json\/wp\/v2\/posts\/1330\/revisions\/1545"}],"wp:attachment":[{"href":"https:\/\/blogs-new.it.ox.ac.uk\/networks\/wp-json\/wp\/v2\/media?parent=1330"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs-new.it.ox.ac.uk\/networks\/wp-json\/wp\/v2\/categories?post=1330"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs-new.it.ox.ac.uk\/networks\/wp-json\/wp\/v2\/tags?post=1330"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}