Hyperscale reference architecture is NOT resilient

Last post 01-31-2020, 11:41 AM by Shiva. 7 replies.
  • Hyperscale reference architecture is NOT resilient
    Posted: 01-29-2020, 7:41 AM

    We have deployed Cisco C240 servers in a 3-node Hyperscale block.

    The block is resilient in that it can lose a node, but we realised that the individual nodes, and therefore the block itself, are not resilient against the failure of either the external switch or the internal network card: there is only the single onboard VIC card, giving a single data connection and a single heartbeat connection.

    We have installed a second VIC card in each node so that the bonded data and heartbeat connections each have multiple physical paths. This gives us resilience both externally and internally.

    I'm posting this in case anyone else is running nodes like this, and also to feed back that Commvault really should be speccing their reference architecture with multiple cards to give true real-world resilience.

    If anyone's interested in the VIC cards and the methods we used to configure them in the Linux OS, drop me a message.
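
    In the meantime, here is a rough sketch of the sort of bonding we ended up with. This is a generic Linux active-backup bonding sketch assuming NetworkManager (nmcli) and example interface names (ens1f0/ens1f1 on the first VIC, ens2f0/ens2f1 on the second; yours will differ, and the addresses are examples too), not the official HyperScale procedure, so adapt it to your environment:

        # Data bond: one port from each VIC card, active-backup so that a
        # single card or switch failure just fails over to the other path
        nmcli con add type bond ifname bond0 con-name bond-data \
            bond.options "mode=active-backup,miimon=100"
        nmcli con add type ethernet ifname ens1f0 con-name bond-data-p1 master bond0
        nmcli con add type ethernet ifname ens2f0 con-name bond-data-p2 master bond0
        nmcli con mod bond-data ipv4.method manual ipv4.addresses 192.168.10.11/24

        # Heartbeat bond: same idea on the second port of each card
        nmcli con add type bond ifname bond1 con-name bond-hb \
            bond.options "mode=active-backup,miimon=100"
        nmcli con add type ethernet ifname ens1f1 con-name bond-hb-p1 master bond1
        nmcli con add type ethernet ifname ens2f1 con-name bond-hb-p2 master bond1
        nmcli con mod bond-hb ipv4.method manual ipv4.addresses 192.168.20.11/24

        nmcli con up bond-data && nmcli con up bond-hb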


    Regards

    Guy
  • Re: Hyperscale reference architecture is NOT resilient
    Posted: 01-29-2020, 1:50 PM

    We had the exact same issue with our ref arch HPE Apollo gear! We also added additional cards. Nice post. 

  • Re: Hyperscale reference architecture is NOT resilient
    Posted: 01-29-2020, 6:12 PM

    It depends on how the Cisco UCS servers are connected to the network switches.

    Whether or not the servers are UCS managed (with Fabric Interconnects), each port on a Cisco VIC can support up to 256 virtual ports. You could therefore run one port from the VIC to each of two different switches, with each port capable of passing traffic for both HyperScale VLANs; that gives you resilience at the Ethernet switch level for both traffic types.

    As for resilience at the server level with a single VIC: if the server allows it (a free PCIe slot), you can add a second VIC. However, we do not think it is necessary, as there is application-level resilience, and it is not possible to eliminate all single points of failure (SPOF) in any case: the motherboard, the RAID controller mirroring the OS disks, and the NVMe device for the index cache and DDB partitions are not redundant.

  • Re: Hyperscale reference architecture is NOT resilient
    Posted: 01-29-2020, 9:55 PM

    Whilst the failure rate may be very low, it is only logical that the reference architecture should be reviewed to include two separate VIC adapters.

  • Re: Hyperscale reference architecture is NOT resilient
    Posted: 01-30-2020, 3:19 AM

    Hello. I do understand your point; however, the issue came about because we have two switches servicing the block and we spanned the connections across them.

    One switch failed and took out the heartbeat and data connections for two of the nodes. That is a block failure, no matter how clever HyperScale and erasure coding are, and there is no way to prevent it with a single physical data connection and a single heartbeat connection.

    We now have the two data and two heartbeat connections configured such that a failure of a single switch does not take out any node at all.

    This has the added benefit of allowing our network team to work on these switches without affecting backup and recovery service.
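
    For anyone wanting to prove the failover before relying on it, a quick test sketch (bond0 and the member names are the example names from the sketch in my first post; substitute your own):

        # Check which member is currently carrying traffic
        grep "Currently Active Slave" /proc/net/bonding/bond0

        # Simulate losing the path through one switch, then confirm failover
        ip link set dev ens1f0 down
        grep "Currently Active Slave" /proc/net/bonding/bond0   # should now show ens2f0
        ping -c 3 <peer-node-IP>    # data and heartbeat should still flow
        ip link set dev ens1f0 up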


    Regards

    Guy
  • Re: Hyperscale reference architecture is NOT resilient
    Posted: 01-30-2020, 11:49 AM

    Hi Guy,

    I am not sure how the vNICs from the VIC on each server were set up in this environment. Typically, whether connected through a pair of Fabric Interconnects (FIs) or directly to Ethernet switches (assuming Cisco Nexus switches), each physical port on the VIC has two virtual ports (vNICs) created on it, one for each traffic type: data protection and storage pool. These virtual ports then show up as four separate network ports, which is Cisco's recommended practice with the VIC.

    For example, in the attached diagram the unmanaged connections are shown on the left, where the physical port connections go to different switches: virtual ports 0 and 1 go to switch A over one physical port, and virtual ports 2 and 3 go to switch B over the other. Within each single connection, both the 'blue' and 'tan' coloured traffic (VLANs) can pass. When the servers are managed by Fabric Interconnects, as shown on the right side, the same method of connecting and passing both traffic types over a single link to the FI is used. The virtual ports, two per physical port, first need to be created through the CIMC or UCS Manager tools:

     

    I suspect that in your case no virtual ports were created on each of the VIC ports to allow for redundancy?
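
    A quick way to check from the host side, if it helps: each vNIC defined in CIMC/UCS Manager enumerates as its own PCIe function, so counting them shows whether the redundant virtual ports were actually created (the grep pattern below is what the VIC typically reports in lspci; verify against your own output):

        lspci | grep -ic "vic ethernet"   # expect 4 if two vNICs exist per physical port
        ip -br link show                  # the same vNICs appear as separate Ethernet devices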

  • Re: Hyperscale reference architecture is NOT resilient
    Posted: 01-31-2020, 3:44 AM

    Hello.

     

    The devices were set up by the Commvault consultant with single physical connections using the vNIC rather than the vHBA, which gave us this:

    Node 1:

    Data port to SW1

    HB port to SW2

    Node 2:

    Data port to SW2

    HB port to SW1

    Node 3:

    Data port to SW1

    HB port to SW2

     

    Switch 2 failed, which took out the heartbeat for two nodes.

     

    Now that we have the attached configuration, we have resilience.

     

     



    Regards

    Guy
    Attachment: diag.png
  • Re: Hyperscale reference architecture is NOT resilient
    Posted: 01-31-2020, 11:41 AM

    Thank you for sharing the details of how it was previously set up.

    With a single VIC, which has two physical ports, the following method could have been adopted for better resilience (please refer to the cabling picture I shared earlier for a mapping of the "Ethernet" port numbers below):

    Node 1:

      VIC Physical port #1:

           Ethernet #0: Data port to SW1

           Ethernet #1: HB port to SW1

     

     

      VIC Physical port #2:

           Ethernet #2: Data port to SW2

           Ethernet #3: HB port to SW2

     

    And similarly on the other two nodes. Please note that the same VIC, which is a Converged Network Adapter (CNA), can also present what Cisco calls a virtual HBA (vHBA) for Fibre Channel traffic, if required; this assumes the customer is using Fabric Interconnects (FIs) to manage the servers.
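
    On the Linux side, the four virtual ports could then be bonded per traffic type so that each bond spans both switches. A minimal sketch, assuming Ethernet #0-#3 above appear to the OS as eth0-eth3 (actual names will differ) and using the kernel's active-backup bonding mode via nmcli:

        # Data bond spans both switches: Ethernet #0 (SW1) + Ethernet #2 (SW2)
        nmcli con add type bond ifname bond0 con-name bond-data \
            bond.options "mode=active-backup,miimon=100"
        nmcli con add type ethernet ifname eth0 con-name bond-data-sw1 master bond0
        nmcli con add type ethernet ifname eth2 con-name bond-data-sw2 master bond0

        # Heartbeat bond likewise: Ethernet #1 (SW1) + Ethernet #3 (SW2)
        nmcli con add type bond ifname bond1 con-name bond-hb \
            bond.options "mode=active-backup,miimon=100"
        nmcli con add type ethernet ifname eth1 con-name bond-hb-sw1 master bond1
        nmcli con add type ethernet ifname eth3 con-name bond-hb-sw2 master bond1

    With that arrangement, a single switch failure leaves every node with one live member in each bond.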
