Communication errors?

Last post 06-14-2018, 6:00 PM by Wwong. 5 replies.
  • Communication errors?
    Posted: 06-11-2018, 1:08 PM

    I'm getting some errors in the event viewer every day while performing aux copies, but so far I've been unable to figure out the problem.  I've opened several tickets on this as well, and so far none of them has pinpointed the cause.  This is our environment.  We have a CommServe host running on a VM in my office.  The media agents are also VMs, each one in a different geographic location.  The aux copies we run consist of copying the backed up data from that site to either another site, or to our OracleCloud container.  One site seems to be giving us the lion's share of the errors.  The errors read as follows:

    "Unable to communicate with CommServe on host [cvmaster]. Please check the network connectivity between the Client and CommServe, and verify this product's Communication Service is running on the CommServe."

    Support most recently told us to make sure that certain paths, etc., were excluded from scanning by antivirus software.  So just to be sure, as a test, we actually uninstalled all antivirus from the CommServe and the media agents, and let the jobs run again.  We continue to get the same errors daily on the same jobs.  We've noticed, though, that the jobs start running at 6 am, and the errors usually start around 7 am or soon after.  They are sporadic until about 12:30 pm, when the errors taper off for the day.  In some cases, the job actually goes into a pending state for a while and later restarts automatically.  In some cases the errors are accompanied by these types of errors:

    "Error occurred while processing chunk [660250] in media [V_42648] for storage policy [SP-SC_Main] copy [SG-Offsite-Primary]:  Backup job [31584]. Unable to send the stop data transfer control message to the tail. ."

    We thought that antivirus might have been responsible for interrupting something on the disk library, but this error actually occurred AFTER the antivirus had been removed.  I thought it might be a network issue, but we've tried to pair up errors from the log file with time stamps of errors on the network, and could not find anything that matched up.  Firewalls on the network are open for Commvault to communicate freely.  What else can we look at?

  • Re: Communication errors?
    Posted: 06-11-2018, 7:06 PM

    Hi tbrown

    In relation to the error that you get from the "Events" reporting "Unable to send the stop data transfer control message to the tail." This usually points to a network drop, which causes the Aux Copy process to "go down". 

    In your specific scenario, since the MediaAgents are situated in different locations, the next recommended option would be to put in a "Network Route" (previously referred to as "Firewall Configuration") within CommVault.

    Once the above is done, I would then recommend doing a Wireshark capture between the two sites that report the most errors (run one capture on the Source MediaAgent and one on the Destination MediaAgent).

    When the errors are reported within CommVault, stop the Wireshark captures and marry up the times with the logs from CommVault.

    The above steps are to further isolate the issue. If this is thought to be a potential network issue, we can drill down on whether it could be related to a TCP reset or an issue at the ISP level, as the Aux Copies are transferring across geographic locations.
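The step of marrying up capture times with the CommVault logs could be sketched roughly as below. This is only an illustration: the function name `match_errors_to_events`, the 5-second tolerance, the timestamp format, and the sample times are all assumptions, not anything taken from CommVault or Wireshark. It also assumes both sets of timestamps have already been extracted by hand and come from clocks that are in sync and in the same time zone.

```python
from datetime import datetime, timedelta

def match_errors_to_events(error_times, event_times, tolerance_seconds=5):
    """Map each error time to the capture events within +/- the tolerance."""
    tolerance = timedelta(seconds=tolerance_seconds)
    matches = {}
    for err in error_times:
        # Keep every capture event (e.g. a TCP reset) close to this error.
        matches[err] = [ev for ev in event_times if abs(ev - err) <= tolerance]
    return matches

def parse(ts):
    # Assumed timestamp layout; adjust to whatever the logs actually use.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")

# Hypothetical sample data, not taken from the thread.
errors = [parse("2018-06-14 07:05:12"), parse("2018-06-14 09:41:30")]
resets = [parse("2018-06-14 07:05:10"), parse("2018-06-14 08:00:00")]

result = match_errors_to_events(errors, resets)
for err, evs in result.items():
    print(err, "->", [str(e) for e in evs] or "no nearby capture event")
```

An error with no nearby capture event on either MediaAgent would suggest looking somewhere other than that link.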

    Hopefully that helps 

    Thank you 

    Winston 

  • Re: Communication errors?
    Posted: 06-11-2018, 7:45 PM

    Ok, yes, that network route was also suggested at one point, and we put that in, but so far that hasn't seemed to have helped.  I'll check with our network admin about the wireshark capture, not sure if he can do that or not. 

  • Re: Communication errors?
    Posted: 06-13-2018, 8:45 AM

    Hi tbrown

    If the CommVault firewall/Network Route is not helping, the next step will have to be the Wireshark capture, so you can find out whether the issue is the internal network or the external ISP.

    Thank you 

    Winston

  • Re: Communication errors?
    Posted: 06-14-2018, 2:27 PM

    Our network admin helped out the last two mornings with a Wireshark capture.  We were only able to capture one error due to lack of disk space for the capture, but it did show a packet drop between the media agent and the CommServe at the same time stamp as the error message.  This doesn't really seem to explain much though.  He did set the IPsec tunnels to re-key on an 8-hour rotation instead of a 1-hour rotation, though I am still not sure if that has translated into seeing fewer errors so far this morning.  I've only seen two instead of the usual 8 or 10, and those errors don't necessarily coincide with the actual re-keying.  The one this morning happened about 5 minutes after the re-key, so I don't know if it had anything to do with it or not.
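One way to judge whether the errors track the re-keying is to measure how long after the most recent re-key each error occurred, and see whether the delays cluster. The sketch below is a rough illustration only: the helper name `seconds_since_last_rekey` and all of the sample times are hypothetical, and the real re-key times would have to come from the tunnel device's own logs.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def seconds_since_last_rekey(error_time, rekey_times):
    """Seconds between an error and the latest re-key at or before it.

    rekey_times must be sorted ascending. Returns None when no re-key
    precedes the error.
    """
    idx = bisect_right(rekey_times, error_time)
    if idx == 0:
        return None
    return (error_time - rekey_times[idx - 1]).total_seconds()

# Hypothetical 8-hour re-key schedule: 06:00, 14:00, 22:00.
start = datetime(2018, 6, 14, 6, 0, 0)
rekeys = [start + timedelta(hours=8 * i) for i in range(3)]

# Hypothetical error times read from the event viewer.
errors = [datetime(2018, 6, 14, 6, 5, 0), datetime(2018, 6, 14, 9, 30, 0)]

delays = [seconds_since_last_rekey(e, rekeys) for e in errors]
for e, d in zip(errors, delays):
    print(e, "->", d, "seconds after the last re-key")
```

If most delays are small and similar, the re-key is a likely suspect; if they are scattered across the whole interval, it probably is not.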

  • Re: Communication errors?
    Posted: 06-14-2018, 6:00 PM

    Hi tbrown

    Thanks for the detail 

    So in the scenario where you do see a packet drop between the MediaAgent and CommServe, that indicates there was an interruption that would have impacted communication between the two servers.

    However, as the primary issue is the Aux Copy, do we have any network trace between the two MediaAgents? If there are packet drops between the Source and Destination MediaAgents, the error in the CommServe would be a result of that.

    Thank you

    Winston 
