CommVault Forums

Verifying Synthetic Full Backups

Last post 10-26-2011, 2:50 PM by Jeff D. 17 replies.
  • Verifying Synthetic Full Backups
    Posted: 03-02-2011, 4:13 PM

    Books Online includes this statement:

    "In a scenario where a conventional full backup is run only once for a given subclient, and incrementals (or differentials) with periodic synthetic full backups are run after that, files that never change may inadvertently be missed. Eventually, these files may be pruned, leaving no existing backups of the files. The problem of omissions may build up over time until the file changes or a conventional backup is executed."

     

    The solution presented is to use the "Verify Synthetic Fulls" advanced backup feature. The problem is that this feature is only supported by a handful of agents:

     

    http://documentation.commvault.com/commvault/release_9_0_0/books_online_1/english_us/feature_support/feature_support.htm?var1=http://documentation.commvault.com/commvault/release_9_0_0/books_online_1/english_us/features/backup/support_adv_bu_options.htm

     

    For example, the agents I use most commonly with Synthetic Full backups are the Exchange Mailbox, Exchange Public Folder, SharePoint, and Virtual Server agents, none of which support the "Verify Synthetic Full" feature.

     

    I have two questions:

     

    1) Can someone please explain the BOL statement above more clearly?  How exactly are certain files "inadvertently missed" in a synthetic full operation?  Wouldn't this make it very dangerous to use Synthetic Full backups on a regular basis?

     

    2) Are there any other workarounds for the agents that don't support the "Verify Synthetic Full" feature?  Why would we use Synthetic Full if we can't be sure it didn't "inadvertently miss" any files?

     

    Thanks,

    Bill

  • Re: Verifying Synthetic Full Backups
    Posted: 03-02-2011, 5:18 PM

    Bill, I'll give you my take next to each number:

    1) Yes, it can be dangerous, which is why BOL mentions numerous times that conventional backups should still be run and that synthetic fulls are supplemental.

    (See under Considerations:)

    • Synthetic full backups consolidate data; they do not back up data from the client computer. You should therefore use synthetic full backups in addition to and not in place of any regularly scheduled incremental or differential backup jobs.
    • If a synthetic full backup fails to complete successfully, we suggest that you run a standard full backup in its place.
    • In this example, the synthetic full backup was started on 4/7; however, the data it contains is from the 4/1 through 4/6 backups. To secure data for 4/7, we would need to run a conventional (i.e., full, incremental, or differential) backup on 4/7.

    2) With V9, use source-side dedup and run regular backups.

  • Re: Verifying Synthetic Full Backups
    Posted: 03-02-2011, 6:24 PM

    I understand that a Synthetic Full is only as good as the previous incrementals and fulls/synthetic fulls (i.e. you are at the mercy of your backup media), but I still don't understand how files could be "inadvertently missed" by CommVault?  Does that imply a functional deficiency with the actual synthetic full process, or is BOL assuming that an "operator error" will be made, like forgetting to run an incremental before a synthetic full?  Or is BOL assuming some sort of physical media error or corruption?  I am just trying to understand what BOL means by files being "inadvertently missed".  CommVault has always told me that a Synthetic Full is "just as good" as a regular full, assuming your backup media is good and all of the jobs completed successfully (incrementals and last full/synthetic full).  Is this not the case?  Is there some other known deficiency with synthetic fulls?

  • Re: Verifying Synthetic Full Backups
    Posted: 03-04-2011, 10:34 AM

    The way I read it: if you run the synthetic full on the 7th and expect the 7th's data to be there, it isn't.  If you ran a real full on the 7th, the data would be protected by that job.  Although running an incremental before the synthetic full helps prevent this type of loss, it's not like running a real full in the way it captures the "now" data.

  • Re: Verifying Synthetic Full Backups
    Posted: 03-07-2011, 6:14 AM

    From my point of view, it all depends on the changes that lead to a certain entity (file, mail) being backed up by an incremental backup. If there are changes but the incremental fails to "see" them, then those changes are not backed up.

    What's also puzzling me is the second part of the statement: "...files that never change may inadvertently be missed. Eventually, these files may be pruned,..."

    For the missed part, OK, but the pruned part? If a file is part of the first full backup and then never changes (and so never gets backed up again by an incremental backup), why should it age off in a synthetic full scenario? The majority of the files in the first full backup will not be part of the subsequent incrementals. So if every "unchanged" file were aged off, not much data would be left pretty quickly. Surely there is a mechanism to prevent this, regardless of the feature to verify a synthetic full? Unfortunately I don't know much more about these mechanisms.

    It seems to me that the synthetic full feature is not equally reliable for all supported data types. That's because a file system backup faces different problems in detecting changes during an incremental backup than an Exchange mailbox backup does.

    As for source-side dedup being the better way to do this, is that also true in terms of backup time? Can an Exchange Mailbox-level backup (aka brick-level backup) be carried out in an acceptable time with source-side dedup, whereas it could not without it (because of the poor MAPI performance)?

  • Re: Verifying Synthetic Full Backups
    Posted: 03-08-2011, 10:29 PM

    I read this a couple months ago and was worried as well so I opened a ticket asking more details about this. Unfortunately, it seems the answer was given to me by phone as I can't find any answers in my emails that I could copy/paste here.

    Basically, what I was told is that the documentation is misleading. What they mean by "pruned" is that if your data on disk gets corrupted (hardware failure, Windows file system bug, etc.), you wouldn't know about it since you never actually try to read/write data to that disk sector again. Doing a real full would copy every single file to disk/tape and ensures you have a valid copy of the data. As you can see, it isn't nearly as bad as it sounds when you read the BOL. Support told me it definitely wasn't Commvault that would prune the data since the retention would get updated with each synthetic (and incrementals for that matter).

    The thing is, if I'm not mistaken (I would need to verify that with support), with deduplication enabled this would also be true even if you did a real full backup, as the CommServe will only look in its DDB to know whether it has the blocks and will not actually double-check that it can read the data off the disk.

    As far as missing files go, I guess you either trust incrementals or you don't, and if you don't trust the incrementals of your backup solution, you might as well buy a new one. Years ago, there were often issues with many products where an incremental wouldn't catch some changes (mostly attribute changes like security permissions, etc.), but now those changes are backed up as well (make sure to read each data agent's documentation, as some types of scan will not detect those changes). Since then, I've yet to see a restore missing files/attributes because an incremental missed them.

    I don't have Exchange so I can't say for sure, but I believe that if you verify every full, that would include synthetic full backups as well. I personally have a schedule that verifies my fulls every weekend, and the verification expires 4 weeks later (meaning I check every full once a month). Also, if you do an aux copy offsite for DR purposes, you will have a copy of your data on another set of disks/tapes, further reducing the odds of such a failure.

    As for synthetic full vs full, I personally only swear by synthetic fulls. I've used TSM for a long time with their incremental forever and never had any issue so I'm not expecting to have any issue using Commvault's implementation of the "incremental forever". Obviously, I always check the option box to run an incremental just before the synthetic full so that the end result is exactly as taking a full backup (but faster and taking up less space).

    In the end, you gotta remember that your best defense against any restore failures is to actually do restore drills (good for DR tests as well) 1-2 times a year. Boring and annoying as hell but necessary, no matter which product you use. In my opinion, this should always be done as the issues are most often caused by misconfigurations made by the backup administrators (excluded files by mistake, forgot to include that new server that was installed a couple months ago, etc) and not so often by the actual backup solution.

    Hope this helps.

    Phil

  • Re: Verifying Synthetic Full Backups
    Posted: 03-10-2011, 12:46 PM

    Great discussion, guys.  Here are my thoughts:

    Vincenzo_Basolino: I understand that a Synthetic Full job, by itself, doesn't actually back up any data from the client, but if you check the box to run an incremental backup prior to the Synthetic Full backup, you are by definition backing up the files changed since the last incremental.  Thus, when the Synthetic Full runs, it glues together the last Full + Incremental #1 + Incremental #2 + Incremental #3, etc. + the incremental that just ran.  I fail to see how data could be "lost", assuming the backup media is not corrupted and the last Full did not get pruned prematurely.
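
    To make the "gluing" concrete, here is a minimal Python sketch of how a synthetic full could be assembled from the last full plus the later incrementals. The data model (a dict mapping a path to a version label, with None marking a deletion) and the function name are assumptions for illustration only, not CommVault's actual index format or API.

```python
from typing import Dict, List, Optional

# Hypothetical index model: path -> version label; None marks a deleted item.
Index = Dict[str, Optional[str]]

def synthesize_full(last_full: Index, incrementals: List[Index]) -> Index:
    """Merge the last full with each incremental, oldest first.

    Later jobs override earlier ones; nothing is read from the client.
    """
    merged: Index = dict(last_full)
    for inc in incrementals:
        for path, version in inc.items():
            if version is None:
                merged.pop(path, None)   # item was deleted on the client
            else:
                merged[path] = version   # newest backed-up version wins
    return merged

full = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}
inc1 = {"a.txt": "v2"}                   # a.txt changed
inc2 = {"b.txt": None, "d.txt": "v1"}    # b.txt deleted, d.txt created
inc3 = {"a.txt": "v3"}                   # incremental run just before the synthetic full

synthetic = synthesize_full(full, [inc1, inc2, inc3])
print(synthetic)   # {'a.txt': 'v3', 'c.txt': 'v1', 'd.txt': 'v1'}
```

    Under this model an unchanged file such as c.txt is carried forward from the full, so the result can only omit an item that none of the input jobs ever captured.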

     

    BerndMueller_CW: I think you've really hit the nail on the head with why the BOL statement is puzzling.  I can't think of any reason why CommVault would "prune" bits and pieces from a Full backup, other than the entire backup itself getting pruned due to the retention period being too short or operator error.  I also agree with your questioning source-side dedupe in the case of an Exchange MAPI message-level backup.  The reason an Exchange Mailbox-level backup is so slow is that MAPI is so slow.  With source-side dedupe running on the MAPI agent host, I'm pretty sure a Full backup would still be painfully slow because MAPI would still need to log in to each mailbox and look at each message for duplicate blocks.  Hence the need for Synthetic Full.  Source-side dedupe solves a lot of problems but it doesn't completely eliminate the need for Synthetic Full.

     

    PhilippeMorin: I am glad you opened a ticket to get Support's take on this.  From what you are saying, the statement on BOL isn't very clear, and it sounds like there is nothing inherently dangerous with the Synthetic Full operation.  It sounds like the danger comes in when there is disk corruption on the disk target, or if there is operator error (pruning backup jobs indiscriminately, etc.).  You make an excellent point regarding deduplication and how dedupe also relies on previous backup jobs, because a deduped job might share blocks with many older backup jobs.  So if we can't trust the integrity of a Synthetic Full backup, it sounds like we shouldn't trust dedupe'd backups either.  Frankly, dedupe backups sound even more dangerous since the corruption of one backup job might corrupt dozens of other backup jobs that share blocks with the first job.  Of course this now becomes a silly argument because you either trust the whole backup solution or don't.  I agree that running periodic Data Verification jobs (and sample restores) is an excellent layer of protection.  Also, running aux copies to tape is another great way to make sure jobs on disk are valid.

     

    So I am hoping the answer to my original question is that the BOL statement is overly vague and over-hyping the dangers of Synthetic Full.  But one additional question remains:

    What is the difference between the "Verify Synthetic Full" option (which is on the Advanced Backup dialog box) and a traditional "Data Verification" job that runs against a storage policy?  What exactly does the "Verify Synthetic Full" option do?

  • Re: Verifying Synthetic Full Backups
    Posted: 03-10-2011, 1:14 PM

    What is the difference between the "Verify Synthetic Full" option (which is on the Advanced Backup dialog box) and a traditional "Data Verification" job that runs against a storage policy?  What exactly does the "Verify Synthetic Full" option do?

    The data verification job and the "Verify Synthetic Full" option are exactly the same. The data verification job runs on a schedule, while with the Verify Synthetic Full option it runs right after your backup (or maybe it is smart enough to start verification while backing up? I've never noticed)...

    If you look at your storage policy and right click the primary copy, view, jobs, it will show a column saying if the job has been marked for data verification and when was the last verification done. You will see that if you checked the verify synthetic full option, the verification column for that job will be marked as successful while if you don't, it will be marked as "picked for verification" (depending if you require verification of full backup jobs in your storage policy) and the next time you run a manual/scheduled data verification on that storage policy, it will verify the data and mark it as successful (or not).

  • Re: Verifying Synthetic Full Backups
    Posted: 03-11-2011, 9:40 AM

    Nice!  That is great news, and it makes me even less concerned about using synthetic fulls.  For some reason I always thought "Verify Synthetic Fulls" was something special or unique for synthetic fulls, but if it's the same as a regular Data Verification, that's wonderful.

    Thanks,

    Bill

  • Re: Verifying Synthetic Full Backups
    Posted: 03-11-2011, 12:19 PM

    PhilippeMorin:

    The data verification job and the "Verify Synthetic Full" option are exactly the same. The data verification job runs on a schedule, while with the Verify Synthetic Full option it runs right after your backup (or maybe it is smart enough to start verification while backing up? I've never noticed)...

    This is incorrect.

    Data verification reads the data stored in the backup and makes sure the chunks are valid.  Regardless of what data exists, it's only concerned that the chunks on the mag lib are valid.

    Verify Synthetic Full checks for disparities between the actual files on the client computer and the index.  Internally, a flag is set when the synthetic full backup completes successfully. This flag adds functionality to the next incremental/differential backup to detect any items that the previous synthetic full backup did not include, and to include any such items in that incremental/differential backup. The pending flag is cleared when the incremental/differential backup completes successfully, or when a conventional full backup completes, whichever occurs first.
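
    The flag lifecycle described here can be sketched as a toy model in Python. Every name below (Subclient, verify_pending, and so on) is hypothetical and only mirrors the description in this post; it is not the product's internal implementation.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class Subclient:
    """Toy model of the behaviour described above; all names are hypothetical."""
    client_items: Set[str]                          # items present on the client
    index: Set[str] = field(default_factory=set)    # items known to the backup index
    verify_pending: bool = False                    # set by a verified synthetic full

    def synthetic_full(self, verify: bool = True) -> None:
        # Consolidates existing backup data only; the client is not read.
        if verify:
            self.verify_pending = True

    def incremental(self, changed: Set[str]) -> Set[str]:
        to_back_up = set(changed)
        if self.verify_pending:
            # Extra step: pick up anything on the client that the index lacks.
            to_back_up |= self.client_items - self.index
        self.index |= to_back_up
        self.verify_pending = False                 # cleared on successful completion
        return to_back_up

    def conventional_full(self) -> None:
        self.index = set(self.client_items)
        self.verify_pending = False                 # a real full also clears the flag

sc = Subclient(client_items={"a", "b", "c"}, index={"a", "b"})  # "c" was missed earlier
sc.synthetic_full(verify=True)
picked_up = sc.incremental(changed={"a"})
print(sorted(picked_up))   # ['a', 'c']
```

    In this sketch the missed item "c" is only recovered by the incremental that follows the synthetic full, which matches the point that the check happens on the next incremental/differential rather than inside the synthetic full itself.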

  • Re: Verifying Synthetic Full Backups
    Posted: 03-11-2011, 12:59 PM

    Ah... that is odd and interesting! My synthetics have their data verification status set to successful so I've always thought it was done by that option. I guess it's just that our data verification schedule runs right after our weekly synthetics.

    This is even better though as it ensures that if the incremental "missed" something like it is stated in the BOL, you'll still get all your files.

    Thanks for the correction and sorry for the confusion I created!

    Phil

  • Re: Verifying Synthetic Full Backups
    Posted: 03-11-2011, 2:16 PM

    So now we are back at Square 1 (one of my original questions):

    Are there any other workarounds for the agents that don't support the "Verify Synthetic Full" feature?  Why would we use Synthetic Full if we can't be sure it didn't "inadvertently miss" any files?

    The only agents that support "Verify Synthetic Full" are the AD agent, Windows file system, and Unix file system agents.  The rest of the agents (in particular, Exchange Mailbox-level backups) do not support "Verify Synthetic Full".  Does this mean we are getting potentially "bad" backups when we run a Synthetic Full against an agent that doesn't include "Verify Synthetic Full" capability?  I am still trying to figure out how this "Verify Synthetic Full" functionality is different from simply running another incremental backup, thereby catching any changes since the last incremental or full/synthetic full.  Can someone provide a real-life sample scenario where the "Verify Synthetic Full" will actually catch and/or correct some sort of error condition that would otherwise not be caught?  This seems vitally important for Exchange mailbox backups, which are often retained for long periods of time for legal/compliance reasons.  If you tell an attorney that their last Synthetic Full "may or may not" be valid, that's generally not an acceptable answer.  But if you switch to running full backups, you may not be able to complete the full backup in the backup window due to MAPI's slowness, so you are kind of caught between a rock and a hard place.  I think the key is to have a real-life example of what might go wrong without using "Verify Synthetic Full" and then present the example as a possible risk, and let the customer decide.  Anyone care to offer up an example?

    Thanks,

    Bill

  • Re: Verifying Synthetic Full Backups
    Posted: 04-21-2011, 4:06 PM

    Anyone have thoughts on this?

    Are there any other workarounds for the agents that don't support the "Verify Synthetic Full" feature?  Why would we use Synthetic Full if we can't be sure it didn't "inadvertently miss" any files?

  • Re: Verifying Synthetic Full Backups
    Posted: 04-21-2011, 5:11 PM

    Run real fulls.  With source-side dedup and all the other enhancements to minimize the data getting backed up, and the ability for the DDB to be reconstructed, running real fulls isn't as big of a deal as it used to be.

    To answer your scenario-based question:

    If the synthetic full inadvertently missed a file, the verify option scans the file system or mailboxes and then checks what needs to get backed up against the prior synthetic full's index.  It would then find the inadvertently missed file and add it to the following incremental backup's collect.  Remember, synthetic fulls do not touch the file system/mailboxes, so if an item is not "changed" it doesn't get picked up in the incrementals either.  The verify option forces a check of the actual file system/mailboxes and compares it to what was synthetically backed up.
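
    One way to picture such a scenario is the Python sketch below. It assumes a simplified incremental that selects files purely by modification time, and a file that is copied in with its old timestamp preserved; both are assumptions for illustration, and real agents may detect changes differently. The function names are hypothetical.

```python
from typing import Dict, Set

def mtime_incremental(files: Dict[str, int], last_backup_time: int) -> Set[str]:
    """Simplified change detection: select files modified after the last backup."""
    return {path for path, mtime in files.items() if mtime > last_backup_time}

def verify_scan(files: Dict[str, int], synthetic_index: Set[str]) -> Set[str]:
    """Compare what is on the client with what the synthetic full contains."""
    return set(files) - synthetic_index

# Day 0: a conventional full captures everything present (path -> mtime).
files = {"report.doc": 5, "budget.xls": 8}
full_index = set(files)
last_backup_time = 10

# Day 1: archive.zip is copied in with its original (old) timestamp preserved.
files["archive.zip"] = 3
files["report.doc"] = 12             # a normal edit

inc = mtime_incremental(files, last_backup_time)
synthetic_index = full_index | inc   # synthetic full = full + incrementals

missed = verify_scan(files, synthetic_index)
print(sorted(inc))      # ['report.doc']
print(sorted(missed))   # ['archive.zip']
```

    Here archive.zip never looks "changed" to the incremental, so it never reaches the synthetic full; only a scan of the client compared against the index would surface it.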

  • Re: Verifying Synthetic Full Backups
    Posted: 04-21-2011, 5:22 PM


  • Re: Verifying Synthetic Full Backups
    Posted: 10-04-2011, 5:54 PM

    This caution in the BOL discouraged me from utilizing synthetic fulls much in Simpana 8. I also didn't think they performed all that well compared to a full except in scenarios where bandwidth was limited.

    In Simpana 9, I've grown to love DASH Fulls and how fast and efficient they are. What frustrates me is that only AD, Windows, and Unix iDAs officially support verifying synthetic fulls. Yet, when setting up schedule policies for a group of virtual server agent subclients the checkbox for 'Verify Synthetic Full' is enabled by default in the Advanced Backup Options. This is very misleading. 

    I decided to open up a support ticket with CommVault to confirm that unsupported agents like VSA and OES file system cannot verify synthetic fulls. Support did confirm that "verify synthetic full" is not supported for VSA or OES file systems. However, they did state that the workaround I use should work. It's what I did in Simpana 8 as well: I schedule a traditional full backup 1 out of every 3 weeks (with the other 2 weeks being DASH fulls and all days in between incrementals). Since my primary copy retention is 30 days/1 cycle, I'm guaranteed to always have 1 traditional full backup on disk.

    Week 1 - FULL, Week 2 - DASH FULL, Week 3 - DASH FULL, repeat...
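
    A quick way to sanity-check this rotation against the retention window is the Python sketch below. It models only the 30-day part of the retention rule and treats each weekly job as running on day 0 of its week; cycle-based aging and other copy settings are deliberately ignored, so treat it as a rough check under those assumptions rather than a model of actual pruning.

```python
from typing import List

ROTATION = ["FULL", "DASH FULL", "DASH FULL"]   # weekly jobs, repeating
RETENTION_DAYS = 30                              # days component of 30 days/1 cycle

def weekly_job(week: int) -> str:
    """Job type scheduled for a given week (week 0 is the first week)."""
    return ROTATION[week % len(ROTATION)]

def oldest_gap_days(weeks: int) -> int:
    """Largest age, in days, of the most recent traditional full on any day."""
    full_days: List[int] = [w * 7 for w in range(weeks) if weekly_job(w) == "FULL"]
    worst = 0
    for day in range(weeks * 7):
        latest_full = max(d for d in full_days if d <= day)
        worst = max(worst, day - latest_full)
    return worst

worst = oldest_gap_days(weeks=52)
print(worst)                      # 20
print(worst < RETENTION_DAYS)     # True
```

    With a traditional full every 21 days, the newest one is never more than 20 days old, which sits inside the 30-day window.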

  • Re: Verifying Synthetic Full Backups
    Posted: 10-05-2011, 10:50 PM

    Leif,

     

    Your work-around is a good and safe one.  However, I still have yet to have CommVault (or anyone else) explain in sufficient detail how a Synthetic Full can "inadvertently miss" files (other than operator error or MagLib corruption) or why CommVault permits Synthetic Full backups in the first place for certain iDAs without having the "Verify Synthetic Full" option.  In the latter case, it strikes me as odd that CommVault would allow a customer to perform a data protection operation that cannot possibly be verified to be "good."  It seems like if the software isn't able to verify that a synthetic full is "good", it should not even offer the opportunity to take a synthetic full; after all, who wants to "roll the dice and hope for the best"?  That's not really the point of data protection.  CommVault really needs to explain their rationale for not having "Verify Synthetic Full" on all iDAs that support Synthetic Full.

     

    My hypothesis at this point (until someone proves me wrong :)) is that synthetic fulls are just as good as full backups plus their associated incremental backups (assuming no operator error and no MagLib/job corruption).  If someone can prove me wrong, please do, and please elaborate.

     

    My most relevant example is that of the Exchange Mailbox iDA backup agent.  I have a number of customers with >3000 mailboxes, and a "traditional Full" backup takes well over a week, in some cases two weeks, due to mailboxes with tens of thousands of messages and the slowness of MAPI.  Obviously it's not acceptable to have a backup job running for 1-2 weeks (imagine all of the missed incremental backups!).  The solution is a Synthetic Full backup; but the Exchange Mailbox iDA does not support "Verify Synthetic Full."  So either we are gambling that Synthetic Full does what it claims to do; or we are gambling by letting a backup run for 1-2 weeks and missing all incremental backups for that entire time period.  Either way is a gamble; either way has risks; the question is which is less risky and why.  I don't know the answer but I will let you guys chime in.  :)

     

    Bill

  • Re: Verifying Synthetic Full Backups
    Posted: 10-26-2011, 2:50 PM

    I agree with Bill completely.  Why does the option exist if it isn't foolproof?  It wasn't until just now that I found out my VSA backups might not be intact.  I do tests from time to time of course, but if my weekly backup and administrative job reports come up green, I feel like I'm covered - at least I did until I read the above comments.  I've asked several Commvault employees in the past and all of them have said using Synthetic Fulls after an initial Traditional Full is a perfectly acceptable way to go.  I'm in a reply so I can't scroll up but has a Commvault person officially weighed in on this recently?  I think we really need a definitive answer on this so we can put it to bed.  If I need to interleave a traditional full into the mix from time to time I can, but I won't if I don't have to.  So which is it?


    Just my 1.4 cents (after taxes)