Global Deduplication

Last post 05-21-2012, 2:27 PM by Nmendias. 12 replies.
  • Global Deduplication
    Posted: 04-20-2012, 1:33 PM

    We are looking to finally implement CommVault dedupe in the near future.  A vendor has come in and recommended we split the DDB up between 5 media agents.  So it would not be one shared global DDB, it would be 5 separate ones.

    I am having a very hard time buying into this design.  I have always been told to go with a shared, dedicated DDB server in the past.  This was recommended both by two previous CV classes and by a CV engineer who performed a CV health check for us.

    Anyway, I am very curious how many of you with larger CommVault implementations are NOT using a dedicated DDB, and for what reasons?  How much does the deduplication rate decrease if you split it up between 5 or more media agents?  We were told only a 3% reduction in dedupe ratio, and I have a hard time believing that too.

    Just FYI - we back up around 600 servers, with about 70+ TB of data per weekly full.

    Thanks for any help!

     

  • Re: Global Deduplication
    Posted: 04-20-2012, 3:00 PM

    Our environment is similarly sized.  One potential issue with using a single DDB in an environment that large is that the spindles housing the DDB will become the throughput constraint (especially for aux copies requiring rehydration of deduped data).  If you segment data between DDBs by data type (file system on one DDB, SQL on another, etc.) you shouldn't see a much lower dedupe ratio because of splitting the DDBs since the different data types aren't likely to dedupe against each other very well.  As much as I like CV, this is the Achilles heel of CV dedupe in my opinion and is where other solutions (like Avamar) shine with truly global dedupe across large datasets.
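    The claim that splitting DDBs by data type costs little dedupe ratio can be illustrated with a toy sketch. This is not CommVault's actual hashing or block size, just the general idea of block-level hash dedupe: if two pools share essentially no blocks, a single global DDB and split DDBs end up storing the same number of unique hashes.

```python
import hashlib

def unique_blocks(streams):
    """Count total and unique block hashes across the given data streams."""
    seen = set()
    total = 0
    for data in streams:
        for i in range(0, len(data), 4):  # toy 4-byte "blocks"
            seen.add(hashlib.sha256(data[i:i + 4]).hexdigest())
            total += 1
    return total, len(seen)

# Two data types that rarely share blocks with each other.
fs_data = [b"AAAABBBBAAAACCCC", b"BBBBAAAACCCCAAAA"]    # file system fulls
sql_data = [b"XXXXYYYYXXXXZZZZ", b"YYYYXXXXZZZZXXXX"]   # database dumps

# One global DDB: dedupe across everything.
g_total, g_unique = unique_blocks(fs_data + sql_data)

# Split DDBs: dedupe only within each pool, then add the results.
s_unique = unique_blocks(fs_data)[1] + unique_blocks(sql_data)[1]

print(g_unique, s_unique)  # identical here: the pools share no blocks
```

    The ratio only suffers for whatever blocks the pools happen to share, which for dissimilar data types (file system vs. SQL dumps) is typically small.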

  • Re: Global Deduplication
    Posted: 04-20-2012, 3:17 PM

    The best way to get "around" this problem is to use SSDs for the DDB.

    You get plenty of IOPS without using any spindles. :)

  • Re: Global Deduplication
    Posted: 04-20-2012, 3:19 PM

    SSDs to house the DDB make sense to me (at first thought), but I've heard CommVault still recommends using four traditional 15k drives to house the DDB.  Has that position changed with newer SSDs?

  • Re: Global Deduplication
    Posted: 04-20-2012, 3:31 PM

    According to the dedupe building block guide, you shouldn't have more than 50 concurrent streams per DDB or you'll start seeing performance degradation. If you back up over 600 servers and some of them are multi-streamed, you can most likely break this soft limit, which is probably why you were recommended 5 servers (250 concurrent streams total).

    http://documentation.commvault.com/commvault/release_9_0_0/books_online_1/english_us/features/dedup_disk/building_block.htm
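    The arithmetic behind that recommendation is simple. Here is a sketch; only the 50-streams-per-DDB guideline comes from the building block guide, while the average streams per server is a hypothetical figure for illustration:

```python
import math

# Rough DDB-count sizing from the 50-concurrent-stream guideline.
STREAMS_PER_DDB = 50          # soft limit before performance degrades
servers = 600
avg_streams_per_server = 0.4  # hypothetical: not all 600 run at once

concurrent = math.ceil(servers * avg_streams_per_server)
ddbs_needed = math.ceil(concurrent / STREAMS_PER_DDB)

print(f"{concurrent} concurrent streams -> {ddbs_needed} DDBs")
```

    With 240 concurrent streams in this example, five DDBs keep each one under the 50-stream guideline.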

    As for SSDs, I know I've read somewhere that you could use them, and I thought their building block guide mentioned it, but it seems it doesn't. I'll try to find where I read it and post a link. At this point, though, CommVault shouldn't care much about what kind of media you use; they just want the most IOPS you can give them!

    Phil

  • Re: Global Deduplication
    Posted: 04-20-2012, 3:33 PM

    I guess I read it in their scalability guide, but they only mention it being recommended for an enterprise-size CommServe. Again, I'm sure it's just that the other documentation is a bit old and hasn't been updated to include SSDs.

    http://documentation.commvault.com/commvault/release_9_0_0/books_online_1/english_us/features/scalability/commcell.htm

  • Re: Global Deduplication
    Posted: 04-20-2012, 3:59 PM

    OK, so if you are forced to have several separate deduplication pools, wouldn't it make sense to have the hashes synchronize in the background between all the DDBs?  I would think CommVault would be working on this issue, because this is NOT a fix in my opinion.  I am angry.

    This work-around is going to force me to purchase 5 top-of-the-line media agents now, which is very, very costly!  An unexpected 100 grand is a lot of money.

    Also, this is something they NEED to teach in the admin/engineering classes, because neither the v8.0 nor the v9.0 class, with different teachers, ever mentioned this to us.

  • Re: Global Deduplication
    Posted: 04-20-2012, 4:23 PM

    mortepa:

    OK, so if you are forced to have several separate deduplication pools, wouldn't it make sense to have the hashes synchronize in the background between all the DDBs?  I would think CommVault would be working on this issue, because this is NOT a fix in my opinion.

    1. This is a best practice, which Commvault and consultants will obviously recommend; you are not forced to follow it. I actually run 100 streams per MA/DDB and meet my backup window, so there is no need for me to follow this best practice. I would recommend starting small and growing only if needed. If you end up with bad performance, this is something to keep in mind and watch for.

    2. It would make sense to share the hashes, and I'm sure they are already working on a solution to this.
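    The "share the hashes" idea amounts to a periodic merge of each DDB's hash set so any pool can recognize blocks first seen elsewhere. A minimal sketch of that idea follows; Simpana has no such feature at this point, and every name here is invented for illustration:

```python
# Each DDB keeps its own set of known block hashes.
ddb_a = {"h1", "h2", "h3"}  # hashes known to media agent A
ddb_b = {"h3", "h4"}        # hashes known to media agent B

def sync(*ddbs):
    """Background sync: merge all peers' hash sets into each local set."""
    merged = set().union(*ddbs)
    for d in ddbs:
        d |= merged

sync(ddb_a, ddb_b)
print(ddb_a == ddb_b)  # True: both now recognize h1..h4
```

    The real engineering cost is in the part this sketch skips: keeping the merged sets consistent while backups are writing new hashes to each DDB concurrently.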

    mortepa:

    This work-around is going to force me to purchase 5 top-of-the-line media agents now, which is very, very costly!  An unexpected 100 grand is a lot of money.

    I'm sure you can get away with cheaper MAs than that. Also, the recommendation of 50 streams per DDB applies only to the DDB, not the MA: if you have an insanely powerful MA, you could always have it host 2 or more DDBs. The important thing here is to keep each DDB on its own dedicated spindles. Again, though, if you already had 50 SSDs (just to go extreme here) for your single DDB, you could most likely split them in 5 and host all the DDBs on your current hardware.

    mortepa:

    Also, this is something they NEED to teach in the admin/engineering classes, because neither the v8.0 nor the v9.0 class, with different teachers, ever mentioned this to us.

    Unfortunately, as with anything, some teachers are better than others. I remember this being mentioned when I took my course, and there are other tweaks I've learned since then that weren't covered either. In the end, 5 days is not a lot of time to go through all the possible configurations of Simpana, but I agree with you that the DDB paper should be covered by the course.

    Phil

  • Re: Global Deduplication
    Posted: 04-23-2012, 6:20 PM

    I don't think you will see a big loss in deduplication rate with separate global DDBs. I would have a DDB on each media agent and have the backup data land on disk that is zoned only to that media agent. If the DDB is on another media agent, then during a backup the client has to talk to both the media agent hosting the DDB and the media agent running the backup.

    We first started out with one DDB and moved away from that model. Today we have a DDB on each media agent, with spill and fill turned on. Each media agent has a primary storage policy with an incremental policy, and every server we back up always uses the same media agent. We also found this has improved the aux copy to tape.

  • Re: Global Deduplication
    Posted: 04-25-2012, 4:23 PM

    fdxpilot:

    Our environment is similarly sized.  One potential issue with using a single DDB in an environment that large is that the spindles housing the DDB will become the throughput constraint (especially for aux copies requiring rehydration of deduped data).  If you segment data between DDBs by data type (file system on one DDB, SQL on another, etc.) you shouldn't see a much lower dedupe ratio because of splitting the DDBs since the different data types aren't likely to dedupe against each other very well.  As much as I like CV, this is the Achilles heel of CV dedupe in my opinion and is where other solutions (like Avamar) shine with truly global dedupe across large datasets.

    I don't understand this part, as rehydrating data for an aux copy doesn't touch the DDB?

  • Re: Global Deduplication
    Posted: 04-25-2012, 4:39 PM

    Paul Hutchings:

    fdxpilot:

    Our environment is similarly sized.  One potential issue with using a single DDB in an environment that large is that the spindles housing the DDB will become the throughput constraint (especially for aux copies requiring rehydration of deduped data).  If you segment data between DDBs by data type (file system on one DDB, SQL on another, etc.) you shouldn't see a much lower dedupe ratio because of splitting the DDBs since the different data types aren't likely to dedupe against each other very well.  As much as I like CV, this is the Achilles heel of CV dedupe in my opinion and is where other solutions (like Avamar) shine with truly global dedupe across large datasets.

    I don't understand this part, as rehydrating data for an aux copy doesn't touch the DDB?

    You're right, aux copies don't use the DDB. I hadn't noticed what was said above, but the statement is definitely incorrect. If there was a performance gain after splitting the DDBs, it was because of some other factor (like adding MAs, maybe?) but definitely not because of more DDBs. This is quite easy to test, too: you can take your DDB offline and still run aux copies, which proves the DDB isn't in use.

  • Re: Global Deduplication
    Posted: 05-02-2012, 4:35 PM

    mortepa:

    OK, so if you are forced to have several separate deduplication pools, wouldn't it make sense to have the hashes synchronize in the background between all the DDBs?  I would think CommVault would be working on this issue, because this is NOT a fix in my opinion.  I am angry.

    Let's just say you are not the only one wanting DDB replication/parallelism, elimination of DDB recovery in a GridStor configuration, among other things...

    However, patience is a virtue. ;)

  • Re: Global Deduplication
    Posted: 05-21-2012, 2:27 PM

    If you have to dump data to tape, do not use GridStor; your aux copy to tape will suffer. When we first implemented Commvault, the SME said to use one global dedupe DDB no matter how many media agents you have. We have since gone the opposite way, with a DDB on each media agent, and this has improved our backup and aux copy performance.
