Deduplication Simpana 8

Last post 07-10-2010, 6:16 AM by Daniel Rose. 13 replies.
  • Deduplication Simpana 8
    Posted: 07-06-2010, 10:25 AM

    Hello everybody,

    It's my first message.

    We have upgraded Simpana from version 7 to version 8.

    Deduplication is new in version 8, and version 8 itself is also new for me (a training is planned).

    When creating a storage policy, we can choose whether the deduplication is done on the client or on the media agent server.

    On the subclient it is also possible to choose between the client and the media agent server.

    My question: what is the difference? The backup jobs are very slow, and if I modify the setting on the subclient I do not see any difference. Changing it at the storage policy (SP) is not possible.

    Thanks for your help




  • Re: Deduplication Simpana 8
    Posted: 07-06-2010, 10:46 AM



    Thanks for the post.

    When deduplication is enabled on a storage policy, the data is broken up into blocks and a hash is generated for each block. These hashes are then placed into a deduplication database for reference. The setting you are referring to determines where this hashing takes place. As a rule of thumb, set the hashing to take place on the client.
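
    To illustrate the idea, here is a minimal sketch of hash-based deduplication. It is not Simpana's actual implementation: the SHA-256 hash, the fixed 128 KiB block size, and the names `backup`, `restore`, `ddb` and `disk` are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # assumed fixed block size (128 KiB)


def backup(data, ddb, disk, block_size=BLOCK_SIZE):
    """Store only blocks whose hash is not yet in the DDB.

    ddb  : dict mapping hash -> index of the block in `disk` (hypothetical DDB)
    disk : list of unique blocks already written (hypothetical disk library)
    Returns the list of block references that make up this backup.
    """
    refs = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        signature = hashlib.sha256(block).hexdigest()  # hash algorithm is an assumption
        index = ddb.get(signature)
        if index is None:
            # Unseen block: write it once and record its hash.
            disk.append(block)
            index = len(disk) - 1
            ddb[signature] = index
        refs.append(index)
    return refs


def restore(refs, disk):
    """Rebuild the data from block references; the DDB is not consulted."""
    return b"".join(disk[i] for i in refs)


ddb, disk = {}, []
data = b"A" * (3 * BLOCK_SIZE) + b"B" * BLOCK_SIZE
refs = backup(data, ddb, disk)
print(refs, len(disk))
```

    In this sketch, whichever machine runs `backup` does the hashing, which is what the client versus media agent setting controls. Every block also costs one lookup in `ddb`, which is why the speed of the DDB disk matters.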

    If you are experiencing slow backups then I would recommend looking at the location of the DDB as the source of the issue. By default the DDB (deduplication database) requires 10 times the I/O of a normal backup. The DDB should be placed on its own 3 to 5 disk raid 0 single lun configurations.



  • Re: Deduplication Simpana 8
    Posted: 07-06-2010, 2:14 PM

    How large does a dedup database normally get? I'm in the process of setting it up within my company. Right now I'm planning on using 5 fast 450GB drives (RAID0). My other question is what is considered "best practice" for backing up the dedup database?

  • Re: Deduplication Simpana 8
    Posted: 07-06-2010, 2:46 PM


    There is no limit to the DDB size. What I typically tell everyone is to seal the DDB after 64 TB of deduplicated data (raw data). This equates to 320 TB of application data at a 128k block size. A DDB of this size is about 200 GB on disk. In your case the DDB size will be determined by your application data and retention.
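
    The figures above can be put together as a rough back-of-the-envelope check. This is only a sketch: it assumes binary units (1 TB = 1024^4 bytes) and one DDB record per unique 128k block, neither of which is stated in this thread.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

deduplicated_data = 64 * TIB    # data actually written to disk before sealing
application_data = 320 * TIB    # data as seen by the clients
block_size = 128 * KIB
ddb_on_disk = 200 * GIB

# Assumption: one DDB record per unique block written to disk.
unique_blocks = deduplicated_data // block_size
dedup_ratio = application_data / deduplicated_data
bytes_per_record = ddb_on_disk / unique_blocks

print(f"unique blocks   : {unique_blocks:,}")
print(f"dedup ratio     : {dedup_ratio:.0f}:1")
print(f"bytes per record: {bytes_per_record:.0f}")
```

    Under these assumptions the numbers imply a 5:1 reduction and a few hundred bytes of DDB per unique block, so a smaller environment would scale down roughly in proportion.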

    In 8.0 CV does not support backing up the DDB. The reason for this is that the DDB is closely tied to the CommServe DB. Restoring a DDB can cause issues where the DDB and CSDB are out of sync. This can lead to data loss. Besides, in 7.0 and 8.0 the DDB is not required for restore. Any data that has been written to disk is still restorable if the DDB is lost. The only issue that occurs from the DDB being lost is a rebaseline. If you are concerned about losing the DDB then I would recommend RAID 1+0.

    I would recommend reviewing BOL for Deduplication prior to deploying deduplication in your environment. Better yet, see if you can get into one of the training classes.



  • Re: Deduplication Simpana 8
    Posted: 07-06-2010, 6:16 PM

    Thanks for the reply. Your information is very helpful.

    We have 3 sites (MA, AZ, TX) that we will be using deduplication with. The current plan is to house the DDBs in TX. Can one server handle one local and 2 remote sites, or should I have an individual server for each site? MA has roughly 40 clients. TX has 60. AZ is small with only 10. I don't believe the total data size (deduped) will be greater than 100TB.

    I saw that a company called OCZ has created a 512GB SSD that utilizes a PCI-Express slot. They advertise 1GB per second write, 1.3GB per second read, and 550MB sustained.

    Does anyone have any experience with using SSDs for their dedup database? With these specs, I'm almost willing to bet it could run on a Media Agent that is also doing backup work.

  • Re: Deduplication Simpana 8
    Posted: 07-07-2010, 3:05 AM


    Thanks for the information.

    "The DDB should be placed on its own 3 to 5 disk raid 0 single lun configurations."

    I believe you mean RAID 5, correct?

    The DDB is located on the Media Agent (on its own partition in a RAID 0), and the hashing of the subclients is configured on either the client or the media agent, depending on the subclient.





  • Re: Deduplication Simpana 8
    Posted: 07-07-2010, 8:15 AM
    Azzuri, the DDB performs better on a RAID0 LUN. There is no need to create a parity disk-based RAID set for the DDB. As the DDB is self-sealing in the event of a disk failure, there is no loss of consistency for restores from deduplicated backups. It's recommended, therefore, to use the RAID0 configuration for the deduplication database in order to gain the benefit of the faster R/W performance.
  • Re: Deduplication Simpana 8
    Posted: 07-07-2010, 8:28 AM

    Thanks to all for your help. I'll try with a faster RAID.

  • Re: Deduplication Simpana 8
    Posted: 07-07-2010, 10:07 AM


    From a performance standpoint it would be better to host the DDBs at the remote sites. Otherwise the MA will have to traverse the WAN in order to check each hash against the DC DDB. Depending on the WAN link, this can slow down the backups.
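
    A rough way to see the effect of the WAN is to work out a worst-case ceiling. The sketch below assumes one hash lookup per 128k block, done one at a time with no batching or caching. That is a deliberate simplification, not a description of how Simpana actually talks to the DDB, and the round-trip times are made-up examples.

```python
MIB = 1024 * 1024
BLOCK_SIZE = 128 * 1024  # assumed block size in bytes


def serial_lookup_ceiling(block_size_bytes, rtt_seconds):
    """Upper bound on bytes/second when every block waits for one
    round trip to the DDB before the next block is processed."""
    return block_size_bytes / rtt_seconds


# Hypothetical round-trip times: local LAN vs. a remote site over the WAN.
for label, rtt in (("LAN, 0.5 ms", 0.0005), ("WAN, 20 ms", 0.020)):
    ceiling = serial_lookup_ceiling(BLOCK_SIZE, rtt)
    print(f"{label}: about {ceiling / MIB:.2f} MiB/s per stream")
```

    Real throughput depends on how lookups are pipelined, but the ratio between the two cases shows why a DDB local to each site is the safer choice on a slow link.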

    I have tested a high-end SSD appliance as the DDB master and found the performance is slightly better than spindles. If you are thinking about using an SSD for the DDB then I would recommend running the SIDB Simulation utility to see how well the DDB will scale.

    Also, just to clarify, the DDB can also reside on the Data Mover MA. The DDB drive will need to meet CV requirements and best practice, but we see this configuration often. The biggest issue is server resources. A good example of a Data Mover and DDB manager server is the Dell R710.


  • Re: Deduplication Simpana 8
    Posted: 07-07-2010, 10:54 AM

    Hello AaronA, could you post some results of those tests (also showing the test commands used)? Thanks!

  • Re: Deduplication Simpana 8
    Posted: 07-07-2010, 11:25 AM
    I don’t have the exact numbers readily available. I followed the BOL recommendation for evaluating disk performance.
  • Re: Deduplication Simpana 8
    Posted: 07-08-2010, 2:12 AM

    Hi Aaron,

    The MA is a Dell R510, 8GB Mem, 4 CPUs, 2x 350GB HDUs mirrored, 2 partitions C:\ and D:\.

    The DDB resides on the D:\ partition.

    My idea is to upgrade with 3 or 4 new HDUs configured as RAID0 for the DDB and also to upgrade the memory.

    The connection between the backup environment and the clients is a 2GB FO one and it is not really busy...




  • Re: Deduplication Simpana 8
    Posted: 07-08-2010, 9:02 AM


    The issue with this configuration is that all operations for the server are happening on two spindles.

    The best configuration is the following.

    O/S (Windows 2003 or 2008 x64) on 2 drive RAID 1

    Page File on 2 drive RAID 0 (MS recommendation)

    DDB on 3 to 5 drive RAID 0

    From the latest specs on the R510 it looks like you can achieve this configuration.

    Drive Bays

    Cabled or hot-swap options available:

    Up to four cabled 3.5" SAS or SATA drives in 4 hard drive chassis

    Up to eight hot-swap 2.5"/3.5" SAS, SATA or SSD drives in 8 hard drive chassis

    Up to 12 hot-swap 2.5"/3.5" SAS, SATA or SSD drives in 12 hard drive chassis with 2 additional 2.5" internal cabled hard drives
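
    As a quick check of the recommended layout against these chassis options, the drive counts can be added up. This is only a sketch: the names `LAYOUT`, `CHASSIS_BAYS` and `largest_ddb_set` are made up for the example, and the 2 extra internal drives of the 12-bay chassis are ignored.

```python
# Drives per RAID set as (min, max), taken from the layout recommended above.
LAYOUT = {
    "os_raid1": (2, 2),
    "pagefile_raid0": (2, 2),
    "ddb_raid0": (3, 5),
}

# Bay counts from the quoted R510 chassis options.
CHASSIS_BAYS = {"4-bay": 4, "8-bay": 8, "12-bay": 12}


def drive_range(layout):
    """Return (fewest, most) drives needed for the whole layout."""
    fewest = sum(low for low, _ in layout.values())
    most = sum(high for _, high in layout.values())
    return fewest, most


def largest_ddb_set(bays, layout):
    """Largest DDB RAID 0 set that fits after OS and page file, or None."""
    fixed = sum(low for name, (low, _) in layout.items() if name != "ddb_raid0")
    low, high = layout["ddb_raid0"]
    spare = bays - fixed
    if spare < low:
        return None
    return min(spare, high)


print("drives needed:", drive_range(LAYOUT))
for name, bays in CHASSIS_BAYS.items():
    print(name, "-> largest DDB set:", largest_ddb_set(bays, LAYOUT))
```

    On these counts the 4-bay chassis cannot hold the layout, the 8-bay chassis leaves room for a DDB set of up to 4 drives, and the 12-bay chassis fits the full 5-drive set.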


    Each RAID set should be on its own port on the RAID controller.

    If this is not possible then maybe create a 3 to 5 drive RAID 0 with a single LUN on the SAN for the DDB.

    For the memory I would recommend 32 GB.


  • Re: Deduplication Simpana 8
    Posted: 07-10-2010, 6:16 AM

    We do around 25TB (perhaps 30ish TB in a cycle) in a single full backup, and recently put in deduplication with Simpana 8. We've got a dedicated DDB Manager, and two data mover MAs. The DDB Manager is an R510 with X5660 (6 core CPUs). The R510 is great because it has support for a large number of spindles, and the H700 RAID controller is quite good.

    We've got a few DDBs. Two are for primary storage policies, and each has around half the total backup volume (around 12.5TB in each full backup). Each of those DDBs is on a RAID10 set, with 300GB SAS2 6Gb/s 15k disks. We also have a smaller DDB used for archiver data. This gets a LOT less data, so it sits on a RAID1 set. The operating system and page file also get their own RAID1 set.

    We see the performance of this is way above anything we even approach needing. The disk queues never exceed 0.01. The CPU doesn't get touched, nor does the RAM. That said, I'd not reduce the performance of our disk in this system. I'd happily get a system with slightly less CPU and RAM though.
