VERIFIED SOLUTION

How to use indexcheck -copy to validate Vault indexes

I think that my indexes are corrupted. Is there a faster way than a full reindex to confirm that my indexes are valid? I need to make sure that what has been ingested matches the indexes.


You can use the indexcheck -copy method to validate the indexes. This will ensure that everything in the indexes matches what's in pagedata/docdata.
 
For example, assuming the following indexes:
 
tipoletranro.dri
account.drr
invlink.dri
guid.dri
account.dri
 
Make sure that you have a working backup of the indexes before attempting this, just in case something goes wrong. 
 
0) Make sure no files are currently being loaded into Vault 
1) stop e2loaderd so that no new files can be loaded while this is happening 
2) From the vault/server directory, execute this command: 
 
indexcheck invlink.dri -copy:new.invlink.dri -validfile -noprint 
 
 
(Using -noprint may make the command run a little faster, but you won't see results on the screen while it runs. However, the new.invlink.dri file should grow as the operation progresses.) 
 
(repeat this process for each of the other DRI files.  Note that account.drr is not an index per se, but a custom record table.  You can ignore it for this procedure)
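Step 2 can be scripted as a loop over the DRI files. The sketch below is a dry run: it only prints the command for each index (drop the `echo` to execute for real). The file list matches the example indexes above; account.drr is excluded because it is a custom record table, not an index.

```shell
# Dry run of step 2: print one indexcheck -copy command per DRI file.
# Remove the "echo" to actually run the copies (one at a time is safest).
cmds=$(
  for idx in tipoletranro.dri invlink.dri guid.dri account.dri; do
    echo indexcheck "$idx" -copy:"new.$idx" -validfile -noprint
  done
)
printf '%s\n' "$cmds"
```

Run this from the vault/server directory so indexcheck can find the index files.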
 
3) Once step 2) has completed, stop the rest of the Vault services 
4) rename invlink.dri to old.invlink.dri 
5) rename new.invlink.dri to invlink.dri 
6) start the Vault services (except for e2loaderd) 
7) test to make sure you can see the documents that were missing before 
8) if it all looks good, you can restart e2loaderd and resume loading documents again. 
9) if there is any problem, you can back this out by stopping services, renaming the files back from step 4) and 5) and restarting services 
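Steps 4-5 (and the step 9 backout) are plain file renames. The sketch below demonstrates the swap on stand-in files in a scratch directory; the setup lines are for illustration only, and in a real run you would operate on the actual DRI files with all Vault services stopped.

```shell
# Demo setup (illustration only): stand-in index files in a scratch dir.
cd "$(mktemp -d)"
printf 'old index data\n' > invlink.dri
printf 'rebuilt index data\n' > new.invlink.dri

# Steps 4-5: swap the rebuilt index into place (Vault services stopped).
mv invlink.dri old.invlink.dri
mv new.invlink.dri invlink.dri

# Step 9 backout would simply reverse the renames:
#   mv invlink.dri new.invlink.dri
#   mv old.invlink.dri invlink.dri
```

Keeping old.invlink.dri around until step 7's checks pass is what makes the backout in step 9 possible.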


Note that you will see a size difference between the old and new DRI files.  For example:
 
30/10/2013  03:07 AM    10.970.923.008 invlink.dri
27/10/2013  03:00 AM    12.778.192.896 old.invlink.dri
30/10/2013  03:07 AM     5.798.584.320 nrofactura.dri
30/10/2013  10:53 AM     3.157.999.616 nrofactura.new.dri
30/10/2013  03:07 AM     6.066.307.072 tipoletranro.dri
30/10/2013  11:55 AM     3.249.557.504 tipoletranro.new.dri
 
 
One factor is the packing of nodes by -copy for standard indexes. It uses a higher split point on inserts because it knows the keys arrive in (nearly always) ascending order during the copy. That is, instead of nodes splitting at 50% they split at 90%, which means the nodes will typically be 90% full after they split rather than 50%. Since new keys are added past each node, it doesn't split any further and remains at 90%. That ultimately means the same data is packed more tightly in the target index.
 
Once live data is added to the new index these nodes may once again start splitting (now at 50%) and increasing the actual file size. If the index shows growth only in certain areas (e.g. invoice numbers over time are always increasing), the dormant areas will remain packed.
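As a rough back-of-envelope check (the 50% and 90% fill factors come from the explanation above; the arithmetic is only an approximation), a freshly copied index should be around 50/90, or about 56%, of the old file's size:

```shell
# Approximate expected shrink from -copy: nodes ~90% full after the copy
# versus ~50% full after ordinary splits in the old index.
awk 'BEGIN {
  old_fill = 0.50; new_fill = 0.90      # assumed average node fill factors
  printf "expected size ratio: %.0f%%\n", 100 * old_fill / new_fill
}'
```

That is in line with the listing above, where tipoletranro.new.dri is roughly 54% of the size of tipoletranro.dri.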
 
Another factor that can come into play: if bulk data was purged from the old index at some point, some nodes may contain considerable empty space. That space is not copied, so the target index is smaller as a result.
 
For correctness you'd need to look at the keys and key counts more than anything; the file sizes will vary.
UPDATED:  September 18, 2017