Vault Collections vs. Vault PDF input driver considerations

UPDATED: May 5, 2017


If documents loaded through both Collections and PDF ingestion can only be viewed in the external viewer, what is the advantage of using PDF ingestion versus Collections?


Advantages of PDF ingestion:

- multiple documents in one PDF (i.e. PDF as a print job format)

- page ranges

- backgrounds

- can use XML journals

Advantages of Collections:

- any type of PDF

- any PDF feature (e.g. encrypted/signed PDF)

- exact reproduction of the original file, byte for byte

- gathers separate PDF files (e.g. multiple email attachments)

- mix non-PDF data in the same job

If the client is using the Vault APIs to view the Collections-mode PDF, are there any drawbacks (versus PDF ingestion mode)?

Collections page counts are counts of the blocks the raw file is broken up into, not the number of pages in the PDF. That affects page-count licensing: it could inflate or deflate the number of pages per month, depending on which PDFs are loaded.
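To make the inflate/deflate effect concrete, here is a rough Java sketch. It assumes, purely for illustration, that the raw file is split into fixed-size blocks; the 64 KB `BLOCK_SIZE`, the class name and the sample file sizes are hypothetical and not taken from Vault, which may split files differently.

```java
// Hypothetical illustration: compares a real PDF page count with a
// block-based count, ASSUMING fixed-size blocks (BLOCK_SIZE is invented).
public class BlockVsPageCount {

    // Hypothetical block size in bytes; not a documented Vault value.
    static final long BLOCK_SIZE = 64 * 1024;

    // Number of blocks a raw file of the given size splits into (ceiling division).
    static long blockCount(long fileSizeBytes, long blockSize) {
        if (fileSizeBytes <= 0) {
            return 0;
        }
        return (fileSizeBytes + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        // A small text-only PDF: many pages, few bytes -> the count deflates.
        long textPdfBytes = 200L * 1024;
        int textPdfPages = 40;
        // A one-page scanned PDF: one page, many bytes -> the count inflates.
        long scanPdfBytes = 3L * 1024 * 1024;
        int scanPdfPages = 1;

        System.out.println("text PDF: pages=" + textPdfPages
                + " blocks=" + blockCount(textPdfBytes, BLOCK_SIZE));
        System.out.println("scan PDF: pages=" + scanPdfPages
                + " blocks=" + blockCount(scanPdfBytes, BLOCK_SIZE));
    }
}
```

With these made-up numbers the program prints `text PDF: pages=40 blocks=4` and `scan PDF: pages=1 blocks=48`, the kind of swing in either direction that the licensing point describes.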

Collections is unusual in that it builds the drd and drp at the same time rather than in two separate phases. Normally you can rebuild the drd from the drp, but you can't do that directly with Collections; you would have to decompress the collection and reload it.

In terms of API calls, Collections uses its own output mode, so the code may be slightly different.

The type of the file is indicated only by the file extension (in doc.type or …). It doesn't directly provide the MIME type needed, for example, when serving the file on the web. For PDF-only collections that isn't a big concern, but it can be if many different file types are used in Collections and you need to customize the MIME types. Similarly, the encoding of text files is not well defined and would have to be known through some other means.
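Because only the extension is available, an application serving collection members over the web needs its own lookup table. The Java sketch below is one hypothetical way to do it: the class, the table contents and the `contentType` helper are assumptions, not part of the Vault API, and the text charset has to be passed in by the caller since the collection does not record it.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: the collection only exposes a file extension, so the
// application keeps its own extension -> MIME type table.
public class CollectionMimeTypes {

    // Application-maintained table; extend it for the file types you load.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "html", "text/html",
            "htm", "text/html",
            "txt", "text/plain",
            "doc", "application/msword",
            "xls", "application/vnd.ms-excel",
            "zip", "application/zip",
            "rar", "application/vnd.rar");

    // Normalizes ".PDF", "pdf", " .Pdf " etc. to a lowercase key without the dot.
    static String normalize(String type) {
        String t = type.trim().toLowerCase(Locale.ROOT);
        return t.startsWith(".") ? t.substring(1) : t;
    }

    // Returns a Content-Type value. Text types need a charset, which the
    // collection does not record, so the caller must supply it from some
    // other source (e.g. knowledge of how the files were produced).
    static String contentType(String type, String textCharset) {
        String mime = BY_EXTENSION.getOrDefault(normalize(type), "application/octet-stream");
        if (mime.startsWith("text/") && textCharset != null) {
            return mime + "; charset=" + textCharset;
        }
        return mime;
    }

    public static void main(String[] args) {
        System.out.println(contentType(".PDF", null));           // application/pdf
        System.out.println(contentType(".txt", "windows-1252")); // text/plain; charset=windows-1252
        System.out.println(contentType(".xyz", null));           // application/octet-stream
    }
}
```

Unknown extensions fall back to `application/octet-stream`, so the browser downloads the file rather than guessing how to display it.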

From the Vault Java API side:

1. If the document format is PDF:

A. rendering output=2 -> PDF

B. a rendering page range can be set (e.g. 1-100)

2. If the document is a COLLECTION:

A. rendering output=10 -> COLLECTION

B. no rendering page range settings (a page range such as 1-100 has no meaning)

C. use DocumentInfo.getType() to get the COLLECTION sub-type (e.g. .PDF, .DOC, .HTML, ...)

3. If you want to render all pages, there is no difference between COLLECTION and PDF.

4. For MIME types, you can use the Java API's OutputFormat, but it only covers the main document formats (e.g. GIF, PDF, PNG, TEXT, COLLECTION, ...), so you need to build your own MIME type map based on the COLLECTION sub-type (.PDF, .xls, .doc, .zip, .rar, ...).
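The per-format choices above can be summarized as a small decision function. In this Java sketch only the output codes (2 for PDF, 10 for COLLECTION) come from the notes; `RenderChoice`, `Settings` and `choose` are hypothetical names, and the actual Vault Java API rendering call is deliberately left out.

```java
// Hypothetical sketch of the PDF vs. COLLECTION rendering choices.
// Only the output codes come from the notes; all names are invented and
// the real Vault rendering call is not shown.
public class RenderChoice {

    static final int OUTPUT_PDF = 2;
    static final int OUTPUT_COLLECTION = 10;

    // Settings an application might hand on to its real rendering call.
    // A null page bound means "not set"; subType only applies to collections.
    record Settings(int output, Integer firstPage, Integer lastPage, String subType) {}

    // PDF documents honour a requested page range; for collections the range
    // is dropped and the sub-type (file extension) is carried along instead.
    static Settings choose(boolean isCollection, String subType,
                           Integer firstPage, Integer lastPage) {
        if (isCollection) {
            return new Settings(OUTPUT_COLLECTION, null, null, subType);
        }
        return new Settings(OUTPUT_PDF, firstPage, lastPage, null);
    }

    public static void main(String[] args) {
        System.out.println(choose(false, null, 1, 100));
        System.out.println(choose(true, ".doc", 1, 100));
    }
}
```

A collection request simply discards the page range, so `choose(true, ".doc", 1, 100)` yields output 10 with no range and sub-type `.doc`; the sub-type can then feed an application-side MIME table.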

