
Accented characters do not show up when rendering text in Vault

Issue

An issue was reported when retrieving a document in text format from Vault through the web service, serviceweb2 (SW2), or through the Java API. Accented characters (áéíóúÁÉÍÓÚÜüñÑ…) were replaced by a ? in the text file.

The document displays correctly in the web view and the PDF view. The accented characters are lost only when it is rendered as text.

Cause

SW2 requests the text to be rendered but does not specify the encoding.

The text comes back as Windows-1252, but the HTTP reply does not specify an encoding, so it is likely being interpreted as UTF-8, the encoding of the wrapping page.

When the text is accessed directly, IE autodetects it as Western European (Windows), which is Windows-1252.
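
The mismatch described above can be reproduced outside Vault. The following is a minimal sketch, not Vault code: it encodes a sample string as Windows-1252 and then decodes the same bytes once as Windows-1252 and once as UTF-8. The class and method names are hypothetical. In the UTF-8 case the accented letters decode to the Unicode replacement character, which many viewers and converters show as a ?.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {
    // Charset the article says the text comes back in when no encoding is requested.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    /** Encodes the text as Windows-1252, then decodes those bytes with the given charset. */
    static String roundTrip(String text, Charset readAs) {
        byte[] bytes = text.getBytes(WINDOWS_1252);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the sample independent of the source file encoding.
        String original = "\u00e1\u00e9\u00ed\u00f3\u00fa \u00f1\u00d1"; // áéíóú ñÑ

        // Same charset on both sides: the text survives.
        System.out.println(roundTrip(original, WINDOWS_1252).equals(original));

        // Windows-1252 bytes read as UTF-8: the accented bytes are malformed UTF-8
        // and are replaced with U+FFFD.
        String misread = roundTrip(original, StandardCharsets.UTF_8);
        System.out.println(misread.indexOf('\uFFFD') >= 0);
    }
}
```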

Using 7.3.0.20, the text comes back with the header:

text/plain; charset=UTF-8

This renders correctly. Switching the page encoding in the browser back and forth shows the difference.

In the newer version, the request from SW2 provides the encoding parameter explicitly (textencoding [UTF-8]), while the older version omits it:

Using SW2 7.3.0.20:

09:28:54 127.0.0.1:61015 {44} <render1> render.transform request, database [portafolio], offset [0000005000000000], account [2017040531516830000001], file [20170523121928-portafoliovled20170405-afp-31231f785ab911f482cb6f1f84fbb480], output [9], page [1], date [20170405], background [1], pagecount [2], textencoding [UTF-8]

Using SW2 7.0M2p0039:

09:31:31 127.0.0.1:61057 {71} <render1> render.transform request, database [portafolio], offset [0000005000000000], account [2017040531516830000001], file [20170523121928-portafoliovled20170405-afp-31231f785ab911f482cb6f1f84fbb480], output [9], page [1], date [20170405], background [1], pagecount [2]
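
To check which behavior a given installation shows, the render.transform request lines in the log can be scanned for the textencoding parameter. The sketch below is one hypothetical way to do that in Java. The class name and the regular expression are assumptions based only on the two log lines quoted above, and the sample lines in main are shortened copies of them.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RenderLogCheck {
    // Matches "textencoding [VALUE]" as it appears in the quoted log lines.
    private static final Pattern TEXT_ENCODING =
            Pattern.compile("textencoding \\[([^\\]]*)\\]");

    /** Returns the requested text encoding, or empty if the line does not carry one. */
    static Optional<String> textEncoding(String logLine) {
        if (!logLine.contains("render.transform request")) {
            return Optional.empty(); // not a render request line
        }
        Matcher m = TEXT_ENCODING.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened copies of the log lines quoted in this article.
        String newer = "09:28:54 127.0.0.1:61015 {44} <render1> render.transform request, "
                + "database [portafolio], output [9], page [1], pagecount [2], textencoding [UTF-8]";
        String older = "09:31:31 127.0.0.1:61057 {71} <render1> render.transform request, "
                + "database [portafolio], output [9], page [1], pagecount [2]";

        System.out.println(textEncoding(newer).orElse("(no encoding requested)"));
        System.out.println(textEncoding(older).orElse("(no encoding requested)"));
    }
}
```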

Resolution

UPDATED: July 31, 2017


The solution for this issue is to upgrade to Vault 7.3 or higher. In these versions the encoding is explicitly set to UTF-8, which resolves the issue.

The fix is in all 7.3 builds of SW2 but in no 6.1 builds.

This also implies that the Java API supports it, since SW2 uses the Java API to do the conversion.

The basic VaultClient now appears to default to UTF-8 for text mode, although the encoding can be set explicitly.

Custom code that performs this function may need to set the charset in the Content-Type header when it returns the text.

VaultClient vc = new VaultClient(); // assumes the no-argument constructor

vc.setTextEncoding("UTF-8"); // not actually needed since UTF-8 is the default

vc.connect(host, port); // host and port of the Vault server
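
For custom code that returns the rendered text over HTTP, the point about the Content-Type charset can be sketched as follows. This is an illustration only and does not use the Vault API: TextResponse is a hypothetical helper that derives the header value and the body bytes from the same charset, so the two cannot disagree.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TextResponse {
    final String contentType; // value for the Content-Type header
    final byte[] body;        // text encoded with the charset named in the header

    private TextResponse(String contentType, byte[] body) {
        this.contentType = contentType;
        this.body = body;
    }

    /** Builds the header value and body from one charset so they always match. */
    static TextResponse of(String text, Charset charset) {
        return new TextResponse(
                "text/plain; charset=" + charset.name(),
                text.getBytes(charset));
    }

    public static void main(String[] args) {
        // "áéíóú" written with escapes so the source file encoding does not matter.
        TextResponse r = TextResponse.of("\u00e1\u00e9\u00ed\u00f3\u00fa", StandardCharsets.UTF_8);
        System.out.println(r.contentType);
        System.out.println(r.body.length + " bytes");
    }
}
```

In a servlet, for example, contentType would be passed to HttpServletResponse.setContentType and body written to the response output stream. How the text is obtained from Vault is outside this sketch.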

Environment Details

Vault 5, 6, 7.0, 7.1, 7.2
