
MapInfo Professional support for UTF-8

UPDATED: April 21, 2017


MapInfo Professional already has some support for UTF-8 files, but that support does not extend throughout the software. The Creating New Tables section of MapInfo Professional Help states the following:
Opening DBF and Shape Files Created with UTF-8 Encoding
MapInfo Professional includes read-only support for DBF files with data stored in UTF-8 encoding. Many data vendors distribute data in Shapefile format, which includes DBF files, and the attribute data may be in the UTF-8 character set.

To open a DBF or Shape file with UTF-8 encoding:
  • From the File menu, select Open. The Open dialog displays.
  • From the Files of type list, select dBASE DBF (*.dbf) or Shape, as appropriate.
  • From the Look in list, select the location of the file to open.
  • Select Open. An information dialog displays.
  • From the File Character Set list, select UTF-8.
  • Select OK.
  • The file opens in a Browser or Map window and is marked as read-only.

Display Results for Tables with Data Stored in UTF-8 Encoding:
UTF-8 can represent every character in the Unicode system. However, MapInfo Professional only represents characters in the machine's character set, which Microsoft Windows calls a "codepage".
A codepage represents all the characters of one or more languages. MapInfo Professional can read your UTF-8 data completely as long as every character has a representation in that character set. If the data contains a character that is not in your character set, MapInfo Professional displays it as an underscore ( _ ). The following table shows some examples:

System Character Set | Data Stored in UTF-8 | Result
Windows Latin1 | Windows Latin1 | All characters display. This includes all Western European languages.
Windows Latin2 | Windows Latin2 | All characters display. This includes all Eastern European languages that use Latin characters.
Windows Japanese | Windows Japanese | All characters display.
Windows Cyrillic | Windows Cyrillic | All characters display.
Windows Latin1 | Windows Latin2 | Many Latin2 characters also exist in Latin1. Characters unique to Latin2 display as underscores.
Windows Latin1 or Windows Latin2 | Cyrillic | ASCII characters display. All other characters display as underscores.
Chinese | Japanese | ASCII characters display. All other characters display as underscores.
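
You can imitate the underscore substitution described above outside MapInfo Professional to preview how a UTF-8 dataset will look on a given system. The following Python sketch is an approximation under stated assumptions. It does not reproduce MapInfo's own conversion logic. It assumes that the Windows character sets correspond to Python's cp1252 (Latin1), cp1250 (Latin2) and cp1251 (Cyrillic) codecs. The error-handler name mapinfo_underscore is hypothetical.

```python
import codecs


def _underscore_handler(exc):
    # Replace every character in the unencodable run with "_" and resume after it.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    return ("_" * (exc.end - exc.start), exc.end)


# Hypothetical handler name; Python's built-in "replace" would use "?" instead.
codecs.register_error("mapinfo_underscore", _underscore_handler)


def display_in_codepage(utf8_bytes: bytes, codepage: str) -> str:
    """Approximate how UTF-8 text appears when limited to one Windows codepage.

    Assumption: `codepage` is a Python codec name such as "cp1252" (Latin1),
    "cp1250" (Latin2) or "cp1251" (Cyrillic).
    """
    text = utf8_bytes.decode("utf-8")
    # Characters missing from the codepage become underscores; the rest survive.
    return text.encode(codepage, errors="mapinfo_underscore").decode(codepage)
```

For instance, the Polish place name Łódź previewed under cp1252 comes out as _ód_, while under cp1250 all four characters survive. This matches the Latin1/Latin2 row of the table.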

Data stored in UTF-8 that is entirely in one character set displays correctly on systems set for that character set; UTF-8 data outside that character set displays as underscores. For example, if a UTF-8 dataset mixes Latin1 and Latin2 characters, the table opens, but which parts display correctly depends on your system's character set.
  • The UTF-8 character set is supported only for DBF format (including shapefiles).
  • Tables that use UTF-8 encoding have a version of 1000.
  • Tables that use UTF-8 character set are read-only.
To edit a table opened from UTF-8, save it as a MapInfo native table (.TAB) by choosing File > Save Copy As. Open the copy and edit it like any other table.

For Pro 15.2 x64, see the MapInfo Pro User Guide: http://reference.mapinfo.com/software/mapinfo_pro/english/15.2/MapInfoProUserGuide.pdf

Using Data Files in Any Language or Character Set

You can work with characters from any language in your data files, so that multi-language tables display properly in maps, browsers, the Info tool, and other locations. MapInfo Pro can open tables, files, or workspaces with Unicode characters in the file name or path name regardless of the locale of MapInfo Pro or which localized version of MapInfo Pro you are running. A system setting called Encode Workspaces and Tab Files enables this feature, and it is on by default.

You would disable Encode Workspaces and Tab Files to share MapInfo tables with versions of MapInfo Pro older than version 15.2, to share data with applications that do not support the UTF-8 character set, or when you use data from only one language. In this case, workspaces and tables are written with the current system character setting (charset).

When enabled, this system setting writes workspaces using the UTF-8 charset. New .tab files, and .tab files being rewritten (for example by Save Copy As, Pack Table, updating the friendly name, or updating metadata), use the UTF-8 encoding. The !charset in the .tab file remains the same; it represents the data in the table, not the charset of the .tab file itself. MapInfo Pro writes a UTF-8 Byte Order Mark (BOM) at the beginning of the file so that other applications recognize the encoding.
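
To check whether a .tab file was written with the UTF-8 BOM, and which charset its !charset line declares for the data, you can inspect the raw bytes. This Python sketch assumes that the header keywords are plain ASCII and that !charset appears on a line of the form "!charset <name>". The function names are hypothetical, and the sample header is illustrative only.

```python
import codecs
from typing import Optional, Tuple


def inspect_tab_header(raw: bytes) -> Tuple[bool, Optional[str]]:
    """Return (has_utf8_bom, declared_charset) for the raw bytes of a .tab file."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    # "utf-8-sig" strips the BOM; without a BOM the file is in the system
    # charset, and latin-1 decodes any byte, which is enough for ASCII keywords.
    text = raw.decode("utf-8-sig") if has_bom else raw.decode("latin-1")
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].lower() == "!charset":
            # The declared charset describes the table data, not the .tab file.
            return has_bom, parts[1].strip().strip('"')
    return has_bom, None


def inspect_tab_file(path: str) -> Tuple[bool, Optional[str]]:
    """Read a .tab file from disk and inspect its header bytes."""
    with open(path, "rb") as tab:
        return inspect_tab_header(tab.read())


# Illustrative header only; real .tab files contain more lines.
sample = codecs.BOM_UTF8 + b"!table\n!version 300\n!charset WindowsLatin1\n"
print(inspect_tab_header(sample))
```

A result of (True, 'WindowsLatin1') would indicate a .tab file that is itself UTF-8 encoded but still describes WindowsLatin1 table data. This is the distinction the paragraph above draws.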

To enable or disable the Encode Workspaces and Tab Files feature:

1. On the PRO tab, click Options, and then click System Settings in the System group to open the System Settings Preferences dialog box.
2. Select the Encode Workspaces and Tab Files check box to enable this feature, or clear the check box to disable it.
3. Click OK.

To specify a particular character set, such as UTF-8 or UTF-16, for your MapInfo tables (*.tab) and MapInfo Interchange files (*.mif, *.mid), see Setting Your Language Preferences.

Note: You can encounter data corruption, due to truncation or conversion, when saving a copy of a database table between Unicode and non-Unicode character sets. When saving non-UTF-8 (non-Unicode) data to UTF-8 (Unicode), there is the potential for data truncation. When saving UTF-8 or UTF-16 (Unicode) data to a non-Unicode character set, there is the potential for conversion issues.
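
Before saving a copy between Unicode and non-Unicode character sets, you can screen the values for both failure modes mentioned in the note. This Python sketch is hypothetical. It takes a Python codec name (for example cp1252 for WindowsLatin1, or utf-8) and a field width counted in bytes. It therefore does not apply to UTF-16 targets, whose widths count characters.

```python
from typing import List, Tuple


def conversion_problems(values: List[str], target: str, width: int) -> List[Tuple[str, str]]:
    """Flag values at risk when saving a copy into the byte-oriented codec `target`.

    Assumptions: `target` is a Python codec name, and `width` is the field width in bytes.
    """
    problems = []
    for value in values:
        try:
            encoded = value.encode(target)
        except UnicodeEncodeError:
            # Unicode -> non-Unicode: some character has no equivalent in the target.
            problems.append((value, "conversion"))
            continue
        if len(encoded) > width:
            # Width counts bytes, so multi-byte UTF-8 characters can overflow it.
            problems.append((value, "truncation"))
    return problems
```

For example, screening ["Zurich", "Zürich"] for utf-8 with a width of 6 flags only "Zürich", because its ü needs two bytes in UTF-8.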

When saving data to the MapInfo Extended TAB format (NativeX format), MapInfo Pro interprets the width of character fields in tables with a UTF-16 character set (charset) as the number of characters, at two bytes (16 bits) per character. It interprets the width of character fields in tables with any other character set (such as WindowsLatin1, Cyrillic, and UTF-8) as the number of bytes. In non-UTF-8 character sets each character usually takes up one byte, although some character sets use from one to four bytes per character. Because UTF-8 is used to store characters from any language, its characters are more likely to require more than one byte. This means that you need to allow for larger field widths to avoid data truncation.
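
You can turn the width rules above into a quick calculation of the smallest field width that avoids truncation. This Python sketch relies on a CODECS mapping from MapInfo charset names to Python codec names. That mapping is a hypothetical helper and is not part of MapInfo Pro. The sketch counts UTF-16 widths in 16-bit units, so a character outside the Basic Multilingual Plane counts as two.

```python
from typing import Iterable

# Hypothetical mapping from MapInfo charset names to Python codec names.
CODECS = {
    "WindowsLatin1": "cp1252",
    "WindowsLatin2": "cp1250",
    "WindowsCyrillic": "cp1251",
    "UTF-8": "utf-8",
}


def field_units(value: str, charset: str) -> int:
    """Width one value occupies in a NativeX character field (assumed rules)."""
    if charset == "UTF-16":
        # Width counts 16-bit characters; code points above U+FFFF need two units.
        return len(value.encode("utf-16-le")) // 2
    # Every other charset: width counts bytes.
    return len(value.encode(CODECS[charset]))


def required_width(values: Iterable[str], charset: str) -> int:
    """Smallest width that holds every value without truncation (values must be non-empty)."""
    return max(field_units(v, charset) for v in values)


def truncate_utf8(value: str, width: int) -> str:
    """Cut a value to `width` bytes of UTF-8 without leaving half a character behind."""
    # Decoding with errors="ignore" drops an incomplete multi-byte sequence at the end.
    return value.encode("utf-8")[:width].decode("utf-8", errors="ignore")
```

For example, "Zürich" needs 6 bytes in WindowsLatin1 but 7 in UTF-8, and "Москва" needs 6 units in UTF-16 but 12 bytes in UTF-8.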

Using the UTF-16 character set is the best way to ensure that all data is preserved, but it results in larger file sizes. The UTF-8 character set can encode all characters faithfully, but truncation could occur. When you save a copy of a table from a non-UTF-8 character set to UTF-8, increase the field width to avoid truncation.
