Setting codepages in shapefiles to display in ArcMap 10 – A saga

I spent a good few hours last week trying to get ArcMap 10 to display a shapefile containing Greek data. Not an easy job. The shapefile was OpenStreetMap data including all place names in Greece, downloaded from Cloudmade. When I opened the attribute table in ArcMap it looked something like this:


On the other hand, the shapefile displayed fine in QGIS, provided you opened it with the Encoding set to UTF-8. So I set out to find a way of specifying the encoding in ArcMap. To skip right to the bottom line, I can tell you now that nothing I tried using ESRI-only tools and suggestions worked. But during my quest I found some useful bits of information which may prove helpful for someone in the future. So this is my story:

I first made sure that my Regional Settings were set to Greek/Greece and that the same was true for the Language for Non-Unicode programs, just in case (this used to work for ArcView 3.x; obviously ArcMap IS a Unicode program):


Then I tried adding the *.cpg file. This is basically a single-line text file with the same name as the shapefile which stores codepage information for the attribute data (if the shapefile does not carry this information). Tried various codepage codes. It didn’t work.
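For anyone who wants to script this step rather than use a text editor, here is a minimal sketch in Python. The function name write_cpg and the file name places.shp are made up for illustration, and the identifier strings ("UTF-8", "1253" and so on) are only examples of what one might try. As described in this post, writing the file does not guarantee that ArcMap will honour it.

```python
from pathlib import Path


def write_cpg(shapefile_path, encoding_name="UTF-8"):
    """Write a sidecar .cpg file next to the shapefile and return its path.

    `write_cpg` is a hypothetical helper; `encoding_name` is whatever
    code page identifier you want to try.
    """
    cpg_path = Path(shapefile_path).with_suffix(".cpg")
    # The .cpg file holds nothing but the identifier, as plain text.
    cpg_path.write_text(encoding_name, encoding="ascii")
    return cpg_path
```

Calling write_cpg("places.shp", "UTF-8") would create places.cpg in the same folder as the other shapefile parts.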

I then moved on to add some registry values according to this little gem: HowTo: Read and write shapefile and dBASE files encoded in various code pages. Nope.

Getting desperate, I then tried the 30th byte way from ESRI, which tells you how to determine whether a shapefile has a code page or not (apparently this information is stored in the 30th byte of the .dbf header):

To find the 30th byte, you count the sets of characters in the center. In this example, you start with 03, which counts as 1. Count over 30, counting only the character sets shown in blue in this example. If the set is 00, the code page is not set. The 30th character in this example is 0E. Therefore, the code page is set.

0B8D:0100 03 64 02 07 01 00 00 00-A1 00 41 00 00 00 00 00 .d........A.....
1489:0110 00 00 00 00 00 00 00 00-00 00 00 00 00 0E 00 00 ................

If the code page is not set in the .dbf header, you can create a code page file (.cpg) to store the code page. To create a code page file, you use a text editor, such as vi or Notepad, add the code page identifier for the shapefile to the file, and save it with a .cpg extension in the same location as the other files that make up the shapefile. You have to be sure you know the encoding used for the shapefile so you place the correct code page in the .cpg file.

If for some reason you have a .cpg file and the code page is set in the .dbf, the information in the .dbf header takes priority when importing a shapefile. If no code page is set in the .dbf, the code page is read from the .cpg file. If the code page is not set in the .dbf and no .cpg file is present, the code page of the current locale of the operating system from which shp2sde is being run (the server where ArcSDE is installed) will be used.
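The byte check and the priority order quoted above can be sketched in a few lines of Python. This is only an illustration of the rules as ESRI states them for shp2sde, not of what ArcMap does internally; read_ldid and code_page_source are hypothetical names, and the sketch assumes the 30th byte sits at offset 29 when counting from zero.

```python
from pathlib import Path

LDID_OFFSET = 29  # the "30th byte" when counting from 1


def read_ldid(dbf_path):
    """Return the value of the 30th byte of a .dbf header (0 means not set)."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file is too short to contain a .dbf header")
    return header[LDID_OFFSET]


def code_page_source(shapefile_path):
    """Say where the code page would come from, following the quoted order."""
    base = Path(shapefile_path)
    # 1. A non-zero byte in the .dbf header takes priority.
    if read_ldid(base.with_suffix(".dbf")) != 0:
        return "dbf header"
    # 2. Otherwise the .cpg file is used, if there is one.
    if base.with_suffix(".cpg").exists():
        return "cpg file"
    # 3. Otherwise the operating system locale applies.
    return "operating system locale"
```

Run against the header bytes from the example above, read_ldid would return 0x0E, so code_page_source would report the .dbf header as the source.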

So I opened the .dbf in a hex editor. It did have a value in the 30th byte. I set it to 00. Re-opened the shapefile in ArcMap. Same rubbish characters.
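The same hex-editor surgery can be done with a short script. A minimal sketch, assuming the byte lives at offset 29 (counting from zero); clear_ldid is a made-up name, and the script keeps a .bak copy because it overwrites the file in place. As noted, zeroing the byte did not fix the display in my case.

```python
import shutil


def clear_ldid(dbf_path, backup=True):
    """Set the 30th byte of a .dbf header to 00 and return the old value."""
    if backup:
        # Keep a copy of the untouched file next to the original.
        shutil.copy2(dbf_path, str(dbf_path) + ".bak")
    with open(dbf_path, "r+b") as f:
        f.seek(29)  # offset 29 is the 30th byte
        old = f.read(1)
        if len(old) != 1:
            raise ValueError("file is too short to contain a .dbf header")
        f.seek(29)
        f.write(b"\x00")
    return old[0]
```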

By that time I had had enough. I opened the file in QGIS (with a UTF-8 encoding) and then saved it under a new name with a CP-1253 encoding. And of course it worked. Which was what I had thought to try at the beginning, but my masochistic self wanted to find a solution without using any 3rd party tools.
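To see why re-saving helps, it is enough to look at what happens to a single place name. The sketch below is not how QGIS does it internally and it ignores the .dbf record layout entirely; it only shows, under the assumption that ArcMap was reading the UTF-8 bytes as a single-byte Greek code page, how the text gets garbled and how re-encoding repairs it. The helper name reencode is hypothetical.

```python
def reencode(raw, source="utf-8", target="cp1253"):
    """Decode bytes with the source encoding and re-encode them for the target."""
    return raw.decode(source).encode(target)


# "Athens" in Greek, written with escapes so the script itself stays ASCII.
name = "\u0391\u03b8\u03ae\u03bd\u03b1"

# What the OpenStreetMap shapefile stores: two bytes per Greek letter.
utf8_bytes = name.encode("utf-8")

# What you see if those bytes are read as CP-1253: one wrong character per byte.
misread = utf8_bytes.decode("cp1253", errors="replace")

# What the re-saved file stores: one byte per Greek letter.
fixed = reencode(utf8_bytes)

print(len(utf8_bytes), len(fixed))
print(fixed.decode("cp1253") == name)
```

Because the CP-1253 version is shorter than the UTF-8 one, the converted text still fits in the fixed-width attribute fields.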

I mean, admire the simplicity of it all:

Step 1. Add the shapefile and set encoding


Step 2. Save it under a new name


Now, how difficult would it be for ESRI to offer an encoding option when opening a shapefile?

If you think I missed something and there is indeed a much simpler way (using ESRI tools) to view the attribute data in ArcMap, I would be more than happy to learn about it!