Setting codepages in shapefiles to display in ArcMap 10 – A saga

I spent a good few hours last week trying to get ArcMap 10 to display a shapefile containing Greek data. Not an easy job. The shapefile was OpenStreetMap data including all place names in Greece, downloaded from Cloudmade. When I opened the attribute table in ArcMap it looked something like this:


On the other hand, the shapefile displayed fine in QGIS, provided you opened it with the Encoding set to UTF-8. So I set out to find a way of specifying the encoding in ArcMap. To skip right to the end: nothing I tried using ESRI-only tools and suggestions worked. But during my quest I found some useful bits of information which may prove helpful to someone in the future. So this is my story:

I first made sure that my Regional Settings were set to Greek/Greece and that the same was true for the Language for Non-Unicode programs, just in case. (This used to work for ArcView 3.x; obviously ArcMap IS a Unicode program.)


Then I tried adding the *.cpg file. This is basically a single-line text file with the same name as the shapefile, which stores code page information for the attribute data (if the shapefile does not carry this information itself). I tried various code page identifiers. None of them worked.
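For reference, creating such a file can be sketched in a few lines of Python. The helper name `write_cpg`, the file name `places.shp` and the identifier `UTF-8` are assumptions made for illustration; which identifiers a given ArcMap version honours is something you would have to test.

```python
from pathlib import Path


def write_cpg(shapefile_path, codepage):
    """Write a one-line .cpg file next to the given .shp and return its path."""
    cpg_path = Path(shapefile_path).with_suffix(".cpg")
    # The sidecar holds just the code page identifier as plain text.
    cpg_path.write_text(codepage, encoding="ascii")
    return cpg_path
```

Calling `write_cpg("places.shp", "UTF-8")` would create `places.cpg` alongside the other shapefile components.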

I then moved on to adding some registry values according to this little gem: HowTo: Read and write shapefile and dBASE files encoded in various code pages. Nope.

Getting desperate, I then tried the 30th byte way from ESRI, which tells you how to determine whether a shapefile has a code page or not (apparently this information is stored in the 30th byte of the .dbf header):

To find the 30th byte, you count the sets of characters in the center. In this example, you start with 03, which counts as 1. Count over 30, counting only the character sets shown in blue in this example. If the set is 00, the code page is not set. The 30th character in this example is 0E. Therefore, the code page is set.

0B8D:0100 03 64 02 07 01 00 00 00-A1 00 41 00 00 00 00 00 .d……..A…..
1489:0110 00 00 00 00 00 00 00 00-00 00 00 00 00 0E 00 00 …………….

If the code page is not set in the .dbf header, you can create a code page file (.cpg) to store the code page. To create a code page file, you use a text editor, such as vi or Notepad, add the code page identifier for the shapefile to the file, and save it with a .cpg extension in the same location as the other files that make up the shapefile. You have to be sure you know the encoding used for the shapefile so you place the correct code page in the .cpg file.

If for some reason you have a .cpg file and the code page is set in the .dbf, the information in the .dbf header takes priority when importing a shapefile. If no code page is set in the .dbf, the code page is read from the .cpg file. If the code page is not set in the .dbf and no .cpg file is present, the code page of the current locale of the operating system from which shp2sde is being run (the server where ArcSDE is installed) will be used.
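The priority described in that passage can be sketched in Python. This is only an illustration of the quoted rules (which ESRI states for shp2sde, not necessarily for ArcMap); the function name `resolve_codepage_source` is my own, and the sketch reports the raw header byte rather than translating it into a code page name.

```python
from pathlib import Path


def resolve_codepage_source(dbf_path):
    """Report where the code page would come from, following the quoted priority."""
    dbf_path = Path(dbf_path)
    with open(dbf_path, "rb") as f:
        header = f.read(32)  # the fixed part of a dBASE header is 32 bytes
    if len(header) < 30:
        raise ValueError("file too short to be a .dbf")
    code_byte = header[29]  # the 30th byte, i.e. zero-based offset 29
    if code_byte != 0:
        # 1. A value in the .dbf header takes priority.
        return ("dbf", f"0x{code_byte:02X}")
    cpg_path = dbf_path.with_suffix(".cpg")
    if cpg_path.exists():
        # 2. Otherwise the .cpg file is used.
        text = cpg_path.read_text(encoding="ascii", errors="replace")
        return ("cpg", text.strip())
    # 3. Otherwise the locale of the operating system applies.
    return ("locale", None)
```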

So I opened the dbf in a hex editor. It did have a value in the 30th byte. I set it to 00 and re-opened the shapefile in ArcMap. Same rubbish characters.
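If you would rather not do this by hand, the same patch can be sketched in Python. The function name `clear_codepage_byte` is hypothetical, and the sketch works on a copy so the original .dbf stays untouched.

```python
import shutil


def clear_codepage_byte(dbf_path, out_path):
    """Copy a .dbf and set its 30th byte (offset 29) to 00; return the old value."""
    shutil.copyfile(dbf_path, out_path)  # never patch the original in place
    with open(out_path, "r+b") as f:
        f.seek(29)
        old = f.read(1)
        if len(old) != 1:
            raise ValueError("file too short to be a .dbf")
        f.seek(29)
        f.write(b"\x00")
    return old[0]
```

The returned value is the byte that was there before, so you can put it back if needed.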

By that time I had had enough. I opened the file in QGIS (with UTF-8 encoding) and then saved it under a new name with CP-1253 encoding. And of course it worked. Which is what I had thought of trying in the beginning, but my masochistic self wanted to find a solution without using any 3rd party tools.
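To see why re-saving helps, here is a small Python illustration of what happens to a Greek place name. It only shows the text side of things and is not a description of what QGIS does internally; the sample name and the helper `reencode` are my own.

```python
def reencode(raw, src="utf-8", dst="cp1253"):
    """Decode bytes with the source encoding and re-encode them in the target one."""
    return raw.decode(src).encode(dst)


name = "Αθήνα"
utf8_bytes = name.encode("utf-8")  # two bytes per Greek letter
# Reading the UTF-8 bytes as if they were CP-1253 gives the rubbish characters.
garbled = utf8_bytes.decode("cp1253", errors="replace")
fixed = reencode(utf8_bytes)  # one byte per Greek letter

print(garbled)
print(len(utf8_bytes), len(fixed))
```

Once the bytes really are CP-1253, a program that assumes the Greek Windows code page shows them correctly.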

I mean, admire the simplicity of it all:

Step 1. Add the shapefile and set encoding


Step 2. Save it under a new name


Now, how difficult would it be for ESRI to have an encoding option when opening a shapefile?

If you think I missed something and there is indeed a much simpler way (using ESRI tools) to view attribute data in ArcMap I would be more than happy to learn about it!

Shp2ora and Ora2shp: Utilities for importing and exporting shapefiles to Oracle

Yes, another one of those. It's not exactly a novel idea, as there are a few converters around for moving shapefile data to and from Oracle:

  • If you are an ESRI/ArcSDE user you have the sde2shp and shp2sde commands.
  • Oracle provides two options: a command line program, shp2sdo, to load shapefiles into Oracle, and its Java equivalent through Mapbuilder.
  • Most (all?) GIS packages that can read Oracle Spatial data include options to save the Oracle layer to a shapefile.

Plenty of choice, you probably think. Well, I was still not happy. First of all, I didn’t always have ArcSDE on all the machines I was working on. Secondly, installing Mapbuilder means going through the 18.2MB download, and it will only LOAD shapefiles into Oracle, not the other way around. Thirdly, I didn’t want to go through the whole process of downloading a full-blown desktop GIS (say QGIS) just for the sake of converting a spatial table into a shapefile. Not to mention that you have to configure the client GIS to connect to Oracle BEFORE you can even display the layer, and only then can you export it. Too much hassle for a relatively easy task.

So I decided to create my own little command-line utilities to do just that. You can download the full source code and executables from the Box.Net widget on the left side of this post (ora2shp.rar).

After compiling the project, you will find two executables under the Debug folders of the Ora2shp and shape2ora projects: Ora2shp.exe and shp2ora.exe respectively.

Ora2shp syntax is: ora2shp <username/password@dbalias> <spatial_table_name> <PK_col> <shape_col> <shapefile> ["optional_where_clause"]

If you use the where clause, you should NOT include the WHERE keyword itself, only the condition.
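If you call ora2shp from a script, the argument order can be captured in a small Python helper. This is just a sketch: `build_ora2shp_args` is a hypothetical name, and all connection, table and column values in the usage note are made up.

```python
import re


def build_ora2shp_args(connect, table, pk_col, shape_col, shapefile, where=None):
    """Build the ora2shp argument list in the order given in the syntax line."""
    args = ["ora2shp", connect, table, pk_col, shape_col, shapefile]
    if where:
        # The utility expects the condition only, so drop a leading WHERE if present.
        condition = re.sub(r"^\s*where\s+", "", where, flags=re.IGNORECASE)
        args.append(condition)
    return args
```

For example, `build_ora2shp_args("scott/tiger@orcl", "CITIES", "ID", "GEOM", "cities.shp", "WHERE POP > 1000")` yields a list ending in `"POP > 1000"`, which could then be passed to `subprocess.run`.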

Shp2ora syntax is: shp2ora <username/password@dbalias> <spatial_table_name> <shape_col> <shapefile> <srid>

If you don’t specify an SRID it will default to null. If the spatial table already exists, records will get appended. If it doesn’t, the table will get created, along with the Oracle metadata record (USER_SDO_GEOM_METADATA table) and the spatial index.

It should deal with all shapes (multipart, Z, M) APART from multipatch.

Note that you will need Oracle client 10.2 installed to compile the programs as-is. Otherwise change the reference to the Oracle client of your choice. You will know if you have the wrong Oracle client version if you get an error about Oracle.DataAccess not being the same version as Oracle Client.

When running the programs, any errors should appear on the console and the full stack trace will be written to the orashp_err.log file, located in the same folder as the executables.

Note that performance is not great, especially when creating the shapefile. It took a little less than 15 minutes to create a point shapefile of around 66K records (around 5M of shapefiles). But feel free to improve it, and I would very much appreciate it if you got back to me if you do!