The Open Global Address Format Project
At its core, this project deals with a very specific and seemingly easy task: the correct formatting of international postal addresses. While working on this subject, I had to learn that formatting an address correctly is no piece of cake, and that in many cases there is no such thing as "the one correct" format.
Many of us have already printed a postal address in at least one of our projects, right? Or maybe there is an address shown in the contact section of your website? When putting it there, did you think about how to lay out the components of the address, such as your name, the street address or the name of the city you live in? Let me guess: you did not. Not really, at least. Are you aware that people in many Asian countries, Japan for example, are used to putting the family name in front of the given name? Did you know that, in many parts of the world, addresses are structured from the most significant component down to the least significant one, beginning with the country and ending with a person’s name, instead of the other way round?
Most of the time we don’t really think about the format of addresses, as we usually deal with addresses targeting regions whose cultural habits are similar to our own. But in fact there are literally hundreds of different formats out there, depending on the country, the language and the script you use, and maybe even the directional context: it can matter whether you send something from within the same country or from abroad. Japan knows at least two different formats: one for mail sent within the country, addressed in Japanese Kanji (漢字) and beginning with the most significant part (the country or the prefecture), and another “international” format, written in Latin script (ローマ字) and beginning with the person’s name, just as we Europeans are used to. However, whereas mail in the Japanese format can be processed automatically, internationally addressed mail has to be sorted by hand. You see, there is a huge difference between these formats, even though both ultimately address the same location.
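To make the difference concrete, here is a minimal Python sketch of the two orderings described above. Everything in it is an assumption made for illustration: the `Address` fields, the template names and the line layouts are hypothetical and simplified, the sample data is made up, and none of it is taken from the project’s database or from any official postal rule.

```python
from dataclasses import dataclass, asdict


@dataclass
class Address:
    """Hypothetical, simplified set of address components."""
    family_name: str
    given_name: str
    street: str
    city: str
    region: str
    postal_code: str
    country: str


# Illustrative templates only: one ordered from the most significant
# component down to the person, one starting with the person's name.
TEMPLATES = {
    "most_significant_first": [
        "{country}",
        "{postal_code}",
        "{region} {city} {street}",
        "{family_name} {given_name}",
    ],
    "name_first": [
        "{given_name} {family_name}",
        "{street}",
        "{city}, {region} {postal_code}",
        "{country}",
    ],
}


def format_address(address, template_name):
    """Fill every line of the chosen template with the address components."""
    fields = asdict(address)
    lines = [line.format(**fields) for line in TEMPLATES[template_name]]
    return "\n".join(lines)


# The same made-up location, written once in Japanese script and once in Latin script.
domestic = Address("山田", "花子", "1-1", "千代田区", "東京都", "100-0001", "日本")
international = Address("Yamada", "Hanako", "1-1 Chiyoda", "Chiyoda-ku",
                        "Tokyo", "100-0001", "JAPAN")

print(format_address(domestic, "most_significant_first"))
print()
print(format_address(international, "name_first"))
```

Keeping the ordering in data rather than in code means that supporting another country would, in this sketch, only require another template entry. Real formats need much more than that, such as optional components, script-specific punctuation and honorifics.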
Building a database
I have been dealing with addresses in many of my projects. A very common case is structuring a contact form on a website. I guess most of us are not aware that we are juggling cultural habits at this point. Every time I built a website for my wife, who is Japanese, I was asked whether I could rearrange the input fields, placing the country selection first and dropping the given name altogether (or at least placing it after the family name). It’s all about habits, and laying out such a form can quickly become a complex task, as you might imagine.
In late 2012 I worked on a project that required the entry of addresses from all over the world, so I went looking for a comprehensive reference on address formatting rules. Well, there is none, at least no freely available one that is of good quality and covers the whole world. So I started collecting those “just about 250” different formats. That is what I thought at first. It soon turned out that there was far more to collect, and that I would have to gather it from all over the web, using multiple sources like
- Wikipedia in lots of different languages and scripts (in most of which I don’t understand a single word),
- the Universal Postal Union (which deals with address formats as they are to be used from within the US),
- websites of companies dealing with address-related software like AddressDoctor, and finally
- websites of private “address fanatics” like Frank’s Compulsive Guide to Postal Addresses.
For about three months I spent almost every evening gathering data. The result is a rather large, manually maintained database of
- 33 territories,
- 252 countries,
- 7,341 country subdivisions,
- 97 country subdivision types,
- 10,512 names of those country subdivisions (in multiple languages and scripts),
- 7,842 languages,
- 163 scripts,
- 25 romanization systems and finally
- a not yet clearly counted number of corresponding address formatting rules.
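As the database is not published yet, its actual schema is not shown here. The following is only a guess at how subdivisions and their names in several languages, scripts and romanization systems could relate to each other. All class names, field names and the sample record are hypothetical, and the use of ISO-style codes (ISO 3166-2, ISO 639, ISO 15924) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class SubdivisionName:
    """One name of a subdivision in a given language and script (hypothetical)."""
    text: str
    language: str                       # assumed: ISO 639 language code
    script: str                         # assumed: ISO 15924 script code
    romanization: Optional[str] = None  # romanization system, if the name is romanized


@dataclass
class Subdivision:
    """A country subdivision with all of its known names (hypothetical)."""
    code: str                           # assumed: ISO 3166-2 code
    subdivision_type: str               # e.g. "prefecture", "state", "canton"
    names: List[SubdivisionName] = field(default_factory=list)

    def name_in(self, script: str) -> Optional[str]:
        """Return the first name written in the given script, or None."""
        for name in self.names:
            if name.script == script:
                return name.text
        return None


# Illustrative sample record, not taken from the actual database.
tokyo = Subdivision(
    code="JP-13",
    subdivision_type="prefecture",
    names=[
        SubdivisionName("東京都", language="ja", script="Jpan"),
        SubdivisionName("Tōkyō", language="ja", script="Latn", romanization="Hepburn"),
    ],
)

print(tokyo.name_in("Jpan"))
print(tokyo.name_in("Latn"))
```

Storing each name together with its language, script and romanization system is what lets the number of names (10,512) exceed the number of subdivisions (7,341).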
So what to do with it?
Unfortunately, I have not yet had the time to publish the database. I am aware that what I have done can only be a starting point, as I included many parts that are in languages or scripts that I don’t understand at all. Also, many of the international address formats are not really standardized, so only inhabitants of those countries can help. This is why I plan to turn this into a community project. I would like to find “ambassadors” all over the world who take charge of the address information for their particular country and for the languages and scripts they understand. We already started working on a website for all this in early 2013, but we had to suspend development due to lack of time. I am, however, really keen on publishing the OGAFP database sometime soon.