Convert Zend Framework project to UTF-8

If you still have a Zend Framework project lying around with mixed character sets, now is the time to clean it up! Once you decide to switch to Unicode, change the character set consistently throughout your project. After that, there should no longer be any need for conversion functions such as utf8_encode() or utf8_decode().
Here’s my quick step-by-step tutorial…

Convert all files to UTF-8

This is the first tricky part. There are many ways to convert the character set of your files. You could simply open each file in a text editor such as TextMate (on OS X) and re-save it with the new encoding, one by one:
(Screenshot: TextMate "Save As..." dialog)
But let's do it the easy way, that is, the platform-independent way (well, provided you have bash and iconv on your Windows box :) )
I wrote a small script that converts your *.php files in-place using iconv:

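#!/bin/bash
#
# iconv-inplace.sh
# Does recursive charset conversion using iconv
#
# Copyright (c) 2009 Onlime Webhosting, Philip Iezzi
#                    http://www.onlime.ch

###### Configuration ######
FROM_CHARSET="ISO-8859-1"
TO_CHARSET="UTF-8"
###########################

# Validate args
STARTDIR="$1"
if [ -z "$STARTDIR" ]
then
    echo "Usage: $0 <directory>"
    echo "where: <directory> is the directory to start the recursive UTF-8 conversion."
    exit 1
fi

# Read find's output line by line so paths containing spaces stay intact
find "$STARTDIR" -name "*.php" | while IFS= read -r i
do
    # Show the detected charset (-I on OS X; GNU file on Linux uses -i)
    file -I "$i"
    # Read the answer from the terminal, since stdin is the file list
    read -p "Convert $i (y/n)? " < /dev/tty
    if [ "$REPLY" == "y" ]
    then
        # Replace the original only if iconv succeeded
        if iconv --from-code="$FROM_CHARSET" --to-code="$TO_CHARSET" "$i" > "$i.utf8"
        then
            mv "$i.utf8" "$i"
        else
            echo "iconv failed, $i left unchanged" >&2
            rm -f "$i.utf8"
        fi
    fi
    echo ""
done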

You can now run the script on any directory of your project; it walks through all *.php files below it recursively and asks you to confirm each conversion, e.g.:

# ./iconv-inplace.sh application/controllers
application/controllers/AccountController.php: text/x-c++; charset=iso-8859-1
Convert application/controllers/AccountController.php (y/n)? y

Convert Database tables

Change your CREATE TABLE statements for new tables:

CREATE TABLE tbl_name (
  ...
) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_general_ci;

This is how you convert existing tables to a different character set:

ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8;
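With more than a handful of tables, typing one ALTER TABLE per table gets tedious. A hedged sketch: a small shell function (gen_convert_sql is a hypothetical name) that reads table names from stdin, one per line, and emits the statement above for each one, with the collation from the CREATE TABLE example added. It assumes table names contain no backticks or newlines.

```shell
#!/bin/bash
# Sketch: turn a list of table names (one per line on stdin) into
# ALTER TABLE ... CONVERT TO statements, one per table.
# gen_convert_sql is a hypothetical helper name.
gen_convert_sql() {
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r tbl || [ -n "$tbl" ]; do
        [ -n "$tbl" ] || continue    # skip blank lines
        printf 'ALTER TABLE `%s` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;\n' "$tbl"
    done
}
```

For example, `mysql -N -e 'SHOW TABLES' db_name | gen_convert_sql > convert.sql` (db_name being your database) writes the statements to a file you can review and then run with `mysql db_name < convert.sql`, ideally after taking a backup.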

Layout script

Add the following meta tag to your layout script (usually located in /application/views/layouts/layout.phtml):

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
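Keep in mind that a charset sent in the HTTP Content-Type header takes precedence over this meta tag. To see what your server actually sends, you can feed the response headers from `curl -sI` into a small filter. The sketch below (header_charset is a hypothetical name) lowercases the headers and prints the first charset value found in a Content-Type line, or nothing if there is none.

```shell
#!/bin/bash
# Sketch: print the charset from an HTTP Content-Type header.
# Reads raw response headers on stdin, e.g. from `curl -sI <url>`.
# header_charset is a hypothetical helper name.
header_charset() {
    tr -d '\r"' |                     # drop CRs and quotes around the value
    tr '[:upper:]' '[:lower:]' |      # compare case-insensitively
    grep '^content-type:' |
    sed -n 's/.*charset=\([^;[:space:]]*\).*/\1/p' |
    head -n 1
}
```

For example, `curl -sI http://localhost/ | header_charset` (URL assumed) should print `utf-8`. If it prints another charset, Apache's AddDefaultCharset or PHP's default_charset setting (both mentioned in the comments below) is a likely place to look.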

Bootstrap definitions

Set the correct charset for your database connection in the database adapter used by your ZF project:

$dbAdapter->query("SET NAMES 'utf8'");
Zend_Db_Table::setDefaultAdapter($dbAdapter);

Also set the encoding for your Zend_View object:

$view = Zend_Layout::getMvcInstance()->getView();
$view->setEncoding('UTF-8');
$view->doctype('XHTML1_TRANSITIONAL');

Form tags / Zend_Form

Add accept-charset="utf-8" to your form tags:

<form accept-charset="utf-8" ...

If you’re using Zend_Form, specify this attribute as follows:

class MyForm extends Zend_Form {

    public function init()
    {
        $this->setMethod('post');
        $this->setAttrib('accept-charset', 'utf-8');
        // ...
    }
}

Zend_Mail

Zend_Mail also has to be aware of your charset:

$mail = new Zend_Mail('utf-8');

Final Cleanup

As a final cleanup, do some search-and-replace:
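Since the whole point of the switch is that utf8_encode() and utf8_decode() become unnecessary (on text that is already UTF-8, utf8_encode() double-encodes it and utf8_decode() turns it into Latin-1), one obvious thing to hunt down is any remaining call to them. A minimal sketch, assuming GNU or BSD grep (both support -r and --include); find_legacy_conversions is a hypothetical name:

```shell
#!/bin/bash
# Sketch: list leftover utf8_encode()/utf8_decode() calls with file and
# line number. find_legacy_conversions is a hypothetical helper name.
find_legacy_conversions() {
    # -r recurse, -n line numbers, -E extended regex; --include limits the
    # search to PHP sources and .phtml view scripts
    grep -rnE --include='*.php' --include='*.phtml' \
        'utf8_(en|de)code[[:space:]]*\(' "$1"
}
```

Running `find_legacy_conversions application` prints each hit as `file:line:code`, which gives you a checklist for the search-and-replace.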

6 Comments so far

  1. ProTom said

    on June 25 2009 @ 8:55 pm

    Super instructions. They helped me a lot in converting an existing project to UTF-8. I had a hard time with the conversion until I found out that I have to use UTF-8 (without BOM) for all text sources.

  2. Darryl said

    on August 11 2009 @ 9:56 pm

    Thanks! I always forget about the dbAdapter part!

  3. Joe Devon said

    on November 28 2009 @ 2:56 am

    Nice tutorial. Let me add a few things.

    Default charset in apache:
    http://httpd.apache.org/docs/2.2/mod/core.html#adddefaultcharset

    Default charset in php.ini:
    http://us3.php.net/manual/en/ini.core.php#ini.default-charset

    Default CSS charset:
    http://www.w3.org/International/questions/qa-css-charset

    FYI on header charsets vs meta charsets:
    “The HTTP header is the preferred method, and it overrides the tag if present.”
    http://diveintohtml5.org/semantics.html#encoding

    Re the form charset:
    Note: The accept-charset attribute does not work properly in Internet Explorer. If accept-charset="ISO-8859-1", IE will send data encoded as "Windows-1252".
    (but the good side is this is NOT a security issue)
    http://www.w3schools.com/TAGS/att_form_accept_charset.asp

    Last but not least, instead of issuing a SET NAMES query, which forces a connection to the DB whether you need one or not, as of ZF 1.8 this will work in your app.ini:
    db.params.charset = utf8 ;

    If you’re running an older version, go with this:
    db.params.driver_options.1002 = "SET NAMES utf8"

    1002 maps to PDO::MYSQL_ATTR_INIT_COMMAND

  4. Florian said

    on March 15 2010 @ 10:17 am

    YOU MADE MY DAY! THANKS!!
     
    Florian

  5. riedi said

    on July 4 2010 @ 9:11 pm

    Thank you, very helpful!
     
    Please note that you have to set the charset manually if you use htmlentities()!

    htmlentities($element->getLabel(), ENT_QUOTES, 'UTF-8');

  6. Klaus said

    on July 17 2011 @ 11:52 am

    This is fabulous!
    Saved my life and hours of work.
    Greetz Klaus
