If you still have a Zend Framework project lying around with mixed charsets, now is the time to clean it up! If you decide to switch to Unicode, change the character set everywhere throughout your project. After that, there should no longer be any need for conversions such as utf8_encode() or utf8_decode().
Here’s my quick step-by-step tutorial…
- Convert all files to UTF-8
- Convert Database tables
- Layout script
- Bootstrap definitions
- form-tags / Zend_Form
- Zend_Mail
- Final Cleanup
Convert all files to UTF-8
This is the first tricky part. There are tons of different ways to convert the character set of your files. You could simply open each file in your text editor, e.g. TextMate (on OS X), and re-save it as UTF-8, one by one.
But let’s do it the easy way, I mean, the platform-independent way (well, in case you find bash and iconv on your Windows box 🙂 )
I wrote a small script that converts your *.php files in-place using iconv:
    #!/bin/bash
    #
    # iconv-inplace.sh
    # Does recursive charset conversion using iconv
    #
    # Copyright (c) 2009 Onlime Webhosting, Philip Iezzi
    # http://www.onlime.ch

    ###### Configuration ######
    FROM_CHARSET="ISO-8859-1"
    TO_CHARSET="UTF-8"
    ###########################

    # Validate args
    STARTDIR="$1"
    if [ -z "$STARTDIR" ]
    then
        echo "Usage: $0 <directory>"
        echo "where: <directory> is the directory to start the recursive UTF-8 conversion."
        exit 1
    fi

    # Note: the word splitting below breaks on paths that contain spaces
    LIST=`find "$STARTDIR" -name "*.php"`
    for i in $LIST;
    do
        # Show the detected charset (-I on OS X; on Linux the flag is -i)
        file -I "$i"
        read -p "Convert $i (y/n)? "
        if [ "$REPLY" == "y" ]
        then
            iconv --from-code=$FROM_CHARSET --to-code=$TO_CHARSET "$i" > "$i.utf8";
            mv "$i.utf8" "$i";
        fi
        echo "";
    done
You can now simply run the script in your current directory and change all files recursively by confirming each, e.g.:
    # ./iconv-inplace.sh application/controllers

    application/controllers/AccountController.php: text/x-c++; charset=iso-8859-1
    Convert application/controllers/AccountController.php (y/n)? y
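One thing to watch out for: running the conversion twice on the same file double-encodes it. A rough sketch of a guard, assuming that a file which already decodes cleanly as UTF-8 should be left alone, could look like this. The function names is_utf8 and convert_if_needed are made up for illustration and are not part of iconv-inplace.sh. The check is only a heuristic, since pure ASCII files also pass it (harmless, as they are identical in both charsets).

```bash
# Hypothetical helpers, not part of the original iconv-inplace.sh.

# Succeeds if the file decodes cleanly as UTF-8 (pure ASCII also passes).
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Converts a file from ISO-8859-1 to UTF-8 unless it already is valid UTF-8.
convert_if_needed() {
    local f="$1"
    if is_utf8 "$f"; then
        echo "skip: $f"
        return 0
    fi
    # Write to a temporary file first, then replace the original.
    iconv -f ISO-8859-1 -t UTF-8 "$f" > "$f.utf8" || return 1
    mv "$f.utf8" "$f"
    echo "converted: $f"
}
```

Called in the loop in place of the bare iconv line, e.g. convert_if_needed "$i", this should make a second run over the same directory a no-op.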
Convert Database tables
Change your CREATE TABLE statements for new tables:
    CREATE TABLE tbl_name (
        ...
    )
    ENGINE=InnoDB
    CHARACTER SET utf8 COLLATE utf8_general_ci;
This is how you convert existing tables that use a different character set:

    ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8;
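If you have more than a handful of tables, typing the statement for each one gets tedious. Here is a minimal sketch that only prints the statements so you can review them before running anything. The function name print_convert_sql and the table names in the usage note are assumptions for illustration.

```bash
# Hypothetical helper: prints one ALTER TABLE ... CONVERT statement
# per table name passed as an argument. It does not touch the database.
print_convert_sql() {
    local t
    for t in "$@"; do
        printf 'ALTER TABLE `%s` CONVERT TO CHARACTER SET utf8;\n' "$t"
    done
}
```

For example, print_convert_sql users orders > convert.sql writes two statements to a file, which you could then inspect and feed to the mysql command line client. Take a backup first, since the conversion rewrites the table data.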
Layout script
Add the following meta tag to your layout script (usually located in /application/views/layouts/layout.phtml):
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Bootstrap definitions
Set the correct charset for your database connection in the database adapter used by your ZF project:
    $dbAdapter->query("SET NAMES 'utf8'");
    Zend_Db_Table::setDefaultAdapter($dbAdapter);
Also set the encoding for your Zend_View object:
    $view = Zend_Layout::getMvcInstance()->getView();
    $view->setEncoding('UTF-8');
    $view->doctype('XHTML1_TRANSITIONAL');
form-tags / Zend_Form
Change your form tags and add accept-charset="utf-8":
    <form accept-charset="utf-8" ...
If you’re using Zend_Form, specify this attribute as follows:
    class MyForm extends Zend_Form
    {
        public function init()
        {
            $this->setMethod('post');
            $this->setAttrib('accept-charset', 'utf-8');
            ...
Zend_Mail
Zend_Mail also has to be aware of your charset:
    $mail = new Zend_Mail('utf-8');
Final Cleanup
As a final cleanup, do some search-replace:
- remove all occurrences of utf8_encode(), utf8_decode()
- search for iso-8859-1, replace with utf-8 if appropriate
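To find the leftovers, something along these lines might do as a starting point. It is only a sketch: the function name find_charset_leftovers is made up, and it assumes your sources end in .php or .phtml. It lists matches with file name and line number and changes nothing, so you can still decide case by case whether a replacement is appropriate.

```bash
# Hypothetical helper: lists lines in *.php and *.phtml files below the
# given directory that still call utf8_encode()/utf8_decode() or mention
# iso-8859-1 (case-insensitive). Read-only, it modifies no files.
find_charset_leftovers() {
    find "$1" -type f \( -name '*.php' -o -name '*.phtml' \) \
        -exec grep -HniE 'utf8_(en|de)code[[:space:]]*\(|iso-8859-1' {} +
}
```

Run it as find_charset_leftovers application and work through the list by hand.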
Jun 25, 2009 - 08:55 PM
Super instructions. They helped me a lot in converting an existing project to UTF-8. I had a hard time with the conversion until I found out that I have to use UTF-8 (no BOM) for all text sources.
Aug 11, 2009 - 09:56 PM
Thanks! I always forget about the dbAdapter part!
Nov 28, 2009 - 02:56 AM
Nice tutorial. Let me add a few things.
Default charset in apache:
http://httpd.apache.org/docs/2.2/mod/core.html#adddefaultcharset
Default charset in php.ini:
http://us3.php.net/manual/en/ini.core.php#ini.default-charset
Default CSS charset:
http://www.w3.org/International/questions/qa-css-charset
FYI on header charsets vs meta charsets:
"The HTTP header is the preferred method, and it overrides the <meta> tag if present."
http://diveintohtml5.org/semantics.html#encoding
Re the form charset:
Note: The accept-charset attribute does not work properly in Internet Explorer. If accept-charset="ISO-8859-1", IE will send data encoded as "Windows-1252".
(but the good side is this is NOT a security issue)
http://www.w3schools.com/TAGS/att_form_accept_charset.asp
Last but not least, instead of issuing a SET NAMES query, which triggers a call to the DB whether you need the connection or not, as of ZF 1.8 and up this will work in your app.ini:
db.params.charset = utf8 ;
If you’re running an older version, go with this:
db.params.driver_options.1002 = "SET NAMES utf8"
1002 maps to PDO::MYSQL_ATTR_INIT_COMMAND
Mar 15, 2010 - 10:17 AM
YOU MADE MY DAY! THANKS!!
Florian
Jul 04, 2010 - 09:11 PM
Thank you, very helpful!
Please note that you have to set the charset manually if you use htmlentities()!
Jul 17, 2011 - 11:52 AM
This is fabulous!
Saved my life and hours of work.
Greetz Klaus
Feb 06, 2012 - 03:52 PM
And what if it fails with an error like this:
line 1' in /home/~/library/Zend/Controller/Response/Abstract.php:282
Stack trace:
#0 /home/~/library/Zend/Controller/Response/Abstract.php(300): Zend_Controller_Response_Abstract->canSendHeaders(true)
#1 /home/~/library/Zend/Controller/Response/Abstract.php(728): Zend_Controller_Response_Abstract->sendHeaders()
#2 /home/~/library/Zend/Controller/Front.php(984): Zend_Controller_Response_Abstract->sendResponse()
#3 /home/~/library/Zend/Application/Bootstrap/Bootstrap.php(77): Zend_Controller_Front->dispatch()
#4 /home/~/library/Zend/Application.php(335): Zend_Application_Bootstrap_Bootstrap->run()
#5 /home/~/public/index.php(31): Zend_Application->run()
#6 {main}
thrown in /home/~/library/Zend/Controller/Response/Abstract.php on line 282
???
And if I go back to ASCII or UTF-8 with BOM, everything works.