1. Introduction
UTF-8 is the most widely used character encoding on the web. It can encode every Unicode code point, so it covers text in virtually any written language, including Chinese, Korean, and Japanese.
In this article, we walk through the configuration needed to use UTF-8 consistently in Tomcat and in a MySQL database behind it.
2. Connector Configuration
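A Connector listens for connections on a specific port. We need to make sure that all of our Connectors use UTF-8 to decode request URIs, that is, the percent-encoded bytes in the path and query string.
Let’s add the attribute URIEncoding="UTF-8" to all the Connectors in TOMCAT_ROOT/conf/server.xml. Since Tomcat 8 this attribute already defaults to UTF-8 unless strict servlet compliance is enabled, so setting it explicitly matters mostly on older versions and otherwise documents the intent: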
<Connector URIEncoding="UTF-8" port="8080" redirectPort="8443"
           connectionTimeout="20000" protocol="HTTP/1.1"/>
<Connector URIEncoding="UTF-8" port="8009" redirectPort="8443" protocol="AJP/1.3"/>
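To see why the decoding charset matters, here is a minimal sketch in plain Java, independent of Tomcat. It uses the JDK's URLDecoder as a stand-in for the connector's URI decoding; the class name UriDecodingDemo and the sample value are assumptions made for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UriDecodingDemo {

    // Decodes a percent-encoded value using the given charset (requires Java 10+)
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // U+4E2D U+6587 (two Chinese characters) percent-encoded as UTF-8
        String encoded = "%E4%B8%AD%E6%96%87";

        // Decoded as UTF-8: the two intended characters
        System.out.println(decode(encoded, StandardCharsets.UTF_8).length());
        // Decoded as ISO-8859-1: six characters, because each byte becomes its own character
        System.out.println(decode(encoded, StandardCharsets.ISO_8859_1).length());
    }
}
```

The same bytes yield different strings depending on the charset, which is why the connector and the client have to agree on UTF-8.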
3. Character Set Filter
After configuring the connector, it’s time to force the web application to handle all requests and responses in UTF-8.
Let’s define a class named CharacterSetFilter:
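public class CharacterSetFilter implements Filter {

    // ...

    public void doFilter(
      ServletRequest request,
      ServletResponse response,
      FilterChain next) throws IOException, ServletException {
        // Decode request bodies (e.g. form parameters) as UTF-8;
        // this must happen before any parameter is read
        request.setCharacterEncoding("UTF-8");
        // Default content type; a servlet further down the chain can still
        // override it before the response is committed
        response.setContentType("text/html; charset=UTF-8");
        response.setCharacterEncoding("UTF-8");
        next.doFilter(request, response);
    }

    // ...
}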
We need to add the filter to our application’s web.xml so that it’s applied to all requests and responses:
<filter>
    <filter-name>CharacterSetFilter</filter-name>
    <filter-class>com.baeldung.CharacterSetFilter</filter-class>
</filter>

<filter-mapping>
    <filter-name>CharacterSetFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
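The effect of the response encoding can be sketched with plain JDK calls. The Servlet specification's default response encoding is ISO-8859-1, which cannot represent most non-Latin characters. The hypothetical roundTrip helper below encodes a string to bytes and decodes it again with the same charset, standing in for the server writing a response and the browser reading it; it is an illustration, not Tomcat's internal code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseEncodingDemo {

    // Encodes the text to bytes and decodes it back with the same charset
    static String roundTrip(String text, Charset charset) {
        byte[] wire = text.getBytes(charset);
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        // "caf" + e-acute, a space, then two Chinese characters
        String text = "caf\u00e9 \u4e2d\u6587";

        // UTF-8 preserves every character
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));
        // ISO-8859-1 keeps the e-acute but replaces each Chinese character with '?'
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1).endsWith(" ??"));
    }
}
```

Both lines print true: the UTF-8 round trip is lossless, while ISO-8859-1 silently substitutes question marks for characters it cannot encode.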
4. JSP Page Encoding
The other part of our web application we need to configure is the JSP pages.
The most reliable way to ensure UTF-8 in JSP pages is to add this directive at the top of each page:
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>
5. HTML Page Encoding
While the JSP page encoding tells the JSP container how to read the page source and encode its output, the HTML page encoding tells the browser how to interpret the characters it receives.
We should add this <meta> tag in the head section of all HTML pages:
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8' />
6. MySQL Server Configuration
Now that Tomcat is configured, it’s time to configure the database.
We assume that a MySQL server is used. The configuration file is named my.ini on Windows and my.cnf on Linux.
We need to find the configuration file, search for these parameters, and edit them accordingly:
[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
We need to restart the MySQL server for the changes to take effect.
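The reason for choosing utf8mb4 over MySQL's older utf8 character set is that utf8 (an alias for utf8mb3) stores at most three bytes per character, while UTF-8 needs four bytes for characters outside the Basic Multilingual Plane, such as emoji. The byte counts can be checked with a short Java sketch; the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ByteLengthDemo {

    // Returns the number of bytes the string occupies when encoded as UTF-8
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));            // 1: ASCII letter
        System.out.println(utf8Length("\u00e9"));       // 2: e-acute (U+00E9)
        System.out.println(utf8Length("\u4e2d"));       // 3: Chinese character (U+4E2D)
        System.out.println(utf8Length("\uD83D\uDE00")); // 4: emoji (U+1F600)
    }
}
```

Only the last value needs the fourth byte, and that is exactly the kind of character a three-byte character set cannot store.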
7. MySQL Database Configuration
The MySQL server character set configuration only applies to databases created afterwards. Existing databases, tables, and columns must be migrated manually, which takes only a few commands.
For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
For each VARCHAR or TEXT column, restating the column’s existing type and length (VARCHAR(69) below is only a placeholder):
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(69) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
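When a schema has many tables, the per-table statement can be generated rather than typed by hand. The sketch below only builds the SQL strings and does not connect to a database; the helper name convertTableSql and the table names are hypothetical, and the output should be reviewed before it is run against real data.

```java
import java.util.List;

public class Utf8mb4MigrationSql {

    // Builds the CONVERT TO statement for one table
    static String convertTableSql(String tableName) {
        // Backticks quote the identifier; a backtick inside a name is escaped by doubling it
        String quoted = "`" + tableName.replace("`", "``") + "`";
        return "ALTER TABLE " + quoted
            + " CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;";
    }

    public static void main(String[] args) {
        // Hypothetical table names, used only for illustration
        for (String table : List.of("users", "orders")) {
            System.out.println(convertTableSql(table));
        }
    }
}
```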
8. Conclusion
In this article, we demonstrated how to ensure Tomcat uses the UTF-8 encoding.