The truth about garbled HTTP responses

This is an old note of mine. I ran into an encoding problem recently and dug it out again.

It only concerns the Java Servlet API. These days most projects use a framework and rarely touch Servlets directly.

How to read the servlet source code

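The servlet API consists mostly of interfaces and abstract classes rather than a real implementation, so to see the code that actually runs you need to download the source of the server that implements it. For example, the Tomcat source is here: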
https://archive.apache.org/dist/tomcat/tomcat-6/v6.0.33/src/

PrintWriter and ServletOutputStream in a Servlet

A servlet has two ways to produce output:

PrintWriter writer = response.getWriter();
ServletOutputStream outputStream = response.getOutputStream();

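PrintWriter only offers character-oriented methods (the print/println family, plus write for chars and Strings); it cannot write binary content. That is actually quite reasonable, and the reason is explained below.
ServletOutputStream has both the print/println family and the byte-oriented write family.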

When ServletOutputStream is used to output Chinese characters, the CharacterEncoding you set appears to have no effect.

response.setCharacterEncoding("utf-8");  // this line does not fix the encoding problem
ServletOutputStream outputStream = response.getOutputStream();
outputStream.println("中文");

In the browser we can check the page encoding and see that it is indeed utf-8. So why is the output still garbled after calling response.setCharacterEncoding("utf-8")?

The real culprit is ServletOutputStream: it does not implement any encoding conversion. Here is how it is implemented:

    public void print(String s) throws IOException {
        if (s==null) s="null";
        int len = s.length();
        for (int i = 0; i < len; i++) {
            char c = s.charAt (i);

            //
            // XXX NOTE:  This is clearly incorrect for many strings,
            // but is the only consistent approach within the current
            // servlet framework.  It must suffice until servlet output
            // streams properly encode their output.
            //
            if ((c & 0xff00) != 0) {    // high order byte must be zero
                String errMsg = lStrings.getString("err.not_iso8859_1");
                Object[] errArgs = new Object[1];
                errArgs[0] = new Character(c);
                errMsg = MessageFormat.format(errMsg, errArgs);
                throw new CharConversionException(errMsg);
            }
            write (c);
        }
    }

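Clearly it performs no encoding conversion, and the comment says as much: XXX NOTE: This is clearly incorrect for many strings. Each char is written as a single byte, and any char whose high-order byte is non-zero causes a CharConversionException.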
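
The behaviour of print(String) quoted above can be reproduced outside a servlet container. The following is only a minimal sketch: the class Latin1PrintDemo and its methods are hypothetical names made up for illustration, and the error message is simplified rather than read from the servlet resource bundle.

```java
import java.io.ByteArrayOutputStream;
import java.io.CharConversionException;
import java.io.IOException;
import java.io.OutputStream;

public class Latin1PrintDemo {

    // Mimics the loop in ServletOutputStream.print(String): one byte per char,
    // and an exception for any char whose high-order byte is non-zero.
    static void print(OutputStream out, String s) throws IOException {
        if (s == null) {
            s = "null";
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ((c & 0xff00) != 0) {
                throw new CharConversionException("Not an ISO 8859-1 character: " + c);
            }
            out.write(c); // only the low-order 8 bits are written
        }
    }

    // Returns the bytes produced by print(...), or null if the string is rejected.
    static byte[] tryPrint(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            print(out, s);
            return out.toByteArray();
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryPrint("abc").length);            // 3
        // "\u4e2d\u6587" is the string "中文", escaped so the source file encoding does not matter
        System.out.println(tryPrint("\u4e2d\u6587") == null);  // true
    }
}
```

Whatever charset the response declares, this loop never consults it, which is why setting CharacterEncoding makes no difference on this path.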

Now let's output with PrintWriter instead:

response.setCharacterEncoding("utf-8");
PrintWriter writer = response.getWriter();
writer.println("中文");

Checking in the browser, the page encoding is utf-8 and the Chinese characters are displayed correctly.

Let's see how PrintWriter works.
In Tomcat the PrintWriter is actually an org.apache.catalina.connector.CoyoteWriter:

    public void print(String s) {
        if (s == null) {
            s = "null";
        }
        write(s);
    }

    public void write(String s) {
        write(s, 0, s.length());
    }
    public void write(String s, int off, int len) {

        if (error)
            return;

        try {
            ob.write(s, off, len);   // ob is an org.apache.catalina.connector.OutputBuffer
        } catch (IOException e) {
            error = true;
        }

    }

The write function in org.apache.catalina.connector.OutputBuffer:

    public void write(String s, int off, int len)
        throws IOException {

        if (suspended)
            return;

        charsWritten += len;
        if (s == null)
            s = "null";
        // The encoding conversion happens here. conv is declared as: protected C2BConverter conv;
        // While debugging you can see that the C2BConverter holds exactly the utf-8 encoding.
        conv.convert(s, off, len);  
        conv.flushBuffer();

    }

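At last we have found the truth: under the hood, PrintWriter converts the string to bytes in the encoding given by CharacterEncoding.
This is also why PrintWriter does not provide byte-oriented write functions.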
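
The same char-to-byte conversion can be observed with plain JDK classes. This is a rough sketch, not Tomcat's code: it assumes an OutputStreamWriter is an acceptable stand-in for Tomcat's C2BConverter, and WriterEncodingDemo and encodeViaWriter are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterEncodingDemo {

    // Writes text through a PrintWriter bound to the given charset and
    // returns the bytes that reach the underlying stream.
    static byte[] encodeViaWriter(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(sink, charset));
        writer.print(text);
        writer.flush(); // push the buffered characters through the encoder
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // the string "中文"
        System.out.println(encodeViaWriter(text, StandardCharsets.UTF_8).length);    // 6
        System.out.println(encodeViaWriter(text, StandardCharsets.UTF_16BE).length); // 4
    }
}
```

The same two characters become different byte sequences depending on the charset, so a writer has to know the target encoding, while a byte stream has no use for it.
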
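BTW: how can we output a string in the encoding we want through ServletOutputStream?
As the code above shows, the print family of ServletOutputStream does no conversion work at all. So we can first convert the string to bytes in the desired encoding and then write those bytes to the ServletOutputStream.
For example: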

response.setCharacterEncoding("utf-8");
ServletOutputStream outputStream = response.getOutputStream();
// Encode the string ourselves and write the raw bytes; no PrintStream wrapper is needed.
outputStream.write("中文".getBytes("utf-8"));
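
When encoding by hand like this, the bytes written must match the charset the response declares, otherwise the client decodes them wrongly. A small sketch of what the receiving side sees, assuming the client simply decodes with whatever charset it was told (MojibakeDemo and roundTrip are hypothetical names):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes text as UTF-8, then decodes the bytes with the charset
    // the receiver believes was used.
    static String roundTrip(String text, Charset assumedByReceiver) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, assumedByReceiver);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // the string "中文"
        // Matching charsets: the text survives.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8)));   // true
        // Mismatched charsets: each of the 6 UTF-8 bytes becomes its own character.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1).length()); // 6
    }
}
```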

Fixing garbled text in Tomcat once and for all

To fix garbled text in Tomcat once and for all, do the following:

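1. In Tomcat's conf/server.xml, set useBodyEncodingForURI="true" on the Connector, so that query-string parameters are decoded with the request's character encoding: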

    <Connector port="8080" protocol="HTTP/1.1" 
               connectionTimeout="20000" 
               redirectPort="8443" useBodyEncodingForURI="true"/>

2. Add a filter:

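package com.leg.filter;

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CodeFilter implements Filter {
	@Override
	public void init(FilterConfig filterConfig) throws ServletException {
	}

	@Override
	public void doFilter(ServletRequest request, ServletResponse response,
			FilterChain chain) throws IOException, ServletException {
		// Set both encodings before anything reads parameters or obtains the writer.
		request.setCharacterEncoding("utf-8");
		response.setCharacterEncoding("utf-8");
		chain.doFilter(request, response);
	}

	@Override
	public void destroy() {
	}
}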

3. Configure the filter in web.xml:

	<filter>
		<filter-name>CodeFilter</filter-name>
		<filter-class>com.leg.filter.CodeFilter</filter-class>
	</filter>
	
	<filter-mapping>
		<filter-name>CodeFilter</filter-name>
		<url-pattern>/*</url-pattern>
	</filter-mapping>
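
To see why step 1 matters, here is a sketch of what happens when a percent-encoded query parameter is decoded with the wrong charset. It uses java.net.URLDecoder as a stand-in for the container's own parameter decoding, which is an assumption; QueryDecodingDemo and decode are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class QueryDecodingDemo {

    // Decodes a percent-encoded value with the named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String encoded = "%E4%B8%AD%E6%96%87"; // "中文" percent-encoded as UTF-8
        System.out.println(decode(encoded, "UTF-8").equals("\u4e2d\u6587")); // true
        System.out.println(decode(encoded, "ISO-8859-1").length());          // 6
    }
}
```

With useBodyEncodingForURI="true", the encoding that the filter sets on the request is also applied to the query string, so GET parameters are decoded the same way as POST bodies.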

Last updated: 2017-04-03 14:54:35
