閱讀526 返回首頁    go 阿裏雲 go 技術社區[雲棲]


PostgreSQL 中生成隨機漢字

標簽

PostgreSQL , 字符集 , 漢字編碼 , 隨機漢字 , chr , ascii , unicode


背景

漢字unicode編碼範圍

php中utf-8編碼下用正則表達式匹配漢字的最終正確表達式 ——   
/[\x{4e00}-\x{9fa5}]/u  
\u4e00-\u9fa5 (中文)  
\x3130-\x318F (韓文)  
\xAC00-\xD7A3 (韓文)  
\u0800-\u4e00 (日文)  

4e00 對應 19968

9fa5 對應 40869

一共 20901 個漢字

例子1

  • 注意,為了保證能輸出所有的漢字,確保數據庫的編碼為UTF8,否則會報類似錯誤。
db=# select '\u9fa5'::text;  
ERROR:  Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8 at or near "'\u9fa5"  
LINE 1: select '\u9fa5'::text;  
               ^  

例子

set standard_conforming_strings =off;  
set escape_string_warning=off;  
  
postgres=# select '\u9fa5'::text;  
 text   
------  
 龥  
(1 row)  

例子2

除了unicode的寫法,PostgreSQL還提供了兩個函數,支持數值的寫法。

Function Return Type Description Example Result
ascii(string) int ASCII code of the first character of the argument. For UTF8 returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character. ascii('x') 120
chr(int) text Character with the given code. For UTF8 the argument is treated as a Unicode code point. For other multibyte encodings the argument must designate an ASCII character. The NULL (0) character is not allowed because text data types cannot store such bytes. chr(65) A
postgres=# select chr(19968);  
 chr   
-----  
 一  
(1 row)  

輸出所有漢字

例子

do language plpgsql $$  
declare  
  res text := '';  
begin  
  for i in 19968..40869 loop  
    res := res||chr(i);  
  end loop;  
  raise notice '%', res;  
end;  
$$;  
  
NOTICE:  一丁丂七丄丅丆萬丈三上下丌不與丏丐醜丒專且丕世丗丘丙業叢東絲.............................  

輸出隨機漢字

例子

創建函數

create or replace function gen_hanzi(int) returns text as $$  
declare  
  res text;  
begin  
  if $1 >=1 then  
    select string_agg(chr(19968+(random()*20901)::int), '') into res from generate_series(1,$1);  
    return res;  
  end if;  
  return null;  
end;  
$$ language plpgsql strict;  

使用函數,生成隨機漢字

postgres=# select gen_hanzi(10);  
      gen_hanzi         
----------------------  
 析埲錀噝穎燈嬪閸醿厧  
(1 row)  
  
postgres=# select gen_hanzi(10);  
      gen_hanzi         
----------------------  
 仫哸擡襖批梹繜嚪隸別  
(1 row)  

參考

https://baike.baidu.com/item/Unicode

https://unicode.org/charts/

https://en.wikipedia.org/wiki/Unicode

《PostgreSQL 轉義、UNICODE、與SQL注入》

https://stackoverflow.com/questions/3970795/how-do-you-create-a-random-string-thats-suitable-for-a-session-id-in-postgresql

https://www.postgresql.org/docs/9.6/static/functions-string.html

最後更新:2017-04-26 17:31:25

  上一篇:go 文本挖掘之特征選擇(Python版)
  下一篇:go PostgreSQL 十億級模煳查詢最佳實踐