Alibaba Cloud Technology Community [Yunqi]

Inside the C++ Stream Implementation: A Problem Triggered by boost::lexical_cast

At noon a colleague ran into an exception thrown by boost::lexical_cast. The key code was:

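string str(8, '\0');                 // eight NUL characters, so size() is 8
strncpy(&str.at(0), "1234567", 7);   // overwrite the first seven; one '\0' remains at the end
cout << lexical_cast<int>(str) << endl;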

Running it produced the following exception:

terminate called after throwing an instance of 'boost::bad_lexical_cast'
what(): bad lexical cast: source type value could not be interpreted as target

We know that boost::lexical_cast ultimately performs numeric conversions through a stringstream, so we tested with the following example:

stringstream ss;
int result = 0;
ss << str;
ss >> result;
cout << "new Result: " << result << endl;

After compiling and running, the output is:

new Result: 1234567

The value prints correctly, so nothing seems to be wrong.

Let's look at the Boost source code:

vim /usr/include/boost/lexical_cast.hpp

and find the lexical_cast function:

template<typename Target, typename Source>
Target lexical_cast(Source arg)
{
    detail::lexical_stream<Target, Source> interpreter;
    Target result;
    if(!(interpreter << arg && interpreter >> result))
        throw_exception(bad_lexical_cast(typeid(Target), typeid(Source)));
    return result;
}

The lexical_cast function is very simple: it just runs operator<< and then operator>>, and throws an exception if either one fails. To find out which step goes wrong, we run the two operations by hand in our program:

detail::lexical_stream<int, string> interpreter;
int result;
if(!(interpreter << str))
{
    cout << "Error 1" << endl;
}
if(!(interpreter >> result))
{
    cout << "Error 2" << endl;
}
cout << result << endl;

After compiling and running, the output is:

Error 2

This tells us that lexical_cast fails in the extraction step, operator>>. The source of operator>> in detail::lexical_stream is:

template<typename InputStreamable>
bool operator>>(InputStreamable &output)
{
    return !is_pointer<InputStreamable>::value &&
           stream >> output &&
           (stream >> std::ws).eof();
}

Given this code and our stringstream test, we can be fairly sure that everything up to and including stream >> output succeeds. The part that can fail is (stream >> std::ws).eof();
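The check performed by that operator>> can be reproduced with the standard library alone. The following is a minimal sketch, not Boost's code: strict_to_int is a hypothetical helper name, and it only mirrors the two conditions stream >> output and (stream >> std::ws).eof() quoted above.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Boost): succeeds only when the whole
// input is an integer, optionally followed by whitespace.
bool strict_to_int(const std::string& text, int& out) {
    std::stringstream stream;
    if (!(stream << text)) {
        return false;                      // insertion failed
    }
    if (!(stream >> out)) {
        return false;                      // no integer at the front
    }
    // Skip trailing whitespace, then require that nothing else is left.
    return (stream >> std::ws).eof();
}
```

Calling strict_to_int("123a", n) returns false even though the extraction itself reads 123, which is the same shape of failure as the "Error 2" case.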

Here is a brief explanation of std::ws and stringstream::eof().

std::ws is defined in

/usr/include/c++/3.4.4/bits/istream.tcc

and its source is:

// 27.6.1.4 Standard basic_istream manipulators
template<typename _CharT, typename _Traits>
basic_istream<_CharT,_Traits>&
ws(basic_istream<_CharT,_Traits>& __in)
{
    typedef basic_istream<_CharT, _Traits> __istream_type;
    typedef typename __istream_type::__streambuf_type __streambuf_type;
    typedef typename __istream_type::__ctype_type __ctype_type;
    typedef typename __istream_type::int_type __int_type;
    const __ctype_type& __ct = use_facet<__ctype_type>(__in.getloc());
    const __int_type __eof = _Traits::eof();
    __streambuf_type* __sb = __in.rdbuf();
    __int_type __c = __sb->sgetc();
    while (!_Traits::eq_int_type(__c, __eof)
           && __ct.is(ctype_base::space, _Traits::to_char_type(__c)))
        __c = __sb->snextc();
    if (_Traits::eq_int_type(__c, __eof))
        __in.setstate(ios_base::eofbit);
    return __in;
}

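Its main job is to skip whitespace in the input stream: spaces, '\n', '\r', and similar characters. In stream >> std::ws, the manipulator consumes whatever whitespace remains in the input stream after the integer has been converted (if anything remains). It discards only whitespace and stops at the first non-whitespace character.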
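A small sketch makes the behaviour of std::ws concrete. The helper name rest_after_ws is made up for this illustration; it applies std::ws and then returns everything still left in the stream buffer.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: discard leading whitespace with std::ws, then
// return whatever characters are still waiting in the stream buffer.
std::string rest_after_ws(const std::string& text) {
    std::istringstream in(text);
    in >> std::ws;
    return std::string(std::istreambuf_iterator<char>(in.rdbuf()),
                       std::istreambuf_iterator<char>());
}
```

Leading spaces, tabs, and line breaks disappear, while a '\0' or a letter is left untouched because neither is whitespace.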

For stringstream::eof(), see https://www.cppreference.com/wiki/io/eof

The function returns true once a read has reached the end of the stream (that is, eofbit is set), and false otherwise.
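To see when eof() actually turns true, here is a short sketch. The helper eof_after_int is hypothetical; it extracts one int and reports the state of eofbit afterwards.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extract a single int, then report whether the
// extraction ran into the end of the stream (eofbit set).
bool eof_after_int(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    in >> value;
    return in.eof();
}
```

With "123" the extraction reads up to the end and sets eofbit. With "123 " or "123x" it stops at the first character that is not a digit, so eofbit stays clear.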

With this information, we write a test case:

stringstream ss;
ss << str;
ss >> result;
cout << "new Result: " << result << endl;
cout << ss.eof() << endl;
cout << (ss >> std::ws).eof() << endl;

After compiling and running, the output is:

new Result: 1234567

As this shows, using ss directly gives the result we want, but it skips the final safety check, and boost::lexical_cast performs exactly that check.
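The same experiment can be packaged so that both flags are easy to inspect. This is a sketch under assumptions: ProbeResult, probe, and make_padded_input are names invented here, and make_padded_input rebuilds the eight-character string from the start of the article.

```cpp
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record of what the stream reports at each step.
struct ProbeResult {
    int value;
    bool eof_after_extract;
    bool eof_after_ws;
};

// Rebuild the article's input: "1234567" followed by one '\0' (size 8).
std::string make_padded_input() {
    std::string str(8, '\0');
    std::memcpy(&str[0], "1234567", 7);
    return str;
}

ProbeResult probe(const std::string& text) {
    std::stringstream ss;
    ss << text;                               // writes size() characters, NULs included
    ProbeResult r = {0, false, false};
    ss >> r.value;
    r.eof_after_extract = ss.eof();
    r.eof_after_ws = (ss >> std::ws).eof();   // the check lexical_cast adds
    return r;
}
```

For the padded input the value is read correctly but neither flag is set, because the trailing '\0' is still sitting in the stream. For a plain "1234567" both flags are set.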

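In fact the '\0' in the example was quite misleading at first. We initially assumed that boost::lexical_cast could not handle a string ending in '\0', but that is not the point: replacing the '\0' with the character 'a' causes the same problem, whereas '\n', '\r', or a space does not. The root cause is that the conversion does not consume every character in the input stream, so the stream's end-of-file flag (eofbit) stays 0.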
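One way to sidestep the problem for the padded-buffer case is to drop everything from the first '\0' onward before converting. This is a suggestion rather than something the original code did, and cut_at_nul is a hypothetical helper name.

```cpp
#include <string>

// Hypothetical helper: keep only the characters before the first '\0'.
// c_str() yields a NUL-terminated view, so constructing a new string from
// it stops at the first embedded NUL.
std::string cut_at_nul(const std::string& text) {
    return std::string(text.c_str());
}
```

The trimmed string has size 7 instead of 8, so a strict conversion no longer finds leftover characters. It does nothing for a trailing 'a', which is a genuinely malformed number.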

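That said, we should not assume that boost::lexical_cast is always the right tool. A common task in practice is taking a string that mixes digits and letters and converting its digit run to an integer. With boost::lexical_cast this never yields a result, because every call throws. In that case we can use a stringstream and skip the eof() check after the conversion, which gives us the integer we want.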
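A lenient conversion along those lines might look like the sketch below. leading_int is a hypothetical name, and the sketch assumes the digits come first in the string; input such as "abc123" still fails because the extraction stops at the first character.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: read the integer at the front of the text and
// ignore whatever follows it (no eof() check).
bool leading_int(const std::string& text, int& out) {
    std::istringstream in(text);
    in >> out;
    return !in.fail();
}
```

For "123abc" this returns true with out set to 123, where lexical_cast<int> would throw.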

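During these tests a perverse idea came up: how does the stream machinery in the STL actually convert a string to an integer? SGI's STL source is truly painful to read, though.

The search went roughly as follows:

(1): vim /usr/include/c++/3.4.4/sstream

It includes istream.

(2): vim /usr/include/c++/3.4.4/istream

The implementation of operator>>(int) is in bits/istream.tcc.

(3): vim /usr/include/c++/3.4.4/bits/istream.tcc

Here we find const __num_get_type& __ng = __check_facet(this->_M_num_get); followed by __ng.get(*this, 0, *this, __err, __l); so we need the get function of the __num_get_type type. The #include <locale> in istream.tcc also looks unfamiliar. Searching istream shows that __num_get_type is defined as typedef num_get<_CharT, istreambuf_iterator<_CharT, _Traits> > __num_get_type; so the type we ultimately need to find is num_get.

(4): vim /usr/include/c++/3.4.4/locale

This file includes the following headers: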

#include <bits/localefwd.h>
#include <bits/locale_classes.h>
#include <bits/locale_facets.h>
#include <bits/locale_facets.tcc>

Check them one by one.

(5): vim /usr/include/c++/3.4.4/bits/localefwd.h

Here is the declaration of the class template num_get:

template<typename _CharT, typename _InIter = istreambuf_iterator<_CharT> >
class num_get;

(6): vim /usr/include/c++/3.4.4/bits/locale_facets.h

This file contains the definition of num_get:

template<typename _CharT, typename _InIter>
class num_get : public locale::facet

Look for the get method:

iter_type
get(iter_type __in, iter_type __end, ios_base& __io,
    ios_base::iostate& __err, bool& __v) const
{ return this->do_get(__in, __end, __io, __err, __v); }

Now look for the do_get method.

(7): vim /usr/include/c++/3.4.4/bits/locale_facets.tcc

Here we find:

// _GLIBCXX_RESOLVE_LIB_DEFECTS
// 17. Bad bool parsing
template<typename _CharT, typename _InIter>
_InIter
num_get<_CharT, _InIter>::
do_get(iter_type __beg, iter_type __end, ios_base& __io,
       ios_base::iostate& __err, bool& __v) const
{
    if (!(__io.flags() & ios_base::boolalpha))
    {
        // Parse bool values as long.
        // NB: We can't just call do_get(long) here, as it might
        // refer to a derived class.
        long __l = -1;
        __beg = _M_extract_int(__beg, __end, __io, __err, __l);
        if (__l == 0 || __l == 1)
            __v = __l;
        else
        ...

Next, look for the _M_extract_int method.

Finally, here it is:

template<typename _CharT, typename _InIter>
template<typename _ValueT>
_InIter
num_get<_CharT, _InIter>::
_M_extract_int(_InIter __beg, _InIter __end, ios_base& __io,
               ios_base::iostate& __err, _ValueT& __v) const
{
    typedef char_traits<_CharT> __traits_type;
    typedef typename numpunct<_CharT>::__cache_type __cache_type;
    __use_cache<__cache_type> __uc;
    const locale& __loc = __io._M_getloc();
    const __cache_type* __lc = __uc(__loc);
    const _CharT* __lit = __lc->_M_atoms_in;
    ....
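The chain traced in steps (1) to (7) ends at the num_get facet, and that facet can also be called directly through the public standard interface. The sketch below is an illustration only: parse_with_num_get is a hypothetical helper, and it uses std::use_facet instead of the library's internal _M_num_get member.

```cpp
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parse a long by calling the num_get facet
// directly, much as the stream's operator>> does internally.
bool parse_with_num_get(const std::string& text, long& value) {
    typedef std::istreambuf_iterator<char> Iter;
    std::istringstream in(text);
    const std::num_get<char, Iter>& ng =
        std::use_facet<std::num_get<char, Iter> >(in.getloc());
    std::ios_base::iostate err = std::ios_base::goodbit;
    ng.get(Iter(in), Iter(), in, err, value);    // dispatches to do_get
    return (err & std::ios_base::failbit) == 0;  // eofbit alone is not an error
}
```

For "1234567" the facet reports eofbit but not failbit, so the helper returns true with value set to 1234567.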

The key code of _M_extract_int is as follows:

…
int __base = __oct ? 8 : (__basefield == ios_base::hex ? 16 : 10);
…
const _ValueT __new_result = __result * __base - __digit;
__overflow |= __new_result > __result;
__result = __new_result;
++__sep_pos;
__found_num = true;
…

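As this code shows, stream conversion in C++ uses no special tricks. Converting a string to a number is the familiar loop of multiplying the running result by the base (10, 8, or 16) and folding in the next digit, only with many safety checks along the way that we would not have thought of.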
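The multiply-and-accumulate loop can be written out in a few lines. This is a simplified sketch, not the library's code: digit_value and accumulate_digits are hypothetical names, signs and locale grouping are ignored, and the digit is added here whereas the excerpt above subtracts it.

```cpp
#include <limits>
#include <string>

// Hypothetical helper: numeric value of c in the given base, or -1.
int digit_value(char c, int base) {
    int d = -1;
    if (c >= '0' && c <= '9') d = c - '0';
    else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
    return (d >= 0 && d < base) ? d : -1;
}

// Hypothetical helper: result = result * base + digit for each leading
// digit, stopping at the first non-digit and refusing to overflow.
bool accumulate_digits(const std::string& text, int base, long& out) {
    const long max = std::numeric_limits<long>::max();
    long result = 0;
    bool found = false;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const int d = digit_value(text[i], base);
        if (d < 0) break;                             // first non-digit ends the number
        if (result > (max - d) / base) return false;  // next step would overflow
        result = result * base + d;
        found = true;
    }
    if (found) out = result;
    return found;
}
```

The same loop handles base 8, 10, and 16; only the set of accepted digits changes.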

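Finally figured it out. The SGI code really is not written for human eyes. You can explore how the float conversion is implemented in the same way; see the _M_extract_float function.

Last updated: 2017-04-02 00:06:39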
