

Feature compatibility management in the Ceph monitor

Software must remain compatible with older versions

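Ceph is constantly evolving software: new features are added and old ones are removed. Keeping different versions of the Ceph monitor compatible, or at least preventing incompatibilities from causing errors, deserves careful thought. When reading the code we tend to focus on the business logic and overlook the code that manages compatibility, so this article looks specifically at how compatibility is managed.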

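Compatibility in the Ceph monitor has two main aspects:

  • checking the peer's compatibility when communicating
  • compatibility when the code accesses locally stored data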

Creating the Messenger and setting the feature bits

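Ceph communicates through its own Messenger abstraction, and a Messenger must be created before communication starts. For example, this is how the Ceph monitor creates the messenger used between monitors: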

Messenger *msgr = Messenger::create(g_ceph_context, g_conf->ms_type,
                       entity_name_t::MON(rank),
                       "mon",
                       0);

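When a connection is established, Ceph tells the peer what type of node it is. The main types are monitor, osd, mds, and client; the code above declares itself to be a monitor.
Every Messenger also sets a Policy for each peer node type. The compatibility-related fields of a Policy are: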

/// Specify features supported locally by the endpoint.
uint64_t features_supported;
/// Specify features any remotes must have to talk to this endpoint.
uint64_t features_required;

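Here features_supported lists the features this node supports and features_required lists the features the peer must have; each feature occupies one bit. The default Policy sets features_supported to every feature the current code supports, i.e. CEPH_FEATURES_ALL.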

The Policy Ceph sets for connections between monitors is:

msgr->set_policy(entity_name_t::TYPE_MON,
                    Messenger::Policy::lossless_peer_reuse(                      
                        supported,                                               
                        CEPH_FEATURE_UID |                                       
                        CEPH_FEATURE_MON_SINGLE_PAXOS)); 

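The initial features_required contains only CEPH_FEATURE_UID and CEPH_FEATURE_MON_SINGLE_PAXOS, while features_supported contains every feature. Compatibility between monitors is then checked step by step during the communication that follows.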

Compatibility checks during communication

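The peer's feature bits are checked as soon as the connection is established. In the Ceph messenger protocol, each side sends the other the set of features it supports, and that set is compared with the required feature bits in the local Policy. For example: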

In Ceph's simple messenger, the side initiating the connection sends its supported features:

    while (1) {
      delete authorizer;
      authorizer = msgr->get_authorizer(peer_type, false);
      bufferlist authorizer_reply;

      ceph_msg_connect connect;
      connect.features = policy.features_supported;  // advertise everything we support
      // ...
    }

When the reply arrives, the initiator verifies the features the peer supports:

    if (reply.tag == CEPH_MSGR_TAG_READY ||
        reply.tag == CEPH_MSGR_TAG_SEQ) {
      // required bits that the peer did not advertise in its reply
      uint64_t feat_missing = policy.features_required & ~(uint64_t)reply.features;
      if (feat_missing) {
        ldout(msgr->cct,1) << "missing required features " << std::hex << feat_missing << std::dec << dendl;
        goto fail_locked;
      }
      // ...
    }

The features in the reply are compared with the locally required features; if any are missing, the connection fails.
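To make the handshake rule concrete, here is a small self-contained model. It is a sketch, not the SimpleMessenger code: the `Policy` struct keeps only the two compatibility fields, the `FEAT_*` bit values are made up for illustration, and `missing_features()` / `connect_allowed()` are hypothetical helpers that apply the same `features_required & ~peer_features` test as the excerpt above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits for illustration only; the real CEPH_FEATURE_*
// constants are defined by Ceph and have different values.
constexpr uint64_t FEAT_UID          = 1ULL << 0;
constexpr uint64_t FEAT_SINGLE_PAXOS = 1ULL << 1;
constexpr uint64_t FEAT_ERASURE      = 1ULL << 2;
constexpr uint64_t FEAT_ALL = FEAT_UID | FEAT_SINGLE_PAXOS | FEAT_ERASURE;

// Simplified stand-in for Messenger::Policy: only the compatibility fields.
struct Policy {
  uint64_t features_supported;  // what this endpoint advertises
  uint64_t features_required;   // what every peer must advertise
};

// Returns the required bits the peer did not advertise (0 means compatible).
uint64_t missing_features(const Policy& policy, uint64_t peer_features) {
  return policy.features_required & ~peer_features;
}

// The connection may proceed only if no required bit is missing.
bool connect_allowed(const Policy& policy, uint64_t peer_features) {
  return missing_features(policy, peer_features) == 0;
}

// Modelled on the monitor-to-monitor policy: support everything,
// require only the two baseline bits.
const Policy mon_policy{FEAT_ALL, FEAT_UID | FEAT_SINGLE_PAXOS};
```

With such a policy, a peer lacking `FEAT_ERASURE` still passes the handshake; catching that kind of peer is left to the monitor's own, finer-grained checks.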

Internal representation of compatibility

Ceph uses a data structure called CompatSet to represent a set of features:

    struct CompatSet {
        struct Feature {
            uint64_t id;
            string name;

            Feature(uint64_t _id, const char *_name) : id(_id), name(_name) {}
            Feature(uint64_t _id, const string& _name) : id(_id), name(_name) {}
        };

        struct FeatureSet {
            uint64_t mask;                                                              
            map <uint64_t,string> names;  
        };

        FeatureSet compat;
        FeatureSet ro_compat; 
        FeatureSet incompat;
    };

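Each bit of mask represents one feature. The compatibility tests mainly decide whether data can be read and whether it can be written.
Readability is tested by the readable member function: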

bool readable(CompatSet const& other) const {
    return !((other.incompat.mask ^ incompat.mask) & other.incompat.mask);
}

In other words, if my incompat set does not contain every bit of the other side's incompat set, I cannot read the other side's data.

Writability is tested by the writeable member function:

bool writeable(CompatSet const& other) const {
   return readable(other) &&
        !((other.ro_compat.mask ^ ro_compat.mask) & other.ro_compat.mask);        
}

That is, besides being readable, I can write the data only if my ro_compat set contains every bit of the other side's ro_compat set.
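The two tests can be tried out on plain integers. The sketch below is a hypothetical `MiniCompatSet` that keeps only the three masks (no feature names) and copies the `readable` / `writeable` expressions unchanged; the mask values used with it are arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Minimal model of CompatSet: only the three masks, no feature names.
struct MiniCompatSet {
  uint64_t compat;
  uint64_t ro_compat;
  uint64_t incompat;

  // Same expression as CompatSet::readable(): every incompat bit the
  // other set has must also be set in ours.
  bool readable(const MiniCompatSet& other) const {
    return !((other.incompat ^ incompat) & other.incompat);
  }

  // Same expression as CompatSet::writeable(): readable, and every
  // ro_compat bit the other set has must also be set in ours.
  bool writeable(const MiniCompatSet& other) const {
    return readable(other) &&
           !((other.ro_compat ^ ro_compat) & other.ro_compat);
  }
};
```

Neither function looks at `compat`: a feature recorded only there never stops old code from reading or writing, `ro_compat` bits block writing only, and `incompat` bits block both.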

Internal compatibility protection in the Ceph monitor

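  • Every Monitor must initialize its local data when it is created, and the mkfs function is a key part of that. Monitor::mkfs first writes a compatibility set to the local store, recording which features the code that created the database had: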

    int Monitor::mkfs()
    {
      MonitorDBStore::TransactionRef t(new MonitorDBStore::Transaction);

      // verify cluster fsid
      int r = check_fsid();
      if (r < 0 && r != -ENOENT)
        return r;

      bufferlist magicbl;
      magicbl.append(CEPH_MON_ONDISK_MAGIC);
      magicbl.append("\n");
      t->put(MONITOR_NAME, "magic", magicbl);

      features = get_initial_supported_features();  // <== record the feature set
      write_features(t);
      // ...
    }
    
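  • When the Monitor starts, the code checks whether the data structures on the local file system are compatible with the current code. Because the Ceph binaries can be upgraded or otherwise replaced while the local files stay unchanged,
    checking at startup that the local data is compatible is essential. ceph_mon.cc calls check_features(), which checks whether the on-disk data format is compatible with the current code: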

int Monitor::check_features(MonitorDBStore *store)
{
  CompatSet required = get_supported_features();
  CompatSet ondisk;

  read_features_off_disk(store, &ondisk);

  if (!required.writeable(ondisk)) {
    CompatSet diff = required.unsupported(ondisk);
    generic_derr << "ERROR: on disk data includes unsupported features: " << diff << dendl;
    return -EPERM;
  }

  return 0;
}

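get_supported_features() returns every feature the current Monitor code supports, and read_features_off_disk() reads back the data stored by write_features(). The check uses writeable() to test whether the current code is able to write the data on the local file system.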
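Together with the feature set written by mkfs, the startup check guards against the upgrade and downgrade scenario described above. The following is a simplified sketch under stated assumptions: `Masks` stands in for CompatSet, the bit values are invented, and this hypothetical `check_features()` takes both sets directly instead of reading a MonitorDBStore. Like the original, it returns 0 or `-EPERM`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Simplified CompatSet: masks only.
struct Masks {
  uint64_t compat, ro_compat, incompat;
};

// Same test as CompatSet::writeable(): our incompat and ro_compat masks
// must contain every corresponding bit of the other set.
bool writeable(const Masks& ours, const Masks& other) {
  bool readable = !((other.incompat ^ ours.incompat) & other.incompat);
  return readable && !((other.ro_compat ^ ours.ro_compat) & other.ro_compat);
}

// Stand-in for Monitor::check_features(): `supported` plays the role of
// get_supported_features(), `ondisk` the result of read_features_off_disk().
int check_features(const Masks& supported, const Masks& ondisk) {
  if (!writeable(supported, ondisk))
    return -EPERM;  // refuse to start on data this code cannot safely write
  return 0;
}
```

Newer code can open a store written by older code. Once newer code has recorded an extra incompat bit, however, rolling back to the older binary fails this check instead of silently misreading the data.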

read_features_off_disk reads the data produced by write_features():

void Monitor::read_features_off_disk(MonitorDBStore *store, CompatSet *features)
{                                                                               
  bufferlist featuresbl;                                                        
  store->get(MONITOR_NAME, COMPAT_SET_LOC, featuresbl);                         
  if (featuresbl.length() == 0) {                                               
    generic_dout(0) << "WARNING: mon fs missing feature list.\n"                
            << "Assuming it is old-style and introducing one." << dendl;        
    //we only want the baseline ~v.18 features assumed to be on disk.           
    //If new features are introduced this code needs to disappear or            
    //be made smarter.                                                          
    *features = get_legacy_features();                                          

    bufferlist bl;                                                              
    features->encode(bl);                                                       
    MonitorDBStore::TransactionRef t(new MonitorDBStore::Transaction);          
    t->put(MONITOR_NAME, COMPAT_SET_LOC, bl);                                   
    store->apply_transaction(t);                                                
  } else {                                                                      
    bufferlist::iterator it = featuresbl.begin();                               
    features->decode(it);                                                       
  }                                                                             
}                                                  

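As a special case, if the data was created by an old Ceph monitor, which did not write its features to the local file system, read_features_off_disk calls get_legacy_features() to obtain the old monitor's feature set. This is just a simple construction: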

CompatSet Monitor::get_legacy_features()                                        
{                                                                               
   CompatSet::FeatureSet ceph_mon_feature_compat;                                
   CompatSet::FeatureSet ceph_mon_feature_ro_compat;                             
   CompatSet::FeatureSet ceph_mon_feature_incompat;                              
   ceph_mon_feature_incompat.insert(CEPH_MON_FEATURE_INCOMPAT_BASE);             
   return CompatSet(ceph_mon_feature_compat, ceph_mon_feature_ro_compat,         
            ceph_mon_feature_incompat);                                          
}  
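The fallback in read_features_off_disk() boils down to "read the set if present, otherwise assume the legacy set and persist it". The sketch below reproduces only that control flow: the `std::map` store, the key name `"compat_set"`, and the bit value of `INCOMPAT_BASE` are hypothetical stand-ins for MonitorDBStore, COMPAT_SET_LOC, and CEPH_MON_FEATURE_INCOMPAT_BASE.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

struct Masks { uint64_t compat, ro_compat, incompat; };

// Hypothetical bit standing in for CEPH_MON_FEATURE_INCOMPAT_BASE.
constexpr uint64_t INCOMPAT_BASE = 1ULL << 0;

// Toy key/value store standing in for MonitorDBStore (assumption).
using Store = std::map<std::string, Masks>;
const std::string COMPAT_KEY = "compat_set";  // hypothetical key name

// Old monitors never wrote a feature set; assume only the base feature.
Masks legacy_features() {
  return Masks{0, 0, INCOMPAT_BASE};
}

// Read the stored set, or assume the legacy set and write it back.
Masks read_features_off_disk(Store& store) {
  auto it = store.find(COMPAT_KEY);
  if (it == store.end()) {
    Masks legacy = legacy_features();
    store[COMPAT_KEY] = legacy;  // persisted, like apply_transaction()
    return legacy;
  }
  return it->second;
}
```

Because the legacy set is written back on the first read, later reads find the key and decode it normally, so the warning appears only once per store.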
  • Checking and setting features while the Monitor runs

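    Once ceph_mon.cc decides to run the Monitor, it first calls the member function preinit(). One of preinit()'s jobs is to call read_features(), which loads
    the features recorded in the local files into the member variable *features*: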

void Monitor::read_features()
{
   read_features_off_disk(store, &features);
   dout(10) << "features " << features << dendl;

   apply_compatset_features_to_quorum_requirements();
   dout(10) << "required_features " << required_features << dendl;
 }

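It also uses the feature bits saved in the local data to require that every member of the monitor Paxos cluster have the corresponding connection feature bits: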

void Monitor::apply_compatset_features_to_quorum_requirements()                 
{                                                                               
  required_features = 0;                                                        
  if (features.incompat.contains(CEPH_MON_FEATURE_INCOMPAT_OSD_ERASURE_CODES)) {
    required_features |= CEPH_FEATURE_OSD_ERASURE_CODES;                        
  }                                                                             
  if (features.incompat.contains(CEPH_MON_FEATURE_INCOMPAT_OSDMAP_ENC)) {       
    required_features |= CEPH_FEATURE_OSDMAP_ENC;                               
  }                                                                             
  if (features.incompat.contains(CEPH_MON_FEATURE_INCOMPAT_ERASURE_CODE_PLUGINS_V2)) {
    required_features |= CEPH_FEATURE_ERASURE_CODE_PLUGINS_V2;                  
  }                                                                             
  dout(10) << __func__ << " required_features " << required_features << dendl;  
}         
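The if-chain above is a fixed mapping from on-disk incompat bits to connection feature bits. A table-driven sketch of the same idea follows. The constant names mirror the originals but their values are invented, only two of the three mappings are modelled, and `required_from_incompat()` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical bit values; the real constants differ.
constexpr uint64_t MON_INCOMPAT_OSD_ERASURE_CODES = 1ULL << 4;
constexpr uint64_t MON_INCOMPAT_OSDMAP_ENC        = 1ULL << 5;
constexpr uint64_t FEATURE_OSD_ERASURE_CODES      = 1ULL << 38;
constexpr uint64_t FEATURE_OSDMAP_ENC             = 1ULL << 40;

// (on-disk incompat bit, connection feature bit every peer must have)
const std::vector<std::pair<uint64_t, uint64_t>> kIncompatToFeature = {
    {MON_INCOMPAT_OSD_ERASURE_CODES, FEATURE_OSD_ERASURE_CODES},
    {MON_INCOMPAT_OSDMAP_ENC, FEATURE_OSDMAP_ENC},
};

// Each incompat bit recorded on disk becomes a required peer feature.
uint64_t required_from_incompat(uint64_t incompat_mask) {
  uint64_t required = 0;
  for (const auto& p : kIncompatToFeature)
    if (incompat_mask & p.first)
      required |= p.second;
  return required;
}
```

A table keeps the two directions of the mapping in one place; the original spells each pair out as its own if statement, which reads just as well when there are only a few features.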

Setting required_features keeps incompatible Monitors from forming a Paxos cluster. Several places use required_features to cut off communication between such incompatible monitors:

When a probe arrives and the peer cannot provide the required feature bits, communication is cut off:

void Monitor::handle_probe_probe(MMonProbe *m)
{
  MMonProbe *r;

  dout(10) << "handle_probe_probe " << m->get_source_inst() << *m
           << " features " << m->get_connection()->get_features() << dendl;
  // required bits that the peer's connection did not advertise
  uint64_t missing = required_features & ~m->get_connection()->get_features();
  if (missing) {
    dout(1) << " peer " << m->get_source_addr() << " missing features "
            << missing << dendl;
    // reply with OP_MISSING_FEATURES only if the peer has this feature bit
    if (m->get_connection()->has_feature(CEPH_FEATURE_OSD_PRIMARY_AFFINITY)) {
      MMonProbe *r = new MMonProbe(monmap->fsid, MMonProbe::OP_MISSING_FEATURES,
                                   name, has_ever_joined);
      r->required_features = required_features;  // set on the reply, not on m
      m->get_connection()->send_message(r);
    }
    goto out;
  }
  // ...
}

When a peer requests a cookie for data synchronization and cannot provide the required feature bits, communication is cut off:

void Monitor::handle_sync_get_cookie(MMonSync *m)
{
  if (is_synchronizing()) {
    _sync_reply_no_cookie(m);
    return;
  }

  assert(g_conf->mon_sync_provider_kill_at != 1);

  // make sure they can understand us.
  if ((required_features ^ m->get_connection()->get_features()) &
      required_features) {  // <== the peer lacks a required feature
    dout(5) << " ignoring peer mon." << m->get_source().num()
            << " has features " << std::hex
            << m->get_connection()->get_features()
            << " but we require " << required_features << std::dec << dendl;
    return;
  }
  // ...
}
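handle_probe_probe() computes `required_features & ~peer_features`, while handle_sync_get_cookie() and the Elector code that follows use `(required_features ^ peer_features) & required_features`. The two expressions are equivalent: for a required bit, the XOR is 1 exactly when the peer lacks that bit, and bits that are not required are cleared by the final `& required_features`. The sketch below checks this exhaustively for small masks; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Form used in handle_probe_probe(): required bits the peer lacks.
uint64_t missing_and_not(uint64_t required, uint64_t peer) {
  return required & ~peer;
}

// Form used in handle_sync_get_cookie() and the Elector: the XOR keeps
// the bits where the two masks differ, and masking with `required` keeps
// only those we require but the peer does not have.
uint64_t missing_xor(uint64_t required, uint64_t peer) {
  return (required ^ peer) & required;
}

// True if both forms agree for every pair of `bits`-wide masks.
bool forms_agree(unsigned bits) {
  const uint64_t n = 1ULL << bits;
  for (uint64_t r = 0; r < n; ++r)
    for (uint64_t p = 0; p < n; ++p)
      if (missing_and_not(r, p) != missing_xor(r, p))
        return false;
  return true;
}
```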
  • Feature-set compatibility when the Paxos cluster forms

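An elector that receives a request for votes (a propose) checks whether the feature bits are compatible: it compares the current Monitor's compatibility requirements with the feature set the peer supports to decide whether to continue: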

void Elector::handle_propose(MMonElection *m)
{
  // ...
  uint64_t required_features = mon->get_required_features();
  dout(10) << __func__ << " required features " << required_features
           << ", peer features " << m->get_connection()->get_features()
           << dendl;
  if ((required_features ^ m->get_connection()->get_features()) &
      required_features) {
    dout(5) << " ignoring propose from mon" << from
            << " without required features" << dendl;
    nak_old_peer(m);
    return;
  }
  // ...
}

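When an elector receives an election ack, it likewise checks the feature bits, comparing the current Monitor's compatibility requirements with the feature set the peer supports to decide whether to continue: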

void Elector::handle_ack(MMonElection *m)
{
  // ...
  uint64_t required_features = mon->get_required_features();
  if ((required_features ^ m->get_connection()->get_features()) &
      required_features) {
    dout(5) << " ignoring ack from mon" << from
            << " without required features" << dendl;
    m->put();
    return;
  }
  // ...
}

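The Monitor that called the election records the features each monitor offered on its connection during the election. After winning, it computes the feature set that all of these Monitors support in common: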

void Elector::victory()
{
  leader_acked = -1;
  electing_me = false;

  uint64_t features = CEPH_FEATURES_ALL;
  set<int> quorum;
  for (map<int, uint64_t>::iterator p = acked_me.begin(); p != acked_me.end();
       ++p) {
    quorum.insert(p->first);
    features &= p->second;
  }

  // ...
  mon->win_election(epoch, quorum, features, cmds, cmdsize, &copy_classic_mons);
  // ...
}

The resulting *features* variable holds the set of features these monitors have in common, and it is passed to the Monitor class to be recorded.
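The loop in victory() is a bitwise AND fold over the features each acking monitor advertised. Here is a standalone sketch of just that loop; `all_features` stands in for CEPH_FEATURES_ALL, `common_features()` and `QuorumResult` are hypothetical names, and the masks in the usage are arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

struct QuorumResult {
  std::set<int> quorum;  // ranks that acked the winner
  uint64_t features;     // bits every one of them advertised
};

// Mirrors the loop in Elector::victory(): start from all bits set and AND
// in each monitor's advertised features.
QuorumResult common_features(const std::map<int, uint64_t>& acked_me,
                             uint64_t all_features) {
  QuorumResult r{{}, all_features};
  for (const auto& p : acked_me) {
    r.quorum.insert(p.first);
    r.features &= p.second;
  }
  return r;
}
```

Starting from all bits set means a bit survives only if every monitor in `acked_me` advertised it, so a single older monitor in the quorum lowers the common feature set for everyone.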

Monitor::win_election saves features in quorum_features and then calls finish_election, which in turn calls
apply_quorum_to_compatset_features(). apply_quorum_to_compatset_features saves the feature set common to the monitors in the Paxos cluster to the local store, so that read_features can read it back the next time ceph-mon starts:

void Monitor::apply_quorum_to_compatset_features()
{
  CompatSet new_features(features);
  if (quorum_features & CEPH_FEATURE_OSD_ERASURE_CODES) {
    new_features.incompat.insert(CEPH_MON_FEATURE_INCOMPAT_OSD_ERASURE_CODES);
  }
  if (quorum_features & CEPH_FEATURE_OSDMAP_ENC) {
    new_features.incompat.insert(CEPH_MON_FEATURE_INCOMPAT_OSDMAP_ENC);
  }
  if (quorum_features & CEPH_FEATURE_ERASURE_CODE_PLUGINS_V2) {
    new_features.incompat.insert(CEPH_MON_FEATURE_INCOMPAT_ERASURE_CODE_PLUGINS_V2);
  }
  if (new_features.compare(features) != 0) {
    CompatSet diff = features.unsupported(new_features);
    dout(1) << __func__ << " enabling new quorum features: " << diff << dendl;
    features = new_features;

    MonitorDBStore::TransactionRef t(new MonitorDBStore::Transaction);
    write_features(t);  // <== persist the new feature set
    store->apply_transaction(t);

    apply_compatset_features_to_quorum_requirements();
  }
}
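apply_quorum_to_compatset_features() runs the mapping of apply_compatset_features_to_quorum_requirements() in reverse: once the whole quorum supports a connection feature, the matching incompat bit is recorded on disk. The sketch below models only the mask update and whether a write would be needed. The constant values are invented, `apply_quorum()` and `Update` are hypothetical names, and actually persisting the set is left to the caller, signalled through `changed`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical bit values; the real constants differ.
constexpr uint64_t FEATURE_OSD_ERASURE_CODES      = 1ULL << 38;
constexpr uint64_t FEATURE_OSDMAP_ENC             = 1ULL << 40;
constexpr uint64_t MON_INCOMPAT_OSD_ERASURE_CODES = 1ULL << 4;
constexpr uint64_t MON_INCOMPAT_OSDMAP_ENC        = 1ULL << 5;

// (feature shared by the whole quorum, incompat bit to record on disk)
const std::vector<std::pair<uint64_t, uint64_t>> kFeatureToIncompat = {
    {FEATURE_OSD_ERASURE_CODES, MON_INCOMPAT_OSD_ERASURE_CODES},
    {FEATURE_OSDMAP_ENC, MON_INCOMPAT_OSDMAP_ENC},
};

struct Update {
  uint64_t incompat;  // new on-disk incompat mask
  bool changed;       // true if the set would need to be written
};

// Bits are only ever added, never cleared.
Update apply_quorum(uint64_t incompat, uint64_t quorum_features) {
  uint64_t updated = incompat;
  for (const auto& p : kFeatureToIncompat)
    if (quorum_features & p.first)
      updated |= p.second;
  return Update{updated, updated != incompat};
}
```

Because bits are only added, the on-disk set acts as a ratchet: once a quorum has enabled, say, erasure codes, a later quorum without that feature does not clear the bit, and monitors lacking it are refused by the required_features checks described earlier.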

Summary

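When the Ceph monitor initializes its communication, it declares only a minimal set of required features. Instead of relying heavily on the Messenger's features_required, it checks compatibility dynamically after the connection has been established.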

Last updated: 2017-06-08 16:31:45
