Introduction to the COI Framework on the Intel MIC Architecture

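        Some historical background before we start. Raising the clock frequency of a single core is no longer a workable way to keep increasing computing speed, so vector machines, superscalar computers, and similar designs appeared, and parallel computing became a hot topic again. Today there are two main accelerator-based parallel architectures: GPGPU (general-purpose GPU) and Intel's MIC (Many Integrated Core) architecture. GPGPU acceleration exploits the GPU's inherent multithreading: compute-intensive tasks are moved to the GPU, split across many threads that run at the same time, and the results are copied back, which raises throughput. The main technology used here is CUDA. CUDA has one big drawback, though: writing CUDA code is a lot of work, and code that already runs on the CPU usually needs heavy modification. This is exactly where the Intel MIC architecture has an advantage: code that runs on the CPU can usually run on a MIC card directly as well. If you want to learn more about MIC, keep following my blog; I will cover MIC in more detail in upcoming posts.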

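        MIC programming has three main modes: native, symmetric, and offload. In native mode the program runs only on the MIC card, so the CPU's compute capacity is left largely unused. In symmetric mode the CPU and the MIC card run exactly the same program and can be treated as peer nodes. In offload mode the program runs primarily on the CPU, and compute-intensive tasks are offloaded to the MIC. From the code's point of view, what runs on the MIC is not a complete program but only code fragments. Anyone who has written large-scale numerical model programs in offload mode will have noticed two drawbacks: 1. Data transfer becomes a bottleneck; often the computation is fast enough but the data transfer cannot keep up. 2. Offload mode generally suits only operations on fairly flat data structures. The programs I work with make heavy use of object-oriented features, so offload mode is awkward to use and makes the program structure hard to maintain. To address these problems, with the aim of improving transfer efficiency and giving programmers more freedom, Intel introduced COI (Coprocessor Offload Infrastructure).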

  • COI overview

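      COI is a library for offload mode on the MIC architecture. MIC does provide a simple compiler directive (#pragma offload target) for offloading part of the code and data to the Xeon Phi for computation, but that approach is too simple and leaves little freedom. With COI the user gets more control, including synchronization between the CPU and the MIC, creation and exit of processes on the MIC card, asynchronous operations between the two endpoints, and data buffers between the two endpoints. This makes it easier to develop more flexible MIC programs.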
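
For comparison, the directive-based offload mentioned above looks roughly like the sketch below. It assumes the Intel compiler's `#pragma offload` syntax with `in`/`out` clauses; the function name `scale_on_mic` is made up for illustration. A compiler without offload support ignores the pragma (possibly with a warning) and simply runs the loop on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes 2 * src[i] into dst[i] for i in [0, n).
 * With an offload-capable Intel compiler the block runs on the
 * coprocessor; other compilers ignore the pragma and run it on the host. */
void scale_on_mic(const float *src, float *dst, int n)
{
    assert(src != NULL && dst != NULL && n >= 0);

    /* in(...)  : copy src to the card before the block runs
     * out(...) : copy dst back to the host after the block finishes */
    #pragma offload target(mic) in(src : length(n)) out(dst : length(n))
    {
        for (int i = 0; i < n; ++i) {
            dst[i] = 2.0f * src[i];
        }
    }
}
```

Everything the offloaded block touches has to be described to the directive as flat arrays with explicit lengths, which is the limitation that COI is meant to relax.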

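      A program written with COI is compiled into two executables: one runs on the CPU and one runs on the MIC. To launch the program you only invoke the CPU-side executable. When needed, the system sends the MIC-side executable and the libraries it requires to the MIC and runs it there. COI is described in detail next.

           Terminology: because COI can offload from the CPU to the MIC and also from the MIC to the CPU, the rest of this post uses source and sink to refer to the two endpoints.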

  • Basic concepts

  • Enumeration: COIEngine, COISysinfo

Enumerates hardware information such as the number of MIC cards, threads, cores, cache, and so on. Example code:

      COIENGINE               engine;
      COIFUNCTION             func[1];
      uint32_t                num_engines = 0;

      const char*             SINK_NAME = "coi_simple_sink_mic";

      // Make sure there is an Intel(r) Xeon Phi(tm) device available
      //
      CHECK_RESULT(
      COIEngineGetCount(COI_ISA_MIC, &num_engines));

      printf("%u engines available\n", num_engines);

      // If there isn't at least one engine, there is something wrong
      //
      if (num_engines < 1)
      {
          printf("ERROR: Need at least 1 engine\n");
          return -1;
      }

      // Get a handle to the first engine so it can be used to create
      // a sink process later
      //
      CHECK_RESULT(
      COIEngineGetHandle(COI_ISA_MIC, 0, &engine));
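
The snippets in this post wrap COI calls in a CHECK_RESULT macro whose definition is never shown. The sketch below is one plausible shape for such a macro, not the actual definition. The result type `demo_result`, its values, `demo_result_name`, `demo_call`, and `run_steps` are all stand-ins invented so that the sketch compiles without the COI headers; with COI itself the same pattern would presumably use COIRESULT, COI_SUCCESS, and COIResultGetName, which appear elsewhere in this post.

```c
#include <stdio.h>

/* Stand-in for COIRESULT so the sketch builds without the COI headers. */
typedef enum { DEMO_SUCCESS = 0, DEMO_ERROR = 1 } demo_result;

/* Stand-in for COIResultGetName(). */
static const char *demo_result_name(demo_result r)
{
    return r == DEMO_SUCCESS ? "DEMO_SUCCESS" : "DEMO_ERROR";
}

/* Evaluate the call exactly once; on failure, report which call failed
 * and return -1 from the enclosing function. */
#define CHECK_RESULT(call)                                      \
    do {                                                        \
        demo_result check_result_ = (call);                     \
        if (check_result_ != DEMO_SUCCESS) {                    \
            fprintf(stderr, "%s returned %s\n", #call,          \
                    demo_result_name(check_result_));           \
            return -1;                                          \
        }                                                       \
    } while (0)

/* Hypothetical API call that fails for negative input. */
static demo_result demo_call(int x)
{
    return x >= 0 ? DEMO_SUCCESS : DEMO_ERROR;
}

/* Returns 0 if every step succeeds, -1 at the first failing step. */
int run_steps(int a, int b)
{
    CHECK_RESULT(demo_call(a));
    CHECK_RESULT(demo_call(b));
    return 0;
}
```

Because the macro contains a `return -1`, it can only be used inside functions that return an int, which matches how the snippets in this post use it.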

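  • Process Management: COIProcess

      A COIProcess is the source-side handle to a process that the source creates on the sink. The source is responsible for creating, starting, and destroying that process. Its main roles are:

  • Abstracting the execution of a process on the sink
  • Providing APIs to start and stop remote processes and to load dynamic libraries
  • Providing a way to look up functions on the remote side and execute them

 Example code: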

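      COIRESULT               result = COI_ERROR;
      COIPROCESS              proc;
      COIENGINE               engine;         // Engine handle (must be obtained before use)
      int8_t                  sink_return;    // Return value of the sink's main()
      uint32_t                exit_reason;    // Termination code of the sink process

      result = COIProcessCreateFromFile(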
          engine,         // The engine to create the process on.
          SINK_NAME,      // The local path to the sink side binary to launch.
          0, NULL,        // argc and argv for the sink process.
          false, NULL,    // Environment variables to set for the sink process.
          true, NULL,     // Enable the proxy but don't specify a proxy root path.
          0,              // The amount of memory to pre-allocate
                          // and register for use with COIBUFFERs.
          NULL,           // Path to search for dependencies
          &proc           // The resulting process handle.
      );
      if (result != COI_SUCCESS)
      {
          printf("COIProcessCreateFromFile result %s\n",
                 COIResultGetName(result));
          return -1;
      }
  
      printf("Sink process created, press enter to destroy it.\n");
      getchar();
  
      // Destroy the process
      //
      result = COIProcessDestroy(
          proc,           // Process handle to be destroyed
          -1,             // Wait indefinitely until main() (on sink side) returns
          false,          // Don't force to exit. Let it finish executing
                          // functions enqueued and exit gracefully
          &sink_return,   // Receives the value returned by the sink's main()
          &exit_reason    // Receives the sink process's termination code
      );

      if (result != COI_SUCCESS)
      {
          printf("COIProcessDestroy result %s\n", COIResultGetName(result));
          return -1;
      }

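  • Execution flow: COIPipeline

         COIPipeline is a mechanism similar to RPC (Remote Procedure Call). A sequence of commands can be enqueued into a COIPipeline and executed in order on the sink; the commands here are mainly the functions to be called remotely. Its key properties are:

  • Functions enqueued in a COIPipeline run on the sink in the order they were enqueued.
  • COIPipeline is a remote-call mechanism. Because whole functions can be invoked on the remote side, it offers more freedom than plain offload.
  • When a function is enqueued, a buffer can be passed along in addition to the function's arguments.
  • Data transfer from the source to the sink uses SCIF (Symmetric Communications Interface).

        Example code (source side):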

CHECK_RESULT(
    COIProcessCreateFromFile(
        engine,             // The engine to create the process on.
        SINK_NAME,          // The local path to the sink side binary to launch.
        0, NULL,            // argc and argv for the sink process.
        false, NULL,        // Environment variables to set for the sink
                            // process.
        true, NULL,         // Enable the proxy but don't specify a proxy root
                            // path.
        0,                  // The amount of memory to pre-allocate
                            // and register for use with COIBUFFERs.
        NULL,               // Path to search for dependencies
        &proc               // The resulting process handle.
    )); 
    printf("Created sink process %s\n", SINK_NAME);

    // Pipeline:
    // After a sink side process is created, multiple pipelines can be created
    // to that process. Pipelines are queues where functions(represented by
    // COIFUNCTION) to be executed on sink side can be enqueued.

    // The following call creates a pipeline associated with process created
    // earlier.
    CHECK_RESULT(
    COIPipelineCreate(
        proc,           // Process to associate the pipeline with
        NULL,           // Do not set any sink thread affinity for the pipeline
        0,              // Use the default stack size for the pipeline thread
        &pipeline       // Handle to the new pipeline
    ));
    printf("Created pipeline\n");


    // Retrieve handle to function belonging to sink side process

    const char* func_name = "Foo";
    CHECK_RESULT(
    COIProcessGetFunctionHandles(
        proc,       // Process to query for the function
        1,          // The number of functions to look up
        &func_name, // The name of the function to look up
        func        // A handle to the function
    ));
    printf("Got handle to sink function %s\n", func_name);

    const char *misc_data = "Hello COI";
    int strlength =  (int)strlen(misc_data) + 1;

    // Enough to hold the return value

    char* return_value = (char*) malloc(strlength);
    if (return_value == NULL) {
        fprintf(stderr, "failed to allocate return value\n");
        return -1;
    }


    // Enqueue the function for execution
    // Pass in misc_data and return value pointer to run function
    // Get an event to wait on until the run function completion
    CHECK_RESULT(
    COIPipelineRunFunction(
        pipeline, func[0],         // Pipeline handle and function handle
        0, NULL, NULL,             // Buffers and access flags
        0, NULL,                   // Input dependencies
        misc_data,   strlength,    // Misc Data to pass to the function
        return_value, strlength,   // Return values that will be passed back
        &completion_event          // Event to signal when it completes
    ));
    printf("Called sink function %s(\"%s\" [%d bytes])\n",
                                              func_name, misc_data, strlength);

    // Now wait indefinitely for the function to complete
    CHECK_RESULT(
    COIEventWait(
        1,                         // Number of events to wait for
        &completion_event,         // Event handles
        -1,                        // Wait indefinitely
        true,                      // Wait for all events
        NULL, NULL                 // Number of events signaled
                                   // and their indices
    ));
    printf("Function returned \"%s\"\n", return_value);

            Sink side:

// main is automatically called whenever the source creates a process.
// However, once main exits, the process that was created exits.
int main(int argc, char** argv)
{
    UNUSED_ATTR COIRESULT result;
    UNREFERENCED_PARAM (argc);
    UNREFERENCED_PARAM (argv);

    // Functions enqueued on the sink side will not start executing until
    // you call COIPipelineStartExecutingRunFunctions(). This call is to
    // synchronize any initialization required on the sink side

    result = COIPipelineStartExecutingRunFunctions();

    assert(result == COI_SUCCESS);

    // This call will wait until COIProcessDestroy() gets called on the source
    // side. If COIProcessDestroy is called without force flag set, this call
    // will make sure all the functions enqueued are executed and does all
    // clean up required to exit gracefully.

    COIProcessWaitForShutdown();

    return 0;
}

// Prototype of run function that can be retrieved on the source side.
// Copies misc data to return pointer.
COINATIVELIBEXPORT
void Foo (uint32_t         in_BufferCount,
          void**           in_ppBufferPointers,
          uint64_t*        in_pBufferLengths,
          void*            in_pMiscData,
          uint16_t         in_MiscDataLength,
          void*            in_pReturnValue,
          uint16_t         in_ReturnValueLength)
{

    UNREFERENCED_PARAM(in_BufferCount);
    UNREFERENCED_PARAM(in_ppBufferPointers);
    UNREFERENCED_PARAM(in_pBufferLengths);
    UNREFERENCED_PARAM(in_pMiscData);
    UNREFERENCED_PARAM(in_MiscDataLength);

    assert (in_MiscDataLength>=in_ReturnValueLength);
    if(in_pMiscData!=NULL && in_pReturnValue!=NULL)
    {   
        memcpy(in_pReturnValue, in_pMiscData, in_ReturnValueLength);
    }
}

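         The COIEventWait call on the source side can be ignored for now; it is covered in detail below. The code above shows that once the source enqueues a function into a COIPipeline, that function runs on the sink, provided that the function is defined in the sink-side code. However, a function enqueued in a COIPipeline does not necessarily run on the sink right away. First, nothing can run until the sink has called COIPipelineStartExecutingRunFunctions(). Second, if the enqueued function has associated buffers, it can only run once those buffers are available. In addition, COIEvent can be used to control when a function runs. More on this later.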
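
The two rules described above (functions run in enqueue order, and nothing runs until the sink has called COIPipelineStartExecutingRunFunctions()) can be modelled in plain C. The sketch below is only a toy model of that behaviour, not COI code: `toy_pipeline`, `toy_enqueue`, `toy_start`, `toy_drain`, and the two run functions are invented names, and everything runs in a single thread.

```c
#include <stddef.h>

#define TOY_CAPACITY 8

typedef void (*toy_fn)(int *state);

typedef struct {
    toy_fn queue[TOY_CAPACITY]; /* FIFO of pending run functions */
    size_t head;                /* index of the next function to execute */
    size_t tail;                /* index of the next free slot */
    int    started;             /* set once the "sink" is ready to execute */
} toy_pipeline;

/* Source side: append a function. Returns 0 on success, -1 if full. */
int toy_enqueue(toy_pipeline *p, toy_fn fn)
{
    if (p->tail == TOY_CAPACITY) return -1;
    p->queue[p->tail++] = fn;
    return 0;
}

/* Sink side: models COIPipelineStartExecutingRunFunctions(). */
void toy_start(toy_pipeline *p) { p->started = 1; }

/* Run pending functions in enqueue order; returns how many ran.
 * Nothing runs before toy_start() has been called. */
size_t toy_drain(toy_pipeline *p, int *state)
{
    size_t ran = 0;
    if (!p->started) return 0;
    while (p->head < p->tail) {
        p->queue[p->head++](state);
        ++ran;
    }
    return ran;
}

/* Two run functions whose combined result depends on their order. */
void add_three(int *state) { *state += 3; }
void times_two(int *state) { *state *= 2; }

/* Enqueue both functions, try to drain before and after starting,
 * and return the final state. */
int toy_demo(void)
{
    toy_pipeline p = {0};
    int state = 1;
    toy_enqueue(&p, add_three);
    toy_enqueue(&p, times_two);
    toy_drain(&p, &state);   /* not started yet: nothing runs */
    toy_start(&p);
    toy_drain(&p, &state);   /* runs add_three, then times_two */
    return state;            /* (1 + 3) * 2 */
}
```

Calling `toy_demo()` from a `main` returns 8, which is only possible if `add_three` ran before `times_two` and neither ran before `toy_start`.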

  • COIBuffer

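    • COIBuffer manages data on the remote device.
    • A buffer can be handed to the remote endpoint by passing it to a run function, or accessed directly through the read/write APIs.
    • The COI runtime manages the buffers.
    • Buffers carry data between the source and the sink. They let reads and writes on the two sides proceed asynchronously, which hides communication latency.
    • A buffer may reside in physical memory on the device or on the host.
    • Buffers are implemented on top of SCIF memory windows.
    • Data transfer is carried out with the readfrom/writeto API.
    • A map operation gives access to the region backing a buffer without moving its data to the host.

COIBuffer operations are fairly involved; more details will follow in later posts.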

  • COIEvent

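        COIEvent is used to create dependencies so that the source and the sink can synchronize: one side creates an event and the other waits on it, somewhat like MPI_Barrier in MPI. A function can have a COIEvent as an input dependency; the function in the COIPipeline can only run after that event has been signaled (and the function must be at the head of the COIPipeline queue at that point). Likewise, when the function finishes it can produce an event, and the remote program can wait on that event to synchronize. See the example code:

        Source side: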

// This tutorial demonstrates:
// 1. Registering a User event
// 2. Pass the event to the sink side                                                                                                                                                                                                       
// 3. Signaling the event from the sink side and using it
//    to synchronize on the source side.

// It first enqueues a run function with a registered user event (which
// is not signaled) as input dependency. Then a second function is enqueued
// on a different pipeline that signals the user event.

// User events are one shot events. Once they are signaled they
// can't be signaled again. You have to register them again to enable
// signaling.
...


    //Create two pipelines

   CHECK_RESULT(
   COIPipelineCreate(
        proc,            // Process to associate the pipeline with
        NULL,            // Do not set any sink thread affinity for the pipeline
        0,               // Use the default stack size for the pipeline thread
        &pipeline[0]     // Handle to the new pipeline
    ));

   CHECK_RESULT(
   COIPipelineCreate(
        proc,            // Process to associate the pipeline with
        NULL,            // Do not set any sink thread affinity for the pipeline
        0,               // Use the default stack size for the pipeline thread
        &pipeline[1]     // Handle to the new pipeline
    ));
    printf("Created sink process %s and two pipelines\n", SINK_NAME);

    // Retrieve handle to functions belonging to sink side process

    const char* names[] = {"Return2","SignalUserEvent"};

    CHECK_RESULT(
    COIProcessGetFunctionHandles(
        proc,        // Process to query for the function
        2,           // The number of functions to query
        names,       // The name of the function
        func         // A handle to the function
    ));
    printf("Got handles to functions %s and %s\n", names[0], names[1]);


    uint64_t return_value = 0;


    COIEVENT  user_event;

    // Register this event so that it can be signaled
    CHECK_RESULT(
    COIEventRegisterUserEvent(&user_event));
    printf("Registered user event\n");


    // Now pass this registered user event as an input dependency to the run
    // function. This run function will not be started until the user event
    // is signaled.
    CHECK_RESULT(
    COIPipelineRunFunction(
        pipeline[0], func[0],                  // Pipeline handle and function
                                               // handle
        0, NULL, NULL,                         // Buffers and access flags to
                                               // pass to the function
        1, &user_event,                        // Input dependencies
        NULL, 0,                               // Misc data to pass to
                                               // the function
        &return_value, sizeof(return_value),   // Return value passed back
                                               // from the function
        &completion_event                      // Event to signal when
                                               // the function is complete
    ));
    printf("Enqueued sink function %s depending on user event\n", names[0]);

    // Sleep for 2 sec which is enough for run function to be started on sink
    // side
#ifndef _WIN32
    sleep(2);
#else
    Sleep(2000);
#endif
    // Now try waiting for the completion_event. It should return
    // COI_TIME_OUT_REACHED (as the event isn't signaled)
    if(COIEventWait(1, &completion_event, 0, true, NULL, NULL) !=
        COI_TIME_OUT_REACHED)
    {
        printf("Error: Did not execute as expected\n");
        return -1;
    }
    printf("As expected, event wait timed out\n");

    // User event handles can be passed down to run function as misc
    // data (or via buffers) and on sink side can be type-casted back to
    // COIEVENT object to signal them.
    CHECK_RESULT(
    COIPipelineRunFunction(
        pipeline[1], func[1],                   // Pipeline handle and function
                                                // handle
        0, NULL, NULL,                          // Buffers and access flags to
                                                // pass to the function
        0, NULL,                                // Input dependencies
        &user_event, sizeof(user_event),        // Misc data to pass to
                                                // the function
        NULL,0,                                 // Return value passed back
                                                // from the function
        NULL                                    // Event to signal when
                                                // the function is complete
    ));
    printf("Enqueued sink function %s passing user event as misc arg\n",
                                                                    names[1]);

    // Wait until the user event is signaled
    CHECK_RESULT(
    COIEventWait(
        1,                      // Number of events to wait for
        &user_event,          // Event handles
        -1,                     // Wait indefinitely
        true,                   // Wait for all events
        NULL, NULL              // Number of events signaled
                                // and their indices
    ));
    printf("Successfully waited for user event (signaled sink side)\n");

    // Once a user event is signaled the first run function will be able to
    // proceed. Wait until the function finishes (-1 wait indefinite)
    CHECK_RESULT(
    COIEventWait(
        1,                      // Number of events to wait for
        &completion_event,    // Event handles
        -1,                     // Wait indefinitely
        true,                   // Wait for all events
        NULL, NULL              // Number of events signaled
                                // and their indices
    ));
    printf("Sink function %s completed since user event signaled\n", names[0]);


// Unregister the event to cleanup
    CHECK_RESULT(
    COIEventUnregisterUserEvent(user_event));

Sink side (MIC):

// main is automatically called whenever the source creates a process.
// However, once main exits, the process that was created exits.
int main(int argc, char** argv)
{
    UNUSED_ATTR COIRESULT result;
    UNREFERENCED_PARAM (argc);
    UNREFERENCED_PARAM (argv);

    // Functions enqueued on the sink side will not start executing until
    // you call COIPipelineStartExecutingRunFunctions()
    // This call is to synchronize any initialization required on the sink side

    result = COIPipelineStartExecutingRunFunctions();

    assert(result == COI_SUCCESS);

    // This call will wait until COIProcessDestroy() gets called on the source
    // side. If COIProcessDestroy is called without force flag set, this call
    // will make sure all the functions enqueued are executed and does all
    // clean up required to exit gracefully.
    COIProcessWaitForShutdown();

    return 0;
}

// Prototype of run functions that can be retrieved on the source side

// This Function just returns 2
COINATIVELIBEXPORT
void Return2(uint32_t         in_BufferCount,
             void**           in_ppBufferPointers,
             uint64_t*        in_pBufferLengths,
             void*            in_pMiscData,
             uint16_t         in_MiscDataLength,
             void*            in_pReturnValue,
             uint16_t         in_ReturnValueLength)
{
    UNREFERENCED_PARAM(in_BufferCount);
    UNREFERENCED_PARAM(in_ppBufferPointers);
    UNREFERENCED_PARAM(in_pBufferLengths);
    UNREFERENCED_PARAM(in_pMiscData);
    UNREFERENCED_PARAM(in_MiscDataLength);

    if (sizeof(uint64_t) <= in_ReturnValueLength)
    {   
        *(uint64_t*)(in_pReturnValue) = 2;
    }   
}

//Assumes a user_event is passed as Misc_data and signals it
COINATIVELIBEXPORT
void SignalUserEvent(uint32_t         in_BufferCount,
                       void**           in_ppBufferPointers,
                       uint64_t*        in_pBufferLengths,
                       void*            in_pMiscData,
                       uint16_t         in_MiscDataLength,
                       void*            in_pReturnValue,
                       uint16_t         in_ReturnValueLength)
{
    UNREFERENCED_PARAM(in_BufferCount);
    UNREFERENCED_PARAM(in_ppBufferPointers);
    UNREFERENCED_PARAM(in_pBufferLengths);
    UNREFERENCED_PARAM(in_MiscDataLength);
    UNREFERENCED_PARAM(in_pReturnValue);
    UNREFERENCED_PARAM(in_ReturnValueLength);

    COIEVENT user_event;

    assert(in_pMiscData != NULL);
    assert(in_MiscDataLength >= sizeof(user_event));

    memcpy(&user_event, in_pMiscData, sizeof(user_event));

    COIEventSignalUserEvent(user_event);
}

      As the code shows, although func[0] is enqueued before func[1] on the source side, func[0] has an input dependency, user_event. user_event is passed to the sink as the argument of func[1], and func[0] can only run after user_event has been signaled on the sink.
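
The ordering effect described above can be reproduced with a small single-threaded model. This is a toy sketch, not COI code: `toy_event`, `toy_task`, `toy_run`, and `toy_event_demo` are invented names, and each array element stands for the function at the head of its own pipeline.

```c
#include <stddef.h>

typedef struct { int signaled; } toy_event;

typedef struct {
    int        id;          /* which run function this is */
    toy_event *depends_on;  /* input dependency, NULL if none */
    toy_event *to_signal;   /* event to signal when run, NULL if none */
    int        done;        /* nonzero once the task has executed */
} toy_task;

/* Keep scanning the tasks until none can make progress. A task runs only
 * once its input dependency (if any) has been signaled. The ids are
 * written to `order` in execution order; returns how many tasks ran. */
size_t toy_run(toy_task *tasks, size_t n, int *order)
{
    size_t ran = 0;
    int progress = 1;
    while (progress) {
        progress = 0;
        for (size_t i = 0; i < n; ++i) {
            toy_task *t = &tasks[i];
            if (t->done) continue;
            if (t->depends_on && !t->depends_on->signaled) continue;
            if (t->to_signal) t->to_signal->signaled = 1;
            t->done = 1;
            order[ran++] = t->id;
            progress = 1;
        }
    }
    return ran;
}

/* Mirrors the example: task 0 is enqueued first but waits on user_event,
 * and task 1 signals it. Returns 1 if task 1 ran before task 0. */
int toy_event_demo(void)
{
    toy_event user_event = {0};
    toy_task tasks[2] = {
        {0, &user_event, NULL, 0},   /* plays the role of Return2 */
        {1, NULL, &user_event, 0},   /* plays the role of SignalUserEvent */
    };
    int order[2] = {-1, -1};
    size_t ran = toy_run(tasks, 2, order);
    return ran == 2 && order[0] == 1 && order[1] == 0;
}
```

A task whose dependency is never signaled simply never runs in this model, which corresponds to the timed-out COIEventWait in the source-side example before SignalUserEvent is enqueued.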





Last updated: 2017-04-03 05:40:13
