Uploading Data to MaxCompute (formerly ODPS): Recommendation Engine User Guide, Alibaba Cloud

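MaxCompute (formerly ODPS), the big data computing service, is used by the Recommendation Engine to compute and store large volumes of offline data. To activate ODPS and for detailed operating instructions, see the Big Data Platform help guide; to purchase the service, see "Activate the Big Data Computing Service (formerly ODPS)".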

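① Create a project in MaxCompute (formerly ODPS)

You first need to create a MaxCompute project space (Project). The Recommendation Engine uses this project space for offline data computation. For instructions, see "Create a project space".

After creating the project space, record the project name, the Access Key ID, and the Access Key Secret. You will need them later when configuring the Recommendation Engine.

To learn more about Data Development (Data IDE), see the Data Development overview. From the Data Development console you can manage MaxCompute and perform operations such as data ETL.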

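② Import the sample data into MaxCompute tables

Scenario A: the data is on your local machine

In this example, we use the MaxCompute dship command to import local data into MaxCompute tables.

Upload tool: dship. (Newer versions of MaxCompute recommend the MaxCompute Tunnel command instead; you can also follow the TUNNEL command manual to upload the data with TUNNEL.)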

  -- Movies: one row per movie; genres is a '|'-separated list
  create table movielens_1m_movies (
      movie_id string,
      title    string,
      genres   string
  );
  dship u D:\work\Dataset\MovieLens\ml-1m\movies.dat alidata_rp.movielens_1m_movies -h false -fd :: -rd \n

  -- Users: one row per user
  create table movielens_1m_users (
      user_id    string,
      gender     string,
      age        bigint,
      occupation string,
      zipcode    string
  );
  dship u D:\work\Dataset\MovieLens\ml-1m\users.dat alidata_rp.movielens_1m_users -h false -fd :: -rd \n

  -- Ratings: one row per (user, movie) rating; ts is a Unix timestamp
  create table movielens_1m_ratings (
      user_id  string,
      movie_id string,
      rate     double,
      ts       bigint
  );
  dship u D:\work\Dataset\MovieLens\ml-1m\ratings.dat alidata_rp.movielens_1m_ratings -h false -fd :: -rd \n
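To make the delimiter options concrete, here is a minimal Python sketch of how a "::"-delimited file such as movies.dat splits into the three columns of movielens_1m_movies. It assumes that -fd sets the field delimiter and -rd the record delimiter; the function name parse_records and the sample lines are illustrative only and are not part of dship.

```python
from typing import List


def parse_records(text: str, field_delim: str = "::",
                  record_delim: str = "\n") -> List[List[str]]:
    """Split raw text into records, then split each record into fields."""
    records = []
    for line in text.split(record_delim):
        line = line.rstrip("\r")  # tolerate Windows line endings
        if not line:
            continue  # skip empty trailing records
        records.append(line.split(field_delim))
    return records


# Illustrative sample in the movie_id::title::genres layout.
sample = (
    "1::Toy Story (1995)::Animation|Children's|Comedy\n"
    "2::Jumanji (1995)::Adventure|Children's|Fantasy\n"
)

rows = parse_records(sample)
for movie_id, title, genres in rows:
    print(movie_id, title, genres)
```

Each record yields exactly three fields, matching the movie_id, title, and genres columns. A file whose records split into a different number of fields would not match the table definition and should be checked before uploading.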
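Scenario B: the data is in RDS
  • First inspect the fields of the data in RDS, then create a table in MaxCompute with exactly the same fields. Next, create a synchronization task in Data IDE, setting the RDS table as the source and the MaxCompute table as the target.
  • Once the data has been synchronized to MaxCompute, process it in Data IDE into the format the Recommendation Engine requires.
Other scenarios

If your data is stored in another data source, see the cloud data integration solutions.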

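③ Create the data tables required by the Recommendation Engine in MaxCompute

For the data specification, see the data format specification.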

  -- User metadata
  create table aliyun_re_demo_ml1m_user_meta (
      user_id string,
      tags    string
  ) partitioned by (ds string);

  -- Configuration for the user tags
  create table aliyun_re_demo_ml1m_user_meta_config (
      config_name  string,
      config_value string
  ) partitioned by (ds string);

  -- Item metadata
  create table aliyun_re_demo_ml1m_item_meta (
      item_id     string,
      category    string,
      keywords    string,
      description string,
      properties  string,
      bizinfo     string
  ) partitioned by (ds string);

  -- Configuration for the item properties
  create table aliyun_re_demo_ml1m_item_meta_config (
      config_name  string,
      config_value string
  ) partitioned by (ds string);

  -- User behavior records
  create table aliyun_re_demo_ml1m_user_behavior (
      user_id      string,
      item_id      string,
      bhv_type     string,
      bhv_amt      double,
      bhv_cnt      double,
      bhv_datetime datetime,
      content      string,
      media_type   string,
      pos_type     string,
      position     string,
      env          string,
      trace_id     string
  ) partitioned by (ds string);

  -- Item information for recommendable items
  create table aliyun_re_demo_ml1m_rec_item_info (
      item_id   string,
      item_info string
  ) partitioned by (ds string);

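④ Import the sample data into the tables created in step ③

Note: the user table and the item table must be loaded in full into the ds partition, whereas the behavior table only needs each day's increment loaded into the ds partition.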

  -------------------------------------------------------------
  -- User metadata: encode age, gender, and occupation as tags
  insert overwrite table aliyun_re_demo_ml1m_user_meta partition (ds='recent')
  select
      user_id,
      concat('age\003', age, '\002gender\003', gender, '\002occupation\003', occupation) as tags
  from alidata_rp.movielens_1m_users
  ;
  insert overwrite table aliyun_re_demo_ml1m_user_meta_config partition (ds='recent')
  select *
  from (
      select 'age', 'sv_enum' from dual
      union all
      select 'gender', 'sv_enum' from dual
      union all
      select 'occupation', 'sv_enum' from dual
  ) t
  ;
  -------------------------------------------------------------
  -- Item metadata: each distinct genres string becomes one category
  insert overwrite table aliyun_re_demo_ml1m_item_meta partition (ds='recent')
  select
      movie_id as item_id,
      t2.category as category,
      REGEXP_REPLACE(t1.genres, '\\|', '\002') as keywords,
      title as description,
      concat('genres\003', REGEXP_REPLACE(t1.genres, '\\|', '\004')) as properties,
      null
  from alidata_rp.movielens_1m_movies t1
  join (
      select distinct genres, category
      from (
          select genres, DENSE_RANK() over(partition by 1 order by genres) as category
          from alidata_rp.movielens_1m_movies
      ) t
  ) t2
  on t1.genres = t2.genres
  ;
  insert overwrite table aliyun_re_demo_ml1m_item_meta_config partition (ds='recent')
  select 'genres', 'mv_enum' from dual
  ;
  -------------------------------------------------------------
  -- Recommendable items: item_info is a JSON string
  insert overwrite table aliyun_re_demo_ml1m_rec_item_info partition (ds='recent')
  select
      movie_id as item_id,
      concat('{"title":"', title, '","genres":"', genres, '"}') as item_info
  from alidata_rp.movielens_1m_movies
  ;
  -------------------------------------------------------------
  -- User behavior: each rating becomes one "grade" behavior record
  insert overwrite table aliyun_re_demo_ml1m_user_behavior partition (ds='recent')
  select
      user_id, movie_id as item_id,
      "grade" as bhv_type,
      rate as bhv_amt, 1.0 as bhv_cnt,
      FROM_UNIXTIME(ts) as bhv_datetime,
      null, null, null, null, null, null
  from alidata_rp.movielens_1m_ratings
  ;
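The string encodings built by the statements above can be easier to check outside SQL. The following Python sketch reproduces them under one assumption that should be confirmed against the data format specification: the separators are the control characters \002 (between key/value pairs and between keywords), \003 (between a key and its value), and \004 (between multiple values of one key). All function and constant names here are hypothetical helpers, not part of any Recommendation Engine API.

```python
import json

# Assumed separators (octal \002, \003, \004 written as hex escapes).
PAIR_SEP = "\x02"   # between key/value pairs, and between keywords
KV_SEP = "\x03"     # between a key and its value
MULTI_SEP = "\x04"  # between multiple values of a single key


def user_tags(age, gender, occupation) -> str:
    """Build the tags column: age, gender, and occupation key/value pairs."""
    pairs = [("age", age), ("gender", gender), ("occupation", occupation)]
    return PAIR_SEP.join(f"{key}{KV_SEP}{value}" for key, value in pairs)


def item_keywords(genres: str) -> str:
    """Build the keywords column from a '|'-separated genres string."""
    return genres.replace("|", PAIR_SEP)


def item_properties(genres: str) -> str:
    """Build the properties column: one multi-valued 'genres' key."""
    return "genres" + KV_SEP + genres.replace("|", MULTI_SEP)


def item_info(title: str, genres: str) -> str:
    """Build item_info as JSON, escaping quotes that plain concat would not."""
    return json.dumps({"title": title, "genres": genres}, ensure_ascii=False)


print(repr(user_tags(25, "M", "12")))
print(repr(item_keywords("Animation|Comedy")))
print(repr(item_properties("Animation|Comedy")))
print(item_info('Say "Hi"', "Comedy"))
```

Unlike the concat expression in the SQL, json.dumps escapes double quotes and backslashes inside the title, so a title containing a quote still yields valid JSON. If your item titles can contain such characters, it may be worth escaping them before building item_info.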

Last updated: 2016-11-23 16:04:08
