

PostgreSQL Multiple Linear Regression - 1: Installing MADlib

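MADlib is an open-source software project from UC Berkeley. Its main goal is to extend the analytic capabilities of the database, and it supports PostgreSQL and Greenplum.
It loads easily into PostgreSQL or Greenplum and extends the database's analytic functionality, which is possible because PostgreSQL itself supports loadable modules.
To the user it appears in the database as a collection of analytic functions. Version 1.0 contains 71 aggregate functions and 786 ordinary functions.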
https://db.cs.berkeley.edu/w/source-code/
An open source machine learning library on RDBMS for Big Data age

MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.

The MADlib mission is to foster widespread development of scalable analytic skills, by harnessing efforts from commercial practice, academic research, and open-source development. The library consists of various analytics methods including linear regression, logistic regression, k-means clustering, decision tree, support vector machine and more. That's not all; there is also super-efficient user-defined data type for sparse vector with a number of arithmetic methods. It can be loaded and run in PostgreSQL 8.4 to 9.1 as well as Greenplum 4.0 to 4.2. This talk covers its concept overall with some introductions to the problems we are tackling and the solutions for them. It will also contain some topics around parallel data processing which is very hot in both of research and commercial area these days.
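MADlib requires Python 2.6 or later, along with PL/Python 2.6 or later.
If PostgreSQL was built against an older Python, install the newer Python first and then recompile PostgreSQL.
Here we install Python 2.7.5. Because the Python shared library is needed, build Python with the --enable-shared option.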
tar -jxvf Python-2.7.5.tar.bz2
cd Python-2.7.5
./configure --enable-shared
make
make install
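Before rebuilding PostgreSQL, it can help to confirm that the interpreter really was built with --enable-shared. The following is a minimal sketch; the helper name python_is_shared is hypothetical, and it relies on the standard sysconfig module (available from Python 2.7) reporting the Py_ENABLE_SHARED build variable.

```shell
# Hypothetical helper: succeeds when the given interpreter reports a
# shared-library build (Py_ENABLE_SHARED is 1), fails otherwise.
python_is_shared() {
    "$1" -c 'import sysconfig, sys; sys.exit(0 if sysconfig.get_config_var("Py_ENABLE_SHARED") else 1)'
}

# Usage, assuming the default install prefix /usr/local:
# python_is_shared /usr/local/bin/python && echo "built with --enable-shared"
```

If the interpreter cannot start at all because libpython2.7.so.1.0 is not found, the check fails as well, so a failure may mean either a static build or a missing linker path.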
If you see the following error, the library directory needs to be added to the system's dynamic linker configuration:
[root@db-192-168-100-216 ~]# python -V
python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory

[root@db-192-168-100-216 ~]# ldconfig -p|grep -i python
        libpython2.4.so.1.0 (libc6,x86-64) => /usr/lib64/libpython2.4.so.1.0
        libpython2.4.so (libc6,x86-64) => /usr/lib64/libpython2.4.so
        libboost_python.so.2 (libc6,x86-64) => /usr/lib64/libboost_python.so.2
        libboost_python.so.2 (libc6) => /usr/lib/libboost_python.so.2
        libboost_python.so (libc6,x86-64) => /usr/lib64/libboost_python.so
        libboost_python.so (libc6) => /usr/lib/libboost_python.so
Add it to the linker configuration:
[root@db-192-168-100-216 ~]# vi /etc/ld.so.conf.d/python2.7.conf
/usr/local/lib
[root@db-192-168-100-216 ~]# ldconfig 
[root@db-192-168-100-216 ~]# ldconfig -p|grep -i python
        libpython2.7.so.1.0 (libc6,x86-64) => /usr/local/lib/libpython2.7.so.1.0
        libpython2.7.so (libc6,x86-64) => /usr/local/lib/libpython2.7.so
        libpython2.4.so.1.0 (libc6,x86-64) => /usr/lib64/libpython2.4.so.1.0
        libpython2.4.so (libc6,x86-64) => /usr/lib64/libpython2.4.so
        libboost_python.so.2 (libc6,x86-64) => /usr/lib64/libboost_python.so.2
        libboost_python.so.2 (libc6) => /usr/lib/libboost_python.so.2
        libboost_python.so (libc6,x86-64) => /usr/lib64/libboost_python.so
        libboost_python.so (libc6) => /usr/lib/libboost_python.so
Now it works:
[root@db-192-168-100-216 ~]# python -V
Python 2.7.5
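The linker fix above uses an interactive editor. For scripted installs, the same drop-in file can be written non-interactively. This is a sketch only; the helper name register_libdir is hypothetical, and writing under /etc/ld.so.conf.d requires root.

```shell
# Hypothetical helper: write a library directory into a linker drop-in file.
# $1 = directory containing the shared libraries
# $2 = path of the drop-in file to create or overwrite
register_libdir() {
    libdir=$1
    conf=$2
    printf '%s\n' "$libdir" > "$conf"
}

# Usage as root, with the paths from the transcript above:
# register_libdir /usr/local/lib /etc/ld.so.conf.d/python2.7.conf && ldconfig
```

Running ldconfig afterwards is still required, since the linker cache is only rebuilt on demand.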
With Python 2.7.5 installed, compile PostgreSQL:
tar -jxvf postgresql-9.2.4.tar.bz2
cd postgresql-9.2.4
./configure --prefix=/home/pg92/pgsql9.2.4 --with-pgport=2921 --with-perl --with-tcl --with-python --with-openssl --with-pam --without-ldap --with-libxml --with-libxslt --enable-thread-safety --with-wal-blocksize=16 && gmake world && gmake install-world
Initialize and start the database:
[root@db-192-168-100-216 ~]# su - pg92
pg92@db-192-168-100-216-> initdb -D $PGDATA -E UTF8 --locale=C -W -U postgres
pg_ctl start
psql
create database digoal;
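Since MADlib depends on PL/Python, it may be worth checking at this point that the server is linked against the new interpreter. The sketch below generates SQL that loads the plpythonu extension and prints the Python version seen inside the server. The helper name plpython_check_sql is hypothetical, and whether madpack would create the language on its own is not covered here.

```shell
# Hypothetical helper: emit SQL that loads PL/Python 2 and raises a NOTICE
# with the version of the interpreter embedded in the server.
plpython_check_sql() {
    cat <<'EOF'
CREATE EXTENSION IF NOT EXISTS plpythonu;
DO $$
import sys
plpy.notice(sys.version)
$$ LANGUAGE plpythonu;
EOF
}

# Usage, assuming the port and database from this walkthrough:
# plpython_check_sql | psql -p 2921 -U postgres -d digoal
```

The NOTICE should report 2.7.5 if PostgreSQL was configured against the new build. An older version suggests configure picked up a different python from the PATH.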
Install MADlib 1.0:
wget https://www.madlib.net/files/madlib-1.0-Linux.rpm
rpm -ivh madlib-1.0-Linux.rpm
After installation, the files are under /usr/local/madlib:
rpm -ql madlib
/usr/local/madlib/.....
Install MADlib into the database:
Make sure psql and python are on the PATH.
pg92@db-192-168-100-216-> which psql
~/pgsql/bin/psql
pg92@db-192-168-100-216-> which python
/usr/local/bin/python
pg92@db-192-168-100-216-> python -V
Python 2.7.5
pg92@db-192-168-100-216-> /usr/local/madlib/bin/madpack -p postgres -c postgres@127.0.0.1:2921/digoal install
Check that the installation is correct:
pg92@db-192-168-100-216-> /usr/local/madlib/bin/madpack -p postgres -c postgres@127.0.0.1:2921/digoal install-check
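Both madpack calls above repeat the same connection string in the form user@host:port/database. A small sketch of building it once and reusing it follows; the helper name madpack_conn and the variable CONN are hypothetical, and the values are the ones used in this walkthrough.

```shell
# Hypothetical helper: assemble a madpack connection string.
# $1 = user, $2 = host, $3 = port, $4 = database
madpack_conn() {
    printf '%s@%s:%s/%s\n' "$1" "$2" "$3" "$4"
}

CONN=$(madpack_conn postgres 127.0.0.1 2921 digoal)

# Usage: install, and run the check only if the install succeeds.
# /usr/local/madlib/bin/madpack -p postgres -c "$CONN" install &&
# /usr/local/madlib/bin/madpack -p postgres -c "$CONN" install-check
```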
MADlib is installed into a schema named madlib.
pg92@db-192-168-100-216-> psql
psql (9.2.4)
Type "help" for help.
digoal=# \dn
  List of schemas
  Name  |  Owner   
--------+----------
 madlib | postgres
 public | postgres
(2 rows)
The installation adds tables and a large number of functions:
digoal=# set search_path="$user",madlib,public;
SET
digoal=# \dt
              List of relations
 Schema |       Name       | Type  |  Owner   
--------+------------------+-------+----------
 madlib | migrationhistory | table | postgres
 madlib | training_info    | table | postgres
(2 rows)
digoal=# select * from migrationhistory;
 id | version |          applied           
----+---------+----------------------------
  1 | 1.0     | 2013-07-31 15:05:50.900619
(1 row)

digoal=# select * from training_info ;
 classifier_name | result_table_oid | training_table_oid | training_metatable_oid | training_encoded_table_oid | validation_table_oid | how2handle_missing_value | split_criterion | sampling_percentage | num_feature_chosen | num_trees 
-----------------+------------------+--------------------+------------------------+----------------------------+----------------------+--------------------------+-----------------+---------------------+--------------------+-----------
(0 rows)
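The aggregate and ordinary function counts quoted for version 1.0 can be compared against what was actually installed by querying the system catalogs. The sketch below assumes a PostgreSQL release that still has the pg_proc.proisagg column, as 9.2 does; the helper name madlib_function_count_sql is hypothetical.

```shell
# Hypothetical helper: emit SQL that counts functions in the madlib schema,
# split into aggregates (proisagg = true) and ordinary functions.
madlib_function_count_sql() {
    cat <<'EOF'
SELECT p.proisagg AS is_aggregate, count(*) AS functions
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'madlib'
GROUP BY p.proisagg
ORDER BY p.proisagg;
EOF
}

# Usage, assuming the port and database from this walkthrough:
# madlib_function_count_sql | psql -p 2921 -U postgres -d digoal
```

Overloaded functions are counted once per signature, so the totals reflect catalog entries rather than distinct function names.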

Last updated: 2017-04-01 13:38:49
