MariaDB的CONNECT存储引擎在arm64平台上的表现

作者: zhaorenhai

MariaDB的CONNECT存储引擎是一个让人感到兴奋的存储引擎。

之所以说让人感到兴奋,是因为这个存储引擎能做的事实在是太多了。除了不适合用于OLTP领域之外,几乎是上天入地,无所不能。它可以直接用SQL去查询外部文件,支持各种普通的文本文件,日志文件,只要告诉CONNECT固定的文件格式,就可以用SQL去查询,去分析。它还支持JSON,XML,INI这些常见的文件格式。除了支持本地的文件,还可以通过REST接口直接查询远程的数据。对于本地的文件,除了查询,还支持插入,删除等功能。它还可以实现Oracle里面的dblink功能,也就是可以远程查询另外一个数据库的表,甚至是不同类型的数据库,包括MySQL,PostgreSQL,Oracle等等,甚至可以是NoSQL类型的MongoDB。当然所谓的跨源查询,多库查询,更是不在话下。当然在此基础上,将外部文件导入到数据库就更简单了,而且比mysqlimport更强大,更灵活,因为可以实现各种各样的格式转换和过滤。将数据导出到外部文件也是一样。它还支持对压缩文件的查询(目前只有zip格式)。甚至对于二进制格式的文件查询和更新都支持。。这个引擎还支持类似于Oracle的dual的虚拟表功能。还可以支持单行转多行,行列转换,多表映射成一个表等各种各样的功能。

总之有了这个CONNECT存储引擎,可以说,一切都可以转化成SQL,SQL也可以转化成一切。

接下来我们就看看CONNECT存储引擎在arm64平台上的表现。

(注意本文不是CONNECT的详细使用指导,由于CONNECT涉及内容太广,具体每种用法的详细指导、各种参数、以及限制说明等等请参考官方指导https://mariadb.com/kb/en/connect/)

我们的测试平台使用的是华为云的鲲鹏虚拟机。OS采用Ubuntu18.04版本。

用如下命令查询

1
sudo apt search mariadb |grep connect

输出

1
mariadb-plugin-connect/bionic-updates,bionic-security 1:10.1.47-0ubuntu0.18.04.1 arm64

可以看出来当前MariaDB已经发布了arm64平台的CONNECT插件,说明这个插件的基本功能在arm64上是没问题的。

如果你自己编译了MariaDB的代码(具体的编译方法请参考这篇博客里的内容),然后去跑CONNECT的测试用例的话,也可以发现,除了一些测试用力由于环境原因被skip以外,大部分测试用例都是可以成功通过的,如下所示:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
mariadb@host1:~/build-mariadb-server/mysql-test$ ./mysql-test-run --suite=connect
Logging: /home/mariadb/server/mysql-test/mysql-test-run.pl --suite=connect
vardir: /home/mariadb/build-mariadb-server/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/home/mariadb/build-mariadb-server/mysql-test/var'...
Checking supported features...
MariaDB Version 10.6.0-MariaDB

- SSL connections supported
Using suites: connect
Collecting tests...
Installing system database...

==============================================================================

TEST RESULT TIME (ms) or COMMENT
--------------------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
connect.grant2 [ disabled ] Until fixed
connect.jdbc [ disabled ] Variable settings depend on machine configuration
connect.jdbc_new [ disabled ] Variable settings depend on machine configuration
connect.jdbc_oracle [ disabled ] Variable settings depend on machine configuration
connect.jdbc_postgresql [ disabled ] Variable settings depend on machine configuration
connect.json_java_2 [ disabled ] Need MongoDB running and its Java Driver installed
connect.json_java_3 [ disabled ] Need MongoDB running and its Java Driver installed
connect.json_mongo_c [ disabled ] Need MongoDB running and its C Driver installed
connect.mongo_c [ disabled ] Need MongoDB running and its C Driver installed
connect.mongo_java_2 [ disabled ] Need MongoDB running and its Java Driver installed
connect.mongo_java_3 [ disabled ] Need MongoDB running and its Java Driver installed
connect.tbl_thread [ disabled ] Bug MDEV-9844,10179,14214 03/01/2018 OB Option THREAD removed
connect.json [ pass ] 23
connect.part_file [ pass ] 41
connect.part_table [ pass ] 46
connect.drop-open-error [ pass ] 7
connect.secure_file_priv [ pass ] 5
connect.alter [ pass ] 27
connect.alter_xml [ skipped ] Need windows
connect.alter_xml2 [ pass ] 7
connect.bin [ pass ] 7
connect.csv [ pass ] 16
connect.datest [ pass ] 4
connect.dbf [ pass ] 34
connect.dir [ pass ] 3
connect.endian [ pass ] 6
connect.fix [ pass ] 18
connect.fmt [ pass ] 5
connect.general [ pass ] 3
connect.grant [ pass ] 68
connect.grant3 [ pass ] 2
connect.index [ pass ] 153
connect.infoschema-9739 [ skipped ] Need windows
connect.infoschema2-9739 [ pass ] 3
connect.ini [ pass ] 22
connect.ini_grant [ pass ] 9
connect.json_udf [ pass ] 31
connect.json_udf_bin [ pass ] 45
connect.mrr [ pass ] 14
connect.mul [ pass ] 5
connect.mul_new [ pass ] 7
connect.mysql [ pass ] 51
connect.mysql_discovery [ pass ] 9
connect.mysql_exec [ pass ] 9
connect.mysql_grant [ pass ] 7
connect.mysql_index [ pass ] 173
connect.mysql_new [ pass ] 24
connect.null [ pass ] 9
connect.occur [ pass ] 12
connect.odbc [ skipped ] No ODBC support
connect.odbc_firebird [ skipped ] No ODBC support
connect.odbc_oracle [ skipped ] No ODBC support
connect.odbc_postgresql [ skipped ] No ODBC support
connect.odbc_sqlite3 [ skipped ] No ODBC support
connect.odbc_sqlite3_grant [ skipped ] No ODBC support
connect.odbc_xls [ skipped ] No ODBC support
connect.pivot [ pass ] 22
connect.tbl [ pass ] 13
connect.temporary [ pass ]
connect.type_inet6 [ pass ] 2
connect.unsigned [ pass ] 6
connect.upd [ pass ] 218
connect.updelx [ pass ] 282
connect.updelx2 [ pass ] 5
connect.vcol [ pass ] 2
connect.vec [ pass ] 10
connect.xcol [ pass ] 5
connect.xml [ skipped ] Need windows
connect.xml2 [ pass ] 22
connect.xml2_grant [ pass ] 14
connect.xml2_html [ pass ] 3
connect.xml2_mdev5261 [ pass ] 5
connect.xml2_mult [ pass ] 9
connect.xml2_zip [ pass ] 4
connect.xml_grant [ skipped ] Need windows
connect.xml_html [ skipped ] Need windows
connect.xml_mdev5261 [ skipped ] Need windows
connect.xml_mult [ skipped ] Need windows
connect.xml_zip [ skipped ] Need windows

connect.zip [ pass ] 13
--------------------------------------------------------------------------

The servers were restarted 3 times
Spent 1.540 of 11 seconds executing testcases

Completed: All 53 tests were successful.

15 tests were skipped, 15 by the test itself.

接下来我们测试一下CONNECT存储引擎可能会涉及到性能的几个表类型,看看性能情况怎么样。

在mariadb的配置文件里,添加如下配置:

1
plugin_dir = /home/mariadb/build-mariadb-server/storage/connect

重启数据库。

登陆到mariadb,执行如下语句,启用CONNECT存储引擎:

1
install soname 'ha_connect';

DOS表类型

DOS表类型实际上对应的就是普通的文本文件,文件中每一列的数据是定长的,只有最后一列是变长的。

我们有如下格式的一个文件,大小在659M,行数为1300多万行。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
root@mariadb-arm-test:~# ls -lh /tmp/data_dos.txt
-rw-r--r-- 1 root root 659M Dec 11 16:56 /tmp/data_dos.txt
root@mariadb-arm-test:~# wc -l /tmp/data_dos.txt
13819904 /tmp/data_dos.txt
root@mariadb-arm-test:~# tail /tmp/data_dos.txt
53975 1352378138480 dxmpjtsqpmjhltthckyniaxdw
53976 1364107020874 wyqnxsuexajbvekyqamxpzlcoy
53977 1374147913184 ltmwgochhwfwpzbsquuttiglcls
53978 1382141036405 gmedqtymkbkxntltajxbhlaeomda
53979 1391311715118 kfnwwabonxwyfauqlbeguhstbrumj
53980 1303473032800 padqsyhvhvltcpebmtkt
53981 1313713811959 nwduknfpyaeplwpmpozwh
53982 1321669920517 aglfvcespetmimkugoyygu
53983 1333059736260 ltccjcvptqcrwklutmrlles
53984 1342207932901 vqkkrkrqkmcxvkdmiwyfylxe

我们用CONNECT存储引擎来查询一下试试。

登陆到数据库,创建表:

1
2
3
4
5
6
create table userlist (
id char(5) not null,
phonenum char(13) not null flag=6,
userid char(30) not null flag=20
)
engine=CONNECT table_type=DOS file_name='/tmp/data_dos.txt';

建表速度很快

1
Query OK, 0 rows affected (0.01 sec)

查询一下总条数,速度也很快,只要13秒

1
2
3
4
5
6
7
MariaDB [test]> select count(*) from userlist;
+----------+
| count(*) |
+----------+
| 13819904 |
+----------+
1 row in set (13.350 sec)

再试试带条件查询和插入

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
MariaDB [test]> pager md5sum;select * from userlist where id = 23456;
PAGER set to 'md5sum'
8f3941802b868b2bbb43390ab5a2f1c8 -
256 rows in set (9.221 sec)

MariaDB [test]> pager md5sum;select * from userlist where id = 23456;
PAGER set to 'md5sum'
8f3941802b868b2bbb43390ab5a2f1c8 -
256 rows in set (9.221 sec)

MariaDB [test]> pager md5sum;select * from userlist where userid like 'ltccjc%';
PAGER set to 'md5sum'
2bde22161d7a5bad42c377b7290cdff4 -
256 rows in set (19.196 sec)

MariaDB [test]> pager md5sum;select * from userlist where userid like 'ltccjc%';
PAGER set to 'md5sum'
2bde22161d7a5bad42c377b7290cdff4 -
256 rows in set (19.279 sec)

MariaDB [test]> insert into userlist values('76245','1812345678901', 'swweyuoyslwtqwtoeurwio');
Query OK, 1 row affected (0.000 sec)

MariaDB [test]> delete from userlist where id=76245 and phonenum = '1812345678901';
Query OK, 1 row affected (9.719 sec)

可以看到查询速度还是比较快的,从1300多万条记录查询出256条记录只要9秒. 不过like查询稍微慢一点,要19秒。
插入也很快,插入实际上就是在原文件的最后插入一条记录。

不建议在这种DOS类型的表上面进行Update操作,可能会有奇怪的事情发生。因为Update的时候默认不是更新原文件,而是更新一个临时文件。有一个参数connect_use_tempfile可以配置支持更新源文件,但是在变长记录的文件里面也不建议使用。如果确定要用这个功能,建议自己测试以后再使用。

FIX表类型

FIX表类型和DOS的区别就在于最后一列的宽度是定长的,还是变长的,如果是定长的,就是FIX类型,如果是变长的,就是DOS类型。

现在我们来测试一下FIX类型,我们有如下文件,大小为667M,1343万行,最后一列长度都是一样的。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mariadb@mariadb-arm-test:/tmp$ ls -lhrt data_fix.txt
-rw-rw-r-- 1 mariadb mariadb 667M Dec 13 16:29 data_fix.txt
mariadb@mariadb-arm-test:/tmp$ wc -l data_fix.txt
13432192 data_fix.txt
mariadb@mariadb-arm-test:/tmp$ tail data_fix.txt
104930 1302533021891 idmujolnoimfhafacordyrrzmmjjuf
104931 1313195411180 bwuhbnyhxscqprsizgyjkotqfyndda
104932 1323234034390 frynxnmkttbxzibvsdzokelmnqwkwc
104933 1331271621869 sthxlnhbfqpnrpkiahgvqrlfhrcfxu
104934 1342061419324 hheddnlwxslnqttygyalcamwdxujpp
104935 1351001433696 ajlhunoosohztneqfxzotfbgfdegrj
104936 1362857527592 qraermmsdzuxoogprmxsxrqmktfrwr
104937 1372706024878 tkozushzmqhuxkavdbcxwnmksiceji
104938 1382111414256 osllmzidwpmdbxzkzfkelsnwcdtstr
104939 1393077713849 dzafytqywdyahhjvhvdosnsduwwpvw

登陆到MariaDB数据库,我们创建一个FIX表:

1
2
3
4
5
6
create table userlist_fix (
id char(6) not null,
phonenum char(13) not null flag=7,
userid char(30) not null flag=21
)
engine=CONNECT table_type=FIX file_name='/tmp/data_fix.txt' lrecl=52;

注意那个lrecl=52是最长行的字节数长度,包含换行符。

我们把上面的操作再来一遍:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
MariaDB [test]> select count(*) from userlist_fix;
+----------+
| count(*) |
+----------+
| 13432193 |
+----------+
1 row in set (0.000 sec)

MariaDB [test]> pager md5sum;select * from userlist_fix where id = 23456;
PAGER set to 'md5sum'
0871c0088df3f4dbf37b12e56611a0b0 -
128 rows in set (3.535 sec)

MariaDB [test]> pager md5sum;select * from userlist_fix where id = 23456;
PAGER set to 'md5sum'
0871c0088df3f4dbf37b12e56611a0b0 -
128 rows in set (3.535 sec)

MariaDB [test]> pager md5sum;select * from userlist_fix where userid like 'ltccjc%';
PAGER set to 'md5sum'
Empty set (9.344 sec)

MariaDB [test]> pager md5sum;select * from userlist_fix where userid like 'ltccjc%';
PAGER set to 'md5sum'
Empty set (9.325 sec)

MariaDB [test]> insert into userlist_fix values('76245','1812345678901', 'swweyuoyslwtqwtoeurwio');
Query OK, 1 row affected (0.000 sec)

可以看出来,FIX格式每一种类型的操作都比DOS格式快了很多,说明MariaDB在定长格式的处理里面,利用了定长的特性,性能会更好。 最后一行,我们插入的数据不是对齐的,MariaDB插入数据库的时候,会自动给你补齐:

1
2
3
4
5
6
7
8
9
10
11
mariadb@mariadb-arm-test:/tmp$ tail data_fix.txt
104930 1302533021891 idmujolnoimfhafacordyrrzmmjjuf
104931 1313195411180 bwuhbnyhxscqprsizgyjkotqfyndda
104932 1323234034390 frynxnmkttbxzibvsdzokelmnqwkwc
104933 1331271621869 sthxlnhbfqpnrpkiahgvqrlfhrcfxu
104934 1342061419324 hheddnlwxslnqttygyalcamwdxujpp
104935 1351001433696 ajlhunoosohztneqfxzotfbgfdegrj
104936 1362857527592 qraermmsdzuxoogprmxsxrqmktfrwr
104937 1372706024878 tkozushzmqhuxkavdbcxwnmksiceji
104938 1382111414256 osllmzidwpmdbxzkzfkelsnwcdtstr
76245 1812345678901 swweyuoyslwtqwtoeurwio

定长的类型,对于update和delete操作也是比较友好的,速度也很快,而且都是在原文件上直接操作的。

1
2
3
4
5
6
MariaDB [test]> update userlist_fix set phonenum = '1811111111111' where id= 104938;
Query OK, 128 rows affected (3.513 sec)
Rows matched: 128 Changed: 128 Warnings: 0

MariaDB [test]> delete from userlist_fix where phonenum = '1811111111111';
Query OK, 128 rows affected (3.672 sec)

所以我们在根据原始文本文件建表的时候,如果有可能处理成FIX类型的,尽量处理成FIX类型的。

关于connect_work_size参数

接下来我们修改一下connect_work_size参数,这个参数从官网描述来看,应该是CONNECT引擎使用的内存区域,我们来增大一下看看有没有效果。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
MariaDB [test]> select @@session.connect_work_size;
+-----------------------------+
| @@session.connect_work_size |
+-----------------------------+
| 67108864 |
+-----------------------------+
1 row in set (0.000 sec)

MariaDB [test]> set session connect_work_size = 1073741824;
Query OK, 0 rows affected (0.000 sec)

MariaDB [test]> select count(*) from userlist;
+----------+
| count(*) |
+----------+
| 13819650 |
+----------+
1 row in set (13.576 sec)

MariaDB [test]> pager md5sum;select * from userlist where id = 23456;
PAGER set to 'md5sum'
8f3941802b868b2bbb43390ab5a2f1c8 -
256 rows in set (9.306 sec)

MariaDB [test]> pager md5sum;select * from userlist where id = 23456;
PAGER set to 'md5sum'
8f3941802b868b2bbb43390ab5a2f1c8 -
256 rows in set (9.305 sec)

MariaDB [test]> pager md5sum;select * from userlist_fix where id = 23456;
PAGER set to 'md5sum'
6daf8a0815666570cee0011046e3b568 -
128 rows in set (3.526 sec)

MariaDB [test]> pager md5sum;select * from userlist_fix where id = 23456;
PAGER set to 'md5sum'
6daf8a0815666570cee0011046e3b568 -
128 rows in set (3.526 sec)

根据上面输出可以看出来,并没有效果。说明CONNECT存储引擎处理文件时,并没有把整个文件缓存起来,增大内存空间,并没有用。我们速度之所以这么快,也得益于我们用的华为云的虚拟机采用的存储是SSD,速度比较快。

CSV表类型

CSV表类型对应的就是普通的CSV格式的文本文件,不过这里的分隔符可以指定,就是不仅仅可以是逗号,也可以是指定的其他字符。

我们来测试下CSV表类型的性能。

我们创建一个如下格式的CSV文件,大小为603M,1390多万行。列与列之间用竖线分隔。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mariadb@mariadb-arm-test:/tmp$ wc -l data.csv
13909120 data.csv
mariadb@mariadb-arm-test:/tmp$ ls -lhrt data.csv
-rw-rw-r-- 1 mariadb mariadb 603M Dec 14 12:01 data.csv
mariadb@mariadb-arm-test:/tmp$ tail -10 data.csv
1411|1311008423658|hudulfqgofqahfxwtqihs
1412|1321119524436|atacjjbibtzqwohjnrtebc
1413|1334167818661|tnhdlrpbpaxzfyiymholbfy
1414|1344170515956|xlwsuiotrfhmoltiscorhdec
1415|1351643025744|httzsnqpjydxeinvwcaekgydt
1416|1363624529884|tyyttcbsqjiwufvmpuveototmq
1417|1371750537023|diwtmqdriozmoscypldbcdrqvfd
1418|1383308741505|gqchktgbsstyixidjztgtpgutzeb
1419|1393553934078|culiovhkrdovnbcdestvnqqubwdzl
1420|1302340336368|bfjdepbcrefcfvbhznbd

使用如下sql创建CSV类型的表

1
2
3
4
5
6
create table userlist_csv (
id int not null,
phonenum char(13) not null,
userid char(29) not null)
engine=CONNECT table_type=CSV file_name='/tmp/data.csv'
header=0 sep_char='|' quoted=0;

然后我们进行一些增删改查操作:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
MariaDB [test]> select count(*) from userlist_csv;
9229c8871853852300fca7ce0bb33598 -
1 row in set (17.835 sec)

MariaDB [test]> select count(*) from userlist_csv;
9229c8871853852300fca7ce0bb33598 -
1 row in set (17.712 sec)

MariaDB [test]> pager md5sum;select * from userlist_csv where id = 23456;
PAGER set to 'md5sum'
9a02ca9b5c3044a73f4441c10945ff91 -
140 rows in set (9.356 sec)

MariaDB [test]> pager md5sum;select * from userlist_csv where id = 23456;
PAGER set to 'md5sum'
9a02ca9b5c3044a73f4441c10945ff91 -
140 rows in set (9.355 sec)

MariaDB [test]> pager md5sum;select * from userlist_csv where userid like 'ltccjc%';
PAGER set to 'md5sum'
Empty set (21.788 sec)

MariaDB [test]> pager md5sum;select * from userlist_csv where userid like 'ltccjc%';
PAGER set to 'md5sum'
Empty set (21.781 sec)

MariaDB [test]> insert into userlist_csv values('76245','1812345678901', 'swweyuoyslwtqwtoeurwio');
Query OK, 1 row affected (0.000 sec)

MariaDB [test]> update userlist_csv set phonenum='1811111111111' where id=76245;
Query OK, 141 rows affected (6.803 sec)
Rows matched: 141 Changed: 141 Warnings: 0

MariaDB [test]> delete from userlist_csv where id=76245 and phonenum='1811111111111';
Query OK, 141 rows affected (6.674 sec)

可以看出来,性能还是不错的,count的查询和like查询比DOS类型稍微慢一点,等于查询和插入操作都和DOS类型差不多,删除操作比DOS类型还要快一点。

FMT表类型

FMT表类型相对于CSV更灵活点,可以通过格式匹配来指定每一个字段的格式,长度。FMT在分析各种类型的日志的时候很有用。

我们用它来分析一个操作系统auth日志。

这个auth日志有600多M,500多万行。

1
2
3
4
root@mariadb-arm-test:/var/log# ls -lhrt auth.log.1
-rw-r----- 1 syslog adm 625M Dec 14 15:11 auth.log.1
root@mariadb-arm-test:/var/log# wc -l auth.log.1
5890048 auth.log.1

截取部分文件内容,格式如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
Dec 12 16:39:23 localhost groupadd[32556]: group added to /etc/group: name=mariadb, GID=1000
Dec 12 16:39:23 localhost groupadd[32556]: group added to /etc/gshadow: name=mariadb
Dec 12 16:39:23 localhost groupadd[32556]: new group: name=mariadb, GID=1000
Dec 12 16:39:23 localhost useradd[32560]: new user: name=mariadb, UID=1000, GID=1000, home=/home/mariadb, shell=/bin/bash
Dec 12 16:39:43 localhost passwd[32569]: pam_unix(passwd:chauthtok): password changed for mariadb
Dec 12 16:39:46 localhost chfn[32570]: changed user 'mariadb' information
Dec 12 16:45:36 localhost su[1805]: Successful su for mariadb by root
Dec 12 16:45:36 localhost su[1805]: + /dev/pts/0 root:mariadb
Dec 12 16:45:36 localhost su[1805]: pam_unix(su:session): session opened for user mariadb by root(uid=0)
Dec 12 16:45:36 localhost su[1805]: pam_systemd(su:session): Cannot create session: Already running in a session
Dec 12 16:50:01 localhost sudo: mariadb : TTY=pts/0 ; PWD=/home/mariadb ; USER=root ; COMMAND=/usr/bin/apt-get install build-essential libncurses5-dev gnutls-dev libcurl4-gnutls-dev
Dec 12 16:50:01 localhost sudo: pam_unix(sudo:session): session opened for user root by root(uid=0)
Dec 12 16:50:04 localhost sudo: pam_unix(sudo:session): session closed for user root
Dec 12 16:50:14 localhost sudo: mariadb : TTY=pts/0 ; PWD=/home/mariadb ; USER=root ; COMMAND=/usr/bin/apt-get install build-essential libncurses5-dev gnutls-dev libcurl4-gnutls-dev
Dec 12 16:50:14 localhost sudo: pam_unix(sudo:session): session opened for user root by root(uid=0)
Dec 12 16:50:25 localhost sudo: pam_unix(sudo:session): session closed for user root
Dec 12 16:50:49 localhost sudo: mariadb : TTY=pts/0 ; PWD=/home/mariadb ; USER=root ; COMMAND=/usr/bin/apt-get install zlib1g-dev ccache libnuma-dev libxml2-dev cmake bison
Dec 12 16:50:49 localhost sudo: pam_unix(sudo:session): session opened for user root by root(uid=0)
Dec 12 16:51:08 localhost sudo: pam_unix(sudo:session): session closed for user root
Dec 12 16:56:01 localhost CRON[7999]: pam_unix(cron:session): session opened for user root by (uid=0)
Dec 12 16:56:01 localhost CRON[7999]: pam_unix(cron:session): session closed for user root
Dec 12 17:17:01 localhost CRON[16493]: pam_unix(cron:session): session opened for user root by (uid=0)
Dec 12 17:17:01 localhost CRON[16493]: pam_unix(cron:session): session closed for user root

创建表:

1
2
3
4
5
6
7
8
create table authlog (
month char(3) not null field_format='%n%s%n',
day char(2) not null field_format=' %n%s%n',
time char(8) not null field_format=' %n%s%n',
hostname char(9) not null field_format=' %n%s%n',
module char(20) not null field_format=' %n%s%n',
message char(255) not null field_format=' %n%255[^\n]%n')
engine=CONNECT table_type=FMT file_name='/var/log/auth.log.1' ;

查询一下,现在看格式是不是好看多了:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
MariaDB [test]>  select * from authlog limit 10;
+-------+-----+----------+-----------+----------------------+-----------------------------------------------------------------------+
| month | day | time | hostname | module | message |
+-------+-----+----------+-----------+----------------------+-----------------------------------------------------------------------+
| Dec | 2 | 20:47:26 | localhost | systemd-logind[721]: | Power key pressed. |
| Dec | 2 | 20:47:26 | localhost | systemd-logind[721]: | Powering Off... |
| Dec | 2 | 20:47:26 | localhost | systemd-logind[721]: | System is powering down. |
| Dec | 2 | 20:47:27 | localhost | systemd: | pam_unix(systemd-user:session): session closed for user root |
| Dec | 11 | 10:07:26 | localhost | systemd-logind[960]: | New seat seat0. |
| Dec | 11 | 10:07:26 | localhost | systemd-logind[960]: | Watching system buttons on /dev/input/event0 (Power Button) |
| Dec | 11 | 10:07:26 | localhost | systemd-logind[960]: | Watching system buttons on /dev/input/event2 (QEMU QEMU USB Keyboard) |
| Dec | 11 | 10:07:50 | localhost | sshd[2384]: | Server listening on 0.0.0.0 port 22. |
| Dec | 11 | 10:07:50 | localhost | sshd[2384]: | Server listening on :: port 22. |
| Dec | 11 | 10:07:51 | localhost | sshd[2384]: | Received signal 15; terminating. |
+-------+-----+----------+-----------+----------------------+-----------------------------------------------------------------------+

我们来进行一些查询:

1
2
3
4
5
6
7
MariaDB [test]> select count(*) from authlog where month='Dec' and day='12' and module like 'sshd%';
+----------+
| count(*) |
+----------+
| 3170304 |
+----------+
1 row in set (15.609 sec)

速度还可以接受。 当然你如果用如下的方式,用grep命令去过滤,速度会更快一点:

1
2
3
4
5
6
7
root@mariadb-arm-test:/var/log# echo 3 > /proc/sys/vm/drop_caches
root@mariadb-arm-test:/var/log# time grep 'Dec 12' auth.log.1 |grep sshd |wc -l
3170304

real 0m3.610s
user 0m1.745s
sys 0m1.170s

但是用SQL,有如下几个优点:

1、数据是按字段分隔开的,结果会更精确。

2、你如果想把日志导入到数据库里面,就很方便了,直接create table tablename as select * from authlog就搞定了。

3、有时候会需要把日志里面的内容和表里面的内容进行关联查询,创建成表,就很方便了。

总结

CONNECT的功能很多,像解析xml,json,远程查询其他数据库功能,行列转换,虚拟列等等,我们都没有演示。不过这些功能通过之前跑的测试用例可以发现在arm64上都是支持的。而且这些功能涉及性能的场景不太多,像一般xml和json不会有太大的文件,远程查询其实更依赖的是远程数据库的性能。对于这些功能具体的用法,感兴趣的小伙伴可以参考MariaDB官方Knowledge Base指导,写的都是很详细的。

本文主要测试了几个CONNECT可能会涉及到性能相关,需要解析大文件的表类型,发现在arm64上性能其实也是很不错的。另外,本文测试的查询场景,都是未建索引的场景,如果数据的选择性很好的话,可以选择建立索引,再去查询,几乎就是瞬间就可以返回了。当然索引需要占用一定的空间,另外也要求对磁盘有写权限。具体用法请参考MariaDB官方指导。

Comments

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×