Test MariaDB CONNECT storage engine on arm64 platform

Author:zhaorenhai

MariaDB’s connect storage engine is an exciting storage engine.

The reason why it’s exciting is that the storage engine can do so much. Except for OLTP field, it is almost omnipotent. It can directly use SQL to query external files. It supports various common text files and log files. As long as you tell CONNECT the record format, you can use SQL to query and analyze. It also supports common file formats such as JSON, XML and INI. In addition to supporting local files, remote data can also be queried directly through the rest interface. For local files, in addition to query, it also supports insert, delete and other functions. It can also realize the dblink function in Oracle, that is, it can remotely query the tables of another database, even different types of databases, including Mysql, PostgreSQL, Oracle, etc., and even MongoDB of NoSQL type. Of course, on this basis, it is easier to import external files into the database, and it is more powerful and flexible than mysqlimport, because it can realize various format conversion and filtering. The same is true for exporting data to an external file. It also supports querying compressed files (currently only in ZIP format). Even binary file queries and updates are supported. The engine also supports virtual table which functions similar to Oracle’s dual. It can also support a variety of functions, such as single row to multi row, row column conversion, multi table mapping to a table.

In a word, with the CONNECT storage engine, everything can be transformed into SQL, and SQL can be transformed into everything.

Next, let’s take a look at the performance of the connect storage engine on the arm64 platform.

(note that this article is not a detailed guide to the use of CONNECT. As CONNECT involves too many contents, please refer to the official guidance for detailed guidance, various parameters, and restriction of each usage https://mariadb.com/kb/en/connect/ )

Our test platform uses Kunpeng virtual machine of Huawei cloud. The OS is Ubuntu 18.04 version.

Execute the following command:

1
sudo apt search mariadb |grep connect

Output:

1
mariadb-plugin-connect/bionic-updates,bionic-security 1:10.1.47-0ubuntu0.18.04.1 arm64

It can be seen that MariaDB has released the CONNECT plugin of arm64 platform, which shows that the basic functions of this plugin should be OK on arm64.

If you compile MariaDB code yourself (please refer to this blog for specific compilation methods), you can run the connect test case, you would find that most of the test cases are successfully on arm64, except some skipped due to environmental reasons .

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
mariadb@host1:~/build-mariadb-server/mysql-test$ ./mysql-test-run --suite=connect
Logging: /home/mariadb/server/mysql-test/mysql-test-run.pl --suite=connect
vardir: /home/mariadb/build-mariadb-server/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/home/mariadb/build-mariadb-server/mysql-test/var'...
Checking supported features...
MariaDB Version 10.6.0-MariaDB

- SSL connections supported
Using suites: connect
Collecting tests...
Installing system database...

==============================================================================

TEST RESULT TIME (ms) or COMMENT
--------------------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
connect.grant2 [ disabled ] Until fixed
connect.jdbc [ disabled ] Variable settings depend on machine configuration
connect.jdbc_new [ disabled ] Variable settings depend on machine configuration
connect.jdbc_oracle [ disabled ] Variable settings depend on machine configuration
connect.jdbc_postgresql [ disabled ] Variable settings depend on machine configuration
connect.json_java_2 [ disabled ] Need MongoDB running and its Java Driver installed
connect.json_java_3 [ disabled ] Need MongoDB running and its Java Driver installed
connect.json_mongo_c [ disabled ] Need MongoDB running and its C Driver installed
connect.mongo_c [ disabled ] Need MongoDB running and its C Driver installed
connect.mongo_java_2 [ disabled ] Need MongoDB running and its Java Driver installed
connect.mongo_java_3 [ disabled ] Need MongoDB running and its Java Driver installed
connect.tbl_thread [ disabled ] Bug MDEV-9844,10179,14214 03/01/2018 OB Option THREAD removed
connect.json [ pass ] 23
connect.part_file [ pass ] 41
connect.part_table [ pass ] 46
connect.drop-open-error [ pass ] 7
connect.secure_file_priv [ pass ] 5
connect.alter [ pass ] 27
connect.alter_xml [ skipped ] Need windows
connect.alter_xml2 [ pass ] 7
connect.bin [ pass ] 7
connect.csv [ pass ] 16
connect.datest [ pass ] 4
connect.dbf [ pass ] 34
connect.dir [ pass ] 3
connect.endian [ pass ] 6
connect.fix [ pass ] 18
connect.fmt [ pass ] 5
connect.general [ pass ] 3
connect.grant [ pass ] 68
connect.grant3 [ pass ] 2
connect.index [ pass ] 153
connect.infoschema-9739 [ skipped ] Need windows
connect.infoschema2-9739 [ pass ] 3
connect.ini [ pass ] 22
connect.ini_grant [ pass ] 9
connect.json_udf [ pass ] 31
connect.json_udf_bin [ pass ] 45
connect.mrr [ pass ] 14
connect.mul [ pass ] 5
connect.mul_new [ pass ] 7
connect.mysql [ pass ] 51
connect.mysql_discovery [ pass ] 9
connect.mysql_exec [ pass ] 9
connect.mysql_grant [ pass ] 7
connect.mysql_index [ pass ] 173
connect.mysql_new [ pass ] 24
connect.null [ pass ] 9
connect.occur [ pass ] 12
connect.odbc [ skipped ] No ODBC support
connect.odbc_firebird [ skipped ] No ODBC support
connect.odbc_oracle [ skipped ] No ODBC support
connect.odbc_postgresql [ skipped ] No ODBC support
connect.odbc_sqlite3 [ skipped ] No ODBC support
connect.odbc_sqlite3_grant [ skipped ] No ODBC support
connect.odbc_xls [ skipped ] No ODBC support
connect.pivot [ pass ] 22
connect.tbl [ pass ] 13
connect.temporary [ pass ]
connect.type_inet6 [ pass ] 2
connect.unsigned [ pass ] 6
connect.upd [ pass ] 218
connect.updelx [ pass ] 282
connect.updelx2 [ pass ] 5
connect.vcol [ pass ] 2
connect.vec [ pass ] 10
connect.xcol [ pass ] 5
connect.xml [ skipped ] Need windows
connect.xml2 [ pass ] 22
connect.xml2_grant [ pass ] 14
connect.xml2_html [ pass ] 3
connect.xml2_mdev5261 [ pass ] 5
connect.xml2_mult [ pass ] 9
connect.xml2_zip [ pass ] 4
connect.xml_grant [ skipped ] Need windows
connect.xml_html [ skipped ] Need windows
connect.xml_mdev5261 [ skipped ] Need windows
connect.xml_mult [ skipped ] Need windows
connect.xml_zip [ skipped ] Need windows

connect.zip [ pass ] 13
--------------------------------------------------------------------------

The servers were restarted 3 times
Spent 1.540 of 11 seconds executing testcases

Completed: All 53 tests were successful.

15 tests were skipped, 15 by the test itself.

Next, let’s test several table types that the CONNECT storage engine may involve in performance.

In the configuration file of MariaDB, add the following configuration:

1
plugin_dir = /home/mariadb/build-mariadb-server/storage/connect

Restart the database.

Log in to MariaDB and execute the following statement to enable the connect storage engine:

1
install soname 'ha_connect';

DOS Table Type

DOS table type actually corresponds to ordinary text file. The data of each column in the file is fixed length, only the last column is variable length.

We have a file in the following format, the size is 659M, and the number of lines is more than 13 million.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
root@mariadb-arm-test:~# ls -lh /tmp/data_dos.txt
-rw-r--r-- 1 root root 659M Dec 11 16:56 /tmp/data_dos.txt
root@mariadb-arm-test:~# wc -l /tmp/data_dos.txt
13819904 /tmp/data_dos.txt
root@mariadb-arm-test:~# tail /tmp/data_dos.txt
53975 1352378138480 dxmpjtsqpmjhltthckyniaxdw
53976 1364107020874 wyqnxsuexajbvekyqamxpzlcoy
53977 1374147913184 ltmwgochhwfwpzbsquuttiglcls
53978 1382141036405 gmedqtymkbkxntltajxbhlaeomda
53979 1391311715118 kfnwwabonxwyfauqlbeguhstbrumj
53980 1303473032800 padqsyhvhvltcpebmtkt
53981 1313713811959 nwduknfpyaeplwpmpozwh
53982 1321669920517 aglfvcespetmimkugoyygu
53983 1333059736260 ltccjcvptqcrwklutmrlles
53984 1342207932901 vqkkrkrqkmcxvkdmiwyfylxe

Let’s use CONNECT storage engine to query it.

Log in to the database and create a table:

1
2
3
4
5
6
create table userlist (
id char(5) not null,
phonenum char(13) not null flag=6,
userid char(30) not null flag=20
)
engine=CONNECT table_type=DOS file_name='/tmp/data_dos.txt';

It’s very fast:

1
Query OK, 0 rows affected (0.01 sec)

Query the total number of records, also fast, it’s 13 seconds:

1
2
3
4
5
6
7
MariaDB [test]> select count(*) from userlist;
+----------+
| count(*) |
+----------+
| 13819904 |
+----------+
1 row in set (13.350 sec)

Try the conditional query , insert, delete:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
MariaDB [test]> pager md5sum;select * from userlist where id = 23456;
PAGER set to 'md5sum'
8f3941802b868b2bbb43390ab5a2f1c8 -
256 rows in set (9.221 sec)

MariaDB [test]> pager md5sum;select * from userlist where id = 23456;
PAGER set to 'md5sum'
8f3941802b868b2bbb43390ab5a2f1c8 -
256 rows in set (9.221 sec)

MariaDB [test]> pager md5sum;select * from userlist where userid like 'ltccjc%';
PAGER set to 'md5sum'
2bde22161d7a5bad42c377b7290cdff4 -
256 rows in set (19.196 sec)

MariaDB [test]> pager md5sum;select * from userlist where userid like 'ltccjc%';
PAGER set to 'md5sum'
2bde22161d7a5bad42c377b7290cdff4 -
256 rows in set (19.279 sec)

MariaDB [test]> insert into userlist values('76245','1812345678901', 'swweyuoyslwtqwtoeurwio');
Query OK, 1 row affected (0.000 sec)

MariaDB [test]> delete from userlist where id=76245 and phonenum = '1812345678901';
Query OK, 1 row affected (9.719 sec)

You can see the query is fast, with 256 records queried from over 13 million records taking only 9 seconds. However, like queries are slightly slower, taking 19 seconds. Insertion is also fast. Inserting is essentially inserting a record at the end of the original file.

Update operations on tables of this DOS type are not recommended and strange things may occur. Because Update does not update the original file by default, it updates a temporary file. There is one parameterconnect_Use_Tempfile can be configured to support updating source files, but is not recommended for files with variable length records. If you are sure you want to use this feature, it is recommended that you test it yourself before using it.

FIX Table Type

The difference between FIX table type and DOS is whether the width of the last column is fixed or variable, FIX if it is fixed, or DOS if it is variable.

Now let’s test the FIX type. We have the following file, 667M in size, 13.43 million lines, and the last column is the same length.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mariadb@mariadb-arm-test:/tmp$ ls -lhrt data_fix.txt
-rw-rw-r-- 1 mariadb mariadb 667M Dec 13 16:29 data_fix.txt
mariadb@mariadb-arm-test:/tmp$ wc -l data_fix.txt
13432192 data_fix.txt
mariadb@mariadb-arm-test:/tmp$ tail data_fix.txt
104930 1302533021891 idmujolnoimfhafacordyrrzmmjjuf
104931 1313195411180 bwuhbnyhxscqprsizgyjkotqfyndda
104932 1323234034390 frynxnmkttbxzibvsdzokelmnqwkwc
104933 1331271621869 sthxlnhbfqpnrpkiahgvqrlfhrcfxu
104934 1342061419324 hheddnlwxslnqttygyalcamwdxujpp
104935 1351001433696 ajlhunoosohztneqfxzotfbgfdegrj
104936 1362857527592 qraermmsdzuxoogprmxsxrqmktfrwr
104937 1372706024878 tkozushzmqhuxkavdbcxwnmksiceji
104938 1382111414256 osllmzidwpmdbxzkzfkelsnwcdtstr
104939 1393077713849 dzafytqywdyahhjvhvdosnsduwwpvw

Logging in to the MariaDB database, create a FIX table:

1
2
3
4
5
6
create table userlist_fix (
id char(6) not null,
phonenum char(13) not null flag=7,
userid char(30) not null flag=21
)
engine=CONNECT table_type=FIX file_name='/tmp/data_fix.txt' lrecl=52;

Note that lrecl=52 is the length of the longest line of bytes and contains line breaks.

Let’s repeat the query, insertion operations:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
MariaDB [test]> select count(*) from userlist_fix;
+----------+
| count(*) |
+----------+
| 13432193 |
+----------+
1 row in set (0.000 sec)

MariaDB [test]> pager md5sum;select * from userlist_fix where id = 23456;
PAGER set to 'md5sum'
0871c0088df3f4dbf37b12e56611a0b0 -
128 rows in set (3.535 sec)

MariaDB [test]> pager md5sum;select * from userlist_fix where id = 23456;
PAGER set to 'md5sum'
0871c0088df3f4dbf37b12e56611a0b0 -
128 rows in set (3.535 sec)

MariaDB [test]> pager md5sum;select * from userlist_fix where userid like 'ltccjc%';
PAGER set to 'md5sum'
Empty set (9.344 sec)

MariaDB [test]> pager md5sum;select * from userlist_fix where userid like 'ltccjc%';
PAGER set to 'md5sum'
Empty set (9.325 sec)

MariaDB [test]> insert into userlist_fix values('76245','1812345678901', 'swweyuoyslwtqwtoeurwio');
Query OK, 1 row affected (0.000 sec)

You can see that each type of FIX type operates much faster than the DOS type, indicating that MariaDB takes advantage of the fixed-length feature in the processing of fixed-length formats and has better performance. Last line, the data we inserted is not aligned, MariaDB will automatically complete for you when it inserts the database:

1
2
3
4
5
6
7
8
9
10
11
mariadb@mariadb-arm-test:/tmp$ tail data_fix.txt
104930 1302533021891 idmujolnoimfhafacordyrrzmmjjuf
104931 1313195411180 bwuhbnyhxscqprsizgyjkotqfyndda
104932 1323234034390 frynxnmkttbxzibvsdzokelmnqwkwc
104933 1331271621869 sthxlnhbfqpnrpkiahgvqrlfhrcfxu
104934 1342061419324 hheddnlwxslnqttygyalcamwdxujpp
104935 1351001433696 ajlhunoosohztneqfxzotfbgfdegrj
104936 1362857527592 qraermmsdzuxoogprmxsxrqmktfrwr
104937 1372706024878 tkozushzmqhuxkavdbcxwnmksiceji
104938 1382111414256 osllmzidwpmdbxzkzfkelsnwcdtstr
76245 1812345678901 swweyuoyslwtqwtoeurwio

Fixed-length types are also friendly to update and delete operations, are fast, and operate directly on the original file.

1
2
3
4
5
6
MariaDB [test]> update userlist_fix set phonenum = '1811111111111' where id= 104938;
Query OK, 128 rows affected (3.513 sec)
Rows matched: 128 Changed: 128 Warnings: 0

MariaDB [test]> delete from userlist_fix where phonenum = '1811111111111';
Query OK, 128 rows affected (3.672 sec)

So when we build a table from the original text file, if it is possible to process it as FIX type, try to process it as FIX type.

About connect_work_size

Next let’s modify connect_work_size parameter, which should be the memory area used by the CONNECT engine from the description of the Knowledge Base website. We enlarge it to see if it works.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
MariaDB [test]> select @@session.connect_work_size;
+-----------------------------+
| @@session.connect_work_size |
+-----------------------------+
| 67108864 |
+-----------------------------+
1 row in set (0.000 sec)

MariaDB [test]> set session connect_work_size = 1073741824;
Query OK, 0 rows affected (0.000 sec)

MariaDB [test]> select count(*) from userlist;
+----------+
| count(*) |
+----------+
| 13819650 |
+----------+
1 row in set (13.576 sec)

MariaDB [test]> pager md5sum;select * from userlist where id = 23456;
PAGER set to 'md5sum'
8f3941802b868b2bbb43390ab5a2f1c8 -
256 rows in set (9.306 sec)

MariaDB [test]> pager md5sum;select * from userlist where id = 23456;
PAGER set to 'md5sum'
8f3941802b868b2bbb43390ab5a2f1c8 -
256 rows in set (9.305 sec)

MariaDB [test]> pager md5sum;select * from userlist_fix where id = 23456;
PAGER set to 'md5sum'
6daf8a0815666570cee0011046e3b568 -
128 rows in set (3.526 sec)

MariaDB [test]> pager md5sum;select * from userlist_fix where id = 23456;
PAGER set to 'md5sum'
6daf8a0815666570cee0011046e3b568 -
128 rows in set (3.526 sec)

As you can see from the above output, it has no effect. When the CONNECT storage engine processes files, it does not cache the entire file, increasing the memory space, and is not useful. The reason we’re so fast is also due to the fact that our Huawei Cloud virtual machine uses SSD storage, which is faster.

CSV Table Type

The CSV table type corresponds to ordinary CSV text files, but the delimiter here can be specified, either as a comma or as other specified characters.

Let’s test the performance of the CSV table type.

We create a CSV file in following format, 603M in size, with over 13.9 million lines. Columns are separated by |.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mariadb@mariadb-arm-test:/tmp$ wc -l data.csv
13909120 data.csv
mariadb@mariadb-arm-test:/tmp$ ls -lhrt data.csv
-rw-rw-r-- 1 mariadb mariadb 603M Dec 14 12:01 data.csv
mariadb@mariadb-arm-test:/tmp$ tail -10 data.csv
1411|1311008423658|hudulfqgofqahfxwtqihs
1412|1321119524436|atacjjbibtzqwohjnrtebc
1413|1334167818661|tnhdlrpbpaxzfyiymholbfy
1414|1344170515956|xlwsuiotrfhmoltiscorhdec
1415|1351643025744|httzsnqpjydxeinvwcaekgydt
1416|1363624529884|tyyttcbsqjiwufvmpuveototmq
1417|1371750537023|diwtmqdriozmoscypldbcdrqvfd
1418|1383308741505|gqchktgbsstyixidjztgtpgutzeb
1419|1393553934078|culiovhkrdovnbcdestvnqqubwdzl
1420|1302340336368|bfjdepbcrefcfvbhznbd

Create a CSV-type table using the following SQL:

1
2
3
4
5
6
create table userlist_csv (
id int not null,
phonenum char(13) not null,
userid char(29) not null)
engine=CONNECT table_type=CSV file_name='/tmp/data.csv'
header=0 sep_char='|' quoted=0;

Then we do some query ,insertion and deletion operations:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
MariaDB [test]> select count(*) from userlist_csv;
9229c8871853852300fca7ce0bb33598 -
1 row in set (17.835 sec)

MariaDB [test]> select count(*) from userlist_csv;
9229c8871853852300fca7ce0bb33598 -
1 row in set (17.712 sec)

MariaDB [test]> pager md5sum;select * from userlist_csv where id = 23456;
PAGER set to 'md5sum'
9a02ca9b5c3044a73f4441c10945ff91 -
140 rows in set (9.356 sec)

MariaDB [test]> pager md5sum;select * from userlist_csv where id = 23456;
PAGER set to 'md5sum'
9a02ca9b5c3044a73f4441c10945ff91 -
140 rows in set (9.355 sec)

MariaDB [test]> pager md5sum;select * from userlist_csv where userid like 'ltccjc%';
PAGER set to 'md5sum'
Empty set (21.788 sec)

MariaDB [test]> pager md5sum;select * from userlist_csv where userid like 'ltccjc%';
PAGER set to 'md5sum'
Empty set (21.781 sec)

MariaDB [test]> insert into userlist_csv values('76245','1812345678901', 'swweyuoyslwtqwtoeurwio');
Query OK, 1 row affected (0.000 sec)

MariaDB [test]> update userlist_csv set phonenum='1811111111111' where id=76245;
Query OK, 141 rows affected (6.803 sec)
Rows matched: 141 Changed: 141 Warnings: 0

MariaDB [test]> delete from userlist_csv where id=76245 and phonenum='1811111111111';
Query OK, 141 rows affected (6.674 sec)

You can see that the performance is still good, count query and like query are slightly slower than the DOS type, both query and insert operations are similar to the DOS type, and delete operation is faster than the DOS type.

FMT Table Type

The FMT table type is more flexible than CSV, and you can specify the format and length of each field by format matching. FMT is useful when analyzing various types of logs.

We use it to analyze an operating system auth log.

This auth log has over 600 M, more than 5 million rows.

1
2
3
4
root@mariadb-arm-test:/var/log# ls -lhrt auth.log.1
-rw-r----- 1 syslog adm 625M Dec 14 15:11 auth.log.1
root@mariadb-arm-test:/var/log# wc -l auth.log.1
5890048 auth.log.1

Intercept part of the file content:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
Dec 12 16:39:23 localhost groupadd[32556]: group added to /etc/group: name=mariadb, GID=1000
Dec 12 16:39:23 localhost groupadd[32556]: group added to /etc/gshadow: name=mariadb
Dec 12 16:39:23 localhost groupadd[32556]: new group: name=mariadb, GID=1000
Dec 12 16:39:23 localhost useradd[32560]: new user: name=mariadb, UID=1000, GID=1000, home=/home/mariadb, shell=/bin/bash
Dec 12 16:39:43 localhost passwd[32569]: pam_unix(passwd:chauthtok): password changed for mariadb
Dec 12 16:39:46 localhost chfn[32570]: changed user 'mariadb' information
Dec 12 16:45:36 localhost su[1805]: Successful su for mariadb by root
Dec 12 16:45:36 localhost su[1805]: + /dev/pts/0 root:mariadb
Dec 12 16:45:36 localhost su[1805]: pam_unix(su:session): session opened for user mariadb by root(uid=0)
Dec 12 16:45:36 localhost su[1805]: pam_systemd(su:session): Cannot create session: Already running in a session
Dec 12 16:50:01 localhost sudo: mariadb : TTY=pts/0 ; PWD=/home/mariadb ; USER=root ; COMMAND=/usr/bin/apt-get install build-essential libncurses5-dev gnutls-dev libcurl4-gnutls-dev
Dec 12 16:50:01 localhost sudo: pam_unix(sudo:session): session opened for user root by root(uid=0)
Dec 12 16:50:04 localhost sudo: pam_unix(sudo:session): session closed for user root
Dec 12 16:50:14 localhost sudo: mariadb : TTY=pts/0 ; PWD=/home/mariadb ; USER=root ; COMMAND=/usr/bin/apt-get install build-essential libncurses5-dev gnutls-dev libcurl4-gnutls-dev
Dec 12 16:50:14 localhost sudo: pam_unix(sudo:session): session opened for user root by root(uid=0)
Dec 12 16:50:25 localhost sudo: pam_unix(sudo:session): session closed for user root
Dec 12 16:50:49 localhost sudo: mariadb : TTY=pts/0 ; PWD=/home/mariadb ; USER=root ; COMMAND=/usr/bin/apt-get install zlib1g-dev ccache libnuma-dev libxml2-dev cmake bison
Dec 12 16:50:49 localhost sudo: pam_unix(sudo:session): session opened for user root by root(uid=0)
Dec 12 16:51:08 localhost sudo: pam_unix(sudo:session): session closed for user root
Dec 12 16:56:01 localhost CRON[7999]: pam_unix(cron:session): session opened for user root by (uid=0)
Dec 12 16:56:01 localhost CRON[7999]: pam_unix(cron:session): session closed for user root
Dec 12 17:17:01 localhost CRON[16493]: pam_unix(cron:session): session opened for user root by (uid=0)
Dec 12 17:17:01 localhost CRON[16493]: pam_unix(cron:session): session closed for user root

Create the table:

1
2
3
4
5
6
7
8
create table authlog (
month char(3) not null field_format='%n%s%n',
day char(2) not null field_format=' %n%s%n',
time char(8) not null field_format=' %n%s%n',
hostname char(9) not null field_format=' %n%s%n',
module char(20) not null field_format=' %n%s%n',
message char(255) not null field_format=' %n%255[^\n]%n')
engine=CONNECT table_type=FMT file_name='/var/log/auth.log.1' ;

Query it , the format looks better now:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
MariaDB [test]>  select * from authlog limit 10;
+-------+-----+----------+-----------+----------------------+-----------------------------------------------------------------------+
| month | day | time | hostname | module | message |
+-------+-----+----------+-----------+----------------------+-----------------------------------------------------------------------+
| Dec | 2 | 20:47:26 | localhost | systemd-logind[721]: | Power key pressed. |
| Dec | 2 | 20:47:26 | localhost | systemd-logind[721]: | Powering Off... |
| Dec | 2 | 20:47:26 | localhost | systemd-logind[721]: | System is powering down. |
| Dec | 2 | 20:47:27 | localhost | systemd: | pam_unix(systemd-user:session): session closed for user root |
| Dec | 11 | 10:07:26 | localhost | systemd-logind[960]: | New seat seat0. |
| Dec | 11 | 10:07:26 | localhost | systemd-logind[960]: | Watching system buttons on /dev/input/event0 (Power Button) |
| Dec | 11 | 10:07:26 | localhost | systemd-logind[960]: | Watching system buttons on /dev/input/event2 (QEMU QEMU USB Keyboard) |
| Dec | 11 | 10:07:50 | localhost | sshd[2384]: | Server listening on 0.0.0.0 port 22. |
| Dec | 11 | 10:07:50 | localhost | sshd[2384]: | Server listening on :: port 22. |
| Dec | 11 | 10:07:51 | localhost | sshd[2384]: | Received signal 15; terminating. |
+-------+-----+----------+-----------+----------------------+-----------------------------------------------------------------------+

Let’s do some queries:

1
2
3
4
5
6
7
MariaDB [test]> select count(*) from authlog where month='Dec' and day='12' and module like 'sshd%';
+----------+
| count(*) |
+----------+
| 3170304 |
+----------+
1 row in set (15.609 sec)

The speed is acceptable.

Of course, if you use the grep command to filter in the following way, it will be faster:

1
2
3
4
5
6
7
root@mariadb-arm-test:/var/log# echo 3 > /proc/sys/vm/drop_caches
root@mariadb-arm-test:/var/log# time grep 'Dec 12' auth.log.1 |grep sshd |wc -l
3170304

real 0m3.610s
user 0m1.745s
sys 0m1.170s

However, with SQL, there are several advantages:

  1. Data is separated by fields and the results are more accurate.

  2. If you want to import the log into the database, it’s very convenient. Just `create table tablename as select * from authlog’.

  3. Sometimes it is convenient to create a table when you want to join the contents of the log and the contents of some tables.

Summary

CONNECT has many functions, such as parsing xml, json, remote querying other database functions, row and column conversion, virtual columns and so on, which we have not demonstrated. However, these features can be found to be supported on arm64 by the test cases previously run. And there are not many scenarios where these functions involve performance. Like XML and Json, you hardly can find some files which are several hundreds megabytes. Remote queries on remote databases actually rely more on the performance of remote databases themselves. For specific usage of these features, you can refer to MariaDB’s official Knowledge Base guide, which is detailed.

This article mainly tests several CONNECT storage engine scenarios may be related to performance, need to parse large files, found that the performance on arm64 is also good. In addition, the query scenarios tested in this paper are all unindexed scenes. If the data’s selectivity is good, you can choose to index and query again, then the result can be returned instantly. Of course, the index needs to take up a certain amount of disk space, and it also requires write permission to the disk. Please refer to the official guidance of MariaDB for specific usage.

Comments

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×