Apache Tomcat x86_64 vs ARM64 Performance Comparison

Translator: wangxiyuan
Author: Martin Grigorov
Original: https://medium.com/@martin.grigorov/compare-apache-tomcat-performance-on-x86-64-and-arm64-cpu-architectures-aacfbb0b5bb6

A Tomcat x86_64 vs ARM64 performance test by Tomcat PMC member Martin Grigorov.

Most software developers do not usually think about the CPU architecture their software will run on. There are no official statistics, but in my experience most desktop and backend software runs on the x86_64 architecture (Intel and AMD processors), while most mobile and IoT devices run on ARM. Developers write their software in a high-level programming language for the respective CPU architecture and do not think about which assembly instructions are executed at runtime. That is the whole point of high-level languages: let the compiler deal with the low-level hardware instructions and leave us free to focus on the high-level, business-related problems.

Life is simple and beautiful, but sometimes a big player in laptop and desktop hardware and software manufacturing announces that our software will have to run on a different architecture: first from PowerPC to Intel, and now from Intel to ARM64 (sources: Bloomberg & AppleInsider). Thanks to lower power consumption, several of the bigger cloud providers have also started offering ARM64 virtual machines (e.g. Amazon AWS, Huawei Cloud, Linaro). But this brings some uncertainty:

  • Will my software run on the new CPU architecture?
  • What changes will I have to make to get it working?
  • Will it perform as well as before?

To answer these questions, you have to roll up your sleeves and test!

You can deploy your software on any cloud provider, and some of them offer a free trial period! Or, if you are on a low budget, you can experiment on a Raspberry Pi.

Depending on the programming language your software is written in, you may need to make some changes, or none at all! If you use an interpreted or VM-based language (e.g. Python, Perl, Ruby, the JVM, ...), the chances are high that the interpreter already supports ARM64 and you are good to go without any changes. But if your software has to be compiled natively, you will need to adapt your toolchain and make sure ARM64 binaries exist for all of your dependencies. Depending on your software stack, your mileage may vary!
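As a first sanity check on a new machine, a JVM application can report the architecture it actually runs on. A minimal sketch (the exact property values, such as `amd64` vs `aarch64`, depend on the JDK build and OS):

```java
public class ArchCheck {
    public static void main(String[] args) {
        // Typically "amd64" on x86_64 JDK builds and "aarch64" on ARM64 builds
        String arch = System.getProperty("os.arch");
        String os = System.getProperty("os.name");
        System.out.println("Running on " + os + " / " + arch);
    }
}
```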

Once our software runs fine on the new architecture, we can check whether it performs as well as before. Recently some users asked on the Apache Tomcat mailing lists whether the ARM64 architecture is supported. Since Apache Tomcat is written mostly in Java, it largely just works on ARM64. If you need to use libtcnative and/or mod_jk, you will have to build them yourself on ARM64. The Apache Tomcat team uses TravisCI to test both the Java and the C code on ARM64, and there are no known issues at the moment!

To compare the performance of two versions of some software, you would normally run them on the same hardware, but that is impossible here since we are comparing different CPU architectures. For my tests I used two VMs with similar specifications:

  • The x86_64 processor is:

    Architecture:        x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 8
    On-line CPU(s) list: 0-7
    Thread(s) per core: 2
    Core(s) per socket: 4
    Socket(s): 1
    NUMA node(s): 1
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 85
    Model name: Intel(R) Xeon(R) Gold 6266C CPU @ 3.00GHz
    Stepping: 7
    CPU MHz: 3000.000
    BogoMIPS: 6000.00
    Hypervisor vendor: KVM
    Virtualization type: full
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 1024K
    L3 cache: 30976K
    NUMA node0 CPU(s): 0-7
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni md_clear flush_l1d arch_capabilities
  • The ARM64 processor is:

    Architecture:        aarch64
    Byte Order: Little Endian
    CPU(s): 8
    On-line CPU(s) list: 0-7
    Thread(s) per core: 1
    Core(s) per socket: 8
    Socket(s): 1
    NUMA node(s): 1
    Vendor ID: 0x48
    Model: 0
    Stepping: 0x1
    BogoMIPS: 200.00
    L1d cache: 64K
    L1i cache: 64K
    L2 cache: 512K
    L3 cache: 32768K
    NUMA node0 CPU(s): 0-7
    Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm

Both VMs have the same amount of RAM, the same disk and the same network connectivity.

The test application is based on Spring Boot (2.2.7), running embedded Apache Tomcat 9.0.x nightly builds with OpenSSL 1.1.1h-dev and Apache APR 1.7.x. It has a single REST controller that exposes a PUT endpoint for creating an entity, a GET endpoint for reading it, a POST endpoint for updating it and a DELETE endpoint for deleting it. It uses Memcached as its database.

package info.mgsolutions.testbed.rest;

import info.mgsolutions.testbed.domain.Error;
import info.mgsolutions.testbed.domain.Person;
import info.mgsolutions.testbed.domain.Response;
import lombok.extern.slf4j.Slf4j;
import net.rubyeye.xmemcached.MemcachedClient;
import net.rubyeye.xmemcached.exception.MemcachedException;
import net.rubyeye.xmemcached.transcoders.SerializingTranscoder;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.http.server.ServletServerHttpRequest;
import org.springframework.web.bind.annotation.DeleteMapping;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.util.UriComponents;
import org.springframework.web.util.UriComponentsBuilder;

import javax.servlet.http.HttpServletRequest;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.concurrent.TimeoutException;

/**
 * A REST endpoint that uses Memcached to get its data.
 */
@RestController
@RequestMapping("testbed/memcached")
@Slf4j
public class MemcachedTestController {

    public static final int TTL_IN_SECONDS = 1000;

    private final SerializingTranscoder coder = new SerializingTranscoder();
    private final MemcachedClient client;

    public MemcachedTestController(MemcachedClient client) {
        this.client = client;
    }

    @PutMapping(value = "", consumes = MediaType.APPLICATION_JSON_VALUE, produces = MediaType.APPLICATION_JSON_VALUE)
    public ResponseEntity<Response> create(@RequestBody Person person,
            HttpServletRequest servletRequest) throws InterruptedException, MemcachedException, TimeoutException {

        final String base64Name = base64(person.name);
        Person existing = client.get(base64Name);
        if (existing != null) {
            log.info("Create: Person with name {} already exists!", person.name);
            Error error = new Error("Person with name " + person.name + " already exists!");
            return ResponseEntity.status(HttpStatus.NOT_ACCEPTABLE)
                    .body(error);
        }
        log.info("Create: Going to create '{}'", person);
        client.set(base64Name, TTL_IN_SECONDS, person, coder);

        ServletServerHttpRequest request = new ServletServerHttpRequest(servletRequest);
        final UriComponents uriComponents = UriComponentsBuilder.fromHttpRequest(request).build();
        final URI uri = uriComponents.encode(StandardCharsets.UTF_8).toUri();
        return ResponseEntity.created(uri).contentType(MediaType.APPLICATION_JSON).body(person);
    }

    @GetMapping(value = "", consumes = MediaType.ALL_VALUE, produces = MediaType.APPLICATION_JSON_VALUE)
    public ResponseEntity<Person> get(@RequestParam String name) throws InterruptedException, MemcachedException, TimeoutException {

        Person person = (Person) client.get(base64(name), coder);
        if (person == null) {
            log.info("Get: Cannot find a person with name {}!", name);
            return ResponseEntity.notFound().build();
        }

        log.info("Get: Found person with name: {}!", name);
        return ResponseEntity.ok().body(person);
    }

    @PostMapping(value = "", consumes = MediaType.APPLICATION_JSON_VALUE, produces = MediaType.APPLICATION_JSON_VALUE)
    public ResponseEntity<Person> update(@RequestBody Person person) throws InterruptedException, MemcachedException, TimeoutException {
        final String name = person.name;
        final String base64Name = base64(name);
        final Person existing = (Person) client.get(base64Name, coder);
        if (existing != null) {
            log.info("Update: Going to update: {}", person);
            client.set(base64Name, TTL_IN_SECONDS, person, coder);
            return ResponseEntity.ok().build();
        }

        log.info("Update: Cannot find a person with name {}!", person.name);
        return ResponseEntity.notFound().build();
    }

    @DeleteMapping(value = "", produces = MediaType.APPLICATION_JSON_VALUE)
    public ResponseEntity<Person> delete(@RequestParam String name) throws InterruptedException, MemcachedException, TimeoutException {

        final String base64Name = base64(name);
        final Person person = client.get(base64Name);
        if (person != null) {
            log.info("Delete: Going to delete: {}", person);
            client.delete(base64Name);
            return ResponseEntity.ok().build();
        }

        log.info("Delete: Cannot find a person with name {}!", name);
        return ResponseEntity.notFound().build();
    }

    private String base64(String name) {
        return Base64.getEncoder().encodeToString(name.getBytes(StandardCharsets.UTF_8));
    }

}
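The controller keys every Memcached entry by the Base64 encoding of the person's name (the `base64()` helper above), presumably so that arbitrary names are safe to use as memcached keys. In isolation, the key derivation looks like this (class and method names here are illustrative, not from the article):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class KeyDemo {
    // Same derivation as the controller's base64() helper:
    // UTF-8 bytes of the name, Base64-encoded, used as the cache key.
    static String cacheKey(String name) {
        return Base64.getEncoder().encodeToString(name.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(cacheKey("alice")); // prints "YWxpY2U="
    }
}
```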

For load testing I used Apache JMeter 5.2.1 and wrk built from its master branch. JMeter is used for a realistic scenario: 1000 concurrent users, a ramp-up period and think time between the HTTP requests. wrk is then used to test the maximum throughput.

JMeter is executed with these arguments:

jmeter.sh \
--testfile JMeter_plan.jmx \
--logfile $RESULTS_FILE \
--reportoutputfolder $RESULTS_FOLDER \
--reportatendofloadtests \
--nongui \
--forceDeleteResultFile \
--jmeterproperty httpclient4.validate_after_inactivity=4900 \
--jmeterproperty httpclient4.time_to_live=120000 \
-Jhost=$JMETER_HOST \
-Jport=$JMETER_PORT \
-Jprotocol=$JMETER_PROTOCOL \
-JresourceFolder=$JMETER_RESOURCE_FOLDER \
-Jusers=1000 \
-JrampUpSecs=5 \
-Jloops=10 \
-JrequestPath=/testbed/memcached

The httpclient4.* properties are needed to reuse the HTTPS connections; without them Keep-Alive was not effective.

The results from both JMeter and wrk are parsed with Logstash, stored in Elasticsearch and visualized with Kibana.

JMeter's response times:
[chart: JMeter response times on x86_64 and arm64]

As you can see, the HTTPS results were not very good before May 8th: the HTTPS connections were not being reused, and a TLS handshake was performed for every request despite the "Connection: keep-alive" request header. Since wrk had no such issue, I asked on the JMeter mailing lists and they gave me the httpclient4 arguments mentioned above. (Thank you, Philippe Mouawad!) With or without the HttpClient tweak, we see that the response times for x86_64 and arm64 are very similar. Great!

For the throughput test I ran wrk with these parameters:

wrk -c96 -t8 -d30s -s /scripts/wrk-report-to-csv.lua $HOST:$PORT

That is, 8 threads hit the server for 30 seconds over 96 HTTP(S) connections.

To collect the summary into a CSV file I used this custom Lua script:

-- Initialize the pseudo random number generator
-- Resource: http://lua-users.org/wiki/MathLibraryTutorial
math.randomseed(os.time())
math.random(); math.random(); math.random()

local _request = {}
method = ''

-- Load URL config from the file
function load_request_objects_from_file(csvFile)
    local data = {}

    for line in io.lines(csvFile) do
        local idx = string.find(line, ":")
        local key = string.sub(line, 1, idx - 1)
        local value = string.sub(line, idx + 1, string.len(line))
        data[key] = value
    end

    return data
end

function trim(s)
    return s:match "^%s*(.-)%s*$"
end

function readMethod()
    local method = ''
    local f = io.open('/data/method.txt', "r")
    if f ~= nil then
        method = f:read("*all")
        io.close(f)
        return trim(method)
    else
        print('Cannot read the method name from /data/method.txt')
        os.exit(123)
    end
end

function init(args)
    local method = readMethod()
    -- Load request config from file
    _request = load_request_objects_from_file("/data/" .. method .. ".conf")

    --for i, val in pairs(_request) do
    --    print('Request:\t', i, val)
    --end

    -- Bail out if the config file did not define a request
    if _request.method == nil then
        print("multiplerequests: No requests found.")
        os.exit()
    end

    print("multiplerequests: Found a " .. _request.method .. " request")
end


request = function()
    local request_object = _request

    -- Return the request object with the current URL path
    local headers = {}
    headers["Content-type"] = "application/json"

    local url = wrk.format(request_object.method, request_object.path, headers, request_object.body)
    return url
end

function done(summary, latency, reqs)

    local date_table = os.date("*t")
    local ms = string.match(tostring(os.clock()), "%d%.(%d+)") / 1000
    local hour, minute, second = date_table.hour, date_table.min, date_table.sec
    local year, month, day = date_table.year, date_table.month, date_table.day
    local timeStamp = string.format("%04d-%02d-%02dT%02d:%02d:%02d.%03d", year, month, day, hour, minute, second, ms)
    print("Timestamp: " .. timeStamp)

    local method = readMethod()
    file = io.open('/results/today/' .. method .. '.csv', 'w')
    io.output(file)

    -- summary
    io.write("timeStamp,")
    io.write("duration_microseconds,")
    io.write("num_requests,")
    io.write("total_bytes,")
    io.write("connect_errors,")
    io.write("read_errors,")
    io.write("write_errors,")
    io.write("error_status_codes,")
    io.write("timeouts,")
    io.write("requests_per_sec,")
    io.write("bytes_per_sec,")
    -- latency
    io.write("lat_min_microseconds,")
    io.write("lat_max_microseconds,")
    io.write("lat_mean_microseconds,")
    io.write("lat_stdev_microseconds,")
    io.write("lat_percentile_90_microseconds,")
    io.write("lat_percentile_95_microseconds,")
    io.write("lat_percentile_99_microseconds\n")

    -- summary
    io.write(string.format("%s,", timeStamp))
    io.write(string.format("%d,", summary.duration))
    io.write(string.format("%d,", summary.requests))
    io.write(string.format("%d,", summary.bytes))
    io.write(string.format("%d,", summary.errors.connect))
    io.write(string.format("%d,", summary.errors.read))
    io.write(string.format("%d,", summary.errors.write))
    io.write(string.format("%d,", summary.errors.status))
    io.write(string.format("%d,", summary.errors.timeout))
    io.write(string.format("%.2f,", summary.requests / (summary.duration / 1000 / 1000)))
    io.write(string.format("%.2f,", summary.bytes / summary.duration))
    -- latency
    io.write(string.format("%.2f,", latency.min))
    io.write(string.format("%.2f,", latency.max))
    io.write(string.format("%.2f,", latency.mean))
    io.write(string.format("%.2f,", latency.stdev))
    io.write(string.format("%.2f,", latency:percentile(90.0)))
    io.write(string.format("%.2f,", latency:percentile(95.0)))
    io.write(string.format("%.2f\n", latency:percentile(99.0)))

end
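The requests_per_sec column in the script divides the request count by the test duration, which wrk reports in microseconds. A quick sanity check of that unit conversion, in the test application's language and with made-up sample numbers:

```java
public class ThroughputDemo {
    // wrk reports summary.duration in microseconds, so
    // requests / (duration / 1e6) yields requests per second.
    static double requestsPerSec(long requests, long durationMicros) {
        return requests / (durationMicros / 1000.0 / 1000.0);
    }

    public static void main(String[] args) {
        // e.g. 300000 requests served over a 30-second run
        System.out.println(requestsPerSec(300_000, 30_000_000L)); // prints 10000.0
    }
}
```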

The results show that Tomcat on x86_64 is twice as fast as on arm64:
[chart: wrk throughput on x86_64 and arm64]

I will try to find out the reason for this difference and share it with you in a follow-up post. If you have any ideas, I would be happy to test them!

Happy hacking and stay safe!

