This is a stripped-down version of the evaluation section. Instead of focusing on the results and their meaning with respect to the research question, we briefly detail what each of the R chunks does.
The first step is to load the data and to map the names used in the ReBench data file to names that are better suited for the paper. The mapping is defined in the scripts/data-processing.R file. The setup chunk also defines a small helper function for simple boxplots:
# load libraries, the data, and prepare it
source("scripts/init.R", chdir=TRUE)
opts_chunk$set(dev='svg')
simple_boxplot <- function(data_set, vm, x = "Benchmark", y = "Value") {
  data_vm <- droplevels(subset(data_set, VM == vm))
  p <- ggplot(data_vm, aes_string(x=x, y=y))
  p + geom_boxplot(outlier.size = 0.9) + theme_simple()
}
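As a quick illustration of the helper (this call is not part of the original chunk; it assumes the data frame data loaded by init.R, with its VM, Benchmark, and Value columns), a per-benchmark boxplot for a single VM is obtained with a one-liner:
# hypothetical usage of the helper defined above
simple_boxplot(data, "JRuby")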
Section 2.2 of the paper uses some statistics from the reflection benchmarks, which are calculated as follows:
stats <- ddply(data, ~ Benchmark + VM,
               summarise,
               Time.mean    = mean(Value),
               Time.geomean = geometric.mean(Value),
               Time.stddev  = sd(Value),
               Time.median  = median(Value),
               max          = max(Value),
               min          = min(Value))
# helper function: gm extracts the geometric mean (Time.geomean) for a given benchmark
gm <- function (data, bench) { data[(data$Benchmark == bench), c('Time.geomean')] }

# Note, the Java measurements are reported as `operations per time unit` (ops/s);
# this has different semantics than the PyPy numbers
direct_mean <- gm(stats, "benchmarks.DynamicProxy.directAdd")
proxied_mean <- gm(stats, "benchmarks.DynamicProxy.proxiedAdd")
direct_mean <- gm(stats, "benchmarks.MethodInvocation.testDirectCall")
handle_finalvar_mean <- gm(stats, "benchmarks.MethodInvocation.testHandleCallFromFinalVar")
handle_mutablevar_mean <- gm(stats, "benchmarks.MethodInvocation.testHandleCallFromMutableVar")
handle_staticfinalvar_mean <- gm(stats, "benchmarks.MethodInvocation.testHandleCallFromStaticFinalVar")
refl_finalvar_mean <- gm(stats, "benchmarks.MethodInvocation.testReflectiveCallFromFinalVar")
refl_mutablevar_mean <- gm(stats, "benchmarks.MethodInvocation.testReflectiveCallFromMutableVar")
refl_staticfinal_mean <- gm(stats, "benchmarks.MethodInvocation.testReflectiveCallFromStaticFinalVar")
direct <- gm(stats, "DynamicDirect")
proxied <- gm(stats, "DynamicProxy")
## Note, the PyPy numbers are time measurements, and therefore have different
## semantics than the Java measurements
direct <- gm(stats, "MethodDirect")
direct_static <- gm(stats, "MethodDirectStatic")
refl_bound <- gm(stats, "MethodReflectiveBound")
refl_unbound <- gm(stats, "MethodReflectiveUnbound")
refl_static_bound <- gm(stats, "MethodReflectiveStaticBound")
refl_static_unbound <- gm(stats, "MethodReflectiveStaticUnbound")
direct <- gm(stats, "OMOPDirect")
proxied <- gm(stats, "OMOPProxy")
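The inline statements in the paper are presumably derived from these geometric means. As a hypothetical example (not taken from the original chunks), the OMOP proxy numbers just assigned could be turned into a slowdown factor and a percentage like this:
# hypothetical: the PyPy numbers are time measurements,
# so proxied / direct is the slowdown factor of the proxied call
omop_proxy_overhead <- proxied / direct
round((omop_proxy_overhead - 1) * 100, digits = 1)  # overhead in percent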
The following chunk creates figure 4 of the paper. In the first step, we filter the data set to the omop experiment and drop the Dispatch benchmark, which is redundant with the DispatchEnforcedStd benchmark. Furthermore, we filter out the first 50 iterations, which include warmup behavior.
Afterwards, we process the data to distinguish the benchmarks executed with and without the OMOP. Then the data is normalized, the names are prepared for the paper, and finally, the plot itself is constructed.
# Ignore Dispatch, DispatchEnforced and DispatchEnforcedStd
# are the proper benchmarks
omop <- droplevels(subset(data, Suite == "omop" & Benchmark != "Dispatch" & Iteration > 50))
# create a column with a boolean indicating whether the benchmark was
# executed with or without the OMOP (enforced) based on the benchmark
# name, and set the benchmark name to the common variant without the
# string 'Enforced' in it.
omop <- ddply(omop, ~ Benchmark + VM + Suite, transform,
              Var = grepl("Enforced$", Benchmark),
              Benchmark = gsub("(Enforced)|(Std)", "", Benchmark))
omop$Benchmark <- factor(omop$Benchmark)
rtruffle <- "RTruffleSOM (OMOP)"
truffle <- "TruffleSOM.ns (OMOP)"
# normalize data
norm_omop <- ddply(omop, ~ Benchmark + VM + Suite, transform,
                   RuntimeRatio = Value / geometric.mean(Value[Var == FALSE]))

norm_omop_enforced <- droplevels(subset(
  norm_omop, Var == TRUE & (VM == rtruffle | VM == truffle) & Benchmark != "Dispatch"))
# Rename
levels(norm_omop_enforced$VM) <- map_names(
  levels(norm_omop_enforced$VM),
  list("RTruffleSOM (OMOP)" = "SOM[MT]",
       "TruffleSOM.ns (OMOP)" = "SOM[PE]"))

levels(norm_omop_enforced$Benchmark) <- map_names(
  levels(norm_omop_enforced$Benchmark),
  list("AddDispatch"   = "dispatch",
       "AddFieldWrite" = "field write",
       "FieldRead"     = "field read",
       "GlobalRead"    = "global read",
       "ReqPrim"       = "exec. primitive"))
# construct boxplot, indicate expected value with dashed line at 1.0
p <- ggplot(norm_omop_enforced, aes(x = Benchmark, y = RuntimeRatio))
p <- p + facet_grid(~VM, labeller = label_parsed)
p <- p + geom_hline(yintercept = 1, linetype = "dashed")
p <- p + geom_boxplot(outlier.size = 0.9) + theme_simple()
p <- p + scale_y_continuous(name="Runtime normalized to\nrun without OMOP") +
     theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust=0.5),
           panel.border = element_rect(colour = "black", fill = NA))
p
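The map_names helper used above is defined in scripts/data-processing.R and is not shown in this document. Purely to illustrate its interface, the following is a minimal sketch based on how it is called, i.e., an assumption rather than the actual implementation:
# Sketch only (assumed behavior of map_names, not the original code):
# replace each level that has an entry in the mapping list, keep all others
map_names <- function(lvls, mapping) {
  sapply(lvls, function(l) {
    if (!is.null(mapping[[l]])) mapping[[l]] else l
  }, USE.NAMES = FALSE)
}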
The next chunk creates figure 5 of the paper.
First, the chunk prepares the data for the micro- and macro-benchmarks, discarding the iterations that include warmup behavior. Since some of the benchmarks warm up very late on top of Truffle+Graal, they are handled explicitly.
Then the results are normalized and the measurements of executions without the OMOP are discarded: after normalization, only the experiments executed with the OMOP are relevant for the graph, since the runs without the OMOP lie on the 1.0 line by construction.
The last step is to adapt the naming and construct the boxplot.
# discard measurements including warmup time
micro <- droplevels(subset(data,
  ((Suite == "micro-steady-omop" & Iteration >= 210       & Iteration <= 340       & Benchmark != "TreeSort" & Benchmark != "Fannkuch") |
   (Suite == "micro-steady-omop" & Iteration >= 210 + 200 & Iteration <= 340 + 200 & Benchmark == "TreeSort") |
   (Suite == "micro-steady-omop" & Iteration >= 210 - 150 & Iteration <= 340 - 150 & Benchmark == "Fannkuch"))
  & Benchmark != "Sieve" & Benchmark != "Queens"))
# discard measurements including warmup time
macro <- droplevels(subset(data, Suite == "macro-steady-omop" & Iteration >= 600 & Iteration <= 990))
omop <- rbind(micro, macro)
rtruffle <- "RTruffleSOM (OMOP)"
truffle <- "TruffleSOM.os (OMOP)"
# normalize measurements
norm_omop <- ddply(omop, ~ Benchmark + VM + Suite, transform,
                   RuntimeRatio = Value / geometric.mean(Value[Var == "false"]))
norm_omop_enforced <- droplevels(subset(norm_omop, Var == "true" & (VM == rtruffle | VM == truffle)))
# adapt naming to paper
levels(norm_omop_enforced$VM) <- map_names(
  levels(norm_omop_enforced$VM),
  list("RTruffleSOM (OMOP)" = "SOM[MT]",
       "TruffleSOM.os (OMOP)" = "SOM[PE]"))
# construct plot
p <- ggplot(norm_omop_enforced, aes(x = Benchmark, y = RuntimeRatio))
p <- p + facet_grid(~VM, labeller = label_parsed)
p <- p + geom_hline(yintercept = 1, linetype = "dashed")
p <- p + geom_boxplot(outlier.size = 0.9) + theme_simple()
p <- p + scale_y_continuous(name="Runtime normalized to\nrun without OMOP") +
     theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust=0.5),
           panel.border = element_rect(colour = "black", fill = NA))
p
The paper also refers to averages, minimum, and maximum values. These are calculated in the following chunk.
tdata <- droplevels(subset(norm_omop, Var == "true" & (VM == rtruffle | VM == truffle)))
# get averages, min, max, etc for each benchmark
stats <- ddply(tdata, ~ Benchmark + VM,
               summarise,
               RR.mean    = mean(RuntimeRatio),
               RR.geomean = geometric.mean(RuntimeRatio),
               RR.stddev  = sd(RuntimeRatio),
               RR.median  = median(RuntimeRatio),
               max        = max(RuntimeRatio),
               min        = min(RuntimeRatio))
# then, get averages, min, max, etc for each VM
overall <- ddply(stats, ~ VM,
                 summarise,
                 mean    = mean(RR.geomean),
                 geomean = geometric.mean(RR.geomean),
                 stddev  = sd(RR.geomean),
                 median  = median(RR.geomean),
                 max     = max(RR.geomean),
                 min     = min(RR.geomean))
rtruffle_mean <- overall[overall$VM==rtruffle,]$geomean
rtruffle_min <- overall[overall$VM==rtruffle,]$min
rtruffle_max <- overall[overall$VM==rtruffle,]$max
truffle_mean <- overall[overall$VM==truffle, ]$geomean
truffle_min <- overall[overall$VM==truffle, ]$min
truffle_max <- overall[overall$VM==truffle, ]$max
per <- function (val) { round((val * 100) - 100, digits=1) }
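The per helper converts a runtime ratio into a percentage overhead for use in the running text; per(1.25) yields 25, i.e., a 25% slowdown over the baseline. A hypothetical use on the values computed above:
# hypothetical: report the average OMOP overhead of SOM[MT] in percent
per(rtruffle_mean)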
Figure 6 of the paper is constructed by the next chunk.
As before, the warmup iterations are discarded first, then the data is normalized, and finally a boxplot is constructed.
ruby <- droplevels(subset(data, Suite == "ruby-image-libs" & Iteration > 10))
# normalize measurements for each benchmark to the unoptimized version
norm_ruby <- ddply(ruby, ~ Benchmark + Suite, transform,
                   SpeedUp = geometric.mean(Value[VM == "JRuby-meta-uncached"]) / Value)
norm_opt <- droplevels(subset(norm_ruby, VM == "JRuby"))
# create boxplot
p <- simple_boxplot(norm_opt, "JRuby", y = "SpeedUp")
p <- p + scale_y_continuous(limits=c(9.8, 20), name="Speedup over unoptimized\n(higher is better)") +
     theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust=0.5))
p
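The speedup numbers mentioned in the text can be derived from norm_opt in the same way as the statistics earlier in this document. The following is a hypothetical example (not part of the original chunks) of how an overall geometric-mean speedup could be computed:
# hypothetical: per-benchmark geometric means, then one overall geometric mean
ruby_stats <- ddply(norm_opt, ~ Benchmark,
                    summarise,
                    SpeedUp.geomean = geometric.mean(SpeedUp))
geometric.mean(ruby_stats$SpeedUp.geomean)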
Figure 7 is constructed in the same way.
First, we discard the warmup iterations, then we normalize the data, adapt names for the paper, drop the baseline from the data set, and finally construct the boxplot.
rtruffle <- "RTruffleSOM"
truffle <- "TruffleSOM.os"
# discard warmup iterations
refl <- droplevels(subset(data, Suite == "reflection" & (VM == rtruffle | VM == truffle) & Iteration >= 50 & Iteration <= 130))
prox <- droplevels(subset(data, Suite == "proxy" & (VM == rtruffle | VM == truffle) & Iteration >= 50 & Iteration <= 130))
# normalize data
norm_refl <- ddply(refl, ~ VM + Suite, transform,
                   RuntimeRatio = Value / geometric.mean(Value[Benchmark == "DirectAdd"]))
norm_prox <- ddply(prox, ~ VM + Suite, transform,
                   RuntimeRatio = Value / geometric.mean(Value[Benchmark == "IndirectAdd"]))
norm_both <- rbind(norm_refl, norm_prox)
# beautify names
levels(norm_both$VM) <- map_names(
  levels(norm_both$VM),
  list("RTruffleSOM"   = "SOM[MT]",
       "TruffleSOM.os" = "SOM[PE]"))
# Show only the reflective version
norm_both <- droplevels(subset(norm_both, Benchmark != "DirectAdd" & Benchmark != "IndirectAdd"))
# Construct boxplot
p <- ggplot(norm_both, aes(x = Benchmark, y = RuntimeRatio))
p <- p + facet_grid(~VM, labeller = label_parsed)
p <- p + geom_hline(yintercept = 1, linetype = "dashed")
p <- p + geom_boxplot(outlier.size = 0.9) + theme_simple()
p <- p + scale_y_continuous(name="Runtime normalized to\nnon-reflective operation") +
     theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust=0.5),
           panel.border = element_rect(colour = "black", fill = NA))
p
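The per-benchmark ratios shown in figure 7 can be summarized in the same way as for figures 4 and 5; a hypothetical example (not part of the original chunks):
# hypothetical: geometric mean of the reflective overhead per benchmark and VM
stats_refl <- ddply(norm_both, ~ Benchmark + VM,
                    summarise,
                    RR.geomean = geometric.mean(RuntimeRatio))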