Peephole optimization: adding `opt_respond_to` to the Ruby VM, part 4

In The Ruby Syntax Holy Grail: adding `opt_respond_to` to the Ruby VM, part 3, I found what I call the “Holy Grail” of Ruby syntax. I’m overstating it, but it’s a readable, sequential way of seeing how a large portion of Ruby syntax is compiled. Here’s a snippet of it as a reminder:
// prism_compile.c
static void
pm_compile_node(rb_iseq_t *iseq, const pm_node_t *node, LINK_ANCHOR *const ret, bool popped, pm_scope_node_t *scope_node)
{
    const pm_parser_t *parser = scope_node->parser;
    //...
    switch (PM_NODE_TYPE(node)) {
      //...
      case PM_ARRAY_NODE: {
        // [foo, bar, baz]
        // ^^^^^^^^^^^^^^^
        const pm_array_node_t *cast = (const pm_array_node_t *) node;
        pm_compile_array_node(iseq, (const pm_node_t *) cast, &cast->elements, &location, ret, popped, scope_node);
        return;
      }
      //...
      case PM_MODULE_NODE: {
        // module Foo; end
        //...
      }
      //...
    }
The file that code lives in, `prism_compile.c`, is very big. `pm_compile_node` alone is 1800+ lines, and the file as a whole is 11 THOUSAND lines. It’s intimidating to say the least, but there are some obvious directions I can ignore: I’m trying to optimize a method call to `respond_to?`, so I can skip most of the Ruby syntax.
But where should I go, specifically?
Sage wisdom
Helpfully, I got two identical sets of directions in response to part 3. One came from Kevin Newton, creator of Prism, and one from byroot, who inspired this entire series:
https://bsky.app/profile/byroot.bsky.social/post/3le6xypzykc2x
I don’t want to jump to conclusions, but I think I need to look into the peephole optimizer 😆.
And what exactly is a “peephole optimizer”? Kevin describes the process as “specialization comes after compilation”. From Wikipedia:
Peephole optimization is an optimization technique performed on a small set of compiler-generated instructions, known as a peephole or window, that involves replacing the instructions with a logically equivalent set that has better performance.
https://en.wikipedia.org/wiki/Peephole_optimization
That seems to fit my goal. I want to replace the current `opt_send_without_block` instruction with a specialized `opt_respond_to` instruction, optimized for `respond_to?` method calls.
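To make the definition concrete, here is a toy peephole pass in plain Ruby. It is only a sketch: the `Insn` struct, the `peephole` method, and the instruction names are hypothetical stand-ins chosen to mirror the frozen-array case discussed later, and none of this is CRuby’s actual implementation. It slides a two-instruction window over a list and fuses an empty `newarray` followed by a zero-argument `freeze` send into a single instruction.

```ruby
# Toy model only: Insn and peephole are made up for illustration.
Insn = Struct.new(:name, :operands)

def peephole(insns)
  out = []
  i = 0
  while i < insns.length
    cur = insns[i]
    nxt = insns[i + 1]
    # The "window": an empty newarray followed by a zero-argument freeze send
    if cur.name == :newarray && cur.operands == [0] &&
       nxt && nxt.name == :send && nxt.operands == [:freeze, 0]
      out << Insn.new(:opt_ary_freeze, [])
      i += 2 # the fused instruction replaces both originals
    else
      out << cur
      i += 1
    end
  end
  out
end

program = [
  Insn.new(:putself, []),
  Insn.new(:newarray, [0]),
  Insn.new(:send, [:freeze, 0]),
  Insn.new(:send, [:pp, 1]),
  Insn.new(:leave, []),
]

optimized = peephole(program)
puts optimized.map(&:name).join(" ")
```

The pass never looks at the program as a whole. It only needs the current instruction and its neighbor, which is what makes this kind of optimization cheap to run after compilation.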
Finding the optimizer
So where do peephole optimizations happen in CRuby today? In Étienne’s PR, he added the optimization code to a function called… `iseq_peephole_optimize`. A bit on the nose, isn’t it? Kevin’s comment also mentioned `iseq_peephole_optimize`, so it seems like a winner.
I want to draw the connection between `iseq_peephole_optimize` and where we left off in `pm_compile_node`. Let’s look at some code!
Dissecting an existing optimization
I’ll use Étienne’s frozen array optimization to find my way to the optimizer and see how it connects to the rest of the compiler. If you want to follow along, start with the setup instructions from part 3.
His optimization only applies to array and hash literals being frozen. So we’ll write a teensy Ruby program that calls `.freeze` on an empty array literal, and put it in `test.rb` at the root of our CRuby project.
The best way to run `test.rb` here is with `make`. It not only runs the file, but also makes sure things like C files are recompiled when you have made changes. Let’s run our file, but dump the instructions it generates for the Ruby VM:
RUNOPT0=--dump=insns make runruby
`RUNOPT0` lets us add an option to the `ruby` call, so this is effectively `ruby --dump=insns test.rb`. Here are the instructions we see; we can confirm that we got the optimized `opt_ary_freeze` instruction from Étienne’s PR:
== disasm: #<ISeq:<main>@./test.rb:3 (3,0)-(3,12)>
0000 putself                                                          (   3)[Li]
0001 opt_ary_freeze                         [], <calldata!mid:freeze, argc:0, ARGS_SIMPLE>
0004 opt_send_without_block
0006 leave
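As an aside, a similar listing can be produced from inside Ruby with `RubyVM::InstructionSequence`, which is specific to CRuby. The sketch below assumes a one-line source string as a stand-in (it is not necessarily the contents of `test.rb`), and whether `opt_ary_freeze` appears depends on the Ruby version running it, so the script reports what it finds rather than assuming.

```ruby
# CRuby-only: RubyVM::InstructionSequence is not part of other Ruby implementations.
# The source string is a stand-in chosen for illustration.
iseq = RubyVM::InstructionSequence.compile("[].freeze")
listing = iseq.disasm

puts listing

# Older Rubies emit newarray + opt_send_without_block here instead.
uses_opt = listing.include?("opt_ary_freeze")
puts "opt_ary_freeze present: #{uses_opt}"
```

This is handy for quick checks against an installed Ruby, while `RUNOPT0=--dump=insns make runruby` remains the way to see what the freshly built CRuby in the working tree produces.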
You never truly know what code does until you run it. Up until now, I’ve just been reading and navigating through the CRuby source. `iseq_peephole_optimize` lives in `compile.c`, so let’s set a breakpoint and take a look 🕵🏼‍♂️.
Using the debugger
We can debug C code in CRuby almost as easily as we can use `debugger`/`binding.pry` in Ruby.
On macOS you can use `lldb`, and on Docker/Linux you can use `gdb`. I’ll do everything in `lldb` to start, then show some equivalent `gdb` commands afterwards.
Let’s start by looking at the peephole optimization code for `[].freeze` within `iseq_peephole_optimize`. I’ve added comments above each line to explain what I think it’s doing:
// compile.c
static int
iseq_peephole_optimize(rb_iseq_t *iseq, LINK_ELEMENT *list, const int do_tailcallopt)
{
      // ...
      // if the instruction is a `newarray` of zero length
3469: if (IS_INSN_ID(iobj, newarray) && iobj->operands[0] == INT2FIX(0)) {
          // grab the next element after the current instruction
3470:     LINK_ELEMENT *next = iobj->link.next;
          // if `next` is an instruction, and the instruction is `send`
3471:     if (IS_INSN(next) && (IS_INSN_ID(next, send))) {
3472:         const struct rb_callinfo *ci = (struct rb_callinfo *)OPERAND_AT(next, 0);
3473:         const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
3474:
              // if the callinfo is "simple", with zero arguments,
              // and there isn't a block provided(?), and the method id (mid) is `freeze`
              // which is represented by `idFreeze`
3475:         if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
                  // change the instruction to `opt_ary_freeze`
3476:             iobj->insn_id = BIN(opt_ary_freeze);
                  // remove the `send` instruction, we don't need it anymore
3481:             ELEM_REMOVE(next);
Now I’ll use `lldb` to see where this code runs relative to our Prism compilation. In CRuby, to debug you run `make lldb-ruby` instead of `make runruby`. You’ll see some setup code run, and then you’ll be left at a prompt prefixed with `(lldb)`:
> make lldb-ruby
lldb -o 'command script import -r ../misc/lldb_cruby.py' ruby -- ../test.rb
(lldb) target create "ruby"
Current executable set to '/Users/johncamara/Projects/ruby/build/ruby' (arm64).
(lldb) settings set -- target.run-args "../test.rb"
(lldb) command script import -r ../misc/lldb_cruby.py
lldb scripts for ruby has been installed.
(lldb)
At this point, nothing has run yet. We can set our breakpoint, then run the program. I’ll put the breakpoint right after all of the `if` statements have succeeded:
(lldb) break set --file compile.c --line 3476
Breakpoint 1: where = ruby`iseq_peephole_optimize + 2276 at compile.c:3476:17
With our breakpoint set, we call `run` to run the program.
You’ll see something like the following. The program runs until it hits our breakpoint, right after it has identified a frozen array literal:
(lldb) run
Process 50923 launched: '/ruby/build/ruby' (arm64)
Process 50923 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:17
3473 const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
3474
3475 if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
-> 3476 iobj->insn_id = BIN(opt_ary_freeze);
3477 iobj->operand_size = 2;
3478 iobj->operands = compile_data_calloc2(iseq, iobj->operand_size, sizeof(VALUE));
3479 iobj->operands[0] = rb_cArray_empty_frozen;
I want to see where we are relative to all our Prism compilation code. We can use `bt` to get the backtrace:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:29
frame #1: ruby`iseq_optimize(...) at compile.c:4352:17
frame #2: ruby`iseq_setup_insn(...) at compile.c:1619:5
frame #3: ruby`pm_iseq_compile_node(...) at prism_compile.c:10139:5
frame #4: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
frame #5: ruby`rb_protect(...) at eval.c:1033:18
frame #6: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
frame #7: ruby`pm_new_child_iseq(...) at prism_compile.c:1271:27
frame #8: ruby`pm_compile_node(...) at prism_compile.c:9458:40
frame #9: ruby`pm_compile_node(...) at prism_compile.c:9911:17
frame #10: ruby`pm_compile_scope_node(...) at prism_compile.c:6598:13
frame #11: ruby`pm_compile_node(...) at prism_compile.c:9784:9
frame #12: ruby`pm_iseq_compile_node(...) at prism_compile.c:10122:9
frame #13: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
frame #14: ruby`rb_protect(...) at eval.c:1033:18
frame #15: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
frame #16: ruby`pm_iseq_new_top(...) at iseq.c:906:12
frame #17: ruby`load_iseq_eval(...) at load.c:756:24
frame #18: ruby`require_internal(...) at load.c:1296:21
frame #19: ruby`rb_require_string_internal(...) at load.c:1402:22
frame #20: ruby`rb_require_string(...) at load.c:1388:12
frame #21: ruby`rb_f_require(...) at load.c:1029:12
frame #22: ruby`ractor_safe_call_cfunc_1(...) at vm_insnhelper.c:3624:12
frame #23: ruby`vm_call_cfunc_with_frame_(...) at vm_insnhelper.c:3801:11
frame #24: ruby`vm_call_cfunc_with_frame(...) at vm_insnhelper.c:3847:12
frame #25: ruby`vm_call_cfunc_other(...) at vm_insnhelper.c:3873:16
frame #26: ruby`vm_call_cfunc(...) at vm_insnhelper.c:3955:12
frame #27: ruby`vm_call_method_each_type(...) at vm_insnhelper.c:4779:16
frame #28: ruby`vm_call_method(...) at vm_insnhelper.c:4916:20
frame #29: ruby`vm_call_general(...) at vm_insnhelper.c:4949:12
frame #30: ruby`vm_sendish(...) at vm_insnhelper.c:5968:15
frame #31: ruby`vm_exec_core(...) at insns.def:898:11
frame #32: ruby`rb_vm_exec(...) at vm.c:2595:22
frame #33: ruby`rb_iseq_eval(...) at vm.c:2850:11
frame #34: ruby`rb_load_with_builtin_functions(...) at builtin.c:54:5
frame #35: ruby`Init_builtin_features at builtin.c:74:5
frame #36: ruby`ruby_init_prelude at ruby.c:1750:5
frame #37: ruby`ruby_opt_init(...) at ruby.c:1811:5
frame #38: ruby`prism_script(...) at ruby.c:2215:13
frame #39: ruby`process_options(...) at ruby.c:2538:9
frame #40: ruby`ruby_process_options(...) at ruby.c:3169:12
frame #41: ruby`ruby_options(...) at eval.c:117:16
frame #42: ruby`rb_main(...) at main.c:43:26
frame #43: ruby`main(...) at main.c:68:12
Whoa. That is huge! And it’s not the backtrace I was expecting. It seems I missed a code path in my initial explorations. I had it right up until `prism_script`:
- `main`, which calls
- `rb_main`, which calls
- `ruby_options`, then `ruby_process_options`, then `process_options`, which calls
- `prism_script`
- The next call I expected was `pm_iseq_new_main`, but instead we go to `ruby_opt_init`, which leads to
- `Init_builtin_features`
This path goes through some gem preloading logic, which is why we see the `require` frames (`rb_f_require`, `rb_require_string`) in the backtrace:
void
Init_builtin_features(void)
{
    rb_load_with_builtin_functions("gem_prelude", NULL);
}
By default, CRuby loads `gem_prelude`, which lives in `ruby/gem_prelude.rb`. Here is the file, shortened for brevity:
require 'rubygems'
require 'error_highlight'
require 'did_you_mean'
require 'syntax_suggest/core_ext'
Compile on-the-fly
I learned something here that seems obvious in hindsight, but I hadn’t thought about it. Ruby only compiles what is actually loaded, and only at the point it gets loaded. If I never load a particular piece of code, it never gets compiled. And if I defer loading it until later, it doesn’t get compiled until later.
We can demonstrate this by delaying a `require`:
sleep 10
require "net/http"
If we run this using `make lldb-ruby`, we see delayed compilation in action:
(lldb) break set --file ruby.c --line 2616
(lldb) run
// hits our prism compile code
(lldb) next
(lldb) break set --file compile.c --line 3476
(lldb) continue
// waits 10 seconds, then compiles the contents of "net/http"
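The same behavior can be observed without a debugger. This is a minimal sketch under one assumption: that a file which cannot be parsed is a fair proxy for “has not been compiled yet”. The file name is hypothetical. The script writes a deliberately broken Ruby file, keeps running normally, and only sees an error once that file is actually required.

```ruby
require "tmpdir"

compiled_late = false

Dir.mktmpdir do |dir|
  # Hypothetical file name; the contents are deliberately invalid Ruby.
  path = File.join(dir, "broken_example.rb")
  File.write(path, "def broken(\n")

  puts "file written; nothing has tried to parse or compile it yet"

  begin
    require path
  rescue SyntaxError => e
    # SyntaxError is a ScriptError, so it has to be rescued explicitly.
    compiled_late = true
    puts "error surfaced only at require time: #{e.class}"
  end
end
```

If Ruby processed every file on disk up front, the broken file would fail before the first `puts`. Instead the failure arrives exactly when `require` runs, which matches what the breakpoints show.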
Getting to our test.rb file
I’d prefer to only see the code from `test.rb` compile, so I’ll put a breakpoint directly on the `pm_iseq_new_main` call, which for me is in `ruby.c` on line 2616:
(lldb) break set --file ruby.c --line 2616
(lldb) run
Process 32534 launched: '/ruby/build/ruby' (arm64)
Process 32534 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: ruby`process_options(...) at ruby.c:2616:38
2613 if (!result.ast) {
2614 pm_parse_result_t *pm = &result.prism;
2615 int error_state;
-> 2616 iseq = pm_iseq_new_main(&pm->node, opt->script_name, path, parent, optimize, &error_state);
2617
2618 pm_parse_result_free(pm);
2619
Now when we run the backtrace I see what I expected, because we skipped the `gem_prelude` compilation. This is the exact flow I walked through in part 2:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: ruby`process_options(...) at ruby.c:2616:38
frame #1: ruby`ruby_process_options(...) at ruby.c:3169:12
frame #2: ruby`ruby_options(...) at eval.c:117:16
frame #3: ruby`rb_main(...) at main.c:43:26
frame #4: ruby`main(...) at main.c:68:12
From here, we can set our `iseq_peephole_optimize` breakpoint and watch only our specific code compile. Since we are already inside the running program, we call `continue` to resume execution:
(lldb) break set --file compile.c --line 3476
Breakpoint 2: where = ruby`iseq_peephole_optimize + 2276 at compile.c:3476:17
(lldb) continue
Process 55336 resuming
Process 55336 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
frame #0: ruby`iseq_peephole_optimize() at compile.c:3476:17
3473 const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
3474
3475 if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
-> 3476 iobj->insn_id = BIN(opt_ary_freeze);
3477 iobj->operand_size = 2;
3478 iobj->operands = compile_data_calloc2(iseq, iobj->operand_size, sizeof(VALUE));
3479 iobj->operands[0] = rb_cArray_empty_frozen;
If we call `bt` from here to get the backtrace, we finally see the connection between `prism_compile.c` and `compile.c`. `pm_iseq_compile_node` calls `iseq_setup_insn`, which runs the optimization logic. In the previous post I saw `iseq_setup_insn`, but I didn’t know what it meant or what it did. Now we know. This is what Kevin Newton was referring to earlier: specialization comes after compilation. Prism compiles the node in the standard way, then the peephole optimization layer (the specialization) is applied afterwards:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
* frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:17
frame #1: ruby`iseq_optimize(...) at compile.c:4352:17
frame #2: ruby`iseq_setup_insn(...) at compile.c:1619:5
frame #3: ruby`pm_iseq_compile_node(...) at prism_compile.c:10139:5
frame #4: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
frame #5: ruby`rb_protect(...) at eval.c:1033:18
frame #6: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
frame #7: ruby`pm_iseq_new_main(...) at iseq.c:930:12
frame #8: ruby`process_options(...) at ruby.c:2616:20
frame #9: ruby`ruby_process_options(...) at ruby.c:3169:12
frame #10: ruby`ruby_options(...) at eval.c:117:16
frame #11: ruby`rb_main(...) at main.c:43:26
frame #12: ruby`main(...) at main.c:68:12
From here, we can inspect the current instruction using `expr`:
(lldb) expr *(iobj)
(INSN) $4 = {
link = {
type = ISEQ_ELEMENT_INSN
next = 0x000000011f6568d0
prev = 0x000000011f656850
}
insn_id = YARVINSN_newarray
operand_size = 1
sc_state = 0
operands = 0x000000011f640118
insn_info = (line_no = 1, node_id = 3, events = 0)
}
We can see that `iobj` has a link to the next instruction, as well as an `insn_id` and other metadata. The current instruction is `YARVINSN_newarray`. If we run `next`, that should execute `iobj->insn_id = BIN(opt_ary_freeze);` and our instruction should change:
(lldb) next
(lldb) expr *(iobj)
(INSN) $5 = {
//...
insn_id = YARVINSN_opt_ary_freeze
//...
}
It worked! The instruction changed from `newarray` to `opt_ary_freeze`! The optimization is at least partially complete (I’m not sure how much else is involved, though).
Taking a small step toward opt_respond_to
This has been the longest and densest post in the series so far. But I’d like to make some actual progress toward the new instruction. Let’s pattern match on `respond_to?` in the peephole optimizer.
Here is our sample program:
puts "Did you know you can write to $stdout?" if $stdout.respond_to?(:write)
Running it with `RUNOPT0=--dump=insns make runruby`, we get the following instructions:
== disasm: #<ISeq:<main>@./test.rb:1 (1,0)-(1,76)>
0000 getglobal                              :$stdout                  (   1)[Li]
0002 putobject                              :write
0004 opt_send_without_block                 <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>
0006 branchunless                           14
0008 putself
0009 putchilledstring                       "Did you know you can write to $stdout?"
0011 opt_send_without_block                 <calldata!mid:puts, argc:1, FCALL|ARGS_SIMPLE>
0013 leave
0014 putnil
0015 leave
I want to match this line:
0004 opt_send_without_block                 <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>
Here’s my attempt. I’ll copy what the `newarray`/`freeze` optimization is doing, and just try tweaking a few things to fit my example. Right below the `newarray` code we were just debugging, I’ll add this:
// If the instruction is a `send` (the `0004 opt_send_without_block` line in the dump)
if (IS_INSN_ID(iobj, send)) {
    // Pull the same info the `newarray` optimization does
    const struct rb_callinfo *ci = (struct rb_callinfo *)OPERAND_AT(iobj, 0);
    const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(iobj, 1);
    //
    // 1. We have ARGS_SIMPLE, which is probably what `vm_ci_simple(ci)` checks for
    // 2. We have argc:1, which should match `vm_ci_argc(ci) == 1`
    // 3. We send without a block, hence blockiseq == NULL
    // 4. The method id (mid) for `vm_ci_mid(ci)` matches `idRespond_to`. I searched around for names
    //    that seemed similar to idFreeze, replacing `idFreeze` with `idRespond`, and found `idRespond_to`
    if (vm_ci_simple(ci) && vm_ci_argc(ci) == 1 && blockiseq == NULL && vm_ci_mid(ci) == idRespond_to) {
        // placeholder statement, only here so there is a line to break on
        int i = 0;
    }
}
Now I’ll follow the same debugging steps as before, but with a breakpoint in `compile.c` where I added my new code. Specifically, I set the breakpoint at `int i = 0;` so I land inside the `if` statement:
(lldb) break set --file ruby.c --line 2616
Breakpoint 1: where = ruby`process_options + 4068 at ruby.c:2616:38
(lldb) run
(lldb) break set --file compile.c --line 3491
Breakpoint 2: where = ruby`iseq_peephole_optimize + 2536 at compile.c:3491:17
(lldb) continue
Process 61925 resuming
Process 61925 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3491:17
3488 const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(iobj, 1);
3489
3490 if (vm_ci_simple(ci) && vm_ci_argc(ci) == 1 && blockiseq == NULL && vm_ci_mid(ci) == idRespond_to) {
-> 3491 int i = 0;
3492 }
3493 }
3494
I think it worked! The pattern matched the characteristics of the `respond_to?` call, and we hit the breakpoint set on `int i = 0;`. It’s a small step, but it’s a first step toward adding the optimization.
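For a rough cross-check outside the debugger, the same characteristics (method id `respond_to?`, one argument) can be looked for in the disassembly text from Ruby. This is only a sketch: it assumes a CRuby whose `disasm` output prints call data as `mid:…, argc:…`, and it matches on text, which is far looser than the structural check in the C code above.

```ruby
# CRuby-only sketch; matches on disassembly text, not on real call info structs.
source = 'puts "Did you know you can write to $stdout?" if $stdout.respond_to?(:write)'
listing = RubyVM::InstructionSequence.compile(source).disasm

# Keep only lines whose call data names respond_to? with exactly one argument.
candidates = listing.lines.select do |line|
  line.include?("mid:respond_to?") && line.include?("argc:1")
end

puts candidates
```

The `puts` call also takes one argument, but its line is filtered out because the method id differs, which is the same role `vm_ci_mid(ci) == idRespond_to` plays in the C condition.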
Using gdb
For anyone who wants to do the same work using `gdb`, it’s very similar. Start by creating a `breakpoints.gdb` file in the root of your project. This sets your initial breakpoint, similar to how we set a breakpoint in `lldb` before calling `run`.
When you run `make gdb-ruby`, you can use the same backtrace command, `bt`:
> make gdb-ruby
Thread 1 "ruby" hit Breakpoint 4, process_options (...) at ../ruby.c:2616
2616 iseq = pm_iseq_new_main(&pm->node, opt->script_name, path, parent, optimize, &error_state);
(gdb) bt
#0 process_options (...) at ../ruby.c:2616
#1 in ruby_process_options (...) at ../ruby.c:3169
#2 in ruby_options (...) at ../eval.c:117
#3 in rb_main (...) at ../main.c:43
#4 in main (...) at ../main.c:68
(gdb)
From here, you can set your next breakpoint so you only see the compilation of the `newarray` instruction from our `test.rb` program:
(gdb) break compile.c:3476
Breakpoint 5 at 0xaaaabaa22f14: file ../compile.c, line 3476
(gdb) continue
Continuing.
Thread 1 "ruby" hit Breakpoint 5, iseq_peephole_optimize (...) at ../compile.c:3476
3476 iobj->insn_id = BIN(opt_ary_freeze);
Similar to the `lldb` command `expr`, we can inspect the contents of locals using `p` or `print` in `gdb`:
(gdb) p *(iobj)
$2 = {link = {type = ISEQ_ELEMENT_INSN, next = 0xaaaace797ef0, prev = 0xaaaace797e70}, insn_id = YARVINSN_newarray,
operand_size = 1, sc_state = 0, operands = 0xaaaace796ac8, insn_info = {line_no = 1, node_id = 3, events = 0}}
Up next
Ok, that was a long one. Thanks for sticking with me! We found the optimizer, and we pattern matched our way to a `respond_to?` call. Next, we need to add the new instruction definition and try actually replacing the `send` with our new instruction. See you next time! 👋🏼
https://cdn.uploads.micro.blog/98548/2024/screenshot-2024-12-26-at-12.15.03pm.png
2024-12-28 10:32:00