Peephole optimizations: adding `opt_respond_to` to the Ruby VM, part 4

In The Ruby Syntax Holy Grail: adding opt_respond_to to the Ruby VM, part 3, I found what I call the “Holy Grail” of Ruby syntax. I’m overstating it, but it’s a readable, step-by-step way to see how a large part of Ruby syntax is compiled. Here’s a snippet of it as a reminder:

// prism_compile.c
static void
pm_compile_node(rb_iseq_t *iseq, const pm_node_t *node, LINK_ANCHOR *const ret, bool popped, pm_scope_node_t *scope_node)
{
    const pm_parser_t *parser = scope_node->parser;
    //...
    switch (PM_NODE_TYPE(node)) {
      //...
      case PM_ARRAY_NODE: {
        // [foo, bar, baz]
        // ^^^^^^^^^^^^^^^
        const pm_array_node_t *cast = (const pm_array_node_t *) node;
        pm_compile_array_node(iseq, (const pm_node_t *) cast, &cast->elements, &location, ret, popped, scope_node);
        return;
      }
      //...
      case PM_MODULE_NODE: {
        // module Foo; end
        //...
      }
      //...
    }
}

The file this code lives in, prism_compile.c, is very large. pm_compile_node itself is 1800+ lines, and the file as a whole is about 11 thousand lines. It’s intimidating to say the least, but there are some obvious directions I can ignore: I’m trying to optimize a method call to respond_to?, so I can skip most of the Ruby syntax.

However, where should I go, specifically?

Sage wisdom

Helpfully, I got two nearly identical sets of directions based on part 3. One from Kevin Newton, creator of Prism:

https://x.com/kddnewton/status/1872280281409105925?s=46

And one from byroot, who inspired this entire series:

https://bsky.app/profile/byroot.bsky.social/post/3le6xypzykc2x

I don’t want to jump to conclusions, but I think I need to look into the peephole optimizer 😆.

And what exactly is a “peephole optimizer”? Kevin describes the process as “specialization comes after compilation”. From Wikipedia:

Peephole optimization is an optimization technique performed on a small set of compiler-generated instructions, known as a peephole or window, that involves replacing the instructions with a logically equivalent set that has better performance.
https://en.wikipedia.org/wiki/Peephole_optimization

That seems to fit my goal. I want to replace the current opt_send_without_block instruction with a specialized opt_respond_to instruction, optimized for respond_to? method calls.
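To make the idea concrete before diving into C, here is a toy sketch of a peephole pass in plain Ruby. It is not how CRuby represents instructions: the [opcode, *operands] arrays and the peephole_optimize method are hypothetical names made up for illustration. It slides a two-instruction window over the list and collapses an empty newarray followed by a zero-argument freeze send into a single specialized instruction.

```ruby
# A toy instruction is [opcode, *operands]. These names are illustrative only
# and do not correspond to CRuby's internal structures.
def peephole_optimize(insns)
  out = []
  i = 0
  while i < insns.length
    current = insns[i]
    following = insns[i + 1]

    # Window of two instructions: an empty `newarray` followed by a
    # zero-argument `send` of :freeze collapses into one specialized instruction.
    if current == [:newarray, 0] && following == [:send, :freeze, 0]
      out << [:opt_ary_freeze]
      i += 2
    else
      out << current
      i += 1
    end
  end
  out
end

program = [[:putself], [:newarray, 0], [:send, :freeze, 0], [:send, :pp, 1], [:leave]]
optimized = peephole_optimize(program)
pp optimized
```

The rest of the program passes through untouched, which is the point of a peephole: it only looks at a tiny window and leaves everything else alone.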

Finding the optimizer

So where do peephole optimizations happen in CRuby today? In Étienne’s PR, he added the optimization code to a function called… iseq_peephole_optimize. A bit on the nose, isn’t it? Kevin’s comment also mentioned iseq_peephole_optimize, so it seems like a winner.

I want to make the connection between iseq_peephole_optimize and where we left off in pm_compile_node. Let’s look at some code!

Disassembling an existing optimization

I’ll use Étienne’s frozen array optimization to find my way to the optimizer and see how it connects. If you want to follow along, start with the setup instructions from part 3.

His optimization only applies to array and hash literals that are frozen. So we’re going to write a teensy Ruby program to demonstrate, and put it in test.rb at the root of our CRuby project.
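A minimal sketch of what such a test.rb could contain. The exact contents are an assumption on my part: any frozen empty array literal that gets passed to a method call should exercise the same code path.

```ruby
# test.rb (hypothetical contents; any frozen array literal would do)
empty = [].freeze
pp empty
```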

The best way to run test.rb here is with make. It not only runs the file, but also makes sure that things like C files are recompiled when you make changes. Let’s run our file, but dump the instructions it generates for the Ruby VM:

RUNOPT0=--dump=insns make runruby

RUNOPT0 adds an option to the ruby call, so this is effectively ruby --dump=insns test.rb. Here are the instructions we see, and we can confirm that we got the optimized opt_ary_freeze instruction from Étienne’s PR:

== disasm: #<ISeq:<main>@./test.rb:3 (3,0)-(3,12)>
0000 putself                      (   3)[Li]
0001 opt_ary_freeze               [], <calldata!mid:freeze, argc:0, ARGS_SIMPLE>
0004 opt_send_without_block       
0006 leave
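If you just want a quick look without going through make, a similar dump is available from inside Ruby through RubyVM::InstructionSequence. This is a small sketch, and the output depends on the Ruby you run it with: I’d expect a build without Étienne’s change to show newarray followed by a freeze send instead of opt_ary_freeze.

```ruby
# Compile a snippet in-process and print its YARV instructions,
# similar to what --dump=insns shows for a whole file.
iseq = RubyVM::InstructionSequence.compile("[].freeze")
disasm = iseq.disasm
puts disasm

# Which instruction appears depends on the Ruby version running this.
optimized = disasm.include?("opt_ary_freeze")
puts "opt_ary_freeze present: #{optimized}"
```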

You never really know what code does until you run it. So far, I’ve just been reading and navigating through the CRuby source. iseq_peephole_optimize lives in compile.c, so let’s set a breakpoint and take a look 🕵🏼‍♂️.

Using the debugger

We can debug C code in CRuby almost as easily as we use debugger or binding.pry in Ruby.

On macOS you can use lldb, and on Docker/Linux you can use gdb. I’ll do everything in lldb to begin with, but I’ll show some equivalent commands for gdb afterwards.

Let’s start by looking at the peephole optimization code for [].freeze within iseq_peephole_optimize. I’ll add comments above each line to explain what I think it does:

// compile.c
static int
iseq_peephole_optimize(rb_iseq_t *iseq, LINK_ELEMENT *list, const int do_tailcallopt)
{
         // ...
         // if the instruction is a `newarray` of zero length
3469:    if (IS_INSN_ID(iobj, newarray) && iobj->operands[0] == INT2FIX(0)) {
             // grab the next element after the current instruction
3470:        LINK_ELEMENT *next = iobj->link.next;
             // if `next` is an instruction, and the instruction is `send`
3471:        if (IS_INSN(next) && (IS_INSN_ID(next, send))) {
3472:            const struct rb_callinfo *ci = (struct rb_callinfo *)OPERAND_AT(next, 0);
3473:            const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
3474:
                 // if the callinfo is "simple", with zero arguments,
                 // and there isn't a block provided(?), and the method id (mid) is `freeze`
                 // which is represented by `idFreeze`
3475:            if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
                     // change the instruction to `opt_ary_freeze`
3476:                iobj->insn_id = BIN(opt_ary_freeze);
                     // remove the `send` instruction, we don't need it anymore
3481:                ELEM_REMOVE(next);

Now I’ll use lldb to see where this code runs in relation to our prism compilation. In CRuby, to debug you run make lldb-ruby instead of make runruby. You’ll see some setup code run, and then you’ll be left at a prompt prefixed by (lldb):

> make lldb-ruby
lldb  -o 'command script import -r ../misc/lldb_cruby.py' ruby --  ../test.rb
(lldb) target create "ruby"
Current executable set to '/Users/johncamara/Projects/ruby/build/ruby' (arm64).
(lldb) settings set -- target.run-args  "../test.rb"
(lldb) command script import -r ../misc/lldb_cruby.py
lldb scripts for ruby has been installed.
(lldb)

At this point, nothing has run yet. We can now set our breakpoint, then run the program. I’ll add a breakpoint right after all the if statements succeed:

(lldb) break set --file compile.c --line 3476
Breakpoint 1: where = ruby`iseq_peephole_optimize + 2276 at compile.c:3476:17

With our breakpoint set, we call run to run the program. You’ll see something like the following: the program runs until it hits our breakpoint, right after it detects a frozen array literal:

(lldb) run
Process 50923 launched: '/ruby/build/ruby' (arm64)
Process 50923 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:17
   3473             const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
   3474
   3475             if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
-> 3476                 iobj->insn_id = BIN(opt_ary_freeze);
   3477                 iobj->operand_size = 2;
   3478                 iobj->operands = compile_data_calloc2(iseq, iobj->operand_size, sizeof(VALUE));
   3479                 iobj->operands[0] = rb_cArray_empty_frozen;

I want to see where we are in relation to our prism compilation code. We can use bt to get the backtrace:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:29
    frame #1: ruby`iseq_optimize(...) at compile.c:4352:17
    frame #2: ruby`iseq_setup_insn(...) at compile.c:1619:5
    frame #3: ruby`pm_iseq_compile_node(...) at prism_compile.c:10139:5
    frame #4: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
    frame #5: ruby`rb_protect(...) at eval.c:1033:18
    frame #6: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
    frame #7: ruby`pm_new_child_iseq(...) at prism_compile.c:1271:27
    frame #8: ruby`pm_compile_node(...) at prism_compile.c:9458:40
    frame #9: ruby`pm_compile_node(...) at prism_compile.c:9911:17
    frame #10: ruby`pm_compile_scope_node(...) at prism_compile.c:6598:13
    frame #11: ruby`pm_compile_node(...) at prism_compile.c:9784:9
    frame #12: ruby`pm_iseq_compile_node(...) at prism_compile.c:10122:9
    frame #13: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
    frame #14: ruby`rb_protect(...) at eval.c:1033:18
    frame #15: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
    frame #16: ruby`pm_iseq_new_top(...) at iseq.c:906:12
    frame #17: ruby`load_iseq_eval(...) at load.c:756:24
    frame #18: ruby`require_internal(...) at load.c:1296:21
    frame #19: ruby`rb_require_string_internal(...) at load.c:1402:22
    frame #20: ruby`rb_require_string(...) at load.c:1388:12
    frame #21: ruby`rb_f_require(...) at load.c:1029:12
    frame #22: ruby`ractor_safe_call_cfunc_1(...) at vm_insnhelper.c:3624:12
    frame #23: ruby`vm_call_cfunc_with_frame_(...) at vm_insnhelper.c:3801:11
    frame #24: ruby`vm_call_cfunc_with_frame(...) at vm_insnhelper.c:3847:12
    frame #25: ruby`vm_call_cfunc_other(...) at vm_insnhelper.c:3873:16
    frame #26: ruby`vm_call_cfunc(...) at vm_insnhelper.c:3955:12
    frame #27: ruby`vm_call_method_each_type(...) at vm_insnhelper.c:4779:16
    frame #28: ruby`vm_call_method(...) at vm_insnhelper.c:4916:20
    frame #29: ruby`vm_call_general(...) at vm_insnhelper.c:4949:12
    frame #30: ruby`vm_sendish(...) at vm_insnhelper.c:5968:15
    frame #31: ruby`vm_exec_core(...) at insns.def:898:11
    frame #32: ruby`rb_vm_exec(...) at vm.c:2595:22
    frame #33: ruby`rb_iseq_eval(...) at vm.c:2850:11
    frame #34: ruby`rb_load_with_builtin_functions(...) at builtin.c:54:5
    frame #35: ruby`Init_builtin_features at builtin.c:74:5
    frame #36: ruby`ruby_init_prelude at ruby.c:1750:5
    frame #37: ruby`ruby_opt_init(...) at ruby.c:1811:5
    frame #38: ruby`prism_script(...) at ruby.c:2215:13
    frame #39: ruby`process_options(...) at ruby.c:2538:9
    frame #40: ruby`ruby_process_options(...) at ruby.c:3169:12
    frame #41: ruby`ruby_options(...) at eval.c:117:16
    frame #42: ruby`rb_main(...) at main.c:43:26
    frame #43: ruby`main(...) at main.c:68:12

Whoa. That thing is huge! This is not the backtrace I was expecting! I seem to have missed a code path in my initial explorations. I had it right up until prism_script:

  • main
  • which calls rb_main
  • which calls ruby_options, then ruby_process_options, then process_options
  • which calls prism_script
  • The next call I expected was pm_iseq_new_main, but instead we go to ruby_opt_init
  • which calls ruby_init_prelude, then Init_builtin_features

This path appears to go through some gem preloading logic, which is why we see the require calls (rb_f_require, rb_require_string) in the backtrace:

void
Init_builtin_features(void)
{
    rb_load_with_builtin_functions("gem_prelude", NULL);
}

By default CRuby loads gem_prelude, which lives in ruby/gem_prelude.rb. Here is the file, shortened for brevity:

require 'rubygems'
require 'error_highlight'
require 'did_you_mean'
require 'syntax_suggest/core_ext'
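One way to see the effect of that prelude from the Ruby side is to check $LOADED_FEATURES at the very top of a script. This is just a sketch: the list of libraries is copied from the excerpt above, and the results will vary with how Ruby is started (for example, I’d expect different answers with gems disabled).

```ruby
# Libraries that gem_prelude.rb requires, as listed in the excerpt above.
prelude_libs = %w[rubygems error_highlight did_you_mean syntax_suggest/core_ext]

# $LOADED_FEATURES holds the path of every file loaded so far.
already_loaded = prelude_libs.to_h do |lib|
  [lib, $LOADED_FEATURES.any? { |path| path.end_with?("/#{lib}.rb") }]
end

pp already_loaded
```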

Compile on-the-fly

I learned something here that seems obvious in hindsight, but I hadn’t thought about it. Ruby only compiles what is actually loaded, and only at the point it is loaded. If I never load a particular piece of code, it never gets compiled. Or if I defer loading it until later, it won’t be compiled until later.

We can actually demonstrate this by delaying a require:

sleep 10

require "net/http"

If we run it using make lldb-ruby, we see the delayed compilation in action:

(lldb) break set --file ruby.c --line 2616
(lldb) run
// hits our prism compile code
(lldb) next
(lldb) break set --file compile.c --line 3476
(lldb) continue
// waits 10 seconds, then compiles the contents of "net/http"
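The same behavior can be observed from plain Ruby without a debugger. In this sketch (the file name and the events array are just illustrative), a file with a deliberate syntax error sits on disk while the main script runs happily. The error only surfaces when require finally asks Ruby to parse and compile that file.

```ruby
require "tempfile"

# Write a file containing a deliberate syntax error. Nothing is compiled yet,
# so merely creating the file cannot fail.
broken = Tempfile.new(["lazy_compile", ".rb"])
broken.write("def oops(\n")
broken.close

events = []
events << :before_require   # the main script compiled and is running fine

begin
  require broken.path       # the file is parsed and compiled only here
rescue SyntaxError
  events << :syntax_error_at_require
end

pp events
```

SyntaxError is rescued explicitly because it is not a StandardError, so a bare rescue would not catch it.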

Go to our test.rb file

I’d rather just watch my test.rb code compile, so I’ll put a breakpoint directly on the pm_iseq_new_main call, which for me is in ruby.c on line 2616:

(lldb) break set --file ruby.c --line 2616
(lldb) run
Process 32534 launched: '/ruby/build/ruby' (arm64)
Process 32534 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: ruby`process_options(...) at ruby.c:2616:38
   2613         if (!result.ast) {
   2614             pm_parse_result_t *pm = &result.prism;
   2615             int error_state;
-> 2616             iseq = pm_iseq_new_main(&pm->node, opt->script_name, path, parent, optimize, &error_state);
   2617
   2618             pm_parse_result_free(pm);
   2619

Now when we run the backtrace I see what I expected, because we skipped the gem_prelude compilation. This is the exact flow I walked through in part 2:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: ruby`process_options(...) at ruby.c:2616:38
    frame #1: ruby`ruby_process_options(...) at ruby.c:3169:12
    frame #2: ruby`ruby_options(...) at eval.c:117:16
    frame #3: ruby`rb_main(...) at main.c:43:26
    frame #4: ruby`main(...) at main.c:68:12

From here, we can set our iseq_peephole_optimize breakpoint and watch just our specific code compile. Since we are already inside the running program, we call continue to resume execution:

(lldb) break set --file compile.c --line 3476
Breakpoint 2: where = ruby`iseq_peephole_optimize + 2276 at compile.c:3476:17
(lldb) continue
Process 55336 resuming
Process 55336 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: ruby`iseq_peephole_optimize() at compile.c:3476:17
   3473             const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
   3474
   3475             if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
-> 3476                 iobj->insn_id = BIN(opt_ary_freeze);
   3477                 iobj->operand_size = 2;
   3478                 iobj->operands = compile_data_calloc2(iseq, iobj->operand_size, sizeof(VALUE));
   3479                 iobj->operands[0] = rb_cArray_empty_frozen;

If we call bt from here to get the backtrace, we finally see the connection between prism_compile.c and compile.c. pm_iseq_compile_node calls iseq_setup_insn, which runs the optimization logic. In the previous post I saw iseq_setup_insn, but I didn’t know what it meant or what it did. Now we know. This is what Kevin Newton mentioned earlier: specialization comes after compilation. Prism compiles the node in the standard way, then the peephole optimization layer, the specialization, is applied afterward:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
  * frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:17
    frame #1: ruby`iseq_optimize(...) at compile.c:4352:17
    frame #2: ruby`iseq_setup_insn(...) at compile.c:1619:5
    frame #3: ruby`pm_iseq_compile_node(...) at prism_compile.c:10139:5
    frame #4: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
    frame #5: ruby`rb_protect(...) at eval.c:1033:18
    frame #6: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
    frame #7: ruby`pm_iseq_new_main(...) at iseq.c:930:12
    frame #8: ruby`process_options(...) at ruby.c:2616:20
    frame #9: ruby`ruby_process_options(...) at ruby.c:3169:12
    frame #10: ruby`ruby_options(...) at eval.c:117:16
    frame #11: ruby`rb_main(...) at main.c:43:26
    frame #12: ruby`main(...) at main.c:68:12

From here, we can inspect the current instruction using expr:

(lldb) expr *(iobj)
(INSN) $4 = {
  link = {
    type = ISEQ_ELEMENT_INSN
    next = 0x000000011f6568d0
    prev = 0x000000011f656850
  }
  insn_id = YARVINSN_newarray
  operand_size = 1
  sc_state = 0
  operands = 0x000000011f640118
  insn_info = (line_no = 1, node_id = 3, events = 0)
}

We can see that iobj has a link to the next instruction, as well as an insn_id and other metadata. The current instruction is YARVINSN_newarray. If we run next, that should execute iobj->insn_id = BIN(opt_ary_freeze); and our instruction should change:

(lldb) next
(lldb) expr *(iobj)
(INSN) $5 = {
  //...
  insn_id = YARVINSN_opt_ary_freeze
  //...
}

It works! The instruction changed from newarray to opt_ary_freeze! The optimization is at least partially complete (I’m not sure how much else is involved, though).

Taking a small step toward opt_respond_to

This has been the longest and densest post in the series so far, but I’d like to make some actual progress toward the new instruction. Let’s pattern match on respond_to? in the peephole optimizer.

Here is our sample program:

puts "Did you know you can write to $stdout?" if $stdout.respond_to?(:write)

Running it with RUNOPT0=--dump=insns make runruby, we get the following instructions:

== disasm: #<ISeq:<main>@./test.rb:1 (1,0)-(1,76)>
0000 getglobal                              :$stdout                  (   1)[Li]
0002 putobject                              :write
0004 opt_send_without_block                 <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>
0006 branchunless                           14
0008 putself
0009 putchilledstring                       "Did you know you can write to $stdout?"
0011 opt_send_without_block                 <calldata!mid:puts, argc:1, FCALL|ARGS_SIMPLE>
0013 leave
0014 putnil
0015 leave

I want to match this line:

0004 opt_send_without_block                 <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>

Here’s my attempt. I’ll copy what the newarray freeze optimization does and change a few things to fit my example. Right below the newarray code we were debugging, I’ll add this:

// If the instruction is a `send`. The call shows up as `0004 opt_send_without_block`
// in the dump, but the `newarray` check above also matches on plain `send`, and we
// read two operands here (the callinfo and the block iseq), just like it does.
if (IS_INSN_ID(iobj, send)) {
    // Pull the same info the `newarray` optimization does
    const struct rb_callinfo *ci = (struct rb_callinfo *)OPERAND_AT(iobj, 0);
    const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(iobj, 1);

    // <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>
    // 1. We have ARGS_SIMPLE, which is probably what `vm_ci_simple(ci)` checks for
    // 2. We have argc:1, which should match `vm_ci_argc(ci) == 1`
    // 3. We send without a block, hence blockiseq == NULL
    // 4. The method id (mid) for `vm_ci_mid(ci)` matches `idRespond_to`. I searched for names
    //    similar to `idFreeze`, swapping `Freeze` for `Respond`, and found `idRespond_to`
    if (vm_ci_simple(ci) && vm_ci_argc(ci) == 1 && blockiseq == NULL && vm_ci_mid(ci) == idRespond_to) {
        // Placeholder statement: it only gives us a line to set a breakpoint on
        int i = 0;
    }
}

Now I’ll follow the same debugging steps as before, but add a breakpoint in compile.c where I added my new code. Specifically, I set the breakpoint at int i = 0; so that I stop inside the if statement:

(lldb) break set --file ruby.c --line 2616
Breakpoint 1: where = ruby`process_options + 4068 at ruby.c:2616:38
(lldb) run
(lldb) break set --file compile.c --line 3491
Breakpoint 2: where = ruby`iseq_peephole_optimize + 2536 at compile.c:3491:17
(lldb) continue
Process 61925 resuming
Process 61925 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3491:17
   3488         const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(iobj, 1);
   3489
   3490         if (vm_ci_simple(ci) && vm_ci_argc(ci) == 1 && blockiseq == NULL && vm_ci_mid(ci) == idRespond_to) {
-> 3491             int i = 0;
   3492         }
   3493     }
   3494

I think it worked! The pattern matched the characteristics of the respond_to? call and hit the breakpoint set on int i = 0;. It’s a small step, but it’s a first step toward adding the optimization.
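The same four conditions can be roughly mirrored from Ruby, which makes for a handy sanity check on what the compiler emits. This sketch leans on InstructionSequence#to_a and on the :mid and :orig_argc keys it currently uses for call data. Those are implementation details of the Ruby I tried and may change between versions, so treat the key names as an assumption.

```ruby
# Compile the sample line and scan its instructions for the call we want to match.
source = 'puts "hi" if $stdout.respond_to?(:write)'
iseq = RubyVM::InstructionSequence.compile(source)

# InstructionSequence#to_a returns an array whose last element is the
# instruction body: a mix of line numbers, labels, and [opcode, *operands] arrays.
body = iseq.to_a.last

matches = body.select do |insn|
  next false unless insn.is_a?(Array)
  calldata = insn.find { |operand| operand.is_a?(Hash) && operand.key?(:mid) }
  calldata && calldata[:mid] == :respond_to? && calldata[:orig_argc] == 1
end

pp matches
```

Only the respond_to? call site should come back. The puts call also takes one argument, but its method id differs, which is the same job vm_ci_mid(ci) == idRespond_to does in the C version.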

Using gdb

For anyone who wants to do the same work using gdb, it’s very similar. Start by creating a breakpoints.gdb file in the root of your project containing a breakpoint on line 2616 of ruby.c (in gdb syntax, break ruby.c:2616). This stops you at your initial breakpoint, similar to how we ran lldb and set the breakpoint before calling run.

When you run make gdb-ruby, you can use the same backtrace command, bt:

> make gdb-ruby
Thread 1 "ruby" hit Breakpoint 4, process_options (...) at ../ruby.c:2616
2616	            iseq = pm_iseq_new_main(&pm->node, opt->script_name, path, parent, optimize, &error_state);
(gdb) bt
#0  process_options (...) at ../ruby.c:2616
#1  in ruby_process_options (...) at ../ruby.c:3169
#2  in ruby_options (...) at ../eval.c:117
#3  in rb_main (...) at ../main.c:43
#4  in main (...) at ../main.c:68
(gdb)

From here, you can set your next breakpoint so you can watch the compilation of just the newarray instruction from our test.rb program:

(gdb) break compile.c:3476
Breakpoint 5 at 0xaaaabaa22f14: file ../compile.c, line 3476
(gdb) continue
Continuing.

Thread 1 "ruby" hit Breakpoint 5, iseq_peephole_optimize (...) at ../compile.c:3476
3476	                iobj->insn_id = BIN(opt_ary_freeze);

Similar to the lldb command expr, we can inspect the contents of locals using p or print in gdb:

(gdb) p *(iobj)
$2 = {link = {type = ISEQ_ELEMENT_INSN, next = 0xaaaace797ef0, prev = 0xaaaace797e70}, insn_id = YARVINSN_newarray,
  operand_size = 1, sc_state = 0, operands = 0xaaaace796ac8, insn_info = {line_no = 1, node_id = 3, events = 0}}

Up next

Ok, this one ran long. Thanks for sticking with me! We found the optimizer, and we pattern matched our way to a respond_to? call. Next, we need to add the new instruction definition and try to replace the send with our new instruction. See you next time! 👋🏼


https://cdn.uploads.micro.blog/98548/2024/screenshot-2024-12-26-at-12.15.03pm.png

2024-12-28 10:32:00
