MMTk starts in
rb_objspace_alloc in gc.c. Weirdly, it initialises the builder
object after the call to
calloc that allocates Ruby’s objectspace. I thought MMTk
was managing allocations - so what’s
objspace used for here?
All the functions in
mmtk.h are part of the MMTk Ruby binding layer. The
mmtk.h header is autogenerated using
cbindgen from the binding project.
it uses an
MMTk_Builder type (defined in
mmtk.h as a
typedef void *),
which it creates using
mmtk_builder_default (again, defined in
It then does a few things:
mmtk_builder_set_plan (NoGC, or MarkSweep at the moment)
mmtk_init_binding (not sure what this does yet).
This is defined in
lib/api.rs inside the Ruby bindings and creates a heap-allocated
MMTKBuilder object (using
Box), which it then converts into a raw pointer.
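To convince myself how the Box-to-raw-pointer handoff works, here is a minimal sketch. Everything in it (`Builder`, `BuilderHandle`, `builder_default`, `builder_consume`) is a hypothetical stand-in, not the real binding code. It also shows how a function could consume the pointer later without returning anything useful, which is my assumption about what a consuming call does rather than something I have confirmed.

```rust
use std::ffi::c_void;

/// Hypothetical stand-in for the real builder; only here for illustration.
pub struct Builder {
    pub plan: String,
}

/// Opaque handle, mirroring the `typedef void *` style used on the C side.
pub type BuilderHandle = *mut c_void;

/// Allocate a builder on the Rust heap and hand ownership to the caller
/// as a raw pointer. Rust will no longer free it automatically.
pub extern "C" fn builder_default() -> BuilderHandle {
    let boxed = Box::new(Builder {
        plan: String::from("NoGC"),
    });
    Box::into_raw(boxed) as BuilderHandle
}

/// Take ownership back from the raw pointer. The box (and the builder)
/// is dropped when this function returns, so the handle is dead afterwards.
///
/// # Safety
/// `handle` must come from `builder_default` and must not be used again.
pub unsafe extern "C" fn builder_consume(handle: BuilderHandle) -> usize {
    let builder: Box<Builder> = unsafe { Box::from_raw(handle as *mut Builder) };
    builder.plan.len()
}
```

The pointer itself is never rewritten. The C side simply holds an address, and whichever Rust function calls `Box::from_raw` becomes the owner again.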
MMTKBuilder is a Rust struct in mmtk core (lib/mmtk.rs). It contains a single
public member, options, which is of type Options. This looks like just a
collection of interesting options like which plan we’re using, the number of
threads we’re allowed, and some default settings. I’m going to ignore this for
now and assume there’s no magic going on in here.
MMTKBuilder has an
impl Default, which defines a
default function. All this
does is call
new.
new just creates an
MMTKBuilder struct with a default
Options instance - so not much interesting happening.
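The Default-calls-new pattern described above looks roughly like this. The field names on `Options` are invented for the sketch and `Builder` is a stand-in; the real structs have many more settings.

```rust
/// Hypothetical, trimmed-down options struct (field names are assumptions).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Options {
    pub plan: String,
    pub threads: usize,
}

/// Stand-in for the builder: a single public `options` member.
pub struct Builder {
    pub options: Options,
}

impl Builder {
    /// Build a builder that holds nothing but default options.
    pub fn new() -> Self {
        Builder {
            options: Options::default(),
        }
    }
}

impl Default for Builder {
    /// `default` just forwards to `new`, so both constructors agree.
    fn default() -> Self {
        Self::new()
    }
}
```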
This is again in the Ruby binding layer, in
lib/api.rs. This reads the plan name
we set from the command line option
(MarkSweep), and does what looks
like a constant lookup based on the name:
let plan_selector: PlanSelector = plan_name_str.parse::<PlanSelector>().unwrap();
butterfly meme: is this metaprogramming?
Then it just assigns the selected plan to the options in the
MMTk_Builder object that we built.
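On the metaprogramming question: `str::parse::<T>()` is ordinary trait dispatch. It calls the `FromStr` implementation for `T`. A hand-written sketch for a two-variant enum is below; how mmtk-core actually produces its `FromStr` impl for `PlanSelector` is something I have not checked, so treat this enum as illustrative only.

```rust
use std::str::FromStr;

/// Illustrative enum with only the two plans mentioned in these notes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PlanSelector {
    NoGC,
    MarkSweep,
}

impl FromStr for PlanSelector {
    type Err = String;

    /// Map a plan name to its variant, or report the unknown name.
    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name {
            "NoGC" => Ok(PlanSelector::NoGC),
            "MarkSweep" => Ok(PlanSelector::MarkSweep),
            other => Err(format!("unknown plan: {other}")),
        }
    }
}
```

With this in place, `"MarkSweep".parse::<PlanSelector>()` returns `Ok(PlanSelector::MarkSweep)`, and calling `unwrap()` on an unknown name panics, which matches the shape of the line in the binding.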
This is fully implemented inside Ruby - specifically in
mmtk_max_heap_size. Where is this from? It can be passed in with
--mmtk-max-heap at runtime, otherwise it’s 0.
Ok, so basically - if we’ve defined a fixed heap size, either using the command
line argument or the environment variable
THIRD_PARTY_HEAP_LIMIT then we set
is_dynamic_heap to false, and we calculate the min and the max sizes.
The max size is calculated using
rb_mmtk_parse_heap_limit - this is basically just
trying to convert the string that was passed in to a size_t.
If nothing was passed then
is_dynamic_heap is set to
true. At this point min and
max are set to the default values (10Mb and 80% of the computer’s physical memory).
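The fixed-versus-dynamic decision above can be sketched like this. The real logic lives in C inside Ruby, so this Rust version is only a model. `HeapConfig`, `parse_heap_limit` and `heap_config` are hypothetical names, the parser accepts plain byte counts only, and setting min equal to max for a fixed heap is my assumption (the notes don't say how min is calculated).

```rust
/// Hypothetical summary of the heap settings discussed above.
#[derive(Debug, PartialEq)]
pub struct HeapConfig {
    pub is_dynamic_heap: bool,
    pub min: usize,
    pub max: usize,
}

/// Default minimum heap: 10 MiB (assuming "10Mb" means mebibytes).
const DEFAULT_MIN_HEAP: usize = 10 * 1024 * 1024;

/// Convert the user-supplied string to a byte count, if it is a plain number.
pub fn parse_heap_limit(text: &str) -> Option<usize> {
    text.trim().parse::<usize>().ok()
}

/// Pick a fixed heap when a usable limit was given, otherwise a dynamic one.
pub fn heap_config(limit: Option<&str>, physical_memory: usize) -> HeapConfig {
    match limit.and_then(parse_heap_limit) {
        // A positive limit means a fixed heap (min == max is an assumption).
        Some(max) if max > 0 => HeapConfig {
            is_dynamic_heap: false,
            min: max,
            max,
        },
        // Missing, zero or unparseable: fall back to the defaults.
        _ => HeapConfig {
            is_dynamic_heap: true,
            min: DEFAULT_MIN_HEAP,
            max: physical_memory / 5 * 4, // 80% of physical memory
        },
    }
}
```

Treating an unparseable string as "not set" is a choice made for the sketch. The real parser may well abort with an error instead.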
These are functions in the ruby binding layer that set a gc_trigger on the options inside the builder
These use a struct called
All this stuff is essentially just setting options on the Ruby heap though - where is the actual heap allocation happening?
This function consumes the
MMTk_Builder object - but it returns void. So what’s
happening - does it replace the target of the pointer? it’s defined in the Ruby
This function takes the pointer to the builder, as well as some binding options and a static C struct of function pointers called upcalls (a Rust type whose layout matches a C struct defined in gc.c).
Ruby upcalls contains function pointers to init gc worker threads, stop the world, block and resume mutators. It looks like this is where most of the actual work is triggered from. All these functions are defined inside gc.c.
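A struct of upcalls is just a table of C-callable function pointers with a C-compatible layout. The sketch below uses made-up field names and a made-up `collect` driver to show the VM-side functions being invoked from the Rust side. The real struct has more entries and different signatures.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical two-entry upcall table. `repr(C)` keeps the field layout
/// identical to a C struct that declares the same function pointers.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Upcalls {
    pub stop_the_world: extern "C" fn(),
    pub resume_mutators: extern "C" fn(),
}

// Counters standing in for real VM work, so the calls are observable.
pub static STOPS: AtomicUsize = AtomicUsize::new(0);
pub static RESUMES: AtomicUsize = AtomicUsize::new(0);

extern "C" fn demo_stop_the_world() {
    STOPS.fetch_add(1, Ordering::SeqCst);
}

extern "C" fn demo_resume_mutators() {
    RESUMES.fetch_add(1, Ordering::SeqCst);
}

/// In the real system this table is filled in on the C side.
pub static DEMO_UPCALLS: Upcalls = Upcalls {
    stop_the_world: demo_stop_the_world,
    resume_mutators: demo_resume_mutators,
};

/// A pretend collection cycle: pause the VM, do GC work, resume the VM.
pub fn collect(upcalls: &Upcalls) {
    (upcalls.stop_the_world)();
    // Tracing and sweeping would happen here.
    (upcalls.resume_mutators)();
}
```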
It calls mmtk_init, which returns a boxed
MMTK<Ruby> instance (what is this?), and creates
mmtk_static, which is a mutable pointer to the boxed MMTk Ruby instance (via
Box::leak - as per the comments in
memory_manager.rs we use
leak here to give the boxed MMTk pointer a static lifetime rather than a
lifetime that is bound to the box pointer).
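`Box::leak` is a standard library function that turns a `Box<T>` into a `&'static mut T` by promising never to free it. A tiny sketch, with `Instance` as a hypothetical stand-in for the MMTk instance:

```rust
/// Hypothetical stand-in for the boxed instance.
pub struct Instance {
    pub id: u32,
}

/// Allocate an instance and leak it. The allocation now lives until the
/// process exits, so the returned reference can be `'static`.
pub fn leak_instance(id: u32) -> &'static mut Instance {
    Box::leak(Box::new(Instance { id }))
}
```

This makes sense for a garbage collector instance: there is exactly one, it is needed for the whole life of the process, and nothing ever has to drop it.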
It then builds a
RubyBinding with the static instance, the options and a borrow
of the upcalls, and does something with a
OnceCell and that binding.
(). So something must be happening with this line of code to give
Ruby access back to the RubyBinding
crate::BINDING.set(binding).unwrap_or_else(|_| panic!("Binding is already initialized"));
Ok, so BINDING is a pub static OnceCell inside the mmtk ruby bindings. This makes it a singleton object: publicly accessible, and it can be assigned to at most once. But how is this exposed? Is it in mmtk.h in the ruby project?
Actually, does this even need to be exposed to the Ruby side? This is probably just an MMTk object that’s used within the binding and MMTk core, right?
Ok, so BINDING is a global, and we set an instance of the Ruby Binding into it. set returns an error if the cell has already been filled, so the unwrap_or_else turns a second initialization into a panic.
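The set-once global can be modelled with the standard library's `OnceLock`, which I am swapping in here for the `OnceCell` the binding uses so the sketch needs no external crate. `Binding`, `init_binding` and `binding` are hypothetical names.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the binding object.
#[derive(Debug)]
pub struct Binding {
    pub plan_name: String,
}

/// Global slot that starts empty and can be filled exactly once.
pub static BINDING: OnceLock<Binding> = OnceLock::new();

/// Fill the slot. A second call gets its value handed back as `Err`.
pub fn init_binding(plan_name: &str) -> Result<(), Binding> {
    BINDING.set(Binding {
        plan_name: plan_name.to_string(),
    })
}

/// Read the global from anywhere in the crate after initialization.
pub fn binding() -> &'static Binding {
    BINDING.get().expect("binding is not initialized")
}
```

This is why nothing needs to be returned to the caller: later Rust code reaches the binding through the global, and the C side never has to hold it.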
Ok, so now we know how the builder is initialized and builds an MMTk instance, we should look into how the MMTk system is actually being initialized.
First, this function takes a parameter, which is an immutable borrow of the
builder object, and it returns a boxed pointer of MMTk.
RubyBinding is an object that wraps the MMTk instance, the binding options, the upcalls and the plan name - the VMBinding trait.
VMBinding’s job seems to be to define some generic types for the binding, in this case it only really seems to care about VMEdge (type aliased to RubyEdge), and VMMemorySlice (type aliased to RubyMemorySlice).
RubyEdge, according to the comments, doesn’t really matter right now as we don’t do edge enqueuing (whatever that means), and RubyMemorySlice is actually just another type alias for UnimplementedMemorySlice.
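A trait with associated types is how generic core code can talk about "the edge type for this VM" without knowing which VM it is. Here is a cut-down sketch with the trait reduced to the two associated types mentioned above. The bounds, the empty structs and `edge_type_name` are all assumptions for illustration, not the real definitions.

```rust
use std::fmt::Debug;

/// Cut-down version of a binding trait: only two associated types.
pub trait VMBinding {
    type VMEdge: Default + Debug;
    type VMMemorySlice: Default + Debug;
}

/// Placeholder types standing in for the real ones.
#[derive(Default, Debug)]
pub struct RubyEdge;

#[derive(Default, Debug)]
pub struct UnimplementedMemorySlice;

/// A plain alias, as described in the notes.
pub type RubyMemorySlice = UnimplementedMemorySlice;

/// The type that represents "Ruby" to the generic code.
pub struct Ruby;

impl VMBinding for Ruby {
    type VMEdge = RubyEdge;
    type VMMemorySlice = RubyMemorySlice;
}

/// Generic code only sees `VM::VMEdge`; the concrete type is chosen
/// by whichever binding is plugged in at compile time.
pub fn edge_type_name<VM: VMBinding>() -> &'static str {
    std::any::type_name::<VM::VMEdge>()
}
```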
aside from some logger shenanigans, this just calls back into builder.build and then boxes the result.
straight up does MMTK::new passing in a reference counted clone of the options struct (using Arc).
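The "reference counted clone of the options" step can be sketched as follows. `Options`, `Builder` and `Instance` are hypothetical stand-ins, and wrapping a clone in `Arc::new` is my reading of the description rather than a copy of the real code.

```rust
use std::sync::Arc;

/// Hypothetical options struct with a single setting.
#[derive(Clone, Debug)]
pub struct Options {
    pub threads: usize,
}

pub struct Builder {
    pub options: Options,
}

/// Stand-in for the instance, which keeps a shared handle to the options.
pub struct Instance {
    pub options: Arc<Options>,
}

impl Instance {
    pub fn new(options: Arc<Options>) -> Self {
        Instance { options }
    }
}

impl Builder {
    /// Clone the options once, then share them behind an `Arc` so other
    /// components can hold cheap reference-counted handles to the same data.
    pub fn build(&self) -> Instance {
        Instance::new(Arc::new(self.options.clone()))
    }
}
```

Cloning the `Arc` afterwards only bumps a counter, so a scheduler or plan can keep its own handle without copying the options again.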
Ok, now we’re getting somewhere.
Firstly, what is SFT_MAP? Apparently it manages the SFT table, but what is the SFT table? Ok - apparently the SFT is defined inside mmtk-core. It’s something to do with space-specific dynamic dispatch... I’m just going to ignore this for now and hope it’s not important yet. Good start.
Then we build a
GCWorkScheduler, passing in the number of configured threads
as a parameter (This is 1 for Ruby).
Then we build a plan. VSCode is type annotating this let binding as
Plan<VM = VM>> and I don’t know what this means. but it uses a factory method
on the plan module:
VM_MAP.boot(). What is
VM_MAP? (A global
VMMap that manages the
mapping of spaces to virtual memory ranges.) I’ve seen spaces referenced before.
What are spaces? They’re mentioned in
Cargo.toml which is surprising to me.
The main source file seems to be
lib/policy/space.rs. I’m pretty sure this is
going to be where the heap is allocated.
These look like the main memory spaces. They’re set up and defined by the Plans I think.
newobj_init0. This code could do with being refactored a bit, but
it looks like we calculate which size pool an object belongs in and then we call
We pass in
GET_THREAD()->mutator - this is a thread on the actual Ruby
rb_thread_struct. It’s been added as part of mmtk. This looks to be just an
MMTk wrapper around the current Ruby thread.
We also pass in the slot size we want to allocate (MMTk is still dealing with slot sizes I think, and not specifically object sizes).
** NOTE ** Wait. What does MMTk do with the actual object data? is it still using malloc and heap allocating strings and shit? I think we need to replace ruby_xmalloc etc with mmtk variants.
The allocation size it uses is the size of the slot, plus any prefix and suffix.
It stores the size pool size in the prefix and returns the offset address for
the actual pointer to the
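The store-the-size-in-a-prefix trick can be modelled with a toy bump allocator over a byte buffer. This is not how MMTk allocates. `Arena`, `alloc` and `slot_size_of` are hypothetical, the suffix is left out, and offsets stand in for real addresses.

```rust
/// Bytes reserved in front of every payload to record its slot size.
const PREFIX_SIZE: usize = std::mem::size_of::<usize>();

/// Toy bump allocator over a fixed byte buffer.
pub struct Arena {
    bytes: Vec<u8>,
    cursor: usize,
}

impl Arena {
    pub fn new(capacity: usize) -> Self {
        Arena {
            bytes: vec![0; capacity],
            cursor: 0,
        }
    }

    /// Reserve prefix + slot, write the slot size into the prefix, and
    /// return the offset of the payload (just past the prefix).
    pub fn alloc(&mut self, slot_size: usize) -> Option<usize> {
        let total = PREFIX_SIZE.checked_add(slot_size)?;
        let start = self.cursor;
        let end = start.checked_add(total)?;
        if end > self.bytes.len() {
            return None; // out of space
        }
        self.bytes[start..start + PREFIX_SIZE].copy_from_slice(&slot_size.to_ne_bytes());
        self.cursor = end;
        Some(start + PREFIX_SIZE)
    }

    /// Recover the slot size by reading the prefix in front of a payload.
    pub fn slot_size_of(&self, payload: usize) -> usize {
        let mut raw = [0u8; PREFIX_SIZE];
        raw.copy_from_slice(&self.bytes[payload - PREFIX_SIZE..payload]);
        usize::from_ne_bytes(raw)
    }
}
```

The caller only ever sees the payload offset, but anything that later needs the size pool (sweeping, for example) can step back by the prefix size and read it.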
Then we call
mmtk_post_alloc, though I’m not sure what this does. First let’s