A general C program has 4 areas in it’s memory space.
The text area is where the code lives, the actual compiled machine code of the program that is running.
the area reserved for static’s and globals is similarly uninteresting for our purposes.
I wrote a bit about the stack here: [rhg-the-stack]
The heap is for actual data that we want to store and retain access to for the
lifetime of the program. we allocate memory here using malloc
and we’re
required to free
it when we’re done.
When you call malloc
you pass as arguments the number of bytes of memory you
wish to allocated and then malloc
returns a pointer to the start of that
memory address.
when you call free
you pass the pointer to a region of memory previously
allocated with malloc
, no value is returned.
Having to do this manually would be super hard in an OO language, where we have webs of objects that are pointing to each other.
so GC is misnamed a bit really - it’s primary job is to manage our use of
memory, so that we don’t have to manually malloc
and free
.
Thinking about things in terms of objects and links between them we can say that some objects are definitely necessary. some examples of these are:
When you categorise objects like this you end up with a series of tree structures, where the root of the tree is a necessary object.
Any object that is not part of the tree (so any object that isn’t depended on in some way by one of our defined “root” nodes) is not necessary for the functioning of the program, and it’s memory can be released for the program to reuse.
This is how the main GC in Ruby works (this was the only GC when this guide was written, these days there is also the transient heap which uses generation GC - I think).
the algorithm looks like
Advantages of Mark & Sweep are:
gc.c
)Disadvantages are:
There are other GC algorithms that the Ruby Hacking guide talks about, but I’m not going to write about them as Ruby doesn’t use them.
An interesting alternative to look at is called Reference counting. This is the main mechanism that Python uses. Although it also implements a backup mark and sweep algorithm in order to free cyclic references.
It’s really important to only consider GC in the context of object management. You can’t keep track of and clean up objects unless the system knows about them.
So this will cause a memory leak:
void not_ok()
{
malloc(1024);
}
whereas this is fine (because rb_ary_new
uses rb_newobj_of
internally, which
is a function in the garbage collector that allocates a new RVALUE
in Ruby’s
object space).
void this_is_file()
{
rb_ary_new();
}
see [rvalue] for more notes.
Every Ruby object is represented by a struct, like RClass
, RFloat
, RString
etc.
To keep things simpler a union of all the structs of builtin objects is declared and that union is the one that’s always used when referring to memory.
Here is a clipped version of that union.
typedef struct RVALUE {
union {
struct {
VALUE flags; /* always 0 for freed obj */
struct RVALUE *prev;
struct RVALUE *next;
} free;
struct RMoved moved;
struct RGarbage garbage;
struct RBasic basic;
struct RObject object;
struct RClass klass;
struct RFloat flonum;
struct RString string;
struct RArray array;
struct RRegexp regexp;
struct RHash hash;
struct RData data;
struct RTypedData typeddata;
struct RStruct rstruct;
struct RBignum bignum;
struct RFile file;
struct RMatch match;
struct RRational rational;
struct RComplex complex;
} as;
} RVALUE;
Ruby Object structure discussed in more detail [rhg-ruby-objects]