Closures Part III - The Compiler

The Squeak bytecode compiler is a bijou jewel. It has flaws in some facets but on the whole it is small and beautiful. OK, its a Curate’s Egg. Good in parts. Luckily for me Vassili Bykov has already written the perfect introduction to the compiler that will help you understand the rest of this post if you’re unfamiliar with the compiler. So if you want to see how technical texts should properly be written please peruse Vassili’s “The Hitch Hiker’s Guide to the Smalltalk Compiler

A Pluggable Compiler Back-End

I have two requirements for the compiler. One is that it compile closures using the bytecode design already presented. The other is that the compiler be bytecode-set agnostic. I want to be able to easily migrate to a new bytecode set with a different encoding. As we saw in the post on the bytecodes the bytecode set contains long forms and short forms. One simple way of migrating the bytecode set is to recompile the system using only the long forms, this means one can reassign the unused short forms as one sees fit. Its actually a five step dance:

1a recompile to the long form of the existing set
1b move to a VM that has the long form of the existing set and the long form of the new set
2a recompile to the long form of the new set
2b move to a VM that has the new bytecode set
3a recompile to the new bytecode set, short forms and all

Doing this easily means being able to plug different bytecode set encodings into the compiler easily. But if you look at the existing compiler the parse nodes have the instruction set embedded in them, which makes this very hard. For example here is the pair of methods that generate a store for either an instance or a temporary variable:

VariableNode methods for code generation
emitStorePop: stack on: strm
        (code
between: 0 and: 7)
                 
ifTrue:
                          [
strm nextPut: ShortStoP + code "short stopop inst"]
                 
ifFalse:
                          [
(code between: 16 and: 23)
                                   
ifTrue: [strm nextPut: ShortStoP + 8 + code - 16 "short stopop temp"]
                                   
ifFalse: [(code >= 256 and: [code \\ 256 > 63 and: [code // 256 = 4]])
                                                     
ifTrue: [self emitLong: Store on: strm. strm nextPut: Pop]
                                                     
ifFalse: [self emitLong: StorePop on: strm]]].
        
stack pop: 1

This is difficult to understand, and is very hard to change. So the first thing to do is relace the back-end with something more readable and flexible. In the existing compiler class Encoder is responsble for encoding literals into literal indices and variable names into parse nodes. Let’s extend it to encode the bytecode set as well. Since I need to support multiple bytecode sets we’ll add a subclass for each set. We can use an abstract class to hold common code. Here’s the hierarchy:

        Encoder - the existing encoder, unchanged, still responsible for encoding literals and variables
                 BytecodeEncoder - the abstract superclass of the bytecode set encoders
                          EncoderForV3 - the encoder for the existing bytecode set, V3 because it is the bytecode set of Squeak3.x
                                   EncoderForV3PlusClosures - the encoder that adds the closure opcodes
                          EncoderForLongFormV3 - the encoder that outputs only the long-forms of the V3 bytecodes
                                   EncoderForLongFormV3PlusClosures - the encoder that adds closure opcodes to the long-form

BytecodeEncoder defines a stream instance variable which holds either a scratch buffer or the target method to generate code into. Very sharp of you to notice that I could have, arguably should have, made EncoderForV3 a subclass of EncoderForLongFormV3 because EncoderForV3 adds short-form encodings to the long forms and could inherit the code for outputting the long forms. But if I do that I can’t easily eliminate the long-form encoder once it is no longer needed. So in this case I thought it better to keep the classes entirely separate.

BytecodeEncoder and the closure encoders can be queried by the rest of the compiler whether to use closure compilation or not:

BytecodeEncoder methods for testing
supportsClosureOpcodes
        
"Answer if the receiver supports the
                 genPushNewArray:/genPushConsArray:
                 genPushRemoteTemp:inVectorAt:
                 genStoreRemoteTemp:inVectorAt:
                 genStorePopRemoteTemp:inVectorAt:
                 genPushClosureCopyCopiedValues:numArgs:jumpSize:
         opcodes"

        
^false

EncoderForV3PlusClosures & EncoderForLongFormV3PlusClosures methods for testing
supportsClosureOpcodes
        
^true

The bulk of interface between these bytecode encoders and the parse nodes is an api for sizing and emitting abstract opcodes. Each abstract opcode has a method defined in BytecodeEncoder for computing how many bytes the opcode will generate and a method defined in one of the concrete subclasses for emitting the bytecode encoding of the opcode that the concrete class determines. The compiler needs to be able to size opcodes before they’re generated so as to be able to generate the right offsets for jump and block creation opcodes.

In the existing compiler each emit* method has a corresponding size* method that computes how many bytes the emit method will produce. For example, here’s the sizer for the emitter above

VariableNode methods for code generation
sizeForStorePop: encoder
        
self reserve: encoder.
        (code
< 24 and: [code noMask: 8]) ifTrue: [^ 1].
        code
< 256 ifTrue: [^ 2].
        code
\\ 256 <= 63 ifTrue: [^ 2]. "extended StorePop"
        code
// 256 = 1 ifTrue: [^ 3]. "dbl extended StorePopInst"
        code
// 256 = 4 ifTrue: [^ 4]. "dbl extended StoreLitVar , Pop"
        
self halt. "Shouldn’t get here"

This is error prone and ugly. The two methods are intimately connected. Change one and forget to change the other and the compiler breaks. There is nothing that guarantees that the sizes agree with the opcodes. Instead of having special versions of the sizing methods the new compiler simply outputs the bytecode to a scratch buffer and measures how much code was output:

BytecodeEncoder methods for opcode sizing
sizeOpcodeSelector: genSelector withArguments: args
        stream
ifNil: [stream := WriteStream on: (ByteArray new: 8)].
        stream
position: 0.
        
self perform: genSelector withArguments: args.
        ^stream
position

So a sizing method simply looks like

BytecodeEncoder methods for opcode sizing
sizePushTemp: tempIndex
        
^self sizeOpcodeSelector: #genPushTemp: withArguments: {tempIndex}

The concrete subclasses then define genPushTemp: according to the encoding they choose, i.e.:

EncoderForV3 methods for bytecode generation
genPushTemp: tempIndex
        
"See BlueBook page 596"
        
tempIndex < 0 ifTrue:
                 [
^self outOfRangeError: 'index' index: tempIndex range: 0 to: 63].
        
tempIndex < 16 ifTrue:
                 [
"16-31         0001iiii         Push Temporary Location #iiii"
                  stream
nextPut: 16 + tempIndex.
                 
^self].
        
tempIndex < 64 ifTrue:
                 [
"128         10000000 jjkkkkkk         Push (Receiver Variable, Temporary Location, Literal Constant, Literal Variable) [jj] #kkkkkk"
                  stream
                          
nextPut: 128;
                          
nextPut: 64 + tempIndex.
                 
^self].
        
^self outOfRangeError: ‘index’ index: tempIndex range: 0 to: 63

EncoderForLongFormV3 methods for bytecode generation
genPushTemp: tempIndex
        
"See BlueBook page 596"
        
tempIndex < 0 ifTrue:
                 [
^self outOfRangeError: 'index' index: tempIndex range: 0 to: 63].
        
tempIndex < 64 ifTrue:
                 [
"128         10000000 jjkkkkkk         Push (Receiver Variable, Temporary Location, Literal Constant, Literal Variable) [jj] #kkkkkk"
                  stream
                          
nextPut: 128;
                          
nextPut: 64 + tempIndex.
                 
^self].
        
^self outOfRangeError: ‘index’ index: tempIndex range: 0 to: 63

IMO, there is no need for a separate class method to introduce names for the opcode encodings, We can just use the constants directly. The code is clearer.

We can now list the full opcode set:

EncoderForV3 & EncoderForLongFormV3 methods for opcode generation
genBranchPopFalse: distance
genBranchPopTrue: distance
genDup
genJumpLong:
distance
genJump: distance
genPop
genPushInstVarLong:
instVarIndex
genPushInstVar: instVarIndex
genPushLiteralVar: literalIndex
genPushLiteral: literalIndex
genPushReceiver
genPushSpecialLiteral:
aLiteral
genPushTemp: tempIndex
genPushThisContext
genReturnReceiver
genReturnSpecialLiteral:
aLiteral
genReturnTop
genReturnTopToCaller
genSendSuper:
selectorLiteralIndex numArgs: nArgs
genSend: selectorLiteralIndex numArgs: nArgs
genStoreInstVarLong: instVarIndex
genStoreInstVar: instVarIndex
genStoreLiteralVar: literalIndex
genStorePopInstVarLong: instVarIndex
genStorePopInstVar: instVarIndex
genStorePopLiteralVar: literalIndex
genStorePopTemp: tempIndex
genStoreTemp: tempIndex

EncoderForV3PlusClosures & EncoderForLongFormV3PlusClosures methods for opcode generation
genPushClosureCopyNumCopiedValues: numCopied numArgs: numArgs jumpSize: jumpSize
genPushConsArray: size
genPushNewArray: size
genPushRemoteTemp: tempIndex inVectorAt: tempVectorIndex
genStorePopRemoteTemp: tempIndex inVectorAt: tempVectorIndex
genStoreRemoteTemp: tempIndex inVectorAt: tempVectorIndex

The two forms of jump, variable sized and long, are used by the compiler to generate old-style blocks. By always using a long (two byte) jump to follow the block-creation #blockCopy: primitive the compiler allows blockCopy to work out the starting address for the block’s code being two bytes after the following jump. e.g.:

        19 <70> self
        
20 <89> pushThisContext:
        
21 <76> pushConstant: 1
        
22 <C8> send: blockCopy:
        
23 <A4 08> jumpTo: 33
        
25 <6B>         popIntoTemp: 3
        
26 <11>         pushTemp: 1
        
27 <12>         pushTemp: 2
        
28 <13>         pushTemp: 3
        
29 <F0>         send: value:value:
        
30 <81 42>         storeIntoTemp: 2
        
32 <7D>         blockReturn
        
33 <CB> send: do:
        
34 <87> pop

The long forms of the instance variable access opcodes are an evil hack of mine for Context instance variable access that I’ll explain in a subsequent post on the Stack VM. For the moment forget they exist.

All these methods are very similar. Most are a few lines long. Here are one of the shortest and the longest:

EncoderForV3 & EncoderForLongFormV3 methods for opcode generation
genPop
        
"See BlueBook page 596"
        
"135         10000111         Pop Stack Top"
        stream
nextPut: 135

genSend: selectorLiteralIndex numArgs: nArgs
        
"See BlueBook page 596 (with exceptions for 132 & 134)"
        
nArgs < 0 ifTrue:
                 [
^self outOfRangeError: 'numArgs' index: nArgs range: 0 to: 31 "!!"].
        
selectorLiteralIndex < 0 ifTrue:
                 [
"Special selector sends.
                          176-191         1011iiii         Send Arithmetic Message #iiii
                          192-207         1100iiii         Send Special Message #iiii"

                 
self flag: #yuck.
                 
(selectorLiteralIndex negated between: 176 and: 207) ifFalse:
                          
[^self outOfRangeError: 'special selector code' index: selectorLiteralIndex negated range: 176 to: 207].
                  stream
nextPut: selectorLiteralIndex negated.
                 
^self].
        (
selectorLiteralIndex < 16 and: [nArgs < 3]) ifTrue:
                 [
"         208-223         1101iiii         Send Literal Selector #iiii With No Arguments
                          224-239         1110iiii         Send Literal Selector #iiii With 1 Argument
                          240-255         1111iiii         Send Literal Selector #iiii With 2 Arguments"

                  stream
nextPut: 208 + (nArgs * 16) + selectorLiteralIndex.
                 
^self].
        (
selectorLiteralIndex < 32 and: [nArgs < 8]) ifTrue:
                 [
"         131         10000011 jjjkkkkk         Send Literal Selector #kkkkk With jjj Arguments"
                  stream
                          
nextPut: 131;
                          
nextPut: ((nArgs bitShift: 5) + selectorLiteralIndex).
                 
^self].
        (
selectorLiteralIndex < 64 and: [nArgs < 4]) ifTrue:
                 [
"In Squeak V3
                          134         10000110 jjjjjjjj kkkkkkkk         Send Literal Selector #kkkkkkkk To Superclass With jjjjjjjj Arguments
                  is replaced by
                          134         10000110 jjkkkkkk         Send Literal Selector #kkkkkk With jj Arguments"

                  stream
                          
nextPut: 134;
                          
nextPut: ((nArgs bitShift: 6) + selectorLiteralIndex).
                 
^self].
        (
selectorLiteralIndex < 256 and: [nArgs < 32]) ifTrue:
                 [
"In Squeak V3
                          132         10000100 jjjjjjjj kkkkkkkk         Send Literal Selector #kkkkkkkk With jjjjjjjj Arguments
                  is replaced by
                          132         10000100 ooojjjjj kkkkkkkk
                                   ooo = 0 => Send Literal Selector #kkkkkkkk With jjjjj Arguments
                                   ooo = 1 => Send Literal Selector #kkkkkkkk To Superclass With jjjjj Arguments"

                 stream
                          
nextPut: 132;
                          
nextPut: nArgs;
                          
nextPut: selectorLiteralIndex.
                 
^self].
        
nArgs >= 32 ifTrue:
                 [
^self outOfRangeError: 'numArgs' index: nArgs range: 0 to: 31].
        
selectorLiteralIndex >= 256 ifTrue:
                 [
^self outOfRangeError: 'selector literal index' index: selectorLiteralIndex range: 0 to: 255]

Moving back to the Parse nodes they use the new API through a parallel set of sizers and emitters. By writing a parallel set I am able to a) bootstrap the compiler in place and b) check that the new back-end produces identical code to the old back-end. I generated the parallel set using a poor man’s refactoring browser. I used the Shout syntax highlighter to do typed tokenisation of the old methods and produce identical copies that changed all uses of emit* and size*, replacing them with emitCode*, and sizeCode* so that for example emitForValue:on: maps to emitCodeForValue:encoder: and sizeForValue: maps to sizeCodeForValue:. Once I’d generated the duals I edited them by hand to mate them to the BytecodeEncoder API. For example here are old and new versions of send generation:

SelectorNode methods for code generation
emit: stack args: nArgs on: aStream super: supered
        
| index |
        
stack pop: nArgs.
        (
supered not and: [code - Send < SendLimit and: [nArgs < 3]]) ifTrue:
                 [
"short send"
                 code
< Send
                          
ifTrue: [^ aStream nextPut: code "special"]
                          
ifFalse: [^ aStream nextPut: nArgs * 16 + code]].
        
index := code < 256 ifTrue: [code - Send] ifFalse: [code \\ 256].
        (
index <= 31 and: [nArgs <= 7]) ifTrue:
                 [
"extended (2-byte) send [131 and 133]"
                 
aStream nextPut: SendLong + (supered ifTrue: [2] ifFalse: [0]).
                 
^ aStream nextPut: nArgs * 32 + index].
        (
supered not and: [index <= 63 and: [nArgs <= 3]]) ifTrue:
                 [
"new extended (2-byte) send [134]"
                 
aStream nextPut: SendLong2.
                 
^ aStream nextPut: nArgs * 64 + index].
        
"long (3-byte) send"
        
aStream nextPut: DblExtDoAll.
        
aStream nextPut: nArgs + (supered ifTrue: [32] ifFalse: [0]).
        
aStream nextPut: index

size: encoder args: nArgs super: supered
        
| index |
        
self reserve: encoder.
        (
supered not and: [code - Send < SendLimit and: [nArgs < 3]])
                 
ifTrue: [^1]. "short send"
        (
supered and: [code < Send]) ifTrue:
                 [
"super special:"
                 code
:= self code: (encoder sharableLitIndex: key) type: 5].
        
index := code < 256 ifTrue: [code - Send] ifFalse: [code \\ 256].
        (
index <= 31 and: [nArgs <= 7])
                 
ifTrue: [^ 2]. "medium send"
        (
supered not and: [index <= 63 and: [nArgs <= 3]])
                 
ifTrue: [^ 2]. "new medium send"
        
^ 3 "long send"

SelectorNode methods for code generation (new scheme)
emitCode: stack args: nArgs encoder: encoder super: supered
        
stack pop: nArgs.
        
^supered
                 
ifTrue:
                          [
encoder genSendSuper: index numArgs: nArgs]
                 
ifFalse:
                          [
encoder
                                   
genSend: (code < Send ifTrue: [code negated] ifFalse: [index])
                                   
numArgs: nArgs]

sizeCode: encoder args: nArgs super: supered
        
self reserve: encoder.
        
^supered
                 
ifTrue:
                          [code
< Send "i.e. its a special selector" ifTrue:
                                   
[code := self code: (index := encoder sharableLitIndex: key) type: 5].
                          
encoder sizeSendSuper: index numArgs: nArgs]
                 
ifFalse:
                          [
self flag: #yuck. "special selector sends cause this problem"
                          
encoder
                                   
sizeSend: (code < Send ifTrue: [code negated] ifFalse: [index])
                                   
numArgs: nArgs]

While the new code still has a hack to deal with the special selector sends it is a lot clearer for having moved the details of bytecode encoding into the encoder. The new code for sends is more verbose but, I think, much more comprehensible.

Now that we can change the back-end we need to plug it into the front end. The existing parser hard codes the use of Encoder in Parser>>parse:class:noPattern:notifying:ifFail:

        …

         [methNode _ self method: noPattern context: ctxt encoder: (Encoder new init: class context: ctxt notifying: self)]
on: ParserRemovedUnusedTemps
do:
[ :ex | repeatNeeded _ (requestor isKindOf: TextMorphEditor) not.
myStream _ ReadStream on: requestor text string.
ex resume].
repeatNeeded] whileTrue.

        …

The existing compiler holds onto an encoder instance, which makes sense. The encoder needs to be accessed throughout the front-end to map from names to nodes. With a bit of chicanery we can provide an interface that allows us to supply the class of encoder to use rather than supplying an already instantated encoder. This would be a mistake since encoder instantiation is done in the context of a target class to compile within as the encoder initializes its scope tables from the target class, and this would complicate things for the client.

Let’s change Parser>>parse:class:noPattern:notifying:ifFail: to get the encoder from an accessor:

Parser methods for public access
parse: sourceStream class: class category: aCategory noPattern: noPattern context: ctxt notifying: req ifFail: aBlock
        
"Answer a MethodNode for the argument, sourceStream, that is the root of
         a parse tree. Parsing is done with respect to the argument, class, to find
         instance, class, and pool variables; and with respect to the argument,
         ctxt, to find temporary variables. Errors in parsing are reported to the
         argument, req, if not nil; otherwise aBlock is evaluated. The argument
         noPattern is a Boolean that is true if the the sourceStream does not
         contain a method header (i.e., for DoIts)."

        | methNode repeatNeeded myStream s p |
        category
:= aCategory.
        
myStream := sourceStream.
        [
repeatNeeded := false.
        
p := myStream position.
        
s := myStream upToEnd.
        
myStream position: p.
        
self init: myStream notifying: req failBlock: [^ aBlock value].
         doitFlag
:= noPattern.
         failBlock
:= aBlock.
        
[methNode := self
                                            
method: noPattern
                                            
context: ctxt
                                            
encoder: (self encoder init: class context: ctxt notifying: self)]
                 
on: ReparseAfterSourceEditing
                 
do:         [ :ex |
                          
repeatNeeded := true.
                          
myStream := ReadStream on: requestor text string].
        
repeatNeeded] whileTrue:
                 [encoder
:= self encoder class new].
        
methNode sourceText: s.
        
^methNode

encoder
        encoder
isNil ifTrue:
                 [encoder
:= Encoder new].
        
^encoder

encoderClass: anEncoderClass
        encoder
notNil ifTrue:
                 [
self error: 'encoder already set'].
        encoder
:= anEncoderClass new

and if we needed it we could implement

encoderClass
        
^self encoder class

So now we can experiment, saying things like

        | c |
        c := Compiler new.
        c parser encoderClass: EncoderForLongFormV3.
        c evaluate: ‘3 + 4′ in: nil to: nil

or

        | m |
        m := ((Parser new
                                   encoderClass: EncoderForV3PlusClosures;
                                   parse: ‘foo: n | nfib |
                                                              nfib := [:i| i <= 1 ifTrue: [1] ifFalse: [(nfib value: i - 1) + (nfib value: i - 2) + 1]].
                                                              ^(1 to: n) collect: nfib

                                   class: Object)
                          generate: #(0 0 0 0)).
        ContextPart runSimulated: [#receiver withArgs: #(10) executeMethod: m]

Compiling To Closures

Now we have everything we need to compile to closures. The existing compiler has already made the change to ANSI-style block syntax, supporting block-local temporary declarations, even if it still implements these as method-scope temporaries. As a result we only need to modify the middle of the compiler, the ParseNode hierarchy. Once the parser has generated the parse node tree we need to traverse the tree analysing temporary variable usage. This is best done where methods are actually generated, in MethodNode>>generate:. However, since we have to do this modification in place without breaking the existing compiler let’s be a bit circumspect and introduce a subclass of MethodNode that we’ll use to insulate the existing compiler from our meddling:

MethodNode subclass: #BytecodeAgnosticMethodNode
        
instanceVariableNames: ‘locationCounter localsPool’
        
classVariableNames:
        
poolDictionaries:
        
category: ‘Compiler-ParseNodes’

Encoder methods for accessing
methodNodeClass
        
^MethodNode

BytecodeEncoder methods for accessing
methodNodeClass
        
^BytecodeAgnosticMethodNode

Parser methods for expression types
newMethodNode
        
^self encoder methodNodeClass new

Now BytecodeAgnosticMethodNode (a bit of a mouthful) can have its own generate: method:

BytecodeAgnosticMethodNode methods for code generation (new scheme)
generate: trailer
        
"The receiver is the root of a parse tree. Answer a CompiledMethod.
         The argument, trailer, is the reference to the source code that is
         stored with every CompiledMethod."

        | blkSize nLits literals stack method |
        
self generate: trailer ifQuick:
                          [:
m |
                          
m         literalAt: 2 put: encoder associationForClass;
                                   
properties: properties.
                          
^m].
        encoder
supportsClosureOpcodes ifTrue:
                 [temporaries
:= block analyseArguments: arguments temporaries: temporaries rootNode: self.
                  encoder
rootNode: self. "this is for BlockNode>>sizeCodeForClosureValue:"].
        
blkSize := block sizeCodeForEvaluatedValue: encoder.
        
method := CompiledMethod
                                   
newBytes: blkSize
                                   
trailerBytes: trailer
                                   
nArgs: arguments size
                                   
nTemps: (encoder supportsClosureOpcodes
                                                              
ifTrue: [| locals |
                                                                                
locals := arguments, temporaries.
                                                                                encoder
                                                                                         
noteBlockExtent: block blockExtent
                                                                                         
hasLocals: locals.
                                                                                
locals size]
                                                              
ifFalse: [encoder maxTemp])
                                   
nStack: 0
                                   
nLits: (nLits := (literals := encoder allLiterals) size)
                                   
primitive: primitive.
        
nLits > 255 ifTrue:
                 [
^self error: 'Too many literals referenced'].
        
1 to: nLits do: [:lit | method literalAt: lit put: (literals at: lit)].
        encoder
streamToMethod: method.
        
stack := ParseStack new init.
        block
emitCodeForEvaluatedValue: stack encoder: encoder.
        
stack position ~= 1 ifTrue:
                 [
^self error: 'Compiler stack discrepancy'].
        encoder
methodStreamPosition ~= (method size - trailer size) ifTrue:
                 [
^self error: 'Compiler code size discrepancy'].
        
method needsFrameSize: stack size.
        
method properties: properties.
        
^method

The analysis pass is performed by

        encoder supportsClosureOpcodes ifTrue:
                 [temporaries
:= block analyseArguments: arguments temporaries: temporaries rootNode: self.
                  encoder
rootNode: self. "this is for BlockNode>>sizeCodeForClosureValue:"].

and the method uses the new sizers and emitters via sizeCodeForEvaluatedValue: and emitCodeForEvaluatedValue:.

The Closure Analysis

We’re finally at an interesting bit, namely how to analyse the parse tree to handle temporaries correctly when using closures. We need to know for each temporary whether it is closed over (used in a block other than - necessarily within - its block scope), and if it is assigned to after being closed-over. If a temporary is not assigned to after being closed over the closures can be given read-only copies of the temporary. If a temporary is written to after being closed over then it needs to be made remote, i.e. not live as a temporary in the declaring block but live in an Array which, being read-only, can be copied. Remember the rationale for all this is to break the dependence of closing over scopes on the stack frames of declaring scopes. The Closures Part I post explains this in more detail.

It turns out this analysis is quite simple to do. We simply assign different numbers to different block scopes. We can initialize temporary nodes with their defining scope’s number and then compare this against the scope number at each access point. Here’s the numbering applied to our old saw Collection>>inject:into:. The method’s scope number starts at 0. After a scope’s temporaries are declared the number is incremented. When a block is entered or exited the block count is incremented.

        Collection methods for enumerating
0:         
inject: thisValue into: binaryBlock
                 
| nextValue |
1:                  
nextValue := thisValue.
                 
self do:
2:                           [:each |
3:                            nextValue := binaryBlock value: nextValue value: each
4:                           ].
5:         
^nextValue
6:


The scope extent of the whole method is 0 to 6. The scope of the block is 2 to: 4, which we’ll call the block extent, and the scope within it is 3 to: 3.

So now let’s label each argument and temporary with its uses, d@N for definitions, r@N for reads, w@N for writes

        inject: thisValue d@0 into: binaryBlock d@0
                 
| nextValue d@0 |
                 
nextValue w@1 := thisValue r@1.
                 
self do:
                          
[:each d@2 |
                          
nextValue w@3 := binaryBlock r@3 value: nextValue r@3 value: each r@3].
                 
^nextValue r@5

Now let’s analyse:
        thisValue                  d@0
r@1
        binaryBlock                  d@0 r@3
        nextValue                  d@0 r@3 r@5 w@1 w@3
        each                           d@2 r@3

So the block [2 to: 4] can determine that it closes over binaryBlock and nextValue because these are defined before the start of its block extent and referenced within it. nextValue can determine that it must be remote because it is written to in an inner scope. It therefore asks its defining scope block [0 to: 6] to add a remote temporary, represented by an instance of RemoteTempVectorNode, adds itself to the node’s sequence of remote temps, and notes that it is remote by setting its remoteNode instance varable to the new remote temp vector node.

Since there only ever needs to be one remote temp vector node per scope we can use the block extent as its name. After we have introduced the remote temp vector node our method numbering looks like this. nextValue now lives inside the remote temp vector <0-6> and not on the stack of method block [0 to: 6].

        
inject: thisValue d@0 into: binaryBlock d@0
                 
| <0-6: nextValue> d@0 |
                 <0-6> r@1 at: 1 put:
thisValue r@1.
                 
self do:
                          
[:each d@2 |
                          
<0-6> r@3 at: 1 put: (binaryBlock r@3 value: (<0-6> r@3 at: 1) value: each r@3)].
                 
^<0-6> r@5 at: 1

and the analysis becomes
        thisValue                  d@0
r@1
        binaryBlock                  d@0 r@3
        <0-6>                           d@0 r@3 r@5