Here’s what I hope you’ll agree is a nice example of bytecode analysis and of creating custom control structures in Smalltalk. One might think that a dynamically-typed language like Smalltalk is difficult to analyze. But in fact there are many ways of analyzing it, and this post concerns analyzing bytecode. Further, languages that support closures (Smalltalk calls them blocks, and you might know them as anonymous functions) allow one easily to construct custom control structures, and this post works up to a nice example.
First some context, which is code within the Just-in-Time compiler for Cog, the Smalltalk virtual machine for Squeak, Scratch et al that this blog discusses. In the bowels of Cog’s Just-in-Time compiler is code that merges control flow when a jump bytecode jumps to another bytecode. The execution state for the control flow falling though to the target bytecode must be merged with the execution state of the bytecode jumping to that bytecode (or vice verse). Also, a jump instruction generated as part of a jump from some previous bytecode to some target bytecode must be "fixed up" to jump to the first instruction generated for the target bytecode when the JIT gets around to generating the instructions for the target bytecode. That’s not the topic of this post ;-). What is the topic is simplifying the code that does this, because when I wrote it in the first place I made a mistake, writing the code in an ugly manner. Now I want to write the code in a cleaner manner and I want to check that the clean up is always appropriate.
The two methods that are at the heart of the "merge" and the "jump to" operations are ensureFixupAt: and ensureNonMergeFixUpAt:. Typical usage looks like this:
(self fixupAt: nextPC - initialPC) notAFixup ifTrue: "The next instruction is dead. we can skip it." [deadCode := true. self ensureFixupAt: targetBytecodePC - initialPC. self ensureFixupAt: postBranchPC - initialPC] ifFalse: [self deny: deadCode]. "push dummy value below" self assert: (unforwardArg or: [unforwardRcvr]). orNot == branchDescriptor isBranchTrue "orNot is true for ~~" ifFalse: "branchDescriptor is branchFalse" [ fixup := (self ensureNonMergeFixupAt: postBranchPC - initialPC) asUnsignedInteger. self JumpZero: (self ensureNonMergeFixupAt: targetBytecodePC - initialPC) asUnsignedInteger ] ifTrue: [ fixup := (self ensureNonMergeFixupAt: targetBytecodePC - initialPC) asUnsignedInteger. self JumpZero: (self ensureNonMergeFixupAt: postBranchPC - initialPC) asUnsignedInteger ].
The bug you can see is that all calls to the routines first subtract initialPC from the bytecode PC to get to the index of the fixup for that pc. That’s ugly. I’d like to move the "- initialPC" sends into ensureFixupAt: and ensureNonMergeFixupAt:. But I’d like to know if all calls on the two methods involve "- initialPC" before I make this change. One simple way to do this is to scan the bytecode of the relevant methods. For this I can use Smalltalk’s InstructionStream, which is used for all sorts of things including the debugger. [An aside to non-Smalltalkers; one of the powerful things about Smalltalk is that the system is not merely a compiler for Smalltalk source; it’s a live object space, and the code exists as CompiledMethod objects in the system that one can look at directly].
What I want to do is look for all sends of either ensureFixupAt: or ensureNonMergeFixupAt: and verify that they’re always preceded by a bytecode that pushes the initialPC instance variable of the Just-in-Time compiler and a send of #-. InstructionStream provides a custom control structure "scanFor:" that takes a one-argument block and evaluates it with the numeric value of each bytecode in a method until the block answers true. scanFor: answers false if the block never evaluates to true. The bytecodes look like this:
279 <70> self 280 <8C 03 05> pushTemp: 3 inVectorAt: 5 283 <80 1C> pushRcvr: 28 285 <B1> send: - 286 <EE> send: ensureFixupAt: 287 <87> pop 288 <70> self 289 <8C 05 05> pushTemp: 5 inVectorAt: 5 292 <80 1C> pushRcvr: 28 294 <B1> send: - 295 <EE> send: ensureFixupAt: 296 <94> jumpTo: 302 297 <70> self 298 <84 40 B7> pushRcvr: 183 301 <EC> send: deny: 302 <87> pop 303 <70> self 304 <12> pushTemp: 2 305 <99> jumpFalse: 308 306 <71> pushConstant: true 307 <90> jumpTo: 309 308 <11> pushTemp: 1 309 <83 31> send: assert: 311 <87> pop 312 <10> pushTemp: 0 313 <8C 01 05> pushTemp: 1 inVectorAt: 5 316 <D8> send: isBranchTrue 317 <C6> send: == 318 <AC 1C> jumpFalse: 348 320 <70> self 321 <8C 03 05> pushTemp: 3 inVectorAt: 5 324 <80 1C> pushRcvr: 28 326 <B1> send: - 327 <83 33> send: ensureNonMergeFixupAt:
Now this is more than a little inconvenient to parse. In particular, the details of the bytecode for each send vary. Smalltalk has compact bytecodes for common selectors such as #-, so there’s a special bytecode for it (16rB1 at 294) and for #== (16rC6 at 317). Some times the literal offset for the ensureFixupAt: selector will mean a one byte send bytecode can be used (such as 16rEE at 286) and sometimes it’ll have to be a two byte bytecode (such as 16r83 16r33 at 327). Any attempt to look at the code at this level will cause confusion and delay. Luckily InstructionStream has facilities to extract the bytecode semantics, lifting the level of abstraction well above the numeric values. peekInstruction answers a Message instance that is the message that InstructionStream would send to a client when decoding a bytecode. So in the above at 280 peekInstruction will return a Message selector: #pushTemp:inVectorAt: arguments: #(3 5). selectorToSendOrSelf answers either the selector of a message send bytecode or the InstructionScanner itself, so at 286 selectorToSendOrSelf will answer #ensureFixupAt:, but at 287 it will answer the InstructionStream. With these facilities we can write something almost comprehensible:
self systemNavigation browseAllSelect: [:m| | is pi ppi | is := InstructionStream on: m. ppi := pi := is peekInstruction. is scanFor: [:b| | ok | ok := (#(ensureFixupAt: ensureNonMergeFixupAt:) includes: is selectorToSendOrSelf) and: [ppi selector == #pushReceiverVariable: and: [ppi argument = 28 and: [pi selector == #send:super:numArgs: and: [pi argument = #-]]]]. ppi := pi. pi := is peekInstruction. ok]] localTo: Cogit
The heart of this is
ok := (#(ensureFixupAt: ensureNonMergeFixupAt:) includes: is selectorToSendOrSelf) and: [ppi selector == #pushReceiverVariable: and: [ppi argument = 28 and: [pi selector == #send:super:numArgs: and: [pi argument = #-]]]].
which sets ok to true if the current instruction is a send of ensureFixupAt: or ensureNonMergeFixupAt: and the previous previous instruction (ppi) is a push of initialPC, and if the previous instruction (pi) is a send of #-. So the block must manage updating ppi and pi. Hence haivng to save the value of the test in the variable "ok", and the statements
ppi := pi. pi := is peekInstruction.
to update them before answering "ok" as the value of the block, and the statement to initialize them to message objects before scanFor: is invoked.
Wouldn’t it be nice if we could add a method that abstracted away from the need to wrangle these variables? Indeed, and it’s quite easy. Let’s add a method scanForInstructionSequence: to InstructionStream to do precisely this:
InstructionStream methods for scanning scanForInstructionSequence: naryBlock "naryBlock is a block taking one or more arguments. Evaluate it for each sequence of instructions of length n in the receiver until naryBlock evaluates to true. Answer if naryBlock evaluated to true." | instructions | instructions := OrderedCollection withAll: ((1 to: naryBlock numArgs) collect: [:ign| self atEnd ifTrue: [^false]. self nextInstruction]). [(naryBlock valueWithArguments: instructions asArray) ifTrue: [^true]. self atEnd] whileFalse: [instructions removeFirst; addLast: self nextInstruction]. ^false
Let’s break this down. naryBlock numArgs answers the number of arguments the block takes. This way we don’t need a method for one-instruction sequences, another for two-instruction sequences, etc; we’ll just have one method for any length of sequence (up to the maximum argument count for a block, which in Squeak and Pharo is currently 15). (1 to: naryBlock numArgs) constructs an Interval from 1 to that value, and collect: applies its argument to each element of the Interval (collect: is Smalltalk’s map). self atEnd ifTrue: [^false]. checks that there is an instruction avalable to parse, and aborts if not, so if a method doesn’t have enough bytecodes to supply naryBlock, scanForInstructionSequence: answers false instead of raising an error as it tries to access bytecodes that don’t exist. OrderedCollection is Smalltalk’s double-ended queue, so we can maintain the sequence of instuctions by adding a new one to the end and removing an old one from the front. nextInstruction is the method that collects the message for a given bytecode and advances the InstructionStream’s pc to the next bytecode. So the first statement collects the first naryBlock numArgs bytecode messages, answering false if there aren’t enough.
The next statement is a while loop that supplies the bytecode messages to naryBlock, testing the result and answering true if naryBlock answers true. valueWithArguments: is the block evaluation method that evaluates a block with a collection iof arguments, as opposed to value, value:, value:value: et al that take individual arguments. But valueWithArguments: takes an Array, not an OrderedCollection, so the OrderedCollection is copied to an Array before sending valueWithArguments:. As long as there are still isntructions to be processed (self atEnd is false), the body of the while loop removes the first bytecode message before adding a new one at the end. Finaly, if naryBlock never evaluates to true, scanForInstructionSequence: answers false.
So we’ve abstracted away from wrangling those instructions and created a custom control structure that evaluates a block with any length of instruction sequence from 1 to 15, answering whether that block evaluated to true. Neat.
So let’s apply this new method to our previous example. And let’s add another search that answers the converse, finding all sends of ensureFixupAt: or ensureNonMergeFixupAt: that are not preceded by a push of initalPC followed by a send of #-.
self systemNavigation browseAllSelect: [:m| (InstructionStream on: m) scanForInstructionSequence: [:a :b :c| a selector == #pushReceiverVariable: and: [a argument = 28 and: [b selector == #send:super:numArgs: and: [b argument = #- and: [c selector == #send:super:numArgs: and: [#(ensureFixupAt: ensureNonMergeFixupAt:) includes: c argument]]]]]]] localTo: Cogit; browseAllSelect: [:m| (InstructionStream on: m) scanForInstructionSequence: [:a :b :c| c selector == #send:super:numArgs: and: [(#(ensureFixupAt: ensureNonMergeFixupAt:) includes: c argument) and: [(a selector == #pushReceiverVariable: and: [a argument = 28 and: [b selector == #send:super:numArgs: and: [b argument = #-]]]) not]]]] localTo: Cogit
Nice. And using this I find that the only such sends in the Just-in-Time compiler that don’t involve a "- initalPC" are overrides of ensureFixupAt: or ensureNonMergeFixupAt: in subclasses:
RegisterAllocatingCogit methods for compile abstract instructions ensureNonMergeFixupAt: targetIndex "Make sure there's a flagged fixup at the targetIndex (pc relative to first pc) in fixups. Initially a fixup's target is just a flag. Later on it is replaced with a proper instruction." | fixup | fixup := super ensureNonMergeFixupAt: targetIndex. fixup mergeSimStack ifNil: [self setMergeSimStackOf: fixup]. ^fixup
So indeed, it is safe to make the change.