#
# This file is in POD format; if you're not used to reading POD,
# you can run the file through "perldoc" for a plain text version.

=head1 TITLE

Parrot JIT Subsystem

=head1 VERSION

=head2 CURRENT

    Maintainer: Daniel Grunblatt
    Class: Internals
    PDD Number: 8
    Version: 1.1
    Status: Developing
    Last Modified: 31 January 2002
    PDD Format: 1
    Language: English

=head1 ABSTRACT

This PDD describes the Parrot Just In Time compilation subsystem.

=head1 DESCRIPTION

The Just In Time, or JIT, subsystem converts a bytecode file to native
machine code instructions and executes the generated instruction sequence
directly.

=head1 IMPLEMENTATION

Currently works on B or B architectures running B or B.

The JIT makes it possible to write Parrot opcodes in assembly.

=head1 FILES

=over 4

=item jit/${jitcpuarch}/core.jit

Most of the core Parrot opcodes are (or will be) written here in assembly;
the syntax is described later. When an opcode is not defined here, the code
generated by the C compiler is called.

=item jit/${jitcpuarch}/string.jit

The string subsystem.

=item include/parrot/jit.h

There is an opcode_assembly_t for each Parrot opcode. It holds the position
independent code, its size, the number of arguments the opcode needs, and
one structure like this per substitution type:

    typedef struct {
        int amount;
        info_t info[MAX_SUBSTITUTION];
    } substitution_t;

Where info_t is:

    typedef struct {
        int position;
        int number;
    } info_t;

B<amount> is the number of substitutions of this type the current opcode
uses. For each of them, B<position> is the place in the PIC where it goes
(relative to the start of this opcode) and B<number> is the argument that
holds the value or address.


=item jit.c

Here is B, which loops over the Parrot bytecode and fills two arrays: one
with the displacement from the start of the jitted code to each opcode, and
one with the absolute address of each opcode. Both arrays use the same
positions as the bytecode, so each entry sits where its opcode number sits
in the bytecode. That is:

    Parrot bytecode:  73         0  3  72         1  0
    Relative:         5          0  0  17         0  0
    Absolute:         0x3a285f0  0  0  0x3a285fc  0  0

It then concatenates the PIC (Position Independent Code) of each Parrot
opcode using the B structure; how this is done is described in the B section
below. Finally, it replaces cur_opcode[0] in the Parrot bytecode with the
absolute address of the jitted op.

=item Parrot/Jit/${jitcpuarch}Generic.pm

Should have the platform specific implementation of the methods required by
jit2h.pl:

B Takes assembly in the .jit files format and returns the object code with
the C

B Returns the object code that is needed to start running the jitted code
according to each platform's calling convention.

B Takes the number of arguments and the arguments and generates the object
code to call a C function. The caller must calculate the position of the
address or displacement.

B Takes the system call number, the number of arguments and the arguments
and generates the object code to make a system call.

B Returns the object code which must be placed after each call to code
generated by the C compiler, according to each platform's calling
convention.

B Returns the object code which must be placed after each call to code
generated by the C compiler that changes the program control flow. Since all
the ops that change the program control flow return the address in the
bytecode where execution continues, it must dereference that value and then
jump.

=item Parrot/Jit/${jitcpuarch}-${jitosname}.pm

Each of these files should use all the methods from
Parrot/Jit/${jitcpuarch}Generic.pm that fit the current platform and
redefine the methods that don't. Each must also define some constants:

B<$Parrot::Jit::OP_ARGUMENT_SIZE>
Is the size of the opcode argument in bytes.
If the size in bits is not a multiple of 8, round it down.

B<$Parrot::Jit::Call_immediate_arg_size>
The size in bytes of the instruction used to pass an immediate value as an
argument.

B<$Parrot::Jit::Call_address_arg_size>
The size in bytes of the instruction used to pass an address as an argument.

B<$Parrot::Jit::Call_start>
The size of the instruction(s) that precede the position where the address
or the displacement of the called function will be placed, before dealing
with the arguments.

B<$Parrot::Jit::Call_move>
This is used to correct the position of the call when an argument requires
more than one instruction.

B<$Parrot::Jit::Precompiled_call_position>
The position of the call in the precompiled call to a Parrot opcode.

B<%Parrot::Jit::syscall_number>
The key is the system call name and the value is its number.

=item jit2h.pl

Reads the .jit files and prints the struct opcode_assembly_t.

=back

=head1 Format of .jit Files

Jit files are interpreted as follows:

=over 4

=item I { I }

Where I is the name of the Parrot opcode, and I consists of a sequence of
the following forms:

=item Assembly instruction.

Which may have one of these B as an argument:

B Gets replaced by the C register specified in the Ith argument.

B Gets replaced by the C register specified in the Ith argument.

B Gets replaced by the C register specified in the Ith argument.

B Gets replaced by the C constant specified in the Ith argument.

B Gets replaced by the C constant specified in the Ith argument.

B Gets replaced by C of the C constant specified in the Ith argument.

B Gets replaced by C of the C constant specified in the Ith argument.

B Gets replaced by C of the C constant specified in the Ith argument.

B Gets replaced by C of the C constant specified in the Ith argument.

B Gets replaced by C of the C constant specified in the Ith argument.

B Gets replaced by C of the C constant specified in the Ith argument.

B Gets replaced by C of the C constant specified in the Ith argument.


B Gets replaced by the Ith integer constant defined in jit.c

B Gets replaced by the Ith floatval constant defined in jit.c

B Gets replaced by the Ith char constant defined in jit.c

B Gets replaced by the Ith temporary integer array.

B Gets replaced by the Ith temporary float array.

B Gets replaced by the Ith temporary char array.

You must precede every identifier with I<&> to request the address of that
identifier, or with I<*> to request its value. I<*> can be used only with
constants, since the replacement is done before the code starts running.

B<&INTERPRETER[n]> Gets replaced by the address of the interpreter.

B<*CUR_OPCODE[n]> Gets replaced by the address of the current opcode in the
Parrot bytecode.

=item B(I, I, ..., I)

Call a function defined in another C<.jit> file (except from the core).

=item B(I, I, ..., I)

Call a system call.

=item B(I, I, ..., I)

Call a C function. The idea is to replace all the B() with B().

=item Arguments to CALL and SYSTEMCALL

The arguments to CALL and SYSTEMCALL must be preceded by I<V> to indicate
that the value should be taken as an immediate, or by I<A> to indicate that
the value should be dereferenced.

=head1 ALPHA Notes

Access to Parrot registers is done relative to C<$6>, and all other memory
access is done relative to C<$27>. Float constants are accessed relative to
C<$7>, so you must precede the instruction with I.

=head1 EXAMPLE

Let's see how this works:

B

    set I0,8
    set I2,I0
    print "Big piece of JIT\n"
    time I0
    end

B (only the bytecode segment is shown)

    +-----------------------------------------------+
    | 63 | 0 | 8 | 62 | 2 | 0 | 24 | 0 | 48 | 0 | 0 |
    +-|------------|------------|--------|--------|-+
      |            |            |        |        |
      |            |            |        |        +-- end (no arguments)
      |            |            |        +----------- time_i (1 argument)
      |            |            +-------------------- print_sc (1 argument)
      |            +--------------------------------- set_i_i (2 arguments)
      +---------------------------------------------- set_i_ic (2 arguments)

Please note that the opcode numbers used might have already changed.


B

    Parrot_set_i_ic {
        movl *INT_CONST[2],&INT_REG[1]
    }

    Parrot_set_i_i {
        movl &INT_REG[2],%eax
        movl %eax,&INT_REG[1]
    }

    Parrot_print_sc {
        movl $1,&TEMP_INT[1]
        SYSTEMCALL(WRITE,3, A&TEMP_INT[1] V&STRING_CONST_bufstart[1] V*STRING_CONST_strlen[1])
    }

    Parrot_end {
        leave
        ret
    }

Note that there is no Parrot_time_i, so the code generated by the C compiler
for Parrot_time_i will be called.

B

    Parrot_set_i_ic {
        \xc7\x05\x00\x00\x00\x00\x00\x00\x00\x00    # mov $0,0x0
    }

    Parrot_set_i_i {
        \xa1\x00\x00\x00\x00                        # mov 0x0,%eax
        \xa3\x00\x00\x00\x00                        # mov %eax,0x0
    }

    Parrot_print_sc {
        \xc7\x05\x00\x00\x00\x00\x01\x00\x00\x00    # mov $1,0x0
        \x68\x00\x00\x00\x00                        # push 0x0
        \x68\x00\x00\x00\x00                        # push 0x0
        \xff\x35\x00\x00\x00\x00                    # push $0
        \x50                                        # push
        \xb8\x04\x00\x00\x00                        # mov $4,%eax
        \xcd\x80                                    # int 80h
        \x72\x00                                    # jb 0
    }

    Parrot_end {
        \xc9                                        # leave
        \xc3                                        # ret
    }

    Parrot_time_i {
        \x68\x00\x00\x00\x00                        # pushl 0x0
        \x68\x00\x00\x00\x00                        # pushl 0x0
        \xe8\x00\x00\x00\x00                        # call 0x0
        \x83\xc4\x08                                # add $0x8,%esp
    }

The object code for time_i is the same as for any opcode that isn't
implemented in core.jit

B

Memory dump of the JIT code being generated:

    +-----------------------------------------+
    | 0x55 0x89 0xe5 0xc7 0x05 0x00 0x00 0x00 |
    | 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 |
    +-----------------------------------------+

That is the state after the code for the first op has been copied. The
B<0x55 0x89 0xe5> you see before the object code for Parrot_set_i_ic is the
output of Parrot::Jit->init()

Fill it with addresses and/or values:

    +-----------------------------------------+
    | 0x55 0x89 0xe5 0xc7 0x05 0x00 0xa0 0x10 |
    | 0x00 0x08 0x00 0x00 0x00 0x00 0x00 0x00 |
    +-----------------------------------------+

The address of I0 (&interpreter->int_reg.registers[0]) is 0x10a000 (or
whatever), so the first 4 bytes after the instruction's opcode bytes
(0xc7 0x05) are filled with it, in little-endian order, and the next 4
bytes are filled with the constant itself. The same process is done once
per opcode.
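
The fill-in step just described can be sketched in C. This is only a minimal
illustration, not the code in jit.c. C<info_t> and C<substitution_t> follow
the definitions given for jit.h, but the value of C<MAX_SUBSTITUTION> and
the helpers C<put_le32()> and C<patch_op()> are hypothetical names
introduced here. The sketch also assumes a little-endian 32-bit target like
the one in this example, and it uses a single C<substitution_t> for both the
register address and the constant, where the real code keeps one per
substitution type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SUBSTITUTION 4  /* assumed value; the real limit is set in jit.h */

/* As described for include/parrot/jit.h. */
typedef struct {
    int position;  /* offset of the slot, relative to the start of the op */
    int number;    /* argument that supplies the value or address */
} info_t;

typedef struct {
    int amount;                     /* number of slots to fill */
    info_t info[MAX_SUBSTITUTION];
} substitution_t;

/* Hypothetical helper: store a 32-bit value in little-endian byte order. */
static void put_le32(unsigned char *dst, uint32_t value)
{
    dst[0] = (unsigned char)(value & 0xff);
    dst[1] = (unsigned char)((value >> 8) & 0xff);
    dst[2] = (unsigned char)((value >> 16) & 0xff);
    dst[3] = (unsigned char)((value >> 24) & 0xff);
}

/*
 * Hypothetical helper: copy the PIC template of one op into the output
 * buffer, then overwrite each slot listed in subst. values[n] holds the
 * already resolved address or constant for argument n (numbered from 1,
 * as in the .jit files). Returns the number of bytes written.
 */
static size_t patch_op(unsigned char *out, const unsigned char *pic,
                       size_t size, const substitution_t *subst,
                       const uint32_t *values)
{
    memcpy(out, pic, size);
    for (int i = 0; i < subst->amount; i++) {
        const info_t *slot = &subst->info[i];
        put_le32(out + slot->position, values[slot->number]);
    }
    return size;
}
```

Called with the Parrot_set_i_ic template, slots at offsets 2 and 6, and the
values 0x10a000 and 8 for arguments 1 and 2, C<patch_op()> writes the ten
bytes that follow the C<0x55 0x89 0xe5> prologue in the second dump above.
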

The final result:

    +-----------------------------------------+
    | 0x55 0x89 0xe5 0xc7 0x05 0x00 0xa0 0x10 |
    | 0x00 0x08 0x00 0x00 0x00 0xa1 0x00 0xa0 |
    | 0x10 0x00 0xa3 0x08 0xa0 0x10 0x00 0xc7 |
    | 0x05 0x54 0x7a 0x10 0x00 0x01 0x00 0x00 |
    | 0x00 0x68 0x11 0x00 0x00 0x00 0x68 0x18 |
    | 0xb0 0x10 0x00 0xff 0x35 0x54 0x7a 0x10 |
    | 0x00 0x50 0xb8 0x04 0x00 0x00 0x00 0xcd |
    | 0x80 0x72 0x00 0x68 0x00 0xa0 0x10 0x00 |
    | 0x68 0xe0 0x60 0x12 0x00 0xe8 0xae 0xdb |
    | 0xed 0xff 0x83 0xc4 0x08 0xc9 0xc3 0x00 |
    +-----------------------------------------+

This code is ready to be called.

=back
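
The displacement array that jit.c fills can be illustrated with the numbers
from this example. The following is a hedged sketch rather than the actual
implementation: C<op_info_t>, C<ops>, C<lookup()>, and C<build_offsets()>
are hypothetical names, and the argument counts and object code sizes are
simply read off the listings in this section, with the 3-byte output of
init() as the prologue.

```c
#include <stddef.h>

/* Hypothetical per-op record: opcode number, number of arguments, and
 * size in bytes of the op's object code, all taken from this example. */
typedef struct {
    long   op;
    int    argc;
    size_t size;
} op_info_t;

static const op_info_t ops[] = {
    { 63, 2, 10 },  /* set_i_ic */
    { 62, 2, 10 },  /* set_i_i */
    { 24, 1, 36 },  /* print_sc */
    { 48, 1, 18 },  /* time_i: the precompiled call sequence */
    {  0, 0,  2 },  /* end */
};

/* Find the record for an opcode number, or NULL if it is not listed. */
static const op_info_t *lookup(long op)
{
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        if (ops[i].op == op)
            return &ops[i];
    }
    return NULL;
}

/*
 * Walk the bytecode one op at a time. For every word that starts an op,
 * store the displacement of its jitted code from the start of the buffer;
 * argument words keep 0. Returns the total code size, or 0 if an opcode
 * is unknown.
 */
static size_t build_offsets(const long *bytecode, size_t n,
                            size_t prologue, size_t *relative)
{
    size_t pos = prologue;
    size_t i = 0;

    for (size_t k = 0; k < n; k++)
        relative[k] = 0;

    while (i < n) {
        const op_info_t *info = lookup(bytecode[i]);
        if (info == NULL)
            return 0;
        relative[i] = pos;
        pos += info->size;
        i += 1 + (size_t)info->argc;  /* skip the op and its arguments */
    }
    return pos;
}
```

For the bytecode C<63 0 8 62 2 0 24 0 48 0 0> and a prologue of 3 bytes,
this yields displacements 3, 13, 23, 59, and 77 and a total of 79 bytes.
The final dump above shows those 79 bytes of code followed by a single
0x00.
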