Posted: 2024-05-07
COMP4403 - Compilers and Interpreters
Assignment 2
This is an individual assignment which involves modifying the LALR assignment 2 compiler for the PL0 language to add array types and operations on arrays.
Assignment Compiler Files
All sources for the assignment PL0 compiler are available as a2.zip (below). Please be sure to use the version for this assignment and not the one used for the tutorials or another assignment. The versions differ; for example, the lexical tokens you need for this assignment are defined only in the assignment version.
a2.zip (Sun 14 Apr 2024 12:12:22 AEST): Save this .zip file and follow the instructions for setting up a compiler project in IntelliJ.
Setting up and running PL0 in IntelliJ (Thu 21 May 2020 16:46:06 AEST)
Brief documentation on assignment 2 files (Sun 7 Apr 2024 18:29:52 AEST)
Here is the documentation for
Java CUP [HTML]
JFlex [HTML]
For the most part you will not need these.
Please pay attention to course Blackboard announcements, and ensure you follow the course discussion board (https://edstem.org/) for any updates and further information on the assignment.
Do not use imports for external packages other than those in java.util.*. Note that IntelliJ may offer the option of importing an external package to resolve an issue; please avoid accepting this option because it will often add an erroneous import that you will not need. Such imports lead to the compilation failing in the environment in which your compiler will be assessed because that environment does not include the external libraries. Please check you are not importing external libraries before submitting.
You must only modify the files that must be submitted (see below).
You must not modify any other files because we will be testing your implementation using the existing other files with your submitted files.
Please do not reformat the files because we would like to just print the differences between the originals and the versions you hand in.
Please keep the length of lines in your files below 100 characters, so that we can print them sensibly.
Please avoid using non-standard characters, e.g. Chinese characters, including in the comments. Non-standard characters are not accepted by the Java compiler used to test your assignment and all comments should be readable by the person assessing your assignment.
Please remove any debugging output before your assignment is submitted because debugging output will cause your program to fail our automated testing of your assignment.
Either avoid using tabs or set your tab stops to 4 spaces (this is the default for IntelliJ/Eclipse) so that your files will print sensibly.
Read the fine print in detail before you start! And, most important, when you have finished implementing the assignment, come back and reread the fine print again.
Overview
Array types
An array type may be declared in a type definition. In the following declarations
const N = 3;
type
V = array of int;
M = array of V;
V is an array type with elements of type integer, and M is an array type with elements of type V. Each element of M is itself an array of integers. One may declare a variable to be of an array type:
var
vec : V;
mat: M;
Arrays are indexed by integers. The length of an instance of an array will be fixed when the array is created. (The minimum length of an array is one.) Once an array has been instantiated, its first index will be index 0 and its last index will be the length of the array minus one. An instance of an array may be created using a new expression. For example,
vec := new V[N];
mat := new M[N-1]
assigns to variable vec a new (dynamically allocated) array with three elements, each of type integer, with indices 0 to 2 (inclusive); and assigns to variable mat a new array with two elements, each of type V, with indices 0 to 1 (inclusive).
The length of an initialised array may be accessed. For example
write vec.length
writes the length of array vec. Elements of initialised arrays may be assigned appropriate values:
vec[0] := 100;
mat[0] := new V[10];
mat[0][9] := 200;
mat[1] := vec; // array assignment (mat[1] and vec now refer to the same array)
mat[1][0] := 300
After the array assignment mat[1] := vec, both mat[1] and vec refer to the same array. And so, for example, after the assignment mat[1][0] := 300 both mat[1][0] and vec[0] will have the same value (300). The values of the elements may be accessed, for example, the following
vec[1] := mat[0][9] - 1
assigns to index 1 of the array vec the value of mat[0][9] minus one.
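The reference semantics described above closely resemble those of Java arrays, so the PL0 example can be mirrored in plain Java as a rough analogy. The class and method names here are hypothetical and are not part of the assignment compiler.

```java
// Hypothetical Java analogue of the PL0 example above; not part of the compiler.
final class ArrayAliasDemo {
    // Replays the PL0 statements and returns vec so that callers can inspect it.
    static int[] run() {
        final int n = 3;
        int[] vec = new int[n];          // vec := new V[N]
        int[][] mat = new int[n - 1][];  // mat := new M[N-1]; in Java the rows start as null
        vec[0] = 100;
        mat[0] = new int[10];
        mat[0][9] = 200;
        mat[1] = vec;                    // mat[1] and vec now refer to the same array
        mat[1][0] = 300;                 // the update is also visible through vec[0]
        vec[1] = mat[0][9] - 1;
        return vec;
    }

    public static void main(String[] args) {
        int[] vec = run();
        System.out.println(vec.length + " " + vec[0] + " " + vec[1]); // prints "3 300 199"
    }
}
```

Because mat[1] and vec share one underlying array, the write through mat[1][0] changes vec[0] as well.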
Syntax Changes
The reserved keywords "array", "of" and "new" have already been added to the lexical analyser input file (PL0.flex) as the tokens KW_ARRAY, KW_OF and KW_NEW, and the symbol "." has been added as the token PERIOD. They have also been added to the terminal definitions in PL0.cup.
The syntax is given here in BNF. The syntax for type definitions (Type) is extended with an additional alternative for array types:
Type → ... | "array" "of" Type
A reference to an element of an array can be used as an LValue either within an expression or on the left side of an assignment:
LValue → IDENTIFIER | LValue "[" Condition "]"
A new array expression or an access to the length attribute of an array can be used as a Factor in an expression:
Factor → ... | "new" TypeIdentifier "[" Condition "]" | LValue "." IDENTIFIER
You need to add these productions and their associated actions to build the symbol table entries and abstract syntax trees to PL0.cup.
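Because the LValue production is left recursive, a nested subscript such as mat[0][9] is naturally built from the inside out. The following is a minimal sketch of that nesting using hypothetical stand-in node types; it does not use the compiler's actual ExpNode classes, whose constructors and fields may look quite different.

```java
// Hypothetical stand-in tree nodes, used only to illustrate how subscripts nest.
final class LValueTreeSketch {
    /** A tree node that can print itself. */
    interface Node {
        String show();
    }

    static Node identifier(String name) {
        return () -> name;
    }

    static Node constant(int value) {
        return () -> Integer.toString(value);
    }

    /** Corresponds to: LValue "[" Condition "]" */
    static Node index(Node array, Node subscript) {
        return () -> "Index(" + array.show() + ", " + subscript.show() + ")";
    }

    /** Corresponds to: LValue "." IDENTIFIER */
    static Node field(Node target, String id) {
        return () -> "Field(" + target.show() + ", " + id + ")";
    }

    public static void main(String[] args) {
        // mat[0][9]: the inner subscript mat[0] is the array operand of the outer one.
        Node element = index(index(identifier("mat"), constant(0)), constant(9));
        System.out.println(element.show());   // prints "Index(Index(mat, 0), 9)"
        System.out.println(field(identifier("vec"), "length").show());
    }
}
```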
The Parser Generator Java-CUP
The parser for the compiler is specified in the file PL0.cup. You will need to add productions (and their associated actions) to the specification and then run the parser generator Java-CUP (manually or automatically) to generate the files CUPParser.java and CUPToken.java. Do not modify these two Java files directly (even if you think you understand them (do you really?)); remake them from PL0.cup. You should make the compiler before you change anything just to see what forms of messages to expect. When you make the compiler (before you modify it) there will be some warning messages about terminal symbols such as ILLEGAL being declared but never used; these are to be expected at this stage. Any new warning or error messages will be significant. Beware that if Java-CUP reports errors, the Java files for the parser will not be generated, so check for Java-CUP errors first. There is HTML documentation for Java-CUP available from the class web page (with the assignment).
The Scanner Generator JFlex
All the lexical tokens for this version of the compiler have already been added to the lexical analyser.
The file Lexer.java is automatically generated by the scanner generator JFlex from PL0.flex; again, do not modify Lexer.java - remake Lexer.java from PL0.flex.
Both Java-CUP and JFlex are available with the assignment files on the course web page, with instructions on how to run them in IntelliJ. Java archives for Java-CUP and JFlex are part of the IntelliJ project for the assignment.
Static Semantic Restrictions
Array types
In an array type declaration, array of t,
the element type, t, can be any type including another array type, but cycles in type definitions are not permitted, e.g. the following is not valid as the definitions of types C and D form a cycle.
C = array of D;
D = array of C;
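One way to picture the cycle restriction: each array type names exactly one element type, so following element types from a given name either ends at a non-array type or revisits a name. The sketch below illustrates that idea only; the map-based representation and all names are assumptions, and the assignment compiler may detect cycles in a different way (for example while resolving type identifiers).

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: type definitions are modelled as a map from names to names.
final class TypeCycleSketch {
    /**
     * elementOf maps an array type name to the name of its element type.
     * Names that are not keys (for example "int") are treated as non-array types.
     */
    static boolean hasCycle(String start, Map<String, String> elementOf) {
        Set<String> seen = new HashSet<>();
        String current = start;
        while (elementOf.containsKey(current)) {
            if (!seen.add(current)) {
                return true;   // this name is already on the chain, so there is a cycle
            }
            current = elementOf.get(current);
        }
        return false;          // the chain ended at a non-array type
    }

    public static void main(String[] args) {
        Map<String, String> cyclic = Map.of("C", "D", "D", "C");
        Map<String, String> valid = Map.of("V", "int", "M", "V");
        System.out.println(hasCycle("C", cyclic) + " " + hasCycle("M", valid));
    }
}
```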
In Type.java a class ArrayType has already been added to represent array types within the compiler. The type of array of t is ArrayType(T), where T is the type of t:
syms ⊢ typeof(t) = T
------------------------------------------
syms ⊢ typeof(array of t) = ArrayType(T)
In a new expression, new id[e], the type identifier id must be that of an array type; the expression e must be compatible with the integer type; and the type of the new expression will correspond to the array type of the type identifier:
id ∈ dom(syms)
syms(id) = TypeEntry(ArrayType(T))
syms ⊢ e : int
------------------------------------------
syms ⊢ new id[e] : ArrayType(T)
For a reference to the length attribute of an array, e.id, expression e must have an array type, and identifier id must be the identifier "length". A reference to the length of an array, e.id, is of type integer:
syms ⊢ e : ArrayType(T)
id = "length"
------------------------------------------
syms ⊢ e.id : int
For a reference to an element of an array, e1[e2], expression e1 must have an array type, and expression e2, used as the array index, must be compatible with the integer type (the type of array indices). The type of a reference to an element of an array, e1[e2], is a reference to the element type of the array:
syms ⊢ e1 : ArrayType(T)
syms ⊢ e2 : int
------------------------------------------
syms ⊢ e1[e2] : ref(T)
Note that the type of e1[e2] is ref(T) rather than T so that the subscripted array element can be used as an L-value, e.g., it can be used on the left side of an assignment.
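The three rules above can be read as small functions from operand types to a result type. The sketch below does this with a hypothetical stand-in type class (Ty) rather than the compiler's Type.java. It ignores coercions such as dereferencing a ref type and treats "compatible with int" as "is exactly int", both of which are simplifying assumptions.

```java
// Hypothetical stand-in types; the real compiler uses the classes in Type.java.
final class ArrayTypeRulesSketch {
    static class Ty {
        final String name;   // "int", "array", "ref" or "error"
        final Ty inner;      // element or referenced type; null for int and error

        Ty(String name, Ty inner) {
            this.name = name;
            this.inner = inner;
        }

        boolean is(String n) {
            return name.equals(n);
        }

        @Override
        public String toString() {
            return inner == null ? name : name + "(" + inner + ")";
        }
    }

    static final Ty INT = new Ty("int", null);
    static final Ty ERROR = new Ty("error", null);

    static Ty arrayOf(Ty element) {
        return new Ty("array", element);
    }

    static Ty ref(Ty base) {
        return new Ty("ref", base);
    }

    /** new id[e]: id must name an array type and e must be an int. */
    static Ty typeOfNew(Ty named, Ty length) {
        return named.is("array") && length == INT ? named : ERROR;
    }

    /** e.id: e must be an array and id must be "length"; the result is int. */
    static Ty typeOfField(Ty target, String id) {
        return target.is("array") && id.equals("length") ? INT : ERROR;
    }

    /** e1[e2]: e1 must be an array and e2 an int; the result is ref(element type). */
    static Ty typeOfIndex(Ty target, Ty index) {
        return target.is("array") && index == INT ? ref(target.inner) : ERROR;
    }

    public static void main(String[] args) {
        Ty v = arrayOf(INT);
        Ty m = arrayOf(v);
        System.out.println(typeOfIndex(m, INT));   // prints "ref(array(int))"
    }
}
```

Returning an error type instead of throwing is one common way to let checking continue after the first error; whether the assignment compiler does the same is not stated here.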
Assignment of arrays is allowed but other operations on arrays (e.g., comparison, etc.) are not supported.
Dynamic Semantics
Arrays
Arrays are dynamically allocated via a new expression. As such, the value of an array will be an absolute (i.e. global) address: the address where the length of the array and the elements of the array are stored. The value of an array will be null (StackMachine.NULL_ADDR) until it has been otherwise assigned.
When one array is directly assigned to another, such as in
vec2 := vec1
then the value (an absolute address) stored by vec2 will become the same value (an absolute address) stored by vec1. Hence, following the assignment, vec1 and vec2 will refer to the same array object in memory.
The expression new id[e] dynamically allocates space on the heap to store both the length, e, of the new array, as well as all of its e elements; it writes the length of the new array to that address at an offset of 0; and it evaluates to the absolute address of the new array that has been allocated.
The length of a new array should be checked at run time to ensure that it is greater than or equal to the minimum length of an array (one): if it is not then this is a run time error, and the stack machine should be stopped with an exit code of StackMachine.INVALID_ARRAY_LENGTH.
Objects dynamically allocated via a new array expression have a lifetime that is not restricted to that of the variable to which they were assigned. For example, a new array may be created within a procedure and assigned to a local variable within that procedure. Provided that variable's value (the absolute address of the allocated array) is assigned to a variable or field that is accessible via variables global to the procedure, before the procedure exits, the object will remain accessible.
Although we dynamically allocate arrays via the new expression, we won't implement garbage collection of objects that are no longer accessible.
A reference to the length of an array, i.e. e.length, should evaluate to the length of the array.
The dynamic semantics of array accessing, i.e. e1[e2], is conventional. An element of an array may be used like a variable whose type is the same as the element type of the array. For example, if the element type is int then it may be "assigned to", read, written and used in arithmetic expressions and comparisons (but if you start worrying about each of these you are making way more work for yourself than you need to).
If the value of an array is null (StackMachine.NULL_ADDR) when an attempt is made either to access its length or to access one of its elements, then this is a run time error and the stack machine should be stopped with an exit code of StackMachine.NULL_ARRAY.
Accesses to array elements should be bounds checked at run time, i.e., the value of the index expression should be checked to make sure that it is a valid index of the array. If it is not, then this is a run time error and the stack machine should be stopped with an exit code of StackMachine.OUT_OF_BOUNDS. (See the discussion of the genBoundsCheck method below.)
Variables of an array type are local variables and hence are allocated on the stack just like any other local variable. The main difference from scalar variables is that the value stored in the local variable will be the absolute address of an array. The absolute address of the array (unless it is StackMachine.NULL_ADDR) can then be used to access the length of the array, and the elements of the array.
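The run time behaviour described in this section can be modelled in a few lines of ordinary Java. The following is a sketch under several assumptions: memory is a flat int array, every element occupies one word, the elements sit directly after the length word, and the heap grows downwards from the top of memory. The NULL_ADDR value and the string exit codes are placeholders; the real constants are defined in StackMachine.java. The compiler itself must emit stack machine instructions rather than execute Java like this.

```java
// A model of the described behaviour, not code from the assignment compiler.
final class ArrayRuntimeSketch {
    static final int NULL_ADDR = -1;   // placeholder value for illustration only

    /** Stands in for stopping the stack machine with an exit code. */
    static class RuntimeStop extends RuntimeException {
        final String code;

        RuntimeStop(String code) {
            super(code);
            this.code = code;
        }
    }

    final int[] memory = new int[64];
    int heapLimit = memory.length;     // assumed: the heap grows down from the top

    /** new id[e]: check the length, allocate length + 1 words, store the length at offset 0. */
    int newArray(int length) {
        if (length < 1) {
            throw new RuntimeStop("INVALID_ARRAY_LENGTH");
        }
        heapLimit -= length + 1;       // no overflow check in this sketch
        memory[heapLimit] = length;
        return heapLimit;              // the absolute address of the new array
    }

    /** e.length: a null check followed by a read of the word at offset 0. */
    int length(int array) {
        if (array == NULL_ADDR) {
            throw new RuntimeStop("NULL_ARRAY");
        }
        return memory[array];
    }

    /** e1[e2]: null check, bounds check, then the address of the element. */
    int elementAddress(int array, int index) {
        int len = length(array);
        if (index < 0 || index >= len) {
            throw new RuntimeStop("OUT_OF_BOUNDS");
        }
        return array + 1 + index;      // assumed layout: elements follow the length word
    }

    public static void main(String[] args) {
        ArrayRuntimeSketch rt = new ArrayRuntimeSketch();
        int vec = rt.newArray(3);
        rt.memory[rt.elementAddress(vec, 0)] = 100;
        System.out.println(rt.length(vec) + " " + rt.memory[rt.elementAddress(vec, 0)]);
    }
}
```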
Object Code
The PL0 compiler generates code for the Stack Machine. A description of the stack machine is available in StackMachineHandout.pdf. See also the file StackMachine.java (near the end) for details of the instruction set.
Dynamic allocation of arrays
There is an instruction, ALLOC_HEAP, which assumes that there is an integer on the top of the stack that represents the size of the object to be allocated. It pops that value from the stack and replaces it with the absolute address of a newly allocated object of that size. The stack machine does not support disposing objects or garbage collection.
If there is insufficient space then ALLOC_HEAP will fail with a "memory overflow" message. In the implementation of the stack machine there is a single array representing the whole of memory: the stack (the bottom end of the array) and the heap of dynamically allocated objects (the top end). If either pushing onto the stack reaches the lower limit of the heap, or allocating on the heap reaches the top of the stack, then there is insufficient memory and the program aborts (with the same error message in both cases).
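The shared-memory arrangement can be sketched as a single array with a stack pointer rising from the bottom and a heap limit falling from the top. The class below is a simplified model, not the code in StackMachine.java; the field names, the exact boundary conditions and the use of an exception for "memory overflow" are assumptions.

```java
// A simplified model of one memory array shared by the stack and the heap.
final class MemorySketch {
    private final int[] memory;
    private int sp = 0;        // next free stack slot; the stack grows upwards
    private int heapLimit;     // lowest allocated heap address; the heap grows downwards

    MemorySketch(int size) {
        memory = new int[size];
        heapLimit = size;
    }

    void push(int value) {
        if (sp >= heapLimit) {
            throw new IllegalStateException("memory overflow");
        }
        memory[sp++] = value;
    }

    int pop() {
        return memory[--sp];
    }

    /** ALLOC_HEAP as described above: pop a size, push the address of the new object. */
    void allocHeap() {
        int size = pop();
        if (heapLimit - size < sp) {
            throw new IllegalStateException("memory overflow");
        }
        heapLimit -= size;
        push(heapLimit);
    }

    public static void main(String[] args) {
        MemorySketch m = new MemorySketch(16);
        m.push(4);                     // the size of the object to allocate
        m.allocHeap();
        System.out.println(m.pop());   // prints 12, the address of the new object
    }
}
```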
You need to be aware that the instructions LOAD_FRAME and STORE_FRAME expect an address that is an offset from the frame pointer for the current (procedure) scope. You can use instruction TO_LOCAL to convert an absolute address into an address relative to the current frame pointer.
Reporting runtime errors
Under exceptional circumstances, the STOP instruction can be used to stop execution of the stack machine. The top of the stack is popped to get an exit code.
Array indexing and bounds checking
The BOUND instruction expects that the following have been loaded (pushed) onto the stack (in this order):
a value to be bounds checked,
a lower bound, and
an upper bound.
The BOUND instruction pops the upper and lower bounds as well as the value to be checked. If the value to be checked is not within the given bounds (inclusive) then the value false is pushed on top of the stack, else the value true is pushed on top of the stack.
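The stack effect of BOUND as described above can be modelled with a java.util.Deque. This is a sketch of the stated behaviour only; encoding true as 1 and false as 0 is an assumption, and the authoritative definition is in StackMachine.java.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Models the stack effect of BOUND; the boolean encodings below are assumed.
final class BoundSketch {
    static final int FALSE_VALUE = 0;
    static final int TRUE_VALUE = 1;

    /** Expects value, lower bound, upper bound to have been pushed in that order. */
    static void bound(Deque<Integer> stack) {
        int upper = stack.pop();   // pushed last, so popped first
        int lower = stack.pop();
        int value = stack.pop();
        stack.push(lower <= value && value <= upper ? TRUE_VALUE : FALSE_VALUE);
    }

    public static void main(String[] args) {
        // Checking index 3 against an array of length 3, whose valid indices are 0 to 2.
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(3);
        stack.push(0);
        stack.push(2);
        bound(stack);
        System.out.println(stack.peek());   // prints 0 (false): index 3 is out of bounds
    }
}
```

For an array of length n, the bounds for an index are 0 and n - 1, which is why the length word has to be read before the check can be made.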
The genBoundsCheck method of the Code class can be used to generate code for checking that a value is within bounds. It assumes the value to check is already on the stack. If the bounds check succeeds, the value checked is left on the top of the stack; otherwise the stack machine interpreter halts with a StackMachine.OUT_OF_BOUNDS runtime error.
Additional stack machine instructions
Additional instructions (not in the Stack Machine handout) STORE_STACK and LOAD_STACK have been added to the Stack Machine. You do not have to use these instructions, however you may find them useful. For a precise description of their behaviour, refer to StackMachine.java.
STORE_STACK: The value of the second top of the stack is stored at the (absolute) address given by the stack pointer minus one (sp-1), minus the top of the stack. The two values on the stack are popped.
LOAD_STACK: The top of the stack is replaced with the contents of the memory location whose (absolute) address is given by the stack pointer minus one (sp-1), minus the top of the stack.
Tests
Some tests are available in the test-pgm directory (in a2.zip), and can be used to help you to debug your code. All of the tests can be run together using the Test_LALR configuration. You can also individually run a test using PL0_LALR on a selected PL0 program. The test cases of the form test-base*.pl0 are useful for regression testing, to make sure that you haven't broken any existing functionality in the compiler, and the other tests can help you find bugs in the new compiler features.
This assignment compiler is provided solely for the purposes of doing this assignment and your solutions must never be shared, especially publicly, even after completion of the course. Such publication would be considered both student misconduct and a breach of copyright.
If the assignment is submitted after the deadline, without an approved extension, a late penalty will apply. A penalty of 10% of the maximum possible mark for the assessment item will be deducted per calendar day (or part thereof), up to a maximum of seven (7) days. After seven days, no marks will be awarded for the item. A day is considered to be a 24 hour block from the assessment item due time. Negative marks will not be awarded.
Submission
Please keep the length of lines in your files below 100 characters, so that we can print them sensibly. You should avoid using tabs or set your tab stops to 4 spaces so that when we print them (with tab stops set to 4 spaces) they will print sensibly. Do not forget to remove any code generating debugging output and any rogue external imports before submission.
You must submit your completed assignment electronically through the assessment section of the course BlackBoard site (the BlackBoard Assessment page rather than the course web pages).
You need to submit the following list of individual files (not a .zip or any other form of archive file) for evaluation and marking. Note that file names are case-sensitive.
PL0.cup
ExpNode.java
ExpTransform.java
StaticChecker.java
CodeGenerator.java
You can submit your assignment multiple times, but only the last copy submitted will be retained for marking.
Assessment
The assignment is marked out of a total of 15 marks.
Marks will be allocated as follows:
3.5 - Syntax analysis and tree building
5.5 - Static semantics checking
6.0 - Code generation
Marks will be awarded for the correctness of the changes to each category. Readability and modular structure will also be criteria. For readability, we expect that you follow good software engineering practice, such as appropriate choices of variable names, consistent indentation, appropriate comments where needed, etc. For modularity we expect you to introduce new methods where it makes sense to help structure the program and to avoid unnecessary duplication of code. Use of generic Java utility interfaces (like Set, Map, List, Queue, ...) and their implementations (like HashSet, ..., TreeMap, ..., LinkedList, ...) is encouraged. We expect you to produce well structured programs that are not unnecessarily complex, both structurally (e.g. complex control flow that is hard to follow) and in terms of execution time and space requirements (e.g. an O(n) algorithm is preferred to an O(n^2) algorithm, and an O(log n) algorithm is even better).
We will not be concerned with the quality of syntactic error recovery because the parser generator CUP takes care of that for the most part, but you must handle semantic errors appropriately, including handling the situation when there is a syntax error, i.e., your compiler should not crash because of a syntax error.
Your assignment files will be compiled in the context of the remaining assignment files and put through a sequence of tests. The total mark for this assignment will be limited by the overall success of the development in the following way:
The program submitted does not compile: Maximum 8/15.
The program submitted will not correctly handle any test case with the new facilities: Maximum 10/15.
You are not required to correct any bugs that may exist in the original compiler. However, we would appreciate being informed of any such bugs that you detect, so that we can correct them, or any improvements to the compiler you might like to suggest.