Supervisor:
Frederic Blanqui, Jean-Francois Monin, Vania Jolobof
Location:
Tsinghua University, Formes Project, FIT building, 3-603
Background:
Master in Computer Science, specialisation in Formal methods
Expertise:
C, OCaml, Coq. A good knowledge of type theory and compilatio techniques would be appreciated.
Bibliography:
- The INRIA project COMPCERT: http://compcert.inria.fr/
- A.C.J Fox. Formal specification and verification of ARM6. In David Basin and Burkhart Woff, editors, TPHOLs ’03, volume 2758 of LNCS, pages 25-40. Springer-Verlag, 2003.
A simulator for embedded systems is based on a kernel which implements the runtime of a model of the CPU (ARM, PowerPC or MIPS) on which the embedded system is built. Given a machine language program for the considered CPU, this runtime loads a data structure representing the instructions in the program, and execute them on the model of the CPU, each instruction being represented by a particular piece of code. C is probably the best language to express a CPU model both easily and efficiently. We are especialy interested in the simulation of ARM and PowerPC machine code. Our goal is to develop a simulator, SIMSOC, which kernel and CPU model are all certified in Coq for both ARM and PowerPC.
This project is quite innovative: no certified simulator exists on the market and we are not aware of any similar project. When completed, we shall have at our disposal a certified simulator for both ARM and PowerPC. For practical use, the kernel will have to be complemented with input outputs generating traces at run-time for the user executing a particular binary code on the simulator.
Given an abstract processor model, the simulator’s kernel is made of three main components: a reader-loader of the bit machine code described in the ELF standard; a decoder of each machine instruction; an engine made of a set of C functions interpreting the machine instructions w.r.t. the processor model. We shall assume an abstract processor model.
The reader-loader transforms the input string into a data structure representing the machine code stored in the memory model from some given location specified in the string. The reader-loader needs to decode each instruction present in the input string: PowerPC, MIPS. This task is achieved by the decoder. Decoding a sequence of bits representing an intruction of the machine language is actually a cascade of two switches in practice, the first identifying the instruction class, the second the particular instruction in the class. As a consequence, a Caml decoder can be generated from a description of the syntatic format of the instructions.
Task 1: write a certified Caml reader-loader.
Task 2: write a certified Caml decoder-generator taking as input a description of the machine instructions syntax.
The same idea can be used to generate the C code associated with each binary instruction. The instructions of the machine language are based on a few basic operations, and all instructions in a given class are very similar to each other. This should allow to generate these functions -at least most of them- by a CamL program taking as input an SOS description of the meaning of the instructions of the machine language.
Task 3: write a certified Caml engine generator taking as input an SOS description of the machine instructions semantics. This assumes an SOS based language to specify the meaning of ARM/PowerPC instruction machine instructions.