ESP: Evolutionary Structured Programming

W. R. Pringle
Unisys Corporation, Weather Information Services
221 Gale Lane, Kennett Square, PA
wrp@kennet.paramax.com
wrp103@psu.edu

Abstract

Genetic Programming (GP) has demonstrated the ability to evolve programs to solve a wide variety of problems. Traditional GP concentrates on the creation of functions or Lisp S-expressions. Recent extensions of traditional GP have included automatic definition of functions (ADF), stack-based GP, strongly typed GP, as well as the introduction of memory and attempts at evolving algorithms rather than individual functions. This paper describes a proposed research effort to further extend the GP paradigm. In particular, a methodology is presented to introduce structured programming constructs, as well as a context-sensitive probability mechanism, which may enable GP to evolve modular programs that are closer to those developed by human programmers.

Keywords

Genetic Programming, evolutionary programming, structured programming

I. Introduction

The term Evolutionary Programming refers to a group of methodologies in which programs evolve over time. John Holland invented the Genetic Algorithm (GA), which demonstrated that the process of natural selection observed in nature could be simulated in computational systems to solve a variety of problems (Holland 1975). Genetic Programming (GP) is an extension of the Genetic Algorithm introduced by John Koza (Koza 1992) which evolves a single function in the form of Lisp S-expressions. Recent extensions to traditional GP have included: automatic definition of functions (ADF), which evolves a main program and some number of functions (Koza 1994); Adaptive Representation (AR), which attempts to identify useful code segments (Rosca, Ballard 1994); stack-based GP, which uses a stack to evaluate expressions rather than Lisp (Perkis 1994) (Openshaw, Turton 1995) (Wawra 1995); strongly typed GP, which includes function prototypes and forces function arguments to match the prototypes (Montana 1993); and the introduction of memory to evolve algorithms rather than individual functions (Teller 1994).

Evolutionary Structured Programming (ESP) is an extension of GP in which complete structured programs are evolved. Section 2 contains the necessary background for an understanding of GA and GP. Section 3 introduces the basic structure of ESP, while Section 4 describes the dynamic probability mechanism. Section 5 suggests how ESP can be used to evolve structured programs.

It is important to stress that ESP is a concept; it does not yet exist. Some preliminary design work has been done. The purpose of this paper is to explore the feasibility of such an approach.

II. Genetic Algorithms and Genetic Programming

Genetic Algorithms and Genetic Programming use natural selection to cause a gradual evolution of the desired solution to a problem. The set of solutions to a GA problem is usually represented by fixed-length binary strings. GP, on the other hand, evolves functions, usually in the form of Lisp S-expressions. An initial population is created randomly. Successive generations are formed by reproduction until a solution is found, or some specified number of generations have been created. Reproduction is performed by one of three methods: duplication, cross-over, and mutation. Duplication means that a given individual is simply copied into the next generation. For cross-over, random portions of two parent individuals are copied and joined to form a new individual. Mutation is similar to duplication, except that some portion of the new individual is changed randomly.

A fitness function is used to rank each individual on how closely it satisfies the conditions of the problem. The closer the fit, the more likely that individual will be chosen to create offspring. Initially, fitness functions were written to be independent of the population, often as "expert" functions. Recently, tournament selection, as introduced by (Angeline, Pollack 1993), has been used extensively.
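Although ESP itself has not been implemented, the reproduction cycle just described is easy to sketch. The C fragment below evolves fixed-length bit strings toward a deliberately trivial target (maximising the number of ones); the toy problem, the population size, the rates for duplication, cross-over, and mutation, and the simple two-way fitness-biased selection are all assumptions made for illustration, not part of the ESP proposal.

/* Minimal sketch of a generational GA over fixed-length bit strings.
 * Problem, parameters, and helper names are illustrative assumptions only. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define POP_SIZE 50
#define GENOME_LEN 32
#define GENERATIONS 100

typedef struct { int bits[GENOME_LEN]; } Individual;

/* Fitness: how closely the individual satisfies the (toy) problem. */
static int fitness(const Individual *ind) {
    int i, count = 0;
    for (i = 0; i < GENOME_LEN; i++) count += ind->bits[i];
    return count;
}

/* Fitness-biased selection: pick two at random, keep the fitter one. */
static const Individual *select_parent(const Individual *pop) {
    const Individual *a = &pop[rand() % POP_SIZE];
    const Individual *b = &pop[rand() % POP_SIZE];
    return fitness(a) >= fitness(b) ? a : b;
}

int main(void) {
    static Individual pop[POP_SIZE], next[POP_SIZE];
    int g, i, j;
    srand((unsigned)time(NULL));

    /* Random initial population. */
    for (i = 0; i < POP_SIZE; i++)
        for (j = 0; j < GENOME_LEN; j++)
            pop[i].bits[j] = rand() % 2;

    for (g = 0; g < GENERATIONS; g++) {
        for (i = 0; i < POP_SIZE; i++) {
            const Individual *mom = select_parent(pop);
            int op = rand() % 10;
            next[i] = *mom;               /* start as a copy of the first parent */
            if (op < 7) {                 /* cross-over: splice in a random
                                             portion of a second parent          */
                const Individual *dad = select_parent(pop);
                int cut = rand() % GENOME_LEN;
                for (j = cut; j < GENOME_LEN; j++) next[i].bits[j] = dad->bits[j];
            } else if (op < 9) {          /* mutation: flip one random bit       */
                next[i].bits[rand() % GENOME_LEN] ^= 1;
            }                             /* otherwise: plain duplication        */
        }
        for (i = 0; i < POP_SIZE; i++) pop[i] = next[i];
    }

    /* Report the best individual found. */
    {
        int best = 0;
        for (i = 1; i < POP_SIZE; i++)
            if (fitness(&pop[i]) > fitness(&pop[best])) best = i;
        printf("best fitness after %d generations: %d/%d\n",
               GENERATIONS, fitness(&pop[best]), GENOME_LEN);
    }
    return 0;
}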
It is important to realize that GA and GP are not random searches of the solution space. Nor do they explore the entire solution space. At the completion of a run, there is no way of guaranteeing that the best solution has been found. In some cases of premature convergence, the solution won't even be acceptable. However, under the right conditions, the process will discover interesting and acceptable solutions to the problem at hand. For further information, the reader is invited to review a number of excellent tutorials on Genetic Algorithms, including (Beasley, Bull, Martin 1993) (Whitley 1993) (Amos 1995) (Koza 1995).

The GA paradigm has been used to develop solutions to a wide range of problems. However, in order to use GA on a problem, one must be able to represent the general form of the potential solution. This is fine for problems in which this can be done (e.g., neural nets, simultaneous equations, etc.), but in some cases, determining the overall form of the solution is the first difficult part of the problem. This is where GP can help. GP obviates the necessity to predict the overall form of the solution. All that is needed in preparation is a fitness function and a list of potentially useful functions, variables, and constants. For example, when dealing with logic problems, potentially useful functions might include AND, OR, NOT, and XOR. These primitive functions are implemented as part of the GP environment. Input variables are also defined as required. For example, if solving a 5-bit parity problem, we might define variables Bit1 through Bit5. (A small sketch of how such an expression would be evaluated appears at the end of this section.)

However, GP has some shortcomings as well. In particular, GP evolves functions rather than complete programs. Teller uses indexed memory and iterative function calls to provide the capability of evolving algorithms (Teller 1995); however, this application is currently limited to pattern recognition and produces non-structured programs. Another problem with GP is the difficulty of interpreting results. It is not uncommon for the optimal solution to contain sections of code that do nothing useful. It is very difficult to interpret the results of stack-based GP because the functions are usually written to ignore errors (such as stack underflow), and sometimes items are placed on the stack and never used. Once an optimal solution has evolved, some effort remains in simplifying the result to produce an equivalent solution that is easier to understand.
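Before turning to ESP, the preparation step described above can be made concrete. The sketch below (written in C purely for illustration; GP systems of this kind are traditionally hosted in Lisp) evaluates one hand-written S-expression for the 5-bit even parity problem over the primitives AND, OR, NOT, and XOR and the terminals Bit1 through Bit5. In an actual GP run the tree would be created and recombined by evolution rather than written by hand, and all names in the sketch are assumptions.

/* Illustrative only: evaluating a hand-written S-expression over the
 * primitives AND, OR, NOT, XOR and the terminals Bit1..Bit5. */
#include <stdio.h>

typedef enum { OP_AND, OP_OR, OP_NOT, OP_XOR, OP_BIT } Op;

typedef struct Node {
    Op op;
    int bit;                          /* terminal: index into the bit array (OP_BIT) */
    const struct Node *left, *right;
} Node;

static int eval(const Node *n, const int bits[5]) {
    switch (n->op) {
    case OP_BIT: return bits[n->bit];
    case OP_NOT: return !eval(n->left, bits);
    case OP_AND: return eval(n->left, bits) && eval(n->right, bits);
    case OP_OR:  return eval(n->left, bits) || eval(n->right, bits);
    case OP_XOR: return eval(n->left, bits) ^  eval(n->right, bits);
    }
    return 0;
}

int main(void) {
    /* Terminals Bit1..Bit5 (stored as bits[0..4]). */
    static const Node b1 = {OP_BIT, 0}, b2 = {OP_BIT, 1}, b3 = {OP_BIT, 2},
                      b4 = {OP_BIT, 3}, b5 = {OP_BIT, 4};
    /* (NOT (XOR (XOR (XOR (XOR Bit1 Bit2) Bit3) Bit4) Bit5)):
       returns 1 exactly when the five bits have even parity. */
    static const Node x1 = {OP_XOR, 0, &b1, &b2};
    static const Node x2 = {OP_XOR, 0, &x1, &b3};
    static const Node x3 = {OP_XOR, 0, &x2, &b4};
    static const Node x4 = {OP_XOR, 0, &x3, &b5};
    static const Node root = {OP_NOT, 0, &x4};

    int bits[5] = {1, 0, 1, 1, 0};                    /* three ones: odd parity */
    printf("even parity? %d\n", eval(&root, bits));   /* prints 0 */
    return 0;
}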
III. ESP - Evolutionary Structured Programming

ESP attempts to extend GP so that complete, modular, structured programs can evolve. In order to do this, a number of structured programming constructs are defined. Random construction, cross-over, and mutation are performed with a knowledge of these constructs. This is similar to the Smart Cross-Over and Smart Mutation operations suggested by (Teller 1995). Details of how this occurs are explained later.

A. ESP Structured Programming Constructs

The following constructs are defined within ESP:

STATEMENT
This basic component is the equivalent of a line of source code in many languages. It can be an assignment statement, a subroutine call, or any of the following structured programming constructs.

BLOCK
A statement block consists of some number of statements in sequential order.

IF boolean block
This statement will execute the statements in block if the boolean expression evaluates to a true (non-zero) value.

IFELSE boolean block1 block2
If the expression boolean is non-zero, then block1 is executed; otherwise block2 is executed.

CASE casevar range1 block1 ... rangen blockn
This statement consists of a scalar variable, casevar, and a series of pairs, each consisting of a range of values for casevar and a block of code. Depending on the current value of casevar, at most one block will be executed.

WHILE boolean block
The set of statements defined by block is executed as long as the boolean expression is non-zero.

LOOPTHRU array loopvar block
This statement causes an iterative loop through block, with loopvar ranging through all valid subscript values of array.

IFCONTINUE boolean
This statement can only appear within loops. If the value of boolean is non-zero, then the remainder of the loop statements are ignored, and the loop is restarted. If within a LOOPTHRU loop, loopvar is incremented.

IFBREAK boolean
This statement can only appear within loops. If the value of boolean is non-zero, then the loop is terminated, and execution resumes at the statement after the loop. If within a LOOPTHRU loop, the current value of loopvar is preserved.

FUNCTION type name mode1 type1 argument1 ... moden typen argumentn
This statement defines the function name, which returns data of type type and takes the specified arguments. In addition to a data type, each argument is encoded with a mode: input, output, or input/output.

Data types are integer, floating point, character, structure, and void. Data elements can be constants or variables. The void data type is only used for function types, indicating that the function returns no meaningful data (i.e., a subroutine). Void functions can't appear in assignment statements. Structures are a collection of primitive data types.

The user may define a set of global variables and constants. Typically this would consist of the input data, output data, and other useful data. The user may also define a set of functions that are part of the ESP environment and which may be used when generating the initial population as well as during mutation operations.

Block statements can be comprised of any of the above statement types, including other block statements. Boolean statements include relational operators (e.g., greater than, not equal to, etc.) and can be combined with the logical and and or operators. Functions can contain local variables, which are either static (retaining their values between function calls) or dynamic. They can also contain constants. It should be obvious that this language is Turing complete and is capable of solving any problem that traditional GP can solve.
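Since ESP does not yet exist, no internal representation for these constructs has been fixed. One plausible encoding, shown below as a C sketch, is a small abstract syntax tree in which each statement is a tagged union and an individual is a list of functions plus a main block. Every name and field in this sketch is an assumption made for illustration, not part of the ESP definition.

/* One possible (hypothetical) in-memory encoding of the ESP constructs.
 * Variables and functions are referred to by symbol-table index. */
typedef enum {
    STMT_ASSIGN, STMT_CALL, STMT_IF, STMT_IFELSE, STMT_CASE,
    STMT_WHILE, STMT_LOOPTHRU, STMT_IFCONTINUE, STMT_IFBREAK
} StmtKind;

typedef struct Expr Expr;          /* arithmetic, relational, and/or expressions */
typedef struct Stmt Stmt;

typedef struct {                   /* BLOCK: statements in sequential order */
    Stmt **stmts;
    int    count;
} Block;

struct Stmt {
    StmtKind kind;
    union {
        struct { int var; Expr *value; }                 assign;   /* var = value       */
        struct { int func; Expr **args; int nargs; }     call;     /* subroutine call   */
        struct { Expr *cond; Block then_blk; }           if_s;     /* IF                */
        struct { Expr *cond; Block then_blk, else_blk; } ifelse;   /* IFELSE            */
        struct { int casevar; Expr **ranges; Block *blocks; int npairs; } case_s;
        struct { Expr *cond; Block body; }               while_s;  /* WHILE             */
        struct { int array, loopvar; Block body; }       loopthru; /* LOOPTHRU          */
        struct { Expr *cond; }                           loopctl;  /* IFCONTINUE, IFBREAK */
    } u;
};

typedef enum { MODE_IN, MODE_OUT, MODE_INOUT } ArgMode;
typedef enum { TY_INT, TY_FLOAT, TY_CHAR, TY_STRUCT, TY_VOID } DataType;

typedef struct {                   /* FUNCTION: return type, arguments, body */
    DataType  return_type;
    ArgMode  *arg_modes;
    DataType *arg_types;
    int       nargs;
    Block     body;
} Function;

typedef struct {                   /* an ESP individual: functions plus a main program */
    Function *functions;
    int       nfunctions;
    Block     main_program;
} Individual;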
The program in Figure 1 could be used to solve the even parity problem for any number of bits. Assume that the array Input contains the values of the bits (one or zero), and that RetVal is to contain zero if the bits conform to even parity, and one otherwise.

RetVal = 0
LoopThru Input x
    RetVal = RetVal + Input[x]
RetVal = mod(RetVal, 2)

Figure 1 - Generic even parity program

When the program terminates, the variable RetVal contains zero if there are an even number of ones in the array Input, and one if there are an odd number. By using cross-over and mutation as other GA and GP programs do, ESP should be able to evolve the above program. However, humans don't program by making random changes to their programs. There are a number of "rules of thumb" that experienced programmers often develop to aid them in writing new programs. By dynamically altering reproduction probabilities at certain points, we might be able to get programs to evolve in a manner similar to those written by a human programmer.

IV. Dynamic Probability

GA and GP have demonstrated that random changes combined with fitness evaluation can evolve solutions to a wide variety of problems. The selection of parents for mating is done by biasing the probabilities so that the more fit individuals breed more often. However, when selecting portions of the parents to use in reproduction, the probability that one spot will be selected is often about the same as that of any other spot. Likewise, mutation may occur at any location, and the probability that a specific value will be chosen as the new value is the same as that of any other value.

By considering how human-written programs are organized, it is possible to change the probability functions used to select items. For example, when in a LoopThru statement, the probability that array[loopvar+c] (where c is a small integer value, usually zero) appears in a statement should be greater than when outside the loop. When inside a function, the probability should be high that output (or in/out) arguments will be modified. By modifying the probabilities of various items being selected depending on the current context, the probability increases that the resulting program is not only valid but useful.

In addition to context-sensitive variation, the probability function would also change based on whether or not certain data values have already been referenced. For example, when building a function, the probability that any given function argument will be chosen would increase until that argument is referenced. Once referenced, the probability of that argument being chosen would be the same as that of the other arguments that have been referenced.

V. Using ESP to Evolve Programs

A. Initial Random Population

The initial population is generated randomly. Each individual will consist of some random number of functions (possibly zero) and a main program. The functions defined for this individual, as well as any global variables and constants, may appear at any place within the individual. The main program and local functions each consist of a single statement block.

A random number is generated to determine the length of each statement block. This length indicates the number of top-level statements. Statement types are selected randomly. Depending on the type of statement, additional (lower-level) statement blocks may be generated randomly. Also, depending on the type of statement, the group of possible elements and the probabilities associated with each element will change. For example, if a LoopThru statement is to be generated, an array would be selected at random from local and global arrays. The probable order would favor arrays passed as function arguments, local arrays, and global arrays, in that order. Next, a scalar variable would be selected at random. The probable order would highly favor a local variable, but could also select an output or in/out function argument, or a global variable. Next, a statement block would be randomly generated. While constructing this block, the probability that the array and loop variable are used will be increased until each is first referenced.
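One way such a context-sensitive mechanism might be realized (again, purely a sketch, since no implementation exists) is a weighted random draw in which each candidate element carries a base weight that the current generation context scales up or down. In the C fragment below, the scope categories, the weight values, and the representation of the context are all illustrative assumptions.

/* Sketch of context-sensitive selection: each candidate carries a base
 * weight, the current context scales it, and a weighted draw picks one. */
#include <stdio.h>
#include <stdlib.h>

typedef enum { SCOPE_GLOBAL, SCOPE_LOCAL, SCOPE_ARG_OUT, SCOPE_ARG_IN } Scope;

typedef struct {
    const char *name;
    Scope       scope;
    int         referenced;   /* has this element been used yet?           */
} Element;

typedef struct {
    int inside_loopthru;      /* are we generating code inside a LOOPTHRU? */
    int loopvar;              /* candidate index of the loop variable      */
} Context;

/* Context-dependent weight for one candidate element. */
static double weight(const Element *e, int index, const Context *ctx) {
    double w = 1.0;
    if (e->scope == SCOPE_LOCAL)   w *= 3.0;   /* favor locals                     */
    if (e->scope == SCOPE_ARG_OUT) w *= 2.0;   /* favor output / in-out arguments  */
    if (!e->referenced)            w *= 4.0;   /* boost until first referenced     */
    if (ctx->inside_loopthru && index == ctx->loopvar)
        w *= 5.0;                              /* strongly favor the loop variable */
    return w;
}

/* Weighted random draw over the candidate list. */
static int pick(const Element *cands, int n, const Context *ctx) {
    double total = 0.0, r;
    int i;
    for (i = 0; i < n; i++) total += weight(&cands[i], i, ctx);
    r = ((double)rand() / RAND_MAX) * total;
    for (i = 0; i < n; i++) {
        r -= weight(&cands[i], i, ctx);
        if (r <= 0.0) return i;
    }
    return n - 1;              /* guard against rounding */
}

int main(void) {
    Element cands[] = {
        { "GlobalTotal", SCOPE_GLOBAL,  1 },
        { "x",           SCOPE_LOCAL,   0 },   /* the LOOPTHRU loop variable */
        { "OutArg",      SCOPE_ARG_OUT, 0 },
    };
    Context ctx = { 1, 1 };    /* inside a LOOPTHRU whose loop variable is candidate 1 */
    srand(42);
    printf("chose: %s\n", cands[pick(cands, 3, &ctx)].name);
    return 0;
}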
B. Cross-Over

Cross-over is done by selecting two groups of statements, one within each parent. Children are created by exchanging the two groups of statements. When selecting statements from an individual, the two end statements would come from the same hierarchical level. This process is shown in Figure 2.

Randomly select a statement at the top level of one parent.
If the chosen statement contains lower-level statements, randomly decide whether to descend another level and pick a statement. Repeat as needed.
Once the first statement has been chosen, randomly pick a second statement at that level.
Repeat the above process for the other parent.
Generate two children by swapping the set of statements from each parent.

Figure 2 - Cross-over algorithm

Once cross-over occurs, the newly inserted statements must be checked for syntax errors. For example, the new code in Child A might refer to a function that was created in Parent B but doesn't exist in Child A. In cases such as this, the function call could be changed to call another valid function in Child A, the referenced function could be copied from Parent B, the function could be copied into a global library, or the statement could be eliminated.

C. Mutation

Mutation must be careful not to introduce invalid statements. Mutation would usually change elements to related elements, as proposed by (Teller 1995). For example, "<" might be changed to ">", "<=", etc. more often than to arithmetic operators such as "+", "-", etc. Mutation might also insert new statements or delete existing statements.

D. Timing Considerations

It is possible to evolve programs that contain infinite loops. Because of this, a maximum execution time is defined for program runs. If a program hasn't finished before that time, it will be terminated. Regardless of how it was stopped, it will be evaluated with the fitness function. While execution time would be a good measurement to use as part of the fitness function, it must be used in such a way that short but useless programs are not considered better than long, useful programs.

E. Interpretation of Results

As previously mentioned, once a candidate solution has evolved, the next step is to interpret the results and simplify the leading candidates. Due to the structured nature of ESP, it is hoped that evolved solutions will be inherently easier to interpret. It would also be fairly easy to translate ESP programs into C or some other structured language, and allow an optimizing compiler to simplify the evolved program.
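As a hand-done illustration of such a translation (no automatic ESP-to-C translator exists), the Figure 1 program might map onto C roughly as follows; the function name and the explicit length parameter are assumptions introduced for the example.

/* A hand translation of the Figure 1 ESP program into C (illustrative only). */
int even_parity(const int Input[], int length) {
    int RetVal = 0;
    int x;
    for (x = 0; x < length; x++) {      /* LoopThru Input x        */
        RetVal = RetVal + Input[x];
    }
    RetVal = RetVal % 2;                /* RetVal = mod(RetVal, 2) */
    return RetVal;                      /* zero if even parity, one otherwise */
}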
VI. Conclusion

This paper has outlined an approach that should be capable of evolving structured programs. A language has been defined that enforces a structured programming style and is Turing complete. An approach to cross-over and mutation has been presented that reduces the probability of invalid syntax during evolution. A dynamic probability mechanism has been presented that should help programs evolve in a manner closer to those written by humans. Although considerable detail has yet to be resolved, ESP is nevertheless a feasible approach.

VII. References

Angeline, P., Pollack, J. (1993) Competitive Environments Evolve Better Solutions for Complex Tasks. Genetic Algorithms: Proceedings of the Fifth International Conference (GA93). (*)

Amos, M. (1995) An Introduction to Natural Computation (preliminary draft). (*)

Beasley, D., Bull, D. R., Martin, R. R. (1993) An Overview of Genetic Algorithms: Part 1, Fundamentals. University of Cardiff. (*)

Holland, J. H. (1975) Adaptation in Natural and Artificial Systems. University of Michigan Press. Revised second edition 1992 from the MIT Press, Cambridge, MA.

Koza, J. R. (1992) Genetic Programming: On the Programming of Computers by Means of Natural Selection. The MIT Press, Cambridge, MA.

Koza, J. R. (1994) Hierarchical Automatic Function Definition in Genetic Programming. Stanford, CA. (*)

Koza, J. R. (1995) Home page: http://www-cs-faculty.stanford.edu/~koza

Montana, D. J. (1993) Strongly Typed Genetic Programming. BBN Technical Report. (*)

Openshaw, S., Turton, I. (1995) Building New Spatial Interaction Models Using Genetic Programming. University of Leeds, UK. (*)

Perkis, T. (1994) Stack-Based Genetic Programming. IEEE World Congress on Computational Intelligence. (*)

Rosca, J., Ballard, D. (1994) Genetic Programming with Adaptive Representations. University of Rochester. (*)

Teller, A. (1994) Genetic Programming, Indexed Memory, the Halting Problem, and Other Curiosities. Carnegie Mellon University. (*)

Teller, A. (1995) PADO: A New Learning Architecture for Object Recognition. Carnegie Mellon University. (*)

Wawra, S. (1995) An Investigation of Stack Based Genetic Programming (not yet published).

Whitley, D. (1993) A Genetic Algorithm Tutorial. Colorado State University. (*)

(*) Available from a number of GA and GP ftp sites.

VIII. Acknowledgments

The author gratefully acknowledges the assistance of numerous members of the genetic programming mailing list, as well as the GA-PHD mailing list, for their help in obtaining references to published works and for sharing information about their current (and in some cases unpublished) work. Teller, Turton, and Wawra were especially generous with their time and comments.