Tuesday, May 30, 2017

A Look into the Future - Source Code Generation by the Bots

Introduction

Software development technologies have evolved to meet growing demand and to keep pace with the capabilities of hardware and methodologies. Automation plays an increasingly significant role in software development by reducing manual labor. Examples of software development automation include build managers, static code analyzers, automated tests, and so on.
At the same time, research in artificial intelligence and robotics shows significant progress. The family of software known as “bots” is growing rapidly. Bots implement predefined logic, and their power is limited in practice only by the hardware. Modern software development technologies can be applied to build bots that, in turn, perform software development tasks.
This article describes a general concept for creating Software Development Bots and presents it from the perspective of modern software development approaches. The description is accompanied by two practical examples that demonstrate the idea.
The purpose of this article is to inspire readers to study modern programming techniques and apply them in real life. The author presents only a small idea here and hopes that readers will help improve the article with their feedback.

How to Read this Article

For convenience, the reader should pay attention to the two main sections of the article:
  • "The Idea" contains the theoretical description
  • "The Bot" describes the practical realization of two Software Development Bots
Thus, a reader who wants to see how the bots work by running the examples can read the chapter "The Bot" first. The reader can also look at the chapter "Contents" to find the section he/she wants.

Abbreviations

ATDCG   Automated Test-Driven Code Generation
SDB     Software Development Bot
TDCG    Test-Driven Code Generation
TDD     Test-Driven Development

Contents

The Idea

Evolution of Software Development

The initial question that inspired the author to research SDBs was: "How are software development technologies evolving?" Understanding the trends in the evolution of software development technologies is important because this knowledge helps create a vision of the future. The previous general stages of programming paradigms (programming techniques) can be briefly described as follows:
  1. Sequence of commands. The early stage of the programming era, when hardware was weak and a single programmer developed a single program. Programs were written sequentially, in an unstructured way.
  2. Structured programming. As the scope of the problems to be solved grew, so did the volume of code and the need to manage it better. Code was organized into block structures.
  3. Object-Oriented Programming. Code block structures grew and were finally combined into a new kind of structure named Objects. The number and complexity of relations between code blocks also grew. New types of relations specific to objects were introduced (encapsulation, inheritance, polymorphism, access levels, and so on). The programming model reflected the features of a complex system in which standalone units have state and can perform specific operations within one common environment, and it moved closer to the actors and processes of real life.
The questions are: what is the next step, and in what direction are programming methods evolving?

Simplified TDD Procedure

TDD is an evolutionary approach to development based on test-first development, where a test is written before the production code is developed or refactored. TDD is not merely a programming technique but both an important agile requirements technique and an agile design technique. The goal of TDD is to write clean code that works.
TDD is a convenient way to explain the main idea of the article. Taking TDD as the basis for the further explanation gives a clear path to describing the methodology of TDCG.
The full TDD process contains phases in which code is refactored while the tests assure that the code still operates correctly. When only new code needs to be created and no refactoring is expected, the TDD procedure can be simplified. The simplified TDD procedure considered here has the following phases: creating the tests first, developing the software unit, and, if necessary, repeating the development tasks until the code passes the tests.
The following diagram presents a possible simplified TDD process.
[Figure: Test Driven Development]
Almost all sources that describe TDD tasks note that some processes can be automated, which reduces manual work. One important characteristic should be emphasized: automated tasks are run by proven algorithms and are stable, whereas manually performed tasks can introduce errors. In other words, the developers of the tests and the software are responsible for the quality of their work. Thus, the main tasks of the simplified TDD procedure can be described from the point of view of responsibility:
  • Creating the tests is the key task that can affect the quality of the entire software unit being developed.
  • Running the tests is a proven automatic process that does not itself introduce errors (in practice, we assume so).
  • Developing the software is performed by the programmer, but the tests assure the quality of his/her work, which reduces the programmer's responsibility for it.
Although the article considers reducing the programmer's responsibility for his/her work, this means only that the quality of the work is assured by the tests alone. Other quality indicators, such as code readability, development time, and code performance, are not considered in this article, to keep the explanation simple.

Test-Driven Code Generation

Raphael Marvie describes the Test-Driven Code Generation (TDCG) technique in his article "An Introduction to Test-Driven Code Generation" [Marvie. 2006]. He describes TDCG in the context of the TDD process.
This article considers automatic code generation as a separate technique that is not a part of TDD. However, the technique is closely related to TDD and to other information technologies. Since the automated code generation considered here is also based on previously written tests, it can be named Automated Test-Driven Code Generation (ATDCG).
The reasoning above shows that in TDD only running the tests is automated, while creating the tests and developing the software are manual tasks (strictly speaking, these tasks can be, and are, automated partially, but not completely). Because the quality of the software development tasks (passing the tests) is controlled automatically, the development itself can also be completely automated.
The role of humans can then be reduced to creating the tests. From the business perspective, this means that humans will create and describe requirements for the software units, and the units will be created automatically according to these requirements.
Thus, automatic software development can be presented as an evolution of the simplified TDD process considered above.
[Figure: Test Driven Development / 2 Components]

Automated Testing

Classical approaches to automated software testing use a characteristic named “code coverage”. Code coverage is a measure of the degree to which the source code of a program is executed when a particular test suite runs. A low code coverage value means that there is code that is not tested automatically.
The ATDCG technique can create code that is fully covered by the tests, because the code is created from scratch. In other words, it has 100% code coverage. Thus, although code coverage is a good indicator for the classical testing approach, it cannot be used to measure success for ATDCG.
A new indicator that measures how well the tests cover the actual business problem should be introduced. This moves the measurement of success from the sphere of software development to system/business analysis.
This measure is not discussed in this article, to keep it easy to read and focused on the subject. It should be emphasized, however, that there are issues here that require further research.

Role of Tests in ATDCG

Creating tests is the only task performed manually in ATDCG. The tests are the main driver of the automatic software unit creation process and, at the same time, assure the quality of the process. Thus, the tests in ATDCG have the following characteristics:
  • The entire ATDCG process and the quality of the result depend on how the tests were created
  • Creating the tests is the major and most sensitive task in ATDCG
  • Creating the tests is more closely related to the business requirements, and further from writing software code, than in the classical approach
  • There is a need for research into, and development of, a methodology and rules for creating tests for ATDCG

Machine Learning

As mentioned, the quality of units developed by ATDCG depends completely on the tests. Creating the tests, in turn, relates more to the business requirements than to programming. Most likely, the tests will contain a large number of rules. From another point of view, the tests will be described in a declarative way and will contain or refer to a large amount of data.
Because the test creation process relates to the business requirements, these data should have a level of abstraction high enough to give the test creator a convenient way to develop tests based on the business requirements. A high abstraction level and a large amount of data for test design will lead to increased complexity compared to the classical testing approach.
Thus, creating the tests and running ATDCG can be described as:
  • Creating tests in declarative form, taking into account a quantity of real physical data (for example, indicators of the business process)
  • Treating the created tests, or a part of them, as training data for the automatic creation of the software unit
These observations show a strong relation between ATDCG and Supervised Machine Learning. Therefore, the theory and techniques of Supervised Machine Learning can be used in ATDCG.

Social Impact

ATDCG implies automating software development tasks and freeing programmers from this role. At the same time, the role of the analyst and test creator becomes the key to running ATDCG successfully. But ATDCG itself has to be developed. Moreover, the development of ATDCG clearly follows two rules of Unix programming [Raymond. 2003]:
  • Rule of Economy: Programmer time is expensive; conserve it in preference to machine time
  • Rule of Generation: Avoid hand-hacking; write programs to write programs when you can
Therefore, ATDCG can be considered a software development technology that opens additional opportunities rather than imposing a limitation or causing an upheaval. At the same time, it also creates opportunities and demand for further research on the matter.

The Bot

A bot is a "software robot" (Wikipedia). This section describes the creation of two SDBs:
  • Bot 1 - The simplest example, which demonstrates the key concept of the SDB. The performance of this bot does not allow a real result to be obtained in a reasonable time
  • Bot 2 - A bot with improved performance that produces results in 5-10 minutes (Intel I7 CPU)

The Problem

This article tries to give the simplest possible explanation in order to present the general concept of the technique. In this context, the minimal software unit can be just a simple function consisting of one line of code in a high-level language.
Both examples aim to create a simple function of the form f(a,b) in C# based on training data. The test checks that the output value of the function matches the expected value for the input in each case.

Training Data

Input arguments (a and b) and output values (f) are presented in the following table:
a     b     f
38    2     1463
4     11    80
51    7     2645
12    32    313
6     21    150
20    38    599
25    43    849
3     19    113
40    34    1779
In fact, these data were built using the polynomial function f = a² + 5b + 9. This makes it possible to check that the created SDBs operate correctly. In the real world, the objective function behind the tests is most likely unknown and can be very complex.
The bot should find a function that is equivalent to this polynomial. The result, however, can be any combination of its summands. See the chapter Results, which shows that the bot discovers several functions that reproduce the training data.
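The training data and the test described above can be written down as a short C# sketch. It is an illustration only and is not taken from the downloadable bots; the names TrainingData, Rows, Polynomial, and PassesAll are hypothetical.

```csharp
using System;

public static class TrainingData
{
    // Rows of the training table in the order { a, b, f }.
    public static readonly int[][] Rows =
    {
        new[] { 38,  2, 1463 },
        new[] {  4, 11,   80 },
        new[] { 51,  7, 2645 },
        new[] { 12, 32,  313 },
        new[] {  6, 21,  150 },
        new[] { 20, 38,  599 },
        new[] { 25, 43,  849 },
        new[] {  3, 19,  113 },
        new[] { 40, 34, 1779 },
    };

    // The polynomial that the article says was used to build the table.
    public static int Polynomial(int a, int b)
    {
        return a * a + 5 * b + 9;
    }

    // The test: a candidate passes only if it reproduces f for every row.
    public static bool PassesAll(Func<int, int, int> candidate)
    {
        foreach (int[] row in Rows)
        {
            if (candidate(row[0], row[1]) != row[2])
            {
                return false;
            }
        }
        return true;
    }
}
```

With these definitions, TrainingData.PassesAll(TrainingData.Polynomial) returns true, while a candidate such as (a, b) => a + b is rejected on the first row. Any reordering of the summands, for example (a, b) => 9 + b * 5 + a * a, passes as well, which is why a bot may report several different functions.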

Bot 1

As mentioned above, Bot 1 is the simplest example, which demonstrates the key concept of the SDB. Its performance does not allow a real result to be obtained in a reasonable time, but its code is the simplest among the possible solutions.
The bot creates a function of the form f(a,b) in C# that looks like the following snippet:
namespace Program
{
    public class WorkingClass
    {
        public static int F(int a, int b)
        {
            return XXXXXXXXXXXXXXXX;
        }
    }
}
This snippet contains a code inclusion that implements the function formula at the place marked XXXXXXXXXXXXXXXX. The bot has to find this line of code. All the other text is known and forms a complete C# program that can be compiled into a library.
The code inclusion is built from an alphabet consisting of four arithmetic operations (-+*/), two variables (a and b), and nine digits (123456789).
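Building a candidate from this alphabet can be sketched as follows. This is a minimal illustration, not the code of Bot 1: the names CandidateBuilder, RandomExpression, and BuildSource are hypothetical, and using a candidate exactly as long as the placeholder is an assumption.

```csharp
using System;
using System.Text;

public static class CandidateBuilder
{
    // Alphabet from the article: four operations, two variables, nine digits.
    public const string Alphabet = "-+*/ab123456789";

    // Placeholder that marks where the generated expression goes.
    public const string Placeholder = "XXXXXXXXXXXXXXXX";

    // The fixed part of the program, reproduced from the snippet above.
    public const string Template =
        "namespace Program\n{\n    public class WorkingClass\n    {\n" +
        "        public static int F(int a, int b)\n        {\n" +
        "            return " + Placeholder + ";\n" +
        "        }\n    }\n}\n";

    // Picks `length` symbols from the alphabet uniformly at random.
    public static string RandomExpression(Random rng, int length)
    {
        var builder = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            builder.Append(Alphabet[rng.Next(Alphabet.Length)]);
        }
        return builder.ToString();
    }

    // Inserts the expression into the template in place of the placeholder.
    public static string BuildSource(string expression)
    {
        return Template.Replace(Placeholder, expression);
    }
}
```

Most strings produced this way, such as one starting with "*/", are not valid C# expressions, so a bot built on this idea has to rely on the compiler to reject them.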
The algorithm implemented by the bot is simple as well:
  • Step 1. Build the code by inserting symbols randomly selected from the alphabet into the place XXXXXXXXXXXXXXXX
  • Step 2. Compile the code in RAM. If compilation fails, go to Step 1
  • Step 3. Run the function on the training dataset
  • Step 4. Compare the actual function output with the results from the training dataset. Go to Step 1 if the values differ for any case
  • Step 5. Print the found function
The algorithm was implemented in C# using reflection and dynamic code generation.
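Steps 2 to 4 can be sketched as follows. This is an illustration and not the author's code: it assumes the CodeDOM compiler (CSharpCodeProvider) of the .NET Framework, which matches the environment listed for the examples but may differ from the mechanism the bots actually use. The names CandidateTester, Compile, and Passes are hypothetical. On .NET Core and later versions, CompileAssemblyFromSource is not supported, so the sketch targets the .NET Framework only.

```csharp
using System;
using System.CodeDom.Compiler;
using System.Reflection;
using Microsoft.CSharp;

public static class CandidateTester
{
    // Step 2: compile the source and load the result into memory.
    // Returns the method Program.WorkingClass.F, or null if compilation fails.
    public static MethodInfo Compile(string source)
    {
        using (var provider = new CSharpCodeProvider())
        {
            var options = new CompilerParameters
            {
                GenerateInMemory = true,
                GenerateExecutable = false
            };
            CompilerResults results = provider.CompileAssemblyFromSource(options, source);
            if (results.Errors.HasErrors)
            {
                return null;
            }
            Type type = results.CompiledAssembly.GetType("Program.WorkingClass");
            return type.GetMethod("F");
        }
    }

    // Steps 3 and 4: run F on every row { a, b, f } and compare the outputs.
    public static bool Passes(MethodInfo f, int[][] rows)
    {
        foreach (int[] row in rows)
        {
            try
            {
                int actual = (int)f.Invoke(null, new object[] { row[0], row[1] });
                if (actual != row[2])
                {
                    return false;
                }
            }
            catch (TargetInvocationException)
            {
                // The generated code threw at run time, e.g. a division by zero.
                return false;
            }
        }
        return true;
    }
}
```

A caller would loop: build a source string, call Compile, skip the candidate when the result is null, and print the expression once Passes returns true. Every successful compilation loads one more assembly into the process, so a long-running search of this kind also has to keep memory use in mind.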

Performance

The performance of this simple algorithm is very poor. The number of different combinations of symbols taken from the alphabet is about 41 million (if we use 4 operations and 5 operands, that is, variables or digits). Exhaustive enumeration requires a huge amount of computation. Such a large number of iterations, each of which includes a compilation step, cannot be performed on a modern personal computer in a reasonable time. The author managed to find one solution in 10 days by running 4 instances of the bot on a 4-core Intel I7 CPU.
Although Bot 1 has extremely poor performance, its source code is very simple. The code is not included in the text of the article, but it can be downloaded (see the chapter Using the Code). The code is commented, and the author hopes it is readable enough to understand how it works.
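The figure of about 41 million can be reproduced under one reading, which is an assumption made here and is not stated in the article: an expression alternates 5 operand slots, each holding one of the 11 operand symbols (a, b, 1 to 9), with 4 operator slots, each holding one of the 4 operations. The short sketch below does the arithmetic; the names SearchSpace, Power, and AlternatingExpressions are hypothetical.

```csharp
using System;

public static class SearchSpace
{
    // Integer power by repeated multiplication.
    public static long Power(long value, int exponent)
    {
        long result = 1;
        for (int i = 0; i < exponent; i++)
        {
            result *= value;
        }
        return result;
    }

    // Number of strings that alternate operand and operator slots:
    // operandSymbols^operands * operatorSymbols^operators.
    public static long AlternatingExpressions(
        int operandSymbols, int operands, int operatorSymbols, int operators)
    {
        return Power(operandSymbols, operands) * Power(operatorSymbols, operators);
    }
}
```

SearchSpace.AlternatingExpressions(11, 5, 4, 4) gives 161,051 × 256 = 41,229,056. For comparison, if all 16 positions of the placeholder were filled freely from the 15-symbol alphabet, the count would be SearchSpace.Power(15, 16), about 6.57 × 10^18, although most of those strings would not compile.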

Bot 2

Bot 1 can be used to understand the simplest algorithm for automatically creating software code and to study the required programming language features. But it cannot be used to see a result conveniently. To obtain a result in a reasonable time, the author included an improved version of the same bot, named Bot 2.
Bot 2 implements a Genetic Algorithm, which significantly reduces the time needed to find a solution. It produces a result in 5 - 20 minutes on an Intel I7 CPU running one thread. Bot 2 is written using somewhat more complex code, but the code is also commented and should not be difficult to understand.
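The general shape of a genetic algorithm for this problem can be sketched as follows. This is a generic illustration of the technique and not the code of Bot 2. Several simplifications are assumptions made here: a genome is a 9-character string that alternates 5 operand symbols with 4 operators, candidates are evaluated by a small built-in evaluator instead of being compiled, and the values of a and b are non-zero. All names (GeneticSketch, Evaluate, Error, Search, and so on) are hypothetical.

```csharp
using System;
using System.Linq;

public static class GeneticSketch
{
    const string Operands = "ab123456789";
    const string Operators = "-+*/";
    public const int GenomeLength = 9; // operand, operator, operand, ...

    static long Value(char c, int a, int b)
    {
        return c == 'a' ? a : c == 'b' ? b : c - '0';
    }

    // Evaluates the genome with the usual precedence and integer division.
    public static long Evaluate(string genome, int a, int b)
    {
        long sum = 0, term = Value(genome[0], a, b);
        int sign = 1;
        for (int i = 1; i < genome.Length; i += 2)
        {
            long v = Value(genome[i + 1], a, b);
            if (genome[i] == '*') term *= v;
            else if (genome[i] == '/') term /= v; // v is non-zero by assumption
            else { sum += sign * term; sign = genome[i] == '+' ? 1 : -1; term = v; }
        }
        return sum + sign * term;
    }

    // Fitness: total absolute error over the rows { a, b, f }; 0 means all tests pass.
    public static long Error(string genome, int[][] rows)
    {
        return rows.Sum(r => Math.Abs(Evaluate(genome, r[0], r[1]) - r[2]));
    }

    static char RandomGene(Random rng, int position)
    {
        string pool = position % 2 == 0 ? Operands : Operators;
        return pool[rng.Next(pool.Length)];
    }

    public static string RandomGenome(Random rng)
    {
        return new string(Enumerable.Range(0, GenomeLength)
            .Select(i => RandomGene(rng, i)).ToArray());
    }

    // One-point crossover: head of one parent, tail of the other.
    public static string Crossover(string x, string y, Random rng)
    {
        int cut = rng.Next(1, GenomeLength);
        return x.Substring(0, cut) + y.Substring(cut);
    }

    // Mutation: replace one gene with a random gene of the same kind.
    public static string Mutate(string genome, Random rng)
    {
        char[] genes = genome.ToCharArray();
        int i = rng.Next(genes.Length);
        genes[i] = RandomGene(rng, i);
        return new string(genes);
    }

    // Returns a genome with zero error, or null if none is found in time.
    // populationSize is expected to be at least 2.
    public static string Search(int[][] rows, Random rng, int populationSize, int maxGenerations)
    {
        string[] population = Enumerable.Range(0, populationSize)
            .Select(i => RandomGenome(rng)).ToArray();
        for (int generation = 0; generation < maxGenerations; generation++)
        {
            population = population.OrderBy(g => Error(g, rows)).ToArray();
            if (Error(population[0], rows) == 0) return population[0];
            int keep = populationSize / 2; // the better half survives
            for (int i = keep; i < populationSize; i++)
            {
                string child = Crossover(population[rng.Next(keep)], population[rng.Next(keep)], rng);
                population[i] = rng.Next(100) < 30 ? Mutate(child, rng) : child;
            }
        }
        return null;
    }
}
```

The fitness function uses the size of the error rather than a plain pass/fail result, so that candidates closer to the training data are preferred during selection. Whether and how quickly Search finds a solution depends on the random seed, the population size, and the mutation rate, which are arbitrary choices in this sketch.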

Results

Several results generated by Bot 2 are presented in the pictures below. The reader can note the generated C# code and the running time. The reader can download the source code and run the example himself/herself (see the chapter Using the Code).
The bots can easily be upgraded to write the created code directly into a *.cs file and even to create a Visual Studio project structure. The author did not make such an upgrade, in order to keep the source code simple.
[Figures: Results 1, Results 2, Results 3, Results 4, Results 5]

Future Development

This article shows that creating SDBs is possible. But the approach cannot yet be used in practice. It can both benefit from and contribute to related technologies, such as Machine Learning, Data Mining, TDD, Automatic Code Generation, and so on.
Future development of ATDCG can draw on experience in building compilers and code analyzers. Mature bots could create software units without using high-level languages or a compilation stage. They could create units composed of Microsoft Intermediate Language commands (in binary representation) or Java bytecode. This could significantly speed up the code generation process.
At the moment, ATDCG methods can be studied and developed by enthusiasts. Research on the matter should also take place. Theoretical research will create a basis for understanding the benefits of ATDCG and how to realize them in practice.

Using the Code

The code includes Bot 1 and Bot 2 that were created and run in the following environment:
  • Development: Microsoft Visual Studio Community 2015
  • Compilation: Microsoft .NET Framework 4.5, Any CPU
  • Platform: Microsoft Windows 8.1, 64-bit, CPU: Intel I7
The reader can download examples either at the top of the article or right here:
Both examples use one thread when run. Several instances can be run simultaneously, but this runs the CPU at maximum power consumption. Such a load can overheat a computer that is not designed for intensive CPU use over long calculations. Bot 2 produces a result quite quickly and should not overload the hardware. But the author does not recommend running several instances of Bot 1 for a long time.
