Test Generation Using GitHub Copilot

by John Miller | December 13, 2024

Using AI in a production application is risky, particularly if you are new to working with LLM-generated code. This post shows how you can get started using AI in a production application by leveraging its ability to generate test automation.

A Good Place to Start

If you're hesitant to apply AI directly to a production codebase, that hesitation is well-grounded. We are at a point where LLMs are constantly reminding us of their fallibility. The consequences of introducing a serious defect into a fragile codebase are enough to put anyone off.

Neglected Code

Test automation is one area of your code that is begging for a little help from AI. If the codebases that you work in are anything like the codebases I work in, test coverage is always lacking in one way or another, if it exists at all. Leveraging AI in test automation is a good way to add value without incurring significant risk.

Safe Because Test Automation Is Not Deployed to Production

The goal of good test automation is to prevent defects from reaching production; the test code itself is never deployed to production. Faulty test automation could produce a false positive: the test fails when it shouldn't, which is a broken test. That's an inconvenience, but not a major issue. The other possibility is a false negative: the test should have failed but didn't. This is more serious, but it is essentially the same as not having a test at all, which is likely the state you're in now.

An advantage test code has over application code is that, in general, it is simpler. Unit test code typically exercises the interfaces of the code under test and examines the values it returns. Effective use of stubs and mocks excludes dependent code, further simplifying the test, as in the sketch below. This makes LLM-generated test automation easier to review for completeness and correctness.
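
To illustrate, here is a minimal sketch using xUnit and Moq, the same frameworks that appear later in this post. The INotificationService interface and AccountAlerts class are hypothetical, invented only to show how a mock keeps a unit test focused on the code under test:

using Moq;
using Xunit;

// Hypothetical dependency and class under test, for illustration only.
public interface INotificationService
{
    void Notify(string message);
}

public class AccountAlerts
{
    private readonly INotificationService _notifications;

    public AccountAlerts(INotificationService notifications)
    {
        _notifications = notifications;
    }

    public void AlertOnLowBalance(decimal balance)
    {
        if (balance < 10m)
        {
            _notifications.Notify("Low balance");
        }
    }
}

public class AccountAlertsTests
{
    [Fact]
    public void AlertOnLowBalance_Notifies_WhenBalanceIsLow()
    {
        // The mock stands in for the real notification service, so the test
        // exercises only the AlertOnLowBalance logic.
        var notifications = new Mock<INotificationService>();
        var alerts = new AccountAlerts(notifications.Object);

        alerts.AlertOnLowBalance(5m);

        notifications.Verify(n => n.Notify("Low balance"), Times.Once);
    }
}

The test needs no real notification implementation; verifying the interaction with the mock is enough, which is exactly what keeps tests like this easy to review.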

Test Automation Can Expose Design and Architecture Issues

Asking an LLM to generate test automation for your code has the added advantage of providing an extra “pair of eyes” on the code. To an extent, the quality of the generated test code correlates with the quality of the code being tested. Tests that are simple and straightforward are an indication that the code is testable. If the LLM has trouble generating tests, or produces tests that are complex or incomplete, it's a sign that the code is not as testable as it could be and that some refactoring may be in order.

Enables Refactoring

Of course, the major benefit of generating tests with an LLM is that, in the end, you have tests! And having tests enables refactoring and other improvements to the codebase.

Scoping Statement

In addition to generating tests to increase code coverage, AI can improve the efficiency, accuracy, and scalability of a test suite. It can help run tests faster and more precisely, shortening testing cycles, and with predictive analytics it can prioritize the tests most likely to uncover defects. However, to realize these benefits you first need tests, so this post focuses on unit test generation. We'll get to these other benefits, as well as generating other types of tests, in due time.

Further Scoping Statement

There are many ways to generate tests. In this post I'll focus on using GitHub Copilot to demonstrate the ins and outs of generating test automation, and I'll discuss Copilot's strengths and weaknesses along the way. The concepts apply to other LLMs, but we won't be comparing them here.

Why Copilot

GitHub Copilot is a reasonable choice for generating tests. It is supported in VS Code, Visual Studio, IntelliJ IDEA, Vim, and other editors. Integration with an IDE is advantageous because the IDE can expose your code to the LLM, making it easy to ask for tests for a particular function, method, or class.

Our Complex Code That Needs Testing

To demonstrate generating automated tests we need code to test; more specifically, code that is not trivial and has enough complexity to challenge the LLM's test generation capabilities. Rather than extracting a snippet from an existing codebase or creating something from scratch, I handed the task off to Copilot.

My first attempt used the prompt: “Can you create a C# class that has a high cyclomatic complexity?”, which produced a class that would have sufficed but was based on numeric comparisons and wasn't very readable. I added “I'm looking for a real-world example” and Copilot produced a class similar to the code below, which is much more suitable.

using System;

namespace BankAccountApp
{
    public class BankAccount
    {
        public decimal Balance { get; private set; }
        public bool IsActive { get; private set; }
        public BankAccount(decimal initialBalance)
        {
            Balance = initialBalance;
            IsActive = true;
        }

        public decimal MakeDeposit(decimal amount)
        {
            ProcessTransaction("Deposit", amount);
            return Balance;
        }

        public decimal MakeWithdrawal(decimal amount)
        {
            ProcessTransaction("Withdraw", amount);
            return Balance;
        }

        public virtual void ProcessTransaction(string transactionType, decimal amount)
        {
            if (!IsActive)
            {
                Console.WriteLine("Account is inactive.");
                return;
            }
            switch (transactionType)
            {
                case "Deposit":
                    if (amount > 0)
                    {
                        Balance += amount;
                        Console.WriteLine($"Deposited: {amount}");
                    }
                    else
                    {
                        Console.WriteLine("Invalid deposit amount.");
                    }
                    break;
                case "Withdraw":
                    if (amount > 0 && amount <= Balance)
                    {
                        Balance -= amount;
                        Console.WriteLine($"Withdrew: {amount}");
                    }
                    else if (amount > Balance)
                    {
                        Console.WriteLine("Insufficient funds.");
                    }
                    else
                    {
                        Console.WriteLine("Invalid withdrawal amount.");
                    }
                    break;
                default:
                    Console.WriteLine("Invalid transaction type.");
                    break;
            }
            if (Balance < 0)
            {
                IsActive = false;
                Console.WriteLine("Account deactivated due to negative balance.");
            }
        }
    }
}

The ProcessTransaction method in the BankAccount class handles two types of transactions: deposits and withdrawals. It first checks if the account is active. If not, it prints a message and exits. For each transaction type, it validates the amount and updates the balance accordingly, printing appropriate messages for each case. If the balance becomes negative, the account is deactivated.

This class is complex enough to require a good amount of test automation to provide adequate code coverage. Tests are needed for each transaction type, and each transaction type has different conditions that need to be covered. The IsActive flag and the negative-balance handling also need testing.

Copilot gave us complex code, not necessarily clean code, and there are opportunities to improve it (one such opportunity is sketched below). But before attempting any improvements we should have tests for the current implementation, so that we can be confident that any refactoring doesn't introduce defects.
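
As an example of the kind of improvement that tests would make safe, the class currently writes its messages directly to the console. One possible direction, a sketch of my own rather than anything Copilot suggested, is to route messages through a small interface so they can be observed in tests without capturing console output:

using System;

namespace BankAccountApp
{
    // Hypothetical logging abstraction; BankAccount could take an ITransactionLogger
    // in its constructor and call Log where it currently calls Console.WriteLine.
    public interface ITransactionLogger
    {
        void Log(string message);
    }

    public class ConsoleTransactionLogger : ITransactionLogger
    {
        public void Log(string message) => Console.WriteLine(message);
    }
}

We won't make that change here; the point is that we want tests in place first so a change like this can be verified against the current behavior.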

Using Copilot to Write Unit Tests

Copilot offers several ways to generate tests. I'm focusing on the Copilot integration with Visual Studio Code, which is fairly representative of other IDEs and techniques.

Generating Tests in VS Code

In VS Code there are two primary ways to generate tests: you can enter a prompt in Copilot chat, or you can right-click in a code file and choose the option to generate tests.

Generating Tests for the MakeDeposit Method

To generate tests for a single method or function from Copilot chat, enter a prompt asking Copilot to generate tests for that method or function. For example, the prompt Generate unit tests for the MakeDeposit method of the BankAccount class produced this code:

using Xunit;
using BankAccountApp;

namespace BankAccountTests
{
    public class BankAccountTests
    {
        [Fact]
        public void MakeDeposit_ShouldIncreaseBalance()
        {
            // Arrange
            var account = new BankAccount(100m);

            // Act
            var newBalance = account.MakeDeposit(50m);

            // Assert
            Assert.Equal(150m, newBalance);
            Assert.Equal(150m, account.Balance);
        }

        [Fact]
        public void MakeDeposit_ShouldNotChangeBalance_WhenAmountIsNegative()
        {
            // Arrange
            var account = new BankAccount(100m);

            // Act
            var newBalance = account.MakeDeposit(-50m);

            // Assert
            Assert.Equal(100m, newBalance);
            Assert.Equal(100m, account.Balance);
        }
    }
}

Add Mocking to Isolate Dependencies

The MakeDeposit method calls the ProcessTransaction method and returns the new balance. In a unit test of MakeDeposit, the behavior of ProcessTransaction should not affect the result, yet both tests above clearly depend on ProcessTransaction behaving as expected in order to pass.

We can address this with another prompt: Can you mock the ProcessTransaction method in every MakeDeposit test? This produced new tests that use the Moq framework and reflection to remove the dependency on the ProcessTransaction method:

using Xunit;
using Moq;
using BankAccountApp;
using System.Reflection;

namespace BankAccountTests
{
    public class BankAccountTests
    {
        [Fact]
        public void MakeDeposit_ShouldCallProcessTransaction()
        {
            // Arrange
            var mockAccount = new Mock<BankAccount>(100m) { CallBase = true };

            // Act
            mockAccount.Object.MakeDeposit(50m);

            // Assert
            mockAccount.Verify(a => a.ProcessTransaction("Deposit", 50m), Times.Once);
        }

        [Fact]
        public void MakeDeposit_ShouldIncreaseBalance()
        {
            // Arrange
            var mockAccount = new Mock<BankAccount>(100m) { CallBase = true };
            mockAccount.Setup(a => a.ProcessTransaction("Deposit", 50m)).Callback(() =>
            {
                var balanceProperty = typeof(BankAccount).GetProperty("Balance",
                BindingFlags.Public | BindingFlags.Instance);
                balanceProperty.SetValue(mockAccount.Object, 150m);
            });

            // Act
            var newBalance = mockAccount.Object.MakeDeposit(50m);

            // Assert
            Assert.Equal(150m, newBalance);
            Assert.Equal(150m, mockAccount.Object.Balance);
        }

        [Fact]
        public void MakeDeposit_ShouldNotChangeBalance_WhenAmountIsNegative()
        {
            // Arrange
            var mockAccount = new Mock<BankAccount>(100m) { CallBase = true };
            mockAccount.Setup(a => a.ProcessTransaction("Deposit", -50m)).Callback(() => 
              { /* No change in balance */ });

            // Act
            var newBalance = mockAccount.Object.MakeDeposit(-50m);

            // Assert
            Assert.Equal(100m, newBalance);
            Assert.Equal(100m, mockAccount.Object.Balance);
        }
    }
}

Explaining the Tests

The MakeDeposit_ShouldCallProcessTransaction test is designed to verify that the MakeDeposit method of the BankAccount class correctly calls the ProcessTransaction method with the appropriate parameters. It isolates the ProcessTransaction method by creating a mock of the BankAccount class and then verifying that the ProcessTransaction method of the mocked BankAccount is called once with the expected parameter value.

The MakeDeposit_ShouldIncreaseBalance method is designed to verify that the MakeDeposit method of the BankAccount class correctly increases the account balance. It isolates the ProcessTransaction method by creating a mock of the BankAccount class and implements a mock of the ProcessTransaction method that assigns the expected value to the BankAccount.Balance property using reflection.

The MakeDeposit_ShouldNotChangeBalance_WhenAmountIsNegative test is similar to the MakeDeposit_ShouldIncreaseBalance test but does not change the balance. Since the MakeDeposit method itself has no code that behaves differently when the amount is negative, it would be reasonable to drop this test.

Generating Tests for the ProcessTransaction Method

I used a similar prompt, Generate unit tests for the ProcessTransaction method of the BankAccount class, which generated these additional tests:

[Fact]
public void ProcessTransaction_ShouldIncreaseBalance_OnValidDeposit()
{
    // Arrange
    var account = new BankAccount(100m);

    // Act
    account.ProcessTransaction("Deposit", 50m);

    // Assert
    Assert.Equal(150m, account.Balance);
}

[Fact]
public void ProcessTransaction_ShouldNotChangeBalance_OnInvalidDeposit()
{
    // Arrange
    var account = new BankAccount(100m);

    // Act
    account.ProcessTransaction("Deposit", -50m);

    // Assert
    Assert.Equal(100m, account.Balance);
}

[Fact]
public void ProcessTransaction_ShouldDecreaseBalance_OnValidWithdrawal()
{
    // Arrange
    var account = new BankAccount(100m);

    // Act
    account.ProcessTransaction("Withdraw", 50m);

    // Assert
    Assert.Equal(50m, account.Balance);
}

[Fact]
public void ProcessTransaction_ShouldNotChangeBalance_OnInvalidWithdrawal()
{
    // Arrange
    var account = new BankAccount(100m);

    // Act
    account.ProcessTransaction("Withdraw", -50m);

    // Assert
    Assert.Equal(100m, account.Balance);
}

[Fact]
public void ProcessTransaction_ShouldNotChangeBalance_OnInsufficientFundsWithdrawal()
{
    // Arrange
    var account = new BankAccount(100m);

    // Act
    account.ProcessTransaction("Withdraw", 150m);

    // Assert
    Assert.Equal(100m, account.Balance);
}

[Fact]
public void ProcessTransaction_ShouldDeactivateAccount_OnZeroBalance()
{
    // Arrange
    var account = new BankAccount(50m);

    // Act
    account.ProcessTransaction("Withdraw", 50m);

    // Assert
    Assert.False(account.IsActive);
    Assert.Equal(0m, account.Balance);
}

[Fact]
public void ProcessTransaction_ShouldNotProcessTransaction_OnInactiveAccount()
{
    // Arrange
    var account = new BankAccount(0m);
    account.ProcessTransaction("Withdraw", 1m); // Deactivate account

    // Act
    account.ProcessTransaction("Deposit", 50m);

    // Assert
    Assert.Equal(0m, account.Balance);
}

Running these tests shows that two of them, ProcessTransaction_ShouldNotProcessTransaction_OnInactiveAccount and ProcessTransaction_ShouldDeactivateAccount_OnZeroBalance, are failing. Both depend on the state of the IsActive flag. A closer look at the code shows that the IsActive flag is only cleared after the transaction types are handled, and only when the balance is less than zero. But the “Withdraw” transaction has a condition that prevents updating the balance if the amount is greater than the balance, so the balance can never go negative and the IsActive flag will never be cleared. Copilot found a bug!
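
A fix depends on what the intended behavior actually is. If, as the generated test names suggest, the account should be deactivated once the balance reaches zero, one possible change is to the final check in ProcessTransaction. This is a sketch of my own, not something to apply without confirming the requirement:

// Replaces the final balance check at the end of ProcessTransaction.
// Assumes deactivation at a zero balance is the desired behavior.
if (Balance <= 0)
{
    IsActive = false;
    Console.WriteLine("Account deactivated due to zero or negative balance.");
}

That change would make both failing tests pass, but whether a zero balance should deactivate the account is a product decision, not something the tests can settle on their own.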

Copilot Conundrums

I've found that Copilot-generated tests sometimes fail due to problems with the test implementation: the test raises an unexpected error, or an assertion fails unexpectedly. Using the error message or test output as a follow-up prompt in the same context will eventually resolve the problem. I ran into this when adding mocks to the MakeDeposit tests. At first the test failed trying to set Balance on the mock, because the Balance setter is private and can only be used within the class. Eventually Copilot arrived at the solution using reflection shown above.

I've also found that when Copilot has trouble generating tests, it can be an indication that the code is not easily testable. There are cases where Copilot struggles to generate working mocks, or where the mocking required to properly test the code is excessive. If the complexity of the test code equals or exceeds that of the code under test, you may have no choice but to refactor the code before generating tests.

Conclusion

Leveraging GitHub Copilot for test automation offers numerous benefits, including increased test coverage, efficiency, and accuracy. By generating tests, Copilot can help identify design and architecture issues, enabling refactoring and other improvements to the codebase. Copilot, integrated with popular IDEs, provides a practical solution for generating unit tests, as demonstrated with the BankAccount class example. While AI-generated tests may sometimes require adjustments, they ultimately contribute to a more robust and maintainable codebase. Embracing Copilot for test automation can significantly enhance the development process, making it more efficient and reliable.

Disclaimer

AI contributed to the writing of this blog post.

Prompts:

  • Create a summary paragraph for why AI is good for test automation.
  • Create an outline for why AI is good for test automation.
  • What IDEs does GitHub Copilot support?
  • Can you create a C# class that has a high cyclomatic complexity?
    • I'm looking for a real world example
  • @workspace /explain the MakeDeposit_ShouldCallProcessTransaction method of the BankAccountTests class
  • @workspace /explain the MakeDeposit_ShouldIncreaseBalance method of the BankAccountTests class
  • Generate a conclusion for this post