Test Generation Using GitHub Copilot
by John Miller | December 13, 2024
Using AI in a production application is risky, particularly if you are new to working with LLM-generated code. This post will show how you can get started by leveraging AI's ability to generate test automation.
A Good Place to Start
If you're hesitant to apply AI directly to a production codebase, that hesitation is well grounded. We are at a point where LLMs are constantly reminding us of their fallibility, and the consequences of introducing a serious defect into a fragile codebase are enough to put anyone off.
Neglected Code
Test automation is one area of your code that is begging for a little help from AI. If the codebases that you work in are anything like the codebases I work in, test coverage is always lacking in one way or another, if it exists at all. Leveraging AI in test automation is a good way to add value without incurring significant risk.
Safe Because Test Automation Is Not Deployed to Production
The goal of good test automation is to prevent defects from reaching production; the test code itself is never deployed to production. Faulty test automation could produce a false positive: a test that fails when it shouldn't, in other words a broken test. That's an inconvenience, but not a major issue. The other possibility is a false negative: a test that should have failed but didn't. This is more serious, but it is essentially the same as not having a test at all, which is likely the state you're in now.
Test code also has the advantage of being, in general, simpler than application code. Typically, unit test code exercises the interfaces of the code under test and examines the values it returns. Effective use of stubs and mocks excludes dependent code, further simplifying the test. This makes LLM-generated test automation easier to review for completeness and correctness.
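As a rough illustration of that simplicity (this is a minimal sketch with hypothetical names, not code from the example used later in this post), a typical unit test just calls the public interface and asserts on the result, with a hand-written stub standing in for a dependency:
using Xunit;

// Hypothetical dependency hidden behind an interface so the test can substitute a stub.
public interface IExchangeRateProvider
{
    decimal GetRate(string currency);
}

// Hypothetical code under test.
public class PriceConverter
{
    private readonly IExchangeRateProvider _rates;
    public PriceConverter(IExchangeRateProvider rates) => _rates = rates;
    public decimal Convert(decimal amount, string currency) => amount * _rates.GetRate(currency);
}

// Stub that returns a fixed rate, keeping the test independent of any real rate service.
public class FixedRateStub : IExchangeRateProvider
{
    public decimal GetRate(string currency) => 2m;
}

public class PriceConverterTests
{
    [Fact]
    public void Convert_MultipliesAmountByRate()
    {
        // Arrange, act, assert: the test reads in a straight line.
        var converter = new PriceConverter(new FixedRateStub());
        Assert.Equal(20m, converter.Convert(10m, "EUR"));
    }
}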
Test Automation Can Expose Design and Architecture Issues
Asking an LLM to generate test automation for your code has the added advantage of providing an extra “pair of eyes” on your code. To an extent, the quality of the generated test code correlates with the quality of the code being tested. Tests that are simple and straightforward are an indication that the code is testable. If the LLM has trouble generating tests, or produces tests that are complex or incomplete, it's a sign that the code is not as testable as it could be and that some refactoring may be in order.
Enables Refactoring
Of course, the major benefit of generating tests with an LLM is that, in the end, you have tests! And having tests is an enabler for refactoring and making improvements to the codebase.
Scoping Statement
In addition to generating tests to increase code coverage, AI can increase the efficiency, accuracy, and scalability of a test suite. It can run tests faster and more precisely, reducing testing cycles. With predictive analytics, AI can prioritize tests likely to uncover defects, improving overall effectiveness. However, to realize these benefits you need tests, so this post is focused on unit test generation. We'll get to those other benefits, as well as generating other types of tests, in due time.
Further Scoping Statement
There are many ways to generate tests. In this post I'll focus on using GitHub Copilot to demonstrate the ins and outs of generating test automation, and I'll discuss Copilot's strengths and weaknesses for the job. The concepts are applicable to other LLMs, but we won't be comparing them here.
Why Copilot
GitHub Copilot is a reasonable choice for generating tests. It is supported in VS Code, Visual Studio, IntelliJ IDEA, Vim, and other IDEs. IDE integration is advantageous because the IDE can expose the code to the LLM, making it easy to ask for tests for a given function, method, or class.
Our Complex Code That Needs Testing
To demonstrate generating automated tests we need code to test. More specifically, we need code that is not trivial and has some degree of complexity, enough to challenge the LLM's test generation capabilities. Rather than trying to extract a snippet from an existing codebase or creating something from scratch, I handed the task off to Copilot.
My first attempt used the prompt “Can you create a C# class that has a high cyclomatic complexity?”, which produced a complex class that would have sufficed but was based on numeric comparisons and not very readable. I added that “I'm looking for a real-world example” and Copilot produced a class similar to the code below, which is much more suitable.
using System;
namespace BankAccountApp
{
public class BankAccount
{
public decimal Balance { get; private set; }
public bool IsActive { get; private set; }
public BankAccount(decimal initialBalance)
{
Balance = initialBalance;
IsActive = true;
}
public decimal MakeDeposit(decimal amount)
{
ProcessTransaction("Deposit", amount);
return Balance;
}
public decimal MakeWithdrawal(decimal amount)
{
ProcessTransaction("Withdraw", amount);
return Balance;
}
public virtual void ProcessTransaction(string transactionType, decimal amount)
{
if (!IsActive)
{
Console.WriteLine("Account is inactive.");
return;
}
switch (transactionType)
{
case "Deposit":
if (amount > 0)
{
Balance += amount;
Console.WriteLine($"Deposited: {amount}");
}
else
{
Console.WriteLine("Invalid deposit amount.");
}
break;
case "Withdraw":
if (amount > 0 && amount <= Balance)
{
Balance -= amount;
Console.WriteLine($"Withdrew: {amount}");
}
else if (amount > Balance)
{
Console.WriteLine("Insufficient funds.");
}
else
{
Console.WriteLine("Invalid withdrawal amount.");
}
break;
default:
Console.WriteLine("Invalid transaction type.");
break;
}
if (Balance < 0)
{
IsActive = false;
Console.WriteLine("Account deactivated due to negative balance.");
}
}
}
}
The ProcessTransaction method in the BankAccount class handles two types of transactions: deposits and withdrawals. It first checks if the account is active. If not, it prints a message and exits. For each transaction type, it validates the amount and updates the balance accordingly, printing appropriate messages for each case. If the balance becomes negative, the account is deactivated.
This class is complex enough to require a good amount of test automation to provide adequate code coverage. Tests are needed for each transaction type, and each transaction type has different conditions that need to be tested. Also, the IsActive flag and the negative balance conditions need testing.
Copilot gave us complex code, not necessarily clean code, and there are opportunities to improve this legacy code. But before attempting any improvements we should have tests for the current implementation, so that we can be confident that refactoring doesn't introduce defects.
Using Copilot to Write Unit Tests
Copilot has several ways to generate tests. I'm focusing on the Copilot integration with Visual Studio Code, which is fairly representative of other IDEs and techniques.
Generating Tests in VS Code
In VS Code there are two primary ways to generate tests: you can enter prompts in Copilot chat, or you can right-click in a code file and select the option to generate tests.
Generate Tests for the MakeDeposit Method
To generate tests for a single method or function from Copilot chat, enter a prompt asking Copilot to generate tests for it. For example, the prompt Generate unit tests for the MakeDeposit method of the BankAccount class produced the code:
using Xunit;
using BankAccountApp;
namespace BankAccountTests
{
public class BankAccountTests
{
[Fact]
public void MakeDeposit_ShouldIncreaseBalance()
{
// Arrange
var account = new BankAccount(100m);
// Act
var newBalance = account.MakeDeposit(50m);
// Assert
Assert.Equal(150m, newBalance);
Assert.Equal(150m, account.Balance);
}
[Fact]
public void MakeDeposit_ShouldNotChangeBalance_WhenAmountIsNegative()
{
// Arrange
var account = new BankAccount(100m);
// Act
var newBalance = account.MakeDeposit(-50m);
// Assert
Assert.Equal(100m, newBalance);
Assert.Equal(100m, account.Balance);
}
}
}
Add Mocking to Isolate Dependencies
The MakeDeposit method calls the ProcessTransaction method and returns the new balance. In a unit test of the MakeDeposit method, the behavior of the ProcessTransaction method should not affect the outcome, yet both of the tests above clearly depend on ProcessTransaction behaving as expected in order to pass.
We can address this with another prompt: Can you mock the ProcessTransaction method in every MakeDeposit test? This produced new tests that leverage the Moq framework and reflection to remove the dependency on the ProcessTransaction method:
using Xunit;
using Moq;
using BankAccountApp;
using System.Reflection;
namespace BankAccountTests
{
public class BankAccountTests
{
[Fact]
public void MakeDeposit_ShouldCallProcessTransaction()
{
// Arrange
var mockAccount = new Mock<BankAccount>(100m) { CallBase = true };
// Act
mockAccount.Object.MakeDeposit(50m);
// Assert
mockAccount.Verify(a => a.ProcessTransaction("Deposit", 50m), Times.Once);
}
[Fact]
public void MakeDeposit_ShouldIncreaseBalance()
{
// Arrange
var mockAccount = new Mock<BankAccount>(100m) { CallBase = true };
mockAccount.Setup(a => a.ProcessTransaction("Deposit", 50m)).Callback(() =>
{
var balanceProperty = typeof(BankAccount).GetProperty("Balance",
BindingFlags.Public | BindingFlags.Instance);
balanceProperty.SetValue(mockAccount.Object, 150m);
});
// Act
var newBalance = mockAccount.Object.MakeDeposit(50m);
// Assert
Assert.Equal(150m, newBalance);
Assert.Equal(150m, mockAccount.Object.Balance);
}
[Fact]
public void MakeDeposit_ShouldNotChangeBalance_WhenAmountIsNegative()
{
// Arrange
var mockAccount = new Mock<BankAccount>(100m) { CallBase = true };
mockAccount.Setup(a => a.ProcessTransaction("Deposit", -50m)).Callback(() =>
{ /* No change in balance */ });
// Act
var newBalance = mockAccount.Object.MakeDeposit(-50m);
// Assert
Assert.Equal(100m, newBalance);
Assert.Equal(100m, mockAccount.Object.Balance);
}
}
}
Explaining the Tests
The MakeDeposit_ShouldCallProcessTransaction test is designed to verify that the MakeDeposit method of the BankAccount class correctly calls the ProcessTransaction method with the appropriate parameters. It isolates the ProcessTransaction method by creating a mock of the BankAccount class and then verifying that the ProcessTransaction method of the mocked BankAccount is called once with the expected parameter values.
The MakeDeposit_ShouldIncreaseBalance method is designed to verify that the MakeDeposit method of the BankAccount class correctly increases the account balance. It isolates the ProcessTransaction method by creating a mock of the BankAccount class and implements a mock of the ProcessTransaction method that assigns the expected value to the BankAccount.Balance property using reflection.
The MakeDeposit_ShouldNotChangeBalance_WhenAmountIsNegative test is similar to the MakeDeposit_ShouldIncreaseBalance test but does not change the balance. Since the MakeDeposit method has no code that behaves differently when the amount is negative, it would be reasonable to drop this test.
Generating Tests for the ProcessTransaction Method
I used a similar prompt, Generate unit tests for the ProcessTransaction method of the BankAccount class, which generated these additional tests:
[Fact]
public void ProcessTransaction_ShouldIncreaseBalance_OnValidDeposit()
{
// Arrange
var account = new BankAccount(100m);
// Act
account.ProcessTransaction("Deposit", 50m);
// Assert
Assert.Equal(150m, account.Balance);
}
[Fact]
public void ProcessTransaction_ShouldNotChangeBalance_OnInvalidDeposit()
{
// Arrange
var account = new BankAccount(100m);
// Act
account.ProcessTransaction("Deposit", -50m);
// Assert
Assert.Equal(100m, account.Balance);
}
[Fact]
public void ProcessTransaction_ShouldDecreaseBalance_OnValidWithdrawal()
{
// Arrange
var account = new BankAccount(100m);
// Act
account.ProcessTransaction("Withdraw", 50m);
// Assert
Assert.Equal(50m, account.Balance);
}
[Fact]
public void ProcessTransaction_ShouldNotChangeBalance_OnInvalidWithdrawal()
{
// Arrange
var account = new BankAccount(100m);
// Act
account.ProcessTransaction("Withdraw", -50m);
// Assert
Assert.Equal(100m, account.Balance);
}
[Fact]
public void ProcessTransaction_ShouldNotChangeBalance_OnInsufficientFundsWithdrawal()
{
// Arrange
var account = new BankAccount(100m);
// Act
account.ProcessTransaction("Withdraw", 150m);
// Assert
Assert.Equal(100m, account.Balance);
}
[Fact]
public void ProcessTransaction_ShouldDeactivateAccount_OnZeroBalance()
{
// Arrange
var account = new BankAccount(50m);
// Act
account.ProcessTransaction("Withdraw", 50m);
// Assert
Assert.False(account.IsActive);
Assert.Equal(0m, account.Balance);
}
[Fact]
public void ProcessTransaction_ShouldNotProcessTransaction_OnInactiveAccount()
{
// Arrange
var account = new BankAccount(0m);
account.ProcessTransaction("Withdraw", 1m); // Deactivate account
// Act
account.ProcessTransaction("Deposit", 50m);
// Assert
Assert.Equal(0m, account.Balance);
}
Running these tests shows that two of them, ProcessTransaction_ShouldNotProcessTransaction_OnInactiveAccount and ProcessTransaction_ShouldDeactivateAccount_OnZeroBalance, are failing. Both tests depend on the state of the IsActive flag. A closer look at the code shows that the IsActive flag is only cleared after the transaction types are handled, and only when the balance is less than zero. But the “Withdraw” transaction has a condition that prevents updating the balance if the amount is greater than the balance, so the balance can never go below zero and the IsActive flag will never be cleared. Copilot found a bug!
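To make the unreachable branch concrete, here is the relevant logic from ProcessTransaction, condensed, with explanatory comments added by me:
// "Withdraw" case: a withdrawal is only applied when it does not exceed the balance...
if (amount > 0 && amount <= Balance)
{
    Balance -= amount; // ...so the lowest the balance can ever reach is exactly zero.
}

// After the switch: this check can never be true in practice, because the balance
// can never drop below zero, so IsActive is never set to false here.
if (Balance < 0)
{
    IsActive = false;
}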
Copilot Conundrums
I've found that Copilot-generated tests sometimes fail due to problems with the test implementation: the test raises an unexpected error or an assertion fails unexpectedly. Using the error message or test output as a follow-up prompt in the same context will eventually resolve the problem. I ran across this when adding mocks to the MakeDeposit tests. At first the test failed trying to set the Balance in the mock; the Balance setter is private, so the property can only be set from within the class. Eventually Copilot arrived at a solution using reflection.
I've also found that when Copilot has trouble generating tests, it can be an indication that the code is not easily testable. There are instances where Copilot struggles to generate working mocks, or where the mocking required to properly test the code is excessive. If the complexity of the test code equals or exceeds that of the code under test, you may have no choice but to refactor the code before generating tests.
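As one illustration of that kind of refactoring (my own sketch, not something Copilot generated for this post), the BankAccount class could take its console output through an injected abstraction. That gives tests a seam to observe behavior without reflection, and it is exactly the sort of change the tests above would let us make with confidence:
using System;

// Hypothetical seam: messages go through an interface instead of Console directly.
public interface INotifier
{
    void Notify(string message);
}

public class ConsoleNotifier : INotifier
{
    public void Notify(string message) => Console.WriteLine(message);
}

public class BankAccount
{
    private readonly INotifier _notifier;

    public decimal Balance { get; private set; }
    public bool IsActive { get; private set; }

    public BankAccount(decimal initialBalance, INotifier notifier)
    {
        Balance = initialBalance;
        IsActive = true;
        _notifier = notifier;
    }

    // The transaction logic would move over unchanged, except that every
    // Console.WriteLine call becomes _notifier.Notify, so a test can pass in
    // a recording fake and assert on the messages instead of mocking with reflection.
}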
Conclusion
Leveraging GitHub Copilot for test automation offers numerous benefits, including increased test coverage, efficiency, and accuracy. By generating tests, Copilot can help identify design and architecture issues, enabling refactoring and other improvements to the codebase. Copilot, integrated with popular IDEs, provides a practical solution for generating unit tests, as demonstrated with the BankAccount class example. While AI-generated tests may sometimes require adjustments, they ultimately contribute to a more robust and maintainable codebase. Embracing Copilot for test automation can significantly enhance the development process, making it more efficient and reliable.
Disclaimer
AI contributed to the writing of this blog post.
Prompts:
- Create a summary paragraph for why AI is good for test automation.
- Create an outline for why AI is good for test automation.
- What IDEs does GitHub Copilot support?
- Can you create a C# class that has a high cyclomatic complexity?
- I'm looking for a real world example
- @workspace /explain the MakeDeposit_ShouldCallProcessTransaction method of the BankAccountTests class
- @workspace /explain the MakeDeposit_ShouldIncreaseBalance method of the BankAccountTests class
- Generate a conclusion for this post