Finding real-world AI examples is a challenge, and part of that challenge comes from Generative AI (GenAI) news dominating the media. It feels like every AI demo involves chatting with GenAI to produce content. The obligatory chat completion demo has become the "to-do list" of AI demo apps and, to make matters worse, it sells AI short. GenAI relies on large language models (LLMs), which are the brains behind natural language processing tasks. In this article, I'll explore the opportunities presented by LLMs using a real-world research-and-development experiment. This experiment is part of ongoing research into AI-enabled user interface components (aka .NET Smart Components) by Progress Software and Microsoft.
Solving Problems for the User
Currently, there are experimental Smart UI Components created by Microsoft (http://github.com/dotnet/smartcomponents) and third-party component vendors. These components exist across multiple .NET application models, including ASP.NET, Blazor, WinForms, .NET MAUI, and more. Smart Components use AI capabilities to enhance the user experience (UX) by adding new features or making existing features easier to use. This approach is an evolutionary one: The components aren't novel but rather augmented, which lets users continue working with concepts they're familiar with rather than replacing the UI entirely. In addition to evolving the UX gradually, it allows UI developers and designers to update applications incrementally and avoid complete rewrites.
Smart AI Search is one example of a Smart Component implementation (https://demos.telerik.com/blazor-ui/blazorlab/grid-smart-ai-search). Using text-to-vector embedding and vector search allows users to query by relationship in addition to verbatim text search. When applied to a UI, smart search can be added to drop-down boxes, auto-complete boxes, and generic search bars. As shown in Figure 1, when the user searches for milk, the results contain dairy and cheese. These are UIs that already implement search and are further enhanced by AI to enable results based on context.
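The Smart Components library handles this internally, but the underlying idea is straightforward. The code below is a minimal sketch of embedding-based ranking, not the library's implementation, assuming an IEmbeddingGenerator<string, Embedding<float>> from Microsoft.Extensions.AI has been registered and the project's implicit usings are enabled; the SmartSearchService class name and the item list are illustrative only.
using System.Numerics.Tensors;
using Microsoft.Extensions.AI;

public class SmartSearchService
{
    private readonly IEmbeddingGenerator<string, Embedding<float>> generator;

    public SmartSearchService(IEmbeddingGenerator<string, Embedding<float>> generator)
        => this.generator = generator;

    // Rank items by semantic similarity to the query, so a search for
    // "milk" can surface related items such as "cheese" or "yogurt".
    public async Task<IEnumerable<string>> SearchAsync(string query, IList<string> items)
    {
        var queryVector = (await generator.GenerateAsync(new[] { query }))[0].Vector;
        var itemVectors = await generator.GenerateAsync(items);

        return items
            .Select((name, i) => (name, score: TensorPrimitives.CosineSimilarity(
                queryVector.Span, itemVectors[i].Vector.Span)))
            .OrderByDescending(x => x.score)
            .Select(x => x.name);
    }
}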

Smart Text inputs are another frontier where AI can assist the user to enhance their experience through GenAI. In a text input scenario, GenAI is incorporated to perform auto-completions. These UIs are designed to assist the user in writing and provide value by aiding in productivity. Shown in Figure 2, when the user pauses typing, the smart component completes the statement using AI. Smart inputs can be tailored to specific business models through additional context and training or combined with retrieval augmented generation (RAG). In a customer service setting, smart text inputs enable agents to respond to customer messages in a live chat system, support ticket system, CRM, bug tracker, etc.
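Under the hood, a smart text input like this boils down to a small chat completion call. As a rough sketch, not the Smart Components implementation, and assuming an IChatClient from Microsoft.Extensions.AI is available, an auto-completion service might look like the following; the class name and prompt wording are illustrative.
using Microsoft.Extensions.AI;

public class SmartTextCompletionService
{
    private readonly IChatClient chatClient;

    public SmartTextCompletionService(IChatClient chatClient) => this.chatClient = chatClient;

    // Ask the model to continue the text the user has typed so far.
    public async Task<string> SuggestAsync(string typedText)
    {
        ChatMessage[] messages =
        [
            new(ChatRole.System, "Continue the user's sentence. Return only the suggested completion."),
            new(ChatRole.User, typedText)
        ];

        var completion = await chatClient.CompleteAsync(messages);
        return completion.Message.Text ?? string.Empty;
    }
}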

Improving the UX with AI can reach far beyond these ideas. The concept of Smart Components is just beginning to take shape, and the examples thus far are relatively small enhancements to the UI. What hasn't been fully explored yet is how AI can become a new input device for a UI, allowing the user to interact in a completely new yet familiar way through natural language. Before you can envision how this will work, you first need to understand the true capability of an LLM.
Beyond Chat with Large Language Models
Although the typical chat completion example is impressive on the surface, it doesn't generally inspire developers. Chatting with AI out of context doesn't communicate the potential of what an LLM can do. One capability that isn't highlighted enough is the LLM's ability to transform and translate text.
Translating text from one language to another is a complex process addressed by neural networks. A simplified diagram can be seen in Figure 3. The process involves training on large sets of text, tokenization, embeddings (vector representations), and language-pair mappings. Despite the complexity, language translation has come a long way. Although it can't be considered a "solved problem" just yet, it has reached a quality level that delivers reliable results. This powerful tool is abstracted away and made available to developers through an API call to an LLM.

In addition to language translation, LLMs can generate code from natural language. It would be an oversimplification to say language translation and language-to-code are processed the same way. However, there are similarities in how these processes work. The primary concept remains the same: Text is embedded and the LLM uses relationships forged through training to produce a contextual understanding of the content. With content and context, the LLM then translates and outputs code.
Now you know that LLMs are well suited for language translation and language-to-code, and that they inherently work well with relational data. Consider these holistically: A user can enter text in any language and receive output in any language, and code is just one of the many "languages" the LLM understands. If you consider that code isn't a spoken language, but rather a text-based language the LLM understands, boundaries begin to fall away. In Figure 4, a block diagram shows a simplified overview of an LLM's inputs and capabilities. Any number of input options can be used to produce one or many output options.

Using Natural Language to Manipulate UI
I've established that LLMs can convert human language into code. Now it's time to evaluate how AI can understand a UI. The ideal scenario is to allow the LLM to operate the UI based on a natural language input from a user. This allows the user to query and interact with an application or component with less cognitive overhead. The user doesn't need to know how to find functionality, buttons, or menus, or even know where to click. Instead, the user should simply be able to provide a prompt and watch the UI respond.
On its surface, this is a complex problem, as UIs are elaborate systems. The AI has no prior knowledge of the system, only the capacity to build relationships and context and to translate text. A UI consists of design-time code written in a multitude of languages, including HTML, C#, JavaScript, and CSS. At runtime, the code runs on a platform that renders the UI, typically using a document object model (DOM). One option is to use the LLM to generate UI code and dynamically execute it at runtime. Another option is to write DOM changes directly to update the UI. A third is to look for a high-level interface and leverage APIs that already exist to modify the UI. The latter option has the lowest barrier to entry because it avoids the DOM and dynamic UI generation.
For a specific example, let's look at the Telerik UI for Blazor Grid component. The component itself has complexities of its own with functionalities that include sorting, filtering, paging, grouping, and more. Each feature has a UI for interacting with the user and displaying the current state of the component, highlighted in Figure 5.

The grid has an API that enables control of the grid's state at runtime. This feature is implemented by saving and loading state objects that you can serialize as JSON, shown in the snippet below. By providing a state object as JSON, the grid component can be controlled directly and the other technical complexities, such as the HTML, DOM, and UI interactivity, can be ignored. Instead of the user clicking buttons and dragging elements, loading states triggers events and applies configurations.
// Get a GridState object
var gridState = Grid.GetState();

// Serialize to JSON
var jsonString = JsonSerializer.Serialize(gridState);

// Example JSON:
// "columnStates":[{
//     "index":0,
//     "visible":true,
//     "field":"Product Name"
// }]...

// Load state (deserialize the JSON back into a GridState object first;
// Product stands in for the grid's item type here)
var newState = JsonSerializer.Deserialize<GridState<Product>>(jsonString);
await Grid.SetStateAsync(newState);
Using this approach, you can theorize that the LLM can translate a user's natural language query into valid JSON that represents the grid state. A complete sample of the grid state JSON object can be seen in Listing 1. The user can simply request the grid to perform a function, and a grid state with the corresponding parameters applies the changes needed, as seen in Figure 6.
Listing 1: Grid State Object
{
  "groupDescriptors": [],
  "collapsedGroups": [],
  "columnStates": [
    {
      "index": 0,
      "width": null,
      "visible": null,
      "locked": false,
      "id": null,
      "field": "CustomerId"
    },
    {
      "index": 1,
      "width": null,
      "visible": null,
      "locked": false,
      "id": null,
      "field": "CompanyName"
    },
    {
      "index": 2,
      "width": null,
      "visible": null,
      "locked": false,
      "id": null,
      "field": "Address"
    },
    {
      "index": 3,
      "width": null,
      "visible": null,
      "locked": false,
      "id": null,
      "field": "City"
    },
    {
      "index": 4,
      "width": null,
      "visible": null,
      "locked": false,
      "id": null,
      "field": "Country"
    }
  ],
  "expandedItems": [],
  "filterDescriptors": [],
  "sortDescriptors": [],
  "searchFilter": null,
  "page": 5,
  "skip": 40,
  "selectedItems": [],
  "originalEditItem": null,
  "editItem": null,
  "editField": null,
  "insertedItem": null,
  "tableWidth": null
}

This is possible because of the LLM's ability to process the user's query along with the JSON data representation of the component on screen. Shown in Figure 7, you can see how to leverage the LLM's input and output capabilities to generate structured JSON data by providing additional data and contextual cues.

The concept of using natural language as an interface assumes that the LLM will be able to build a relationship between the prompt and a complex JSON object. The model needs to translate the prompt into a valid JSON response that deserializes to a grid state object. You can test this theory using a deployed AI model directly in Azure.
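When experimenting, a simple way to check whether a response is usable is to attempt the deserialization. The helper below is an assumption about how you might validate output during testing, not part of the Telerik API; it assumes the project's implicit usings.
using System.Text.Json;
using Telerik.Blazor.Components;

// Hypothetical validation helper: returns the grid state only when the
// model's raw JSON output deserializes cleanly; otherwise returns null.
static GridState<T>? TryParseGridState<T>(string responseJson)
{
    try
    {
        return JsonSerializer.Deserialize<GridState<T>>(responseJson);
    }
    catch (JsonException)
    {
        return null; // the response wasn't a valid grid state
    }
}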
Proof of Concept Using Azure AI Foundry
Testing a proof of concept using AI models is best done in isolation, if possible. One such tool for interacting with AI models directly in a codeless environment is Azure's AI Foundry (formerly Azure AI Studio). It's useful for starting and refining the prompt engineering process through a simplified interface. Shown in Figure 8, the Playgrounds section and its Chat playground provide an interactive UI for quickly testing and building AI prompts. Here, AI models are chosen from the actual deployments that will be used in the final production application. This ensures that responses seen in the chat playground match those returned when the model is called through an Azure API.

Using the chat playground, let's see if the gpt-4o-mini model is capable of translating user queries into usable and correct JSON responses. First the deployed model is selected, as shown in Figure 8 (1), then a system prompt is added to set the context for the LLM, Figure 8 (2). The initial prompt informs the LLM that this chat is used to edit a JSON object based on a user's input. It also gives a list of basic concepts that are within the scope, such as filtering, sorting, grouping and paging. Finally, the prompt ends with an output example, in this case, a simple opening curly brace ({). The LLM uses this output example as a starting cue for responding with a JSON object rather than generating text streams.
# You are helping to edit a JSON object that represents a Data Grid User Interface component's meta data.
Users can request assistance with configuring:
1. Filtering
2. Sorting
3. Grouping
4. Paging
Return ONLY the JSON response in a web standard JSON format: {
With the prompt created and the model's context set, a user query is issued. The user query represents the request made by the user to perform an action on the grid UI component, along with the current JSON state of the grid, seen in Figure 8 (3). When writing your user query, it helps the LLM when you include any additional context that applies. Here, you'll denote the user instruction along with the current grid state, setting the full context for the LLM. The JSON state is then appended to the end of the user query to give the LLM a starting point to which it can make changes based on the user's query. The full query is shown in the snippet below.
**User instruction**: Show page 5.
The **current grid state** is:
{
  "page": 1
}
When the user query is submitted, the LLM responds with a JSON response, shown in Figure 8 (4). You can see that the model has correctly interpreted the query of “Show page 5” into the JSON property page, with the desired page value 5. This is a simple one-to-one relationship between the state object and user query using the term page, from which the LLM can easily infer the meaning.
{
  "page": 5
}
Let's continue with a more complex operation: applying a sort order to a grid column. The meta data for a grid column's name and the sorting information are held in different JSON properties. In this query, the LLM has to make a connection between referencing a column by the name found in columnStates and applying the sort direction found in sortDescriptors. Let's assume that the grid is initially unsorted and the user requests sorting the City column in ascending order.
**User instruction**: Sort the City column in ascending order.
The **current grid state** is:
{
  "columnStates": [
    {
      "field": "City"
    }
  ],
  "sortDescriptors": []
}
In this response, the model did a good job of forming a relationship between the objects. However, because the initial state of the grid was unsorted, the sortDescriptors array is a guess at what the JSON should be. This could be considered a hallucination by the LLM. Even though the response is well formatted, it doesn't match the values used by the Telerik Grid state object. The direction property should be sortDirection, followed by a value of either 0 (ascending) or 1 (descending). Trying to apply the JSON in its current format to a Telerik Grid results in an error. This issue needs to be fixed before the prompt is useful.
"sortDescriptors": [
{
"field": "City",
"direction": "ascending"
}
],
Through a bit of prompt engineering, the sorting issue can be resolved. Because no context is initially available via the JSON data, the prompt can be updated to fill in the missing concepts. The model instructions are updated to include how the sortDescriptor JSON object is formatted.
... appended
# The "SortDescriptors" array contains sorting element "sortDescriptor"
- "sortDirection" is set to 0 for ascending
- "sortDirection" is set to 1 for descending
After updating the system prompt, the chat completion is tried again. This time the model outputs the desired response.
"sortDescriptors": [
{
"field": "City",
"sortDirection": 0
}
]
Using the playground, you've validated that the gpt-4o-mini model can properly format the grid state JSON object. The model can translate a user's request to page and sort when given the grid's current meta data. For items not present in the current state, additional prompt instructions provide the model with the necessary details. Additional data, like a JSON schema, also improves the accuracy of the output and is added to the final prompt. After proving the concept, you can start assembling the application. The full prompt from this example can be found in Listing 2.
Listing 2: AI System Prompt
# You are helping to edit a JSON object that represents a Data Grid User Interface component's meta data.
Users can request assistance with configuring:
1. Filtering
2. Sorting
3. Grouping
4. Paging
# The "SortDescriptors" array contains sorting element "sortDescriptor"
- "sortDirection" is set to 0 for ascending
- "sortDirection" is set to 1 for descending
Return ONLY the JSON response in a web standard JSON format:
{
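As mentioned above, supplying a JSON schema for the grid state tightens the output further. Below is a minimal sketch of one way to produce it, assuming .NET 9's JsonSchemaExporter is available and using the CustomerDto type from the sample that follows; the basePrompt variable is illustrative, not part of the final sample.
using System.Text.Json;
using System.Text.Json.Schema;
using Telerik.Blazor.Components;

// Generate a schema for the grid state type and append it to the system prompt.
var schema = JsonSchemaExporter.GetJsonSchemaAsNode(
    JsonSerializerOptions.Default, typeof(GridState<CustomerDto>));

string systemPrompt = basePrompt + $"""

    # The response must conform to this JSON schema:
    {schema.ToJsonString()}
    """;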
Creating a Blazor Demo Page
The next step in building your AI-powered grid interface is to create a basic page in a full-stack Blazor application that will display the UI. In a Blazor application with the Telerik dependencies applied, you'll add a new page and begin setting up the necessary UI components. A Telerik Grid, TextBox, and Button are needed to display the grid and to record and submit the query. The Telerik Grid is used because its features and values can be controlled through the GridState object.
Start by adding a Telerik Grid component to Home.razor that displays some customer data. The grid displays columns for each data point in the Customers array. In addition, some of the grid's basic features are enabled: grouping, paging, sorting, and filtering. Enabling the features supported by the AI model ensures that the correct UI elements are displayed in the grid when a user invokes a feature as part of a query. When the page is initialized, you'll populate the grid by calling the GetCustomers method from CustomerService. The grid's Data parameter is bound to the Customers array, binding the result from GetCustomers to the grid upon initialization, as shown in Listing 3.
Listing 3: Completed Home.razor
@inject CustomerService DataService

<TelerikGrid Data="@Customers" @ref="@GridInstance"
             Groupable="true"
             Pageable="true"
             Sortable="true"
             FilterMode="@GridFilterMode.FilterRow">
    <GridColumns>
        <GridColumn Field="CustomerId" />
        <GridColumn Field="CompanyName" />
        <GridColumn Field="Address" />
        <GridColumn Field="City" />
        <GridColumn Field="Country" />
        <GridColumn Field="Number" />
    </GridColumns>
</TelerikGrid>

@code {
    // Reference the Grid instance
    TelerikGrid<CustomerDto>? GridInstance { get; set; }

    IEnumerable<CustomerDto> Customers = [];

    protected override async Task OnInitializedAsync()
    {
        Customers = await DataService.GetCustomers();
    }
}
With the grid component displayed, continue by adding the property AIQuery to hold the value of the user's query. Next, a TelerikTextBox component is added to the page and its value is two-way data bound to AIQuery. The textbox's Placeholder value is set to "Type your query." to indicate that it can be used to query the grid.
<TelerikTextBox @bind-Value="@AIQuery" Placeholder="Type your query."></TelerikTextBox>

@code {
    string AIQuery { get; set; } = "";
}
To complete the interface, add a button labeled Ask AI. A TelerikButton is added with an OnClick event bound to a method OnAIRequest. Because the AI service hasn't been created yet, a placeholder method is added with comments outlining the work that will be performed when the button is clicked.
<TelerikTextBox @bind-Value="@AIQuery" Placeholder="Type your query." />
<TelerikButton OnClick="OnAIRequest">
    Ask AI
</TelerikButton>

@code {
    string AIQuery { get; set; } = "";

    Task OnAIRequest()
    {
        // Get the grid state
        // Send a request with the grid state and query
        // Update the grid with the response
        return Task.CompletedTask; // placeholder until the AI service exists
    }
}
Using the @ref parameter, an instance reference to the Telerik Grid, GridInstance, is created. You'll need this reference for calling the GetState and SetStateAsync methods.
<TelerikGrid @ref="GridInstance" Data="@Customers" ...>

@code {
    // Reference the Grid instance
    TelerikGrid<CustomerDto>? GridInstance { get; set; }
}
In the OnAIRequest method, you'll continue stubbing out functionality. A variable, gridState, is added and the current grid state is captured by calling GetState on the GridInstance. In addition, a placeholder is written for the ProcessGridRequest method that will be built next. This method passes the AIQuery and gridState values to a service and gets back a GridState<CustomerDto>? object.
// Get the grid state
GridState<CustomerDto> gridState = GridInstance.GetState();
// Send a request with the grid state and query
// GridState<CustomerDto>? response = await AI.ProcessGridRequest(AIQuery, gridState);
// Update the grid with the response
// await GridInstance.SetStateAsync(response);
Creating an API with Extensions AI
With the application's UI ready, a service is needed to call the model from Azure's AI services. The easiest way to connect a .NET application with AI services, including those hosted on Azure, is to use Microsoft's Extensions.AI library (Microsoft.Extensions.AI). The Extensions.AI library is part of the .NET ecosystem being developed by Microsoft as the core set of AI abstractions for multi-modal LLMs. You'll use this library to communicate with models for text-, image-, and audio-based GenAI.
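Before writing any code, the project needs package references for the Azure OpenAI client and the Extensions.AI abstractions. The package names below are the ones this sample relies on; the version numbers are floating placeholders because these libraries were still evolving (and partly in preview) at the time of writing.
<ItemGroup>
  <!-- Versions shown are placeholders; use the latest available releases. -->
  <PackageReference Include="Azure.AI.OpenAI" Version="2.*" />
  <PackageReference Include="Microsoft.Extensions.AI" Version="9.*-*" />
  <PackageReference Include="Microsoft.Extensions.AI.OpenAI" Version="9.*-*" />
</ItemGroup>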
First, you'll need to authenticate the application's calls to Azure. Authentication requires security keys for the deployed models in Azure, which should be stored in a secure location, such as User Secrets or environment variables. For this example, use the following schema for the Key, Endpoint, and ModelId stored in User Secrets.
/* Configuration Schema
{
  "AI": {
    "AzureOpenAI": {
      "Key": "YOUR_SUBSCRIPTION_KEY",
      "Endpoint": "YOUR_ENDPOINT",
      "ModelId": "gpt-4o-mini"
    }
  }
}
*/
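Configuration values read at startup can silently be null if a secret is missing. As an optional guard, an assumption on my part rather than a step from the sample, you can fail fast before building the client:
// Fail fast if any Azure OpenAI setting is missing from configuration.
string endpoint = builder.Configuration["AI:AzureOpenAI:Endpoint"]
    ?? throw new InvalidOperationException("Missing AI:AzureOpenAI:Endpoint");
string key = builder.Configuration["AI:AzureOpenAI:Key"]
    ?? throw new InvalidOperationException("Missing AI:AzureOpenAI:Key");
string modelId = builder.Configuration["AI:AzureOpenAI:ModelId"]
    ?? throw new InvalidOperationException("Missing AI:AzureOpenAI:ModelId");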
Next, an IChatClient is created and registered with the application's services collection. The IChatClient is a chat client abstraction from the Extensions.AI library that decouples the core application code from the implementation details of the inner chat service. With an IChatClient, the application's inner chat service can easily be swapped with other service providers, or updated models, when necessary. Building the IChatClient starts in the application's Program.cs file. An AzureOpenAIClient is added with the authentication key and endpoint. The authenticated AzureOpenAIClient is the inner chat client for the application and is used to create the IChatClient instance. The new AzureOpenAIClient is added to the services collection with a singleton scope.
builder.Services.AddSingleton(
    new AzureOpenAIClient(
        new Uri(builder.Configuration["AI:AzureOpenAI:Endpoint"]),
        new AzureKeyCredential(builder.Configuration["AI:AzureOpenAI:Key"])));
Following the addition of the AzureOpenAIClient, an IChatClient is created. The Extensions.AI AddChatClient and AsChatClient methods register the IChatClient instance using the inner AzureOpenAIClient. With this service registered, an IChatClient is injected into the application wherever it's needed.
builder.Services.AddChatClient(services =>
    services.GetRequiredService<AzureOpenAIClient>()
        .AsChatClient(builder.Configuration["AI:AzureOpenAI:ModelId"]));
With the IChatClient ready, the AI service is created. A new NaturalLanguageGridService class is created in the project to manage the user's interaction with the model in Azure. In the constructor, the IChatClient is injected, and the AI's context is created by adding a new ChatMessage using the prompt developed in Azure AI Foundry, shown in Listing 2. Because IChatClient accesses the OpenAI API directly, you can enforce structured outputs and omit the statement asking for the response in JSON ("Return ONLY the JSON response in a web standard JSON format: {").
public class NaturalLanguageGridService
{
    private readonly IChatClient chatClient;
    private readonly ChatMessage AIContext;

    public NaturalLanguageGridService(IChatClient chatClient)
    {
        this.chatClient = chatClient;
        // The system prompt from Listing 2
        string systemPrompt = "...";
        AIContext = new(ChatRole.System, systemPrompt);
    }
}
Next, you'll create the ProcessGridRequest method to complete the user's chat request. The method returns a new GridState<T> based on the user's query and the current GridState. Processing the request involves serializing the state, forming the user message, and performing the chat completion.
public class NaturalLanguageGridService
{
    public NaturalLanguageGridService(IChatClient chatClient)
    {
        ...
    }

    public async Task<GridState<T>?> ProcessGridRequest<T>(string query, GridState<T> state)
    {
        // Serialize the grid state to JSON
        // Create a new chat message for the user
        // Complete a chat request
        // Return the deserialized result
    }
}
In ProcessGridRequest, a ChatMessage is created for the user and then augmented so the model can differentiate between the query and the data. The initial prompt, AIContext, and the user message, UserMessage, are used to call IChatClient.CompleteAsync<T>. The method call CompleteAsync<GridState<T>> returns a deserialized GridState<T> object. When the type is specified, the method ensures that the response is in JSON format with a schema for the object type, in this case GridState<T>. Because the user's query may contain any natural language text, a valid result may not always be produced. Invalid results, in this instance, return a null value that causes the TelerikGrid to safely reset to a default state, so no additional null-safety checks need to be applied before continuing. The completed NaturalLanguageGridService is shown in Listing 4.
Listing 4: Completed NaturalLanguageGridService
using Microsoft.Extensions.AI;
using System.Text.Json;
using Telerik.Blazor.Components;

namespace NaturalLanguageGrid.Components;

public class NaturalLanguageGridService
{
    private readonly IChatClient chatClient;
    private readonly ChatMessage AIContext;

    public NaturalLanguageGridService(IChatClient chatClient)
    {
        this.chatClient = chatClient;
        string systemPrompt = """
            # You are helping to edit a JSON object that represents a Data Grid User Interface component's meta data.
            Users can request assistance with configuring:
            1. Filtering
            2. Sorting
            3. Grouping
            4. Paging
            # The "SortDescriptors" array contains sorting element "sortDescriptor"
            - "sortDirection" is set to 0 for ascending
            - "sortDirection" is set to 1 for descending
            """;
        AIContext = new(ChatRole.System, systemPrompt);
    }

    public async Task<GridState<T>?> ProcessGridRequest<T>(string query, GridState<T> state)
    {
        // Serialize the grid state to JSON
        string currentJsonState = JsonSerializer.Serialize(state);

        // Create a new chat message for the user
        ChatMessage UserMessage = new(ChatRole.User, $"""
            **User instruction**: Update the given Current GridState with my request. {query}
            The **current grid state** is: {currentJsonState}
            """);

        ChatOptions chatOptions = new() { ResponseFormat = ChatResponseFormat.Json };

        // Complete a chat request
        var response = await chatClient.CompleteAsync<GridState<T>>([AIContext, UserMessage], chatOptions);

        // Return the deserialized result
        // If the response is not successful, return null
        // A null grid state will simply reset the grid to its original state
        response.TryGetResult(out var result);
        return result;
    }
}
With ProcessGridRequest complete, it can be called from the user interface. First, the service is added to the application's service collection through dependency injection. In the application's Program.cs, the NaturalLanguageGridService is registered as a scoped service.
builder.Services
.AddScoped<NaturalLanguageGridService>();
Next, in the user interface Home.razor, the NaturalLanguageGridService is injected as a property named AI. Then the OnAIRequest method is updated by uncommenting the calls to ProcessGridRequest and SetStateAsync.
@inject NaturalLanguageGridService AI

@code {
    async Task OnAIRequest()
    {
        // Get the grid state
        GridState<CustomerDto> gridState = GridInstance.GetState();

        // Send a request with the grid state and query
        GridState<CustomerDto>? response = await AI.ProcessGridRequest(AIQuery, gridState);

        // Update the grid with the response
        await GridInstance.SetStateAsync(response);
    }
}
Implementing the NaturalLanguageGridService completes the technical requirements for the feature. The application is now able to run and process a user's natural language query and update the grid features. As shown in Figure 9, AI applies the grid's filter feature when the user enters a natural language query into the prompt and clicks "Ask AI".

Maximizing User Experience
The current UI in Home.razor has a simple textbox input. Let's update this input to use voice. With natural language interfaces, speech-to-text is a much more direct and obvious input method. Using natural language speech input also provides accessibility benefits: Extending the interface with voice enables users who rely on alternative inputs to the mouse and keyboard. In addition, voice helps reduce the cognitive load of needing to locate and operate buttons and menus to perform specific tasks.
Speech-to-text could be accomplished using AI by sending audio to a multi-modal model, allowing the AI to process speech-to-text and respond to the user's query directly. However, at the time of writing, multi-modal AI models are costly and cheaper options are available. Because the application is browser-based and uses Blazor for the UI, speech-to-text can be implemented using browser-standard APIs that run on-device at no cost.
To simplify the process, let's use a free and open-source library called Blazorators. Blazorators is a C# source generator that creates fully functioning Blazor JavaScript interop code. The packages associated with Blazorators are .NET API wrappers created from common browser APIs, including the Web Speech API. Using Blazorators allows you to access browser APIs without writing JavaScript.
In the application, you'll reference the Blazorators speech recognition NuGet package, Blazor.SpeechRecognition.
<PackageReference
Include="Blazor.SpeechRecognition"
Version="9.0.1" />
Next, add the SpeechRecognition services to the application's service collection in Program.cs.
builder.Services
.AddSpeechRecognitionServices();
With the services added, build a SpeechToTextButton component for handling the speech-to-text UI and I/O. Start by adding the new Razor component SpeechToTextButton.razor to the application. Next, the ISpeechRecognitionService service is injected into a property named SpeechRecognition.
@inject ISpeechRecognitionService SpeechRecognition
Then you'll add event handlers to the code block for supporting all the speech-to-text events that the API and its callback functions require. These events either support the state of the UI or provide the resulting text back from the API. The callback functions include:
- onRecognized: Fired when the speech recognition service returns a result—a word or phrase has been positively recognized, and this has been communicated back to the app.
- onError: Invoked when an error occurs.
- onStarted: Fired when the speech recognition service began listening to incoming audio.
- onEnded: Fired when the speech recognition service has disconnected.
@code {
    Task OnError(SpeechRecognitionErrorEvent args) => Task.CompletedTask;
    Task OnRecognized(string text) => Task.CompletedTask;
    Task OnEnded() => Task.CompletedTask;
    Task OnStarted() => Task.CompletedTask;
}
A flag, isRecording, is added to indicate when the process is running. Then, the OnEnded and OnStarted handlers set the flag. Because the events are triggered outside of the UI thread, StateHasChanged is called to update any UI elements bound to isRecording.
bool isRecording;

Task OnEnded()
{
    isRecording = false;
    StateHasChanged();
    return Task.CompletedTask;
}

Task OnStarted()
{
    isRecording = true;
    StateHasChanged();
    return Task.CompletedTask;
}
Next, an OnRecord method is created to start listening to the user's microphone and trigger the corresponding callbacks when events occur. In OnRecord, you'll cancel any existing speech recognition process and call RecognizeSpeechAsync. When RecognizeSpeechAsync is called, en is set as the language to recognize and the event delegates are assigned.
async Task OnRecord()
{
    if (isRecording)
        await SpeechRecognition.CancelSpeechRecognitionAsync(true);

    _recognitionSubscription?.Dispose();
    _recognitionSubscription = await SpeechRecognition.RecognizeSpeechAsync(
        "en",
        onError: OnError,
        onRecognized: OnRecognized,
        onStarted: OnStarted,
        onEnded: OnEnded
    );
}
When speech is recognized, the OnRecognized event is fired and text is returned. This event needs to be surfaced to the parent component so the results can be bubbled up through the UI. An EventCallback is used to delegate the task and is invoked whenever OnRecognized is triggered. The completed code for the speech-to-text component can be found in Listing 5.
Listing 5: Completed SpeechToTextButton component
@inject ISpeechRecognitionService SpeechRecognition
@implements IDisposable

@if (isRecording)
{
    <TelerikButton Title="Stop Recording" OnClick="OnStopRecording">
    </TelerikButton>
}
else
{
    <TelerikButton Title="Record" OnClick="OnRecord">
    </TelerikButton>
}

@code {
    IDisposable? _recognitionSubscription;
    bool isRecording;

    [Parameter]
    public EventCallback<string> OnRecognizedText { get; set; }

    Task OnRecognized(string recognizedText) => OnRecognizedText.InvokeAsync(recognizedText);

    async Task OnRecord()
    {
        if (isRecording)
            await SpeechRecognition.CancelSpeechRecognitionAsync(true);

        _recognitionSubscription?.Dispose();
        _recognitionSubscription = await SpeechRecognition.RecognizeSpeechAsync(
            "en",
            onError: OnError,
            onRecognized: OnRecognized,
            onStarted: OnStarted,
            onEnded: OnEnded
        );
    }

    async Task OnStopRecording() => await SpeechRecognition.CancelSpeechRecognitionAsync(true);

    Task OnEnded()
    {
        isRecording = false;
        StateHasChanged();
        return Task.CompletedTask;
    }

    Task OnStarted()
    {
        isRecording = true;
        StateHasChanged();
        return Task.CompletedTask;
    }

    Task OnError(SpeechRecognitionErrorEvent args)
    {
        switch (args.Error)
        {
            case "audio-capture":
            case "network":
            case "not-allowed":
            case "service-not-allowed":
            case "bad-grammar":
            case "language-not-supported":
                throw new Exception($"Speech: {args.Error}: {args.Message}");
            case "no-speech":
            case "aborted":
                break;
        }
        StateHasChanged();
        return Task.CompletedTask;
    }

    public void Dispose() => _recognitionSubscription?.Dispose();
}
After completing the SpeechToTextButton component, it's added to the Home page, enabling the user to talk directly to the application. An instance of the button is added to the page beside the textbox. Then, the OnRecognizedText event is delegated to a method of the same name. This method applies the returned text to the AIQuery textbox and invokes OnAIRequest.
</TelerikButton>
<SpeechToTextButton OnRecognizedText="OnRecognizedText" />

@code {
    async Task OnRecognizedText(string text)
    {
        AIQuery = text;
        await OnAIRequest();
    }
}
Connecting the SpeechToTextButton completes the UX by allowing the user to press a button and begin talking to the grid component. As shown in Figure 10, when the user's speech pauses, the speech recognition service returns text based on the user's input, or it disables itself after a long period of silence. When text is returned, the AI service is called and the grid's configuration is updated based on the user's spoken query.

Conclusion
The natural language interface is a logical progression from current UI practices to an AI-enabled experience. This type of interface lets users interact more intuitively while making the component more inclusive for those with accessibility needs. Enabling voice control gives access to users who rely on alternatives to keyboard and mouse input. In addition, the need to physically interact with individual UI elements is reduced to a single button.
Because LLMs are multi-lingual, the natural language interface also lends itself to globalization. Internally, the LLM uses its language-translation capabilities to map the user's query onto the component's features. For instance, if a German-speaking user enters a prompt in their language, the grid state is modified correctly even though the meta data is in English, as shown in Figure 11. Additional globalization work is required for the speech-to-text input, but newer multi-modal LLMs will remove this requirement in the future, as they take speech input directly.
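For illustration only (this exact exchange isn't taken from the sample app), a German instruction equivalent to "Sort the City column in ascending order" would be expected to produce the same English-keyed state:
**User instruction**: Sortiere die Spalte "City" aufsteigend.
The **current grid state** is:
{
  "columnStates": [ { "field": "City" } ],
  "sortDescriptors": []
}
The expected response keeps the English property names and values:
"sortDescriptors": [
  {
    "field": "City",
    "sortDirection": 0
  }
]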

With a short prompt, you're able to cover approximately eighty percent of the features the grid offers. By using prompt engineering to add additional context cues and a JSON schema, full feature parity can be achieved. A complete solution can be found on GitHub (https://github.com/EdCharbeneau/NaturalLanguageGrid2) that includes the full prompt, unit tests, and a refined user experience.