Full report – appendix A

Appendix A: Background

This appendix provides an overview of the whole-of-government Microsoft 365 Copilot trial, licensing and governance arrangements, and the overarching approach to the evaluation.

A.1 Overview of the Microsoft 365 Copilot trial

The whole-of-government trial of Copilot was conducted to examine the safe and responsible use of AI in the Australian Public Service (APS).

On 16 November 2023, the Australian Government announced a 6-month whole-of-government trial of Microsoft 365 Copilot.

The purpose of the trial was to explore the safe, responsible, and innovative use of generative AI by the APS. The trial aimed to uplift capability across the APS and determine whether Copilot, as an example of generative AI:

could be implemented in a safe and responsible way across government
posed benefits and challenges or consequences in the short and longer term
faced barriers to broader adoption that may require changes to how the APS delivers on its work.

The trial provided a controlled setting for APS staff to explore new ways to innovate with generative AI. It also served as a useful case study that will inform the government’s understanding of the potential benefits and challenges associated with the implementation of generative AI.

The trial was coordinated by the Digital Transformation Agency (DTA), with support from the AI in Government Taskforce, and ran from January to June 2024. The DTA distributed over 7,769 Copilot licences across almost 60 participating agencies. The trial was non-randomised – agencies and individuals volunteered to participate. A full list of participating agencies can be seen in Appendix C.1 Overall participation.

The trial focused solely on Copilot – a generative AI-enabled intelligent assistant embedded within the Microsoft 365 suite.

Microsoft 365 Copilot, launched by Microsoft in November 2023, is a generative AI tool that interfaces directly with Microsoft applications such as Word, Excel, PowerPoint, Outlook and Teams. Copilot uses a combination of large language models (LLMs) to ‘understand, summarise, predict and generate content’. While Copilot continues to evolve, its functionality during the trial broadly covered:

Content generation – drafting documents, emails and PowerPoint presentations based on user prompts
Summarisation and theming – providing an overview of meetings, documents and email threads, and identifying key messages
Task management – suggesting follow-up actions and next steps from meetings, documents and emails
Data analysis – creating formulas, analysing data sets and producing visualisations.

Copilot produces outputs by drawing on user and organisational data and, if configured by users, internet content. It can use organisational data through Microsoft Graph, a service that connects, integrates and provides access to data stored across Microsoft 365.

Microsoft Graph ensures that Copilot complies with an agency’s existing security and privacy settings, and it provides contextual awareness to outputs by drawing on the emails, chats, documents and meeting transcripts that the user has access to.

Microsoft also offers a free web version of Copilot. Although it was not the subject of the evaluation, this AI-assisted chat and web search service (formerly named Bing Chat) offers similar functionality to Copilot, but it is not embedded in applications and does not use internal data and information.

Architecture and data flow of Microsoft 365 Copilot

The user’s prompt is sent to Copilot.
Copilot accesses Microsoft Graph and, optionally, other web services for grounding. (Microsoft Graph is an API that provides access to the user’s context and content, including emails, files, meetings, chats, calendars and contacts.)
Copilot sends a modified prompt to the LLM.
The LLM processes the prompt and sends a response back to Copilot.
Copilot accesses Microsoft Graph again to ensure that data handling adheres to the necessary compliance and Microsoft Purview standards before the response is returned to the user.
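The five steps above can be modelled as a toy pipeline to show why grounding only ever surfaces content the user could already open. This is a minimal Python sketch under stated assumptions, not Microsoft’s implementation: the names `Item`, `ground`, `build_modified_prompt`, `post_process` and `copilot_request` are hypothetical, simple keyword matching stands in for whatever retrieval Microsoft Graph actually performs, and a function that echoes its input stands in for the LLM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for an email, file, chat or transcript."""
    title: str
    content: str
    allowed_users: frozenset[str]


def ground(prompt: str, user: str, store: list[Item]) -> list[Item]:
    """Step 2 (toy version): keyword-match items to the prompt, keeping
    only items the requesting user is already permitted to see."""
    terms = {t.lower() for t in prompt.split()}
    return [
        item for item in store
        if user in item.allowed_users
        and terms & {w.lower() for w in item.content.split()}
    ]


def build_modified_prompt(prompt: str, context: list[Item]) -> str:
    """Step 3: combine the user's prompt with the grounding data."""
    sources = "\n".join(f"- {i.title}: {i.content}" for i in context)
    return f"Context:\n{sources}\n\nRequest: {prompt}"


def post_process(response: str, context: list[Item], user: str) -> str:
    """Step 5 (toy stand-in for compliance checks): re-confirm that every
    grounding item is visible to the user before returning the response."""
    for item in context:
        if user not in item.allowed_users:
            raise PermissionError(f"{user} cannot see {item.title}")
    return response


def copilot_request(prompt: str, user: str, store: list[Item],
                    llm: Callable[[str], str]) -> str:
    """Steps 1 to 5: prompt in, grounded and checked response out."""
    context = ground(prompt, user, store)              # step 2
    modified = build_modified_prompt(prompt, context)  # step 3
    response = llm(modified)                           # step 4
    return post_process(response, context, user)      # step 5


def echo_llm(modified_prompt: str) -> str:
    """Hypothetical LLM that simply returns its input."""
    return modified_prompt


if __name__ == "__main__":
    store = [
        Item("Budget brief", "draft budget figures for 2024", frozenset({"alex"})),
        Item("HR case", "confidential budget dispute", frozenset({"sam"})),
    ]
    out = copilot_request("summarise budget", "alex", store, echo_llm)
    print("Budget brief" in out, "HR case" in out)  # True False
```

In this sketch both items mention “budget”, but only the one Alex can already access reaches the model. That mirrors, in simplified form, the point above that Copilot works within existing permissions rather than granting new access.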

More information about Microsoft 365 Copilot’s architecture can be found via Microsoft Learn at https://learn.microsoft.com/en-au/copilot/microsoft-365/microsoft-365-copilot-overview

Copilot was deemed a suitable proxy for generative AI for the purposes of the trial.

The DTA selected Copilot to be trialled as a proxy for generative AI. Copilot was chosen for 3 main reasons:

Copilot offered comparable features to other off-the-shelf generative AI products.

Copilot is powered by the same LLMs as, and has similar functionality to, other publicly available generative AI tools.

Copilot could be rapidly deployed across the APS.

Microsoft 365, the suite in which Copilot is embedded, is already used within many APS agencies. Its applications are incorporated into daily workflows and staff are already competent in using them. The government’s existing Volume Sourcing Agreement (VSA) with Microsoft also enabled agencies to quickly and easily procure and administer licences.

Copilot created a secure, guardrailed environment for APS staff to experiment with generative AI.

Microsoft Graph ensured that Copilot complied with existing permission groups, allowing APS staff to familiarise themselves with generative AI in a controlled setting.

Although these characteristics made Copilot an understandable choice for the trial, it has limitations as a proxy. Other available generative AI tools share some of these traits, but the trial’s compressed timeline required a solution that could be procured and deployed with confidence that it would be operational and secure within the trial period.

A.2 Trial administration

Licensing arrangements

The DTA used the existing Microsoft VSA to administer the trial, in coordination with Data3, the License Service Provider (LSP) for that arrangement. Agencies purchased licences through the DTA using a central enrolment established solely for the trial.

Governance

The trial was governed by a Program Board, chaired by the DTA and consisting of voting members representing the following 14 agencies across the Australian Government:

Australian Digital Health Agency
Australian Public Service Commission
Australian Taxation Office
Department of Agriculture, Fisheries and Forestry
Department of Employment and Workplace Relations
Department of Finance
Department of Health and Aged Care
Department of Home Affairs
Department of Industry, Science and Resources
Digital Transformation Agency
IP Australia
National Disability Insurance Agency
Services Australia
Treasury.

The Program Board reported to the AI Steering Committee, the governing body of the AI in Government work, which in turn reports to the Secretaries’ Board Future of Work sub-committee and the Secretaries’ Digital and Data Committee. The Program Board provided operational oversight, monitoring and reporting, escalated issues outside the scope of the trial, and endorsed other key operational decisions. It was not responsible for reviewing or endorsing the evaluation, although, where appropriate, visibility of the evaluation was provided to voting members only. The evaluation of the trial, including the evaluation plan, the content of participation surveys and the final reports, was directly considered and endorsed by the AI Steering Committee.

Microsoft were invited to attend Program Board meetings as an observer and to update members on progress, such as their responses to issues raised through the central issues registers, and on product roadmap updates. Before finalising the terms of reference for the Program Board, the DTA sought external probity advice to ensure there were no perceived or actual conflicts of interest in Microsoft’s participation.

A.3 Overview of the evaluation

The evaluation assessed the use, benefits, risks and unintended outcomes of Copilot in the APS during the trial.

The DTA engaged Nous to conduct an evaluation of the trial based on 4 evaluation objectives designed by the DTA in consultation with:

the AI in Government Taskforce
the Australian Centre for Evaluation (ACE)
advisors from across the APS.

Employee-related outcomes

Evaluate APS staff sentiment about the use of Copilot, including:

staff satisfaction
innovation opportunities
confidence in the use of Copilot
ease of integration into workflow.

Productivity

Determine if Copilot, as an example of generative AI, benefits APS productivity in terms of:

efficiency
output quality
process improvements
agency ability to deliver on priorities.

Adoption of AI

Determine whether and to what extent Copilot, as an example of generative AI:

can be implemented in a safe and responsible way across government
poses benefits and challenges in the short and longer term
faces barriers to innovation that may require changes to how the APS delivers on its work.

Unintended consequences

Identify and understand unintended benefits, consequences or challenges of implementing Copilot, as an example of generative AI, and their implications for the adoption of generative AI in the APS.

Australian Government trial of Microsoft 365 Copilot