• Evaluation of the whole-of-government trial of Microsoft 365 Copilot

    The Australian Government recently completed a trial of Microsoft 365 Copilot that provided APS staff an opportunity to experiment with a generative AI tool in a safe and responsible way.

    The evaluation of the trial has concluded, and findings can be explored below.

    The summary report provides a high-level view of the evaluation findings and recommendations from the Australian Government's trial of Microsoft 365 Copilot.

    The full report provides a detailed analysis of the evaluation findings from the Australian Government's trial of Microsoft 365 Copilot, including supporting data tables, graphs and information on the evaluation approach.

  • Read the summary report

  • Download the summary Copilot trial report


  • Executive summary

    In the few years since its public introduction, generative AI (artificial intelligence) has become available and accessible to millions of people. The growing availability and rapid uptake of publicly available tools, such as ChatGPT, meant the Australian Public Service (APS) had to respond quickly to allow its workforce to experiment with generative AI in a safe, responsible and integrated way. To facilitate this experimentation, an appropriate generative AI tool needed to be selected.

    Microsoft 365 Copilot (formerly Copilot for Microsoft 365) was one of the solutions available to enable the APS to undertake safe and responsible generative AI experimentation. On 16 November 2023, the Australian Government announced a 6-month whole-of-government trial of Copilot.

    This decision was predicated on how swiftly and seamlessly Copilot, as a capability nested within existing whole-of-government contracting arrangements with Microsoft, could be deployed for rapid APS experimentation purposes. Further, as Copilot is a supplementary product that integrates with the existing applications within the Microsoft 365 suite, it also allowed staff to experiment and learn within the context of applications that were already familiar to them.

    The trial involved the distribution of over 5,765 Copilot licenses between January and June 2024. The trial was non-randomised, with agencies nominating staff to be allocated a license. Trial participants comprised a range of APS classifications, job families, experience levels with generative AI, and expectations of generative AI capabilities.

    Further detail on the background of the evaluation can be found in Appendix A.

    More broadly, this trial and evaluation tested the extent to which the wider promise of generative AI capabilities would translate into real-world adoption by workers. The results will help the government consider future opportunities and challenges related to adopting generative AI.

    This was just the first trial of a generative AI tool within the Australian Government and the future brings exciting opportunities to understand what other tools are available to explore a broad landscape of use cases.

    Overview of the evaluation

    Nous Group (Nous) was engaged by the Digital Transformation Agency (DTA) to assist with the evaluation of the trial. The Australian Centre of Evaluation was consulted on methodology and approach to ensure best practice. The evaluation was guided by 4 objectives and key lines of enquiry (KLEs) outlined in the table below.

    Table 1. Evaluation objectives and key lines of enquiry  
    | | Evaluation objective | Key lines of enquiry |
    | --- | --- | --- |
    | Employee-related outcomes | Evaluate APS staff sentiment about the use of Copilot. | What are the perceived effects of Copilot on APS employees? |
    | Productivity | Determine whether Copilot, as an example of generative AI, benefits APS productivity in terms of efficiency, output quality, process improvements and agency ability to deliver on priorities. | What are the perceived productivity benefits of Copilot? |
    | Whole-of-government adoption of generative AI | Determine whether and to what extent Copilot, as an example of generative AI, can be implemented in a safe and responsible way across government. | What are the identified adoption challenges of Copilot, as an example of generative AI, in the APS in the short and long term? |
    | Unintended outcomes | Identify and understand unintended benefits, consequences, or challenges of implementing Copilot as an example of generative AI and the implications on adoption of generative AI in the APS. | Are there any perceived unintended outcomes from the adoption of Copilot? Are there broader generative AI effects on the APS? |

    The findings of the evaluation and the resulting implications are outlined at a high level in this executive summary. Further details are provided in the body of the report. 

  • Glossary

    | Term | Meaning |
    | --- | --- |
    | AI in Government Taskforce | Co-led by the DTA and the Department of Industry, Science and Resources (DISR), the AI in Government Taskforce aimed to deliver policies, standards, and guidance for the safe, ethical and responsible use of AI technologies by government. |
    | Confidence interval | A statistical concept used to estimate a population parameter based on sample data. It provides a range of values that likely contains the true population parameter with a certain level of confidence. |
    | Generative AI | A branch of artificial intelligence focused on designing algorithms that generate novel outputs, such as text, images or sounds, based on learned patterns from data. |
    | Hallucinations | Large Language Models (LLMs) such as Microsoft 365 Copilot are trained to predict patterns rather than understand facts, which sometimes leads them to return plausible-sounding but inaccurate information. This is referred to as a ‘hallucination’. |
    | Large Language Model (LLM) | Large language models are a category of foundation models trained on immense amounts of data, making them capable of understanding and generating natural language and other types of content to perform a wide range of tasks. |
    | Microsoft 365 | A cloud-based suite of productivity and collaboration tools offered by Microsoft, including Office applications, email and other services. |
    | Microsoft Office | A suite of desktop productivity applications from Microsoft, including Word, Excel, PowerPoint and others. |
    | Microsoft Graph | A Microsoft application programming interface (API) that provides access to data and intelligence across Microsoft 365 services, enabling developers to build apps that interact with organisational data. |
    | Mixed methods | Combined use of both qualitative and quantitative research approaches to provide a more comprehensive understanding of the subject being evaluated. |
    | Microsoft 365 Copilot | AI-enabled functionality embedded into the Microsoft 365 application suite. Formerly called Copilot for Microsoft 365. |
    | P-value | A statistical measure that indicates the probability of observing a result as extreme as, or more extreme than, the one observed, assuming the null hypothesis is true. |
    | Trial participant | An Australian Public Service staff member who participated in the whole-of-government Microsoft 365 Copilot trial, between January and July 2024. |
    | T-test | A statistical test used to compare the means of 2 groups to determine if they are significantly different from each other, accounting for the variability in the data and sample size. |
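
    The three statistical terms in the glossary (confidence interval, p-value and t-test) fit together in a single comparison of two group means. The following is a minimal sketch in Python of how they relate, not the method used in the evaluation. The function names and the sample numbers are hypothetical and are not trial data. It computes Welch's form of the t statistic, and it assumes a large-sample normal approximation for the p-value and confidence interval so that it runs with the standard library only.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def welch_t(a, b):
    """Return (t statistic, standard error) for the difference in means of a and b."""
    # Standard error of the difference, allowing the groups to have unequal variances.
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se, se


def approx_p_value(t):
    """Two-sided p-value for t, using a normal approximation (an assumption)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


def approx_confidence_interval(a, b, level=0.95):
    """Approximate confidence interval for the difference in means (normal approximation)."""
    diff = mean(a) - mean(b)
    _, se = welch_t(a, b)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    return diff - z * se, diff + z * se


# Made-up illustrative values (for example, minutes to finish a task in two groups).
group_a = [30, 32, 28, 35, 31, 29]
group_b = [25, 27, 24, 28, 26, 23]

t_stat, _ = welch_t(group_a, group_b)
p_value = approx_p_value(t_stat)
lower, upper = approx_confidence_interval(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.5f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

    With samples this small, the normal approximation understates uncertainty. In practice a t distribution would normally be used instead, for example through SciPy's `scipy.stats.ttest_ind` with `equal_var=False`.
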
  • Employee-related outcomes

  • Productivity

  • Whole-of-government adoption of generative AI

  • Unintended outcomes
