Copilot trial evaluation briefing
About the briefing
Following the release of the Microsoft 365 Copilot trial evaluation, the Digital Transformation Agency (DTA) hosted a public evaluation briefing on Friday 25 October.
Participants were given the opportunity to ask questions before, during and after the briefing. Below are answers to the most frequent questions and, where possible, additional information to support industry and Australian Public Service (APS) staff.
Questions and answers
What questions were asked in the evaluation surveys?
You can access the questions for all 3 surveys on the Copilot trial survey page.
Is government aware of bespoke and standalone generative AI products?
The trial used Microsoft 365 Copilot to evaluate employee outcomes and productivity-related outcomes of general-use generative AI in the APS.
Agencies may choose to explore bespoke, standalone or other use cases and may seek information on or procure solutions through the marketplaces on BuyICT.
What should vendors consider if they wish to offer generative AI solutions or services to government?
Vendors should make sure their AI offerings align with applicable policies, including:
- the Policy for the responsible use of AI in government
- supporting standards, frameworks and guidance
- the National framework for the assurance of AI in government
- a forthcoming suite of AI technical standards for APS agencies, which the DTA will publish under an open licence in 2025.
As with any technology, vendors should be familiar with:
- Getting started as a seller on BuyICT
- digital sourcing resources for government
- the Guide to selling, published by the Department of Finance.
Are there plans for future trials of generative AI products from other vendors?
As of January 2025, the DTA has no plans to conduct further whole-of-government trials of generative AI products. APS agencies may conduct their own trials or evaluations.
Will Copilot be offered to APS agencies as part of Microsoft's whole-of-government arrangements?
APS agencies may choose to procure Microsoft 365 Copilot within the whole-of-government single-seller arrangement.
Will the Australian Government train its own generative AI model?
As of January 2025, the Australian Government is not exploring a bespoke, whole-of-government generative AI model. Agencies may choose to procure, develop or collaborate on bespoke models to meet their specific needs.
How were privacy or security concerns managed during the trial?
As with any technology, agencies must apply relevant policies when using generative AI technologies.
Before the trial, Microsoft commissioned an updated Infosec Registered Assessors Program (IRAP) assessment for its products that integrate with and enable Copilot features.
This assessment is available to tenant administrators on the Microsoft Service Trust Portal. Agencies were also required to conduct a privacy impact assessment before deploying Microsoft 365 Copilot to their participating staff.
The evaluation noted that, during the trial, agencies faced:
- challenges with data security and information management
- a lack of clarity on legal and regulatory requirements.
The DTA publishes and maintains AI-specific resources to support agencies through these challenges, including the Australian Government’s pilot AI assurance framework and a suite of AI technical standards due for release in 2025.
Is there guidance available for APS agencies that develop, procure or deploy generative AI tools?
APS agencies that develop, procure, deploy or use AI must comply with whole-of-government policies, standards and guidance.
As with any technology, agencies must also align with other applicable policies, such as those related to:
- procurement
- cybersecurity
- privacy
- data protection and management
- Indigenous data governance
- transparency.
Did the benefits of Copilot to agencies outweigh the costs?
The trial evaluation did not assess the cost-benefit ratio for Microsoft 365 Copilot at a whole-of-government level. However, the overarching findings note that agencies should consider the costs of implementing Copilot and other generative AI products while the technology is still in its early stages.
Agencies may choose to conduct cost-benefit evaluations specific to their operating environment, whether by drawing on their agency-level observations from the whole-of-government trial or by independently piloting generative AI products.
Did the trial evaluate for accessibility benefits?
While the trial did not directly evaluate accessibility benefits, some positive outcomes for inclusivity and accessibility were detailed in the full report.
Did the trial evaluate environmental impacts?
While the trial did not directly evaluate environmental impacts, the report detailed some concerns observed around the use of generative AI and the APS’s environmental footprint.
Did the trial benchmark or evaluate the accuracy of Copilot outputs?
The trial did not benchmark or technically evaluate the accuracy of Microsoft 365 Copilot’s outputs.
That said, participants reported that inaccuracy and unpredictability impacted their productivity. This could have implications for broader adoption of generative AI.
The full evaluation methodology can be explored in Appendix B.
Did the trial compare participants in technical and non-technical jobs?
Differences in experience between APS classifications and job families can be explored across the employee-related outcomes and productivity chapters of the full report.
Information about how the job families were aggregated, and about limitations such as positive sentiment bias, can be found in Appendix B.
The rate of survey participation by job family can be found in Appendix D.
What impact will generative AI have on the APS workforce?
The full report makes several observations related to the impact of generative AI tools such as Microsoft 365 Copilot on workforces.
Many of these are detailed in the unintended outcomes chapter.
They include potential:
- improvements to inclusivity and accessibility
- effects on staff attraction and retention
- impacts on roles and employment opportunities
- skills development and decay.
The evaluation recommends proactive monitoring for current and emerging risks, including the effects on the workforce.
Is further training required for APS staff to effectively adopt generative AI?
The evaluation observed a positive relationship between training and capability. Participants found training to be more effective when tailored to an APS context.
Recommendation 3 suggests agencies should offer specialised training based on their specific use cases.
Meanwhile, whole-of-government policy strongly recommends minimum training for all staff as well as additional, role-based training. To help agencies fulfil this recommendation, the DTA has published an AI fundamentals training module.