Copilot trial evaluation briefing
About the briefing
Following the release of the Microsoft 365 Copilot trial evaluation, the Digital Transformation Agency (DTA) hosted a public evaluation briefing on Friday 25 October.
Participants were given the opportunity to ask questions before, during and after the briefing. Below are answers to the most frequent questions and, where possible, additional information to support industry and Australian Public Service (APS) staff.
Questions and answers
What questions were asked in the evaluation surveys?
You can access the questions for all 3 surveys on the Copilot trial survey page.
Is government aware of bespoke and standalone generative AI products?
The trial used Microsoft 365 Copilot to evaluate employee outcomes and productivity-related outcomes of general-use generative AI in the APS.
Agencies may choose to explore bespoke, standalone or other use cases and may seek information on or procure solutions through the marketplaces on BuyICT.
What should vendors consider if they wish to offer generative AI solutions or services to government?
Vendors should make sure their AI offerings align with applicable policies, including:
- the Policy for the responsible use of AI in government
- supporting standards, frameworks and guidance
- the National framework for the assurance of AI in government
- a forthcoming suite of AI technical standards for APS agencies, which the DTA will publish under an open licence in 2025.
As with any technology, vendors should be familiar with:
- Getting started as a seller on BuyICT
- digital sourcing resources for government
- the Guide to selling, published by the Department of Finance.
Are there plans for future trials of generative AI products from other vendors?
As of January 2025, the DTA has no plans to conduct further whole-of-government trials of generative AI products. APS agencies may conduct their own trials or evaluations.
Will Copilot be offered to APS agencies as part of Microsoft's whole-of-government arrangements?
APS agencies may choose to procure Microsoft 365 Copilot within the whole-of-government single-seller arrangement.
Will the Australian Government train its own generative AI model?
As of January 2025, the Australian Government is not exploring a bespoke, whole-of-government generative AI model. Agencies may choose to procure, develop or collaborate on bespoke models to meet their specific needs.
How were privacy or security concerns managed during the trial?
As with any technology, agencies must apply relevant policies when using generative AI technologies.
Before the trial, Microsoft commissioned an updated Infosec Registered Assessors Program (IRAP) assessment for its products that integrate with and enable Copilot features.
This assessment is available to tenant administrators on the Microsoft Service Trust Portal. Agencies were also required to conduct a privacy impact assessment before deploying Microsoft 365 Copilot to their participating staff.
The evaluation noted that, during the trial, agencies faced:
- challenges with data security and information management
- a lack of clarity on legal and regulatory requirements.
The DTA publishes and maintains AI-specific resources to support agencies through these challenges, including the Australian Government’s pilot AI assurance framework and a suite of AI technical standards due for release in 2025.
Is there guidance available for APS agencies that develop, procure or deploy generative AI tools?
APS agencies that develop, procure, deploy or use AI must comply with whole-of-government policies, standards and guidance.
As with any technology, agencies must also align with other applicable policies, such as those related to:
- procurement
- cybersecurity
- privacy
- data protection and management
- Indigenous data governance
- transparency.
Did the benefits of Copilot to agencies outweigh the costs?
The trial evaluation did not assess the cost-benefit ratio for Microsoft 365 Copilot at a whole-of-government level. However, the overarching findings note that agencies should consider the costs of implementing Copilot and other generative AI products while the technology is still in its early stages.
Agencies may choose to conduct cost-benefit evaluations specific to their operating environment, whether by drawing on their agency-level observations from the whole-of-government trial or by independently piloting generative AI products.
Did the trial evaluate for accessibility benefits?
While the trial did not directly evaluate accessibility benefits, some positive outcomes for inclusivity and accessibility were detailed in the full report.
Did the trial evaluate environmental impacts?
While the trial did not directly evaluate environmental impacts, the report detailed some concerns observed around the use of generative AI and the APS’s environmental footprint.
Did the trial benchmark or evaluate the accuracy of Copilot outputs?
The trial did not benchmark or technically evaluate the accuracy of Microsoft 365 Copilot’s outputs.
That said, participants reported that inaccuracy and unpredictability impacted their productivity. This could have implications for broader adoption of generative AI.
The full evaluation methodology can be explored in Appendix B.
Did the trial compare participants in technical and non-technical jobs?
Differences in experience between APS classifications and job families can be explored across the employee-related outcomes and productivity chapters of the full report.
Information about how the job families were aggregated, and about limitations such as positive sentiment bias, can be found in Appendix B.
The rate of survey participation by job family can be found in Appendix D.
What impact will generative AI have on the APS workforce?
The full report makes several observations related to the impact of generative AI tools such as Microsoft 365 Copilot on workforces.
Many of these are detailed in the unintended outcomes chapter.
They include potential:
- improvements to inclusivity and accessibility
- effects on staff attraction and retention
- impacts on roles and employment opportunities
- skills development and decay.
The evaluation recommends proactive monitoring for current and emerging risks, including the effects on the workforce.
Is further training required for APS staff to effectively adopt generative AI?
The evaluation observed a positive relationship between training and capability. Participants found training to be more effective when tailored to an APS context.
Recommendation 3 suggests agencies should offer specialised training based on their specific use cases.
Meanwhile, whole-of-government policy strongly recommends minimum training for all staff as well as additional, role-based training. To help agencies fulfil this recommendation, the DTA has published an AI fundamentals training module.