Evaluation objectives
The evaluation assessed the use, benefits, risks and unintended outcomes of Copilot in the APS during the trial.
The Digital Transformation Agency (DTA) designed 4 evaluation objectives, in consultation with:
- the AI in Government Taskforce
- the Australian Centre for Evaluation (ACE)
- advisors from across the APS.
Employee-related outcomes
Evaluate APS staff sentiment about the use of Copilot, including:
staff satisfaction
innovation opportunities
confidence in the use of Copilot
ease of integration into workflow.
Productivity
Determine if Copilot, as an example of generative AI, benefits APS productivity in terms of:
efficiency
output quality
process improvements
agency ability to deliver on priorities.
Adoption of AI
Determine whether and to what extent Copilot, as an example of generative AI:
can be implemented in a safe and responsible way across government
poses benefits and challenges in the short and longer term
faces barriers to innovation that may require changes to how the APS delivers on its work.
Unintended consequences
Identify and understand unintended benefits, consequences, or challenges of implementing Copilot as an example of generative AI and the implications on adoption of generative AI in the APS.
Methodology
To ensure breadth and depth of insight through the evaluation, a mixed-methods approach was used. Qualitative and quantitative data collection methods were leveraged, including:
- a centralised issues register
- outreach interviews during the initial stages of the trial
- pre-use, post-use and pulse surveys
- post-trial interviews with key stakeholders
- focus groups.
A desktop review of reports provided by agencies and other documents relevant to the trial was also undertaken. The evaluation engaged with over 50 agencies and more than 2,000 trial participants between January and July 2024 across various engagement streams.
Information was gathered using several methods of evaluation.
Document/data review
The evaluation synthesised existing evidence, including:
- government research papers on Copilot and generative AI
- the trial issue register
- 6 agency-led internal evaluations.
Consultations
It also involved thematic analysis through:
- 24 outreach interviews conducted by the DTA
- 17 focus groups facilitated by Nous Group
- 8 interviews facilitated by Nous Group.
Surveys
Analysis was conducted on data collected from:
- 1,556 respondents in pre-use survey
- 1,159 respondents in pulse survey
- 831 respondents in post-use survey.
Thematic, frequency and comparative analyses of both qualitative and quantitative data were undertaken. Evaluation objectives and key lines of enquiry (KLEs) shaped the thematic analysis completed on qualitative data. In addition, frequency analysis provided insight into the majority sentiment of participants. Where possible, a comparative analysis was undertaken on survey responses. A total of 330 responses from the pre-use and post-use surveys were linked via a unique survey ID.
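As an illustration only, the linking and frequency steps described above could look something like the following sketch. The report does not publish its analysis code, so the function names, survey IDs and answer labels here are hypothetical, and the data is invented toy data.

```python
from collections import Counter


def link_responses(pre, post):
    """Pair pre-use and post-use answers that share a survey ID.

    `pre` and `post` map a survey ID to that respondent's answer.
    Respondents who appear in only one survey are left out.
    """
    return {sid: (pre[sid], post[sid]) for sid in pre.keys() & post.keys()}


def frequency(answers):
    """Return the share of each answer as a fraction of all answers."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


# Invented toy data: survey ID -> answer to one sentiment question.
pre_use = {"A1": "agree", "A2": "neutral", "A3": "disagree", "A4": "agree"}
post_use = {"A2": "agree", "A3": "agree", "A4": "agree", "A5": "neutral"}

linked = link_responses(pre_use, post_use)

# Frequency analysis restricted to the linked respondents, so that the
# before and after shares describe the same people.
before = frequency(a for a, _ in linked.values())
after = frequency(b for _, b in linked.values())
```

Restricting the comparison to linked respondents is what makes a before-and-after reading possible, at the cost of a smaller sample than either survey alone.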
Limitations
Evaluation fatigue may have reduced trial participants’ engagement with the evaluation.
During the trial period, agencies and individual trial participants were involved in a variety of research activities managed internally by their own agencies as well as those driven centrally by the DTA. Research fatigue was a key challenge that influenced participation rates across focus groups, interviews and the post-use survey. Lower response rates in the post-use survey (n = 831) and for those who completed both the pre-use and post-use survey (n = 330) may impact how representative the data is of the trial population. However, the total number of responses means we were still able to effectively test changes in proportions before and after at the 5% level of significance. Where possible, the evaluation has drawn on insights from agency-specific evaluations to complement the evaluation findings.
This means that final evaluation research activities may not have captured the full spectrum of experiences and perspectives.
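The report does not name the statistical test it used for changes in proportions. As one possible approach for linked before-and-after responses, the sketch below uses an exact McNemar test, which looks only at respondents whose answer changed. The function name and the counts are assumptions made for illustration, not figures from the evaluation.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value for paired yes/no answers.

    b: respondents who changed from 'no' (pre-use) to 'yes' (post-use)
    c: respondents who changed from 'yes' (pre-use) to 'no' (post-use)
    Respondents whose answer did not change carry no information here.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Probability of k or fewer changes in one direction if both
    # directions were equally likely (binomial with p = 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented counts, not taken from the evaluation.
p_value = mcnemar_exact(b=40, c=20)
significant = p_value < 0.05  # the 5% level of significance
```

A paired test of this kind suits the 330 linked responses; comparing the full pre-use and post-use samples would instead call for a test for independent groups.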
The non-randomised sample of trial participants may not reflect the views of the broader APS.
When comparing the proportion of trial participants with the population of the broader APS, there was an overrepresentation of Executive Level (EL) 1, EL2 and Senior Executive Service (SES) participants. In addition, junior APS classifications (APS1 to 4) were underrepresented.
Trial participants voluntarily chose to take part in the trial, which may have led to selection bias. While efforts were made during the trial to invite participants from a range of backgrounds and levels of experience with generative AI, a high proportion of the trial participants who contributed to the evaluation had previous experience with generative AI (66%) and were generally optimistic about Copilot (73%).
This means that results identified through this evaluation may not be fully representative of the views held by the entire APS.
There was an inconsistent rollout of Copilot across agencies.
The experience and sentiment of trial participants may be affected by when their agency began participating in the trial and their agency’s version of Copilot. Agencies received their Copilot licences between 1 January and 1 April 2024. Agencies that joined the trial later may not have been able to contribute to early evaluation activities, such as the pre-use survey or initial interviews, therefore excluding their perspective and preventing later comparison of outcomes.
Since the trial began, Microsoft has released 60 updates to Copilot to enable new features, including fixes for early technical glitches. Due to either information security requirements or misaligned agency update schedules, the new features of Copilot may have been adopted inconsistently across participating agencies or, at times, not at all.
This means that there could be significant variation across agencies in Copilot’s functionality and in their ability to build capability and identify use cases for Copilot.
The impact of Copilot relies on trial participants’ self-assessment of productivity benefits.
The evaluation methodology relied on trial participants self-assessing the impacts of Copilot, which may naturally under- or overestimate the benefits – particularly time savings. Where possible, the evaluation compared its productivity findings against other APS agency evaluations and external research to verify the productivity savings put forth by trial participants.
Nevertheless, there is a risk that the impact of Copilot – in particular the productivity estimates from Copilot use – may not accurately reflect Copilot’s actual productivity impacts.
A comprehensive overview of the evaluation’s methodology and limitations is detailed in Appendix B.
Overarching findings
Generative AI is a disruptive technology that could transform the APS’ productivity and ways of working. However, agencies will need to carefully weigh the potential benefits of efficiency and quality improvements against the costs, risks and suitability of generative AI to meet their agency’s needs.
There are perceived improvements to efficiency and quality for summarisation, preparing a first draft of a document and information searches.
From the trial, Copilot users from both this evaluation and agency-specific evaluations consistently reported quality and efficiency improvements in 3 key activities: summarisation of content, creating first drafts and information searches.
Trial participants estimated efficiency gains of around an hour when completing one of these 3 activities. Trial participants in junior levels (APS3-6), EL1s and those in information and communications technology (ICT)-related roles perceived the most efficiencies in these activities. In addition, 40% of post-use survey respondents reported they were able to reallocate their time to higher value activities such as staff engagement, culture building and mentoring, and building relationships with end users and stakeholders.
Overall, trial participants across all job classifications and job families were satisfied with Copilot and the majority wish to continue using it.
However, the adoption of generative AI requires a concerted effort to address technical, cultural and capability barriers and to improve usage.
Agencies faced adoption challenges during the trial. Technical barriers to adoption included needing to ensure information systems and processes were configured to safely accommodate Copilot.
Capability challenges were highlighted as a key barrier to adoption as trial participants needed both tailored training that provided agency-specific use cases as well as general generative AI training in prompt engineering. In addition, there were also cultural barriers with perceived stigma in using generative AI and discomfort with the use of meeting transcriptions. Some focus group participants reported feeling uncomfortable about being recorded and transcribed and perceived they were being pressured to consent.
Trial participants also noted the need for clear guidance and information regarding their accountabilities and the security of prompt information which in turn affected their use of Copilot. Finally, focus group participants also acknowledged the need to have change management supports in place including identifying ‘champions’ to illustrate generative AI's benefits to drive adoption.
These adoption challenges contributed to the moderate use of Copilot during the trial: only a third of trial participants used Copilot daily, with use concentrated in summarising meetings and information and re-writing content. Only a small number of novel or specific use cases have been identified across job families, and use of broader Copilot functionalities has been limited.
There are a range of longer-term costs and risks that agencies will need to monitor and account for.
Interviews with government agencies highlighted that generative AI may have a large impact on the composition of APS jobs and skills, especially for women and junior staff who are perceived to be at a greater risk of job displacement by generative AI.
Trial participants also highlighted a range of broader concerns regarding the use of generative AI – from vendor lock-in to its environmental impact – that reflect the general unknown nature of generative AI but also its potentially wide-ranging impacts that will require close monitoring.
Findings summary
Employee-related outcomes
77% were optimistic about Microsoft 365 Copilot at the end of the trial.
1 in 3 used Copilot daily.
Over 70% of participants used Microsoft Teams and Word during the trial, mainly for summarising and re-writing content.
75% of participants who received 3 or more forms of training were confident in their ability to use Copilot, 28 percentage points higher than those who received one form of training.
Most trial participants were positive about Copilot and wish to continue using it
86% of trial participants wished to continue to use Copilot.
Senior Executive Service (SES) staff (93%) and Corporate (81%) roles had the highest positive sentiment towards Copilot.
Despite the positive sentiment, use of Copilot was moderate
Moderate usage was consistent across classifications and job families but specific use cases varied. For example, a higher proportion of SES and Executive Level (EL) 2 staff used meeting summarisation features, compared to other APS classifications.
Microsoft Teams and Word were used most frequently and met participants’ needs. Poor Excel functionality and access issues in Outlook hampered use.
Content summarisation and re-writing were the most used Copilot functions.
Other generative AI tools may be more effective at meeting users’ needs in reviewing or writing code, generating images or searching research databases.
Tailored training and propagation of high-value use cases could drive adoption
Training significantly enhanced confidence in Copilot use and was most effective when it was tailored to an agency’s context.
Identifying specific use cases for Copilot could lead to greater use of Copilot.
Productivity
69% of survey respondents agreed that Copilot improved the speed at which they could complete tasks.
61% agreed that Copilot improved the quality of their work.
40% of survey respondents reported reallocating their time for:
mentoring / culture building
strategic planning
engaging with stakeholders
product enhancement.
Most trial participants believed Copilot improved the speed and quality of their work
Improvements in efficiency and quality were perceived to occur in a few tasks with perceived time savings of around an hour a day for these tasks. These tasks include:
summarisation
preparing a first draft of a document
information searches.
Copilot had a negligible impact on certain activities such as communication.
APS 3-6 and EL1 classifications and ICT-related roles experienced the highest time savings of around an hour a day on summarisation, preparing a first draft of a document and information searches.
Around 65% of managers observed an uplift in productivity across their team.
Around 40% of trial participants were able to reallocate their time to higher value activities.
Copilot’s inaccuracy reduced the scale of productivity benefits.
Quality gains were more subdued relative to efficiency gains.
Up to 7% of trial participants reported Copilot added time to activities.
Copilot’s potential unpredictability and lack of contextual knowledge required time spent on output verification and editing which negated some of the efficiency savings.
Whole-of-government adoption of generative AI
61% of managers in the pulse survey could not confidently identify Copilot outputs.
There is a need for agencies to engage in adaptive planning while ensuring governance structures and processes appropriately reflect their risk appetites.
Adoption of generative AI requires a concerted effort to address key barriers.
Technical
There were integration challenges with non-Microsoft 365 applications, particularly JAWS and Janusseal; however, such integrations were out of scope for the trial. Note: JAWS is a software product designed to improve the accessibility of written documents. Janusseal is a data classification tool used to easily distinguish between sensitive and non-sensitive information.
Copilot may magnify poor data security and information management practices.
Capability
Prompt engineering, identifying relevant use cases and understanding the information requirements of Copilot across Microsoft Office products were significant capability barriers.
Legal
Uncertainty regarding the need to disclose Copilot use, accountability for outputs and lack of clarity regarding the applicability of Freedom of Information requirements were barriers to Copilot use – particularly for meeting transcriptions.
Cultural
Negative stigmas and ethical concerns associated with generative AI adversely impacted its adoption.
Governance
Adaptive planning is needed to reflect the rolling release cycle nature of generative AI tools, alongside relevant governance structures aligned to agencies’ risk appetites.
Unintended outcomes
There are both benefits and concerns that will need to be actively monitored.
Benefits
Generative AI could improve inclusivity and accessibility in the workplace particularly for people who are neurodiverse, with disability or from a culturally and linguistically diverse background.
The adoption of Copilot and generative AI more broadly in the APS could help the APS attract and retain employees.
Concerns
There are concerns regarding the potential impact of generative AI on APS jobs and skills needs in the future. This is particularly true for administrative roles, which then have a disproportionate flow on impact to marginalised groups, entry-level positions and women who tend to have greater representation in these roles as pathways into the APS.
Copilot outputs may be biased towards western norms and may not appropriately use cultural data and information such as misusing First Nations images and misspelling First Nations words.
The use of generative AI might lead to a loss of skill in summarisation and writing. Conversely, a lack of adoption of generative AI may result in a false assumption that people who use it may be more productive than those who do not.
Participants expressed concerns relating to vendor lock-in; however, the realised benefits were limited to specific features and use cases.
Participants were also concerned with the APS’ increased impact on the environment resulting from generative AI use.
There’s a concern of vendor lock-in as the APS becomes more dependent on this tool.

Focus group participant
It’s difficult to account for a bias that you are yet to identify.

Focus group participant
Copilot could cause myself and colleagues to lack deep knowledge of topics.

Pre-use survey respondent
The overarching findings reveal several considerations for the APS in the context of future adoption of generative AI.
This section outlines the expectations and use of Microsoft 365 Copilot among trial participants, including its use across the Microsoft 365 suite and the identification of current, novel and future use cases.
Key insights
Most trial participants (77%) were satisfied with Copilot and wish to continue using the product.
The positive sentiment towards Copilot was not uniformly observed across all MS products or activities. In particular, MS Excel and Outlook Copilot functionality did not meet expectations.
Other generative AI tools may be more effective than Copilot at meeting users’ bespoke needs. In particular, Copilot was perceived to be less advanced in: writing and reviewing code, producing complex written documents, generating images for internal presentations, and searching research databases.
Despite the overall positive sentiment, use of Copilot was moderate, with only a third of post-use survey respondents using Copilot on a daily basis. Due to a combination of user capability, perceived benefit and convenience of the tool, and user interface, Copilot is yet to be ingrained in the daily habits of APS staff.
Copilot is currently used mainly for summarisation and re-writing content in Teams and Word.
There was a positive relationship between the provision of training and capability to use Copilot. Copilot training was most effective when tailored to the APS, the users’ role and the agency context.
There are opportunities to further enhance the use of generative AI across the policy lifecycle to increase its adoption and benefits.
Rather than just getting through the daily tasks reactively to meet deadlines, I feel as though I have more time to consider and work through about the way we do things and why we are doing them.

Trial participant from the administration job family, post-use survey
Trial participants also remarked on the ability to spend more time on face-to-face activities such as staff engagement, culture building, mentoring and building relationships with end users and stakeholders. Acknowledging the human-dependent nature of these tasks, these respondents redirected their time to face-to-face activities such as communications to support their team and/or customers.
The quality of Copilot output limited the scale of productivity benefits.
Contextual irrelevance impacted the quality of Copilot outputs.
Overall, Copilot’s improvements to work quality were more subdued than its improvements to work efficiency. As highlighted in Figure 14, while the majority of trial participants considered Copilot effective at developing first drafts of documents and lifting overall quality, editing was almost always needed to tailor content for the audience or context, thereby reducing total efficiency gains.

Figure 14 | Post-use survey responses reporting time savings of 0.5 hours or more (n=795) and overall agreement of improved quality of work (n=801), by type of activity
I have concerns about accuracy and hallucinations (both of which I experienced) which leads to distrust and needing to "double-check" its outputs; this significantly impacts any time savings made.

Trial participant from the ICT and Digital Solutions job family, post-use survey
