Evaluation findings

Employee-related outcomes

  • 77% were optimistic about Microsoft 365 Copilot at the end of the trial.
  • 1 in 3 used Copilot daily.
  • Over 70% of participants used Microsoft Teams and Word during the trial, mainly for summarising and re-writing content.
  • 75% of participants who received 3 or more forms of training were confident in their ability to use Copilot, 28 percentage points higher than those who received one form of training.

Most trial participants were positive about Copilot and wished to continue using it

  • 86% of trial participants wished to continue to use Copilot.
  • Senior Executive Service (SES) staff (93%) and staff in Corporate roles (81%) had the highest positive sentiment towards Copilot.

Despite the positive sentiment, use of Copilot was moderate

Moderate usage was consistent across classifications and job families but specific use cases varied. For example, a higher proportion of SES and Executive Level (EL) 2 staff used meeting summarisation features, compared to other APS classifications.

Microsoft Teams and Word were used most frequently and met participants’ needs. Poor Excel functionality and access issues in Outlook hampered use.

Content summarisation and re-writing were the most used Copilot functions.

Other generative AI tools may be more effective at meeting users’ needs in reviewing or writing code, generating images or searching research databases.

Tailored training and propagation of high-value use cases could drive adoption

Training significantly enhanced confidence in Copilot use and was most effective when it was tailored to an agency’s context.

Identifying specific, high-value use cases could lead to greater use of Copilot.

Productivity

  • 69% of survey respondents agreed that Copilot improved the speed at which they could complete tasks.
  • 61% agreed that Copilot improved the quality of their work.
  • 40% of survey respondents reported reallocating their time to:
    • mentoring / culture building
    • strategic planning
    • engaging with stakeholders
    • product enhancement.

Most trial participants believed Copilot improved the speed and quality of their work

Participants perceived improvements in efficiency and quality for a small number of tasks, with time savings of around an hour a day on those tasks:

  • summarisation
  • preparing a first draft of a document 
  • information searches. 

Copilot had a negligible impact on certain activities such as communication.

APS 3-6 and EL1 classifications and ICT-related roles experienced the highest time savings of around an hour a day on summarisation, preparing a first draft of a document and information searches.

Around 65% of managers observed an uplift in productivity across their team.

Around 40% of trial participants were able to reallocate their time to higher value activities.

Copilot’s inaccuracy reduced the scale of productivity benefits.

Quality gains were more subdued relative to efficiency gains.

Up to 7% of trial participants reported Copilot added time to activities.

Copilot’s potential unpredictability and lack of contextual knowledge meant time had to be spent verifying and editing outputs, which negated some of the efficiency savings.

Whole-of-government adoption of generative AI

61% of managers in the pulse survey could not confidently identify Copilot outputs.

There is a need for agencies to engage in adaptive planning while ensuring governance structures and processes appropriately reflect their risk appetites.

Adoption of generative AI requires a concerted effort to address key barriers.

Technical

There were integration challenges with non-Microsoft 365 applications, particularly JAWS and Janusseal; however, such integrations were out of scope for the trial. Note: JAWS is a screen reader software product designed to improve the accessibility of written documents. Janusseal is a data classification tool used to easily distinguish between sensitive and non-sensitive information.

Copilot may magnify poor data security and information management practices.

Capability

Prompt engineering, identifying relevant use cases and understanding the information requirements of Copilot across Microsoft Office products were significant capability barriers.

Legal

Uncertainty about the need to disclose Copilot use, about accountability for outputs, and about the remit of Freedom of Information requirements were barriers to Copilot use, particularly in regard to transcriptions.

Cultural

Negative stigmas and ethical concerns associated with generative AI adversely impacted its adoption.

Governance

Adaptive planning is needed to reflect the rolling release cycle nature of generative AI tools, alongside relevant governance structures aligned to agencies’ risk appetites.

Unintended outcomes

There are both benefits and concerns that will need to be actively monitored.

Benefits

Generative AI could improve inclusivity and accessibility in the workplace, particularly for people who are neurodiverse, people with disability and people from culturally and linguistically diverse backgrounds.

The adoption of Copilot and generative AI more broadly in the APS could help the APS attract and retain employees.

Concerns

There are concerns regarding the potential impact of generative AI on APS jobs and future skills needs. This is particularly true for administrative roles, where changes would have a disproportionate flow-on impact on marginalised groups, entry-level positions and women, who tend to have greater representation in these roles as pathways into the APS.

Copilot outputs may be biased towards Western norms and may not handle cultural data and information appropriately, for example by misusing First Nations images or misspelling First Nations words.

The use of generative AI might lead to a loss of skill in summarisation and writing. Conversely, uneven adoption of generative AI may give rise to a false assumption that people who use it are more productive than those who do not.

Participants expressed concerns relating to vendor lock-in; however, the realised benefits were limited to specific features and use cases.

Participants were also concerned with the APS’ increased impact on the environment resulting from generative AI use.

There’s a concern of vendor lock-in as the APS becomes more dependent on this tool.

Focus group participant

It’s difficult to account for a bias that you are yet to identify.

Focus group participant

Copilot could cause myself and colleagues to lack deep knowledge of topics.

Pre-use survey respondent

A mixed-methods approach was adopted for the evaluation.

Over 2,000 trial participants from more than 50 agencies contributed to the evaluation. The final report was based on a document and data review, consultations and surveys.

Document and data review

The evaluation synthesised existing evidence, including:

  • government research papers on Copilot and generative AI
  • the trial issue register
  • 6 agency-led internal evaluations.

Consultations

The evaluation also involved thematic analysis of:

  • 24 outreach interviews conducted by the DTA
  • 17 focus groups facilitated by Nous Group
  • 8 interviews facilitated by Nous Group.

Surveys

Analysis was conducted on data collected from:

  • 1,556 respondents to the pre-use survey
  • 1,159 respondents to the pulse survey
  • 831 respondents to the post-use survey.
