Executive summary
Preface
The uptake of publicly available generative artificial intelligence (AI) tools, such as ChatGPT, has grown rapidly. In the few years since its public introduction, generative AI has become available and accessible to millions.
This meant the Australian Public Service (APS) had to respond quickly to allow its workforce to experiment with generative AI in a safe, responsible and integrated way. To make this experimentation possible, an appropriate generative AI tool needed to be selected.
This decision was dependent on:
- how swiftly and seamlessly the tool could be deployed for rapid APS experimentation purposes
- the ability for staff to experiment and learn using applications familiar to them.
One solution to enable the APS to experiment with safe and responsible generative AI was Microsoft 365 Copilot (formerly Copilot for Microsoft 365). On 16 November 2023, the Australian Government announced a 6-month whole-of-government trial of Copilot. Copilot is a supplementary product that integrates with the existing applications in the Microsoft 365 suite and it’s nested within existing whole-of-government contracting arrangements with Microsoft. This made it a rapid and familiar solution to deploy.
Broadly, the trial and evaluation tested the extent to which the wider promise of generative AI capabilities would translate into real-world adoption by workers. The results will help the Australian Government consider future opportunities and challenges related to the adoption of generative AI.
This was the first trial of a generative AI tool in the Australian Government. The future brings exciting opportunities to understand what other tools are available to explore a broad landscape of use cases.
Overarching findings
Copilot use was moderate and focused on a few use cases
Use of Copilot was moderate. However, most trial participants across classifications and job families were optimistic about Copilot and wished to keep using it.
- Only a third of trial participants across classifications and job families used Copilot daily.
- Copilot was predominantly used to summarise information and re-write content.
- Copilot in Microsoft Word and Teams was viewed favourably and used most frequently.
- Access barriers prevented Copilot use in Outlook.
Perceived improvements to efficiency and quality
Trial participants estimated time savings of up to an hour when summarising information, preparing a first draft of a document and searching for information.
The highest efficiency gains were perceived by APS levels 3-6, Executive Level (EL) 1 staff and ICT roles.
The majority of managers (64%) perceived uplifts in efficiency and quality in their teams.
40% of trial participants reported being able to reallocate their time to higher-value activities such as staff engagement and strategic planning.
There is potential for Copilot to improve inclusivity and accessibility in the workplace and in government communication.
Adoption requires a concerted effort to address barriers
There are key integration, data security and information management considerations agencies must address prior to Copilot adoption, including the scalability and performance of the GPT integration and an understanding of the context available to the large language model.
Training in prompt engineering and use cases tailored to agency needs is required to build capability and confidence in Copilot.
Clear communication and policies are required to address uncertainty regarding the security of Copilot, accountabilities and expectations of use.
Adaptive planning is needed to reflect the rolling feature release cycle of Copilot, alongside governance structures that reflect agencies' risk appetite and clear roles and responsibilities across government to provide advice on generative AI use. Given the product's infancy, agencies would need to consider the costs of implementing Copilot in its current version. More broadly, this should be a consideration for other generative AI tools.
Broader concerns on AI that require active monitoring
There are broader concerns on the potential impact of generative AI on APS jobs and skills, particularly on entry-level jobs and women.
Large language model (LLM) outputs may be biased towards western norms and may not appropriately use cultural data and information.
There are broader concerns regarding vendor lock-in and competition, as well as the impact of generative AI on the APS's environmental footprint.
Recommendations
Detailed and adaptive implementation
1.1 Product selection
Agencies should consider which generative AI solutions are most appropriate for their overall operating environment and specific use cases, particularly for AI assistant tools.
1.2 System configuration
Agencies must configure their information systems, permissions, and processes to safely accommodate generative AI products.
1.3 Specialised training
Agencies should offer specialised training reflecting agency-specific use cases and develop general generative AI capabilities, including prompt training.
1.4 Change management
Effective change management should support the integration of generative AI by identifying ‘Generative AI Champions’ to highlight the benefits and encourage adoption.
1.5 Clear guidance
The APS must provide clear guidance on using generative AI, including when consent and disclaimers are needed, such as in meeting recordings, and a clear articulation of accountabilities.
Encourage greater adoption
1.6 Workflow analysis
Agencies should conduct detailed analyses of workflows across various job families and classifications to identify further use cases that could improve generative AI adoption.
1.7 Use case sharing
Agencies should share use cases in appropriate whole-of-government forums to facilitate the adoption of generative AI across the APS.
Proactive risk management
1.8 Impact monitoring
The APS should proactively monitor the impacts of generative AI, including its effects on the workforce, to manage current and emerging risks effectively.
Evaluation objectives
The Digital Transformation Agency (DTA) designed 4 evaluation objectives in consultation with:
- the AI in Government Taskforce
- the Australian Centre for Evaluation (ACE)
- advisors from across the APS.
Employee-related outcomes
Evaluate APS staff sentiment about the use of Copilot, including:
- staff satisfaction
- innovation opportunities
- confidence in the use of Copilot
- ease of integration into workflow.
Productivity
Determine if Copilot, as an example of generative AI, benefits APS productivity in terms of:
- efficiency
- output quality
- process improvements
- agency ability to deliver on priorities.
Adoption of AI
Determine whether and to what extent Copilot, as an example of generative AI:
- can be implemented in a safe and responsible way across government
- poses benefits and challenges in the short and longer term
- faces barriers to innovation that may require changes to how the APS delivers on its work.
Unintended consequences
Identify and understand unintended benefits, consequences, or challenges of implementing Copilot as an example of generative AI and the implications on adoption of generative AI in the APS.