Productivity
Key insights
The majority of post-use survey respondents agreed that Copilot improved the speed at which they could complete tasks (69%) and uplifted the quality of their work (61%).
Approximately 65% of the managers in the post-use survey found that Copilot had a positive impact on the quality and efficiency of their team members, in particular by helping team members to quickly produce briefing materials and by uplifting the quality of written outputs.
Copilot contributed the most perceived time savings in tasks related to summarisation, information searches and preparing first drafts, with estimated time savings of around an hour a day for those tasks.
The ICT and Digital Solutions job family perceived the most efficiency gains, of around one hour a day across summarisation activities and preparing first drafts of documents. Across APS classifications, APS 3-6 and EL 1 staff perceived similar time savings of an hour a day in summarisation tasks and creating first drafts.
Around 40% of trial participants reported being able to reallocate their time to higher value activities. Trial participants also remarked on the ability to spend more time on face-to-face activities such as staff engagement, culture building and mentoring, and taking more time to build relationships with end users and stakeholders.
The quality of Copilot’s output limited the scale of productivity benefits. Overall, Copilot’s improvements to work quality were more subdued than its improvements to work efficiency. While the majority of trial participants considered Copilot effective at developing first drafts of documents and lifting overall quality, editing was almost always needed to tailor content for the audience or context, thereby reducing total efficiency gains.
Copilot is perceived to improve the efficiency and quality of outputs
The majority of post-use survey respondents perceived that Copilot positively affected their productivity.
Trial participants generally perceived that Copilot had a positive impact on 2 key measures of productivity – efficiency and quality. As shown in Figure 11, the majority of post-use survey respondents agreed that Copilot improved the speed at which they could complete tasks (69%) and uplifted the quality of their work (61%).
Managers have also noticed productivity improvements within their teams.
Approximately 65% of the managers in the post-use survey found that Copilot had a positive impact on the quality and efficiency of their team members. As shown in Figure 12, less than 3% of this cohort believed Copilot had a negative effect on their team.
Manager respondents in the post-use survey indicated that Copilot helped team members to quickly produce briefing materials and added value to written deliverables. Some managers in focus groups thought that Copilot made writing more consistent across their teams and lifted the overall standard of work.
Efficiencies are concentrated in a few tasks
Copilot contributed the highest perceived time savings in tasks related to summarisation, preparing first drafts and information searches.
Post-use survey respondents perceived that Copilot contributed the highest time savings in activities related to information summarisation, preparing first drafts and information searches. Respondents estimated that Copilot saved up to an hour a day in these activities, as shown in Table 4. These figures are approximations and likely represent the upper bound of the time savings Copilot could contribute (assuming APS employees perform the tasks every day).
Activity | Estimated hours saved per day |
---|---|
Communicating through digital means other than meetings | 0.5 |
Summarising existing information | 1.1 |
Preparing first draft of a document | 1.0 |
Searching for information required for a task | 0.8 |
Undertaking preliminary data analysis | 0.5 |
Undertaking preliminary data analysis | 0.6 |
Preparing meeting minutes | 0.9 |
Table notes:
- Hours saved on tasks were approximated by first calculating the mean of each time bracket specified in the question (e.g. 0, 1-4, 5-8, 9-12…).
- The mean for each bracket was then multiplied by the number of respondents who selected that bracket to determine the total time on the activity. The total time was then divided by the total number of respondents to estimate the average time per respondent.
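The calculation described in the table notes above amounts to a respondent-weighted mean of bracket midpoints. The sketch below is illustrative only: the function names and the respondent counts are hypothetical and are not survey data. The notes do not state the unit of the brackets or how the result was converted to hours per day, so the result here stays in the brackets' own units.

```python
from typing import Dict, Tuple

Bracket = Tuple[float, float]  # (lower bound, upper bound) of a time bracket


def bracket_midpoint(bracket: Bracket) -> float:
    """Mean of a bracket's bounds, e.g. (1, 4) -> 2.5."""
    low, high = bracket
    return (low + high) / 2


def average_time_saved(counts: Dict[Bracket, int]) -> float:
    """Respondent-weighted mean of bracket midpoints.

    counts maps each bracket to the number of respondents who chose it.
    """
    total_respondents = sum(counts.values())
    if total_respondents == 0:
        raise ValueError("no respondents")
    # Midpoint multiplied by respondent count gives total time per bracket.
    total_time = sum(bracket_midpoint(b) * n for b, n in counts.items())
    return total_time / total_respondents


# Hypothetical counts for one activity (NOT figures from the survey).
example_counts = {(0, 0): 10, (1, 4): 20, (5, 8): 15, (9, 12): 5}
print(average_time_saved(example_counts))  # 4.0, in the brackets' units
```

Because every respondent in a bracket is assigned the bracket midpoint, the estimate is sensitive to how wide the brackets are, which is one reason the figures are best read as approximations.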
Productivity benefits were concentrated in a narrow set of tasks that are commonly undertaken by APS staff
In activities where Copilot was perceived to save a significant proportion of time – preparing meeting minutes, summarising information and preparing slides – AI assistants could, in the future, become the primary means of reducing the effort needed to complete these tasks, although human involvement and accountability will still be needed.
The time savings associated with these activities were also observed in agency evaluations. The Australian Taxation Office (ATO) saw the greatest proportional efficiencies in these activities (Australian Taxation Office 2024:3) and Home Affairs Copilot trial participants observed that Copilot may provide time savings in scribing, minute-taking, writing up action items and transcribing (Department of Home Affairs 2024:10). For other tasks such as ‘summarising existing information’ and ‘preparing first draft of a document’, Copilot was perceived to reduce the time spent by between 50% and 70%.
Finally, there is an interesting intersection between time saved and usage. For example, PowerPoint was not frequently used by trial participants but it saved a significant proportion of time. The ATO identified a similar insight in their evaluation as the highest absolute time savings were in data visualisation, taking nearly an hour off the activity (Australian Taxation Office 2024:3). This implies that for those who do use a broad range of MS products and Copilot functionality, the potential time savings from applications such as PowerPoint could be significant.
Copilot’s impact on efficiency varied according to job requirements.
The ICT and Digital Solution job family experienced the most efficiency gains.
Of the job families in the post-use survey, the ICT and Digital Solutions job family estimated the highest efficiency savings across most activities. As shown in Table 5, the ICT and Digital Solutions job family reported an efficiency saving of around an hour a day when performing summarisation and document drafting activities.
Activity | Average | Corporate | ICT and digital solutions | Policy and program management | Technical |
---|---|---|---|---|---|
Searching for information required for a task (n=718) | 0.76 | 0.7 | 0.85 | 0.68 | 0.86 |
Summarising existing information (n=735) | 1.03 | 1 | 1.06 | 0.99 | 1.08 |
Preparing meeting minutes (n=608) | 0.94 | 0.82 | 1.06 | 0.91 | 1 |
Preparing first draft of a document (n=715) | 0.99 | 0.94 | 1.12 | 0.96 | 0.96 |
Undertaking preliminary data analysis (n=586) | 0.59 | 0.67 | 0.69 | 0.57 | 0.43 |
Preparing slides (n=605) | 0.59 | 0.55 | 0.64 | 0.55 | 0.63 |
Communicating through digital means other than meetings (n=680) | 0.49 | 0.45 | 0.54 | 0.51 | 0.46 |
Attending meetings (n=713) | 0.37 | 0.33 | 0.48 | 0.41 | 0.26 |
Writing or reviewing code in a programming language (n=393) | 0.5 | 0.48 | 0.58 | 0.3 | 0.6 |
Activity | Average | APS 3-6 | EL 1 | EL 2 | SES |
---|---|---|---|---|---|
Searching for information required for a task (n=690) | 0.73 | 0.83 | 0.84 | 0.63 | 0.61 |
Summarising existing information for various purposes (n=708) | 0.99 | 1.06 | 1.07 | 0.97 | 0.86 |
Preparing meeting minutes (n=582) | 0.95 | 0.99 | 0.97 | 0.89 | 0.95 |
Preparing first draft of a document (n=687) | 0.93 | 1.1 | 1.09 | 0.76 | 0.78 |
Undertaking preliminary data analysis (n=561) | 0.57 | 0.64 | 0.67 | 0.45 | 0.52 |
Preparing slides (n=575) | 0.61 | 0.66 | 0.6 | 0.51 | 0.68 |
Communicating through digital means other than meetings (n=651) | 0.48 | 0.56 | 0.54 | 0.39 | 0.44 |
Attending meetings (n=682) | 0.37 | 0.38 | 0.48 | 0.28 | 0.35 |
Writing or reviewing code in a programming language (n=370) | 0.40 | 0.68 | 0.57 | 0.21 | 0.12 |
Within agencies, APS 3 to 4 staff (usually graduates) are typically expected to lead note-taking and summarisation tasks and to create first drafts of documents. APS staff at more junior levels may not yet possess the capability to complete these tasks efficiently. It is likely that Copilot augments their ability to a greater extent than that of more experienced employees.
Around 40% of trial participants reported the ability to reallocate their time to higher value activities.
For some trial participants, Copilot was seen as a facilitator for engagement in more substantive and complex work. As shown in Figure 13, 41% of post-use survey respondents believed Copilot enabled them to spend more time on higher-value tasks.
Post-use survey respondents remarked they felt they spent less time playing ‘corporate archaeologist’ in searching for information and documents and more time in strategic thinking and deep analysis.
Trial participants also remarked on the ability to spend more time on face-to-face activities such as staff engagement, culture building and mentoring, and on taking more time to build relationships with end users and stakeholders. Acknowledging the human-dependent nature of these tasks, these respondents redistributed their time into face-to-face activities such as communications to support their team and/or customers.
The quality of Copilot output limited the scale of productivity benefits.
Contextual irrelevance impacted the quality of Copilot outputs.
Overall, Copilot’s improvements to work quality were more subdued than its improvements to work efficiency. As highlighted in Figure 14, while the majority of trial participants considered Copilot effective at developing first drafts of documents and lifting overall quality, editing was almost always needed to tailor content for the audience or context, thereby reducing total efficiency gains.
Figure 14 | Post-use survey responses reporting time savings of 0.5 hours or more (n=795) and overall agreement of improved quality of work (n=801), by type of activity
A concern voiced by many focus group and post-use survey participants was that Copilot could not emulate the standard style of Australian Government documents. Some of these participants highlighted that heavy re-work was needed to meet the tone expected by senior stakeholders within their agency and of government more broadly.
For this reason, focus group participants noted they would not use Copilot for important documents or communications. Some trial participants acknowledged that Copilot could get closer to the desired output through follow-up prompts and clarifications, but this was not viewed as being worth the additional effort.
Copilot’s unpredictability and inaccuracy limited the scale of productivity benefits.
The unpredictability of Copilot affected the trust of trial participants and their productivity gains. Generative AI is a non-deterministic form of AI, meaning it will almost always produce a different output even if given exactly the same prompt. Copilot is trained to predict patterns rather than understand facts, which sometimes leads it to return plausible-sounding but inaccurate information, referred to as a ‘hallucination’.
Many trial participants across focus groups and the post-use survey commented that, due to fears of hallucinations, they combed through Copilot’s outputs to verify their accuracy. In some cases, this involved reading the entire document Copilot produced to check for any errors, which significantly reduced any efficiency gains.
As shown in the table below, up to 7% of post-use survey respondents reported that Copilot added time to tasks, in part due to the effort required to verify outputs. Distrust of Copilot’s outputs also surfaced in DISR’s internal mid-trial survey insights, with 60% of trial participants claiming they had to make a moderate to significant number of edits to outputs (Department of Industry, Science and Resources 2024:6).
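The non-determinism described above can be illustrated with a toy sketch. It assumes, as is typical for generative language models, that each word is drawn at random from a probability distribution rather than always taking the most likely option. The vocabulary and probabilities are invented for illustration and do not represent how Copilot is implemented; a real model would also recompute the probabilities after every word.

```python
import random

# Hypothetical next-word probabilities (made up for illustration only).
NEXT_WORD_PROBS = {"the": 0.4, "a": 0.3, "policy": 0.2, "brief": 0.1}


def sample_sequence(rng: random.Random, length: int = 5) -> list:
    """Draw each word at random, weighted by its probability."""
    words = list(NEXT_WORD_PROBS)
    weights = list(NEXT_WORD_PROBS.values())
    return rng.choices(words, weights=weights, k=length)


def greedy_sequence(length: int = 5) -> list:
    """Always pick the single most likely word (fully deterministic)."""
    best = max(NEXT_WORD_PROBS, key=NEXT_WORD_PROBS.get)
    return [best] * length


# The same "prompt" run 50 times with different random seeds.
sampled_outputs = {
    tuple(sample_sequence(random.Random(seed))) for seed in range(50)
}
print(len(sampled_outputs))                    # number of distinct outputs
print(greedy_sequence() == greedy_sequence())  # True
```

Sampling yields many distinct outputs from identical input, while always choosing the most likely word yields the same output every time. Neither approach makes the output factually correct, which is why verification is still needed.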
Activity | Respondents reporting added time |
---|---|
Preparing slides (n=620) | 7% |
Undertaking preliminary data analysis (n=603) | 6% |
Writing or reviewing code in a programming language (n=620) | 6% |
Attending meetings (n=739) | 4% |
Summarising existing information for various purposes (n=759) | 3% |
Preparing the first draft of a document (n=739) | 3% |
Searching for information required for a task (n=744) | 3% |
Communicating through digital means other than meetings (n=705) | 3% |
Most focus group participants reported they experienced inaccuracies or hallucinations in Copilot outputs during the trial. These inaccuracies typically materialised in the form of believable but ill-informed statements.
Focus group participants found that the search functionality through the Teams chat would often retrieve outdated or irrelevant documents. They also noted that Copilot could not prioritise retrieval of documents developed by their team or department, often surfacing documents from across the entire organisation.
Focus group participants also noted that while Copilot attaches sources to its outputs, this is currently limited to 3 documents and does not provide visibility on why the documents were selected. Observations from Home Affairs also identified that Copilot appeared to be unreliable in its approach to referencing information provided against data sources (Department of Home Affairs 2024:10).
The potential inaccuracy of Copilot represents a large reputational risk to the APS, and thorough quality assurance processes are needed to mitigate the risk of inaccurate information in Copilot outputs. However, it is likely that this additional quality assurance will reduce the productivity benefits of Copilot.
References
- Australian Taxation Office (2024) ‘M365 Copilot Trial Update’, Australian Taxation Office, Canberra, ACT, 3.
- Department of Home Affairs (2024) ‘Copilot Hackathon’, Department of Home Affairs, Canberra, ACT, 10.
- Department of Industry, Science and Resources (2024) ‘DISR Internal Mid-trial Survey Insights’, Department of Industry, Science and Resources, Canberra, ACT, 6.