• Data table for figure 10

    Post-use survey responses to 'Do you agree with the following statement: I feel confident in my skills and abilities to use Copilot', by amount of training received (n=810).

    | Sentiment | Not at all confident | Not very confident | Moderately confident | Fairly confident | Very confident |
    | --- | --- | --- | --- | --- | --- |
    | Overall (n=810) | 2% | 13% | 35% | 35% | 14% |
    | One form of training (n=276) | 4% | 23% | 35% | 28% | 10% |
    | Two forms of training (n=282) | 1% | 13% | 40% | 34% | 12% |
    | Three or more forms of training (n=252) | 0% | 4% | 31% | 44% | 21% |
  • The general sentiment among focus group participants who were involved in their agency’s implementation (namely chief technology officers (CTOs)/chief information officers (CIOs) and Copilot Champions) was that training requirements were greater than anticipated. 

    Complicating this is the diverse digital literacy and maturity of staff. Some agencies were better positioned to manage this than others but the positive relationship between perceived capability and amount of training suggests that a concerted, material and ongoing effort is needed to build confidence. A one-off session is unlikely to have a lasting impact on a user’s skills and abilities.

    Trial participants broadly found Copilot training useful but noted areas for improvement.

    The majority of post-use survey respondents (76%) who attended either agency or Microsoft-led training found the sessions useful. Anecdotal evidence from focus groups, however, suggested that more could be done to personalise training, particularly the training delivered by Microsoft.

    Focus group participants believed that Microsoft training was too focused on the features of Copilot, rather than its applications and use cases. Participants also noted that some Microsoft trainers did not understand the APS context and could not answer targeted questions. 

    To supplement Microsoft-led sessions, almost all agencies that participated in the evaluation offered some form of training. The quality and exhaustiveness of this training, however, varied according to the time and resource constraints of agencies. Some focus group participants had dedicated resources to lead the training effort, while others were encouraged to learn Copilot through hands-on use.

    Training was most effective when tailored to APS and agency context.

    One focus group participant found Microsoft’s industry-specific advice and prompt library a useful aid to upskilling; others expressed a similar desire for cheat sheets with tailored prompts aligned to their roles. Several focus group participants remarked that they gained the most knowledge on impactful use cases through forums their agency created, such as ‘lunch and learns’, webinars, ‘promptathons’ or similar.

    A more flexible, community of practice approach was also seen as an effective training method as it provided a means to identify and propagate highly relevant use cases for Copilot. In general, there appears to be a strong demand for training even amongst trial participants who had a high proportion of individuals who were already experienced in generative AI. This included a desire for a wide range of training and information sources supplemented by opportunities to share use cases and broad skills in generative AI.

    There are opportunities to further explore use cases in the APS

    There were a few novel use cases for Copilot in the APS.

    Some trial participants identified a few novel use cases which were highly specific to the roles of participants but highlight the potential of Copilot to support higher order and more bespoke activities. These included:

    • Writing and reviewing PowerShell scripts (for task automation)
    • Assessing documents against a rubric or criteria
    • Converting technical documentation into plain language (to distribute to a broader audience)
    • Drafting content for internal exercises e.g. phishing simulations
    • Drafting content for business cases and Cabinet Submissions
    • Converting information into standard forms and templates (for processing and assessment).

    The presence of novel use cases highlights that there are opportunities to innovatively use Copilot beyond its summarisation, information search and content drafting features.

    References

    1. Commonwealth Scientific and Industrial Research Organisation (2024) ‘Copilot for Microsoft 365; Data and Insights’, Commonwealth Scientific and Industrial Research Organisation, Canberra, ACT, 28.
    2. Department of Industry, Science and Resources (2024) ‘DISR Internal Mid-Trial Survey Insights’, Department of Industry, Science and Resources, Canberra, ACT, 2.
  • The main use cases are an illustration of the current perceived strengths of Copilot’s predictive engine - it is adept at natural language processing and synthesis, where it can output human-like text in response to provided information and prompts. 

    Of note, the use of Copilot within Whiteboard and Loop was particularly low. This is unsurprising when compared with the usage trends before the trial. It is also interesting to note the lower use of Copilot to create content and ideas relative to summarisation of information.

    There was a positive relationship between the provision of training and capability to use Copilot.

    There was no standard approach undertaken to train trial participants on how to use Copilot. Participants adopted a combination of methods based on their perceived capability and resources provided to them. The 4 main training options available to trial participants were:

    • accessing Copilot resources on the Internet
    • hands-on experimentation with Copilot
    • attending agency-facilitated Copilot training
    • attending Microsoft-led Copilot training.

    Almost half of all post-use survey respondents felt fairly or very confident in their skills and abilities to use Copilot. As depicted in Figure 10, the proportion of ‘fairly’ and ‘very confident’ responses combined is 16 percentage points higher when post-use survey respondents accessed 3 or more forms of training compared to overall. This indicates a positive correlation between the amount of training participants received and their ability to use Copilot. The importance of training was also highlighted in the CSIRO’s evaluation which reported that trial participants needed additional training and/or resources that would support advanced features and usage (CSIRO 2024:11).
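
    The arithmetic behind the 16 percentage point figure can be reproduced from the Figure 10 data table. The following is a minimal sketch; the dictionary layout and function names are illustrative assumptions, and only the percentages come from the survey data.

```python
# Percent of respondents per group, taken from the Figure 10 data table.
# Only the two "confident" categories are needed for this comparison.
CONFIDENCE = {
    "Overall": {"fairly": 35, "very": 14},
    "One form of training": {"fairly": 28, "very": 10},
    "Two forms of training": {"fairly": 34, "very": 12},
    "Three or more forms of training": {"fairly": 44, "very": 21},
}


def confident_share(group: str) -> int:
    """Combined 'fairly' and 'very confident' share, in percent."""
    row = CONFIDENCE[group]
    return row["fairly"] + row["very"]


def gap_to_overall(group: str) -> int:
    """Difference from the overall combined share, in percentage points."""
    return confident_share(group) - confident_share("Overall")


for name in CONFIDENCE:
    print(f"{name}: {confident_share(name)}% "
          f"({gap_to_overall(name):+d} pp vs overall)")
```

    On these figures the overall combined share is 49% and the share for three or more forms of training is 65%, which gives the 16 percentage point difference quoted above.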

    A graph demonstrating that participants’ confidence using Copilot increased significantly if they received a higher amount of training.
    Figure 10 | Post-use survey responses to 'Do you agree with the following statement: I feel confident in my skills and abilities to use Copilot', by amount of training received (n=810)
  • This section outlines the impact of Microsoft 365 Copilot on trial participants’ perceived productivity in terms of both efficiency and quality.

  • Key insights

    The majority of post-use survey respondents agreed that Copilot improved the speed at which they could complete tasks (69%) and uplifted the quality of their work (61%).

    Approximately 65% of the managers in the post-use survey found that Copilot had a positive impact on the quality and efficiency of their team members, in particular by helping team members to quickly produce briefing materials and by uplifting the quality of written outputs.

    Copilot contributed the most perceived time savings in tasks related to summarisation, information searches and preparing first drafts with an estimated time savings of around an hour a day for those tasks.

    The ICT and Digital Solution job family perceived the most efficiency gains of around one hour a day across summarisation activities and preparing first drafts of documents. Across APS classifications, APS3-6 and EL1s perceived similar time savings of an hour a day in summarisation tasks and creating first drafts.

    40% of trial participants reported being able to reallocate their time to higher value activities. Trial participants also remarked on the ability to spend more time on face-to-face activities such as staff engagement, culture building and mentoring, and taking more time to build relationships with end users and stakeholders.

    The quality of Copilot’s output limited the scale of productivity benefits. Overall, Copilot’s improvements to work quality were more subdued than improvements to work efficiency. While the majority of trial participants considered Copilot effective at developing first drafts of documents and lifting overall quality, editing was almost always needed to tailor content for the audience or context, thereby reducing total efficiency gains.

    Copilot is perceived to improve the efficiency and quality of outputs

    The majority of post-use survey respondents perceived that Copilot positively affected their productivity.

    Trial participants generally perceived that Copilot had a positive impact on 2 key measures of productivity – efficiency and quality. As shown in Figure 11, the majority of post-use survey respondents agreed that Copilot improved the speed at which they could complete tasks (69%) and uplifted the quality of their work (61%).

    A graph showing that participants were slightly more likely to report Copilot improved the speed at which they complete tasks than it improved the quality of their work.
    Figure 11 | Post-use responses to 'What extent do you agree with the following statements: using Copilot has improved the…', from respondents who completed both pre- and post-use surveys (n=330)
  • Data table for figure 11

    Post-use responses to 'What extent do you agree with the following statements: using Copilot has improved the…', from respondents who completed both pre- and post-use surveys (n=330).

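    | Sentiment | Strongly disagree | Disagree | Somewhat disagree | Neutral | Somewhat agree | Agree | Strongly agree |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | speed at which I complete tasks | 4% | 5% | 7% | 16% | 26% | 26% | 17% |
    | quality of my work | 4% | 6% | 7% | 22% | 26% | 24% | 11% |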

    Totals may amount to less or more than 100% due to rounding.

  • Managers have also noticed productivity improvements within their teams.

    Approximately 65% of the managers in the post-use survey found that Copilot had a positive impact on the quality and efficiency of their team members. As shown in Figure 12, less than 3% of this cohort believed Copilot had a negative effect on their team. 

    A graph showing managers held a similarly positive view of how Copilot impacted the quality of work and efficiency of their staff.
    Figure 12 | Post-use survey responses to 'What is the impact of Copilot on…', from respondents who manage staff (n=209)
  • Data for figure 12

    Post-use survey responses to 'What is the impact of Copilot on…', from respondents who manage staff (n=209).

    | Sentiment | Negative | Somewhat negative | Neutral | Somewhat positive | Positive |
    | --- | --- | --- | --- | --- | --- |
    | quality of your team's output (n=209) | 0% | 3% | 32% | 47% | 17% |
    | efficiency of your staff (n=208) | 0% | 2% | 31% | 49% | 17% |

    Totals may amount to less or more than 100% due to rounding.

  • Manager respondents in the post-use survey indicated that Copilot helped team members to quickly produce briefing materials and added value to written deliverables. Some managers in focus groups thought that Copilot made writing more consistent across their teams and lifted the overall standard of work.

    Efficiencies are concentrated in a few tasks

    Copilot contributed the highest perceived time savings in tasks related to summarisation, preparing first drafts and information searches.

    Post-use survey respondents perceived that Copilot contributed the highest time savings in activities related to information summarisation, preparing first drafts and information searches. Respondents estimated that Copilot saved up to an hour a day in these activities, as shown in Table 4. These figures are approximations and likely represent the upper bound of the time savings Copilot could contribute (assuming APS employees perform the tasks every day).

    Table 4. Averaged post-use survey responses to 'On average, how many hours per day has Copilot helped you save in the following areas' from respondents who completed both pre- and post-use surveys (n=330)
    | Activity | Hours |
    | --- | --- |
    | Communicating through digital means other than meetings | 0.5 |
    | Summarising existing information | 1.1 |
    | Preparing first draft of a document | 1.0 |
    | Searching for information required for a task | 0.8 |
    | Undertaking preliminary data analysis | 0.5 |
    | Undertaking preliminary data analysis | 0.6 |
    | Preparing meeting minutes | 0.9 |

     

    Table notes: 

    • Hours saved on tasks was approximated by first calculating the mean of the time brackets specified in the question (e.g. 0, 1-4, 5-8, 9-12…).
    • The average time was then multiplied by the number of respondents (for each bracket) to determine total time on the activity. The total time is then divided by the number of respondents to estimate average time per respondent.
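
    The calculation described in the table notes amounts to a weighted average of bracket midpoints. The sketch below illustrates it under stated assumptions: the function names are illustrative and the respondent counts are hypothetical, as the report does not publish per-bracket counts.

```python
def bracket_midpoint(low: float, high: float) -> float:
    """Mean of a time bracket, e.g. the bracket 1-4 has a midpoint of 2.5."""
    return (low + high) / 2


def average_time_saved(responses: list[tuple[tuple[float, float], int]]) -> float:
    """Average time per respondent from (bracket, respondent count) pairs.

    Each bracket midpoint is multiplied by the number of respondents who
    chose that bracket; the total is then divided by all respondents.
    """
    total_respondents = sum(count for _, count in responses)
    if total_respondents == 0:
        raise ValueError("at least one respondent is required")
    total_time = sum(
        bracket_midpoint(low, high) * count for (low, high), count in responses
    )
    return total_time / total_respondents


# Hypothetical counts per bracket, for illustration only (not survey data).
example_responses = [
    ((0, 0), 40),
    ((1, 4), 30),
    ((5, 8), 20),
    ((9, 12), 10),
]

print(average_time_saved(example_responses))
```

    Because every respondent in a bracket is assigned the bracket midpoint, the result is an approximation rather than an exact mean.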

    Productivity benefits were concentrated in a narrow set of tasks that are commonly undertaken by APS staff

    In activities where Copilot was perceived to save a significant proportion of time – preparing meeting minutes, summarising information and preparing slides – AI assistants, in the future, could become the primary means to significantly reduce the effort to complete these tasks, but there still remains the need for human involvement and accountability. 

    The time savings associated with these activities were also observed in agency evaluations. The Australian Taxation Office (ATO) saw the greatest proportional efficiencies in these activities (Australian Taxation Office 2024:3) and Home Affairs Copilot trial participants observed that Copilot may provide time savings in scribing, minute-taking, writing up action items and transcribing (Department of Home Affairs 2024:10). For other tasks such as ‘summarising existing information’ and ‘preparing first draft of a document’, Copilot was perceived to reduce the time spent on these tasks by between 50% and 70%.

    Finally, there is an interesting intersection between time saved and usage. For example, PowerPoint was not frequently used by trial participants but it saved a significant proportion of time. The ATO identified a similar insight in their evaluation as the highest absolute time savings were in data visualisation, taking nearly an hour off the activity (Australian Taxation Office 2024:3). This implies that for those who do use a broad range of MS products and Copilot functionality, the potential time savings from applications such as PowerPoint could be significant. 

    Copilot’s impact on efficiency varied according to job requirements. 

    The ICT and Digital Solution job family experienced the most efficiency gains.

    Of the job families in the post-use survey, the ICT and Digital Solution job family estimated the highest efficiency savings across the activities provided. As shown in Table 5, this job family reported an efficiency saving of around an hour a day when performing summarisation and document drafting activities.

    Table 5. Averaged post-use survey responses to 'On average, how many hours per day has Copilot helped you save in the following areas', by APS job family
    | Activity | Average | Corporate | ICT and digital solutions | Policy and program management | Technical |
    | --- | --- | --- | --- | --- | --- |
    | Searching for information required for a task (n=718) | 0.76 | 0.7 | 0.85 | 0.68 | 0.86 |
    | Summarising existing information (n=735) | 1.03 | 1 | 1.06 | 0.99 | 1.08 |
    | Preparing meeting minutes (n=608) | 0.94 | 0.82 | 1.06 | 0.91 | 1 |
    | Preparing first draft of a document (n=715) | 0.99 | 0.94 | 1.12 | 0.96 | 0.96 |
    | Undertaking preliminary data analysis (n=586) | 0.59 | 0.67 | 0.69 | 0.57 | 0.43 |
    | Preparing slides (n=605) | 0.59 | 0.55 | 0.64 | 0.55 | 0.63 |
    | Communicating through digital means other than meetings (n=680) | 0.49 | 0.45 | 0.54 | 0.51 | 0.46 |
    | Attending meetings (n=713) | 0.37 | 0.33 | 0.48 | 0.41 | 0.26 |
    | Writing or reviewing code in a programming language (n=393) | 0.5 | 0.48 | 0.58 | 0.3 | 0.6 |

     

    Table 6. Averaged post-use survey responses to 'On average, how many hours per day has Copilot helped you save in the following areas', by APS classification
    | Activity | Average | APS 3-6 | EL 1 | EL 2 | SES |
    | --- | --- | --- | --- | --- | --- |
    | Searching for information required for a task (n=690) | 0.73 | 0.83 | 0.84 | 0.63 | 0.61 |
    | Summarising existing information for various purposes (n=708) | 0.99 | 1.06 | 1.07 | 0.97 | 0.86 |
    | Preparing meeting minutes (n=582) | 0.95 | 0.99 | 0.97 | 0.89 | 0.95 |
    | Preparing first draft of a document (n=687) | 0.93 | 1.1 | 1.09 | 0.76 | 0.78 |
    | Undertaking preliminary data analysis (n=561) | 0.57 | 0.64 | 0.67 | 0.45 | 0.52 |
    | Preparing slides (n=575) | 0.61 | 0.66 | 0.6 | 0.51 | 0.68 |
    | Communicating through digital means other than meetings (n=651) | 0.48 | 0.56 | 0.54 | 0.39 | 0.44 |
    | Attending meetings (n=682) | 0.37 | 0.38 | 0.48 | 0.28 | 0.35 |
    | Writing or reviewing code in a programming language (n=370) | 0.40 | 0.68 | 0.57 | 0.21 | 0.12 |

    Within agencies, APS 3 to 4 staff (usually graduates) are generally expected to lead notetaking and summarisation tasks as well as create the first draft of documents. APS staff at more junior levels may not yet possess the capability to complete these tasks efficiently. It is likely that Copilot positively augments their ability to a greater extent than that of more experienced employees.

    Around 40% of trial participants reported the ability to reallocate their time to higher value activities.

    For some trial participants, Copilot was seen as a facilitator for engagement in more substantive and complex work. As shown in Figure 13, 41% of post-use survey respondents believed Copilot enabled them to spend more time on higher-value tasks.

    A graph showing that most participants either did not observe or did not believe Copilot allowed them to spend more time on higher-value or more complex tasks.
    Figure 13 | Post-use survey responses to 'What extent do you agree with the following statement: Copilot has enabled me to allocate my time to perform tasks that are higher value and/or more complex' (n=807)
  • Data for figure 13

    Post-use survey responses to 'What extent do you agree with the following statement: Copilot has enabled me to allocate my time to perform tasks that are higher value and/or more complex' (n=807).

    | Sentiment | Strongly disagree | Disagree | Neutral | Agree | Strongly agree |
    | --- | --- | --- | --- | --- | --- |
    | Response | 4% | 10% | 44% | 32% | 9% |

    Totals may amount to less or more than 100% due to rounding.

  • Post-use survey respondents remarked they felt they spent less time playing ‘corporate archaeologist’ in searching for information and documents and more time in strategic thinking and deep analysis.

  • Data for figure 14

    Post-use survey responses reporting time savings of 0.5 hours or more (n=795) and overall agreement of improved quality of work (n=801), by type of activity.

    | Response | Improved quality | Some time saved |
    | --- | --- | --- |
    | Summarising existing information for various purposes | 69% | 76% |
    | Preparing the first draft of a document | 58% | 67% |
    | Preparing meeting minutes | 60% | 68% |
    | Searching for information required for a task | 54% | 62% |
    | Undertaking preliminary data analysis | 32% | 44% |
    | Preparing slides | 35% | 40% |
    | Communicating through digital means other than meetings | 31% | 35% |
    | Writing or reviewing code in a programming language | 30% | 30% |
  • A concern voiced by many focus group and post-use survey participants was that Copilot could not emulate the standard style of Australian Government documents. Some of these participants highlighted that heavy re-work was needed to meet the tone expected by senior stakeholders within their agency and of government more broadly.

    For this reason, focus group participants noted they would not use Copilot for important documents or communications. Some trial participants acknowledged that Copilot could get closer to the desired output through follow-up prompts and clarifications, but this was not viewed as being worth the additional effort.

    Copilot’s unpredictability and inaccuracy limited the scale of productivity benefits.

    The unpredictability of Copilot affected the trust of trial participants and their productivity gains. Generative AI is a non-deterministic form of AI, meaning it will almost always produce a different output even if given the same exact prompt. Copilot is trained to predict patterns rather than understand facts, sometimes leading to it returning plausible sounding but inaccurate information, which is referred to as a ‘hallucination’.
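
    The idea of non-determinism can be illustrated with a toy example. The sketch below is not how Copilot is implemented; the word list, probabilities and function names are hypothetical and chosen only to show why sampling from a probability distribution can return different outputs for the same input, whereas always picking the most likely option cannot.

```python
import random

# Hypothetical probabilities for the next word after some fixed prompt.
NEXT_WORD_PROBS = {"report": 0.5, "brief": 0.3, "memo": 0.2}


def greedy_choice(probs: dict[str, float]) -> str:
    """Deterministic: always return the single most likely word."""
    return max(probs, key=probs.get)


def sampled_choice(probs: dict[str, float], rng: random.Random) -> str:
    """Non-deterministic: draw a word in proportion to its probability."""
    words = list(probs)
    weights = [probs[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only so the illustration is repeatable
samples = [sampled_choice(NEXT_WORD_PROBS, rng) for _ in range(200)]

print("greedy:", greedy_choice(NEXT_WORD_PROBS))
print("distinct sampled words:", sorted(set(samples)))
```

    Repeated sampling returns more than one distinct word for the same prompt, while the greedy choice never changes.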

    Many trial participants across focus groups and the post-use survey commented that due to fears of hallucinations, they combed through Copilot’s outputs to verify its accuracy. In some cases, this involved reading the entire document Copilot produced to check for any errors which significantly reduced any efficiency gains. 

    As shown in Table 7, up to 7% of post-use survey respondents reported that Copilot added time to tasks, in part due to the effort required to verify outputs. Distrust of Copilot’s outputs also surfaced in DISR’s internal mid-trial survey insights, with 60% of trial participants claiming they had to make a moderate to significant number of edits to outputs (see Department of Industry, Science and Resources, 2024:6).
     

    Table 7. Post-use survey responses reporting that Copilot added time to activity, by type
    | Response | Added time |
    | --- | --- |
    | Preparing slides (n=620) | 7% |
    | Undertaking preliminary data analysis (n=603) | 6% |
    | Writing or reviewing code in a programming language (n=620) | 6% |
    | Attending meetings (n=739) | 4% |
    | Summarising existing information for various purposes (n=759) | 3% |
    | Preparing the first draft of a document (n=739) | 3% |
    | Searching for information required for a task (n=744) | 3% |
    | Communicating through digital means other than meetings (n=705) | 3% |
  • This section outlines findings in relation to the challenges experienced by the Australian Public Service (APS) in adopting Microsoft 365 Copilot and its broader lessons that are applicable to the APS’ future adoption of broader generative AI.

  • Key insights

    There are currently challenges with the integration of Copilot with products outside of the Microsoft Office suite, limiting its potential benefits to agencies that use non-Microsoft products.

    Poor data security and information management processes could lead to Copilot inappropriately accessing sensitive information.

    Agencies should also note that some Copilot functionality, in particular for Outlook, requires the newest versions of Microsoft Office products.

    Tailored training in prompt engineering and agency/role-specific use cases were needed to build capability in Copilot. A range of methods were used to upskill staff, including formal training sessions and informal forums. Managers also require specific training to help verify Copilot outputs.

    There are cultural barriers that may be impeding the uptake of Copilot ranging from the perceived negative stigma of using generative AI to a lack of trust with generative AI products.

    Further clarity on personal accountability for Copilot outputs, alongside greater guidance on the extent to which consent and disclaimers are needed for generative AI use, is required to improve adoption.

    Given the evergreen nature of generative AI, there is a need for agencies to engage in adaptive planning while setting up appropriate governance structures and processes that reflect their risk appetites.

    There are key integration, data security and information management considerations

    Copilot may require plugins or other connectors to ensure seamless integration across an organisation’s technology stack.

  • … their labelling of classified or sensitive information does not work with Copilot as they use the third party Janusseal.

    Agency representative in DTA interview.
  • Copilot is available through applications within the Microsoft 365 ecosystem. However, to access data or applications that sit outside this ecosystem, organisations need to use plugins or Microsoft Graph connectors to extend it.

    A small number of pre-use survey respondents and DTA interview participants raised the lack of Copilot integration with third-party software as an issue, in particular with Janusseal, software that enables enterprise-grade data classification for Windows users, and JAWS, a screen reader program that allows blind and visually impaired users to read the screen with either text-to-speech output or a refreshable Braille display. Issues with JAWS integration comprised 16% of total issues recorded in the issues register.

    The lack of integration with Janusseal creates a potential limit to the usefulness of Copilot for APS staff regularly interacting with sensitive information in organisations where data classification is managed through third-party providers such as Janusseal. Interviews conducted by the DTA noted a lack of integration with Janusseal could lead to APS staff gaining access to information they did not have permissions for. Microsoft has advised that this is a third-party labelling issue, not a security issue, and that Copilot has an in-built fail safe to protect against this issue. It should be noted that such integrations were out of scope for the trial and Microsoft has further advised that a more permanent fix to the labelling issue is in the pipeline (Microsoft 2024).

    The newest versions of Microsoft Office products were required to enable some Copilot functionality

    Copilot is available in Outlook, enabling users to manage their inboxes more efficiently. Newer features of Copilot were initially released with the new version of Outlook, rather than classic Outlook.

    The integration of Copilot with Microsoft Outlook was frequently raised as a key issue as Copilot features in Outlook were only available with the newest version of Microsoft Outlook or the web version of Copilot. Microsoft initially planned to release Copilot updates to the new version of Outlook and later for classic Outlook. Focus group participants often lamented not being able to access the full capabilities of Copilot as they did not have access to the new Outlook. One trial participant noted through the issues register that, ‘classic Outlook will only support the bare minimum Copilot features’. 

    Focus group participants also reported that the online versions of Microsoft Office apps had a poorer user experience than the desktop applications, which dissuaded them from using Outlook online. For agencies without the newest version of Microsoft Outlook, the overall potential benefits of Copilot would likely be significantly reduced and restricted to other use cases such as the summarisation and drafting use cases in Microsoft Word and Teams.

    Poor information, data management practices and permissions resulted in inappropriate access and sharing of sensitive information.

    Agencies classify their data and apply permissions to ensure access is limited to authorised personnel and that staff understand a document’s security levels.

  • Their information management in SharePoint is not great which has resulted in end users finding information that they shouldn’t have had access to, though this is a governance and data management issue - not a Copilot issue.

    Agency representative in DTA interview.
  • Use of Copilot enabled some participants to access documents that they should not have had permission to access. Trial participants raised instances where Copilot surfaced sensitive data that staff had not classified or stored appropriately. This was largely because their organisation had not properly assured the security and storage of some instances of data and information before adopting Copilot. Without the appropriate data infrastructure and governance in place, the use of Copilot may further exacerbate risks of data and security breaches in the APS.

    Tailored training in prompt engineering and use cases is needed to build capability and confidence   

    Prompt engineering and understanding the information requirements of Copilot across Microsoft Office products were significant capability barriers for trial participants.

    There are 2 key skills required for users to realise the benefits of Copilot: writing effective prompts and understanding the different information structures that Copilot needs across different Microsoft products.
