• There are perceived improvements to efficiency and quality for summarisation, preparing a first draft of a document and information searches.

  • Across the trial, Copilot users in both this evaluation and agency-specific evaluations consistently reported quality and efficiency improvements in 3 key activities: summarisation of content, creating first drafts and information searches.

    Trial participants estimated efficiency gains of around an hour when completing one of these 3 activities. Trial participants at junior levels (APS3 to 6), EL1s and those in information and communications technology (ICT)-related roles perceived the most efficiencies in these activities. In addition, 40% of post-use survey respondents reported they were able to reallocate their time to higher-value activities such as staff engagement, culture building and mentoring, and building relationships with end users and stakeholders.

    Overall, trial participants across all job classifications and job families were satisfied with Copilot and the majority wish to continue using it.

  • Limitations

    Evaluation fatigue may have reduced trial participants’ engagement with the evaluation.

    During the trial period, agencies and individual trial participants were involved in a variety of research activities, some managed internally by their own agencies and others driven centrally by the DTA. Research fatigue was a key challenge that influenced participation rates across focus groups, interviews and the post-use survey. Lower response rates in the post-use survey (n = 831) and among those who completed both the pre-use and post-use surveys (n = 330) may affect how representative the data is of the trial population. However, the total number of responses means we were still able to test changes in proportions between the pre-use and post-use surveys at the 5% level of significance. Where possible, the evaluation has drawn on insights from agency-specific evaluations to complement its findings.

    This means that final evaluation research activities may not have captured the full spectrum of experiences and perspectives.
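    The evaluation does not say which statistical test it used to compare proportions. As a hedged illustration only, the sketch below shows one common approach: a two-sided two-proportion z-test with a pooled proportion. It assumes the pre-use and post-use responses are treated as independent samples. For the paired subset who completed both surveys (n = 330), a paired test such as McNemar's test would usually be more appropriate. The function names and example counts are hypothetical and are not evaluation data.

```python
import math


def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test using the pooled proportion.

    Hypothetical helper for illustration. Treats sample A (e.g. pre-use)
    and sample B (e.g. post-use) as independent. Returns (z, p_value).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    if not (0 <= successes_a <= n_a and 0 <= successes_b <= n_b):
        raise ValueError("successes must lie between 0 and the sample size")
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that both proportions are equal.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero (all or no successes in both samples)")
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """True if the change is significant at the given level (5% by default)."""
    return p_value < alpha


# Hypothetical counts: 40 of 100 agree before, 60 of 100 agree after.
z, p = two_proportion_z_test(40, 100, 60, 100)
print(f"z = {z:.3f}, p = {p:.4f}, significant at 5%: {is_significant(p)}")
```

    Pooling the proportion reflects the null hypothesis that nothing changed between surveys. Small samples make the standard error larger, which is why lower response rates limit the changes the evaluation can detect.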

    The non-randomised sample of trial participants may not reflect the views of the broader APS.

    Compared with the broader APS population, Executive Level (EL)1s, EL2s and Senior Executive Service (SES) officers were overrepresented among trial participants, while junior APS classifications (APS1 to 4) were underrepresented.

    Trial participants voluntarily chose to take part in the trial, which may have introduced selection bias. While efforts were made during the trial to invite participants with a range of backgrounds and experience with generative AI, a high proportion of the trial participants who contributed to the evaluation had previous experience with generative AI (66%) and were generally optimistic about Copilot (73%).

    This means that results identified through this evaluation may not be fully representative of the views held by the entire APS. 

    There was an inconsistent rollout of Copilot across agencies.

    The experience and sentiment of trial participants may be affected by when their agency began participating in the trial and their agency’s version of Copilot. Agencies received their Copilot licences between 1 January and 1 April 2024. Agencies that joined the trial later may not have been able to contribute to early evaluation activities, such as the pre-use survey or initial interviews, therefore excluding their perspective and preventing later comparison of outcomes.

    Since the trial began, Microsoft has released 60 updates to Copilot that enable new features, including fixes for early technical glitches. Due to either information security requirements or misaligned agency update schedules, new Copilot features may have been adopted inconsistently across participating agencies or, at times, not at all.

    This means there could be significant variation across agencies in Copilot’s functionality and in agencies’ ability to build capability and identify use cases for Copilot.

    The impact of Copilot relies on trial participants’ self-assessment of productivity benefits.

    The evaluation methodology relied on trial participants self-assessing the impacts of Copilot, which may naturally under- or overestimate the benefits – particularly time savings. Where possible, the evaluation compared its productivity findings against other APS agency evaluations and external research to verify the productivity savings reported by trial participants.

    Nevertheless, there is a risk that the impact of Copilot – in particular the productivity estimates from Copilot use – may not accurately reflect Copilot’s actual productivity impacts.

    A comprehensive overview of the evaluation’s methodology and limitations is detailed in Appendix B.

  • There’s a concern of vendor lock-in as the APS becomes more dependent on this tool.

    Focus group participant
  • It’s difficult to account for a bias that you are yet to identify.

    Focus group participant
  • Copilot could cause myself and colleagues to lack deep knowledge of topics.

    Pre-use survey respondent
  • The overarching findings reveal several considerations for the APS in the context of future adoption of generative AI.
  • Recommendations

  • Detailed and adaptive implementation

    1. Product selection

    Agencies should consider which generative AI solutions are most appropriate for their overall operating environment and specific use cases, particularly for AI Assistant Tools.

    2. System configuration

    Agencies must configure their information systems, permissions, and processes to safely accommodate generative AI products.

    3. Specialised training

    Agencies should offer specialised training reflecting agency-specific use cases and develop general generative AI capabilities, including prompt training.

    4. Change management

    Effective change management should support the integration of generative AI by identifying ‘Generative AI Champions’ to highlight the benefits and encourage adoption.

    5. Clear guidance

    The APS must provide clear guidance on using generative AI, including when consent and disclaimers are needed, such as in meeting recordings, and a clear articulation of accountabilities.

    Encourage greater adoption

    6. Workflow analysis

    Agencies should conduct detailed analyses of workflows across various job families and classifications to identify further use cases that could improve generative AI adoption.

    7. Use case sharing

    Agencies should share use cases in appropriate whole-of-government forums to facilitate the adoption of generative AI across the APS.

    Proactive risk management

    8. Impact monitoring

    The APS should proactively monitor the impacts of generative AI, including its effects on the workforce, to manage current and emerging risks effectively.

  • This section outlines trial participants’ expectations and use of Microsoft 365 Copilot, including its use across the Microsoft 365 suite and the identification of current, novel and future use cases.

  • Key insights

    Most trial participants (77%) were satisfied with Copilot and wish to continue using the product.

    The positive sentiment towards Copilot was not uniformly observed across all MS products or activities. In particular, MS Excel and Outlook Copilot functionality did not meet expectations.

    Other generative AI tools may be more effective than Copilot at meeting users’ bespoke needs. In particular, Copilot was perceived to be less advanced at writing and reviewing code, producing complex written documents, generating images for internal presentations and searching research databases.

    Despite the overall positive sentiment, use of Copilot is moderate, with only a third of post-use survey respondents using it daily. Due to a combination of user capability, the perceived benefit and convenience of the tool, and its user interface, Copilot is yet to be ingrained in the daily habits of APS staff.

    Copilot is currently used mainly for summarisation and re-writing content in Teams and Word.

    There was a positive relationship between the provision of training and capability to use Copilot. Copilot training was most effective when tailored to the APS, the users’ role and the agency context.

    There are opportunities to further enhance the use of generative AI across the policy lifecycle to increase adoption and benefits of generative AI.

  • Rather than just getting through the daily tasks reactively to meet deadlines, I feel as though I have more time to consider and work through about the way we do things and why we are doing them.

    Trial participant from the administration job family, post-use survey
  • Trial participants also remarked on the ability to spend more time on face-to-face activities such as staff engagement, culture building and mentoring, and on building relationships with end users and stakeholders. Acknowledging the human-dependent nature of these tasks, these respondents redistributed their time to face-to-face activities such as communications to support their team and/or customers.

    The quality of Copilot output limited the scale of productivity benefits.

    Contextual irrelevance impacted the quality of Copilot outputs.

    Overall, Copilot’s improvements to work quality were more subdued than its improvements to work efficiency. As highlighted in Figure 14, the majority of trial participants considered Copilot effective at developing first drafts of documents and lifting overall quality. However, editing was almost always needed to tailor content to the audience or context, which reduced total efficiency gains.

    A graph showing that more than 50% of participants reported quality improvements and even greater time savings for tasks involving summarising information, creating first drafts, preparing meeting minutes and searching for required information.
    A graph showing that less than 50% of participants reported quality improvements or time savings across tasks involving preliminary data analysis, preparing slides, communicating through emails and messages or writing and reviewing code.

    Figure 14 | Post-use survey responses reporting time savings of 0.5 hours or more (n=795) and overall agreement of improved quality of work (n=801), by type of activity

  • I have concerns about accuracy and hallucinations (both of which I experienced) which leads to distrust and needing to "double-check" its outputs; this significantly impacts any time savings made.

    Trial participant from the ICT and Digital Solutions job family, post-use survey.
  • Most focus group participants reported they experienced inaccuracies or hallucinations in Copilot outputs during the trial. These inaccuracies typically materialised in the form of believable but ill-informed statements. 

    Focus group participants reported that search through the Teams chat would often retrieve outdated or irrelevant documents. They also noted that Copilot could not prioritise retrieval of documents developed by their own team or department, often surfacing documents from across the entire organisation.

    Focus group participants also noted that while Copilot attaches sources to its outputs, this is currently limited to 3 documents and gives no visibility of why those documents were selected. Observations from the Department of Home Affairs also found that Copilot appeared to be unreliable in how it referenced information against data sources (Department of Home Affairs 2024:10).

    The potential for inaccuracy in Copilot outputs represents a large reputational risk to the APS, and thorough quality assurance processes are needed to mitigate it. However, the need for these additional quality assurance processes is likely to reduce the productivity benefits of Copilot.

    References

    1. Australian Taxation Office (2024) ‘M365 Copilot Trial Update’, Australian Taxation Office, Canberra, ACT, 3.
    2. Department of Home Affairs (2024) ‘Copilot Hackathon’, Department of Home Affairs, Canberra, ACT, 10.
    3. Department of Industry, Science and Resources (2024) ‘DISR Internal Mid-trial Survey Insights’, Department of Industry, Science and Resources, Canberra, ACT, 6.
  • 1. Basic information

    1.1    AI use case profile

    This section is intended to record basic information about the AI use case.

    Name of AI use case

    Choose a clear, simple name that accurately conveys the nature of the use case.

    Reference number

    Assign a unique reference number for your assessment. Unless otherwise advised by your agency or the Digital Transformation Agency (DTA), we recommend using an abbreviation of your agency’s name followed by the date (YYMMDD) that work first began on this assessment and a sequence number if multiple assessments start on the same day. This is intended to assist with internal record keeping and engagement with the DTA.
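    As a hedged sketch of the recommended reference number format, the helper below joins 3 parts: an agency abbreviation, the start date in YYMMDD form and an optional sequence number. The guidance does not specify separators or padding. The hyphens and 2-digit sequence number here are therefore assumptions, and the function name is hypothetical.

```python
from datetime import date
from typing import Optional


def make_reference_number(agency_abbrev: str, start: date,
                          sequence: Optional[int] = None) -> str:
    """Build a reference number such as 'DTA-240703' or 'DTA-240703-02'.

    Hypothetical helper. The hyphen separators and 2-digit zero-padded
    sequence number are illustrative assumptions, not a prescribed format.
    """
    abbrev = agency_abbrev.strip().upper()
    if not abbrev.isalnum():
        raise ValueError("agency abbreviation must be alphanumeric")
    # YYMMDD is the date work first began on the assessment.
    ref = f"{abbrev}-{start:%y%m%d}"
    if sequence is not None:
        # Only needed when several assessments start on the same day.
        if sequence < 1:
            raise ValueError("sequence number must be 1 or greater")
        ref += f"-{sequence:02d}"
    return ref


print(make_reference_number("dta", date(2024, 7, 3)))
print(make_reference_number("dta", date(2024, 7, 3), sequence=2))
```

    Follow any different convention advised by your agency or the DTA in place of these assumed separators.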

    Lead agency

    This should be the agency with primary responsibility for the AI use case. Where 2 or more agencies are jointly leading, nominate one as the contact point for the assessment.

    Assessment contact officer

    This should be the officer with primary responsibility for the completion and accuracy of the AI assurance assessment. 

    Executive sponsor

    This should be the SES officer with primary responsibility for reviewing and signing off on the AI use case assessment. 

    AI use case description

    Briefly explain how you are using or intending to use AI. This should be targeted at the level of an ‘elevator pitch’ that gives the reader a clear idea of the kind of AI use intended, without going into unnecessary technical detail. You may wish to include a high‑level description of the problem that the AI use case is trying to solve, the way AI will be used and the outcome it is intended to achieve (drawing on your answers in section 2). Use simple, clear language, avoiding technical jargon where possible.

    Type of AI technology

    Briefly explain what type of AI technology you are using or intend to use (for example, supervised or unsupervised learning, computer vision, natural language processing, generative AI). 

    While this may require a more technical answer than the use case description, aim to be clear and concise with your answer and use terms that a reasonably informed person with experience in the AI field would understand. 

    1.2    Lifecycle stage

    The lifecycle stages and the guidance below are adapted from the OECD’s definition of the AI system lifecycle.

    Early experimentation

    Intended to cover experimentation which does not:

    • commit you to proceeding with a use case or to any design decisions that would affect implementation later
    • commit you to expending significant resources or time
    • risk harming anyone
    • introduce or exacerbate any privacy or cybersecurity risks
    • produce outputs that will form the basis of policy advice, service delivery or regulatory decisions.

    Design, data and models

    A context-dependent phase encompassing planning and design, data collection and processing, and model building. 

    ‘Planning and design of an AI system’ involves articulating the system’s concept and objectives, underlying assumptions, context and requirements and potentially building a prototype.

    ‘Data collection and processing’ includes gathering and cleaning data, performing checks for completeness and quality, documenting metadata and characteristics of the data set. 

    ‘Model building and interpretation’ involves the creation, adaptation or selection of models and algorithms, their calibration and/or training and interpretation.

    Verification and validation

    Involves executing and tuning models, with tests to assess performance across various dimensions and considerations.

    Deployment

    Involves deploying the system into live production, including piloting, checking compatibility with legacy systems, managing organisational change and evaluating user experience.

    Operation and monitoring

    Involves operating the AI system and continuously assessing the recommendations and impacts (intended and unintended) in light of objectives and ethical considerations. This phase identifies problems and adjusts by reverting to other phases or, if necessary, retiring an AI system from production.

    Retirement

    Involves ceasing operation or development of a system and may include activities such as evaluation, decommissioning and data migration.

    These phases are often iterative and not necessarily sequential. The decision to retire an AI system from operation may occur at any point in the operation and monitoring phase.

    1.3    Review date

    Include the estimated date when this assessment will next need to be reviewed. For example, ‘Moving to deployment – Q3 2026’.

    The triggers for a review are:

    • an AI use case moving to a different stage of its lifecycle (for example, from ‘design, data and models’ to ‘verification and validation’)
    • a significant change to the scope, function or operational context of the use case.

    Agencies may choose to conduct reviews at regular intervals even if the above review triggers have not been met, in line with internal policies and risk tolerance. For assistance in determining the next appropriate review date, consult the DTA.

    1.4    Assessment review history

    For each review of the assessment, record the review date and summarise the change or changes arising from the review. 

  • AI use cases covered by this framework

    In determining whether an AI use case meets this framework’s definition of a ‘covered AI use case’, you may wish to refer to the risk consequence and risk likelihood rating advice attached to this guidance to assist you in considering whether the use of AI could ‘materially influence’ a decision leading to more than insignificant harm.

    A decision may be considered ‘materially influenced’ by an AI system if:

    • the decision was automated by an AI system, with little to no human oversight
    • a component of the decision was automated by an AI system, with little to no human oversight (for example, a computer makes the first 2 limbs of a decision, with the final limb made by a human)
    • the AI system is likely to influence decisions that are made (for example, the output of the AI system recommended a decision to a human for consideration or provided substantive analysis to inform a decision).
  • 2. Purpose and expected benefits

    2.1    Problem definition

    Describe the problem that you are trying to solve. 

    For example, the problem might be that your agency receives a high volume of public submissions, and that this volume makes it difficult to engage with the detail of issues raised in submissions in a timely manner.

    Do not describe how you plan to fix the problem or how AI will be used. 

    Though ‘problem’ implies a negative framing, the problem may be that your agency is not able to take full advantage of an opportunity to do things in a better or more efficient way. 

    2.2    AI use case purpose

    Clearly and concisely describe the purpose of your use of AI, focusing on how it will address the problem you described at 2.1. 

    Your answer may read as a positive restatement of the problem and how it will be addressed.

    For example, the purpose may be to enable you to process public submissions more efficiently and effectively and engage with the issues that they raise in more depth. 

    2.3    Non‑AI alternatives

    Briefly outline non‑AI alternatives that could address the problem you described at 2.1. 

    Non‑AI alternatives may have advantages over solutions involving AI. For example, they may be cheaper, safer or more reliable. 

    Considering these alternatives will help clarify the benefits and drawbacks of using AI and help your agency make a more informed decision about whether to proceed with an AI‑based solution. 

    2.4    Identifying stakeholders

    Conduct a mapping exercise to identify the individuals or groups who may be affected by the AI use case. Consider holding a workshop or brainstorm with a diverse team to identify the different direct and indirect stakeholders of your AI use case. 

    The list below may help generate discussion on the types of stakeholder groups to consider. Please note the stakeholder types below are provided as a prompt to aid discussion and are not intended as a prescriptive or comprehensive list.

    For each type, identify the use case stakeholders and how they might be affected (positively or negatively).

    End users

    People who will use the AI system and/or interpret its outputs.

    Evaluation or decision subjects

    People or groups who will be evaluated or monitored by the system (e.g. those the system makes predictions or recommendations about).

    Oversight team

    The person or team who is managing, operating, overseeing or controlling and monitoring the system during and after deployment.

    System owner or deployer 

    The executives responsible for deciding whether to use a system for a particular purpose.

    AI model or AI system engineers

    Those involved in AI model or system design, development and maintenance.

    Rights holders

    Those who hold the rights to materials used by AI (e.g. copyright owners or creators).

    Malicious actors

    Those who may intentionally misuse the system.

    Bystanders

    Those in the vicinity of the system who may be affected.

    Regulators and civil society organisations

    Those who regulate, advocate for regulation, or are concerned about compliance.

    Communities or groups

    Communities who are likely to be affected by the use of the system.

    Associated parties

    Third parties impacted by an evaluation or decision and other stakeholders who may have an interest in the use of the system based on their relationship to other stakeholders.

    Non-end-user APS staff

    APS staff whose roles and workflows will be affected by AI but are not end users of your AI use case.

    Intermediaries

    A facilitator or agent between 2 parties whose role may evolve with AI integration (e.g. tax agents).

    2.5    Expected benefits

    This section requires you to explain the expected benefits of the AI use case, considering the stakeholders identified in the previous question.

    This analysis should be supported by specific metrics or qualitative analysis. Metrics should be quantifiable measures of positive outcomes that can be measured after the AI is deployed to assess the value of using AI. Any qualitative analysis should consider whether there is an expected positive outcome and whether AI is a good fit to accomplish the relevant task, particularly when compared to the non‑AI alternatives you identified previously. Benefits may include gaining new insights or data. 

    Consider consulting the following resources for further advice

  • 3. Threshold assessment

    3.1    Threshold assessment process

    To complete the threshold assessment, follow these steps.

    3.1.1   Determine likelihood and consequence

    For each risk category listed in the assessment, determine the likelihood and consequence of the risk occurring for your AI use case. You should consult the likelihood and consequence descriptors at the Attachment to this guidance.

    The risk assessment should reflect the intended scope, function and risk controls of the AI use case.

    In conducting your assessment, you should be clear on:

    • key factors contributing to the likelihood and consequence of the risk
    • how any existing or planned risk controls contribute to the likelihood and consequence of the risk
    • any assumptions or uncertainties affecting your risk assessment.

    3.1.2    Determine risk severity

    Use the risk matrix provided in the framework and at the attachment to this guidance to determine the risk severity for each category.

    3.1.3    Provide explanations

    In the ‘rationale’ column, provide a clear and concise explanation for each risk rating (aim for no more than 200 words per risk but use additional words if necessary).

    You should cover the factors, controls and assumptions outlined above at step 3.1.1.

    3.2    Assessment contact officer recommendation

    Once the threshold assessment is complete, if the assessment contact officer is satisfied that all risks are low, they may recommend that a full assessment is not required and that the executive sponsor accept the low risks and endorse the use case. If one or more risks are rated medium or higher, the assessment contact officer must do one of the following:

    • complete a full assessment
    • amend the scope, function or risk controls to a point where the threshold assessment results in a low risk rating
    • decide to not accept the risk and not proceed with the AI use case.
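    The decision rule in 3.2 can be sketched as follows. This is an illustration only. It assumes an ordered severity scale of low, medium and high; take the actual levels from the risk matrix in the framework and its attachment. The risk category names and function names are hypothetical, and the sketch does not replace the judgement of the assessment contact officer or executive sponsor.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ordinal scale; use the levels defined in the framework's risk matrix.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Options open to the assessment contact officer when any risk is above low.
OPTIONS_IF_ELEVATED = (
    "complete a full assessment",
    "amend the scope, function or risk controls and redo the threshold assessment",
    "do not accept the risk and do not proceed with the AI use case",
)


def elevated_risks(ratings: dict[str, Severity]) -> list[str]:
    """Return the risk categories rated medium or higher, sorted by name."""
    return sorted(cat for cat, sev in ratings.items() if sev >= Severity.MEDIUM)


def threshold_recommendation(ratings: dict[str, Severity]) -> tuple[bool, list[str]]:
    """Return (full_assessment_may_be_skipped, elevated_categories).

    If every risk is low, the officer *may* recommend that no full
    assessment is needed; otherwise one of OPTIONS_IF_ELEVATED applies.
    """
    if not ratings:
        raise ValueError("rate every risk category before making a recommendation")
    elevated = elevated_risks(ratings)
    return (not elevated, elevated)


# Hypothetical ratings for illustration only.
ratings = {"privacy": Severity.LOW, "discrimination": Severity.MEDIUM}
may_skip, elevated = threshold_recommendation(ratings)
if may_skip:
    print("All risks low: officer may recommend no full assessment.")
else:
    print("Elevated risks:", ", ".join(elevated))
    for option in OPTIONS_IF_ELEVATED:
        print(" -", option)
```

    Returning the elevated categories, not just a yes/no answer, supports the rationale the guidance asks you to record for each rating.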

    3.3    Executive sponsor review

    Once the assessment contact officer has made their recommendation, the executive sponsor must: 

    1. review the recommendation 
    2. confirm whether they are satisfied by the supporting analysis 
    3. agree that a full assessment is or is not necessary for the use case.

    When completing the threshold assessment, keep in mind the following:

    • Try to be objective and honest in your assessment of risks. Underestimating risks at this stage could lead to inadequate risk management.
    • Determining risk ratings can be challenging. Seek guidance from others to assist you (especially subject matter experts and those experienced in safe and responsible AI risk assessments).
    • Consider the perspectives of stakeholders, including those identified at section 2.4, in assessing the likelihood and consequence of risks. 
      • Ensure you consider the perspectives of marginalised groups, including First Nations people, especially in relation to the risks relating to discrimination and stereotyping. You may not have the background or life experience to fully appreciate these risks.
    • Where there is uncertainty or disagreement about the appropriate risk severity rating, err on the side of caution and choose the higher rating.
    • Document key assumptions or evidence used in determining the risk severity ratings, as this will help explain the rationale for your assessment to reviewers.
    • Consider the expected benefits of the AI use case before deciding whether to proceed based on significant but mitigable risks.
  • 4. Fairness

    4.1   Defining fairness

    Fairness is a core principle in the design and use of AI systems, but it is a complex and contextual concept. Australia’s AI Ethics Principles state that AI systems should be inclusive and accessible and should not involve or result in unfair discrimination. However, there are different and sometimes conflicting definitions of fairness, and people may disagree on what is fair. 

    For example, there is a distinction between individual fairness (treating similar individuals similarly) and group fairness (similar outcomes across different demographic groups). Different approaches to fairness involve different trade‑offs and value judgements. The most appropriate fairness approach will depend on the specific context and objectives of your AI use case.

    When defining fairness for your AI use case, you should be aware that AI models are typically trained on broad sets of data that may contain bias. Bias can arise in data where it is incomplete, unrepresentative or reflects societal prejudices. AI models may reproduce biases present in the training data, which can lead to misleading or unfair outputs, insights or recommendations.

    This may disproportionately impact some groups, such as First Nations people, people with disability, LGBTIQ+ communities and multicultural communities. For example, an AI tool used to screen job applicants might systematically disadvantage people from certain backgrounds if trained on hiring data that reflects past discrimination.

    When defining fairness for your AI use case, it is recommended that you:

    • consult relevant domain experts, affected parties and stakeholders (such as those you have identified at section 2.4) to help you understand the trade‑offs and value judgements that may be involved
    • document your definition of fairness in your response to section 4.1, including how you have balanced competing priorities and why you believe it to be appropriate to your use case
    • be transparent about your fairness definition and be open to revisiting it based on stakeholder feedback and real‑world outcomes.

    You should also ensure that your definition of fairness complies with anti‑discrimination laws. In Australia, it is unlawful to discriminate on the basis of a number of protected attributes including age, disability, race, sex, intersex status, gender identity and sexual orientation in certain areas of public life, including education and employment. Australia’s federal anti‑discrimination laws are contained in the following legislation:

    • Age Discrimination Act 2004
    • Disability Discrimination Act 1992
    • Racial Discrimination Act 1975
    • Sex Discrimination Act 1984.

    Resources

    4.2   Measuring fairness 

    You may be able to use a combination of quantitative and qualitative approaches to measuring fairness. Quantitative fairness metrics can allow you to compare outcomes across different groups and assess this against fairness criteria. Qualitative assessments, such as stakeholder engagement and expert review, can provide additional context and surface issues that metrics alone might miss.

    Quantifying fairness

    The specific quantitative metrics you use to measure fairness will depend on the definition of fairness you have adopted for your use case. When selecting fairness metrics, you should:

    • choose metrics that align with your fairness definition, recognising the trade‑offs between different fairness criteria and other objectives like accuracy
    • confirm whether you have appropriate data to assess those metrics, including sensitive attributes where appropriate (see Australian Privacy Principle 3.3)
    • set clear and measurable acceptance criteria (see guidance for 5.4) 
    • establish a plan for monitoring these metrics (see 5.6) and processes for remediation, intervention or safely disengaging the AI system if those thresholds are not met.

    For examples of commonly used fairness metrics, see the Fairness Assessor Metrics in CSIRO Data61’s Responsible AI Pattern Catalogue.
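    One example of a group fairness metric is the demographic parity difference: the gap between the highest and lowest rates of favourable outcomes across groups, where 0 means equal rates. The sketch below computes it. It is a minimal illustration that assumes binary outcomes (1 = favourable) and a single group attribute. The function names, example data and 0.1 threshold are hypothetical, and the right metric and acceptance criterion depend on the fairness definition you adopted at 4.1.

```python
from collections import defaultdict


def selection_rates(records: list[tuple[str, int]]) -> dict[str, float]:
    """Favourable-outcome rate per group from (group, outcome) pairs, outcome in {0, 1}."""
    totals: dict[str, int] = defaultdict(int)
    favourable: dict[str, int] = defaultdict(int)
    for group, outcome in records:
        if outcome not in (0, 1):
            raise ValueError(f"outcome must be 0 or 1, got {outcome!r}")
        totals[group] += 1
        favourable[group] += outcome
    return {group: favourable[group] / totals[group] for group in totals}


def demographic_parity_difference(records: list[tuple[str, int]]) -> float:
    """Largest minus smallest selection rate across groups (0.0 = parity)."""
    rates = selection_rates(records)
    if len(rates) < 2:
        raise ValueError("need outcomes for at least 2 groups")
    return max(rates.values()) - min(rates.values())


# Hypothetical outcomes and acceptance threshold, for illustration only.
records = [("group_a", 1), ("group_a", 0), ("group_b", 1), ("group_b", 1)]
THRESHOLD = 0.1
gap = demographic_parity_difference(records)
print(f"demographic parity difference = {gap:.2f}; within threshold: {gap <= THRESHOLD}")
```

    A metric like this captures only one notion of group fairness and says nothing about individual fairness. It is best paired with the qualitative approaches that follow.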

    Qualitatively assessing fairness

    Consider some of these qualitative approaches, which may be useful to overcome data limitations and to surface issues that metrics may overlook.

    Stakeholder engagement

    Consult affected communities, stakeholders and domain experts to understand their perspectives and identify potential issues.

    User testing and feedback

    Test your AI system with diverse users and solicit their feedback on the fairness and appropriateness of the system’s outputs. Seek out the perspectives of marginalised groups and those groups that may be impacted by the AI system.

    Expert review

    Engage experts, such as AI ethicists or accessibility and inclusivity specialists, to review the fairness of your system’s outputs and the overall fairness approach and identify potential gaps or unintended consequences.

    Resources
