Risk of Bias 2: Frequently asked questions

About Risk of Bias 2 (RoB 2)

What is RoB 2 and how is it different from the previous risk of bias tool?

RoB 2 is an updated version of the Cochrane risk of bias tool and has been created by the same team. The main differences are: each domain has a set of questions to help you decide if there is bias or not; you assess bias for specific results within an RCT, rather than for the RCT as a whole; algorithms embedded in the tool propose judgements; there is an overall bias decided across all the domains for each result. A short webinar describes the new tool and explains why the team decided to update it and what the advantages are. Full details about the tool can be found via the riskofbias.info website.

I am new to using RoB 2. How do I incorporate the tool into my protocol?

RoB 2 is very different to other tools used for assessing bias in RCTs. We have a set of protocol considerations listed in the RoB 2 Starter Pack. These will help you think through what you need to do. The Starter Pack also links to useful webinars, including an introduction to RoB 2.

RoB 2 is “results-based”; what does this mean?

This means that authors assess bias for a specific result rather than assessing the RCT as a whole. This is important because one result, for example mortality at end of treatment, may be assessed as low risk of bias, while mortality assessed at 2-year follow-up might be assessed as high risk of bias. This could happen, for example, if the triallists have lost a lot of people from the study through attrition. Authors using RoB 2 need to specify the outcomes they plan to assess for bias and then assess bias for each result, in each study, that contributes to those outcomes.

Why does RoB 2 not include affiliation bias assessments?

RoB 2 focusses on mechanisms through which bias may arise (e.g. randomisation processes; influences on outcome assessments due to knowledge of intervention received; processes of selecting which outcome data to report). Affiliation bias may lead to bias in trial results through one or more of the mechanisms covered by RoB 2 but also leads to problems such as inappropriate choice of comparators, or suppression of results, neither of which is covered by RoB 2. A new tool titled Tool for Addressing Conflicts of Interest in Trials (TACIT) is currently under development that will assess affiliation bias. More information on this will be available in due course: see the TACIT website.

I heard that RoB 2 is a “results-based” or “results-level” tool but have seen it is sometimes called “outcome-based”. Which is it and what do these mean?

RoB 2 is called “results-based” or “results-level” because it is used to assess bias for a specific result reported in an individual study. The original risk of bias tool, and most other tools, assess bias across all outcomes and results for an entire study. Assessing bias for a specific result is more precise. For example, for the domain “Missing outcome data” one result in a study may have minimal missing data at an early time point and could be judged as low risk of bias; the same outcome measured 6 months later might have considerable missing data and be judged as some concerns or high risk of bias. By assessing bias for each of these results separately we can present a more accurate assessment of bias. In some cases, the phrase “outcome-based” or “outcome-level” is used. These terms are used when the bias assessments of several results for the same outcome are drawn together, for example in the outcome-level risk of bias tables that feature in RevMan Web.
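The distinction between results-level and outcome-level can be illustrated with a small data sketch. This is only an illustration under stated assumptions: the class name `ResultAssessment`, the helper `by_outcome`, and the trial data are all hypothetical and are not part of the RoB 2 tool or RevMan Web.

```python
from dataclasses import dataclass


@dataclass
class ResultAssessment:
    """One results-level assessment: a single result from a single study."""
    study: str
    outcome: str
    time_point: str
    missing_data_judgement: str  # "low", "some concerns" or "high"


# Hypothetical data: the same outcome in the same trial, at two time points,
# receives two separate assessments.
assessments = [
    ResultAssessment("Trial A", "mortality", "end of treatment", "low"),
    ResultAssessment("Trial A", "mortality", "6 months", "high"),
    ResultAssessment("Trial A", "pain", "end of treatment", "low"),
]


def by_outcome(records, outcome):
    """Draw together the results-level assessments for one outcome
    (an 'outcome-level' view of the same records)."""
    return [r for r in records if r.outcome == outcome]


mortality_view = by_outcome(assessments, "mortality")
```

Each record is one result; grouping the records by outcome gives the outcome-level view without losing the separate judgements.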

Editing Cochrane reviews that use RoB 2

I am a Group Editor – do all new intervention reviews in my Cochrane Group need to use RoB 2?

No. RoB 2 will result in better quality reviews so we do encourage its use, but it is not yet mandated across all new Cochrane intervention reviews. As an editor, you can decide whether or not to approve RoB 2 methods for reviews in your Cochrane Group. There is more information here.

I am a Group Editor and want to check if a review has used the RoB 2 tool in the right way – how can I tell?

The Starter Pack has a set of protocol considerations that authors will find helpful to ensure their review is set up to use RoB 2 correctly. In addition, it has recommendations for reporting RoB 2 in the completed review. These are closely aligned with MECIR and The Cochrane Handbook. We are asking editors to check that all Cochrane reviews and protocols are following these recommendations. If you have any concerns or queries you can contact the Methods Support Unit (MSU) for advice.

Using RoB 2 in Cochrane reviews

What outcomes should I choose for RoB 2 assessments?

We advise that the outcomes listed in your Summary of Findings table(s) should have RoB 2 assessments as you will use these to inform your GRADE judgements.

Do I need to complete a risk of bias assessment for all of the outcomes from all of the studies in my review?

No. We advise that you focus on the key outcomes for the key comparisons in your review. If there is no numerical result for an outcome from a specific study, then you do not need to complete a risk of bias assessment as it will not be contributing to the review.

Are we supposed to produce reviews that focus only on a few primary outcomes?

The number of outcomes a review team selects to assess for bias should be based on their review question(s) and the data needed for decision making, clinical need and research. As stated in the previous question, we recommend that authors focus on key outcomes for the key comparisons.

Ideally, when should the risk of bias assessments be completed in the review process – before or after data analysis?

Risk of bias can (and should) be assessed before the synthesis itself is conducted.

Is there any way to help organise our assessments using RoB 2? Where should I store all the answers to the signalling questions?

There are different means to store your data. The Risk of Bias 2 development team have developed both Word and Excel documents to assist review teams. For Cochrane reviews, we encourage review teams to use the RoB 2 Excel tool to store their data.

The first page of the RoB 2 Excel form has the instruction to "Specify the numerical result". Does this relate to a specific time point?

Yes. The specific result is the data reported by the study for the outcome, time point and measurement method that you are assessing. The exception is if the result could relate to several timepoints, for example if it is estimated from a regression analysis across multiple time points (based on the trajectory of each individual over time) in which case, the result will not relate to a specific time point.

What can authors do in the early stages to prepare for data extraction and risk of bias screening?

Author teams could run a calibration exercise for all authors in a review (both methodological and clinical). This would involve completing RoB 2 assessments for a sample of results from three or so RCTs with group discussions about how best to answer the signalling questions. For the calibration exercise the team should have all documentation (e.g. trial reports, protocols and trial registry reports) available at the start of the RoB 2 assessments, together with the guidance from the riskofbias.info website.

If the default is a high RoB rating overall if one of the five domains is assessed as high risk, why continue to do a full assessment of the remaining domains?

Having all domains completed is an important factor in the move towards transparency in review production. In the same way we would ask for transparency in RCT reporting following CONSORT guidelines, so it is important to be transparent when producing reviews. In addition, and just as important, as a review author, it is important to know the detail of each study in the review. So that is one reason to complete all domains for all outcomes being assessed.

Examination of bias across the outcomes in a review can reveal whether triallists are making decisions that are likely to cause outcomes to be judged at high risk of bias, or what aspects of trial design and analysis could reduce bias for an outcome. This is useful and important for both the “Implications for research” section of your review and for transparency in review production. It is important to have data from all domains to see the pattern of bias that emerges across outcomes.

Finally, it is important to have a full picture of the risk of bias across all domains for a particular outcome, as this information will be needed to decide how to make the GRADE risk of bias assessment for the outcome.
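The default rule mentioned in the question can be sketched in code. This is a minimal illustration only, not the algorithm shipped with the RoB 2 tool: the function name `propose_overall` and the short domain labels are hypothetical, and the rule coded here (high if any domain is high, otherwise some concerns if any domain has some concerns, otherwise low) is a simplified reading of the published guidance. The guidance also allows reviewers to judge a result at high risk overall when several domains have some concerns; that is a judgement call and is deliberately left out of the sketch.

```python
LOW, SOME_CONCERNS, HIGH = "low", "some concerns", "high"


def propose_overall(domain_judgements):
    """Propose an overall judgement for one result from its domain judgements.

    domain_judgements maps a domain label to LOW, SOME_CONCERNS or HIGH.
    The returned value is a proposal that reviewers can override.
    """
    values = list(domain_judgements.values())
    if HIGH in values:
        return HIGH
    if SOME_CONCERNS in values:
        return SOME_CONCERNS
    return LOW


# Hypothetical result: one domain at high risk drives the overall proposal,
# but the other four judgements are still recorded and remain visible.
example = {
    "randomisation": LOW,
    "deviations from intended interventions": SOME_CONCERNS,
    "missing outcome data": HIGH,
    "measurement of the outcome": LOW,
    "selection of the reported result": LOW,
}
overall = propose_overall(example)
```

Keeping the full dictionary, rather than stopping at the first high judgement, is what makes the pattern of bias across domains available for reporting and for GRADE.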

What should I do if my review does not have a meta-analysis?

RoB 2 aims to assess risk of bias in a quantitative result that feeds into a synthesis, whether the synthesis uses meta-analysis or a Synthesis Without Meta-analysis (SWiM) approach. Authors should include their bias assessments for outcomes that are summarised using SWiM methods and should include them in their GRADE assessments for the Summary of Findings tables too.

Should only studies with results at low risk of bias be included in the meta-analysis?

Restricting analyses to include only results at low risk of bias is an option. It can also be appropriate to pool data from results at all levels of bias including high risk and use a sensitivity analysis to assess the effects of restricting the analysis to include only results at low risk of bias. Ideally decisions should be made at the protocol stage and pre-specified.
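The sensitivity analysis described above can be sketched as follows. This is a hedged illustration, not a recommended analysis plan: the study data are invented, the effects are assumed to be on an additive scale (for example log risk ratios) with known standard errors, and a simple fixed-effect inverse-variance model is used purely to keep the example short. The function name `pooled_fixed_effect` is hypothetical.

```python
import math


def pooled_fixed_effect(results):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / r["se"] ** 2 for r in results]
    total = sum(weights)
    estimate = sum(w * r["effect"] for w, r in zip(weights, results)) / total
    return estimate, math.sqrt(1.0 / total)


# Hypothetical results for one outcome, each with its overall RoB 2 judgement.
results = [
    {"study": "Trial A", "effect": -0.30, "se": 0.10, "overall_rob": "low"},
    {"study": "Trial B", "effect": -0.10, "se": 0.20, "overall_rob": "low"},
    {"study": "Trial C", "effect": -0.60, "se": 0.15, "overall_rob": "high"},
]

# Main analysis: pool results at all levels of risk of bias.
all_estimate, all_se = pooled_fixed_effect(results)

# Sensitivity analysis: restrict to results at low overall risk of bias.
low_only = [r for r in results if r["overall_rob"] == "low"]
low_estimate, low_se = pooled_fixed_effect(low_only)

print(f"All results:   {all_estimate:.3f} (SE {all_se:.3f})")
print(f"Low risk only: {low_estimate:.3f} (SE {low_se:.3f})")
```

Comparing the two pooled estimates shows how much the conclusion depends on the results judged at high risk of bias. Which analysis is primary should be pre-specified in the protocol.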

Is it OK to stratify analyses by domain? Or do we have to use overall RoB?

RoB 2 was designed so the overall risk of bias represents the risk of bias for that result. If authors add additional subgroup analyses and sensitivity analyses, it will increase the risk of false results by chance. Therefore, we strongly encourage all authors to use overall risk of bias for any subgroup or sensitivity analyses. Any decision to use specific domains must be justified.

I have already registered my Cochrane review title and was planning to use the original risk of bias tool. However, after reading about RoB 2, can I switch to that instead?

If you want to switch from the original risk of bias tool to RoB 2, the earlier you are in the review process, the easier the switch will be. Once a protocol has been published that includes the original risk of bias tool, authors should not switch to RoB 2 without getting approval from their Group Editor. We strongly recommend that you do not switch tool after protocol publication as there are key considerations for RoB 2 that must be pre-specified and the authors will be at high risk of using RoB 2 incorrectly. However, if you do want to switch, you can; you just need to add an explanation in the 'Differences between protocol and review' section.

Updating Cochrane Reviews

When updating a review, should we move from the original risk of bias tool to RoB 2?

It will never be mandatory to switch to RoB 2 during an update if the previous version of the review used the original risk of bias tool. However, authors and Cochrane groups can make decisions on whether to switch to RoB 2 on a case-by-case basis. If a published review only has a handful of RCTs and you are expecting many more when you update, switching to RoB 2 would probably be best for the quality of the review. However, if the published review includes a large number of RCTs and you are not expecting many new studies the gains will be marginal, and it would be reasonable to continue using the original tool. If you do switch to RoB 2, the relevant methods will need updating before you begin your update to cover all of the protocol considerations detailed in the Starter Pack and this can be included in the version history. Changes to the original protocol can be included in the section 'Differences between protocol and review'. If authors do not pre-specify these RoB 2 considerations, they will be at high risk of using RoB 2 incorrectly. More information can be found in the Starter Pack.

I am updating my review; can I use the original risk of bias tool for the old trials and RoB 2 for the new trials?

No. If you choose to use RoB 2, all results will need to be assessed in the same way. If you are updating your review, there will be a choice to keep the original risk of bias tool or switch to the new tool. The decision to switch can be made on the basis of the number of studies in the review before it is updated, and likely number after the update. If a published review only has a handful of RCTs and you are expecting many more when you update, switching to RoB 2 would probably be best for the quality of the review. However, if the published review includes a large number of RCTs and you are not expecting many new studies the gains will be marginal, and it would be reasonable to continue using the original tool. If you do switch to RoB 2, the relevant methods will need updating before you begin your update to cover all of the protocol considerations detailed in the Starter Pack and this can be included in the version history. Changes to the original protocol can be included in the section 'Differences between protocol and review'. If authors do not pre-specify these RoB 2 considerations, they may apply RoB 2 incorrectly. More information can be found in the Starter Pack.

RoB 2 and variants of RCTs

Is it appropriate to use RoB 2 for quasi-randomised trials?

'Quasi-randomised trials' (or 'pseudo-randomised trials') lie on a spectrum from studies that look very similar to randomised trials (e.g. using alternation, perhaps even with the sequence concealed at the point of allocation) to things that don’t look much like randomised trials at all. For these reasons we strongly advise that reviewers define the precise nature of the design of studies they are labelling as 'quasi-RCTs' to avoid ambiguity. The definition should use the features of the study design to describe the studies you plan to include.

It is the reviewers’ decision whether to include studies other than RCTs or not. If you know you will have enough RCTs to answer a question we would advise excluding studies that have not used randomisation. If you do include them, we would advise a secondary analysis to see how data from studies without randomisation influences the treatment effect estimate.

The second part of our answer covers which risk of bias tool to use. We recommend that:

· reviews including RCTs and quasi-randomised trials should use RoB 2 for all studies.

· reviews including non-randomised studies of interventions, including quasi-randomised trials, should use ROBINS-I for all studies.

We have included cluster RCTs, is RoB 2 able to assess cluster RCTs?

Yes. There is a version of RoB 2 specific to cluster RCTs and it was released in March 2021. It contains an additional domain to assess bias associated with timing and recruitment of participants. The new version is available from the riskofbias.info website.

We have included cross-over RCTs, can I use the tool for cross-over RCTs?

Yes. There is a version of RoB 2 specific to cross-over RCTs and it was released in March 2021. It contains an additional domain to assess bias associated with period and carryover effects. The new version is available from the riskofbias.info website.

Are there Excel RoB 2 tools for the cluster and cross-over versions of the RoB 2 tool?

Yes. The Excel tools for the cluster and cross-over versions of the RoB 2 tool are available from the riskofbias.info website.

RoB 2 and RevMan Web

How can I 'switch on' RoB 2 in RevMan Web?

Authors are not able to switch on the RoB 2 features in RevMan Web themselves. Please inform your Group Editor or the Cochrane Support Team (support@cochrane.org) when you are ready to switch on the RoB 2 features in your review and they can enable this advanced feature for you.

Does the RoB 2 Excel tool link into RevMan Web?

The Excel file cannot be imported into RevMan Web (at this point in time). The RoB 2 team are working on an online tool.

Our advice is that detailed risk of bias assessments (with consensus responses to the signalling questions) may be presented in supplementary materials via an online repository such as datadryad.org or figshare.com (see the Editorial policy here). Then authors need to manually copy over the decisions for each domain, the overall risk of bias and the support for judgements into RevMan Web. The domain ratings from all authors assessing risk of bias do not need to be included (only the agreed, consolidated ratings).

If we need to use one of the tools and decide to use the Excel tool, do we then need to manually enter only the domain judgements into RevMan Web, or also the “decision information”?

Only the final, consensus agreed judgements and support for judgements (for each result assessed, each domain and overall) should be included in RevMan Web. See this step-by-step guide on what to input into RevMan Web.

How will RoB 2 be displayed in RevMan Web if we have a bias assessment per outcome?

Authors are expected to include one figure per outcome and one table per outcome. Once bias assessments have been entered into RevMan Web the software can generate forest plots with traffic lights and risk of bias tables for each outcome. Visual representations of risk of bias assessments when Synthesis Without Meta-analysis (SWiM) methods have been used will need to be created by the authors, for example by using robvis software. For outcomes with a meta-analysis we recommend that in the main text of the review authors cite the relevant Analysis. Doing this will link to the relevant forest plot with the traffic lights plot. You will not need to create additional figures. The risk of bias outcome-level tables are interactive and show the judgement for each domain, for each study, for each outcome. These judgements 'expand' to reveal the 'justification' or support for judgement on the html version of the review. For full guidance please see the Starter Pack and the RevMan Web Knowledge Base.

Would you recommend that we create a separate Excel file for each outcome in order to be able to create the summary tables and outputs required for RevMan Web?

Some reviewers are using Excel for all outcomes. Other reviewers create a new Excel file for each outcome.

Is it possible to do the RoB 2 assessments directly in RevMan Web or do we need to use one of the tools (e.g. Excel tool) and then add this to RevMan Web?

We advise that you use the Excel tool to complete the RoB 2 assessments. This is available on the riskofbias.info website. We recommend this because it stores the answers to all the signalling questions for each reviewer in a succinct format, allows authors to manage the process of reaching consensus, and can be used to create an Excel file of consensus agreements to upload and make available for readers and peer reviewers.

Once your author team confirms the final, consensus agreed judgements and support for judgements (for each outcome) these should be entered into RevMan Web. At the moment this needs to be done manually by copying and pasting from the tool to RevMan Web.

We ask that review teams make the consensus decisions and answers for each signalling question available via a link to a data repository, as peer reviewers and Group Editors may want to check a few of these. They will need to be available to readers in the finished review. Detailed guidance is available in the Starter Pack and the RevMan Web Knowledge Base.
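As an illustration of what managing consensus involves, the sketch below flags the domains on which two reviewers' judgements differ for one result, so that the team knows what to discuss before recording the agreed judgement. It is not a description of how the RoB 2 Excel tool works internally; the function name `find_disagreements`, the domain labels and the judgements are all hypothetical.

```python
def find_disagreements(reviewer_a, reviewer_b):
    """Return the domains on which two reviewers' judgements differ.

    Each argument maps a domain label to a judgement for the same result.
    The return value maps each disputed domain to the pair of judgements.
    """
    domains = sorted(set(reviewer_a) | set(reviewer_b))
    return {
        domain: (reviewer_a.get(domain), reviewer_b.get(domain))
        for domain in domains
        if reviewer_a.get(domain) != reviewer_b.get(domain)
    }


# Hypothetical independent assessments of the same result.
reviewer_a = {
    "randomisation": "low",
    "missing outcome data": "some concerns",
    "measurement of the outcome": "low",
}
reviewer_b = {
    "randomisation": "low",
    "missing outcome data": "high",
    "measurement of the outcome": "low",
}

to_discuss = find_disagreements(reviewer_a, reviewer_b)
```

Only the agreed judgements that come out of the discussion, with their support for judgement, are then entered into RevMan Web.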

I have a study with two subgroups with different RoB 2 assessments for each of the subgroups. Can I plot these on the same forest plot?

Unfortunately, it is not possible in RevMan Web to store two sets of RoB 2 assessments for a single study. This means that for subgroups you can only input one set of RoB 2 assessments.

Reporting RoB 2

Should we be using the fixed headings for ‘Domains’ in the ‘Risk of bias in included studies’ section in RevMan Web? These do not appear to fit with the RoB 2 guidance - which is to not present the domains.

The new reporting guidance encourages authors to link risk of bias discussions more closely to the effect estimates for each outcome. This is because RoB 2 assesses bias separately for each outcome. The headings for risk of bias domains are removed in RevMan Web when the RoB 2 feature is switched on. Detailed reporting guidance is included in the Starter Pack.

Our review includes RoB 2 assessments for both ‘Effect of Adherence’ and ‘Effect of Assignment’. How can we present them both in RevMan Web?

At the moment it is not possible to present risk of bias for both types of 'Effects' in RevMan Web. Our advice is to present the bias assessment that is more prominent in your review (for example, that related to the key research questions) in the forest plots and RoB 2 tables that are embedded in RevMan Web. We advise that you create additional figures and tables outside of RevMan Web for the secondary bias assessment. These can be included either as 'Additional tables' and 'Figures' in RevMan or in an appendix. The robvis tool allows authors to create traffic light plots. The images created can be exported in a range of high-quality image files that can be imported into the review. Guidance on how to do this is in the Starter Pack.

How do we create figures when a review includes cluster and crossover trials?

RoB 2 for cross-over RCTs includes an additional domain, 'Risk of bias arising from period and carryover effects'. RoB 2 for cluster RCTs includes an additional domain, 'Bias related to timing and recruitment of participants'. It is not possible, at the moment, to display these using the current figures in RevMan Web.

Therefore, we advise that you use the robvis tool to prepare a figure. The images can be exported and added to RevMan Web as an additional figure. Guidance on how to do this is in the Starter Pack.

We have entered bias assessments for each outcome for our main analysis in RevMan Web. We want to do a sensitivity analysis. Do we have to re-enter the reasons for the bias judgements?

We advise authors to create a duplicate of their main analysis, which will also duplicate the risk of bias data (you do not have to re-enter the reasons for the judgements). You can then edit this for your sensitivity analysis. There is a feature specifically for sensitivity analysis in RevMan Web; you can see how it works here.

Do we need to keep the answers to the signalling questions for each of the results for each study?

Our recommendation is that detailed risk of bias assessments (with consensus responses to the signalling questions) may be presented in supplementary materials via an online repository such as datadryad.org or figshare.com (see Editorial Policy). Then authors need to manually copy over the decisions for each domain, the overall risk of bias and the support for judgements into RevMan Web. The domain ratings from all authors assessing risk of bias do not need to be included (only the agreed, consensus judgements).

RoB 2 and other review development software

Can we use Covidence to do RoB 2?

At the moment the risk of bias assessment section in Covidence is not compatible with RoB 2.

How does RoB 2 fit with CINeMA for network meta-analyses?

RoB 2 works well with CINeMA as it expects an overall risk of bias judgement to inform the confidence considerations, which the RoB 2 tool produces. See this article on CINeMA.

Can we use Distiller to do RoB 2?

At the moment there is no formal arrangement, but it might be possible in the future to export RoB 2 information in a format compatible with RevMan Web, once the RoB 2 import function is working in RevMan Web.

FAQs relating to Domain 1: Randomisation

Signalling question 1.2: If allocation concealment has not been implemented appropriately, but it still appears that baseline characteristics are balanced, would it be acceptable to NOT automatically judge the result to be at high risk of bias?

Answer: No. We cannot be sure that all important variables are equally distributed between groups, as not all variables can be reported in tables, or even known about. The point of randomisation is to ensure that all known and unknown prognostic variables are balanced between groups (as far as is compatible with chance).

Signalling question 1.3: In small trials, the numbers in groups at baseline can be unbalanced. Similarly, in trials in which the authors have reported results for many different characteristics at baseline, there can sometimes be imbalances between groups for some of these variables. How should I account for these differences when answering signalling question 1.3?

Answer: This question is only concerned with imbalances that are NOT compatible with chance and which therefore suggest that there has been a problem with the randomisation process. In small trials, one would expect that the numbers in groups would be unbalanced. Likewise, in trials that measure lots of different characteristics at baseline, there will be chance differences between groups for some characteristics. Note that if groups are excessively similar, this may also raise concerns that the randomisation process has not been implemented appropriately.

FAQs relating to Domain 2: Deviations from intended interventions

Overall Domain: If participants and carers/clinical personnel are aware of treatment allocation, how can a result be at low risk of bias in this domain? This is very different to Risk of Bias 1.

Answer: We believe that if participants and personnel are not blinded, this does not in itself lead to bias. It is important to follow the signalling questions in this domain to assess whether a lack of blinding may have led to bias. Please remember that the signalling questions will be different depending on whether you are assessing the effect of being assigned to an intervention, or the effect of adhering to an intervention. This recent BMJ paper has more information about the impact of blinding on effect size estimates in RCTs: https://www.bmj.com/content/368/bmj.m229

Overall Domain: Is there a plain language explanation of the difference between 'effect of assignment' and 'effect of adherence'?

Answer: Firstly, we consider the effect of assignment:

· Most systematic reviews will be interested in the ‘effect of assignment’. This is also known as the ITT or ‘intention to treat’ effect. This is what most RCTs set out to assess. The aim of ITT analysis in trials is to determine the effect of the intervention as delivered in the real world.

· In an ITT analysis, every participant randomised should be included in the primary analysis. Participants who drop out, are excluded from the trial after randomisation or switch treatments should all be included and analysed in the group to which they were originally randomised. The aim is to keep each group as complete as possible. This is sometimes called the ‘full analysis set’.

· Participants may be excluded from the analysis set if outcome data were not available. However, this problem is addressed in the domain ‘Bias due to missing outcome data’.

· The reason this is important in systematic reviews is that we want to provide evidence from the real world on how effective an intervention is. If people are assigned to the intervention, and drop out – switch treatments, etc – that is all part of how effective the treatment is. For example, some people will forget to take their pills, or may stop going to therapy sessions if they start feeling better. These are all legitimate ‘effects’ of the intervention as delivered.

The ‘effect of adherence’, or the ‘per protocol effect’ is often of interest to patients, but not usually to reviewers. A good example is screening.

· The aim of a per-protocol analysis in a trial is to find the treatment effect under optimum conditions. What would the potential benefits/ harms be if a participant took the intervention exactly as directed?

· For this analysis, we need to consider i) any major protocol deviations (i.e., people not taking the intervention or following the intervention as it should be), ii) non-availability of measurements from the primary endpoint and iii) participants not getting ‘enough’ of the intended intervention. Unfortunately, the methods that are needed to address these problems are complex. The solution is not simply to drop participants from the data set.

· If you were preparing a review, ‘Effects of topical interventions for eczema in children’, then you would most likely be interested in effect of assignment. Does the intervention work as prescribed to people to give to their children? For a given RCT of experimental cream versus placebo cream, you would include all the participants, no matter what happened during the treatment. Overall, the effect size found in the analysis would be that of participants in a real clinical situation (as much as is possible in an RCT). It would include, in intervention and control arms, all the people who stopped applying the cream, applied it less regularly than was prescribed, those who were moved to a more intensive treatment etc.

· If you were preparing a review of prostate cancer screening, you might want to assess both effects. The effect of assignment would examine how well screening works in reducing mortality were you to roll out a screening service in your country. This is because we know that not all men asked to attend screening would attend. In addition to this, you would also want to assess the effect of adhering to the intervention. This is because you would want to answer the question, ‘what is the effect of screening on individuals who actually attend screening?’. We know that those people who do not attend for screening are likely to be systematically different to those who take up screening. For example, they might be more likely to be older and therefore might be more at risk of prostate cancer.

· There is more information in this webinar: https://evidencesynthesisireland.ie/rob2-0/

Signalling questions 2.1 and 2.2: If trial authors just state that their trial was ‘double-blind’, do we just accept this and give them the benefit of the doubt?

Answer: We ask that reviewers look at the trial and assess if they think blinding has been carried out reasonably well or not. They can then decide whether ‘Probably yes’ or ‘Probably no’ is appropriate.

Signalling question 2.3: Can you give an example of a deviation from the intended intervention that arose because of the trial context?

Answer: The answer is different depending on whether you are assessing the effect of being assigned to an intervention or the effect of adhering to an intervention.

For the effect of assignment to an intervention, we would be looking for variations in practice that come about because the participants are in a trial. If participants assigned to no intervention went on to take the intervention as a result of being recruited into the trial, then that would be a protocol violation. For example, in a trial of debt advice versus no intervention for improving financial circumstances, the intervention was a debt advice phone line that was nationally available to all citizens in that country. During the informed consent process all participants were made aware of the national debt advice helpline and many participants (10%) in the control group sought help from the debt advice helpline (Pleasance & Balmer 2007). There are more examples in the detailed RoB 2 guidance on the riskofbias.info website.

It can be helpful for those reviewers most familiar with the subject area to think through what might constitute potential deviations and to list these in the systematic review protocol.

For the effect of adherence to an intervention, there are more variations in intended intervention that are concerning with regards to bias than for ‘assignment to the intervention’. This is because we are interested in both those that arose because of the trial process (as we are for assignment) AND all the others that happen but are not due to the trial process. These centre around situations where participants stop taking the intervention. The reason why participants stop taking the intervention is key. For example, if participants were bored with their intervention and stopped taking it, or stopped attending, this would not be considered a protocol violation. However, participants withdrawing from the interventions due to adverse events or side effects would be a protocol violation. There are more examples and more information on this in section 5.1.2.2 and in Box 5 of the detailed guidance (riskofbias.info).

Signalling question 2.4 (assessing the effect of adhering to the intervention): We are looking at a study in which a large proportion of clients in both the intervention and control groups were excluded from the analysis as they did not receive the intended intervention. Therefore, we have answered ‘Probably yes’ to signalling question 2.4, ‘could failures in implementing the intervention have affected the outcome?’. However, we have also answered ‘No’ to signalling question 3.1, ‘were data for this outcome available for all, or nearly all, participants randomized?’, for the same reason. It seems a little unfair to penalise the study in two domains for the same reason. What would your advice be?

Answer: People may be missing from the analysis for two reasons, and they are dealt with in different parts of the tool and so there should be no double counting of reasons. The first reason is exclusions by the trial investigators. This is often done, for example, to exclude people who did not receive the intervention when doing a naïve per-protocol analysis. These exclusions are addressed in signalling question 2.4. The second reason is when there were outcome data that should have been included but were missing (e.g. due to drop out or missed clinic visits). These missing data are addressed in Domain 3 (signalling question 3.1). The example above would be dealt with within Domain 2.

Signalling question 2.6 (assessing the effect of assignment to the intervention): Is this question about whether the analysis approach was appropriate to estimate the effect of assignment to intervention, and not whether this analysis was done appropriately to account for missing outcome data?

Answer: Correct. As the guidance explains,

· Both intention-to-treat (ITT) analyses and modified intention-to-treat (mITT) analyses excluding participants with missing outcome data would be considered appropriate.

· Both naïve ‘per-protocol’ analyses (excluding trial participants who did not receive their assigned intervention) and ‘as treated’ analyses (in which trial participants are grouped according to the intervention that they received, rather than according to their assigned intervention) are considered inappropriate.

· Analyses excluding eligible trial participants post-randomization should also be considered inappropriate, but post-randomization exclusions of ineligible participants (when eligibility was not confirmed until after randomization and could not have been influenced by intervention group assignment) can be considered appropriate.

Signalling question 2.6 (assessing the effect of adhering to the intervention): How do we deal with naïve per protocol analyses?

Answer: This is not an appropriate analysis to estimate the per protocol effect. It removes the benefit of randomisation since it removes people from one or more arms on the basis of characteristics (e.g. adherence) that are likely to be related to prognosis. The answer to signalling question 2.6 would be ‘No’.
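
The point about losing the benefit of randomisation can be illustrated with a small worked example. The sketch below uses entirely hypothetical counts (they are not taken from any trial or from the RoB 2 guidance). It assumes an intervention with no true effect, two prognostic groups that are balanced across arms, and that only the poorer-prognosis participants in the intervention arm stop adhering.

```python
# Hypothetical counts, chosen only for illustration.
# Each arm has the same prognostic mix, so the true effect is null.
# stratum -> (participants, events)
arm = {"frail": (50, 20), "robust": (50, 5)}


def risk(groups):
    """Pooled event risk across a list of (participants, events) pairs."""
    groups = list(groups)
    n = sum(g[0] for g in groups)
    events = sum(g[1] for g in groups)
    return events / n


control_risk = risk(arm.values())        # 25 / 100
itt_treat_risk = risk(arm.values())      # everyone analysed as randomised
pp_treat_risk = risk([arm["robust"]])    # frail non-adherers excluded: 5 / 50

itt_rr = itt_treat_risk / control_risk
pp_rr = pp_treat_risk / control_risk

print(f"ITT risk ratio:                {itt_rr:.2f}")
print(f"Naive per-protocol risk ratio: {pp_rr:.2f}")
```

Under these assumptions the intention-to-treat risk ratio is 1.00, while the naïve per-protocol risk ratio is 0.40, an apparent benefit that arises only because the excluded participants had a worse prognosis.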

Signalling question 2.6: How do we deal with a situation in which the trialists have reported adverse effects using only a per protocol analysis?

Answer: We would apply the signalling questions in the usual way. If you are assessing the effect of assignment (as advised for most reviews) then you will find that when you get to assessing the analysis, it will come out as high risk of bias, because the trialists have used a per-protocol analysis.

FAQs relating to Domain 3: Missing data

Signalling question 3.1: Is there a threshold that we can use to decide if we do not need to worry about missing data? The guidance suggests that the availability of 95% of the data for continuous outcomes will often be sufficient.

Answer: We do not want users to consider this example of 95% as a strict cut-off. The potential impact of missing data upon the effect estimate will vary depending on the proportion of missing data, the type of outcome, and (for dichotomous outcomes) the risk of the event. There is a good example in the Handbook of a situation in which a small amount of missing data can make a large difference to a result, when an event is rare.
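
As a rough illustration of why the event risk matters, the sketch below compares the same proportion of missing data (2% per arm) for a rare and a common event. All numbers are hypothetical and are not the Handbook example. The "extreme" scenario assumes that every missing participant in the intervention arm had the event and that no missing participant in the control arm did.

```python
def risk_ratio(events_a, n_a, events_b, n_b):
    """Risk ratio of arm A versus arm B."""
    return (events_a / n_a) / (events_b / n_b)


def rr_observed_and_extreme(events_treat, events_ctrl, randomised, missing):
    """Return (observed RR, RR under an extreme assumption about missing data).

    Assumes equal arm sizes and the same number missing in each arm.
    Extreme assumption: all missing intervention-arm participants had the
    event, and no missing control-arm participant did.
    """
    observed_n = randomised - missing
    observed = risk_ratio(events_treat, observed_n, events_ctrl, observed_n)
    extreme = risk_ratio(events_treat + missing, randomised,
                         events_ctrl, randomised)
    return observed, extreme


# Hypothetical trial: 1000 randomised per arm, 20 (2%) missing in each arm.
rare = rr_observed_and_extreme(events_treat=5, events_ctrl=10,
                               randomised=1000, missing=20)
common = rr_observed_and_extreme(events_treat=250, events_ctrl=500,
                                 randomised=1000, missing=20)

print(f"rare event:   observed RR {rare[0]:.2f}, extreme-case RR {rare[1]:.2f}")
print(f"common event: observed RR {common[0]:.2f}, extreme-case RR {common[1]:.2f}")
```

With these made-up numbers, the rare-event risk ratio could move from 0.50 to 2.50, whereas the common-event risk ratio could move only from 0.50 to 0.54, even though the proportion of missing data is identical.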

Signalling question 3.1: Do we interpret this question differently if we are considering the effect of adhering to an intervention, rather than the effect of being assigned to the intervention?

Answer: Failure to include all participants in the analysis for “Assessing the effect of adhering to the intervention” is covered in two domains:

1. Trial participants who were excluded by the trial investigators, for example because they did not adhere to their assignment, are dealt with in Domain 2, “Deviations from intended interventions”. There are signalling questions specific to “Effects of adherence to intervention” for Domain 2. These cover, for example, patients who switched treatment arm. See signalling questions 2.5 and 2.6.

2. For Domain 3 “Bias due to missing data” the signalling questions are the same whether we are assessing the “Effects of assignment to intervention” or “Effects of adherence to intervention”. This is because the circumstances in which missing data lead to bias are similar regardless of the effect of interest.

The guiding principle for Domain 3 is to consider: “To estimate the effect I am interested in estimating (adherence or assignment), what data would I want to have AND are any missing in the analyses the trialists have presented?”

The key point for a per-protocol analysis (e.g. an instrumental variable analysis) is that it needs ALL the participants who were randomised in the trial. So, to estimate the per-protocol effect, you need to check whether any data are missing from the analysis, as the trialists have presented it.

Signalling question 3.1: Does this question relate to data in general, or data from a given timepoint? (e.g. nine months post randomization)

Answer: The question relates to data missing at the specific time point at which the outcome you are assessing is measured.

Signalling question 3.1: Does the question relate to participants who were randomized but did not complete treatment (lost to follow up) or missing data from participants who completed treatment, or both?

Answer: It relates to both these types of missing data.

Signalling question 3.1: Is it acceptable for a trial to recruit more participants to compensate for participants who are lost to follow-up or will this cause bias?

Answer: It is common for trials to increase their planned recruitment size to allow for drop out. This has no specific implications for bias, although the reviewers will need to address that drop out in Domain 3.

Signalling question 3.1: How do you answer this question if you don’t know the extent of missing data (for example, if an effect estimate, but not the number of participants, is reported)?

Answer: It will be necessary to look at other sources of information to determine the answer, for example, the trial CONSORT diagram. A ‘Probably yes’ or ‘Probably no’ response may be the most appropriate.

Signalling question 3.2: Do you have lists of 1) analysis methods that correct for bias and 2) types of sensitivity analyses adequate to show that there is no relation between missingness in the outcomes and its true value?

Answer: It is not helpful to focus on methods per se. The important thing is to ensure that the sensitivity analysis entertains the full plausible range of outcomes that the missing participants could have experienced. This will always require judgement. There is no list of “analysis methods that correct for bias” since we cannot know if the analyses conducted will achieve this. There are a few that in general will not correct for bias (for example, last observation carried forward, and most multiple imputation approaches).

The trick is not to worry too much about the statistical methods used by the trialists – rather to focus on phrases like ‘we assumed that’ to see what assumptions they have made about people who were missing. You must be sure that they have looked at the sort of outcomes people might have had if they weren’t missing. This is where your knowledge of the area under review will be important: the trialists should tell you what assumptions they made about those people in a way that means something to you, even if you are not familiar with the statistical methods that are used.

FAQs relating to Domain 4: Outcome measurement

Overall Domain: Issues of participant blinding seem to be well addressed in Domain 2. However, if outcomes are self-reported (e.g. pain intensity), with the participant as the assessor, the same issues can be picked up in Domain 4. Where participants are unblinded and that might influence both their experience of and engagement with an intervention (Domain 2) and their perception and subsequent assessment of the outcome (Domain 4), should we reflect that RoB in both domains?

Answer: Domain 2 covers deviations from intended interventions whereas Domain 4 is covering measurement of the outcome. These domains cover different aspects, so a trial should not be penalised twice for the same issue. Domain 2 addresses things that the participant does, or trial personnel do, that would change the true value of the outcome (e.g. the amount of pain present). Domain 4 addresses how the participant portrays the outcome (e.g. whether they misreport the pain they experience).

Overall Domain: How should we assess RoB when continuous outcomes such as "mechanical ventilation days" or "supplementary oxygen days" are measured only on subsets of participants, for example, only those requiring mechanical ventilation or only hospitalized participants? Is this related to bias due to missing outcome data or to bias in measurement of the outcome?

Answer: In an ideal world, the primary/critical outcomes should focus on effects within the randomised population, and the approach to subset outcomes should be specified at the protocol stage. This subset analysis is secondary: it does not include all the randomised patients and should be treated as such. You must present the results as exploratory, and they should be heavily caveated. It would also be important to assess this using all randomised participants – so a dichotomous outcome – did they need ventilation or not? Because you are breaking randomisation you might want to assess bias using a tool for non-randomised studies (e.g. ROBINS-I).

Signalling question 4.3: What do we do if the study paper says ‘double blind’ but we don’t have any other information?

Answer: This question is concerned with whether outcome assessors could be aware of the intervention. If there is no other information, then either use ‘NI’ or assume they were aware of the assignment.

Signalling question 4.5: Should I automatically answer ‘Yes’ to this question if outcome assessors were aware of the intervention received?

Answer: No. This question requires judgement as to whether outcome assessors were likely to be influenced by knowledge of what intervention a participant received. For example, were they a clinician investigating the effects of their own intervention?

FAQs relating to Domain 5: Selection of the reported result

Overall Domain: How do we approach this domain if we are using individual patient data (IPD) supplied by the trialists?

Answer: If reviewers have the IPD then Domain 5 will be low risk of bias. If neither the trialists nor the reviewers are selectively reporting the data and the trialists give the reviewers everything they ask for, there will be no risk of bias.

Overall Domain: Should we consider the availability/non-availability of a protocol more broadly in terms of bias, i.e. does it have wider significance than just Domain 5?

Answer: Unfortunately, not all trials have protocols available. The protocol is just a source of information about a trial, and often there are several sources of information. Sources of information do not always agree. They might not even agree between ‘Methods’ and ‘Results’ sections of a published report of the final trial results. Users of the RoB2 tool should make a judgement about each signalling question based on the totality of information available to them. There will obviously be less certainty in this judgement if a protocol is available and disagrees with the final report.

If you think non-availability of protocols is an issue within a particular review, you could always highlight it in the text for Risk of Bias and in the Discussion, and do a post-hoc sensitivity analysis by extra documents available versus trial report only.

Signalling question 5.1: How do we answer when we do not have enough details about the planned statistical analyses, for example, if we only have a trial registry entry or a protocol that does not include enough detail?

Answer: Details about analysis plans can be obtained from a variety of sources, but information may be insufficient. In this case the answer to this SQ will be ‘NI’. However, this may not necessarily mean that the answers to the other SQs in this domain should also be ‘NI’. We may have evidence that there was a problem in a trial, despite there being no analysis plan. Within the body of a trial report, we might see inconsistency between the methods reported and the results presented that reveals selective reporting; for example, trialists might state, “We only presented statistically significant results”. If we see evidence of bias, we should report it rather than stating 'NI'. In addition, it might work the other way and you may be confident enough, despite not seeing an analysis plan, to be very sure that this group of trialists would have followed through with their plan. However, if you want to make the judgement that, because there is no information for Signalling Question 5.1, all subsequent SQs for that domain are also 'NI', that's reasonable: use ‘Probably no’ and explain your reasoning.

Signalling question 5.1: How do we proceed if we know there was an SAP, but it is not available?

Answer: If you can’t access an analysis plan you can’t see whether the authors have followed it. So the fact that you know a plan was written does not help you to answer this question. Your judgement for this question will, by default (via algorithm rules), lead to 'Some Concerns' (or worse, if there is some evidence to support selective reporting). You could also contact the trialists to try to obtain the SAP. You will need to check the date stamps on the relevant documents to assess whether the plan was formulated before the result data were available to the trialists (if it was not, reporting could have been influenced by knowledge of the result data).

Signalling question 5.1: We would like to answer ‘PN’ to this question as we assume that if there was a pre-specified protocol, this would be available. We are not using the information described in the statistics section of the trial reports for this judgement as this is likely not pre-specified and rather a description of what was done and not what was intended.

Answer: This is OK. If no protocol or pre-specified analysis plan is available to be found, you can answer 'NI' to 5.1, but your reasoning for ‘PN’ is also fine, given that NI/N/PN for 5.1 all lead to ‘Some Concerns’, so there is no difference.
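
The mapping mentioned in this answer can be sketched as a small function. This is a hypothetical helper, not the official implementation of the tool. The branch in which ‘NI’, ‘N’ and ‘PN’ for 5.1 all lead to ‘Some concerns’ comes from the answer above; the ‘High’ and ‘Low’ branches reflect our reading of the published Domain 5 algorithm and should be checked against the detailed guidance on riskofbias.info.

```python
YES = {"Y", "PY"}   # 'Yes' / 'Probably yes'
NO = {"N", "PN"}    # 'No' / 'Probably no'


def domain5_judgement(sq5_1, sq5_2, sq5_3):
    """Proposed Domain 5 judgement from signalling question answers.

    Answers are 'Y', 'PY', 'PN', 'N' or 'NI'. Illustrative sketch only;
    the High and Low branches are assumptions based on the published algorithm.
    """
    if sq5_2 in YES or sq5_3 in YES:
        # Evidence that the reported result was selected.
        return "High"
    if sq5_1 in YES and sq5_2 in NO and sq5_3 in NO:
        return "Low"
    # Includes 5.1 answered 'NI', 'N' or 'PN' with no evidence of selection.
    return "Some concerns"


for answer_5_1 in ("NI", "N", "PN"):
    print(answer_5_1, "->", domain5_judgement(answer_5_1, "N", "N"))
```

As the answer notes, the choice between ‘NI’ and ‘PN’ for 5.1 does not change the proposed judgement in this sketch; what matters is that the reasoning is recorded.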

Signalling question 5.1: Can we answer ‘PY’ to this question if we can infer that there was no bias, even if there was no plan available? For example, if we have a study that reports the outcome in a standard way, as described in the methods of an article, but there is no statistical analysis plan?

Answer: You could answer ‘PY’ if this is a standard way that you would expect in this clinical area, and you have no concerns regarding selective reporting. If it is obvious what the protocol/SAP was very likely to have said, you can make that call (as ‘Probably Yes’) even if you don’t have access to it.

Signalling question 5.1: We have specified that we would like to include a specific type of result from one of our outcomes in our meta-analyses. One of our included studies reports this result in exactly this way. However, the way that this result has been reported is at odds with the SAP for the trial. How should we answer the signalling questions?

Answer: It is important always to focus primarily on whether the result you have may be biased in relation to what you would like to have included in the synthesis from the study. If you have specific criteria for selecting a result, and the perfect result is available to you, then it should not matter what else the authors have done and/or reported and you can assess the domain to be at low risk of bias. We address this in the last paragraph of Section 8.1.1 of the RoB 2 guidance document. In this situation you should answer ‘N’ to questions 5.2 and 5.3. The correct answer to question 5.1 is probably still ‘N’. However, this would be a legitimate situation in which to override the algorithm and call the risk of bias ‘Low’ for this domain.

Signalling Question 5.2: We have specified we are interested in the outcome ‘30% reduction in spasticity’ which is a distinct, well known, often used outcome in the Multiple Sclerosis field. In the protocol for an RCT the trialists state they will collect data on ‘30% reduction in spasticity’ and ‘50% reduction in spasticity’. However, in the write-up of the trial they present data only on ‘30% reduction in spasticity’. How do we answer Signalling Question 5.2? Was the outcome selected from multiple eligible measurements?

Answer: For this question you would answer ‘No’ or ‘Probably no’. That is because the outcome of interest for you (as specified in the systematic review protocol) is ‘30% reduction in spasticity’, and the trialists have provided these data. However, if you had wanted ‘50% reduction in spasticity’, then the answer to Signalling Question 5.2 would be ‘Yes’. The key to answering this question is consideration of what you have specified in your systematic review protocol as your outcome of interest. This is covered in the ‘Elaboration’ for SQ 5.2 in Box 11 of the detailed guidance.

Please see the Table for examples of how to think about bias in Domain 5. This general issue relates to the distinction between quality and bias, which we (reviewers) are used to from tools for assessing trials in systematic reviews. But now we are considering just bias. For bias we need to consider which results we want, which results we are taking and using, and what we should believe. If we get what we want (what we pre-specified in our protocol) then we don't need to worry, and that's what the Table below portrays. With the RoB 2 tool, reviewers need to be much more targeted in specifying which results/outcomes they want, and to assess only those. We are interested in specific results, and in bias in only those results. The trialists may have done some things that are less than ideal, like not reporting a sample size calculation or not reporting some results that we are not interested in. That is interesting, but it is not important for the results of our review, and it is not bias.

Systematic Reviewer wants (a priori)	Trial analyses	Trial reports	Reviewer uses	Is there a risk of bias?
Any measure of depression	HAM-D (P=0.01), Beck (P=0.1), mHAM-D (P=0.2)	HAM-D	HAM-D	Yes
HAM-D only	HAM-D (P=0.01), Beck (P=0.1), mHAM-D (P=0.2)	HAM-D	HAM-D	No
Beck only	HAM-D (P=0.01), Beck (P=0.1), mHAM-D (P=0.2)	HAM-D	nothing	RoB 2 not applicable (bias in M-A: covered by ROB-ME tool)
HAM-D if available, then Beck, then mHAM-D	HAM-D (P=0.01), Beck (P=0.1), mHAM-D (P=0.2)	HAM-D	HAM-D	No
Beck if available, then HAM-D, then mHAM-D	HAM-D (P=0.01), Beck (P=0.1), mHAM-D (P=0.2)	HAM-D	HAM-D	Yes

Table: Assessing selective outcome reporting bias/ Missing Evidence. Table by Julian Higgins, University of Bristol; originally presented during the RoB2 Web Clinic series, 2020.
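
The reasoning behind the Table can be sketched as a small function. This is a simplified, hypothetical illustration of the five rows above, not part of the RoB 2 tool. The function name, its parameters and the rule used for the "any measure" case are assumptions made for this example, and a real assessment will always require judgement.

```python
def selective_reporting_check(preference, analysed, reported, ordered=True):
    """Return (measure_used, verdict) for one trial.

    preference: measures the reviewer pre-specified, in priority order.
    analysed:   measures the trialists analysed.
    reported:   measures the trialists reported.
    ordered:    False when the reviewer accepts any listed measure equally.
    """
    analysed_all = set(analysed) | set(reported)
    eligible_analysed = [m for m in preference if m in analysed_all]
    eligible_reported = [m for m in preference if m in reported]
    if not eligible_reported:
        # Nothing usable was reported: a missing-evidence problem instead.
        return None, "RoB 2 not applicable"
    used = eligible_reported[0]
    if ordered:
        # Risk if a higher-priority measure was analysed but not reported.
        at_risk = used != eligible_analysed[0]
    else:
        # Risk if the report shows only some of the eligible analyses.
        at_risk = len(eligible_analysed) > len(eligible_reported)
    return used, ("Yes" if at_risk else "No")


analysed = ["HAM-D", "Beck", "mHAM-D"]
reported = ["HAM-D"]

rows = {
    "Any measure of depression": (["HAM-D", "Beck", "mHAM-D"], False),
    "HAM-D only": (["HAM-D"], True),
    "Beck only": (["Beck"], True),
    "HAM-D, then Beck, then mHAM-D": (["HAM-D", "Beck", "mHAM-D"], True),
    "Beck, then HAM-D, then mHAM-D": (["Beck", "HAM-D", "mHAM-D"], True),
}
for label, (preference, ordered) in rows.items():
    print(label, "->", selective_reporting_check(preference, analysed, reported, ordered))
```

The sketch makes the same point as the Table: the verdict depends on what the reviewer pre-specified, not only on what the trialists analysed and reported.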