What is an IPD meta-analysis?
Is an IPD meta-analysis a systematic review?
What are the advantages?
What are the disadvantages?
When is the collection of IPD appropriate?
Are IPD reviews used in all areas of healthcare?
Are IPD more advantageous for certain disease areas?
How are IPD meta-analyses organised?
How long does it take to do an IPD review?
What skills are required?
How much does it cost?
Do I need to write a protocol?
How do I invite trialists to collaborate?
What data should I ask the authors to provide?
What format should I ask for data to be in?
What if the IPD are not computerised?
Can you get IPD from unpublished trials?
How should I store the data: do I need special systems [software/hardware]?
What data checking should be done?
What if my analysis of a trial is different to the published result?
What software can I use to analyse IPD?
How do I include an IPD meta-analysis in RevMan?
How do I include subgroup analyses by participant characteristics derived from IPD in RevMan?
Can we use IPD to explore interactions involving treatment effect, such as interactions with time or with individual baseline characteristics?
What if IPD is only available from some trials?
Do I need to hold a collaborators’ meeting?
Why are IPD reviews often published in group names?
Are there issues concerning participant consent?
Do you need consent to use data for research other than review questions e.g. methodological research?
How does the IPD approach compare to extracting data from publications?
How does the IPD approach compare to collecting tabular summary data?
An IPD (individual participant data) meta-analysis involves the central collection, validation and re-analysis of “raw” data from all clinical trials worldwide that have addressed a common research question. The data are obtained from those responsible for the original trials.
Yes, a good quality IPD meta-analysis should be done in the context of a systematic review. The overall philosophy is the same as for other types of well-designed and well-conducted systematic review and the same basic methods should apply. The methodology should differ only in terms of the organisational structure, data collection and analysis.
However, be aware that not all meta-analyses that use IPD are part of a systematic review. There are some reported in the literature that simply pool IPD from trials in an ad hoc way.
There are advantages to be gained both from the nature of the data itself, and from the processes involved in reviewing evidence as part of an international multi-disciplinary team. The main reasons given by the various groups that have adopted the IPD approach relate mostly to improving the quality of the data and/or the quality of the analysis. Data quality can be improved through the inclusion of all trials and all randomised participants and through detailed checking. Participant level data also allow more comprehensive and appropriate analyses such as time-to-event and subgroup analyses. The collaborative nature of the projects may help achieve a more global and balanced interpretation of the meta-analysis results as well as providing a basis for future collaborations on primary research.
IPD reviews are likely to be more time consuming, because of the time and effort required to establish collaboration, negotiate the provision of data, collect and check data and organise a collaborators’ meeting. These same tasks require a greater range of skills than other forms of systematic review. However, technological progress has enabled some of the labour-intensive aspects of IPD meta-analysis to be done more easily, quickly and cheaply than in the past. In current projects the vast majority of data is sent in electronic format, often by e-mail, which reduces both the time taken to transfer data and the effort involved in assembling the meta-analysis database. Furthermore, it is easier to follow through any queries regarding the data by e-mail. Software advances have also meant that data is now much more easily transferred between different types of database package, such that the format in which data is supplied is seldom a problem. Hosting and funding a collaborators’ meeting is not usually a feature of other types of systematic review and adds to the cost of IPD reviews. Lack of commercially available software to carry out the full range of analyses and plotting is an additional barrier.
Whenever a systematic review that requires meta-analysis is appropriate, then an IPD approach can be considered.
Deciding whether to adopt the IPD approach in a particular situation will depend on the methodological factors that are likely to influence the results, together with any resource and time constraints. In any systematic review an active decision about which is the most appropriate approach should be made at the outset.
For some questions, the need to carry out time-to-event analyses, the potential value of participant subgroup analyses or the need to combine data that has been recorded in different formats, for example, will give a strong indication that an IPD meta-analysis would be the best way forward. In other cases where there are no such requirements, the IPD approach is less likely to be helpful and would probably not be cost-effective. This applies, for example, where the trials are comprehensively reported, outcomes are simple and well defined, planned analyses are univariate or simple, or participant characteristics are not of particular interest.
In medicine, IPD meta-analyses have an established history in cardiovascular disease and cancer, where the methodology has been developing steadily since the late 1980s. In cancer, for example, there are currently roughly 50 or more IPD meta-analyses of screening and treatment across a wide range of solid tumour sites and haematological malignancies. More recently, IPD has also been used in systematic reviews in a number of other fields including Alzheimer’s disease, dyspepsia, epilepsy, malaria, HIV infection, hernia and perinatal medicine.
IPD is particularly important for chronic and other diseases where treatment effects may depend on the length of follow-up. This is especially the case where there are risks and benefits that vary differently over time, for example surgical treatments with a high short-term risk but a long-term benefit.
They are usually carried out as collaborative projects whereby all trial investigators contributing information from their studies, together with those managing the project, become part of an active collaboration. The projects are usually managed on a day-to-day basis by a small local organising group or secretariat, which may be aided in important and strategic decision making by a larger advisory group. Such an advisory group would usually comprise clinical and possibly statistical or methodological experts relevant to the question addressed in the meta-analysis. The secretariat will usually also organise a meeting of all collaborators, to bring them together to discuss the preliminary results. All publications and presentations are usually made in the name of the collaborative group.
Ultimately this will depend on the resources available and the size and complexity of the question being addressed. However, it typically takes 2-3 years from initial development of the protocol through to final publication of the results. Usually there will be an intensive developmental phase, a less intensive and more protracted data collection phase, and a more intensive phase around the time of the analysis and collaborators’ meeting. Therefore, it is possible for a group with the required range of skills to run several such projects concurrently.
A range of skills is required to carry out a good quality IPD meta-analysis. These include clinical expertise on the question posed and methodological expertise including knowledge of the IPD process. It is possible that individuals, acting in an advisory capacity, can provide some expertise. However, it is essential that the individual or team carrying out the project on a day-to-day basis include administrative, data handling, computing, statistical and scientific research skills. Perhaps most vital of all is that the team are excellent communicators.
It is very difficult to give a precise figure as costs vary greatly between projects, depending on size and complexity. The largest cost is the staff time needed to manage the project, but a budget is also required to cover the following costs:
Protocol production (> 1 per trial)
Travel to discuss project with reluctant trialists or to aid data retrieval
Secretariat/Advisory Group meetings
- Conference room hire / facilities
- Conference folders / packs, copies of results etc.
- Meals and refreshments
- Travel costs for trialists
- Reprints of publication (several copies for each collaborator).
A paper published in 1995 (Stewart & Clarke, Statistics in Medicine) estimated costs at about £1000 (sterling) per trial or £5-£10 per participant included in the IPD meta-analysis, excluding the costs associated with a collaborators’ meeting. It acknowledged, however, that these figures were derived retrospectively, and so are very approximate and likely to vary greatly.
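As a rough illustration of how the 1995 figures might be applied, the sketch below turns them into ballpark totals for a planned project. The function name and the example project size are hypothetical, the figures are the retrospective 1995 estimates quoted above (not adjusted for inflation), and the collaborators’ meeting is excluded.

```python
def rough_cost_estimates(n_trials, n_participants):
    """Ballpark costs in pounds sterling, using the 1995 figures only.

    Hypothetical helper: it applies the per-trial figure (about 1000) and
    the per-participant range (5 to 10) separately so they can be compared.
    """
    return {
        "per_trial_basis": 1000 * n_trials,
        "per_participant_low": 5 * n_participants,
        "per_participant_high": 10 * n_participants,
    }


# Example: an imaginary project with 20 trials and 3000 participants.
estimates = rough_cost_estimates(20, 3000)
print(estimates)
```

The two bases will not generally agree, which is consistent with the caveat that these figures are very approximate.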
Yes, a detailed protocol should be prepared, setting out the specific questions to be addressed, trial inclusion and exclusion criteria, the methodological approach to be taken and analyses that are planned.
Initially, trialists are probably best approached by a letter inviting them to participate. This should summarise the background, the aims, the trial(s) you are interested in, the data you wish to collect and perhaps statements about data confidentiality and your policy for publishing the results. The IPD meta-analysis protocol should be included with the invitation and it is often useful to include a reply form(s) to obtain more detailed information on what data they are willing and able to provide.
At the very least, the following data are required along with a clear list of the coding system and variable definitions used. This is a very general guide that should be thought through as the data required will be different in different situations:
Date of randomisation
Date of event (if time to event outcome) or time taken to achieve the event
Censoring variable (if time to event outcome)
Covariate data of interest / Baseline measurements (e.g. stage of disease, type of epilepsy, age of participant, number of seizures at entry)
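One possible way to hold the minimum data listed above is sketched below. This is only an illustration: the field names, the trial, participant and treatment arm identifiers, and the use of the date of last follow-up for censored participants are assumptions that are not part of the list, and the covariates shown are just the examples mentioned above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ParticipantRecord:
    trial_id: str                  # assumed identifier, not in the list above
    participant_id: str            # assumed identifier, ideally anonymised
    treatment_arm: str             # assumed field: allocated treatment
    date_randomised: date
    # Date of event, or (assumption) date last known outcome-free if censored.
    date_event_or_last_followup: Optional[date] = None
    # Censoring variable: True if the event was observed, False if censored.
    event_observed: Optional[bool] = None
    # Baseline covariates, e.g. stage of disease or age of participant.
    covariates: dict = field(default_factory=dict)

    def days_from_randomisation(self) -> Optional[int]:
        """Time to event or censoring in days, if the end date is known."""
        if self.date_event_or_last_followup is None:
            return None
        return (self.date_event_or_last_followup - self.date_randomised).days


record = ParticipantRecord(
    trial_id="T1", participant_id="001", treatment_arm="control",
    date_randomised=date(2001, 1, 1),
    date_event_or_last_followup=date(2001, 1, 31),
    event_observed=True, covariates={"age": 54},
)
print(record.days_from_randomisation())
```

Whatever structure is used, it should be accompanied by the coding list and variable definitions supplied by the trialist.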
IPD from each trial should be requested in a format that is easiest for you to access and manage. The most appropriate format is likely to depend on the software that will be used for data management, data checking and analysis of the IPD. It is usually beneficial to provide trialists with a document setting out the coding that you would prefer for each variable collected. Unfortunately, it is unlikely that IPD will arrive in the requested format from every trial. For example, it may not be possible for some software packages to store the data in such a format or perhaps the trialist does not have the facility required to translate the data. You should always be prepared to accept data in another format and make this suggestion clear to the trialist. Although this can mean additional work, IPD in some format is more useful than no IPD at all! This should be emphasised, as some trialists may be unenthusiastic if additional work is required to reformat the data.
If IPD are not already computerised, the trialist may be willing to provide a copy of the data in paper format. The data required should then be entered into an appropriate data management package and double checked by a second person. This process may be lengthy and could be expensive.
The process of requesting and collecting IPD is still applicable to unpublished trials. However, obtaining IPD from recent unpublished trials is potentially less successful than obtaining IPD from published trials as the trialists may want to publish the results themselves first. Older trials may not have been published due to problems with the trial itself and the trialists may therefore be reluctant to part with the data. On the other hand some trialists welcome the opportunity for data from their (usually older) trials to enter the public domain through the meta-analysis and, in doing so, for them to gain a publication from the trial.
Data can be effectively managed using standard database management software such as Access or FoxPro, or indeed in flexible statistical software such as SPSS, SAS or Stata. You are probably best to choose a system that you are familiar with. Checking the data involves descriptive, graphical and statistical analysis and so generally requires statistical software such as SPSS, SAS or Stata. RevMan is not suitable for this purpose.
The main aims of data checking procedures are to ensure the accuracy of data, integrity of randomisation and completeness of follow up. For any one trial the results of all the data checks should be considered together to build up an overall picture of that trial and any associated problems. Where there are concerns about the data supplied, these should be brought to the attention of the trialist and sympathetic efforts made to resolve them.
Range and consistency checks should be carried out for all data irrespective of whether they were supplied electronically or were entered manually into the meta-analysis database (in which case it is important to audit the data entry process). Any missing data, obvious errors, inconsistencies between variables or extreme values should be queried and rectified as necessary. If details of the trial have been published, these also should be checked against the raw data and any inconsistencies similarly queried.
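A minimal sketch of range and consistency checks of this kind is given below. The field names (`age`, `date_randomised`, `date_last_seen`) and the plausible age range are hypothetical; in practice the checks would be driven by the variables and coding agreed in the protocol.

```python
from datetime import date


def check_record(record, age_range=(0, 110)):
    """Return a list of queries to raise with the trialist for one record.

    `record` is a dict with the hypothetical keys 'age',
    'date_randomised' and 'date_last_seen'.
    """
    queries = []

    # Missing data and range check on a baseline variable.
    age = record.get("age")
    if age is None:
        queries.append("age missing")
    elif not (age_range[0] <= age <= age_range[1]):
        queries.append(f"age {age} outside expected range")

    # Missing dates.
    randomised = record.get("date_randomised")
    last_seen = record.get("date_last_seen")
    if randomised is None:
        queries.append("date_randomised missing")
    if last_seen is None:
        queries.append("date_last_seen missing")

    # Consistency between variables: follow-up cannot precede randomisation.
    if randomised is not None and last_seen is not None and last_seen < randomised:
        queries.append("date_last_seen precedes date_randomised")

    return queries


suspect = {"age": 154, "date_randomised": date(2002, 5, 1),
           "date_last_seen": date(2001, 5, 1)}
for query in check_record(suspect):
    print(query)
```

The function only flags problems; any correction should follow discussion with the trialist rather than being applied automatically.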
To check the validity of the treatment assignment process, the distribution of participant-related variables can be checked for balance across treatment arms and across major baseline characteristics. It is, however, important to remember that imbalances may occur by chance alone, especially for non-stratified variables and when trials are small. Other things that can be done include checking that the weekday of randomisation fits the expected pattern. For example, for UK cancer trials we would expect very few randomisations at the weekend. The pattern of randomisation can also be checked by producing simple plots of cumulative accrual. For example, in a trial with 1:1 assignment we would expect the numbers allocated to each treatment to be close throughout and the accrual curves to cross frequently. Where survival (or another time-dependent variable) is the primary outcome it may be important to check that trial follow up is as up-to-date as possible and that it is balanced across treatment arms. Balance can be checked by selecting all participants who are outcome-free and using the date of censoring as the event to carry out a "reverse survival" analysis. This produces censoring curves, which should be the same for all arms of the trial. Any imbalance should be brought to the attention of the trialist and updated information should be sought.
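Two of these checks can be sketched in a few lines. The first function tallies randomisations by weekday. The second computes a censoring curve using the common "reverse Kaplan-Meier" convention, in which censoring is treated as the event and participants who had the outcome simply leave the risk set at their event time. That convention is a substitute for, not an exact copy of, the selection of outcome-free participants described above, and all function names are hypothetical.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_counts(randomisation_dates):
    """Number of randomisations on each day of the week."""
    counts = Counter(WEEKDAYS[d.weekday()] for d in randomisation_dates)
    return {day: counts.get(day, 0) for day in WEEKDAYS}


def censoring_curve(times, event_observed):
    """Reverse Kaplan-Meier: proportion still under follow-up over time.

    times: follow-up time for each participant.
    event_observed: True if the outcome occurred, False if censored.
    Returns (time, proportion) pairs at each time where censoring occurs.
    """
    data = sorted(zip(times, event_observed))
    at_risk = len(data)
    proportion = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_at_t = 0
        n_censored = 0
        # Group all participants whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_at_t += 1
            if not data[i][1]:
                n_censored += 1
            i += 1
        if n_censored:
            proportion *= 1 - n_censored / at_risk
            curve.append((t, proportion))
        at_risk -= n_at_t
    return curve


print(weekday_counts([date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 6)]))
print(censoring_curve([1, 2, 3, 4], [True, False, True, False]))
```

Running `censoring_curve` separately for each treatment arm and plotting the results gives the curves to compare; markedly different curves would suggest unequal follow-up.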
All of the changes made to the data originally supplied by the trialists, and the reasons for these changes, should be recorded. As a final stage of checking, each trial should be analysed individually and the trialist sent a copy of the analyses and tables of participant characteristics, together with a printout of their data as included in the meta-analysis database. This allows the trialist to verify that the data being used from their trial are indeed correct.
Any disagreement between the IPD result and the original published result should be investigated. One explanation could be that further data were collected for some participants after the original trial results were published. Alternatively, the original published analyses may be based on fewer participants because exclusions were made for some reason. Such discrepancies should always be followed-up with the trialists.
There is no commercially available software that provides the full range of analyses and plotting that are desirable for IPD meta-analyses. Most groups have therefore developed their own applications to analyse the results and produce the graphical output, usually by customising existing packages such as SAS. RevMan is not able to analyse the data on each individual participant.
The IPD project should be set up in RevMan in the same way as for any other systematic review. However, you cannot use RevMan to analyse the data on each individual participant. It has, however, two options for dealing with summary statistics derived from IPD. To use either requires prior processing and analysis of the IPD to provide the appropriate trial level summary statistics that can then be entered to RevMan.
The individual participant data outcome was historically developed to allow inclusion of cancer reviews and so is geared toward time-to-event analyses and the calculation of the appropriate effect measure, the hazard ratio (HR). You need to enter the log rank o-e and variance for each trial. Although you also enter the number of participants and events, these are for displaying on the plot and not for calculation. Note though, that when you have entered this data you need to select the “SAVE” button. This will ensure that RevMan uses the o-e and variance data that you have entered to generate individual trial and pooled HRs. Do NOT use the “CALCULATE” button! This will instead produce calculations of odds ratios, which are not appropriate for time-to-event data.
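For orientation, the standard fixed-effect (Peto) calculation that turns the log rank O-E and variance into a hazard ratio can be sketched as below. This shows the usual formulae, log HR = (O-E)/V with standard error 1/sqrt(V), and pooling by summing O-E and V across trials; that RevMan performs exactly this calculation internally is an assumption, and the function names and example figures are hypothetical.

```python
import math


def hazard_ratio(o_minus_e, variance, z=1.96):
    """Hazard ratio and 95% confidence interval from log rank O-E and V."""
    log_hr = o_minus_e / variance
    se = 1 / math.sqrt(variance)
    return (
        math.exp(log_hr),
        math.exp(log_hr - z * se),
        math.exp(log_hr + z * se),
    )


def pooled_hazard_ratio(trials):
    """Pooled (stratified by trial) hazard ratio.

    trials: list of (o_minus_e, variance) pairs, one per trial.
    """
    total_o_minus_e = sum(oe for oe, _ in trials)
    total_variance = sum(v for _, v in trials)
    return hazard_ratio(total_o_minus_e, total_variance)


# Two imaginary trials, each with fewer events than expected on treatment.
hr, lower, upper = pooled_hazard_ratio([(-5.0, 10.0), (-3.0, 10.0)])
print(round(hr, 3), round(lower, 3), round(upper, 3))
```

A negative O-E for the treatment arm gives a hazard ratio below 1, that is, fewer events than expected on treatment.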
The newer generic inverse variance outcome type is more flexible in that it allows for a whole range of outcome types (binary, continuous, time-to-event) derived from summary data or IPD. In this case you need to indicate the type of effect measure and enter the estimate of the effect and the standard error of this estimate for each trial. Again you can enter the number of participants and events so that they are displayed on the plot.
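The fixed-effect inverse variance calculation behind this outcome type can be sketched as follows: each trial estimate is weighted by the reciprocal of its squared standard error. The function name and example values are hypothetical, and for ratio measures (odds ratio, hazard ratio) the estimates are assumed to be entered on the log scale.

```python
import math


def inverse_variance_pool(estimates, standard_errors):
    """Fixed-effect pooled estimate and its standard error.

    estimates: per-trial effect estimates (log scale for ratio measures).
    standard_errors: the corresponding standard errors.
    """
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se


# Imaginary log hazard ratios from three trials.
pooled, pooled_se = inverse_variance_pool([-0.2, -0.4, -0.1], [0.2, 0.3, 0.25])
print(round(math.exp(pooled), 3), round(pooled_se, 3))
```

The per-trial estimates and standard errors themselves have to come from prior analysis of the IPD, for example from a regression model fitted within each trial.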
Once again, prior processing of the IPD to provide the appropriate summary statistics for each participant subgroup (stratified by trial) is necessary. Then, you can use either the IPD or generic inverse variance outcome type. The next step is fiddly in that you essentially need to treat each subgroup of participants as though it were a study. Therefore, in References to studies, Included studies create a ‘study’ for each subgroup of each covariate e.g. age <50 and age >50 for covariate age, but leave the citations empty. Then in Comparisons and data set up a comparison and call it e.g. subgroup analyses. Add the IPD or generic inverse variance outcome type and label appropriately e.g. survival by age. Under survival by age, add ‘studies’ (i.e. age <50 and age >50). Then, enter the summary statistics for each ‘study’ (i.e. age group) as described in the answer to “How do I include an IPD meta-analysis in RevMan?”. The subgroup analyses for each covariate will then be displayed on an individual forest plot. N.B. When adding the outcome, you can turn the sub-totals and totals off on the statistical tab, if you wish.
If you want all the subgroup analyses for a particular outcome to appear on the same plot then you need to treat all the covariates as though they were sub-categories. Create ‘studies’ for each subgroup of each covariate as before. Then in Comparisons and data set up a comparison and call it e.g. subgroup analyses. Add the IPD or generic inverse variance outcome type and label appropriately e.g. survival. Under survival, add a sub-category age and under age add ‘studies’ (i.e. age <50 and age >50). Then, enter the summary statistics for each ‘study’ (i.e. age group) as described in the answer to “How do I include an IPD meta-analysis in RevMan?”. N.B. As the overall total from all subgroup analyses will be meaningless, it is strongly advised that you turn this off when adding the outcome.
A particular problem of using RevMan in this way is that all the participant subgroups will appear in the table of included studies. However, because the table is in alphabetical order according to the study label, you can at least ensure that the subgroups are listed at the bottom of the table, by using a prefix from near the end of the alphabet for each subgroup label. For example, for age: xage <50, xage >50 and for sex: xmale, xfemale.
Can we use IPD to explore interactions involving treatment effect, such as interactions with time or with individual baseline characteristics?
The power to detect an interaction involving treatment effect is typically low in a single RCT. One advantage of meta-analyses involving IPD is that the power to detect interactions is usually greater. It is possible to perform analyses stratified by some factor and to calculate the statistics required to test for the presence of an interaction. However, the usual problems of defining characteristics of clinical importance and the potential for ‘data dredging’ are still highly applicable in the meta-analysis situation. Interactions of potential clinical importance should be described and justified in the protocol. The assumption of constant relative treatment effect with time (the assumption of proportional hazards with time-to-event data) can be examined most reliably if IPD are available.
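One simple way to test for an interaction between treatment effect and a two-level participant characteristic is to compare the pooled effect estimates for the two subgroups with a z-test on their difference. The sketch below assumes that each subgroup estimate has already been pooled across trials (stratified by trial) and is on the log scale for ratio measures. The function name and example values are hypothetical, and this is only one of several possible interaction tests.

```python
import math


def interaction_test(estimate_a, se_a, estimate_b, se_b):
    """z-test for a difference in treatment effect between two subgroups.

    Returns (difference, z statistic, two-sided p-value), treating the two
    subgroup estimates as independent and approximately normal.
    """
    difference = estimate_a - estimate_b
    se_difference = math.sqrt(se_a ** 2 + se_b ** 2)
    z = difference / se_difference
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return difference, z, p_value


# Imaginary log hazard ratios for participants aged <50 and >50.
difference, z, p_value = interaction_test(-0.5, 0.1, -0.1, 0.1)
print(round(difference, 2), round(z, 2), round(p_value, 4))
```

A small p-value suggests the treatment effect differs between the subgroups, but it should be interpreted only for interactions specified in advance in the protocol.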
If IPD are only available from some trials, attempts should be made to extract appropriate aggregate data from those trials without IPD. These data can then be used to check whether the results from the unavailable trials differ from those that provided IPD. If sufficient aggregate data are not available and authors are not willing to provide tabulated data, steps should be taken to check whether those trials without IPD have similar characteristics to those for which IPD are available. For example, are the trials similar with respect to quality components and important clinical characteristics? Provided the proportion of participants in the eligible trials without IPD is not too large, and if characteristics are similar, then the meta-analysis results using IPD are likely to be representative of all eligible participants. The potential for bias should be discussed when interpreting the meta-analysis results. Reasons for lack of IPD should be given wherever known.
A collaborators’ meeting has been an integral and important feature of many of the IPD meta-analyses that have been completed to date. These meetings are where the results of IPD meta-analyses are first presented and discussed, and at which the trialists play a full and active role. Such meetings can act as an incentive to collaborate at the early stages of the project, serve to re-emphasise the collaborative nature of the project, provide valuable clinical input and give the whole group a chance to discuss or challenge the analyses. They also provide a great opportunity to develop new and collaborative projects and give the secretariat and trialists a deadline to work to. As well as these scientific functions, the social aspects of these meetings should not be underestimated in generating enthusiasm and commitment to the project.
The group name on publications is usually a banner for the whole collaborative group, with members of the group and participating organisations listed in an appendix to the text. This acknowledges the contribution of all the collaborators in the project.
In most cases participants will not have specifically consented to inclusion in the meta-analysis. However, as the meta-analysis is posing the same research question as, and is essentially updating the trial they did consent to, the usual view is that separate consent is not required. However, it is advisable that data received are anonymised.
Do you need consent to use data for research other than review questions e.g. methodological research?
If the IPD are to be used to address additional research questions, it is advisable to send a separate document outlining the proposed research project with a request for permission to use the data in this way.
IPD meta-analyses require the collaboration of trialists, are generally a great deal more work and take longer. However, this approach increases the reliability of results, and allows a much greater variety of analyses to be done. The interaction with the trialists helps in the identification of all relevant trials, improves the quality of information on each trial, and can provide additional follow-up.
Collecting tabular data from trialists gives some of the advantages of the IPD approach in terms of improved trial identification, potentially improved data and analyses, as well as the opportunity to collect information from unpublished trials. However, tabular data do not afford the same variety and flexibility of analyses as IPD (this may not always be important), and the scope for data checking is limited. Furthermore, it is often simpler for trialists to supply their whole database than a large number of special tabulations. There are unlikely to be major time or resource differences between these two approaches (other than those relating to the collaborators’ meeting).