MS-DIAL provides a pipeline for untargeted metabolomics. Here we will discuss the process for generating intensity height tables from raw LC-MS data, and subsequent export for downstream analysis and secondary annotation methods.
As of MS-DIAL version 4.70, it is no longer a requirement to convert your .RAW LC-MS files into .ABF format, however if you are using an older version of MS-DIAL, you will need to download the ABF file converter.
- ABF File Converter (depending on MS-DIAL version).
- MS-DIAL software for Windows.
- MassBank MS/MS positive and negative database files.
- The LC-MS files all present in a single folder.
- The LC-MS standards file generated during data collection (should contain columns for: metabolite name, m/z, and retention time).
The instructions given here are for MS-DIAL version 5. If you are using version 4, please see the legacy instructions provided here.
Begin by opening MS-DIAL and starting a new project using the icon in the centre of the screen.
MS-DIAL will prompt you to enter a Project title
and a Project file path
. Please remember that the project file path must contain all of your raw LCMS files – you can navigate to this folder using the 'Browse'
button.
It is recommended that you change the default .mdproject
file name generated by MS-DIAL to something more easily recognisable in the future,
such as date_sampletype_ionisationmode.mtd
.
The next screen will ask you to define your 'Analysis file paths'
. Click the browse button, and select all of your LC-MS files.
Note that these must be located in the same folder as your project in order to be valid.
Once you have done this, you will need to change the 'Type'
column to indicate whether that sample is a sample, blank, QC, or standard.
It is also recommended, for quick analysis later on, to change the values in the 'Class ID'
column.
For example, in the 'Class ID'
column below, it is indicated what the sample type is (if you have multiple groups, you can specify these here),
and of note, while MS2
samples are given a 'Type'
of QC
, you can specify that they are their own class in this column.
If you were provided with information about the order in which your samples were actually run, you can input this in the Analytical order
column. This will allow retention time correction. However, as the majority of our annotations will be provided by the Human Metabolome Database (HMDB), which does not take retention time into account, this step will be less important downstream. One benefit however will be for the alignment of peaks in the event you have significant retention time drift over the course of data acquisition.
Once you have finished, click 'Next'
to continue.
The next screen will ask you to select various parameters regarding how your data was acquired.
Firstly, you can define a project name. It is recommended that you change the default .mddata
file name generated by MS-DIAL to something more easily recognisable in the future,
such as \date_sampletype_ionisationmode.mtd
.
For our data, most of the default options will be appropriate.
- Data type for both
MS1
andMS2
should be set toProfile data
for our data, acquired using a Thermo Fisher Scientific LCMS apparatus. - Ensure you choose the correct
'Ion mode'
at the bottom left, either positive or negative ionisation. and that you select the correct'Target omics'
from either metabolomics or lipidomics.
Click Next
once complete.
If you have a parameter configuration file, you can load it in via the Load parameter
button in the bottom-left of the window.
We begin the setting of analysis parameters by inputting the data collection parameters.
Mass Accuracy:
Metabolomics | Lipidomics | |
---|---|---|
MS1 tolerance | 0.002 Da | 0.002 Da |
MS2 tolerance | 0.002 Da | 0.002 Da |
Data collection parameters:
Metabolomics | Lipidomics | |
---|---|---|
Retention time begin | 0 min | 0 min |
Retention time end | 40 min | 40 min |
MS1 mass range begin | 50 Da | 200 Da |
MS1 mass range end | 1000 Da | 1300 Da |
MS/MS mass range begin | 50 Da | 200 Da |
MS/MS mass range end | 1000 Da | 1300 Da |
Execute retention time correction | FALSE |
You can choose here to perform retention time correction. This should be set to FALSE otherwise, and set to FALSE if you do not have information about the analytical order of your samples.
Isotope recognition:
Metabolomics | Lipidomics | |
---|---|---|
Maximum charged number | 2 | 2 |
Consider Cl and Br elements | FALSE | FALSE |
Multithreading (will depend on your machine):
Metabolomics | Lipidomics | |
---|---|---|
Number of threads | 8 | 8 |
Execute retention time corrections | FALSE | FALSE |
Next we select the minimum peak height threshold. Peaks below this threshold will not be retained.
A value of 100,000
is recommended for data acquired by Thermo Fisher Scientific machines.
However, this will vary by apparatus, and may require data-dependant tuning.
We will leave the 'Mass slice width'
value to the default, along with all options in the drop-down 'Advanced'
menu.
Metabolomics | Lipidomics | |
---|---|---|
Minimum peak height | 100000 | 100000 |
Mass slice width | 0.05 Da | 0.05 Da |
The default values are suitable.
Metabolomics | Lipidomics | |
---|---|---|
Sigma window value | 0.5 | 0.5 |
MS/MS abundance cut off | 0 | 0 |
Exclude after precursor ion | TRUE | TRUE |
Keep the isotopic ions until | 5 Da | 5 Da |
Keep the isotopic ions w/o MS2Dec | FALSE | FALSE |
In this tab, you will add your databases as appropriate. One of these will be the appropriate .msp
MassBank database file (either positive or negative depending on the ionisation mode you are currently running).
Click the green tick next the Database setting
heading. In the DataBase
box, select Msp
as your database type and then navigate and select the appropriate MassBank file using the Browse
button. The Database name
and other values will be automatically populated.
We will now adjust a number of parameters (the lipidomics columns are left blank below as you will not need to input or change anything).
Keep in mind that the Retention time tolerance
parameter can be adjusted in a project-specific manner. Using a value of about 2 minutes has worked well in the past, but may require fine-tuning. Setting the value lower will result in a greater number of highly similar features, and may result in issues downstream during manual curation steps.
MS/MS identification setting:
Metabolomics | Lipidomics | |
---|---|---|
Accurate mass tolerance (MS1) | 0.002 Da | |
Accurate mass tolerance (MS2) | 0.002 Da | |
Retention time tolerance | 2 min |
We will leave the other drop-down menus with their default settings.
In addition, you can select a text file containing the standards run during data acquisition.
This file should contain three columns, including the metabolite name, m/z, and retention time (in that order).
These columns should be named Metabolite
, MZ
, and RT
.
Click the green tick again to add another database. This time, choose Text
as your database type. Navigate and select your standards file, and the other fields will be populated.
Once again, we will change a few parameters:
Tolerance:
Metabolomics | Lipidomics | |
---|---|---|
Accurate mass tolerance (MS1) | 0.002 Da | |
Retention time tolerance | 2 min |
Here we will select the appropriate adduct ion settings for our runs.
Metabolomics Positive | Metabolomics Negative | Lipidomics Positive | Lipidomics Negative |
---|---|---|---|
[M+H]+ | [M-H]- | [M+H]+ | [M-H]- |
[M+NH4]+ | [M+Na-2H]- | [M+NH4]+ | |
[M+Na]+ | [M+Cl]- |
Here, you should rename the 'Result name:'
to something more recognisable.
For example, alignment_data_sampletype_ionisation
.
Set the 'Reference file:'
to your second QC sample. The first QC sample is typically different than the others,
and this will affect your results.
The 'Retention time tolerance:'
parameter will be data-dependant, however a good start is to try either 2 min
or 1 min
.
Metabolomics | Lipidomics | |
---|---|---|
Retention time tolerance | 2 min | 2 min |
MS1 tolerance | 0.002 Da | 0.002 Da |
There are also some other parameters we need to check in the 'Advanced'
drop-down menu.
Because we don't remove the features based on blank information, the following three tick boxes are 'greyed out', and not available to
change. You do not need to change the default options for these.
We will use Gap filling by compulsion
, as it improves peak alignment (often resulting in fewer highly similar features), but you may choose to skip this step. The choice will largely be data-dependent, and with a higher retention time tolerance, you may not see much of a difference.
Metabolomics | Lipidomics | |
---|---|---|
Retention time factor | 0.5 | 0.5 |
MS1 factor | 0.5 | 0.5 |
Peak count filter | 0 % | 0 % |
N% detected in at least one group | 0 % | 0 % |
Remove features based on blank information | FALSE | FALSE |
Sample max / blank average | 5 | 5 |
Keep 'reference matched' metabolite features | FALSE | FALSE |
Keep 'suggested (w/o MS2)' metabolite features | FALSE | FALSE |
Keep removable features and assign the tag | FALSE | FALSE |
Gap filling by compulsion | TRUE | TRUE |
Once you have finished setting up the analysis parameters, click 'Finish'
to begin the run.
When the run finishes, the data will appear on the screen. On the left-hand side of the screen, you will see an 'Alignment navigator'
box.
Double click the file that is inside to load all of your data at once.
The 'Show ion table'
button in the middle of the screen is a good place to start to see how many metabolites/lipids (features) were found during the run.
You can click the 'Metabolite name'
column to group all features that were annotated.
If you want to view more information about a specific feature, you can single click on the row within the ion table list.
At the top of the screen, the default image shown is the 'Bar chart of aligned spot (OH)'
. This will show you the average intensities
of each of your 'Class IDs'
.
To view the actual peak itself, you can click the adjacent 'EIC of aligned spot'
tab to view and assess the quality of the peak.
Good quality peaks, especially for smaller weight features, are tight and well-aligned (see the image below).
Next, if you want to see the individual peaks for each of your samples, you can right-click on the peak window and click the
Peak curation (Sample table)
option. This will show you the peak for that feature for each sample so you can further
assess their quality. This will be necessary for manual curation of features later to confirm confidence in the annotation of significant findings.
After downstream pre-processing, you will come back to MS-DIAL and manually curate each of the features, removing those with poor-quality spectra. It should be noted that without manual curation, your ordination plots (e.g. PCoA) will be affected by remaining poor quality features.
There are a few useful things we can routinely export:
- The raw data height matrix (contains all of our intensity values and feature information).
- Analysis parameters (to streamline future analysis set-up).
For consistency and easy import into R, best practice is to save the height matrix output in .csv
format.
To export the raw height matrix table, navigate to the 'Alignment result'
option within the 'Export'
tab in the top menu.
Select the directory for export and leave all other options to their defaults. Change the Export format
to csv
, and click 'Export'
.
To export the analysis parameters, open the green tab at the top-left of the screen with the new project icon, select Save
, and then select the Save parameter option
.
To pre-process our raw height data within R, we will use tools from the pmp
package.
These steps are explained in depth within our pmp preprocessing guide.
A selection of custom scripts are also available to handle the pre-processing steps for you, with all of the customisation and fine-tuning normally available when running the steps individually.
We can use additional tools to annotate more of our metabolite features than MS-DIAL can achieve alone.
Similarly we can acquire secondary annotation of our MS1 data using the Human Metabolome Database (HMDB). Unlike secondary MS/MS annotation with GNPS, this step occurs following pre-preprocessing within R.
See our guide for annotation of your MS-DIAL data in R using the HMDB database here.
Now that we have a SummarizedExperiment
object, filtered for features with at least one annotation, we need to manually curate the spectra within MS-DIAL. While the pmp
pre-processing pipeline does a good job at filtering your LCMS datasets for the best quality data, it is not able to discern the quality of the spectra directly. It is important to manually curate the data before continuing with downstream analysis, as the poor quality data will affect ordination and statistical tests.
The most efficient method for manual curation is to save the alignment ID, names, ionisation mode, Fill %, and signal/noise ratio information element from your SummarizedExperiment
object, for example (using the function save_curation_table()
):
# Save the curation table
save_curation_table(metab_stool_glog, here::here('data', 'manual_curation', 'stool_curation.csv'))
Now that we have our stool_curation.csv
file, we can import it into Google Sheets and add an additional column named quality_peak
that is filled with tickboxes.
Now, when we return to MS-DIAL, we can check the peaks using the Alignment_ID
values as our guide, and check the tickboxes of those that are accepted.
Once we have completed this task for both the positive and negative ionisation modes, we can save the Google sheet as a .csv
file, and import it into R. Thankfully, it is very easy to filter SummarizedExperiment
objects, and we can do this using our quality_peak
column.
# Load quality data
metab_stool_quality <- read_csv(here::here('data', 'manual_curation', 'stool_curated.csv'))
# Filter the features using the TRUE/FALSE quality_peak column
metab_stool_glog <- metab_stool_glog[metab_stool_quality$quality_peak,]
Once you have reached this point, you should have a SummarizedExperiment
object (complete with sample and feature metadata) that contains only annotated features, with intensity values that have been normalised and transformed, and are ready for downstream analysis in R.
Please visit the downstream_processing
page for tools and a starting point for downstream analysis.
- Copyright (c) 2021 Respiratory Immunology lab, Monash University, Melbourne, Australia.
- MS-DIAL: link
- License: This pipeline is provided under the MIT license (See LICENSE.txt for details)
- Authors: M. Macowan, C. Pattaroni, A. Butler, and G. Iacono.