Loading Spectrum Data

Spectrum data loaders implement the same methods as chromatogram data loaders, plus some additional methods, since more information can be gathered from spectrum data. Fetching raw data with spectrum loaders takes more time because the data is extracted on the fly. Additionally, a TargetedDIAConfig must be specified to define how the peptide signal should be extracted.

Initiating a Spectrum Data Loader

Most Spectrum Loaders require the following inputs.

  1. dataFiles - a list of raw data files

  2. rsltsFile - a .osw or DIA-NN .tsv file containing the features

  3. libraryFile - a .tsv/.osw/.pqp file containing the library (m/z and annotations of all transitions)

We can initiate an MzMLDataLoader object as follows.

[3]:
from massdash.loaders import MzMLDataLoader
loader = MzMLDataLoader(dataFiles="mzml/ionMobilityTest.mzML",
                        rsltsFile=["osw/ionMobilityTest.osw", "diann/ionMobilityTest-diannReport.tsv"])
Initializing valid scores for selection
[2024-09-30 17:29:26,200] MzMLDataAccess - INFO - Opening mzml/ionMobilityTest.mzML file...: Elapsed 0.08319735527038574 ms
[2024-09-30 17:29:26,201] MzMLDataAccess - INFO - There are 50 spectra and 0 chromatograms.
[2024-09-30 17:29:26,202] MzMLDataAccess - INFO - There are 25 MS1 spectra and 25 MS2 spectra.

Note

If only a DIA-NN file is provided, a library must also be provided. If no library file is provided, MassDash will assume the .osw file should also be used as the library. The library is required for determining the extraction coordinates.

For the purposes of this tutorial we will use the OpenSwath results, but this approach works with any properly initiated MzMLDataLoader.

Note

If a .osw file is provided as the rsltsFile and no libraryFile is provided, MassDash will assume the .osw file should also be used as the library.
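
The rule for choosing the library can be restated as a small decision function. The sketch below is plain Python written for illustration: `resolve_library` is a hypothetical helper, not part of MassDash, and detecting the file type from its suffix is an assumption.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_library(rslts_files: Sequence[str], library_file: Optional[str] = None) -> str:
    """Hypothetical helper: pick the library file following the rule in the notes."""
    # An explicitly supplied library always wins.
    if library_file is not None:
        return library_file
    # Otherwise fall back to the first .osw results file, if there is one.
    for path in rslts_files:
        if Path(path).suffix.lower() == ".osw":
            return path
    # Only DIA-NN (.tsv) results were given, so there is nothing to fall back on.
    raise ValueError("A library file is required when only DIA-NN results are provided")


# Results files from this tutorial: the .osw file doubles as the library.
library = resolve_library(["osw/ionMobilityTest.osw", "diann/ionMobilityTest-diannReport.tsv"])
print(library)
```

With only a DIA-NN report and no library argument, this helper raises a ValueError, mirroring the requirement stated in the first note.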

Loading a Transition Group

To fetch the chromatograms for a particular transition group, we can call the loadTransitionGroups() method. In addition to the modified peptide sequence and charge state, this method requires a TargetedDIAConfig, which specifies the extraction parameters. The method loads the transition group across all runs and can take a while, since it fetches the data for every run from disk.

In this example we will visualize the peptide AFVDFLSDEIK with a charge state of 2.

First, we can create a TargetedDIAConfig.

[4]:
from massdash.structs.TargetedDIAConfig import TargetedDIAConfig
extraction_config = TargetedDIAConfig()
extraction_config.im_window = 0.2
extraction_config.rt_window = 50
extraction_config.mz_tol = 20
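
To make the three parameters concrete, the sketch below turns them into extraction bounds around a target. It is an illustration only: `ExtractionWindow` and its methods are hypothetical, and it assumes that mz_tol is a tolerance in ppm applied on each side of the target m/z, and that rt_window and im_window are full window widths centred on the target. The TargetedDIAConfig documentation is the place to confirm the actual conventions.

```python
from dataclasses import dataclass


@dataclass
class ExtractionWindow:
    """Hypothetical stand-in for the extraction parameters (not the MassDash class)."""
    mz_tol: float = 20.0     # assumed: ppm, applied on each side of the target m/z
    rt_window: float = 50.0  # assumed: full width, centred on the target retention time
    im_window: float = 0.2   # assumed: full width, centred on the target ion mobility

    def mz_bounds(self, target_mz: float) -> tuple:
        delta = target_mz * self.mz_tol * 1e-6  # convert ppm to an absolute m/z offset
        return target_mz - delta, target_mz + delta

    def rt_bounds(self, target_rt: float) -> tuple:
        return target_rt - self.rt_window / 2, target_rt + self.rt_window / 2

    def im_bounds(self, target_im: float) -> tuple:
        return target_im - self.im_window / 2, target_im + self.im_window / 2


window = ExtractionWindow()
# Precursor m/z of AFVDFLSDEIK (2+) as reported in this tutorial's output.
mz_low, mz_high = window.mz_bounds(642.3295)
print(f"m/z window: {mz_low:.4f} to {mz_high:.4f}")
```

Under these assumptions, widening any of the three values admits more signal into the extraction at the cost of more interference and longer extraction time.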

Then we can invoke the loadTransitionGroups() method with the target sequence, charge and extraction config. Note: the extraction will always occur

[5]:
transitionGroup = loader.loadTransitionGroups("AFVDFLSDEIK", 2, extraction_config)
transitionGroup
[5]:
TransitionGroupCollection
ionMobilityTest: -------- TransitionGroup --------
precursor data: 1
transition data: 6
data type: Chromatogram
[6]:
type(transitionGroup)
[6]:
massdash.structs.TransitionGroupCollection.TransitionGroupCollection

Like the chromatogram loaders, this method returns a TransitionGroupCollection, a dictionary in which the keys are the run names and the values are TransitionGroup objects. Each TransitionGroup holds a series of chromatograms belonging to the same precursor and can be used for plotting.
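
Because the collection is keyed by run name, per-run results can be looked up or iterated like any dictionary. The sketch below uses a plain dict as a stand-in and assumes the run name is the data file name without its extension, which is consistent with the ionMobilityTest key in the output above; the string values are placeholders, not real TransitionGroup objects.

```python
from pathlib import Path

data_files = ["mzml/ionMobilityTest.mzML"]

# Assumption: the run name is the file name without its extension.
run_names = [Path(f).stem for f in data_files]

# Plain dict standing in for a TransitionGroupCollection (placeholder values).
collection = {run: f"<TransitionGroup for {run}>" for run in run_names}

# Iterate over runs exactly as with any dictionary.
for run, transition_group in collection.items():
    print(run, transition_group)
```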

Loading Chromatogram Data as a Pandas DataFrame

Like chromatogram data loaders, spectrum loaders can load data into a pandas dataframe using the loadTransitionGroupsDf() method.

[7]:
transitionGroupDf = loader.loadTransitionGroupsDf("AFVDFLSDEIK", 2, extraction_config)
transitionGroupDf
[7]:
     run              Annotation  rt           int
0    ionMobilityTest  prec        6225.005106  229.011734
1    ionMobilityTest  prec        6226.792950  26.001631
2    ionMobilityTest  prec        6228.580932  57.999416
3    ionMobilityTest  prec        6230.367189  826.008179
4    ionMobilityTest  prec        6232.156436  1589.015259
...  ...              ...         ...          ...
163  ionMobilityTest  y9^1        6259.292755  4355.988281
164  ionMobilityTest  y9^1        6261.101406  1168.029907
165  ionMobilityTest  y9^1        6262.909095  1286.014038
166  ionMobilityTest  y9^1        6264.711573  413.995209
167  ionMobilityTest  y9^1        6266.515136  1217.012207

168 rows × 4 columns

This dataframe contains the intensities and retention times for all files and all transitions. Transitions can be differentiated by the Annotation column, and the run column identifies the run from which each chromatogram originates. If ion mobility was present in the original file, intensities are summed across all ion mobility values.
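
The summation across ion mobility can be checked by hand. The sketch below takes the two precursor rows at retention time 6225.005106 from the feature map output shown later in this tutorial and sums their intensities per (Annotation, rt) pair. It is a plain-Python illustration of the idea, not the MassDash implementation.

```python
from collections import defaultdict

# (Annotation, rt, im, int) rows copied from the feature map output in this tutorial.
rows = [
    ("prec", 6225.005106, 0.900254, 76.000458),
    ("prec", 6225.005106, 0.969271, 153.011276),
]

# Collapse the ion mobility dimension by summing intensity per (Annotation, rt).
chromatogram = defaultdict(float)
for annotation, rt, _im, intensity in rows:
    chromatogram[(annotation, rt)] += intensity

for (annotation, rt), intensity in chromatogram.items():
    print(annotation, rt, round(intensity, 6))
```

The summed value, 229.011734, agrees with the first row of the dataframe above.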

Note

If a pandas dataframe is required, it is recommended to use the FeatureMap object directly, as described below.

Loading a Feature Map

The primary datatype fetched by a spectrum loader is the FeatureMap, which contains a pandas dataframe of the extracted signal across all precursors and transitions. Under the hood, the loadTransitionGroups() method fetches a FeatureMap and converts it into a TransitionGroup. Because of this conversion step, if a pandas dataframe is required, it is generally faster to work with the FeatureMap directly.

The FeatureMap object can be loaded using the loadFeatureMaps() method as demonstrated below.

[8]:
featureMap = loader.loadFeatureMaps("AFVDFLSDEIK", 2, extraction_config)
print(type(featureMap))
featureMap
<class 'massdash.structs.FeatureMapCollection.FeatureMapCollection'>
[8]:
{'ionMobilityTest': <massdash.structs.FeatureMap.FeatureMap at 0x75219b42d430>}

Similar to the loadTransitionGroups() method, this method returns a FeatureMapCollection in which the keys are the run names and the values are the corresponding FeatureMap objects.

The FeatureMap object has two important properties:

  1. .feature_df property which returns the dataframe

  2. .config property which returns the TargetedDIAConfig that was used to generate this FeatureMap

[9]:
featureMap['ionMobilityTest'].feature_df
[9]:
      native_id  ms_level  precursor_mz  mz           rt           im        int         Annotation  product_mz
0                1         642.3295      642.334187   6225.005106  0.900254  76.000458   prec        642.3295
1                1         642.3295      642.334187   6225.005106  0.969271  153.011276  prec        642.3295
2                2         642.3295      504.262011   6225.110817  0.935281  68.001518   y4^1        504.2664
3                2         642.3295      504.262011   6225.110817  1.025902  41.000328   y4^1        504.2664
4                2         642.3295      504.262011   6225.110817  0.926001  43.000782   y4^1        504.2664
...              ...       ...           ...          ...          ...       ...         ...         ...
6812             2         642.3295      1065.546118  6266.515136  0.975441  8.999968    y9^1        1065.5463
6813             2         642.3295      1065.551224  6266.515136  0.986777  33.001766   y9^1        1065.5463
6814             2         642.3295      1065.551224  6266.515136  0.923945  84.003464   y9^1        1065.5463
6815             2         642.3295      1065.556331  6266.515136  0.910546  63.997871   y9^1        1065.5463
6816             2         642.3295      1065.556331  6266.515136  0.921891  54.000694   y9^1        1065.5463

6817 rows × 9 columns

[10]:
featureMap['ionMobilityTest'].config
[10]:
<massdash.structs.TargetedDIAConfig.TargetedDIAConfig at 0x75217a365bb0>

Converting a FeatureMap to 1D data

A FeatureMap can be difficult to work with due to its high dimensionality. Thus, MassDash has built-in methods to convert a FeatureMap into a Chromatogram (retention time vs intensity), a Spectrum (m/z vs intensity) or, if ion mobility is present, a Mobilogram (ion mobility vs intensity). To accomplish this we can use the to_chromatograms(), to_spectra() and to_mobilograms() methods, respectively.

[11]:
chromatograms = featureMap['ionMobilityTest'].to_chromatograms()
chromatograms
[11]:
<massdash.structs.TransitionGroup.TransitionGroup at 0x752179ff22b0>
[12]:
spectra = featureMap['ionMobilityTest'].to_spectra()
spectra
[12]:
<massdash.structs.TransitionGroup.TransitionGroup at 0x75217a365100>
[13]:
mobilograms = featureMap['ionMobilityTest'].to_mobilograms()
mobilograms
[13]:
<massdash.structs.TransitionGroup.TransitionGroup at 0x75219aaa41f0>

Note

When converting a FeatureMap, a TransitionGroup is always returned; however, the underlying data type differs depending on the conversion method used.
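
Conceptually, each conversion collapses the feature map onto one axis by combining intensity over the other two. The sketch below illustrates this with the first five rows of the feature_df shown above. `project` is a hypothetical helper, and using summation as the aggregation is an assumption about how the 1D traces are built, not a description of the MassDash internals.

```python
from collections import defaultdict

# Columns: Annotation, mz, rt, im, int (first five rows of feature_df above).
rows = [
    ("prec", 642.334187, 6225.005106, 0.900254, 76.000458),
    ("prec", 642.334187, 6225.005106, 0.969271, 153.011276),
    ("y4^1", 504.262011, 6225.110817, 0.935281, 68.001518),
    ("y4^1", 504.262011, 6225.110817, 1.025902, 41.000328),
    ("y4^1", 504.262011, 6225.110817, 0.926001, 43.000782),
]
AXES = {"mz": 1, "rt": 2, "im": 3}  # column position of each axis in a row


def project(rows, axis):
    """Hypothetical helper: sum intensity per (Annotation, axis value)."""
    column = AXES[axis]
    trace = defaultdict(float)
    for row in rows:
        trace[(row[0], row[column])] += row[4]
    return dict(trace)


chromatogram = project(rows, "rt")  # retention time vs intensity
spectrum = project(rows, "mz")      # m/z vs intensity
mobilogram = project(rows, "im")    # ion mobility vs intensity
print(len(chromatogram), len(spectrum), len(mobilogram))
```

The same rows yield a different number of points on each axis, which is why the three conversions return the same container type holding different underlying data.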