Loading Chromatogram Data

A Chromatogram Loader loads raw data from chromatogram files. Because no on-the-fly extraction is needed, loading is faster, at the cost of some flexibility.

Initiating a Chromatogram Loader

All Chromatogram Loaders require the following inputs.

  1. dataFiles - a list of raw data files

  2. rsltsFile - a file containing the features

In the case of a SqMassLoader, the data files must be a list of .sqMass files and the rsltsFile must be a merged .osw file output from pyprophet. These chromatograms are useful to visualize because they are the exact chromatograms that OpenSwath uses for peak picking.

We can initiate a SqMassLoader object with multiple sqMass files as follows.

[3]:
from massdash.loaders import SqMassLoader
loader = SqMassLoader(dataFiles=["xics/test_chrom_1.sqMass", "xics/test_chrom_2.sqMass"],
                      rsltsFile="osw/test_data.osw")

Note

The provided .osw file must contain information for all runs.
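
One way to check this requirement before building the loader is to compare the run names stored in the .osw file with the names of the .sqMass files. The sketch below is not part of massdash: the helpers `run_stem` and `runs_missing_from_osw` are hypothetical, and the code assumes that the .osw file is an SQLite database with a RUN table holding a FILENAME column, and that runs are matched by file name without directories or extensions.

```python
import sqlite3
from pathlib import Path


def run_stem(path):
    """Strip directories and all extensions: 'a/b/run1.mzML.gz' -> 'run1'."""
    return Path(path).name.split(".")[0]


def runs_missing_from_osw(osw_path, data_files):
    """Return the data files whose run name has no match in the .osw file.

    Assumption: the .osw file is an SQLite database with a RUN table that
    has a FILENAME column, and runs are matched by their file stem.
    """
    con = sqlite3.connect(osw_path)
    try:
        rows = con.execute("SELECT FILENAME FROM RUN").fetchall()
    finally:
        con.close()
    osw_runs = {run_stem(filename) for (filename,) in rows}
    return [f for f in data_files if run_stem(f) not in osw_runs]


# Example call (paths taken from the cell above):
# runs_missing_from_osw("osw/test_data.osw",
#                       ["xics/test_chrom_1.sqMass", "xics/test_chrom_2.sqMass"])
```

An empty list would suggest that every .sqMass file has a matching run in the .osw file, under the matching rule assumed here.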

Printing the loader object shows the file paths of all the linked files.

[4]:
loader
[4]:
SqMassLoader(rsltsFile=osw/test_data.osw, dataFiles=['xics/test_chrom_1.sqMass', 'xics/test_chrom_2.sqMass']

Loading a Transition Group

To fetch the chromatograms for a particular transition group, call the loadTransitionGroups() method. It requires a modified peptide sequence and a charge state, and it loads the transition groups across all runs. The call can take a while because it fetches the data for all runs from disk.

In this example we will load the peptide NKESPT(UniMod:21)KAIVR(UniMod:267) with a charge state of 3.

[16]:
loader.loadTransitionGroups("NKESPT(UniMod:21)KAIVR(UniMod:267)", 3)
[16]:
TransitionGroupCollection
test_chrom_1: -------- TransitionGroup --------
precursor data: 1
transition data: 6
data type: Chromatogram
test_chrom_2: -------- TransitionGroup --------
precursor data: 1
transition data: 6
data type: Chromatogram

Here we have a TransitionGroupCollection, which is a dictionary whose keys are the run names (each corresponding to a different sqMass connector) and whose values are TransitionGroup objects. A TransitionGroup holds a series of chromatograms belonging to the same precursor.

Loading Chromatogram Data as a Pandas DataFrame

Alternatively, for analysis directly in Python, the loaders can return a pandas DataFrame containing all of the data points for this peptide by using the loadTransitionGroupsDf() method instead.

[17]:
transitionGroupDf = loader.loadTransitionGroupsDf("NKESPT(UniMod:21)KAIVR(UniMod:267)", 3)
transitionGroupDf
[17]:
      run           rt      intensity    annotation
0     test_chrom_1  512.8   1069.051908  2274_Precursor_i0
1     test_chrom_1  516.4   2230.982597  2274_Precursor_i0
2     test_chrom_1  520.0   2583.056921  2274_Precursor_i0
3     test_chrom_1  523.7   1876.955276  2274_Precursor_i0
4     test_chrom_1  527.3   1862.126603  2274_Precursor_i0
...   ...           ...     ...          ...
2697  test_chrom_2  1251.0  0.000000     b4^1
2698  test_chrom_2  1254.7  42.001872    b4^1
2699  test_chrom_2  1258.3  20.999608    b4^1
2700  test_chrom_2  1261.9  20.999608    b4^1
2701  test_chrom_2  1265.6  0.000000     b4^1

2702 rows × 4 columns

This DataFrame holds the intensities and retention times for all transitions across all files. Transitions can be differentiated by the annotation column, and the run column identifies the file/run from which each chromatogram originates.
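
As a minimal sketch of how these two columns can be used, the snippet below selects a single transition in a single run and splits precursor traces from fragment traces. To stay self-contained it rebuilds a tiny DataFrame from a few rows of the output above instead of calling the loader, and it assumes that precursor annotations contain the text "Precursor", as they do in this output.

```python
import pandas as pd

# A few rows copied from the output above, standing in for transitionGroupDf.
df = pd.DataFrame({
    "run": ["test_chrom_1"] * 3 + ["test_chrom_2"] * 3,
    "rt": [512.8, 516.4, 520.0, 1251.0, 1254.7, 1258.3],
    "intensity": [1069.051908, 2230.982597, 2583.056921,
                  0.0, 42.001872, 20.999608],
    "annotation": ["2274_Precursor_i0"] * 3 + ["b4^1"] * 3,
})

# Keep only the b4^1 transition from the second run.
b4_run2 = df[(df["run"] == "test_chrom_2") & (df["annotation"] == "b4^1")]

# Separate precursor traces from fragment traces using the annotation text
# (assumption: precursor annotations contain "Precursor").
is_precursor = df["annotation"].str.contains("Precursor")
precursors = df[is_precursor]
fragments = df[~is_precursor]

print(b4_run2)
```

The same boolean masks should work on the full transitionGroupDf, since they rely only on the run and annotation columns.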

For example, to get the total intensity of each transition in each run, we can use the pandas groupby() function.

[18]:
transitionGroupDf[['intensity', 'run', 'annotation']].groupby(['run', 'annotation']).sum()
[18]:
                                 intensity
run           annotation
test_chrom_1  2274_Precursor_i0  2.139805e+06
              b4^1               3.000697e+04
              y1^1               1.300780e+05
              y2^1               2.837481e+04
              y3^1               3.879062e+05
              y4^1               1.295312e+05
              y5^1               5.707377e+04
test_chrom_2  2274_Precursor_i0  5.931736e+05
              b4^1               7.226959e+03
              y1^1               1.137597e+04
              y2^1               1.631975e+05
              y3^1               4.025936e+04
              y4^1               3.567035e+03
              y5^1               1.758796e+04
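
The same grouping can answer other per-trace questions. As a hedged sketch, the snippet below finds the retention time at which each trace reaches its highest intensity, using groupby() with idxmax(). It runs on a toy DataFrame built from a few rows of the loader output shown earlier, so the apexes it reports apply to those rows only, not to the full chromatograms.

```python
import pandas as pd

# Toy stand-in for transitionGroupDf, using rows shown in the output above.
df = pd.DataFrame({
    "run": ["test_chrom_1"] * 3 + ["test_chrom_2"] * 3,
    "rt": [512.8, 516.4, 520.0, 1251.0, 1254.7, 1258.3],
    "intensity": [1069.051908, 2230.982597, 2583.056921,
                  0.0, 42.001872, 20.999608],
    "annotation": ["2274_Precursor_i0"] * 3 + ["b4^1"] * 3,
})

# For every (run, annotation) pair, get the row label of the most intense point.
apex_idx = df.groupby(["run", "annotation"])["intensity"].idxmax()

# Look those rows up to recover the retention time of each apex.
apex = df.loc[apex_idx, ["run", "annotation", "rt", "intensity"]].reset_index(drop=True)

print(apex)
```

Using idxmax() rather than max() keeps the link back to the original row, so the retention time of the apex comes along with its intensity.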