Loading Chromatogram Data
A Chromatogram Loader loads raw data from chromatograms. Because no on-the-fly extraction is needed, data loading is faster, but this comes at the cost of less flexibility.
Initiating a Chromatogram Loader
All Chromatogram Loaders require the following inputs.
dataFiles - a list of raw data files
rsltsFile - a file containing the features
In the case of a SqMassLoader, the data files must be a list of .sqMass files and the rsltsFile must be a merged .osw file output by pyprophet. This output is useful to visualize because it shows the exact chromatograms that OpenSwath uses for peak picking.
We can initiate a SqMassLoader object with multiple sqMass files as follows.
[3]:
from massdash.loaders import SqMassLoader
loader = SqMassLoader(dataFiles=["xics/test_chrom_1.sqMass", "xics/test_chrom_2.sqMass"],
                      rsltsFile="osw/test_data.osw")
Note
The provided .osw file must contain information for all runs.
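Since the loader expects specific file types, it can help to check the paths before constructing it. The following is a minimal sketch using only the standard library. `check_loader_inputs` is a hypothetical helper, not part of massdash, and it only inspects file extensions. It does not open the files or confirm that the .osw file covers every run.

```python
from pathlib import Path


def check_loader_inputs(dataFiles, rsltsFile):
    """Hypothetical helper (not part of massdash): validate file extensions only."""
    # Collect every data file that does not end in .sqMass (case-insensitive)
    bad = [f for f in dataFiles if Path(f).suffix.lower() != ".sqmass"]
    if bad:
        raise ValueError(f"Expected .sqMass data files, got: {bad}")
    if Path(rsltsFile).suffix.lower() != ".osw":
        raise ValueError(f"Expected a .osw results file, got: {rsltsFile}")
    return list(dataFiles), rsltsFile


dataFiles, rsltsFile = check_loader_inputs(
    ["xics/test_chrom_1.sqMass", "xics/test_chrom_2.sqMass"],
    "osw/test_data.osw",
)
```

The returned values can then be passed to `SqMassLoader(dataFiles=dataFiles, rsltsFile=rsltsFile)` as in the cell above.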
Printing the loader object shows the file paths of all linked files.
[4]:
loader
[4]:
SqMassLoader(rsltsFile=osw/test_data.osw, dataFiles=['xics/test_chrom_1.sqMass', 'xics/test_chrom_2.sqMass']
Loading a Transition Group
To fetch the chromatograms for a particular transition group, we can call the loadTransitionGroups() method. This method requires a modified peptide sequence and a charge state, and it loads the matching transition group from every run. It can take a while because it fetches the data for all runs from disk.
In this example we will load the peptide NKESPT(UniMod:21)KAIVR(UniMod:267) with a charge state of 3.
[16]:
loader.loadTransitionGroups("NKESPT(UniMod:21)KAIVR(UniMod:267)", 3)
[16]:
TransitionGroupCollection
test_chrom_1: -------- TransitionGroup --------
precursor data: 1
transition data: 6
data type: Chromatogram
test_chrom_2: -------- TransitionGroup --------
precursor data: 1
transition data: 6
data type: Chromatogram
Here we have a TransitionGroupCollection, which is a dictionary whose keys are the run names (each one corresponding to a different sqMass connector) and whose values are TransitionGroup objects. A TransitionGroup object holds a series of chromatograms belonging to the same precursor.
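Because the collection is a dictionary, the usual dictionary idioms apply. The sketch below uses a plain dict and a hypothetical `StandInGroup` class so that it runs without massdash or the test data. The stand-in and its fields `n_precursor` and `n_transition` are assumptions for illustration and are not the real TransitionGroup interface.

```python
from dataclasses import dataclass


@dataclass
class StandInGroup:
    """Hypothetical stand-in for a TransitionGroup (not the massdash class)."""
    n_precursor: int
    n_transition: int


# Plain dict standing in for the TransitionGroupCollection shown above
collection = {
    "test_chrom_1": StandInGroup(n_precursor=1, n_transition=6),
    "test_chrom_2": StandInGroup(n_precursor=1, n_transition=6),
}

run_names = list(collection)              # the keys are the run names
first_group = collection["test_chrom_1"]  # look up one run by its name

# Iterate over every run and its transition group
for run, group in collection.items():
    print(run, group.n_precursor, group.n_transition)
```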
Loading Chromatogram Data as a Pandas DataFrame
Alternatively, for analysis directly in Python, the loaders can return a pandas DataFrame containing all of the data points for this peptide by using the loadTransitionGroupsDf() method instead.
[17]:
transitionGroupDf = loader.loadTransitionGroupsDf("NKESPT(UniMod:21)KAIVR(UniMod:267)", 3)
transitionGroupDf
[17]:
| | run | rt | intensity | annotation |
|---|---|---|---|---|
| 0 | test_chrom_1 | 512.8 | 1069.051908 | 2274_Precursor_i0 |
| 1 | test_chrom_1 | 516.4 | 2230.982597 | 2274_Precursor_i0 |
| 2 | test_chrom_1 | 520.0 | 2583.056921 | 2274_Precursor_i0 |
| 3 | test_chrom_1 | 523.7 | 1876.955276 | 2274_Precursor_i0 |
| 4 | test_chrom_1 | 527.3 | 1862.126603 | 2274_Precursor_i0 |
| ... | ... | ... | ... | ... |
| 2697 | test_chrom_2 | 1251.0 | 0.000000 | b4^1 |
| 2698 | test_chrom_2 | 1254.7 | 42.001872 | b4^1 |
| 2699 | test_chrom_2 | 1258.3 | 20.999608 | b4^1 |
| 2700 | test_chrom_2 | 1261.9 | 20.999608 | b4^1 |
| 2701 | test_chrom_2 | 1265.6 | 0.000000 | b4^1 |
2702 rows × 4 columns
This DataFrame holds the intensities and retention times of every transition in every file. Transitions can be differentiated by the annotation column, and the run column identifies the file/run from which each chromatogram originates.
For example, to get the total intensity of each transition in each run, we can use the pandas groupby() function.
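As a sketch of how the `run` and `annotation` columns can be used, the example below selects a single transition from a single run with a boolean mask. It builds a small hand-made DataFrame with the same columns (`run`, `rt`, `intensity`, `annotation`) so that it runs without the test data. The values are made up for illustration and are not loader output.

```python
import pandas as pd

# Hand-made stand-in with the same columns as transitionGroupDf (values are made up)
df = pd.DataFrame({
    "run": ["test_chrom_1", "test_chrom_1", "test_chrom_1", "test_chrom_2"],
    "rt": [512.8, 516.4, 512.8, 512.8],
    "intensity": [10.0, 30.0, 5.0, 7.0],
    "annotation": ["y3^1", "y3^1", "b4^1", "y3^1"],
})

# Keep only the y3^1 trace from the first run
mask = (df["run"] == "test_chrom_1") & (df["annotation"] == "y3^1")
trace = df.loc[mask, ["rt", "intensity"]]
```

The resulting `trace` holds just the retention times and intensities of that one chromatogram, which is a convenient shape for plotting.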
[18]:
transitionGroupDf[['intensity', 'run', 'annotation']].groupby(['run', 'annotation']).sum()
[18]:
| run | annotation | intensity |
|---|---|---|
| test_chrom_1 | 2274_Precursor_i0 | 2.139805e+06 |
| | b4^1 | 3.000697e+04 |
| | y1^1 | 1.300780e+05 |
| | y2^1 | 2.837481e+04 |
| | y3^1 | 3.879062e+05 |
| | y4^1 | 1.295312e+05 |
| | y5^1 | 5.707377e+04 |
| test_chrom_2 | 2274_Precursor_i0 | 5.931736e+05 |
| | b4^1 | 7.226959e+03 |
| | y1^1 | 1.137597e+04 |
| | y2^1 | 1.631975e+05 |
| | y3^1 | 4.025936e+04 |
| | y4^1 | 3.567035e+03 |
| | y5^1 | 1.758796e+04 |
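The same grouping can answer other questions, such as where each trace reaches its maximum. The following sketch, again on a small hand-made DataFrame with made-up values rather than loader output, combines groupby() with idxmax() to pick the most intense point for each run and annotation.

```python
import pandas as pd

# Hand-made stand-in with the same columns as transitionGroupDf (values are made up)
df = pd.DataFrame({
    "run": ["test_chrom_1"] * 4 + ["test_chrom_2"] * 2,
    "rt": [512.8, 516.4, 512.8, 516.4, 512.8, 516.4],
    "intensity": [10.0, 30.0, 8.0, 5.0, 7.0, 2.0],
    "annotation": ["y3^1", "y3^1", "b4^1", "b4^1", "y3^1", "y3^1"],
})

# Index label of the most intense point within each (run, annotation) group
apex_idx = df.groupby(["run", "annotation"])["intensity"].idxmax()

# Look up those rows to get the apex retention time and intensity of each trace
apex = df.loc[apex_idx, ["run", "annotation", "rt", "intensity"]].reset_index(drop=True)
```

Note that this simply takes the highest raw point of each trace. It is not a substitute for the peak boundaries reported in the .osw results file.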