Loading Chromatogram Data

A Chromatogram Loader loads raw data from chromatograms. Because no on-the-fly extraction is needed, data loading is faster, at the cost of less flexibility.

Initiating a Chromatogram Loader

All Chromatogram Loaders require the following inputs.

dataFiles - a list of raw data files
rsltsFile - a file containing the features

For a SqMassLoader, dataFiles must be a list of .sqMass files and rsltsFile must be a merged .osw file output by pyprophet. These chromatograms are useful to visualize because they are the exact chromatograms that OpenSwath uses for peak picking.
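The file-type requirements above can be checked before a loader is constructed. The following is a minimal sketch, not part of massdash: check_loader_inputs is a hypothetical helper that only inspects file extensions and does not verify that the files exist or that the .osw file covers every run.

```python
from pathlib import Path


def check_loader_inputs(data_files, rslts_file):
    """Hypothetical helper (not a massdash function).

    Returns a list of human-readable problems found in the input paths.
    Only the file extensions are inspected.
    """
    problems = []
    for data_file in data_files:
        # Compare case-insensitively so ".sqMass" and ".sqmass" both pass.
        if Path(data_file).suffix.lower() != ".sqmass":
            problems.append(f"not an .sqMass file: {data_file}")
    if Path(rslts_file).suffix.lower() != ".osw":
        problems.append(f"not an .osw file: {rslts_file}")
    return problems


issues = check_loader_inputs(
    ["xics/test_chrom_1.sqMass", "xics/test_chrom_2.sqMass"],
    "osw/test_data.osw",
)
print(issues or "inputs look consistent with a SqMassLoader")
```

An empty list means the paths at least have the expected extensions; any other check (file existence, run coverage) is left to the loader itself.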

We can initiate a SqMassLoader object with multiple sqMass files as follows.

[3]:

from massdash.loaders import SqMassLoader
loader = SqMassLoader(dataFiles=["xics/test_chrom_1.sqMass", "xics/test_chrom_2.sqMass"],
                      rsltsFile="osw/test_data.osw")

Note

The provided .osw file must contain information for all runs.

Printing the loader object shows the file paths of all linked files.

[4]:

loader

[4]:

SqMassLoader(rsltsFile=osw/test_data.osw, dataFiles=['xics/test_chrom_1.sqMass', 'xics/test_chrom_2.sqMass']

Loading a Transition Group

To fetch the chromatograms for a particular transition group, we can call the loadTransitionGroups() method. This method requires a modified peptide sequence and a charge state, and loads the transition groups across all runs. The call can take a while because it fetches the data for all runs from disk.

In this example we will visualize the peptide NKESPT(UniMod:21)KAIVR(UniMod:267) with a charge state of 3.

[16]:

loader.loadTransitionGroups("NKESPT(UniMod:21)KAIVR(UniMod:267)", 3)

[16]:

TransitionGroupCollection
test_chrom_1: -------- TransitionGroup --------
precursor data: 1
transition data: 6
data type: Chromatogram
test_chrom_2: -------- TransitionGroup --------
precursor data: 1
transition data: 6
data type: Chromatogram

Here we have a TransitionGroupCollection, which is a dictionary whose keys are the run names (each corresponding to a different sqMass connector) and whose values are TransitionGroup objects. A TransitionGroup object holds a series of chromatograms belonging to the same precursor.

Loading Chromatogram Data as a Pandas DataFrame

Alternatively, for analysis directly in Python, the loaders can return a pandas DataFrame containing all of the data points for this peptide by using the loadTransitionGroupsDf() method instead.

[17]:

transitionGroupDf = loader.loadTransitionGroupsDf("NKESPT(UniMod:21)KAIVR(UniMod:267)", 3)
transitionGroupDf

[17]:

	run	rt	intensity	annotation
0	test_chrom_1	512.8	1069.051908	2274_Precursor_i0
1	test_chrom_1	516.4	2230.982597	2274_Precursor_i0
2	test_chrom_1	520.0	2583.056921	2274_Precursor_i0
3	test_chrom_1	523.7	1876.955276	2274_Precursor_i0
4	test_chrom_1	527.3	1862.126603	2274_Precursor_i0
...	...	...	...	...
2697	test_chrom_2	1251.0	0.000000	b4^1
2698	test_chrom_2	1254.7	42.001872	b4^1
2699	test_chrom_2	1258.3	20.999608	b4^1
2700	test_chrom_2	1261.9	20.999608	b4^1
2701	test_chrom_2	1265.6	0.000000	b4^1

2702 rows × 4 columns

This dataframe contains the intensities and retention times for all transitions across all files. Transitions can be differentiated by the annotation column, and the run column identifies the file/run from which each chromatogram originates.
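As a sketch of how these columns can be used together, the snippet below selects a single run and reshapes it so that each transition becomes its own column. It runs on a small synthetic frame with the same four columns; the run names and values are invented for illustration and are not taken from the test data.

```python
import pandas as pd

# Synthetic stand-in with the same columns as transitionGroupDf (made-up values).
df = pd.DataFrame({
    "run": ["run_a", "run_a", "run_a", "run_a", "run_b", "run_b"],
    "rt": [10.0, 11.0, 10.0, 11.0, 10.0, 11.0],
    "intensity": [5.0, 20.0, 3.0, 1.0, 8.0, 2.0],
    "annotation": ["y3^1", "y3^1", "b4^1", "b4^1", "y3^1", "y3^1"],
})

# Keep only the points that originate from one run.
one_run = df[df["run"] == "run_a"]

# One row per retention time, one column per transition.
wide = one_run.pivot(index="rt", columns="annotation", values="intensity")
print(wide)
```

The wide layout is convenient for plotting or comparing traces, but pivot() raises an error if a run contains duplicate (rt, annotation) pairs.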

For example, to get the total intensity of each chromatogram in each run, we can use the pandas groupby() method:

[18]:

transitionGroupDf[['intensity', 'run', 'annotation']].groupby(['run', 'annotation']).sum()

[18]:

		intensity
run	annotation
test_chrom_1	2274_Precursor_i0	2.139805e+06
	b4^1	3.000697e+04
	y1^1	1.300780e+05
	y2^1	2.837481e+04
	y3^1	3.879062e+05
	y4^1	1.295312e+05
	y5^1	5.707377e+04
test_chrom_2	2274_Precursor_i0	5.931736e+05
	b4^1	7.226959e+03
	y1^1	1.137597e+04
	y2^1	1.631975e+05
	y3^1	4.025936e+04
	y4^1	3.567035e+03
	y5^1	1.758796e+04
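The same grouping can answer other per-chromatogram questions. As a minimal sketch, the snippet below finds the retention time at which each trace reaches its maximum intensity. It again uses a synthetic frame with invented run names and values rather than the test data shown above.

```python
import pandas as pd

# Synthetic stand-in with the same columns as transitionGroupDf (made-up values).
df = pd.DataFrame({
    "run": ["run_a"] * 4 + ["run_b"] * 4,
    "rt": [10.0, 11.0, 10.0, 11.0, 10.0, 11.0, 10.0, 11.0],
    "intensity": [5.0, 20.0, 3.0, 1.0, 8.0, 2.0, 0.0, 4.0],
    "annotation": ["y3^1", "y3^1", "b4^1", "b4^1"] * 2,
})

# Row label of the most intense point within each (run, annotation) trace.
apex_idx = df.groupby(["run", "annotation"])["intensity"].idxmax()

# Look those rows up to get the apex retention time and intensity per trace.
apex = df.loc[apex_idx, ["run", "annotation", "rt", "intensity"]].reset_index(drop=True)
print(apex)
```

This reports the single most intense raw point per trace; it is not a substitute for the peak boundaries determined by OpenSwath.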