Loading Feature Information

Regardless of whether a ChromatogramLoader or SpectrumLoader is used, MassDash can read the features reported for a particular peptide from OpenSWATH or DIA-NN output.

As a demonstration, we will compare the features found for the peptide AFVDFLSDEIK with a charge state of 2.

Loading Transition Group Features

For this example, let's initialize an MzMLDataLoader with both DIA-NN and OpenSWATH outputs so that we can compare their features. Note that multiple results files can be loaded into the same loader by supplying a list of their paths.

[3]:
from massdash.loaders import MzMLDataLoader
loader = MzMLDataLoader(dataFiles="mzml/ionMobilityTest.mzML",
                        rsltsFile=["osw/ionMobilityTest.osw", "diann/ionMobilityTest-diannReport.tsv"])
Initializing valid scores for selection
[2024-10-10 09:15:32,714] MzMLDataAccess - INFO - Opening mzml/ionMobilityTest.mzML file...: Elapsed 0.08444595336914062 ms
[2024-10-10 09:15:32,715] MzMLDataAccess - INFO - There are 50 spectra and 0 chromatograms.
[2024-10-10 09:15:32,715] MzMLDataAccess - INFO - There are 25 MS1 spectra and 25 MS2 spectra.

MassDash uses TransitionGroupFeature objects to store metadata on each detected feature. At minimum, this contains the retention time boundaries detected by the software tool, but the other attributes described below may also be populated.

class massdash.structs.TransitionGroupFeature(leftBoundary: float, rightBoundary: float, areaIntensity: float | None = None, qvalue: float | None = None, consensusApex: float | None = None, consensusApexIntensity: float | None = None, consensusApexIM: float | None = None, precursor_mz: float | None = None, precursor_charge: int | None = None, product_annotations: List[str] | None = None, product_mz: List[float] | None = None, sequence: str | None = None, software: Literal['DIA-NN', 'OpenSWATH', 'DreamDIA'] | None = None)

An object storing attributes of the detected feature in a TransitionGroup. All peak picking algorithms should output an object of this class.

leftBoundary

The left boundary of the feature

Type:

float

rightBoundary

The right boundary of the feature

Type:

float

areaIntensity

The area intensity of the feature

Type:

float

qvalue

The qvalue of the feature

Type:

float

consensusApex

The consensus apex of the feature

Type:

float

consensusApexIntensity

The consensus apex intensity of the feature

Type:

float

consensusApexIM

The consensus apex IM of the feature

Type:

float

precursor_mz

The precursor mz of the feature

Type:

float

precursor_charge

The precursor charge of the feature

Type:

int

product_annotations

The product annotations of the feature

Type:

List[str]

product_mz

The product mz of the feature

Type:

List[float]

sequence

The sequence of the feature

Type:

str

To load the top TransitionGroupFeature we can use the loadTopTransitionGroupFeature() method.

[5]:
features = loader.loadTopTransitionGroupFeature("AFVDFLSDEIK", 2)
features
[5]:
TransitionGroupFeatureCollection
ionMobilityTest: [-------- TransitionGroupFeature --------
leftBoundary: 6235.8486328125
rightBoundary: 6248.42822265625
areaIntensity: 352642.16135025
consensusApex: 6242.15
consensusApexIntensity: 352642.16135025
qvalue: 3.5084067486223456e-05
consensusApexIM: 0.978579389257473
precursor_mz: None
precursor_charge: 2
product_annotations: None
product_mz: None
sequence: AFVDFLSDEIK
software: OpenSWATH, -------- TransitionGroupFeature --------
leftBoundary: 6236.052702
rightBoundary: 6248.63205
areaIntensity: 1137201.5
consensusApex: 6241.44882
consensusApexIntensity: None
qvalue: 7.964108227e-05
consensusApexIM: 0.9800000191
precursor_mz: None
precursor_charge: 2
product_annotations: None
product_mz: None
sequence: AFVDFLSDEIK
software: DIA-NN]

This method returns a dictionary where the keys are the run names and each value is a list of TransitionGroupFeature objects. The “software” field shows which tool detected each feature.

Here, we can see that both OpenSWATH and DIA-NN are detecting the same feature, since the left and right boundaries and the consensusApex are approximately equal. The intensities differ because the two tools compute intensity differently: OpenSWATH sums the intensity across all fragments, while DIA-NN sums the intensity across the top 3 fragment ions.
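The "approximately equal" comparison above can be made explicit with a small overlap calculation. The following is a minimal sketch that does not use the MassDash API: the Boundaries class and the overlap_fraction() helper are hypothetical names introduced for illustration, and the numbers are copied from the printed output.

```python
from dataclasses import dataclass


@dataclass
class Boundaries:
    """Hypothetical stand-in holding only the retention time boundaries of a feature."""
    left: float
    right: float


def overlap_fraction(a: Boundaries, b: Boundaries) -> float:
    """Return the shared retention time span divided by the combined span (0 to 1)."""
    shared = min(a.right, b.right) - max(a.left, b.left)
    combined = max(a.right, b.right) - min(a.left, b.left)
    if shared <= 0 or combined <= 0:
        return 0.0
    return shared / combined


# Boundaries copied from the output above
openswath = Boundaries(6235.8486328125, 6248.42822265625)
diann = Boundaries(6236.052702, 6248.63205)

fraction = overlap_fraction(openswath, diann)
print(f"Overlap fraction: {fraction:.3f}")
```

A value close to 1 suggests that the two tools picked the same chromatographic peak, while a value of 0 means the boundaries do not overlap at all. What counts as "close enough" is a judgment call and is not defined by MassDash.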

Loading The Top Transition Group Features as a Pandas DataFrame

For easier data manipulation, MassDash can also load feature information directly into a pandas DataFrame using the loadTopTransitionGroupFeatureDf() method. However, this output has limited use with downstream MassDash tools.

[8]:
loader.loadTopTransitionGroupFeatureDf("AFVDFLSDEIK", 2)
[8]:
           runName  leftBoundary  rightBoundary  areaIntensity    qvalue  consensusApex  consensusApexIntensity     sequence  precursor_charge   software  consensusApexIM
0  ionMobilityTest   6235.848633    6248.428223      2848190.0  0.000035     6242.15000            352642.16135  AFVDFLSDEIK                 2  OpenSWATH              NaN
1  ionMobilityTest   6236.052702    6248.632050      1137201.5  0.000080     6241.44882                     NaN  AFVDFLSDEIK                 2     DIA-NN             0.98

Loading All TransitionGroupFeatures

By default, OpenSWATH computes several features for a target peptide, and the top feature is determined by PyProphet. To load all of the features identified by OpenSWATH for a particular precursor, we can use the loadTransitionGroupFeatures() method.

[9]:
loader.loadTransitionGroupFeatures("AFVDFLSDEIK", 2)
[9]:
TransitionGroupFeatureCollection
ionMobilityTest: [-------- TransitionGroupFeature --------
leftBoundary: 6235.8486328125
rightBoundary: 6248.42822265625
areaIntensity: 2848190.0
consensusApex: 6242.15
consensusApexIntensity: 352642.16135025
qvalue: 3.5084067486223456e-05
consensusApexIM: 0.978579389257473
precursor_mz: None
precursor_charge: 2
product_annotations: None
product_mz: None
sequence: AFVDFLSDEIK
software: OpenSWATH, -------- TransitionGroupFeature --------
leftBoundary: 6255.64599609375
rightBoundary: 6266.51513671875
areaIntensity: 35433.9
consensusApex: 6256.67
consensusApexIntensity: 2888.98703575134
qvalue: 0.0001776217262565
consensusApexIM: 0.981810896830182
precursor_mz: None
precursor_charge: 2
product_annotations: None
product_mz: None
sequence: AFVDFLSDEIK
software: OpenSWATH, -------- TransitionGroupFeature --------
leftBoundary: 6236.052702
rightBoundary: 6248.63205
areaIntensity: 1137201.5
consensusApex: 6241.44882
consensusApexIntensity: None
qvalue: 7.964108227e-05
consensusApexIM: 0.9800000191
precursor_mz: None
precursor_charge: 2
product_annotations: None
product_mz: None
sequence: AFVDFLSDEIK
software: DIA-NN]

Note:

DIA-NN only outputs one feature per precursor, so calling the loadTransitionGroupFeatures() method will output essentially the same result as loadTopTransitionGroupFeature(). The only difference is that loadTransitionGroupFeatures() outputs a list of TransitionGroupFeatures.

This method returns a TransitionGroupFeatureCollection, which is a dictionary where the keys are the run names and each value is a list of all the TransitionGroupFeature objects found. Here we can see that OpenSWATH finds an additional feature; in general, however, it should be ignored because the top TransitionGroupFeature has a lower q-value.
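The idea of keeping the feature with the lowest q-value can be illustrated outside of MassDash. This is a minimal sketch under stated assumptions: FeatureSummary and best_per_software() are hypothetical names, the values are copied from the printed output, and the selection rule shown here is not necessarily how PyProphet or MassDash rank features internally.

```python
from typing import Dict, List, NamedTuple


class FeatureSummary(NamedTuple):
    """Hypothetical record holding a few fields copied from the output above."""
    software: str
    consensusApex: float
    qvalue: float


def best_per_software(features: List[FeatureSummary]) -> Dict[str, FeatureSummary]:
    """Keep, for each software tool, the feature with the lowest q-value."""
    best: Dict[str, FeatureSummary] = {}
    for feature in features:
        current = best.get(feature.software)
        if current is None or feature.qvalue < current.qvalue:
            best[feature.software] = feature
    return best


features = [
    FeatureSummary("OpenSWATH", 6242.15, 3.5084067486223456e-05),
    FeatureSummary("OpenSWATH", 6256.67, 0.0001776217262565),
    FeatureSummary("DIA-NN", 6241.44882, 7.964108227e-05),
]

top = best_per_software(features)
for software, feature in top.items():
    print(software, feature.consensusApex, feature.qvalue)
```

With the values above, the OpenSWATH feature at an apex of 6242.15 is retained and the one at 6256.67 is dropped, which matches the feature shown earlier as the top OpenSWATH feature.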

Loading All TransitionGroupFeatures In a Pandas DataFrame

To load all TransitionGroupFeature objects into a pandas DataFrame, the loadTransitionGroupFeaturesDf() method can be used.

[10]:
features_df = loader.loadTransitionGroupFeaturesDf("AFVDFLSDEIK", 2)
features_df
[10]:
           runname  leftBoundary  rightBoundary  areaIntensity    qvalue  consensusApex  consensusApexIntensity  precursor_charge     sequence   software  consensusApexIM
0  ionMobilityTest   6235.848633    6248.428223      2848190.0  0.000035     6242.15000           352642.161350                 2  AFVDFLSDEIK  OpenSWATH         0.978579
1  ionMobilityTest   6255.645996    6266.515137        35433.9  0.000178     6256.67000             2888.987036                 2  AFVDFLSDEIK  OpenSWATH         0.981811
2  ionMobilityTest   6236.052702    6248.632050      1137201.5  0.000080     6241.44882                     NaN                 2  AFVDFLSDEIK     DIA-NN         0.980000

Although the pandas DataFrame output is incompatible with further MassDash analysis, it allows for greater flexibility with alternative analyses. For example, we can calculate the peak width of all features as shown below.

[11]:
features_df['peakWidth'] = features_df['rightBoundary'] - features_df['leftBoundary']
features_df
[11]:
           runname  leftBoundary  rightBoundary  areaIntensity    qvalue  consensusApex  consensusApexIntensity  precursor_charge     sequence   software  consensusApexIM  peakWidth
0  ionMobilityTest   6235.848633    6248.428223      2848190.0  0.000035     6242.15000           352642.161350                 2  AFVDFLSDEIK  OpenSWATH         0.978579  12.579590
1  ionMobilityTest   6255.645996    6266.515137        35433.9  0.000178     6256.67000             2888.987036                 2  AFVDFLSDEIK  OpenSWATH         0.981811  10.869141
2  ionMobilityTest   6236.052702    6248.632050      1137201.5  0.000080     6241.44882                     NaN                 2  AFVDFLSDEIK     DIA-NN         0.980000  12.579348
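
Further DataFrame operations follow the same pattern. As one possible sketch, the example below rebuilds a subset of the table above by hand (so that it runs without MassDash or the test files), filters on q-value, and averages the peak width per software tool. The QVALUE_CUTOFF value is an assumption chosen for illustration, not a MassDash default.

```python
import pandas as pd

# Values copied from the table above; only a subset of columns is reproduced
features_df = pd.DataFrame({
    "runname": ["ionMobilityTest"] * 3,
    "leftBoundary": [6235.848633, 6255.645996, 6236.052702],
    "rightBoundary": [6248.428223, 6266.515137, 6248.632050],
    "qvalue": [0.000035, 0.000178, 0.000080],
    "software": ["OpenSWATH", "OpenSWATH", "DIA-NN"],
})
features_df["peakWidth"] = features_df["rightBoundary"] - features_df["leftBoundary"]

# Hypothetical cutoff chosen for illustration only
QVALUE_CUTOFF = 0.0001
confident = features_df[features_df["qvalue"] < QVALUE_CUTOFF]

# Mean peak width of the remaining features for each software tool
mean_width = confident.groupby("software")["peakWidth"].mean()
print(mean_width)
```

With this cutoff, the second OpenSWATH feature is filtered out, leaving one feature per tool with nearly identical peak widths.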