Loading Feature Information
Regardless of whether a ChromatogramLoader or SpectrumLoader is used, MassDash can read information from the OpenSWATH or DIA-NN output on the features found for a particular peptide.
As a demonstration, we will compare the features found for the peptide AFVDFLSDEIK with a charge state of 2.
Loading Transition Group Features
For this example, let's initialize an MzMLDataLoader with both DIA-NN and OpenSWATH outputs so that we can compare their features. Note that multiple results files can be loaded in the same loader by supplying a list of the paths to the results files.
[3]:
from massdash.loaders import MzMLDataLoader
loader = MzMLDataLoader(dataFiles="mzml/ionMobilityTest.mzML",
rsltsFile=["osw/ionMobilityTest.osw", "diann/ionMobilityTest-diannReport.tsv"])
Initializing valid scores for selection
[2024-10-10 09:15:32,714] MzMLDataAccess - INFO - Opening mzml/ionMobilityTest.mzML file...: Elapsed 0.08444595336914062 ms
[2024-10-10 09:15:32,715] MzMLDataAccess - INFO - There are 50 spectra and 0 chromatograms.
[2024-10-10 09:15:32,715] MzMLDataAccess - INFO - There are 25 MS1 spectra and 25 MS2 spectra.
MassDash uses TransitionGroupFeature objects to store metadata on each detected feature. At minimum, a feature contains the retention time boundaries detected by the software tool, but the other attributes described below may also be populated.
- class massdash.structs.TransitionGroupFeature(leftBoundary: float, rightBoundary: float, areaIntensity: float | None = None, qvalue: float | None = None, consensusApex: float | None = None, consensusApexIntensity: float | None = None, consensusApexIM: float | None = None, precursor_mz: float | None = None, precursor_charge: int | None = None, product_annotations: List[str] | None = None, product_mz: List[float] | None = None, sequence: str | None = None, software: Literal['DIA-NN', 'OpenSWATH', 'DreamDIA'] | None = None)
An object storing attributes of the detected feature in a TransitionGroup. All peak picking algorithms should output an object of this class.
- leftBoundary
The left boundary of the feature
- Type:
float
- rightBoundary
The right boundary of the feature
- Type:
float
- areaIntensity
The area intensity of the feature
- Type:
float
- qvalue
The qvalue of the feature
- Type:
float
- consensusApex
The consensus apex of the feature
- Type:
float
- consensusApexIntensity
The consensus apex intensity of the feature
- Type:
float
- consensusApexIM
The consensus apex IM of the feature
- Type:
float
- precursor_mz
The precursor mz of the feature
- Type:
float
- precursor_charge
The precursor charge of the feature
- Type:
int
- product_annotations
The product annotations of the feature
- Type:
List[str]
- product_mz
The product mz of the feature
- Type:
List[float]
- sequence
The sequence of the feature
- Type:
str
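To make the "boundaries are required, everything else is optional" shape of this signature concrete, here is a minimal sketch using a stand-in dataclass. `FeatureSketch` is a hypothetical name introduced only for illustration; it mirrors the documented fields but is not the MassDash class itself, and the rounded boundary values are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeatureSketch:
    """Hypothetical stand-in mirroring the documented TransitionGroupFeature fields."""
    # The two retention time boundaries are the only required fields.
    leftBoundary: float
    rightBoundary: float
    # Everything else defaults to None, as in the documented signature.
    areaIntensity: Optional[float] = None
    qvalue: Optional[float] = None
    consensusApex: Optional[float] = None
    consensusApexIntensity: Optional[float] = None
    consensusApexIM: Optional[float] = None
    precursor_mz: Optional[float] = None
    precursor_charge: Optional[int] = None
    product_annotations: Optional[List[str]] = None
    product_mz: Optional[List[float]] = None
    sequence: Optional[str] = None
    software: Optional[str] = None


# A feature carrying only the boundaries (illustrative values).
minimal = FeatureSketch(leftBoundary=6235.85, rightBoundary=6248.43)

# A feature carrying additional metadata reported by a software tool.
annotated = FeatureSketch(
    leftBoundary=6235.85,
    rightBoundary=6248.43,
    qvalue=3.5e-05,
    precursor_charge=2,
    sequence="AFVDFLSDEIK",
    software="OpenSWATH",
)

print(minimal.qvalue, annotated.software)
```

Fields that a tool does not report simply stay `None`, which is why attributes such as `precursor_mz` or `product_mz` can be empty in loaded features.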
To load the top TransitionGroupFeature we can use the loadTopTransitionGroupFeature() method.
[5]:
features = loader.loadTopTransitionGroupFeature("AFVDFLSDEIK", 2)
features
[5]:
TransitionGroupFeatureCollection
ionMobilityTest: [-------- TransitionGroupFeature --------
leftBoundary: 6235.8486328125
rightBoundary: 6248.42822265625
areaIntensity: 352642.16135025
consensusApex: 6242.15
consensusApexIntensity: 352642.16135025
qvalue: 3.5084067486223456e-05
consensusApexIM: 0.978579389257473
precursor_mz: None
precursor_charge: 2
product_annotations: None
product_mz: None
sequence: AFVDFLSDEIK
software: OpenSWATH, -------- TransitionGroupFeature --------
leftBoundary: 6236.052702
rightBoundary: 6248.63205
areaIntensity: 1137201.5
consensusApex: 6241.44882
consensusApexIntensity: None
qvalue: 7.964108227e-05
consensusApexIM: 0.9800000191
precursor_mz: None
precursor_charge: 2
product_annotations: None
product_mz: None
sequence: AFVDFLSDEIK
software: DIA-NN]
This method returns a dictionary-like TransitionGroupFeatureCollection in which the keys are the run names and each value is a list of TransitionGroupFeature objects. The "software" attribute shows which tool reported each feature.
Here, we can see that OpenSWATH and DIA-NN are detecting the same feature, since the left and right boundaries and the consensusApex are approximately equal. The intensities differ because the two tools compute intensity differently: OpenSWATH sums the intensity across all fragments, while DIA-NN sums the intensity across the top 3 fragment ions.
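The "approximately equal" comparison above can be quantified. The following is a minimal sketch, assuming plain `(left, right)` tuples copied from the output above rather than MassDash objects; `boundary_overlap` is a hypothetical helper and not part of the MassDash API.

```python
def boundary_overlap(a, b):
    """Return the overlap ratio of two (left, right) retention time intervals.

    The ratio is the length shared by both intervals divided by the total
    span they cover together, so 1.0 means identical boundaries and 0.0
    means the intervals do not overlap.
    """
    shared = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    span = max(a[1], b[1]) - min(a[0], b[0])
    return shared / span if span > 0 else 0.0


# Boundaries and apexes copied from the loader output shown above.
openswath_bounds = (6235.8486328125, 6248.42822265625)
diann_bounds = (6236.052702, 6248.63205)
openswath_apex = 6242.15
diann_apex = 6241.44882

overlap = boundary_overlap(openswath_bounds, diann_bounds)
apex_shift = abs(openswath_apex - diann_apex)

print(f"boundary overlap: {overlap:.3f}")
print(f"apex difference:  {apex_shift:.2f} s")
```

An overlap ratio close to 1 together with an apex difference well below the peak width supports reading the two entries as the same chromatographic feature.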
Loading The Top Transition Group Features as a Pandas DataFrame
For easier data manipulation, MassDash can also load feature information directly into a pandas DataFrame using the loadTopTransitionGroupFeatureDf() method. Note, however, that the DataFrame output limits compatibility with downstream MassDash tools.
[8]:
loader.loadTopTransitionGroupFeatureDf("AFVDFLSDEIK", 2)
[8]:
| | runName | leftBoundary | rightBoundary | areaIntensity | qvalue | consensusApex | consensusApexIntensity | sequence | precursor_charge | software | consensusApexIM |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ionMobilityTest | 6235.848633 | 6248.428223 | 2848190.0 | 0.000035 | 6242.15000 | 352642.16135 | AFVDFLSDEIK | 2 | OpenSWATH | NaN |
| 1 | ionMobilityTest | 6236.052702 | 6248.632050 | 1137201.5 | 0.000080 | 6241.44882 | NaN | AFVDFLSDEIK | 2 | DIA-NN | 0.98 |
Loading All TransitionGroupFeatures
By default, OpenSWATH computes several features for a target peptide, and the top feature is determined by PyProphet. To load all of the features identified by OpenSWATH for a particular precursor, we can use the loadTransitionGroupFeatures() method.
[9]:
loader.loadTransitionGroupFeatures("AFVDFLSDEIK", 2)
[9]:
TransitionGroupFeatureCollection
ionMobilityTest: [-------- TransitionGroupFeature --------
leftBoundary: 6235.8486328125
rightBoundary: 6248.42822265625
areaIntensity: 2848190.0
consensusApex: 6242.15
consensusApexIntensity: 352642.16135025
qvalue: 3.5084067486223456e-05
consensusApexIM: 0.978579389257473
precursor_mz: None
precursor_charge: 2
product_annotations: None
product_mz: None
sequence: AFVDFLSDEIK
software: OpenSWATH, -------- TransitionGroupFeature --------
leftBoundary: 6255.64599609375
rightBoundary: 6266.51513671875
areaIntensity: 35433.9
consensusApex: 6256.67
consensusApexIntensity: 2888.98703575134
qvalue: 0.0001776217262565
consensusApexIM: 0.981810896830182
precursor_mz: None
precursor_charge: 2
product_annotations: None
product_mz: None
sequence: AFVDFLSDEIK
software: OpenSWATH, -------- TransitionGroupFeature --------
leftBoundary: 6236.052702
rightBoundary: 6248.63205
areaIntensity: 1137201.5
consensusApex: 6241.44882
consensusApexIntensity: None
qvalue: 7.964108227e-05
consensusApexIM: 0.9800000191
precursor_mz: None
precursor_charge: 2
product_annotations: None
product_mz: None
sequence: AFVDFLSDEIK
software: DIA-NN]
Note:
DIA-NN only outputs one feature per precursor, so calling loadTransitionGroupFeatures() returns essentially the same result as loadTopTransitionGroupFeature(). The only difference is that loadTransitionGroupFeatures() outputs a list of TransitionGroupFeatures.
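The difference between the two tools can be checked by grouping the loaded features by their software tag. This is a minimal sketch using plain dictionaries as stand-ins for TransitionGroupFeature objects, with values copied from the output above. Ranking by lowest q-value is an assumption made here for illustration; it is not necessarily how PyProphet or MassDash selects the top feature.

```python
from collections import defaultdict

# Plain-dict stand-ins for the three features shown in the output above.
features = [
    {"software": "OpenSWATH", "leftBoundary": 6235.8486328125,
     "qvalue": 3.5084067486223456e-05},
    {"software": "OpenSWATH", "leftBoundary": 6255.64599609375,
     "qvalue": 0.0001776217262565},
    {"software": "DIA-NN", "leftBoundary": 6236.052702,
     "qvalue": 7.964108227e-05},
]

# Group the features by the tool that reported them.
by_software = defaultdict(list)
for feature in features:
    by_software[feature["software"]].append(feature)

# Count the features per tool.
counts = {name: len(group) for name, group in by_software.items()}

# Assumption: treat the lowest q-value as the "best" feature per tool.
best = {name: min(group, key=lambda f: f["qvalue"])
        for name, group in by_software.items()}

print(counts)
for name, feature in best.items():
    print(name, feature["leftBoundary"], feature["qvalue"])
```

With this data, OpenSWATH contributes two candidate features and DIA-NN contributes one, which matches the behaviour described in the note.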
Loading All TransitionGroupFeatures In a Pandas DataFrame
To load all TransitionGroupFeature objects into a pandas DataFrame, the loadTransitionGroupFeaturesDf() method can be used.
[10]:
features_df = loader.loadTransitionGroupFeaturesDf("AFVDFLSDEIK", 2)
features_df
[10]:
| | runname | leftBoundary | rightBoundary | areaIntensity | qvalue | consensusApex | consensusApexIntensity | precursor_charge | sequence | software | consensusApexIM |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ionMobilityTest | 6235.848633 | 6248.428223 | 2848190.0 | 0.000035 | 6242.15000 | 352642.161350 | 2 | AFVDFLSDEIK | OpenSWATH | 0.978579 |
| 1 | ionMobilityTest | 6255.645996 | 6266.515137 | 35433.9 | 0.000178 | 6256.67000 | 2888.987036 | 2 | AFVDFLSDEIK | OpenSWATH | 0.981811 |
| 2 | ionMobilityTest | 6236.052702 | 6248.632050 | 1137201.5 | 0.000080 | 6241.44882 | NaN | 2 | AFVDFLSDEIK | DIA-NN | 0.980000 |
Although the pandas DataFrame output is incompatible with further MassDash analysis, it allows for greater flexibility in alternative analyses. For example, we can calculate the peak width of all features as shown below.
[11]:
features_df['peakWidth'] = features_df['rightBoundary'] - features_df['leftBoundary']
features_df
[11]:
| | runname | leftBoundary | rightBoundary | areaIntensity | qvalue | consensusApex | consensusApexIntensity | precursor_charge | sequence | software | consensusApexIM | peakWidth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ionMobilityTest | 6235.848633 | 6248.428223 | 2848190.0 | 0.000035 | 6242.15000 | 352642.161350 | 2 | AFVDFLSDEIK | OpenSWATH | 0.978579 | 12.579590 |
| 1 | ionMobilityTest | 6255.645996 | 6266.515137 | 35433.9 | 0.000178 | 6256.67000 | 2888.987036 | 2 | AFVDFLSDEIK | OpenSWATH | 0.981811 | 10.869141 |
| 2 | ionMobilityTest | 6236.052702 | 6248.632050 | 1137201.5 | 0.000080 | 6241.44882 | NaN | 2 | AFVDFLSDEIK | DIA-NN | 0.980000 | 12.579348 |
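The same DataFrame can be filtered and summarized with standard pandas operations. The sketch below rebuilds a reduced copy of the table by hand so that it runs without MassDash or the test data; the q-value cutoff of 1e-4 is an arbitrary threshold chosen for illustration, not a MassDash default.

```python
import pandas as pd

# Reduced, hand-built copy of the features table shown above.
features_df = pd.DataFrame({
    "runname": ["ionMobilityTest"] * 3,
    "leftBoundary": [6235.848633, 6255.645996, 6236.052702],
    "rightBoundary": [6248.428223, 6266.515137, 6248.632050],
    "qvalue": [0.000035, 0.000178, 0.000080],
    "software": ["OpenSWATH", "OpenSWATH", "DIA-NN"],
})

# Peak width in the same units as the boundaries.
features_df["peakWidth"] = (
    features_df["rightBoundary"] - features_df["leftBoundary"]
)

# Keep only features below an illustrative q-value cutoff.
confident = features_df[features_df["qvalue"] < 1e-4]

# Mean peak width of the remaining features for each software tool.
width_by_software = confident.groupby("software")["peakWidth"].mean()
print(width_by_software)
```

After filtering, one feature per tool remains, and their peak widths agree to within a fraction of a second.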