`massdash.loaders.access`.OSWDataAccess

class massdash.loaders.access.OSWDataAccess(*args, mode: Literal['module', 'gui'] = 'module', **kwargs)

Bases: GenericResultsAccess

A class for accessing data from an OpenSWATH SQLite database.

conn

A connection to the SQLite database.

Type:: sqlite3.Connection

c

A cursor for executing SQL statements on the database.

Type:: sqlite3.Cursor

verbose

Whether to print verbose output.

Type:: bool

mode

The mode to use when initiating the data access object, which controls which attributes get initialized.

Type:: str

getAllTopTransitionGroupFeaturesDf() → DataFrame

Retrieves all the top ranking features from the database.

Returns:: The top ranking features per assay.
Return type:: pandas.DataFrame
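As a minimal sketch of what "top-ranking features" means, the query below keeps only the best-ranked peak group per precursor and run, using the standard-library `sqlite3` module on a toy in-memory database. It is not MassDash's own query: the `FEATURE` and `SCORE_MS2` tables and the `RANK` column are assumptions modelled on the PyProphet-scored OSW schema, and `top_ranking_features` is a hypothetical helper.

```python
import sqlite3

# Toy stand-in for an .osw file; FEATURE/SCORE_MS2 columns are assumed, not taken from MassDash.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE FEATURE (ID INTEGER PRIMARY KEY, RUN_ID INTEGER,
                      PRECURSOR_ID INTEGER, EXP_RT REAL);
CREATE TABLE SCORE_MS2 (FEATURE_ID INTEGER, SCORE REAL, RANK INTEGER);
INSERT INTO FEATURE VALUES (1, 0, 10, 100.0), (2, 0, 10, 250.0), (3, 0, 11, 300.0);
INSERT INTO SCORE_MS2 VALUES (1, 2.5, 1), (2, 0.4, 2), (3, 1.9, 1);
""")


def top_ranking_features(cursor):
    """Hypothetical helper: keep only the RANK = 1 peak group per precursor and run."""
    cursor.execute("""
        SELECT FEATURE.RUN_ID, FEATURE.PRECURSOR_ID, FEATURE.EXP_RT, SCORE_MS2.SCORE
        FROM FEATURE
        JOIN SCORE_MS2 ON SCORE_MS2.FEATURE_ID = FEATURE.ID
        WHERE SCORE_MS2.RANK = 1
        ORDER BY FEATURE.RUN_ID, FEATURE.PRECURSOR_ID
    """)
    return cursor.fetchall()


print(top_ranking_features(c))  # [(0, 10, 100.0, 2.5), (0, 11, 300.0, 1.9)]
```

The documented method returns a `pandas.DataFrame`; rows fetched this way could be wrapped with `pandas.DataFrame(rows, columns=[...])`.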

getIdentifiedPeptides(qvalue: float = 0.01, run: str | None = None) → set | Dict[str, set]

Get the identified peptides at a certain q-value.

Parameters:

qvalue (float) – The q-value threshold for identification.
run (str) – The run name for which to get the identified peptides. If None, peptides are retrieved for all runs.

Returns:

The identified peptides across all runs (Dict[str, set]) or for a single run (set)
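The two return shapes (a `Dict[str, set]` keyed by run, or a single `set` when `run` is given) can be illustrated with a sketch over a toy database. The `RUN`, `PEPTIDE` and `SCORE_PEPTIDE` tables, the `'run-specific'` context filter and the inclusive `<=` comparison are assumptions, and `identified_peptides` is hypothetical; MassDash's actual query may differ.

```python
import sqlite3
from typing import Dict, Optional, Set, Union

# Toy database; table/column names are assumptions modelled on the OSW schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RUN (ID INTEGER PRIMARY KEY, FILENAME TEXT);
CREATE TABLE PEPTIDE (ID INTEGER PRIMARY KEY, MODIFIED_SEQUENCE TEXT);
CREATE TABLE SCORE_PEPTIDE (CONTEXT TEXT, RUN_ID INTEGER, PEPTIDE_ID INTEGER, QVALUE REAL);
INSERT INTO RUN VALUES (0, 'runA'), (1, 'runB');
INSERT INTO PEPTIDE VALUES (1, 'PEPTIDEK'), (2, 'ELVISLIVESK');
INSERT INTO SCORE_PEPTIDE VALUES
  ('run-specific', 0, 1, 0.001), ('run-specific', 0, 2, 0.2),
  ('run-specific', 1, 1, 0.005), ('run-specific', 1, 2, 0.008);
""")


def identified_peptides(conn, qvalue: float = 0.01, run: Optional[str] = None
                        ) -> Union[Set[str], Dict[str, Set[str]]]:
    """Hypothetical helper: peptides passing the q-value cut-off, grouped by run."""
    rows = conn.execute("""
        SELECT RUN.FILENAME, PEPTIDE.MODIFIED_SEQUENCE
        FROM SCORE_PEPTIDE
        JOIN RUN ON RUN.ID = SCORE_PEPTIDE.RUN_ID
        JOIN PEPTIDE ON PEPTIDE.ID = SCORE_PEPTIDE.PEPTIDE_ID
        WHERE SCORE_PEPTIDE.CONTEXT = 'run-specific'
          AND SCORE_PEPTIDE.QVALUE <= ?  -- inclusive cut-off is an assumption
    """, (qvalue,)).fetchall()
    per_run: Dict[str, Set[str]] = {}
    for run_name, sequence in rows:
        per_run.setdefault(run_name, set()).add(sequence)
    if run is None:
        return per_run                  # Dict[str, set] across all runs
    return per_run.get(run, set())      # set for a single run


print(identified_peptides(conn, run="runA"))  # {'PEPTIDEK'}
```

In this sketch, runs with no peptide below the threshold are simply absent from the dictionary.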

getIdentifiedPrecursorIntensities(qvalue: float = 0.01, run: str | None = None, precursorLevel=False)

Get the identified precursor intensities at a certain q-value.

Parameters:: **kwargs (dict) – Additional arguments to be passed to the getIdentifiedPrecursors function.
Returns:: The identified precursor intensities across all runs (DataFrame with columns: Precursor, runName, Intensity) or for a single run (DataFrame with columns: Precursor, Intensity).
Return type:: pandas.DataFrame
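A sketch of the per-run intensity lookup, assuming intensities come from the top-ranking (`RANK = 1`) feature's `FEATURE_MS2.AREA_INTENSITY` and that a precursor is labelled as `sequence/charge`. Both choices, the toy schema, and the hypothetical `identified_precursor_intensities` helper are illustrative assumptions; rows are returned as dictionaries whose keys mirror the documented DataFrame columns.

```python
import sqlite3
from typing import Dict, List, Optional

# Toy database; schema and the 'sequence/charge' label are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RUN (ID INTEGER PRIMARY KEY, FILENAME TEXT);
CREATE TABLE PEPTIDE (ID INTEGER PRIMARY KEY, MODIFIED_SEQUENCE TEXT);
CREATE TABLE PRECURSOR (ID INTEGER PRIMARY KEY, CHARGE INTEGER);
CREATE TABLE PRECURSOR_PEPTIDE_MAPPING (PRECURSOR_ID INTEGER, PEPTIDE_ID INTEGER);
CREATE TABLE FEATURE (ID INTEGER PRIMARY KEY, RUN_ID INTEGER, PRECURSOR_ID INTEGER);
CREATE TABLE FEATURE_MS2 (FEATURE_ID INTEGER, AREA_INTENSITY REAL);
CREATE TABLE SCORE_MS2 (FEATURE_ID INTEGER, RANK INTEGER, QVALUE REAL);
INSERT INTO RUN VALUES (0, 'runA'), (1, 'runB');
INSERT INTO PEPTIDE VALUES (1, 'PEPTIDEK');
INSERT INTO PRECURSOR VALUES (10, 2);
INSERT INTO PRECURSOR_PEPTIDE_MAPPING VALUES (10, 1);
INSERT INTO FEATURE VALUES (100, 0, 10), (101, 1, 10), (102, 1, 10);
INSERT INTO FEATURE_MS2 VALUES (100, 500000.0), (101, 750000.0), (102, 10000.0);
INSERT INTO SCORE_MS2 VALUES (100, 1, 0.002), (101, 1, 0.004), (102, 2, 0.5);
""")


def identified_precursor_intensities(conn, qvalue: float = 0.01,
                                     run: Optional[str] = None) -> List[Dict]:
    """Hypothetical helper: intensity of the top-ranking feature per precursor and run."""
    rows = conn.execute("""
        SELECT PEPTIDE.MODIFIED_SEQUENCE || '/' || PRECURSOR.CHARGE,
               RUN.FILENAME, FEATURE_MS2.AREA_INTENSITY
        FROM FEATURE
        JOIN SCORE_MS2 ON SCORE_MS2.FEATURE_ID = FEATURE.ID
        JOIN FEATURE_MS2 ON FEATURE_MS2.FEATURE_ID = FEATURE.ID
        JOIN RUN ON RUN.ID = FEATURE.RUN_ID
        JOIN PRECURSOR ON PRECURSOR.ID = FEATURE.PRECURSOR_ID
        JOIN PRECURSOR_PEPTIDE_MAPPING ON PRECURSOR_PEPTIDE_MAPPING.PRECURSOR_ID = PRECURSOR.ID
        JOIN PEPTIDE ON PEPTIDE.ID = PRECURSOR_PEPTIDE_MAPPING.PEPTIDE_ID
        WHERE SCORE_MS2.RANK = 1 AND SCORE_MS2.QVALUE <= ?
        ORDER BY RUN.FILENAME
    """, (qvalue,)).fetchall()
    if run is None:
        # Across all runs: Precursor, runName, Intensity
        return [{"Precursor": p, "runName": r, "Intensity": i} for p, r, i in rows]
    # Single run: Precursor, Intensity
    return [{"Precursor": p, "Intensity": i} for p, r, i in rows if r == run]


print(identified_precursor_intensities(conn, run="runB"))
# [{'Precursor': 'PEPTIDEK/2', 'Intensity': 750000.0}]
```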

getIdentifiedPrecursors(qvalue: float = 0.01, run: str | None = None, precursorLevel=False)

Retrieves a set of identified precursors.

Parameters:

run (str) – The run name.
qvalue (float) – The q-value threshold.
precursorLevel (bool) – If True, q-value filtering is performed only at the precursor level.

getIdentifiedProteins(qvalue: float = 0.01, run: str | None = None) → set | Dict[str, set]

Get the identified proteins at a certain q-value.

Parameters:

qvalue (float) – The q-value threshold for identification.
run (str) – The run name for which to get the identified proteins. If None, proteins are retrieved for all runs.

Returns:

The identified proteins across all runs (Dict[str, set]) or for a single run (set)

getPeptideTable(remove_ipf_peptides=True)

Retrieves the peptide table from the database.

Parameters:: remove_ipf_peptides (bool) – Whether to remove IPF peptides from the table.
Returns:: The peptide table.
Return type:: pandas.DataFrame

getPeptideTableFromProteinID(protein_id, remove_ipf_peptide=True)

Retrieves the peptide table from the database for a given protein ID.

Parameters:

protein_id (int) – The protein ID.
remove_ipf_peptide (bool) – Whether to remove IPF peptides from the table.

Returns:

The peptide table.

Return type:

pandas.DataFrame
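Retrieving peptides for one protein amounts to a join through the peptide-to-protein mapping. The sketch below assumes an OSW-style `PEPTIDE_PROTEIN_MAPPING` table, uses the hypothetical helper `peptides_for_protein`, and does not model the `remove_ipf_peptide` filter.

```python
import sqlite3

# Toy database; PEPTIDE / PEPTIDE_PROTEIN_MAPPING layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PEPTIDE (ID INTEGER PRIMARY KEY, MODIFIED_SEQUENCE TEXT, DECOY INTEGER);
CREATE TABLE PEPTIDE_PROTEIN_MAPPING (PEPTIDE_ID INTEGER, PROTEIN_ID INTEGER);
INSERT INTO PEPTIDE VALUES (1, 'PEPTIDEK', 0), (2, 'ELVISLIVESK', 0), (3, 'OTHERK', 0);
INSERT INTO PEPTIDE_PROTEIN_MAPPING VALUES (1, 1), (2, 1), (3, 2);
""")


def peptides_for_protein(conn, protein_id: int):
    """Hypothetical helper: peptides mapped to one protein ID (IPF filtering not modelled)."""
    # The protein ID is bound as a parameter rather than formatted into the SQL string.
    return conn.execute("""
        SELECT PEPTIDE.ID, PEPTIDE.MODIFIED_SEQUENCE
        FROM PEPTIDE
        JOIN PEPTIDE_PROTEIN_MAPPING
          ON PEPTIDE_PROTEIN_MAPPING.PEPTIDE_ID = PEPTIDE.ID
        WHERE PEPTIDE_PROTEIN_MAPPING.PROTEIN_ID = ?
        ORDER BY PEPTIDE.ID
    """, (protein_id,)).fetchall()


print(peptides_for_protein(conn, 1))  # [(1, 'PEPTIDEK'), (2, 'ELVISLIVESK')]
```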

getPeptideTransitionInfo(fullpeptidename, charge)

Retrieves transition information for a given peptide and charge.

Parameters:

fullpeptidename (str) – The full modified sequence of the peptide.
charge (int) – The precursor charge.

Returns:

The transition information.

Return type:

pandas.DataFrame
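Transition information for a peptide and charge can be sketched as a chain of joins from `PEPTIDE` to `TRANSITION`. The mapping tables and columns are assumptions, the m/z values are toy numbers rather than real fragment masses, and `transition_info` is a hypothetical helper.

```python
import sqlite3

# Toy database; table/column names are assumptions modelled on the OSW schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PEPTIDE (ID INTEGER PRIMARY KEY, MODIFIED_SEQUENCE TEXT);
CREATE TABLE PRECURSOR (ID INTEGER PRIMARY KEY, CHARGE INTEGER);
CREATE TABLE PRECURSOR_PEPTIDE_MAPPING (PRECURSOR_ID INTEGER, PEPTIDE_ID INTEGER);
CREATE TABLE TRANSITION (ID INTEGER PRIMARY KEY, PRODUCT_MZ REAL, ANNOTATION TEXT);
CREATE TABLE TRANSITION_PRECURSOR_MAPPING (TRANSITION_ID INTEGER, PRECURSOR_ID INTEGER);
INSERT INTO PEPTIDE VALUES (1, 'PEPTIDEK');
INSERT INTO PRECURSOR VALUES (10, 2), (11, 3);
INSERT INTO PRECURSOR_PEPTIDE_MAPPING VALUES (10, 1), (11, 1);
-- toy m/z values, not real fragment masses
INSERT INTO TRANSITION VALUES (100, 700.0, 'y6^1'), (101, 600.0, 'y5^1'), (102, 350.0, 'y6^2');
INSERT INTO TRANSITION_PRECURSOR_MAPPING VALUES (100, 10), (101, 10), (102, 11);
""")


def transition_info(conn, fullpeptidename: str, charge: int):
    """Hypothetical helper: transitions of one precursor (peptide + charge)."""
    return conn.execute("""
        SELECT TRANSITION.ID, TRANSITION.PRODUCT_MZ, TRANSITION.ANNOTATION
        FROM TRANSITION
        JOIN TRANSITION_PRECURSOR_MAPPING
          ON TRANSITION_PRECURSOR_MAPPING.TRANSITION_ID = TRANSITION.ID
        JOIN PRECURSOR ON PRECURSOR.ID = TRANSITION_PRECURSOR_MAPPING.PRECURSOR_ID
        JOIN PRECURSOR_PEPTIDE_MAPPING
          ON PRECURSOR_PEPTIDE_MAPPING.PRECURSOR_ID = PRECURSOR.ID
        JOIN PEPTIDE ON PEPTIDE.ID = PRECURSOR_PEPTIDE_MAPPING.PEPTIDE_ID
        WHERE PEPTIDE.MODIFIED_SEQUENCE = ? AND PRECURSOR.CHARGE = ?
        ORDER BY TRANSITION.ID
    """, (fullpeptidename, charge)).fetchall()


print(transition_info(conn, "PEPTIDEK", 2))  # [(100, 700.0, 'y6^1'), (101, 600.0, 'y5^1')]
```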

getPrecursorCharges(fullpeptidename)

Retrieves the precursor charges for a given peptide.

Parameters:: fullpeptidename (str) – The full modified sequence of the peptide.
Returns:: The precursor charges.
Return type:: pandas.DataFrame
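The charge lookup can be sketched as a `SELECT DISTINCT` over the precursors mapped to the peptide. The schema and the helper name `precursor_charges` are assumptions, and the sketch returns a plain list rather than the documented DataFrame.

```python
import sqlite3

# Toy database; PEPTIDE / PRECURSOR / mapping layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PEPTIDE (ID INTEGER PRIMARY KEY, MODIFIED_SEQUENCE TEXT);
CREATE TABLE PRECURSOR (ID INTEGER PRIMARY KEY, CHARGE INTEGER);
CREATE TABLE PRECURSOR_PEPTIDE_MAPPING (PRECURSOR_ID INTEGER, PEPTIDE_ID INTEGER);
INSERT INTO PEPTIDE VALUES (1, 'PEPTIDEK'), (2, 'ELVISLIVESK');
INSERT INTO PRECURSOR VALUES (10, 3), (11, 2), (12, 2);
INSERT INTO PRECURSOR_PEPTIDE_MAPPING VALUES (10, 1), (11, 1), (12, 2);
""")


def precursor_charges(conn, fullpeptidename: str):
    """Hypothetical helper: sorted distinct precursor charges of one modified peptide."""
    rows = conn.execute("""
        SELECT DISTINCT PRECURSOR.CHARGE
        FROM PRECURSOR
        JOIN PRECURSOR_PEPTIDE_MAPPING
          ON PRECURSOR_PEPTIDE_MAPPING.PRECURSOR_ID = PRECURSOR.ID
        JOIN PEPTIDE ON PEPTIDE.ID = PRECURSOR_PEPTIDE_MAPPING.PEPTIDE_ID
        WHERE PEPTIDE.MODIFIED_SEQUENCE = ?
        ORDER BY PRECURSOR.CHARGE
    """, (fullpeptidename,))
    return [charge for (charge,) in rows]


print(precursor_charges(conn, "PEPTIDEK"))  # [2, 3]
```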

getProteinTable(include_decoys=False)

Retrieves the protein table from the database.

Parameters:: include_decoys (bool) – Whether to include decoy proteins in the table.
Returns:: The protein table.
Return type:: pandas.DataFrame
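The `include_decoys` switch amounts to an optional decoy filter. A minimal sketch, assuming a `PROTEIN` table with a `DECOY` flag column; `protein_table` is a hypothetical helper.

```python
import sqlite3

# Toy database; the PROTEIN.DECOY flag column is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROTEIN (ID INTEGER PRIMARY KEY, PROTEIN_ACCESSION TEXT, DECOY INTEGER);
INSERT INTO PROTEIN VALUES (1, 'P12345', 0), (2, 'DECOY_P12345', 1), (3, 'Q67890', 0);
""")


def protein_table(conn, include_decoys: bool = False):
    """Hypothetical helper: protein rows, optionally including decoys."""
    query = "SELECT ID, PROTEIN_ACCESSION, DECOY FROM PROTEIN"
    if not include_decoys:
        query += " WHERE DECOY = 0"  # keep target proteins only
    return conn.execute(query + " ORDER BY ID").fetchall()


print(protein_table(conn))  # [(1, 'P12345', 0), (3, 'Q67890', 0)]
```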

getRunNames() → List[str]

Infers the run names from the results file; file extensions are removed.

Returns:: The run names
Return type:: list
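Extension removal can be sketched with `pathlib`. The list of suffixes and the decision to also drop the directory part are assumptions, not MassDash's documented behaviour; `strip_extensions` and `run_names` are hypothetical helpers.

```python
from pathlib import PurePosixPath
from typing import List

# Assumed set of suffixes to strip, compared case-insensitively.
KNOWN_SUFFIXES = {".gz", ".mzml", ".mzxml", ".sqmass", ".osw"}


def strip_extensions(filename: str) -> str:
    """Hypothetical helper: drop the directory and any known trailing extensions."""
    name = PurePosixPath(filename).name
    # Peel off suffixes one at a time so compound ones like '.mzML.gz' are removed.
    while PurePosixPath(name).suffix.lower() in KNOWN_SUFFIXES:
        name = PurePosixPath(name).stem
    return name


def run_names(filenames: List[str]) -> List[str]:
    return [strip_extensions(f) for f in filenames]


print(run_names(["/data/run1.mzML.gz", "run2.sqMass", "sample.v2.mzML"]))
# ['run1', 'run2', 'sample.v2']
```

Stripping only known suffixes keeps dots that belong to the run name itself (`sample.v2`), which splitting on the first `.` would lose.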

getScoreTable(score_table: Literal['SCORE_MS2', 'SCORE_MS1', 'SCORE_TRANSITION', 'SCORE_PEPTIDE', 'SCORE_PROTEIN', 'SCORE_IPF', 'FEATURE_MS2', 'FEATURE_MS1'], score: str, context: Literal['run-specific', 'experiment-wide', 'global'] = None) → DataFrame

Get a Pandas DataFrame of target and decoy scores for a given score table and score.

Parameters:

score_table (Literal['SCORE_MS2', 'SCORE_MS1', 'SCORE_TRANSITION', 'SCORE_PEPTIDE', 'SCORE_PROTEIN', 'SCORE_IPF', 'FEATURE_MS2', 'FEATURE_MS1']) – The table in which the score is found.
score (str) – The score to retrieve.

Raises:

ValueError – If the score is not a valid score for plotting.

Returns:

A pandas DataFrame with 3 columns: Decoy, Score, and Run Name

Return type:

pd.DataFrame
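A sketch of how a target/decoy score table could be assembled and validated. Because SQL identifiers cannot be passed as bound parameters, the score column is checked against `PRAGMA table_info` before being interpolated, and an unknown column raises `ValueError` as documented above. It handles only feature-level tables keyed by `FEATURE_ID`, the schema is a toy assumption, and `score_table` is hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Tables this sketch can handle: feature-level tables keyed by FEATURE_ID (assumption).
SUPPORTED_TABLES = {"SCORE_MS2", "FEATURE_MS2"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RUN (ID INTEGER PRIMARY KEY, FILENAME TEXT);
CREATE TABLE PRECURSOR (ID INTEGER PRIMARY KEY, DECOY INTEGER);
CREATE TABLE FEATURE (ID INTEGER PRIMARY KEY, RUN_ID INTEGER, PRECURSOR_ID INTEGER);
CREATE TABLE SCORE_MS2 (FEATURE_ID INTEGER, SCORE REAL, RANK INTEGER);
INSERT INTO RUN VALUES (0, 'runA');
INSERT INTO PRECURSOR VALUES (10, 0), (11, 1);
INSERT INTO FEATURE VALUES (1, 0, 10), (2, 0, 11);
INSERT INTO SCORE_MS2 VALUES (1, 3.2, 1), (2, -1.1, 1);
""")


def score_table(conn, table: str, score: str) -> List[Tuple[int, float, str]]:
    """Hypothetical helper: (Decoy, Score, Run Name) rows for one score column."""
    if table not in SUPPORTED_TABLES:
        raise ValueError(f"{table} is not supported by this sketch")
    # Identifiers cannot be bound as parameters, so validate the column name
    # against the table's real schema before interpolating it into the SQL.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if score not in columns or score == "FEATURE_ID":
        raise ValueError(f"{score} is not a valid score in {table}")
    return conn.execute(f"""
        SELECT PRECURSOR.DECOY, {table}.{score}, RUN.FILENAME
        FROM {table}
        JOIN FEATURE ON FEATURE.ID = {table}.FEATURE_ID
        JOIN PRECURSOR ON PRECURSOR.ID = FEATURE.PRECURSOR_ID
        JOIN RUN ON RUN.ID = FEATURE.RUN_ID
        ORDER BY FEATURE.ID
    """).fetchall()


print(score_table(conn, "SCORE_MS2", "SCORE"))  # [(0, 3.2, 'runA'), (1, -1.1, 'runA')]
```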

getTransitionIDAnnotationFromSequence(fullpeptidename, charge)

Retrieves transition information for a given peptide and charge.

Parameters:

fullpeptidename (str) – The full modified sequence of the peptide.
charge (int) – The precursor charge.

Returns:

The transition information.

Return type:

pandas.DataFrame

get_score_distribution(score_table: str, context: Literal['run-specific', 'experiment-wide', 'global'] = None)

Retrieves the score distribution for a given score table.

Parameters:: score_table (str) – The score table.
Returns:: The score distribution.
Return type:: pandas.DataFrame

get_score_table_contexts(score_table: str)

Retrieves the score contexts available in a given score table from the database.

Parameters:: score_table (str) – The score table.

Returns:: The score contexts.
Return type:: list
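Score contexts could be listed with `SELECT DISTINCT CONTEXT`. A sketch, assuming that only some score tables (for example a peptide-level table) carry a `CONTEXT` column and that a table without one has no contexts; `score_table_contexts` and the toy data are hypothetical.

```python
import sqlite3

# Toy database; which tables carry a CONTEXT column is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SCORE_MS2 (FEATURE_ID INTEGER, SCORE REAL);
CREATE TABLE SCORE_PEPTIDE (CONTEXT TEXT, RUN_ID INTEGER, PEPTIDE_ID INTEGER, SCORE REAL);
INSERT INTO SCORE_PEPTIDE VALUES
  ('run-specific', 0, 1, 2.0), ('run-specific', 1, 1, 1.5),
  ('experiment-wide', 0, 1, 2.1), ('global', NULL, 1, 2.2);
""")


def score_table_contexts(conn, score_table: str):
    """Hypothetical helper: distinct CONTEXT values in a score table."""
    tables = {name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if score_table not in tables:
        raise ValueError(f"No such table: {score_table}")
    # The table name is interpolated only after confirming it exists.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({score_table})")}
    if "CONTEXT" not in columns:
        return []  # e.g. the toy SCORE_MS2 table has no CONTEXT column
    return [ctx for (ctx,) in conn.execute(
        f"SELECT DISTINCT CONTEXT FROM {score_table} ORDER BY CONTEXT")]


print(score_table_contexts(conn, "SCORE_PEPTIDE"))  # ['experiment-wide', 'global', 'run-specific']
```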

get_score_tables()

Retrieves the score tables from the database.

Returns:: The score tables.
Return type:: list
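Score tables can be discovered from SQLite's `sqlite_master` catalogue. This sketch assumes that a score table is any table whose name starts with `SCORE_`; whether MassDash also counts `FEATURE_MS1`/`FEATURE_MS2` is not stated here. `score_tables` is hypothetical.

```python
import sqlite3

# Toy database with a decoy-named table to show why the pattern matters.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FEATURE (ID INTEGER PRIMARY KEY);
CREATE TABLE SCORE_MS2 (FEATURE_ID INTEGER, SCORE REAL);
CREATE TABLE SCORE_PEPTIDE (CONTEXT TEXT, SCORE REAL);
CREATE TABLE SCORES_BACKUP (X INTEGER);
""")


def score_tables(conn):
    """Hypothetical helper: names of tables starting with 'SCORE_'."""
    # GLOB treats '_' literally; LIKE 'SCORE_%' would treat it as a wildcard
    # and also match SCORES_BACKUP.
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name GLOB 'SCORE_*' ORDER BY name")
    return [name for (name,) in rows]


print(score_tables(conn))  # ['SCORE_MS2', 'SCORE_PEPTIDE']
```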

get_top_rank_precursor_feature(fullpeptidename, charge)

Retrieves the top ranking precursor feature for a given peptide and charge.

Parameters:

fullpeptidename (str) – The full modified sequence of the peptide.
charge (int) – The precursor charge.

Returns:

The top ranking precursor feature.

Return type:

pandas.DataFrame

get_top_rank_precursor_features_across_runs()

Retrieves the top ranking precursor features across runs from the database.

Returns:: The top ranking precursor features.
Return type:: pandas.DataFrame

load_data() → DataFrame

Retrieves all the top ranking features from the database.

Returns:: The top ranking features per assay.
Return type:: pandas.DataFrame

validateSQL()

Validates that the connection is a valid SQLite connection.

Returns:: None. Raises an error if the connection is not valid.
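A minimal sketch of such a check, under the assumption that "valid" means "a `sqlite3.Connection` that can execute a query". The exception types (`TypeError`, `ValueError`) are choices made for this illustration, not necessarily what MassDash raises, and `validate_sql` is hypothetical.

```python
import sqlite3


def validate_sql(conn) -> None:
    """Hypothetical check: raise if `conn` is not a usable SQLite connection."""
    if not isinstance(conn, sqlite3.Connection):
        raise TypeError(f"Expected sqlite3.Connection, got {type(conn).__name__}")
    try:
        # sqlite3.connect() does not read the file header, so a non-SQLite file
        # (or a closed connection) only fails once a statement is executed.
        conn.execute("SELECT name FROM sqlite_master LIMIT 1").fetchall()
    except sqlite3.DatabaseError as err:
        raise ValueError(f"Not a valid SQLite connection: {err}") from err


validate_sql(sqlite3.connect(":memory:"))  # passes silently
```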
