massdash.loaders.access.OSWDataAccess

class massdash.loaders.access.OSWDataAccess(*args, mode: Literal['module', 'gui'] = 'module', **kwargs)

Bases: GenericResultsAccess

A class for accessing data from an OpenSWATH SQLite database.

conn

A connection to the SQLite database.

Type:

sqlite3.Connection

c

A cursor for executing SQL statements on the database.

Type:

sqlite3.Cursor

verbose

Whether to print verbose output.

Type:

bool

mode

The mode to use when initializing the data access object; it controls which attributes are initialized.

Type:

str

getAllTopTransitionGroupFeaturesDf() DataFrame

Retrieves all the top ranking features from the database.

Returns:

The top ranking features per assay.

Return type:

pandas.DataFrame
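
As an illustration of what "top ranking features per assay" means, the sketch below selects the rank-1 feature of each assay from a toy in-memory SQLite table. The table name `features` and its columns are assumptions made for this example; they are not the real OpenSWATH schema, and the result is a plain list of tuples rather than a DataFrame.

```python
import sqlite3

# Toy schema for illustration only; the table and column names are
# hypothetical and do not mirror the real OpenSWATH layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (assay TEXT, rt REAL, peak_rank INTEGER)")
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?)",
    [
        ("PEPTIDEK/2", 1200.5, 1),
        ("PEPTIDEK/2", 1310.0, 2),
        ("ELVISK/3", 845.2, 1),
        ("ELVISK/3", 990.7, 3),
    ],
)


def top_features(connection):
    """Return the rank-1 feature of every assay, ordered by assay name."""
    query = (
        "SELECT assay, rt FROM features "
        "WHERE peak_rank = 1 ORDER BY assay"
    )
    return connection.execute(query).fetchall()


top = top_features(conn)
print(top)
```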

getIdentifiedPeptides(qvalue: float = 0.01, run: str | None = None) set | Dict[str, set]

Get the identified peptides at a certain q-value.

Parameters:
  • qvalue (float) – The q-value threshold for identification.

  • run (str) – The run name for which to get the identified peptides. If None, peptides are returned for all runs.

Returns:

The identified peptides across all runs (Dict[str, set]) or for a single run (set)
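
The two return shapes (a set for one run, a dictionary of sets for all runs) can be sketched with a toy table. Everything below is an assumption for illustration: the table `peptide_scores`, its columns, and the use of `<=` for the threshold comparison are not taken from the library.

```python
import sqlite3

# Hypothetical table: one row per (run, peptide) with its q-value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE peptide_scores (run TEXT, peptide TEXT, qvalue REAL)")
conn.executemany(
    "INSERT INTO peptide_scores VALUES (?, ?, ?)",
    [
        ("run_a", "PEPTIDEK", 0.001),
        ("run_a", "ELVISK", 0.20),
        ("run_b", "PEPTIDEK", 0.005),
        ("run_b", "ELVISK", 0.009),
    ],
)


def identified_peptides(connection, qvalue=0.01, run=None):
    """Return a set for one run, or a dict mapping run name to set for all runs."""
    if run is not None:
        rows = connection.execute(
            "SELECT peptide FROM peptide_scores WHERE run = ? AND qvalue <= ?",
            (run, qvalue),
        )
        return {peptide for (peptide,) in rows}

    # No run given: group the passing peptides by run name.
    result = {}
    rows = connection.execute(
        "SELECT run, peptide FROM peptide_scores WHERE qvalue <= ?", (qvalue,)
    )
    for run_name, peptide in rows:
        result.setdefault(run_name, set()).add(peptide)
    return result


single_run = identified_peptides(conn, run="run_a")
all_runs = identified_peptides(conn)
print(single_run)
print(all_runs)
```

Callers that accept either shape can branch on `isinstance(result, dict)` before iterating.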

getIdentifiedPrecursorIntensities(qvalue: float = 0.01, run: str | None = None, precursorLevel=False)

Get the identified precursor intensities at a certain q-value.

Parameters:

qvalue, run, precursorLevel – Arguments passed on to getIdentifiedPrecursors; see that method for their meaning.

Returns:

The identified precursor intensities across all runs (DataFrame with columns: Precursor, runName, Intensity) or for a single run (DataFrame with columns: Precursor, Intensity)

Return type:

pandas.DataFrame

getIdentifiedPrecursors(qvalue: float = 0.01, run: str | None = None, precursorLevel=False)

Retrieves the set of identified precursors.

Parameters:
  • run (str) – The run name.

  • qvalue (float) – The q-value threshold.

  • precursorLevel (bool) – If True, q-value filtering is applied only at the precursor level.

getIdentifiedProteins(qvalue: float = 0.01, run: str | None = None) set | Dict[str, set]

Get the identified proteins at a certain q-value.

Parameters:
  • qvalue (float) – The q-value threshold for identification.

  • run (str) – The run name for which to get the identified proteins. If None, proteins are returned for all runs.

Returns:

The identified proteins across all runs (Dict[str, set]) or for a single run (set)

getPeptideTable(remove_ipf_peptides=True)

Retrieves the peptide table from the database.

Parameters:

remove_ipf_peptides (bool) – Whether to remove IPF peptides from the table.

Returns:

The peptide table.

Return type:

pandas.DataFrame

getPeptideTableFromProteinID(protein_id, remove_ipf_peptide=True)

Retrieves the peptide table from the database for a given protein ID.

Parameters:
  • protein_id (int) – The protein ID.

  • remove_ipf_peptide (bool) – Whether to remove IPF peptides from the table.

Returns:

The peptide table.

Return type:

pandas.DataFrame

getPeptideTransitionInfo(fullpeptidename, charge)

Retrieves transition information for a given peptide and charge.

Parameters:
  • fullpeptidename (str) – The full modified sequence of the peptide.

  • charge (int) – The precursor charge.

Returns:

The transition information.

Return type:

pandas.DataFrame

getPrecursorCharges(fullpeptidename)

Retrieves the precursor charges for a given peptide.

Parameters:

fullpeptidename (str) – The full modified sequence of the peptide.

Returns:

The precursor charges.

Return type:

pandas.DataFrame

getProteinTable(include_decoys=False)

Retrieves the protein table from the database.

Parameters:

include_decoys (bool) – Whether to include decoy proteins in the table.

Returns:

The protein table.

Return type:

pandas.DataFrame

getRunNames() List[str]

Infers the run names from the results file; file extensions are removed.

Returns:

The run names

Return type:

list
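
One plausible way to derive a run name from a stored file path is to drop the directory and everything after the first dot. The helper `run_name_from_path` below is hypothetical and only sketches that idea; the library's actual stripping rules may differ.

```python
import os


def run_name_from_path(path):
    """Strip the directory and every extension from a file path."""
    return os.path.basename(path).split(".")[0]


files = ["/data/sample_01.mzML", "sample_02.mzML.gz", "sample_03.sqMass"]
run_names = [run_name_from_path(f) for f in files]
print(run_names)
```

Splitting at the first dot also truncates base names that themselves contain dots, so this approach only suits file names without them.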

getScoreTable(score_table: Literal['SCORE_MS2', 'SCORE_MS1', 'SCORE_TRANSITION', 'SCORE_PEPTIDE', 'SCORE_PROTEIN', 'SCORE_IPF', 'FEATURE_MS2', 'FEATURE_MS1'], score: str, context: Literal['run-specific', 'experiment-wide', 'global'] = None) DataFrame

Get a Pandas DataFrame of target and decoy scores for a given score table and score.

Parameters:
  • score_table (Literal["SCORE_MS2", "SCORE_MS1", "SCORE_TRANSITION", "SCORE_PEPTIDE", "SCORE_PROTEIN", "SCORE_IPF", "FEATURE_MS2", "FEATURE_MS1"]) – The table in which the score is found

  • score (str) – The score to retrieve

Raises:

ValueError – Score is not valid score for plotting

Returns:

A pandas DataFrame with 3 columns: Decoy, Score, and Run Name

Return type:

pandas.DataFrame
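
The pattern described here, rejecting an unknown score with a ValueError and otherwise returning decoy flag, score, and run name, can be sketched as follows. The table `toy_scores`, its columns, the `ALLOWED_TABLES` set, and the helper `score_rows` are all assumptions for this example; the rows come back as tuples rather than a DataFrame.

```python
import sqlite3

# Hypothetical score table; the real tables have a different layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_scores (decoy INTEGER, run_name TEXT, main_score REAL)"
)
conn.executemany(
    "INSERT INTO toy_scores VALUES (?, ?, ?)",
    [(0, "run_a", 4.2), (1, "run_a", -1.3), (0, "run_b", 3.7)],
)

ALLOWED_TABLES = {"toy_scores"}


def score_rows(connection, table, score):
    """Return (decoy, score, run_name) rows, validating names first."""
    # Table and column names cannot be bound as SQL parameters, so both are
    # checked against known values before being placed in the query text.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"{table!r} is not a known score table")
    columns = {row[1] for row in connection.execute(f"PRAGMA table_info({table})")}
    if score not in columns:
        raise ValueError(f"{score!r} is not a column of {table}")
    query = f"SELECT decoy, {score}, run_name FROM {table} ORDER BY rowid"
    return connection.execute(query).fetchall()


rows = score_rows(conn, "toy_scores", "main_score")
print(rows)

try:
    score_rows(conn, "toy_scores", "no_such_score")
except ValueError as err:
    error = str(err)
    print(error)
```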

getTransitionIDAnnotationFromSequence(fullpeptidename, charge)

Retrieves transition information for a given peptide and charge.

Parameters:
  • fullpeptidename (str) – The full modified sequence of the peptide.

  • charge (int) – The precursor charge.

Returns:

The transition information.

Return type:

pandas.DataFrame

get_score_distribution(score_table: str, context: Literal['run-specific', 'experiment-wide', 'global'] = None)

Retrieves the score distribution for a given score table.

Parameters:

score_table (str) – The score table.

Returns:

The score distribution.

Return type:

pandas.DataFrame

get_score_table_contexts(score_table: str)

Retrieves the score contexts from the database.

Returns:

The score contexts.

Return type:

list

get_score_tables()

Retrieves the score tables from the database.

Returns:

The score tables.

Return type:

list
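
Listing score tables and their contexts amounts to inspecting the database catalogue and then reading the distinct context values. The sketch below does this with the standard `sqlite3` module on a toy database; the reduced column sets and the helpers `list_score_tables` and `list_contexts` are assumptions for illustration, not the library's implementation.

```python
import sqlite3

# Toy database with heavily reduced, hypothetical column sets.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SCORE_MS2 (FEATURE_ID INTEGER, SCORE REAL)")
conn.execute(
    "CREATE TABLE SCORE_PEPTIDE (CONTEXT TEXT, PEPTIDE_ID INTEGER, SCORE REAL)"
)
conn.execute("CREATE TABLE RUN (ID INTEGER, FILENAME TEXT)")
conn.executemany(
    "INSERT INTO SCORE_PEPTIDE VALUES (?, ?, ?)",
    [("global", 1, 2.5), ("run-specific", 1, 2.1), ("global", 2, 0.4)],
)


def list_score_tables(connection):
    """Return the names of all tables that start with 'SCORE_'."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows if name.startswith("SCORE_")]


def list_contexts(connection, table):
    """Return the distinct CONTEXT values of a score table, if it has any."""
    if table not in list_score_tables(connection):
        raise ValueError(f"{table!r} is not a score table")
    columns = {row[1] for row in connection.execute(f"PRAGMA table_info({table})")}
    if "CONTEXT" not in columns:
        return []  # this table carries no context column
    rows = connection.execute(
        f"SELECT DISTINCT CONTEXT FROM {table} ORDER BY CONTEXT"
    )
    return [context for (context,) in rows]


tables = list_score_tables(conn)
peptide_contexts = list_contexts(conn, "SCORE_PEPTIDE")
ms2_contexts = list_contexts(conn, "SCORE_MS2")
print(tables, peptide_contexts, ms2_contexts)
```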

get_top_rank_precursor_feature(fullpeptidename, charge)

Retrieves the top ranking precursor feature for a given peptide and charge.

Parameters:
  • fullpeptidename (str) – The full modified sequence of the peptide.

  • charge (int) – The precursor charge.

Returns:

The top ranking precursor feature.

Return type:

pandas.DataFrame

get_top_rank_precursor_features_across_runs()

Retrieves the top ranking precursor features across runs from the database.

Returns:

The top ranking precursor features.

Return type:

pandas.DataFrame

load_data() DataFrame

Retrieves all the top ranking features from the database.

Returns:

The top ranking features per assay.

Return type:

pandas.DataFrame

validateSQL()

Validates that the connection is a true SQLite connection.

Returns:

None. An error is raised if the connection is not valid.
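
Because `sqlite3.connect` opens files lazily, a connection to a non-SQLite file only fails once a statement is executed. A minimal sketch of such a check, assuming a hypothetical helper `validate_sqlite` that raises ValueError (the exception type the library raises is not stated here):

```python
import os
import sqlite3
import tempfile


def validate_sqlite(connection):
    """Raise ValueError unless the connection points at a readable SQLite database."""
    try:
        # Reading the catalogue forces SQLite to parse the file header.
        connection.execute("SELECT name FROM sqlite_master LIMIT 1").fetchall()
    except sqlite3.DatabaseError as err:
        raise ValueError(f"Not a valid SQLite database: {err}") from err


# A genuine (in-memory) database passes silently.
good = sqlite3.connect(":memory:")
validate_sqlite(good)
good_ok = True

# A text file opens without complaint but fails on the first query.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "not_a_database.osw")
    with open(path, "wb") as handle:
        handle.write(b"this is plain text, not SQLite\n" * 100)
    bad = sqlite3.connect(path)
    try:
        validate_sqlite(bad)
        bad_rejected = False
    except ValueError:
        bad_rejected = True
    finally:
        bad.close()

print(good_ok, bad_rejected)
```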