Data at the UCLA Graeber lab was traditionally visualized in 2D. Might new patterns and insights emerge when visualized in 3D?
I authored a 3D Principal Component Analysis (PCA) script that allows UCLA biologists to visualize the first three principal components in a 3D space. For reference, PCA is a statistical procedure that summarizes the information content of large data tables by means of a smaller set of “summary indices” that can be more easily visualized and analyzed (source: https://tinyurl.com/zef3dt85).
In Principal Component Analysis of gene expression data, lipidomics, and similar datasets, the first two components are usually the most significant of all those generated. But in some cases, such as with the Uveal Melanoma data, the first three components show significant gene expression trends, and the only way to view them was to clumsily compare two 2D PCA plots: one for x=PC1 vs. y=PC2, and another for x=PC2 vs. y=PC3.
As it turned out, other teams in the Graeber lab faced this same problem of not being able to visualize all three components at once. It became clear that the lab needed a 3D plotting function applicable to every project that requires Principal Component Analysis.
To solve this problem, I determined that users needed to view the 3D plot and rotate it with the mouse to get the most out of their results. Data point labels were also necessary, as was the ability to color specific groupings of points uniquely. In addition, I added the option to connect the dots and draw lines between groupings in case a 3D shape existed, in order to validate the use of three components.
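The core computation behind such a plot can be sketched briefly. The lab tool itself is an R function, so the following is only a minimal Python illustration of projecting samples onto the first three principal components; the function name `pca_scores` and the data are hypothetical:

```python
import numpy as np

def pca_scores(data, n_components=3):
    """Project samples onto the top principal components via SVD.

    data: (n_samples, n_features) array. Returns an
    (n_samples, n_components) score matrix, suitable for a
    3D scatter plot when n_components=3.
    """
    centered = data - data.mean(axis=0)           # center each feature
    # SVD of the centered matrix; rows of vt are the principal axes
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T         # scores = projections

# Hypothetical example: 10 samples, 5 measured features
rng = np.random.default_rng(0)
scores = pca_scores(rng.normal(size=(10, 5)))
print(scores.shape)  # (10, 3)
```

The three score columns would then feed an interactive, mouse-rotatable 3D scatter plot; in R this kind of interactive 3D rendering is commonly done with packages such as rgl.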
This project became an R function that is used across all teams at the Graeber lab and has been transformative in interpreting Uveal Melanoma, Prostate Cancer, and Ovarian Cancer data. Output data plots have been showcased at the UCLA Melanoma P01 Retreat 2022 and in an upcoming Covid research paper.
1. Can Viewing Data in 3D Reveal New Patterns and Insights?
2. A Data Pipeline Reworked for Efficient and Effective Transfer of Information.
Streamlining the existing Metabolomics pipeline to improve the reliability of the R scripts and shorten turnaround for clients from labs across UCLA, Caltech, Stanford, UCSD, and elsewhere.
A data pipeline is a series of data processing steps that ensures the efficient sharing and communication of information. At the UCLA Metabolomics Lab, biological samples (such as viruses, cancer cells, etc.) are analyzed via mass spectrometry, where key metabolites present in the samples are measured for significance. Measured metabolites include Tryptophan, Fructose, Aspartame, etc. Metabolites, or their isotopologues, with insufficient data values get filtered out. I focused on improving the metabolite filtering, among other improvements to the scripts.
Both over- and under-filtering are erroneous and have the potential to mislead the interpretation of results. Prior to my improvements, the scripts were under-filtering metabolites, which can make interpretation especially confusing, and possibly incorrect, when the metabolites in a set are labeled and have isotopologues. Metabolites that show data in only one of the three replicate samples should be removed both initially and again after natural isotope correction (where the mass spectrometer values are adjusted for the natural abundance of carbon and nitrogen isotopes).
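The replicate rule above can be illustrated with a small sketch. This is not the pipeline's R code; the data layout (a mapping from metabolite name to three replicate values, with `None` marking a missing measurement) and the threshold are hypothetical:

```python
def filter_by_replicates(measurements, min_observed=2):
    """Drop metabolites detected in fewer than min_observed replicates.

    measurements: dict mapping metabolite name -> list of 3 replicate
    values, where None marks a missing measurement.
    """
    return {
        name: values
        for name, values in measurements.items()
        if sum(v is not None for v in values) >= min_observed
    }

data = {
    "Tryptophan": [1.2, 1.1, 1.3],    # present in all 3 replicates: kept
    "Fructose":   [0.8, None, None],  # present in only 1: filtered out
}
kept = filter_by_replicates(data)
print(sorted(kept))  # ['Tryptophan']
```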
This filtering of insignificant metabolites is applied across all UCLA Metabolomics Center datasets.
I collaborated with coworkers to add three types of data filtering:
1. Initial data filtering on labeled datasets (Metabolites with isotopologues) that removes insignificant metabolite isotopologues.
2. Refined existing filtering to not apply to the M0 isotopologue and the standards (metabolites of a standardized amount, independent from data).
3. Repeated data filtering on labeled datasets after natural isotope correction is applied.
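Steps 1 and 2 above might be combined into a single pass roughly like this. The real pipeline is in R and its rules are more involved; the row layout, the standard "Norvaline", and the function name are all hypothetical:

```python
def filter_isotopologues(rows, standards, min_observed=2):
    """Apply replicate filtering to labeled data, sparing M0 and standards.

    rows: list of (metabolite, isotopologue_index, replicate_values)
    tuples; standards: set of metabolites measured at standardized
    amounts. M0 rows and standards are always kept.
    """
    kept = []
    for metabolite, m_index, values in rows:
        exempt = m_index == 0 or metabolite in standards
        observed = sum(v is not None for v in values)
        if exempt or observed >= min_observed:
            kept.append((metabolite, m_index, values))
    return kept

rows = [
    ("Glucose", 0, [None, None, 5.0]),   # M0: kept regardless
    ("Glucose", 1, [None, None, 0.2]),   # only 1 replicate: dropped
    ("Norvaline", 2, [None, None, 0.1]), # a standard: kept
]
kept = filter_isotopologues(rows, standards={"Norvaline"})
print(len(kept))  # 2
```

Step 3 then amounts to rerunning the same filter on the values produced by natural isotope correction.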
This initiative has been transformative for the accuracy of the metabolomics analysis: the output data is more succinct and more reliable to interpret. In addition to the filtering improvements, I adapted a new natural isotope correction algorithm, covering carbon, nitrogen, and deuterium, into the pipeline. With all these improvements, the Metabolomics Center can process data faster and more reliably.
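A standard formulation of natural isotope correction treats the observed isotopologue distribution as a triangular linear system: a correction matrix gives the probability that a molecule with j tracer-labeled atoms is observed at mass shift i due to natural heavy isotopes, and solving the system recovers the tracer-only distribution. Below is a carbon-only sketch of that idea, not the pipeline's actual algorithm (which also corrects for nitrogen and deuterium); the constant and function names are illustrative:

```python
from math import comb

P13C = 0.0107  # natural abundance of carbon-13 (~1.07%)

def carbon_correction_matrix(n_carbons):
    """m[i][j] = probability that a molecule with j tracer-labeled carbons
    is observed as isotopologue M+i, due to natural 13C occurring in the
    remaining n_carbons - j positions."""
    n = n_carbons
    m = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        for i in range(j, n + 1):
            k = i - j  # extra mass shifts contributed by natural 13C
            m[i][j] = comb(n - j, k) * P13C**k * (1 - P13C)**(n - j - k)
    return m

def correct(observed, n_carbons):
    """Solve m @ true = observed by forward substitution (m is lower-triangular)."""
    m = carbon_correction_matrix(n_carbons)
    true = []
    for i, obs in enumerate(observed):
        s = obs - sum(m[i][j] * true[j] for j in range(i))
        true.append(s / m[i][i])
    return true

# Example: a fully unlabeled 3-carbon metabolite, as the spectrometer
# would see it with natural 13C; correction recovers ~[1, 0, 0, 0]
p = P13C
observed = [(1 - p)**3, 3 * p * (1 - p)**2, 3 * p**2 * (1 - p), p**3]
corrected = correct(observed, 3)
```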