Gene Expression of Integrins
The nucleotide sequences of the integrin genes encode integrin proteins. Integrins mediate interactions between cells and the extracellular matrix, supporting cell adhesion, migration, signaling, and differentiation, as well as cell-to-cell communication.
In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
In [2]:
integrins = pd.read_excel(r"C:\Users\QBPAM\Downloads\'25 summer BigData AI Cancer class by Yongmei Wang\gtex_integrin_7_organs.xlsx")
integrins
Out[2]:
| | primary_site | ITGA10 | ITGAD | ITGAM | ITGA3 | ITGBL1 | ITGAE | ITGA2 | ITGB3 | ITGA7 | ... | ITGA6 | ITGA2B | ITGB1 | ITGAL | ITGA9 | ITGB5 | ITGA8 | ITGA4 | ITGA1 | ITGA11 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Brain | 0.5763 | -6.5064 | 2.2573 | 0.7832 | 1.0363 | 4.6035 | 2.5731 | -2.8262 | 4.9663 | ... | 2.8562 | 1.3846 | 5.8430 | 1.1316 | -0.7108 | 3.5387 | -0.0725 | -0.4521 | 0.2029 | -2.8262 |
1 | Lung | 4.9137 | -3.6259 | 4.7307 | 7.1584 | 1.7702 | 4.9556 | 1.9149 | 2.6067 | 3.9270 | ... | 4.2412 | 4.1211 | 7.7256 | 4.4900 | 2.9281 | 6.1483 | 5.1867 | 2.6185 | 4.7856 | -0.0277 |
2 | Ovary | 2.3953 | -5.0116 | 1.4547 | 4.2593 | -0.7346 | 4.4149 | 0.2642 | 1.5216 | 4.3492 | ... | 3.6816 | 1.5465 | 7.2964 | -0.9406 | 2.7742 | 5.0414 | 2.0325 | 0.7579 | 2.2573 | 1.2516 |
3 | Lung | 4.0541 | -2.3147 | 4.5053 | 7.5651 | 4.1788 | 4.1772 | 5.3695 | 1.8444 | 4.5355 | ... | 4.9631 | 1.9149 | 7.9947 | 3.3911 | 2.8462 | 6.7683 | 4.1636 | 2.7951 | 5.3284 | 1.2147 |
4 | Breast | 2.0569 | -2.4659 | 3.3993 | 3.1311 | 3.0074 | 4.4977 | -1.7809 | 2.7139 | 7.8698 | ... | 4.7340 | 0.6332 | 7.3496 | -0.9406 | 2.5338 | 6.5696 | 1.7229 | -0.6416 | 3.1195 | 1.1050 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1982 | Lung | 5.3067 | -3.8160 | 4.9065 | 7.5810 | 5.8714 | 4.7345 | 2.6185 | 3.1095 | 5.2032 | ... | 5.6080 | 3.7324 | 8.2849 | 4.6201 | 3.6440 | 6.7052 | 5.1094 | 3.3364 | 5.8153 | 1.6604 |
1983 | Prostate | 2.9581 | -4.6082 | 1.1641 | 4.6938 | 1.5902 | 5.8625 | -0.5125 | 1.7617 | 7.4152 | ... | 3.8798 | -1.4699 | 7.5163 | -0.3752 | 2.9562 | 5.3035 | 4.4304 | -0.9406 | 3.6136 | 0.4233 |
1984 | Breast | 4.3184 | -6.5064 | 1.0433 | 4.8440 | 3.5498 | 4.6809 | 1.0293 | 3.3478 | 6.2136 | ... | 5.3256 | -0.0725 | 7.7516 | 1.1382 | 2.1411 | 7.1132 | 0.3796 | 0.0854 | 3.8650 | 1.0151 |
1985 | Brain | 3.4622 | -5.5735 | 1.5013 | 5.4835 | 1.7702 | 4.7517 | 0.6790 | -3.1714 | 5.3597 | ... | 1.1960 | 4.1740 | 4.3002 | 0.5470 | -0.9971 | 3.7982 | -0.2498 | 1.4808 | -0.5125 | -0.5125 |
1986 | Lung | 2.5585 | -1.7809 | 6.7916 | 6.5865 | 2.7051 | 4.9519 | 4.3618 | 3.1892 | 7.7121 | ... | 3.5779 | 2.8974 | 7.7685 | 4.8294 | 1.9149 | 5.9989 | 2.4117 | 2.4198 | 4.2080 | 1.0007 |
1987 rows × 28 columns
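The file name mentions seven organs, so a quick sanity check after loading is to count samples per primary site and confirm how many gene columns there are. The sketch below runs on a small made-up table (hypothetical values and sites) so that it is self-contained. On the real data the same calls would be `integrins['primary_site'].value_counts()` and `integrins.select_dtypes(include = 'number').shape[1]`; assuming every gene column was read as numeric, the latter should report 27 (28 columns minus `primary_site`).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the GTEx integrin table: made-up values, three sites, three genes
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "primary_site": ["Brain", "Lung", "Brain", "Liver", "Lung", "Brain"],
    "ITGA10": rng.normal(size=6),
    "ITGB1": rng.normal(size=6),
    "ITGA11": rng.normal(size=6),
})

# Number of samples per primary site, most frequent first
site_counts = toy["primary_site"].value_counts()
print(site_counts)

# Gene columns are the numeric columns; primary_site is the only text (label) column
gene_columns = toy.select_dtypes(include="number").columns
print(len(gene_columns), "gene columns")
```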
In [3]:
#pd.set_option('display.max_rows', None) #show all rows (no maximum on the number of rows displayed)
#pd.set_option('display.max_columns', None) #show all columns (no maximum on the number of columns displayed)
#pd.reset_option('display.max_rows') #restore the default number of rows displayed
#pd.reset_option('display.max_columns') #restore the default number of columns displayed
brain_integrins = integrins[integrins['primary_site'] == 'Brain']
brain_integrins
Out[3]:
| | primary_site | ITGA10 | ITGAD | ITGAM | ITGA3 | ITGBL1 | ITGAE | ITGA2 | ITGB3 | ITGA7 | ... | ITGA6 | ITGA2B | ITGB1 | ITGAL | ITGA9 | ITGB5 | ITGA8 | ITGA4 | ITGA1 | ITGA11 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Brain | 0.5763 | -6.5064 | 2.2573 | 0.7832 | 1.0363 | 4.6035 | 2.5731 | -2.8262 | 4.9663 | ... | 2.8562 | 1.3846 | 5.8430 | 1.1316 | -0.7108 | 3.5387 | -0.0725 | -0.4521 | 0.2029 | -2.8262 |
8 | Brain | 2.2960 | -9.9658 | 0.6608 | 5.2840 | 0.4233 | 4.8510 | -0.2671 | -0.1031 | 4.3068 | ... | 1.5415 | 4.6623 | 3.4687 | 0.5666 | -0.0130 | 3.0654 | 0.7916 | 1.0433 | -0.7346 | -0.7588 |
10 | Brain | -0.2498 | -9.9658 | -0.8863 | 3.1685 | -1.6394 | 2.8158 | -0.4719 | -1.1488 | 2.5313 | ... | 1.6045 | 0.9268 | 2.8055 | -0.5973 | 0.4657 | 1.8918 | 0.3460 | 0.3907 | -1.9942 | -1.5522 |
12 | Brain | 1.6045 | -6.5064 | 2.3193 | 3.6335 | -2.3147 | 5.0670 | -0.8863 | -0.8084 | 5.3937 | ... | 3.2018 | 1.7575 | 4.6894 | 0.4125 | -0.6643 | 3.6916 | -0.6193 | -2.2447 | 1.2023 | -1.9942 |
14 | Brain | 2.8974 | -6.5064 | 1.9601 | 4.1836 | -0.8084 | 4.5892 | -0.5543 | 0.3460 | 5.7522 | ... | 3.6018 | 2.7931 | 4.7274 | -0.0574 | 1.2271 | 4.3793 | 0.8488 | -0.2159 | 2.1378 | -0.6416 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1977 | Brain | -0.3383 | -6.5064 | 1.6234 | 2.7487 | -2.2447 | 5.2415 | -0.8863 | -2.9324 | 4.7165 | ... | 2.1988 | 0.4016 | 4.5142 | -1.1811 | -0.8084 | 3.9983 | -1.0862 | -3.1714 | -0.7588 | -1.9379 |
1978 | Brain | 0.4447 | -5.5735 | 0.3231 | 3.5237 | -1.5105 | 4.9016 | 0.9419 | -2.7274 | 4.9547 | ... | 2.8178 | 1.3567 | 4.4621 | -0.2845 | 1.0222 | 3.3336 | 0.1903 | -1.0559 | 0.0300 | -0.4719 |
1980 | Brain | 0.6969 | -6.5064 | -0.9686 | 2.3760 | -2.2447 | 4.0739 | -0.6193 | -4.0350 | 4.8788 | ... | 2.7357 | 1.5806 | 4.6882 | -0.9971 | -0.5756 | 3.5136 | 0.9343 | -1.0862 | 0.4340 | -2.2447 |
1981 | Brain | 0.1124 | -5.0116 | 2.2482 | 2.8897 | -0.5125 | 4.6445 | 0.3115 | -3.6259 | 4.5110 | ... | 2.1147 | 0.9716 | 5.1202 | 0.6608 | 0.4761 | 3.2343 | 0.8408 | -0.0574 | -0.1828 | -2.5479 |
1985 | Brain | 3.4622 | -5.5735 | 1.5013 | 5.4835 | 1.7702 | 4.7517 | 0.6790 | -3.1714 | 5.3597 | ... | 1.1960 | 4.1740 | 4.3002 | 0.5470 | -0.9971 | 3.7982 | -0.2498 | 1.4808 | -0.5125 | -0.5125 |
1152 rows × 28 columns
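The commented-out `set_option`/`reset_option` lines in the cell above change the display settings globally until they are reset. A scoped alternative is `pd.option_context`, which applies the settings only inside a `with` block and restores the previous values on exit. A minimal sketch on a made-up 70-row table (longer than pandas' default limit of 60 displayed rows):

```python
import pandas as pd

# Hypothetical 70-row table, longer than the default display.max_rows of 60
df = pd.DataFrame({"value": range(70)})

default_rows = pd.get_option("display.max_rows")

# Inside the with-block every row and column is displayed
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    rows_inside = pd.get_option("display.max_rows")
    print(df)

# On exit the previous settings return automatically, so no reset_option call is needed
rows_after = pd.get_option("display.max_rows")
print(rows_inside, rows_after == default_rows)
```

In a notebook, `display(df)` inside the block would give the usual rich table instead of plain text.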
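Violin plots show how a numeric variable is distributed within each group: each violin's outline is a smoothed estimate of the distribution, so it is wide where values are concentrated and narrow where they are rare. Placing one violin per gene lets us compare those distributions side by side.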
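The outline of a violin is a kernel density estimate (KDE) of the values, mirrored about a vertical axis. The sketch below computes a Gaussian KDE and the quartiles for made-up values of one gene, using SciPy's `gaussian_kde` and NumPy. Seaborn uses its own KDE code, and its bandwidth and how far the curve extends past the data may differ, so treat this as an approximation of what the plot draws rather than seaborn's implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Made-up expression values for one gene in one organ (not GTEx data)
rng = np.random.default_rng(1)
values = rng.normal(loc=2.0, scale=1.0, size=200)

# The violin outline: a Gaussian kernel density estimate evaluated on a grid
kde = gaussian_kde(values)
grid = np.linspace(values.min() - 3, values.max() + 3, 500)
density = kde(grid)

# Quartile lines, as drawn inside the violin when inner = 'quart'
q1, median, q3 = np.percentile(values, [25, 50, 75])

# A density should integrate to about 1 over a grid that covers the data (Riemann sum)
area = density.sum() * (grid[1] - grid[0])
print(f"Q1 = {q1:.2f}, median = {median:.2f}, Q3 = {q3:.2f}, area = {area:.3f}")
```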
In [4]:
#violin plot for all the genes of the brain
plt.figure(figsize = (16, 6))
sns.violinplot(data = brain_integrins)
plt.title("Integrin Genes of the Brain")
plt.xlabel("Integrin Genes")
plt.ylabel("Gene Expression Levels")
plt.show()
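The y-axis above is labeled only "Gene Expression Levels", and the notebook does not say how the values were transformed. One clue: brain rows 8 and 10 show exactly -9.9658 for ITGAD, and log2(0.001) ≈ -9.9658. That is consistent with values stored as log2(x + 0.001), a pseudocount transform used, for example, in UCSC Xena's recomputed GTEx TPM tables. This is an assumption, not something the data file states; the sketch below checks the arithmetic and shows how values would be back-transformed if it holds.

```python
import numpy as np

# ASSUMPTION: values are log2(x + 0.001); the notebook does not document the transform
PSEUDOCOUNT = 0.001

# What a zero expression value would become under that transform
floor = np.log2(0.0 + PSEUDOCOUNT)
print(round(floor, 4))

def to_linear(log_values, pseudocount=PSEUDOCOUNT):
    """Undo log2(x + pseudocount); only meaningful if the assumed transform is right."""
    return np.power(2.0, log_values) - pseudocount

# -9.9658 maps back to (almost exactly) zero; 3.0 maps back to 2**3 - 0.001
print(to_linear(np.array([-9.9658, 0.0, 3.0])))
```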
In [5]:
liver_integrins = integrins[integrins['primary_site'] == 'Liver']
liver_integrins
Out[5]:
| | primary_site | ITGA10 | ITGAD | ITGAM | ITGA3 | ITGBL1 | ITGAE | ITGA2 | ITGB3 | ITGA7 | ... | ITGA6 | ITGA2B | ITGB1 | ITGAL | ITGA9 | ITGB5 | ITGA8 | ITGA4 | ITGA1 | ITGA11 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13 | Liver | -0.0277 | -4.2934 | -0.3201 | 0.4340 | -1.2828 | 2.8055 | -2.9324 | -1.9379 | 2.6940 | ... | 1.1960 | -2.6349 | 4.4758 | 2.8582 | -0.1031 | 4.0454 | -2.5479 | -1.0262 | 2.7465 | -2.8262 |
49 | Liver | -0.1828 | -0.8339 | -0.5973 | 0.5568 | 0.6880 | 3.1278 | -3.3076 | -0.7346 | 2.3366 | ... | 1.0779 | -2.9324 | 5.3169 | 2.5213 | 0.7664 | 4.3958 | -0.7346 | -1.1488 | 3.0110 | -2.9324 |
62 | Liver | -1.4699 | -3.8160 | 0.5271 | 2.1313 | 2.9148 | 2.9984 | -1.9942 | -0.0277 | 3.4007 | ... | 2.3164 | -1.7322 | 6.0885 | 2.2813 | 2.8462 | 5.4683 | -1.9942 | -1.1488 | 3.4183 | -0.0877 |
65 | Liver | -0.3940 | -4.6082 | 0.3346 | -0.1504 | -1.4699 | 2.6624 | -3.0469 | 0.5568 | 1.6327 | ... | 0.4340 | -1.5522 | 5.4611 | 1.4704 | 0.3907 | 4.9538 | -3.4580 | -2.9324 | 3.4451 | -3.1714 |
83 | Liver | -0.0425 | -1.1488 | -0.2498 | 0.5069 | 0.7916 | 2.9281 | -2.8262 | -0.4325 | 2.1411 | ... | 1.4441 | 0.2400 | 5.1993 | 3.0287 | 0.9191 | 4.4932 | -2.5479 | 0.0014 | 3.3745 | -1.4699 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1923 | Liver | -1.0559 | -2.3884 | 1.8078 | -0.0425 | -1.4305 | 2.5852 | -4.0350 | 0.2998 | 3.2220 | ... | 2.1509 | -1.9942 | 6.6547 | 2.2513 | 2.1509 | 5.4283 | -2.6349 | -0.5973 | 3.9728 | -2.5479 |
1924 | Liver | 0.8805 | -5.5735 | 0.8164 | 0.9642 | -1.9379 | 3.3952 | -3.6259 | -1.2828 | 2.2082 | ... | 0.9862 | -2.4659 | 5.2510 | 2.0844 | 0.7146 | 5.1863 | -2.5479 | -1.9379 | 3.8401 | -1.5951 |
1930 | Liver | 0.6608 | -6.5064 | -0.1031 | -0.4325 | -2.2447 | 3.3076 | -3.6259 | -1.6394 | 1.8160 | ... | 1.4652 | -0.9686 | 5.6221 | 2.0325 | 0.4761 | 4.9855 | -4.6082 | -1.6394 | 3.4251 | -3.1714 |
1954 | Liver | -1.1811 | -2.3884 | 0.7058 | 0.6239 | 1.2934 | 3.1813 | -3.4580 | -1.1172 | 2.6208 | ... | 1.7141 | -1.7809 | 5.8746 | 2.5388 | 1.9302 | 5.1615 | -2.3884 | -0.5332 | 3.8126 | -1.0262 |
1969 | Liver | -0.6873 | -3.4580 | -0.5125 | -0.3566 | -0.4921 | 3.0654 | -4.0350 | -1.5951 | 2.3337 | ... | 0.9493 | -1.9942 | 5.2563 | 2.5924 | -0.3752 | 4.5053 | -4.6082 | -2.2447 | 3.1458 | -2.8262 |
110 rows × 28 columns
In [6]:
#violin plot for all the genes of the liver
plt.figure(figsize = (16, 6))
sns.violinplot(data = liver_integrins)
plt.title("Integrin Genes of the Liver")
plt.xlabel("Integrin Genes")
plt.ylabel("Gene Expression Levels")
plt.show()
In [7]:
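brain_liver_integrins = integrins[integrins['primary_site'].isin(['Brain', 'Liver'])] #filter by organ: keep only brain and liver samples
#reshape from wide format (one column per gene) to long format (one row per sample-gene pair), which seaborn needs for hue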
brain_liver_integrins_vertical = brain_liver_integrins.melt(id_vars = 'primary_site', var_name = 'integrin_gene', value_name = 'expression_levels')
brain_liver_integrins_vertical
Out[7]:
| | primary_site | integrin_gene | expression_levels |
---|---|---|---|
0 | Brain | ITGA10 | 0.5763 |
1 | Brain | ITGA10 | 2.2960 |
2 | Brain | ITGA10 | -0.2498 |
3 | Brain | ITGA10 | 1.6045 |
4 | Liver | ITGA10 | -0.0277 |
... | ... | ... | ... |
34069 | Brain | ITGA11 | -1.9379 |
34070 | Brain | ITGA11 | -0.4719 |
34071 | Brain | ITGA11 | -2.2447 |
34072 | Brain | ITGA11 | -2.5479 |
34073 | Brain | ITGA11 | -0.5125 |
34074 rows × 3 columns
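`melt` keeps the `id_vars` column and stacks every other column into rows, so the long table has one row per (sample, gene) pair. That explains the row count above: (1,152 brain + 110 liver samples) × 27 genes = 34,074 rows. A minimal sketch on a made-up two-sample table:

```python
import pandas as pd

# Tiny wide-format table with made-up values: one row per sample, one column per gene
wide = pd.DataFrame({
    "primary_site": ["Brain", "Liver"],
    "ITGA10": [0.5, -0.1],
    "ITGB1": [5.8, 4.5],
    "ITGA11": [-2.8, -2.9],
})

# primary_site stays as an identifier; each gene column becomes rows
long = wide.melt(id_vars="primary_site", var_name="integrin_gene", value_name="expression_levels")
print(long)

# One output row per (sample, gene) pair: 2 samples x 3 genes = 6 rows
assert len(long) == len(wide) * (wide.shape[1] - 1)
```

As in the output above, `melt` emits all rows for the first gene column before moving on to the next.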
In [8]:
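plt.figure(figsize=(16, 6))
#split = True draws brain and liver as the two halves of one violin per gene; inner = 'quart' marks each half's quartiles
ax = sns.violinplot(x = 'integrin_gene', y = 'expression_levels', hue = 'primary_site', data = brain_liver_integrins_vertical, split = True, inner = 'quart')
plt.title("Integrin Genes of the Brain vs. the Liver")
plt.xlabel("Integrin Gene")
plt.ylabel("Gene Expression Levels")
sns.move_legend(ax, 'best', title = 'Primary Site') #retitle the legend seaborn created for hue
plt.show()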
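The split violins compare the brain and liver distributions visually. A numeric companion is the median per gene and site, computed from the same long-format table with `groupby`; sorting by the brain-minus-liver difference ranks genes by how differently they are expressed. The sketch uses made-up rows shaped like `brain_liver_integrins_vertical`; on the real data the same `groupby` line would apply unchanged.

```python
import pandas as pd

# Made-up long-format rows in the same shape as brain_liver_integrins_vertical
long = pd.DataFrame({
    "primary_site": ["Brain", "Brain", "Brain", "Liver", "Liver", "Liver"] * 2,
    "integrin_gene": ["ITGA10"] * 6 + ["ITGAE"] * 6,
    "expression_levels": [0.6, 2.3, -0.2, 0.0, -0.2, -1.5,
                          4.6, 4.9, 2.8, 2.8, 3.1, 3.0],
})

# Median per (gene, site), then one row per gene and one column per site
medians = long.groupby(["integrin_gene", "primary_site"])["expression_levels"].median().unstack()

# Positive values mean a higher median in brain than in liver
medians["brain_minus_liver"] = medians["Brain"] - medians["Liver"]
print(medians.sort_values("brain_minus_liver", ascending=False))
```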
In [9]:
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler
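Each sample (row) has expression levels for 27 integrin genes (the 28 columns minus primary_site), so the data are 27-dimensional. t-SNE (t-distributed stochastic neighbor embedding) embeds the samples in two dimensions while trying to keep samples that are similar in the original space close together, which lets us see whether samples cluster by primary site.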
In [10]:
X = integrins.drop(columns = 'primary_site') #features: the expression levels of every integrin gene
labels = integrins['primary_site'] #labels: the organ each sample came from
scaler = StandardScaler() #rescales each gene to mean 0 and standard deviation 1
X_scaled = scaler.fit_transform(X)
tsne = TSNE(n_components = 2, random_state = 42) #reduce to 2 dimensions; a fixed random_state makes the result reproducible (42 is a nod to The Hitchhiker's Guide to the Galaxy, where it is the "Answer to the Ultimate Question of Life, the Universe, and Everything")
X_2d = tsne.fit_transform(X_scaled)
#X_2d has shape (1987, 2); the values below only illustrate its layout and are not actual output:
#X_2d = [
#[ 2.35, -1.72],
#[ 3.10, 0.45],
#[-0.88, -2.91],
#...]
#each row of X_2d is one sample (a row of the integrins table), and each column is one t-SNE component (one of the two new dimensions created by t-SNE)
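`StandardScaler` subtracts each column's mean and divides by its population standard deviation (ddof = 0), so every gene contributes on a comparable scale to the distances t-SNE works with; otherwise genes with a wide range of values would dominate. A minimal check on a made-up 5 × 3 matrix that the scaler matches the formula computed by hand:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up matrix: 5 samples x 3 genes
X = np.array([[0.6, -6.5, 2.3],
              [2.3, -10.0, 0.7],
              [-0.2, -10.0, -0.9],
              [1.6, -6.5, 2.3],
              [2.9, -6.5, 2.0]])

X_scaled = StandardScaler().fit_transform(X)

# Same result by hand: (x - column mean) / column population standard deviation
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))    # True
print(np.round(X_scaled.mean(axis=0), 6))  # each column now has mean 0
print(np.round(X_scaled.std(axis=0), 6))   # and standard deviation 1
```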
In [11]:
plt.figure(figsize = (10, 6))
ax = sns.scatterplot(x = X_2d[:, 0], y = X_2d[:, 1], hue = labels, palette = 'tab10', s = 30) #hue = labels colors points by primary site, 'tab10' is the color palette, s = 30 sets the point size
plt.title('t-SNE of Integrin Gene Expression by Primary Site')
plt.xlabel('t-SNE 1') #t-SNE components 1 and 2: the new coordinates of each sample in 2D (instead of 27D) space
plt.ylabel('t-SNE 2')
sns.move_legend(ax, 'upper left', bbox_to_anchor = (1.05, 1), title = 'Primary Site') #place the legend outside the plot area
plt.tight_layout()
plt.show()
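A t-SNE layout depends on its `perplexity` parameter (scikit-learn's default is 30), which roughly sets how many neighbors each point pays attention to. A common robustness check is to re-run the embedding with a few perplexity values and see whether the same site clusters persist. The sketch below does this on made-up data (three shifted groups of 40 samples in 27 dimensions); the perplexity values are illustrative choices, and scikit-learn requires perplexity to stay below the number of samples. Each embedding could then be plotted with `sns.scatterplot` as in the cell above.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Made-up data: 3 groups of 40 samples in 27 dimensions, with shifted means
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=shift, scale=1.0, size=(40, 27)) for shift in (0.0, 3.0, 6.0)])
X_scaled = StandardScaler().fit_transform(X)

embeddings = {}
for perplexity in (5, 30, 50):  # illustrative values; each must be below the 120 samples
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
    embeddings[perplexity] = tsne.fit_transform(X_scaled)
    # kl_divergence_ is the final value of the cost t-SNE minimized for this run
    print(perplexity, embeddings[perplexity].shape, round(float(tsne.kl_divergence_), 3))
```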