Gene Expression of Integrins

The nucleotide sequences of the integrin genes encode integrin proteins. Integrins mediate interactions between cells and the extracellular matrix, supporting cell adhesion, migration, signaling, and differentiation, as well as cell-to-cell communication.

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
In [2]:
integrins = pd.read_excel(r"C:\Users\QBPAM\Downloads\'25 summer BigData AI Cancer class by Yongmei Wang\gtex_integrin_7_organs.xlsx")
integrins
Out[2]:
primary_site ITGA10 ITGAD ITGAM ITGA3 ITGBL1 ITGAE ITGA2 ITGB3 ITGA7 ... ITGA6 ITGA2B ITGB1 ITGAL ITGA9 ITGB5 ITGA8 ITGA4 ITGA1 ITGA11
0 Brain 0.5763 -6.5064 2.2573 0.7832 1.0363 4.6035 2.5731 -2.8262 4.9663 ... 2.8562 1.3846 5.8430 1.1316 -0.7108 3.5387 -0.0725 -0.4521 0.2029 -2.8262
1 Lung 4.9137 -3.6259 4.7307 7.1584 1.7702 4.9556 1.9149 2.6067 3.9270 ... 4.2412 4.1211 7.7256 4.4900 2.9281 6.1483 5.1867 2.6185 4.7856 -0.0277
2 Ovary 2.3953 -5.0116 1.4547 4.2593 -0.7346 4.4149 0.2642 1.5216 4.3492 ... 3.6816 1.5465 7.2964 -0.9406 2.7742 5.0414 2.0325 0.7579 2.2573 1.2516
3 Lung 4.0541 -2.3147 4.5053 7.5651 4.1788 4.1772 5.3695 1.8444 4.5355 ... 4.9631 1.9149 7.9947 3.3911 2.8462 6.7683 4.1636 2.7951 5.3284 1.2147
4 Breast 2.0569 -2.4659 3.3993 3.1311 3.0074 4.4977 -1.7809 2.7139 7.8698 ... 4.7340 0.6332 7.3496 -0.9406 2.5338 6.5696 1.7229 -0.6416 3.1195 1.1050
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1982 Lung 5.3067 -3.8160 4.9065 7.5810 5.8714 4.7345 2.6185 3.1095 5.2032 ... 5.6080 3.7324 8.2849 4.6201 3.6440 6.7052 5.1094 3.3364 5.8153 1.6604
1983 Prostate 2.9581 -4.6082 1.1641 4.6938 1.5902 5.8625 -0.5125 1.7617 7.4152 ... 3.8798 -1.4699 7.5163 -0.3752 2.9562 5.3035 4.4304 -0.9406 3.6136 0.4233
1984 Breast 4.3184 -6.5064 1.0433 4.8440 3.5498 4.6809 1.0293 3.3478 6.2136 ... 5.3256 -0.0725 7.7516 1.1382 2.1411 7.1132 0.3796 0.0854 3.8650 1.0151
1985 Brain 3.4622 -5.5735 1.5013 5.4835 1.7702 4.7517 0.6790 -3.1714 5.3597 ... 1.1960 4.1740 4.3002 0.5470 -0.9971 3.7982 -0.2498 1.4808 -0.5125 -0.5125
1986 Lung 2.5585 -1.7809 6.7916 6.5865 2.7051 4.9519 4.3618 3.1892 7.7121 ... 3.5779 2.8974 7.7685 4.8294 1.9149 5.9989 2.4117 2.4198 4.2080 1.0007

1987 rows × 28 columns

In [3]:
#pd.set_option('display.max_rows', None)     #shows all rows, no set maximum to the number of rows displayed
#pd.set_option('display.max_columns', None)     #shows all columns, no set maximum to the number of columns displayed
#pd.reset_option('display.max_rows')      #back to default settings for rows displayed
#pd.reset_option('display.max_columns')      #back to default settings for columns displayed
brain_integrins = integrins[integrins['primary_site'] == 'Brain']     #keep only the brain samples
brain_integrins
Out[3]:
primary_site ITGA10 ITGAD ITGAM ITGA3 ITGBL1 ITGAE ITGA2 ITGB3 ITGA7 ... ITGA6 ITGA2B ITGB1 ITGAL ITGA9 ITGB5 ITGA8 ITGA4 ITGA1 ITGA11
0 Brain 0.5763 -6.5064 2.2573 0.7832 1.0363 4.6035 2.5731 -2.8262 4.9663 ... 2.8562 1.3846 5.8430 1.1316 -0.7108 3.5387 -0.0725 -0.4521 0.2029 -2.8262
8 Brain 2.2960 -9.9658 0.6608 5.2840 0.4233 4.8510 -0.2671 -0.1031 4.3068 ... 1.5415 4.6623 3.4687 0.5666 -0.0130 3.0654 0.7916 1.0433 -0.7346 -0.7588
10 Brain -0.2498 -9.9658 -0.8863 3.1685 -1.6394 2.8158 -0.4719 -1.1488 2.5313 ... 1.6045 0.9268 2.8055 -0.5973 0.4657 1.8918 0.3460 0.3907 -1.9942 -1.5522
12 Brain 1.6045 -6.5064 2.3193 3.6335 -2.3147 5.0670 -0.8863 -0.8084 5.3937 ... 3.2018 1.7575 4.6894 0.4125 -0.6643 3.6916 -0.6193 -2.2447 1.2023 -1.9942
14 Brain 2.8974 -6.5064 1.9601 4.1836 -0.8084 4.5892 -0.5543 0.3460 5.7522 ... 3.6018 2.7931 4.7274 -0.0574 1.2271 4.3793 0.8488 -0.2159 2.1378 -0.6416
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1977 Brain -0.3383 -6.5064 1.6234 2.7487 -2.2447 5.2415 -0.8863 -2.9324 4.7165 ... 2.1988 0.4016 4.5142 -1.1811 -0.8084 3.9983 -1.0862 -3.1714 -0.7588 -1.9379
1978 Brain 0.4447 -5.5735 0.3231 3.5237 -1.5105 4.9016 0.9419 -2.7274 4.9547 ... 2.8178 1.3567 4.4621 -0.2845 1.0222 3.3336 0.1903 -1.0559 0.0300 -0.4719
1980 Brain 0.6969 -6.5064 -0.9686 2.3760 -2.2447 4.0739 -0.6193 -4.0350 4.8788 ... 2.7357 1.5806 4.6882 -0.9971 -0.5756 3.5136 0.9343 -1.0862 0.4340 -2.2447
1981 Brain 0.1124 -5.0116 2.2482 2.8897 -0.5125 4.6445 0.3115 -3.6259 4.5110 ... 2.1147 0.9716 5.1202 0.6608 0.4761 3.2343 0.8408 -0.0574 -0.1828 -2.5479
1985 Brain 3.4622 -5.5735 1.5013 5.4835 1.7702 4.7517 0.6790 -3.1714 5.3597 ... 1.1960 4.1740 4.3002 0.5470 -0.9971 3.7982 -0.2498 1.4808 -0.5125 -0.5125

1152 rows × 28 columns

Violin plots show the distribution of a numerical variable within each group, which makes it easy to compare distributions across groups (here, across integrin genes).
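
As a rough illustration of what a violin summarizes, the sketch below computes per-group quartiles, the kind of landmarks that are often drawn inside each violin. The table `toy` and its values are hypothetical and are not taken from the GTEx file.

```python
import pandas as pd

# Hypothetical toy data (made-up values): one gene measured in two tissues.
toy = pd.DataFrame({
    "primary_site": ["Brain"] * 5 + ["Liver"] * 5,
    "expression": [1.0, 2.0, 3.0, 4.0, 5.0, -2.0, -1.0, 0.0, 1.0, 2.0],
})

# 25th, 50th, and 75th percentiles of expression within each tissue.
quartiles = (
    toy.groupby("primary_site")["expression"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()  # one row per tissue, one column per quantile
)
print(quartiles)
```

A violin adds a smoothed density estimate around these summary values, so it also shows whether a group is skewed or has more than one peak.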

In [4]:
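#violin plot for all the integrin genes in the brain samples
plt.figure(figsize = (16, 6))
sns.violinplot(data = brain_integrins.drop(columns = 'primary_site'))     #one violin per gene column; the non-numeric primary_site column is dropped explicitly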
plt.title("Integrin Genes of the Brain")
plt.xlabel("Integrin Genes")
plt.ylabel("Gene Expression Levels")
plt.show()
In [5]:
liver_integrins = integrins[integrins['primary_site'] == 'Liver']
liver_integrins
Out[5]:
primary_site ITGA10 ITGAD ITGAM ITGA3 ITGBL1 ITGAE ITGA2 ITGB3 ITGA7 ... ITGA6 ITGA2B ITGB1 ITGAL ITGA9 ITGB5 ITGA8 ITGA4 ITGA1 ITGA11
13 Liver -0.0277 -4.2934 -0.3201 0.4340 -1.2828 2.8055 -2.9324 -1.9379 2.6940 ... 1.1960 -2.6349 4.4758 2.8582 -0.1031 4.0454 -2.5479 -1.0262 2.7465 -2.8262
49 Liver -0.1828 -0.8339 -0.5973 0.5568 0.6880 3.1278 -3.3076 -0.7346 2.3366 ... 1.0779 -2.9324 5.3169 2.5213 0.7664 4.3958 -0.7346 -1.1488 3.0110 -2.9324
62 Liver -1.4699 -3.8160 0.5271 2.1313 2.9148 2.9984 -1.9942 -0.0277 3.4007 ... 2.3164 -1.7322 6.0885 2.2813 2.8462 5.4683 -1.9942 -1.1488 3.4183 -0.0877
65 Liver -0.3940 -4.6082 0.3346 -0.1504 -1.4699 2.6624 -3.0469 0.5568 1.6327 ... 0.4340 -1.5522 5.4611 1.4704 0.3907 4.9538 -3.4580 -2.9324 3.4451 -3.1714
83 Liver -0.0425 -1.1488 -0.2498 0.5069 0.7916 2.9281 -2.8262 -0.4325 2.1411 ... 1.4441 0.2400 5.1993 3.0287 0.9191 4.4932 -2.5479 0.0014 3.3745 -1.4699
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1923 Liver -1.0559 -2.3884 1.8078 -0.0425 -1.4305 2.5852 -4.0350 0.2998 3.2220 ... 2.1509 -1.9942 6.6547 2.2513 2.1509 5.4283 -2.6349 -0.5973 3.9728 -2.5479
1924 Liver 0.8805 -5.5735 0.8164 0.9642 -1.9379 3.3952 -3.6259 -1.2828 2.2082 ... 0.9862 -2.4659 5.2510 2.0844 0.7146 5.1863 -2.5479 -1.9379 3.8401 -1.5951
1930 Liver 0.6608 -6.5064 -0.1031 -0.4325 -2.2447 3.3076 -3.6259 -1.6394 1.8160 ... 1.4652 -0.9686 5.6221 2.0325 0.4761 4.9855 -4.6082 -1.6394 3.4251 -3.1714
1954 Liver -1.1811 -2.3884 0.7058 0.6239 1.2934 3.1813 -3.4580 -1.1172 2.6208 ... 1.7141 -1.7809 5.8746 2.5388 1.9302 5.1615 -2.3884 -0.5332 3.8126 -1.0262
1969 Liver -0.6873 -3.4580 -0.5125 -0.3566 -0.4921 3.0654 -4.0350 -1.5951 2.3337 ... 0.9493 -1.9942 5.2563 2.5924 -0.3752 4.5053 -4.6082 -2.2447 3.1458 -2.8262

110 rows × 28 columns

In [6]:
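#violin plot for all the integrin genes in the liver samples
plt.figure(figsize = (16, 6))
sns.violinplot(data = liver_integrins.drop(columns = 'primary_site'))     #one violin per gene column; the non-numeric primary_site column is dropped explicitly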
plt.title("Integrin Genes of the Liver")
plt.xlabel("Integrin Genes")
plt.ylabel("Gene Expression Levels")
plt.show()
In [7]:
brain_liver_integrins = integrins[integrins['primary_site'].isin(['Brain', 'Liver'])]     #filter data by organ, keep both brain and liver samples

#reshape from wide format (one column per gene) to long format (one row per sample-gene pair)
brain_liver_integrins_vertical = brain_liver_integrins.melt(id_vars = 'primary_site', var_name = 'integrin_gene', value_name = 'expression_levels')
brain_liver_integrins_vertical
Out[7]:
primary_site integrin_gene expression_levels
0 Brain ITGA10 0.5763
1 Brain ITGA10 2.2960
2 Brain ITGA10 -0.2498
3 Brain ITGA10 1.6045
4 Liver ITGA10 -0.0277
... ... ... ...
34069 Brain ITGA11 -1.9379
34070 Brain ITGA11 -0.4719
34071 Brain ITGA11 -2.2447
34072 Brain ITGA11 -2.5479
34073 Brain ITGA11 -0.5125

34074 rows × 3 columns
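
The reshaping above can be seen more easily on a tiny table. This is a minimal sketch with made-up values; `wide` and `long_df` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical two-sample, two-gene table in wide form (values are made up).
wide = pd.DataFrame({
    "primary_site": ["Brain", "Liver"],
    "ITGA1": [0.2, 2.7],
    "ITGB1": [5.8, 4.5],
})

# melt keeps primary_site as an identifier and stacks the gene columns:
# each (sample, gene) pair becomes its own row.
long_df = wide.melt(
    id_vars="primary_site",
    var_name="integrin_gene",
    value_name="expression_levels",
)
print(long_df)
```

The long table has one row per sample per gene. In the real data, 1,152 brain samples plus 110 liver samples give 1,262 samples, and 1,262 samples times 27 gene columns gives the 34,074 rows shown above.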

In [8]:
plt.figure(figsize=(16, 6))
sns.violinplot(x = 'integrin_gene', y = 'expression_levels', hue = 'primary_site', data = brain_liver_integrins_vertical, split = True, inner = 'quartile')
plt.title("Integrin Genes of the Brain vs. the Liver")
plt.xlabel("Integrin Gene")
plt.ylabel("Gene Expression Levels")
plt.legend(title = 'primary_site')
plt.show()
In [9]:
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

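Each sample (row) has an expression level for each of 27 integrin genes (the 28th column, primary_site, is a label rather than a measurement), so the data are 27-dimensional. A t-SNE plot projects the samples into 2 dimensions while trying to keep similar samples close together, which lets us view them in a single scatter plot.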

In [10]:
X = integrins.drop(columns='primary_site')     #features: the expression levels of every gene
labels = integrins['primary_site']     #labels: the primary site of each sample

scaler = StandardScaler()     #standardizes each gene to zero mean and unit variance
X_scaled = scaler.fit_transform(X)

#2D t-SNE embedding; random_state fixes the seed so that reruns give the same layout
#in The Hitchhiker's Guide to the Galaxy, 42 is the "Answer to the Ultimate Question of Life, the Universe, and Everything."
tsne = TSNE(n_components = 2, random_state = 42)
X_2d = tsne.fit_transform(X_scaled)

#conceptually, the X_2d array looks like this:
#X_2d = [
#    [ 2.35, -1.72],
#    [ 3.10,  0.45],
#    [-0.88, -2.91],
#    ...]
#each row in X_2d is a sample (a row in the integrins table), and each column is a t-SNE component (one of the two new dimensions created by t-SNE)
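
To make the standardization step concrete, here is a minimal sketch of the same calculation done by hand with NumPy on a made-up matrix. It assumes the usual definition used by StandardScaler: subtract each column's mean and divide by its population standard deviation (ddof = 0).

```python
import numpy as np

# Hypothetical 3-sample x 2-gene matrix (made-up values on very different scales).
X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
])

# Column-wise z-score: center each gene, then divide by its population std.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # close to 0 for every column
print(X_scaled.std(axis=0))   # close to 1 for every column
```

After scaling, both columns are identical even though the raw values differed by a factor of ten. This is the point of standardizing before t-SNE: no single gene dominates the distances between samples because of its scale.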
In [11]:
plt.figure(figsize = (10, 6))
sns.scatterplot(x = X_2d[:, 0], y = X_2d[:, 1], hue = labels, palette = 'tab10', s = 30)     #hue = labels colors the points according to the primary site, tab10 is the color palette, s = 30 sets the size of the points
plt.title('Integrin Gene Expression Levels')
plt.xlabel('t-SNE 1')     #t-SNE components 1 and 2, the new coordinates of data points in the 2D (instead of 27D) space.
plt.ylabel('t-SNE 2')
plt.legend(title = 'Primary Site', bbox_to_anchor = (1.05, 1), loc = 'upper left')
plt.tight_layout()
plt.show()