Cognitive Study (First Update)

Elan Ding

Modified: Aug 13, 2018

I have combined all data into a single csv file called cognition.csv.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
df = pd.read_csv('cognition2.csv')

The dataset cognition2.csv is an updated version of the previous cognition.csv. The main change is that it includes more data for the second visit. To get an idea of how many records each visit has, let's look at a count plot.

In [3]:
plt.rcParams['figure.figsize'] = (10, 8)
sns.countplot(x=df['Visit'].astype(str))
Out[3]:
<matplotlib.axes._subplots.AxesSubplot at 0x10efe0b70>

Exploratory Data Analysis

Objective 1: Plotting cognitive scores and total QOL scores stratified by disease (MM vs Control).

In [4]:
sns.lmplot(x='Cog Score', y='Total Score', hue='Disease', data=df, height=8)
Out[4]:
<seaborn.axisgrid.FacetGrid at 0x10ef86e48>

Among MM patients, the cognitive score appears to be inversely related to the total QOL score, while among the controls the relationship runs in the opposite direction. This makes sense.
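The direction of each fitted line can also be checked numerically. The following is a minimal sketch that computes the Pearson correlation between two columns within each group; the helper name correlation_by_group and the toy values are assumptions for illustration and are not part of the study data.

```python
import pandas as pd


def correlation_by_group(data, x, y, group):
    """Pearson correlation between columns x and y within each level of group."""
    return pd.Series(
        {name: g[x].corr(g[y]) for name, g in data.groupby(group)}
    )


# Made-up values for illustration only; these are NOT the study data.
toy = pd.DataFrame({
    'Disease': ['MM'] * 4 + ['C'] * 4,
    'Cog Score': [40, 45, 50, 55, 40, 45, 50, 55],
    'Total Score': [80, 70, 60, 50, 50, 60, 70, 80],
})

r_by_disease = correlation_by_group(toy, 'Cog Score', 'Total Score', 'Disease')
print(r_by_disease)
```

On the real data the call would presumably be correlation_by_group(df, 'Cog Score', 'Total Score', 'Disease'); a negative value for MM and a positive value for C would agree with the plot.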

Objective 2: Does sequence affect cognitive scores?

In [5]:
sns.boxplot(x = 'Sequence', y='Cog Score', data=df)
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x114372c88>

Sequence A appears to give higher cognitive scores. This is only a minor observation.

Objective 3: Is gender a significant factor?

In [6]:
sns.countplot(x=df['Gender'].astype(str))
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x114bb2898>
In [7]:
sns.boxplot(x = 'Gender', y='Cog Score', data=df)
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x114be9320>
In [8]:
sns.boxplot(x = 'Gender', y='Cog Score', hue='Disease', data=df)
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x114d78748>
In [9]:
sns.boxplot(x = 'Gender', y='Total Score', data=df)
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x114f39668>

Based on these plots, gender does not appear to be an important factor.

Objective 4: Is there a difference between MM patients and control?

In [10]:
sns.countplot(x=df['Disease'].astype(str))
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x1150ca198>
In [11]:
sns.boxplot(x = 'Disease', y='Cog Score', data=df)
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x11510e0b8>
In [12]:
sns.boxplot(x = 'Disease', y='Cog Score', hue='Visit', data=df)
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1bf5fb70>

Here we observe that among MM patients the second visit gives a lower cognitive score, while among the controls the second visit gives a higher score. Update: The trend remains; however, now that we have more data for the second visit, the decrease in cog score among MM patients appears to be weaker than before.

Statistical Analysis for Objective 4

Although we see a slight decrease in cog scores among the MM patients and a slight increase in cog scores among the controls, neither change is statistically significant at this point.

In [13]:
visit1_MM = df[(df['Disease']=='MM') & (df["Visit"]==1)]["Cog Score"]
visit2_MM = df[(df['Disease']=='MM') & (df["Visit"]==2)]["Cog Score"]

visit1_C = df[(df['Disease']=='C') & (df["Visit"]==1)]["Cog Score"]
visit2_C = df[(df['Disease']=='C') & (df["Visit"]==2)]["Cog Score"]

print("Among the MM group, there are {} first visits, with mean equal to {}.".format(len(visit1_MM),np.mean(visit1_MM)))
print("Among the MM group, there are {} second visits, with mean equal to {}.".format(len(visit2_MM), np.mean(visit2_MM)))
print("Among the C group, there are {} first visits, with mean equal to {}.".format(len(visit1_C), np.mean(visit1_C)))
print("Among the C group, there are {} second visits, with mean equal to {}.".format(len(visit2_C), np.mean(visit2_C)))
Among the MM group, there are 15 first visits, with mean equal to 47.75555555533333.
Among the MM group, there are 6 second visits, with mean equal to 45.625.
Among the C group, there are 13 first visits, with mean equal to 47.96153846153846.
Among the C group, there are 4 second visits, with mean equal to 52.6875.
In [14]:
from scipy.stats import ttest_ind as ttest

t_MM, p_MM = ttest(visit1_MM, visit2_MM)
t_C, p_C = ttest(visit1_C, visit2_C)

print("Comparing cog scores between two visits among the MM group, the p-value is {}.".format(p_MM))
print("Comparing cog scores between two visits among the C group, the p-value is {}.".format(p_C))
Comparing cog scores between two visits among the MM group, the p-value is 0.5993131238028242.
Comparing cog scores between two visits among the C group, the p-value is 0.2904097549666309.
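The tests above use scipy's default, which pools the variances of the two samples. With group sizes as unequal as 15 versus 6, Welch's variant (equal_var=False) is a common alternative because it does not assume equal variances. The sketch below runs both side by side; the helper name compare_visits and the toy arrays are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import ttest_ind


def compare_visits(first, second):
    """Return the p-values of Student's and Welch's two-sample t-tests."""
    # Student's t-test: pooled variance (scipy default, as used above).
    _, p_student = ttest_ind(first, second)
    # Welch's t-test: does not assume the two groups share a variance.
    _, p_welch = ttest_ind(first, second, equal_var=False)
    return p_student, p_welch


# Made-up scores for illustration only; these are NOT the study data.
toy_first = np.array([44.0, 46.0, 48.0, 50.0, 52.0, 47.0, 49.0])
toy_second = np.array([40.0, 55.0, 45.0])

p_student, p_welch = compare_visits(toy_first, toy_second)
print("Student p = {:.3f}, Welch p = {:.3f}".format(p_student, p_welch))
```

In this notebook the call would be compare_visits(visit1_MM, visit2_MM), and likewise for the control group. Both tests treat the two visits as independent samples; if the same participants appear at both visits, a paired test on matched participants may be more appropriate, which would require a participant identifier not shown here.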
In [15]:
sns.boxplot(x = 'Disease', y='Physical Score', hue='Visit', data=df)
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1c149ba8>
In [16]:
sns.boxplot(x = 'Disease', y='Social Score', data=df)
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1c321e10>
In [17]:
sns.boxplot(x = 'Disease', y='Social Score', hue='Visit', data=df)
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1c4a9940>
In [18]:
sns.boxplot(x = 'Disease', y='Economic Score', data=df)
Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1c688ba8>
In [19]:
sns.boxplot(x = 'Disease', y='Economic Score', hue='Visit', data=df)
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1c8147f0>
In [20]:
sns.boxplot(x = 'Disease', y='Functional Score', data=df)
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1c9facc0>
In [21]:
sns.boxplot(x = 'Disease', y='Functional Score', hue='Visit', data=df)
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1cb86940>
In [22]:
sns.boxplot(x = 'Disease', y='Additional Score', data=df)
Out[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1cd5a7b8>
In [23]:
sns.boxplot(x = 'Disease', y='Additional Score', hue='Visit', data=df)
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1cef4e10>
In [24]:
sns.boxplot(x = 'Disease', y='Total Score', data=df)
Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1d0d0080>
In [25]:
sns.boxplot(x = 'Disease', y='Total Score', hue='Visit', data=df)
Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1d262588>

Interestingly, we see an improvement in total QOL scores between the first and second visits in both the MM and control groups.
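The boxplots above can be complemented by a table of group means, which makes the visit-to-visit change easier to read off for every score at once. This is a minimal sketch: the column names are the ones used in the plots above, while the helper name mean_scores and the toy values are assumptions for illustration.

```python
import pandas as pd

# Score columns as they appear in the plots above.
SCORE_COLUMNS = ['Cog Score', 'Physical Score', 'Social Score', 'Economic Score',
                 'Functional Score', 'Additional Score', 'Total Score']


def mean_scores(data, score_columns):
    """Mean of each score column for every (Disease, Visit) combination."""
    return data.groupby(['Disease', 'Visit'])[score_columns].mean()


# Made-up values for illustration only; these are NOT the study data.
toy = pd.DataFrame({
    'Disease': ['MM', 'MM', 'MM', 'C', 'C', 'C'],
    'Visit': [1, 1, 2, 1, 2, 2],
    'Cog Score': [46.0, 48.0, 45.0, 47.0, 52.0, 54.0],
    'Total Score': [60.0, 64.0, 70.0, 80.0, 84.0, 86.0],
})

summary = mean_scores(toy, ['Cog Score', 'Total Score'])
print(summary)
```

On the real data the call would presumably be mean_scores(df, SCORE_COLUMNS), giving one row per disease group and visit.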