Blood Donor Survey Results¶

These are the results of a survey that I conducted on r/SampleSize and r/Blooddonors in October of 2025. The dataset includes 179 responses.
Download the .csv dataset (CC0 license)
Download the .ipynb Jupyter Notebook.

The dataset contains the following features:

  • Blood Type: The respondent's blood type.
    ["A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-", "I don't know"]

  • Known Blood Type: Whether the respondent knew their blood type before the first time they donated.
    ["Yes", "No", "I don't remember"]

  • First Donation Age: The age when the respondent first donated.
    [an integer]

  • Donation Count: The number of times a respondent has donated.
    [an integer]

  • 12 Month Donation: Whether the respondent has donated in the last 12 months.
    ["Yes", "No"]

InĀ [49]:
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
import seaborn as sns

df = pd.read_csv('blood_donor_survey.csv')

Section I: Results¶

These are the results for the five questions on the survey.

Survey Question 1: What is your blood type?
In addition to the eight blood types, respondents were given the option of selecting "I don't know".

InĀ [50]:
df_selected = df['Blood Type'].value_counts().reset_index()
ax = df_selected.plot(kind='bar', x='Blood Type', ylabel='Count', legend=False, rot=0)
for c in ax.containers:
    ax.bar_label(c)
plt.show()
df_selected = df_selected.groupby('Blood Type').sum()
df_selected.plot(kind='pie', y='count', ylabel='Percent', autopct='%1.0f%%', legend=False)
plt.show()
No description has been provided for this image
No description has been provided for this image

Survey Question 2: Did you know your blood type before the first time you donated blood?
Respondents were given the options of "Yes", "No", and "I don't remember."

InĀ [51]:
df_selected = df['Known Blood Type'].value_counts().reset_index()
ax = df_selected.plot(kind='bar', x='Known Blood Type', ylabel='Count', legend=False, rot=0)
for c in ax.containers:
    ax.bar_label(c)
plt.show()
df_selected = df_selected.groupby('Known Blood Type').sum()
df_selected.plot(kind='pie', y='count', ylabel='Percent', autopct='%1.0f%%', legend=False)
plt.show()
No description has been provided for this image
No description has been provided for this image

Survey Question 3: How old were you the first time you donated?
Respondents were instructed to make their best guess if they weren't sure.

InĀ [52]:
df_selected = df['First Donation Age'].reset_index()
df_selected.drop(columns='index', inplace=True)
df_selected.dropna(inplace=True)
print('Summary:')
print(df_selected.describe())
sns.boxplot(data=df_selected, x='First Donation Age')
plt.show()
Summary:
       First Donation Age
count          177.000000
mean            22.819209
std              7.894264
min             15.000000
25%             18.000000
50%             20.000000
75%             25.000000
max             51.000000
No description has been provided for this image

Survey Question 4: How many times have you successfully donated?
Respondents were instructed to make their best guess if they weren't sure.

InĀ [53]:
df_selected = df['Donation Count'].reset_index()
df_selected.drop(columns='index', inplace=True)
df_selected.dropna(inplace=True)
print('Summary:')
print(df_selected.describe())
sns.boxplot(data=df_selected, x='Donation Count')
plt.show()
Summary:
       Donation Count
count      178.000000
mean        35.814607
std         60.707360
min          1.000000
25%          5.000000
50%         15.000000
75%         37.500000
max        513.000000
No description has been provided for this image

Survey Question 5: Have you donated in the last 12 months?
Respondents were given the option to answer "I'm not sure," but no one selected that option.

InĀ [54]:
df_selected = df['12 Month Donation'].value_counts().reset_index()
ax = df_selected.plot(kind='bar', x='12 Month Donation', xlabel='Donation in Last 12 Months', ylabel='Count', legend=False, rot=0)
for c in ax.containers:
    ax.bar_label(c)
plt.show()
df_selected = df_selected.groupby('12 Month Donation').sum()
df_selected.plot(kind='pie', y='count', ylabel='Percent', autopct='%1.0f%%', legend=False)
plt.show()
No description has been provided for this image
No description has been provided for this image

Section II: Analysis¶

In this section, we will dive deeper into the data.

Analysis Question 1: How does the distribution of blood types in the survey results compare with the distribution of blood types worldwide?
For comparison, I used the data at the Wikipedia article on Blood type distribution by country.

InĀ [55]:
df_worldwide = pd.DataFrame({'Blood Type': ['O+', 'A+', 'B+', 'AB+', 'O-', 'A-', 'B-', 'AB-'], 
                             'Worldwide': [38.4, 27.3, 8.1, 2.0, 13.1, 8.1, 2.0, 0.01]})
df_selected = df['Blood Type'].value_counts().reset_index()
df_selected.drop(df_selected[df_selected['Blood Type'] == 'I don\'t know'].index, inplace=True)
df_selected['Survey'] = round(100 * df_selected['count'] / df_selected['count'].sum(), 1)
df_selected = pd.merge(left=df_selected, right=df_worldwide, on='Blood Type', how='outer')
df_melted = df_selected.melt(id_vars='Blood Type', value_vars=['Survey', 'Worldwide'],
                    var_name='Source', value_name='Proportion')
plt.figure(figsize=(10, 6))
ax = sns.barplot(data=df_melted, x='Blood Type', y='Proportion', hue='Source')
plt.ylabel('Proportion')
plt.xlabel('Blood Type')
plt.legend(title='Source')
for c in ax.containers:
    ax.bar_label(c)
plt.show()
No description has been provided for this image

Analysis Question 2: Is there a difference in the mean and median age a person first donates if they know their blood type beforehand?

InĀ [56]:
df_selected = df[['Known Blood Type', 'First Donation Age']].copy()
df_selected.dropna(inplace=True)
df_selected.drop(df_selected[df_selected['Known Blood Type'] == 'I don\'t remember'].index, inplace=True)
t_statistic, p_value = stats.ttest_ind(df_selected[df_selected['Known Blood Type'] == 'Yes']['First Donation Age'], 
                                       df_selected[df_selected['Known Blood Type'] == 'No']['First Donation Age'])
print('T-test results:')
print(f't-statistic: {t_statistic}')
print(f'p-value: {p_value}')
df_selected = df[['Known Blood Type', 'First Donation Age']].copy()
df_selected.dropna(inplace=True)
df_selected.drop(df_selected[df_selected['Known Blood Type'] == 'I don\'t remember'].index, inplace=True)
df_selected = df_selected.groupby('Known Blood Type', as_index=False)['First Donation Age'].mean()
df_selected.rename(columns={'First Donation Age': 'Mean First Donation Age'}, inplace=True)
df_selected['Mean First Donation Age'] = round(df_selected['Mean First Donation Age'], 1)
df_selected.sort_values(by='Known Blood Type', inplace=True)
ax = df_selected.plot(kind='bar', x='Known Blood Type', ylabel='Mean First Donation Age', legend=False, rot=0)
for c in ax.containers:
    ax.bar_label(c)
plt.show()
df_selected = df[['Known Blood Type', 'First Donation Age']].copy()
df_selected.dropna(inplace=True)
df_selected.drop(df_selected[df_selected['Known Blood Type'] == 'I don\'t remember'].index, inplace=True)
df_selected = df_selected.groupby('Known Blood Type', as_index=False)['First Donation Age'].median()
df_selected.rename(columns={'First Donation Age': 'Median First Donation Age'}, inplace=True)
df_selected.sort_values(by='Known Blood Type', inplace=True)
ax = df_selected.plot(kind='bar', x='Known Blood Type', ylabel='Median First Donation Age', legend=False, rot=0)
for c in ax.containers:
    ax.bar_label(c)
plt.show()
T-test results:
t-statistic: 3.6306131200670357
p-value: 0.0003775673779869992
No description has been provided for this image
No description has been provided for this image

Analysis Question 3: For people who knew their blood type before the first time they donated, which blood types had lower mean and median ages at which they first donated?

InĀ [57]:
df_selected = df[['Blood Type', 'Known Blood Type', 'First Donation Age']].copy()
df_selected.dropna(inplace=True)
df_selected = df_selected[df_selected['Known Blood Type'] == 'Yes']
df_selected = df_selected.groupby('Blood Type', as_index=False)['First Donation Age'].mean()
df_selected.rename(columns={'First Donation Age': 'Mean First Donation Age'}, inplace=True)
df_selected['Mean First Donation Age'] = round(df_selected['Mean First Donation Age'], 1)
df_selected.sort_values(by='Mean First Donation Age', inplace=True)
ax = df_selected.plot(kind='bar', x='Blood Type', ylabel='Mean First Donation Age', legend=False, rot=0)
for c in ax.containers:
    ax.bar_label(c)
plt.show()
df_selected = df[['Blood Type', 'Known Blood Type', 'First Donation Age']].copy()
df_selected.dropna(inplace=True)
df_selected = df_selected[df_selected['Known Blood Type'] == 'Yes']
df_selected = df_selected.groupby('Blood Type', as_index=False)['First Donation Age'].median()
df_selected.rename(columns={'First Donation Age': 'Median First Donation Age'}, inplace=True)
df_selected.sort_values(by='Median First Donation Age', inplace=True)
ax = df_selected.plot(kind='bar', x='Blood Type', ylabel='Median First Donation Age', legend=False, rot=0)
for c in ax.containers:
    ax.bar_label(c)
plt.show()
No description has been provided for this image
No description has been provided for this image

Analysis Question 4: Which blood types have higher mean and median counts of donations?

InĀ [58]:
df_selected = df[['Blood Type', 'Donation Count']].copy()
df_selected.dropna(inplace=True)
df_selected = df_selected.groupby('Blood Type', as_index=False)['Donation Count'].mean()
df_selected.rename(columns={'Donation Count': 'Mean Donation Count'}, inplace=True)
df_selected['Mean Donation Count'] = round(df_selected['Mean Donation Count'], 1)
df_selected.sort_values(by='Mean Donation Count', ascending=False, inplace=True)
ax = df_selected.plot(kind='bar', x='Blood Type', ylabel='Mean Donation Count', legend=False, rot=0)
for c in ax.containers:
    ax.bar_label(c)
plt.show()
df_selected = df[['Blood Type', 'Donation Count']].copy()
df_selected.dropna(inplace=True)
df_selected = df_selected.groupby('Blood Type', as_index=False)['Donation Count'].median()
df_selected.rename(columns={'Donation Count': 'Median Donation Count'}, inplace=True)
df_selected.sort_values(by='Median Donation Count', ascending=False, inplace=True)
ax = df_selected.plot(kind='bar', x='Blood Type', ylabel='Median Donation Count', legend=False, rot=0)
for c in ax.containers:
    ax.bar_label(c)
plt.show()
No description has been provided for this image
No description has been provided for this image