Dashboard for Doctorates Awarded in the United States
I believe many graduate students have not yet decided what to do after graduation. Is it better to continue on to a PhD, or to find a job in industry? What does a doctoral degree bring to the student who earns it? Which doctoral fields are the most popular?
To answer these questions, I found some historical data on doctorates awarded in the United States and explored what it shows. In this post, I focus on the following questions:
- What is the trend in the number of doctorates awarded in the United States over time?
- How has the number of doctorates awarded in each major field changed over time?
- How has the distribution of postgraduation plans changed over time?
- For doctorate recipients with a definite commitment, what is the expected salary in different jobs?
You can find my GitHub repository for this blog here.
Overview of Annual Doctorates Awarded in the United States
First of all, I'd like to get a rough idea of the trend in the number of doctorates awarded in the United States each year. I use data in Table 1 (Data Source: https://ncses.nsf.gov/pubs/nsf19301/data) and draw a bar graph.
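## data preprocessing
import pandas as pd
# Assumption: pandas plots through the Plotly backend, because update_layout is called on the plot below
pd.options.plotting.backend = 'plotly'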
xls = pd.ExcelFile('sed17-sr-tab001.xlsx')
df1 = xls.parse('Table 1', skiprows=3, index_col=None, na_values=['NA'])
## draw bar plot
phds_over_time = df1.plot(x = 'Year', y = 'Doctorate recipients', kind = 'bar', title = 'Number of Doctorate Recipients from 1958 to 2017')
phds_over_time.update_layout(
    yaxis_title='Doctorate Recipients')
From the above graph, it is easy to see that the overall trend is increasing, showing that more and more people are choosing to study for a PhD. However, this does not mean that all fields are equally popular for those who choose to pursue a PhD.
Major Fields of Doctorates Awarded in the United States
To figure out which fields are more popular, I drew a stacked bar plot (Data Source: Table 12 from https://ncses.nsf.gov/pubs/nsf19301/data) to show how the distribution of fields changes over time.
## data preprocessing
import plotly.express as px  # used for the stacked bar plot below
xls = pd.ExcelFile('sed17-sr-tab012.xlsx')
df2 = xls.parse('Table 12', skiprows=5, index_col=None, na_values=['NA'])
df22 = df2.iloc[:,::2]
df22.columns=['All'] + list(range(1987, 2018, 5))
categories = ['Life sciences', 'Physical sciences and earth sciences', 'Mathematics and computer sciences', 'Psychology and social sciences', 'Engineering', 'Education', 'Humanities and arts', 'Othera']
df23 = df22[df22['All'].isin(categories)]
df23 = df23.set_index('All')
df23 = df23.T.reset_index()
df23 = df23.rename(dict(index='Year'), axis=1)
df24 = pd.melt(df23, id_vars=['Year'], value_name='percent', var_name='Field')
df24['percent'] = df24['percent'].astype(float)
## draw stacked barplot
majors_over_time = px.bar(df24, x="Year", y="percent", color="Field", title='Percent of doctorate recipients by Major Field')
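The reshaping in the preprocessing code above turns a wide table (one column per year) into a long one (one row per year and field), which is the layout a stacked `px.bar` works with. Here is a minimal sketch of that step on a toy frame. The field names and numbers are placeholders I made up for illustration, not values from Table 12.

```python
import pandas as pd

# Toy wide table: one row per field, one column per year.
# Placeholder values for illustration only, not survey figures.
wide = pd.DataFrame(
    {"Field": ["Field A", "Field B"], 1987: [60.0, 40.0], 1992: [55.0, 45.0]}
)

# Transpose so that years become rows (mirrors set_index / T / reset_index above).
by_year = wide.set_index("Field").T.reset_index().rename(columns={"index": "Year"})

# Melt to long format: one row per (Year, Field) pair.
long_df = pd.melt(by_year, id_vars=["Year"], var_name="Field", value_name="percent")
print(long_df)
```

Each row of `long_df` is one bar segment, so `color="Field"` is enough for Plotly to stack the fields within each year.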
It is worth noting that the share of doctorates awarded in Life sciences, Engineering, and Mathematics and computer sciences keeps increasing, while the share awarded in Education keeps decreasing. This suggests that with the rapid development of technology, especially the Internet, interest in data, computers, and engineering has grown significantly. It may also mean that competition in these fields will become more and more intense.
A natural follow-up question is what doctorate recipients plan to do after graduation. What percentage of them go on to postdoctoral study? What percentage choose employment? Which employment sectors are most common?
Postgraduation plans of doctorate recipients by Field
To obtain the specific postgraduation plans of doctorate recipients, I start with Table 42 (Data Source: https://ncses.nsf.gov/pubs/nsf19301/data). This table gives the percentage of doctorate recipients who have a definite commitment. Then I link it to Table 44, which gives the percentage of doctorate recipients with a definite commitment who will go on to postdoctoral study or to employment. After that, I link these two tables to Table 46, which gives the employment sectors of those who have chosen employment.
This gives me the percentage of doctorate recipients without a definite commitment, the percentage committed to postdoctoral study, and the percentage who will work in each employment sector. I create an interactive bar plot to display the distribution of these postgraduation plans over time.
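As a rough sketch of the arithmetic behind this linking, the snippet below converts the nested percentages into shares of all doctorate recipients. The function name `split_plans`, its parameter names, and the numbers in the usage lines are hypothetical and only there for illustration. They are not values from the tables.

```python
def split_plans(definite_pct, postdoc_pct, sector_pcts):
    """Return each plan's share of ALL doctorate recipients, in percent.

    definite_pct: % of all recipients with a definite commitment
    postdoc_pct:  % of committed recipients going to postdoctoral study
    sector_pcts:  {sector: % of employed recipients working in that sector}
    """
    # Share of all recipients going to postdoctoral study.
    postdoc = definite_pct * postdoc_pct / 100
    # The remaining committed recipients go to employment.
    employment = definite_pct * (100 - postdoc_pct) / 100
    shares = {"Postdoctoral study": postdoc}
    # Split the employment share across sectors.
    for sector, pct in sector_pcts.items():
        shares[sector] = employment * pct / 100
    return shares


# Placeholder inputs, not survey figures.
shares = split_plans(70, 40, {"Academe": 50, "Industry": 50})
print(shares)
print(sum(shares.values()))  # adds back up to definite_pct when sector_pcts sums to 100
```

Working with shares of all recipients is what lets the bars for postdoctoral study, each employment sector, and no definite commitment sit on the same percent axis.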
## prepare the no definite commitment data -- Table 42
xls = pd.ExcelFile('sed17-sr-tab042.xlsx')
df3 = xls.parse('Table 42', skiprows=3, index_col=None, na_values=['NA'])
definite = df3.iloc[13:18,2:]
nodefinite = df3.iloc[19:,2:]
## prepare the postdoctoral study data -- Table 44
xls = pd.ExcelFile('sed17-sr-tab044.xlsx')
postdoctoral = xls.parse('Table 44', skiprows=3, index_col=None, na_values=['NA'])
postdoctoral = postdoctoral.iloc[19:,2:]
## employment as a percentage of all doctorate recipients (definite commitment x share not doing a postdoc)
employment = pd.DataFrame(definite.values * (1 - postdoctoral.values / 100), columns=definite.columns)
## postdoctoral study as a percentage of all doctorate recipients
postdoctoral = pd.DataFrame(definite.values * postdoctoral.values / 100, columns=definite.columns)
## prepare the employment data -- Table 46
xls = pd.ExcelFile('sed17-sr-tab046.xlsx')
df4 = xls.parse('Table 46', skiprows=3, index_col=None, na_values=['NA'])
academe = pd.DataFrame(employment.values * df4.iloc[7:12, 2:] / 100, columns=definite.columns)
government = pd.DataFrame(employment.values * df4.iloc[13:18, 2:] / 100, columns=definite.columns)
industry = pd.DataFrame(employment.values * df4.iloc[19:24, 2:].replace('D',0) / 100, columns=definite.columns)
nonprofit = pd.DataFrame(employment.values * df4.iloc[25:30, 2:].replace('D',0) / 100, columns=definite.columns)
other = pd.DataFrame(employment.values * df4.iloc[31:, 2:].replace('D',0) / 100, columns=definite.columns)
## draw bar plots with a slider
import plotly.graph_objects as go
commitment = go.Figure()
# Add seven traces (one per postgraduation plan) for each slider step
for step in range(5):
    commitment.add_trace(
        go.Bar(name='Postdoctoral Study', x=postdoctoral.columns, y=postdoctoral.iloc[step,:], visible=False)
    ).add_trace(
        go.Bar(name='Academe', x=academe.columns, y=academe.iloc[step,:], visible=False)
    ).add_trace(
        go.Bar(name='Government', x=government.columns, y=government.iloc[step,:], visible=False)
    ).add_trace(
        go.Bar(name='Industry or Business', x=industry.columns, y=industry.iloc[step,:], visible=False)
    ).add_trace(
        go.Bar(name='Nonprofit Organization', x=nonprofit.columns, y=nonprofit.iloc[step,:], visible=False)
    ).add_trace(
        go.Bar(name='Other or unknown Employment', x=other.columns, y=other.iloc[step,:], visible=False)
    ).add_trace(
        go.Bar(name='No Definite Commitment', x=nodefinite.columns, y=nodefinite.iloc[step,:], visible=False)
    )
# Show the seven traces of the first slider step by default
for trace in commitment.data[:7]:
    trace.visible = True
# Create and add slider
steps = []
for i in range(len(commitment.data) // 7):
    step = dict(
        method="update",
        args=[{"visible": [False] * len(commitment.data)},
              {"title": "Postgraduation Plan by Major Field in " + str(range(1997, 2022, 5)[i])},
              ],  # layout attribute
        label=str(range(1997, 2022, 5)[i])
    )
    # Make the seven traces that belong to the i'th step visible
    for j in range(7):
        step["args"][0]["visible"][7 * i + j] = True
    steps.append(step)
sliders = [dict(
    active=0,
    currentvalue={"prefix": "Year "},
    pad={"t": 100},
    steps=steps,
    yanchor='top'
)]
commitment.update_layout(
    sliders=sliders,
    yaxis_title="percent",
    title_x=.50,
    title_text="Postgraduation Plan by Major Field"
)
commitment.show()
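The slider logic above boils down to one visibility mask per year: every trace is hidden except the seven that belong to the selected year. The sketch below builds the same kind of step dictionaries in plain Python, without creating a figure. The helper name `visibility_mask` is hypothetical, and I assume seven traces per year as in the plot above.

```python
def visibility_mask(n_traces, group_size, active_group):
    """Return a list of booleans: True only for the traces of the active group."""
    start = active_group * group_size
    return [start <= k < start + group_size for k in range(n_traces)]


years = list(range(1997, 2022, 5))  # 1997, 2002, 2007, 2012, 2017
traces_per_year = 7                 # assumption: one trace per postgraduation plan

# One slider step per year, in the dictionary shape used by the slider code above.
steps = [
    {
        "method": "update",
        "args": [
            {"visible": visibility_mask(len(years) * traces_per_year, traces_per_year, i)},
            {"title": f"Postgraduation Plan by Major Field in {year}"},
        ],
        "label": str(year),
    }
    for i, year in enumerate(years)
]
```

Computing the mask in one place means that adding or removing a postgraduation plan only requires changing `traces_per_year`.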
The plot reveals a lot. Overall, postgraduation plans become more evenly distributed across major fields as time goes by. In 1997, most of the doctorate recipients planning postdoctoral study were in Life sciences and Physical sciences and earth sciences. By 2017, more recipients in other fields planned postdoctoral study, especially those in Mathematics and computer sciences, Psychology and social sciences, and Engineering. In particular, for recipients in Mathematics and computer sciences or Engineering, the percentage pursuing postdoctoral study keeps increasing. In contrast, the percentage of Life sciences recipients pursuing postdoctoral study keeps decreasing, suggesting that some of them now choose employment after graduation instead.
Median expected basic annual salary of doctorate recipients with a definite commitment
Salary is also an important factor when making postgraduation plans. To obtain the expected salary for different postgraduation plans, I start with Table 48 (Data Source: https://ncses.nsf.gov/pubs/nsf19301/data). This table contains the median expected basic annual salary of doctorate recipients who have a definite commitment. Then I link it to Table 49, which gives the median expected basic annual salary of doctorate recipients in different employment sectors.
This gives me the median expected basic annual salary of doctorate recipients who are committed to postdoctoral study, and of those who will work in each employment sector. I create a sunburst chart to display these salaries.
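A sunburst chart needs its hierarchy as flat lists of labels, parents, and values. The snippet below is a minimal sketch of one way to prepare those lists from a nested dictionary. It is not necessarily how the chart in this post was built. The function name `flatten_hierarchy` and the root label are hypothetical, and sizing each wedge by salary is my assumption.

```python
def flatten_hierarchy(tree, root_label="Definite commitment"):
    """Flatten {branch: value} or {branch: {leaf: value}} into sunburst lists.

    Inner nodes get a value of 0, so a chart that treats parent values as a
    remainder on top of the children sizes them from their children alone.
    """
    labels, parents, values = [root_label], [""], [0]
    for branch, content in tree.items():
        if isinstance(content, dict):
            # Inner node: add the branch itself, then each of its leaves.
            labels.append(branch)
            parents.append(root_label)
            values.append(0)
            for leaf, value in content.items():
                labels.append(leaf)
                parents.append(branch)
                values.append(value)
        else:
            # Leaf directly under the root.
            labels.append(branch)
            parents.append(root_label)
            values.append(content)
    return labels, parents, values
```

The three lists can then be passed to `plotly.graph_objects.Sunburst(labels=..., parents=..., values=...)`, with postdoctoral study as a leaf under the root and the employment sectors as leaves under an employment branch.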
It is clear that the expected salary for postdoctoral study is much lower than that for employment. Within employment, jobs in industry have the highest expected salary, while jobs in academe have the lowest.
Thanks for reading. I hope this post provides some useful information for those who have not yet decided on their postgraduation plans.