Note: You can find this analysis updated for 2020 here.
Dev fell in love with code at a young age.
He graduated from college with an engineering degree and then joined the Navy as a cyber warfare engineer. Pressured by his highly educated Persian parents, Dev returned to school after military service to complete a master's in computer science. He has spent his entire career since in software development, mostly coding in C++.
He realized early on that workloads were moving to the cloud and spent years retooling and upgrading his skill set, familiarizing himself with each of the major cloud vendors. He championed his team's transition from an on-premise Oracle database to a next-gen database hosted in the public cloud.
Always on the cutting edge, Dev regularly attends employer-sponsored training sessions and hackathons.
Dev worked with all of the major programming methodologies over the years but eventually settled on Agile, becoming a big fan.
Now a manager at a large, 15,000+ employee technology giant, he takes time to work out 3-4 times per week at the free company gym. Even after completing an agile sprint he keeps moving - jogging home after work most days.
After promotion, Dev gave his second monitor to one of his younger direct reports. He wasn't spending as many hours furiously coding as he used to, so the extra real estate felt unnecessary.
A big proponent of healthy work habits, last year he procured a full set of standing desks for his entire team and required that all team members eat lunch away from their workstations. Rarely rushed himself, he never skips meals.
During the week, Dev wakes up just after 11am. Every. Single. Day.
Must be nice right?
However, at 50 years of age, Dev worries if his career has already peaked. He has no plans to retire any time soon, but whispers of ageism in tech and seeing his older colleagues being laid off have him wondering if he's next.
That said, he knows he could make up lost income working independently as a freelancer. He has a hunch he'd make even more running solo.
A proud, loving father, Dev hopes his daughter follows in his footsteps. He sees a bright, better future for her. As far as he's concerned, the world is her oyster.
If it's not clear already, Dev's not a real person.
But his personal and professional characteristics really do correlate with higher income.
Age. Gender. Race.
Education. Professional experience. Programming languages.
Methodology. Hackathons. Even how many monitors he uses.
The traits that correspond with higher pay are not always what we would expect - nor necessarily what we'd hope.
The economics of developer pay is an important topic for me. Developers are the key input in the production of software. You cannot understand tech unless you understand how developers get paid—you just can't.
However, the publicly available analysis of this important issue is quite poor.
As covered in my last post, Stack Overflow conducts an annual survey of software developers, asking about various aspects of their careers, like income, job title, etc.
One disappointing feature of most pay analyses is that they show the data without controlling for any other variables. Avoiding any math more complex than simple averages, the typical analyses of pay have no way to understand a multivariate world.
Stack Overflow’s own write-up of the survey results does this at times, taking the simple average of income across all full-stack developers, for example, and comparing the same metric for DevOps specialists:
Statements like "Engineering managers, DevOps specialists, and data scientists command the highest salaries” are then made, which, though technically true, tend to mislead readers who are not well-versed in statistics into thinking that these are ceteris paribus (all things equal) comparisons ("X makes more than Y, all else equal"), which they are not.
To their credit, Stack Overflow at least takes the additional step of comparing these salaries against years of professional experience, but that is just one of many possible controls. In fact, the entire survey provides a rich set of potential controls to “hold equal” and thereby generate more intuitively accurate statements about potential relationships (“X developers make more than Ys who are otherwise similar”, which is a qualitatively and likely quantitatively different statement than “Xs make more than Ys").
Unfortunately, most public discussion of pay does not control for even a single variable. This isn’t lying with statistics so much as it is misleading with them.
Said simply—we don’t only want to know “how much more or less do 45-year-old developers make compared to 25-year-old developers.” More importantly:
“How much does a 45-year-old developer earn relative to a 25-year-old developer who is otherwise equivalent (in skills, experience, age, company size, country etc.)?”
Both are important questions, but to me, the latter is much more interesting and intuitive. It also happens to be much more difficult to answer—hence it is rarely attempted.
This is an attempt do better. Not perfect, but much better.
I break down the results by survey question, with a chart displaying the controlled effect of each trait on income, in addition to 95% confidence intervals. Correspondingly, any references to statistical significance represent p-values < 0.05.
The results represent a subset of approximately 11,000 U.S-based developers from the Stack Overflow survey.
If you’d like a nicely formatted report with the full set of results, enter your email below and you’ll receive an email with a download link when ready.
For more detail on the methodology, please see the appendix at the end.
Initially, age increases earnings. That said, the annualized compound gain is quite meager—about 0.2% through 45-54 years of age—with most of that happening before a developer turns 25.
More interesting is how the impact of age declines and turns negative (relative to a developer of 18-24 years of age) as a developer approaches 65:
Ageism could be playing a role. Tech companies, especially startups, are perceived to have a preference for younger employees. Evidence of ageism is often anecdotal, but this data is at least suggestive that there may be something real behind this concern.
Unfortunately, ageism is notoriously difficult to prove or disprove, exactly because it is exceedingly rare to find a 25-year-old who is, in every way other than age, the same as a 65-year-old.
This is certainly an issue worth exploring further.
Asian and middle eastern developers are paid much more than similar white developers, and the pay premium is likely larger than the pay discount faced by other minorities.
Black developers appear to earn 0.8% less than white developers with similar traits, though with a wide enough confidence interval to lose statistical significance.
Hispanic or latino/a developers earn 1.5% less than similar whites. Again, we see reasonably large confidence intervals due to lack of sufficient data.
That there is such a large pay premium for asian and middle eastern developers is an interesting factoid in itself and one that warrants further exploration
The size of these effects are especially impressive given the small proportion of the dataset these minority groups respectively represent:
That they are statistically significant even with few data points suggests these effects are quite real.
Pay discrimination is a serious issue that warrants rigorous analysis, much more than what I’ve done here.
The pay gap for female software developers is similar in magnitude and low statistical significance to that of black developers—roughly 1.3% plus or minus 2.4%.
Gender non-binary / non-conforming developers face a large pay discount of 10% relative to male developers.
Given very few observations in the dataset, the pay effect for transgender developers cannot be estimated precisely enough to conclude anything meaningful.
Gay and lesbian software developers appear to make 2.5% more than straight / heterosexual developers, but this estimate is not statistically significant.
In fact, given the confidence intervals above, none of the categories of sexual orientation are statistically significantly different from heterosexual.
Good news for parents—developers with dependents (children, grandparents etc.) earn 3.7% more than those who don’t, controlling for other factors.
There are a number of potential explanations for this
Former and current U.S. military service members earn 3% more than those without prior military service.
As with other traits, there are any number of potential drivers of the veteran pay premium, and it's inherently difficult to pinpoint the most likely explanation.
Advanced, non-professional degrees drive higher earnings. A masters or doctoral degree (PhD etc.) drives statistically significant gains of 3% and 10%, respectively, relative to an equivalent developer with only a bachelor’s.
Going to college but not completing is associated with a 3% cut in earnings, while graduating with an associate degree decreases earning by about 8% vs. a bachelor’s. Again both are statistically significant.
The effect of professional degrees like JDs and MDs is not statistically significant due to wide confidence intervals.
Not much can be said for developers that never reached the college level—due to lack of data, the confidence intervals are too wide to draw meaningful conclusions. However, these folks are almost certainly worse off than someone with an advanced degree.
Interestingly, computer science is not the best-paid college major.
Outside of these areas, college major is largely irrelevant to developer pay. The meme of the underpaid, over-educated sociology major doesn’t apply.
The exceptions are information systems / technology and web development / design degrees, which bear a pay discount of 2.4% and 10% respectively.
Your parent’s level of education matters as a software developer.
Developers with at least one parent holding a professional degree like a JD or MD will earn 5.3% more than a developer whose parents never earned more than a bachelor’s degree. Masters and doctoral degrees among parents are associated with a 2.2% increase in pay for the children, though this relationship narrowly misses statistical significance for doctoral degrees.
Associate degrees, some college, and high school all have significantly negative impacts of 2.8-4%.
Primary school and no formal education were too rare in the dataset to generate precise effect estimates.
It’s interesting to consider how these effects manifest themselves throughout the professional life of a developer
Participation in hackathons is clearly associated with higher pay—more than 4.3% higher, as are full-time developer training programs (bootcamps), which provide a 3% bump.
We’ve witnessed an explosion in the number of coding bootcamps over the past decade. The pay bump is equivalent in magnitude to a master’s degree, which is incredible given bootcamps take months to complete, not years, and are generally cheaper in tuition than an advanced degree.
Contributing to open source software, or OSS (certainly an educational experience, in a sense), has a positive but not statistically significant bump of 1.6%
Industry certifications get a big fat zero. No effect.
On the other hand, online programming courses, MOOCs, etc. are associated with a significant drop of slightly more than 2.4%. Given the proliferation of online software development courses over the years, this is an inconvenient finding.
Likewise, the type of person to participate in a hackathon likely has other, unobserved traits that drive higher income
In other words, correlation may not be causation here, even after controlling for many variables.
As an avid self-learner myself, I am disappointed to see that no form of self-learning seems to have a statistically discernible impact on earnings.
Self-learning is a great way to, in targeted fashion, learn exactly what you need to know about a certain topic or area of knowledge. For many, it is more effective than traditional teaching methods.
However, it doesn’t seem to impact pay in and of itself.
Professional development experience matters much more than casual coding. The gains from more years of general coding experience tend to plateau after 15 years, while professional coding experience continues to pay dividends well into the 30 year range.
The benefit from experience with casual coding (relative to 0-2 years) maxes out at 10-15% and actually begins to decline after 26 years.
The fastest gains from professional experience come in the first ten years.
After the first decade, the line is nearly straight, aside from a mid-career slump at the 25-year mark.
With that caveat, it is still encouraging to see that developer pay does not fully plateau or reverse course with additional professional experience over time.
No surprise here—working part-time does not help your earnings.
More interestingly, developers working as freelancers or independent contractors make more than those employed full-time
Reverse causation could also be at play here—someone who knows they could make more as a freelancer is exactly the kind of person who would become one.
It’s no surprise that managers and executives earn more. This includes product managers, who might not write significant amounts of code themselves.
The only non-manager role that makes meaningfully more than its counterparts is DevOps:
Conversely, there are a number of roles that make noticeably less despite similar levels of experience, education etc.
The main implication of these results—languages largely don’t matter to pay. The vast majority of languages do not provide a statistically significant pay bump or discount relative to jobs that don’t require that language. The pay scale for programming languages is quite flat across the universe of languages.
Hack is a clear outlier—working with Hack yields a 25% bump. However, don’t drop everything to go learn Hack right now
Outside of Hack, on the positive side we find Objective-C, C++, Go, VBA, and TypeScript with statistically significant income effects, ranging from 2.1 to 3.9 percent.
On the negative side we have JavaScript, VB.NET, CSS, and PHP
Please note that the results above are not saying “knowing language X increases / decreases pay.” Knowing a language is not likely to ever harm one’s pay, but working a dev job that uses it might (relative to other development work).
React—what needs to be said?
Working with .NET Core and PyTorch negatively impacts income
One word—cloud. Almost all of the databases with positive income effects are hosted in the cloud, often exclusively as a managed service by one of the major public cloud service providers (Google, Amazon, Facebook).
Google’s BigQuery data warehouse tops the list, followed closely by Amazon’s competing Redshift. Apache Hbase and Hive come next. Afterwards it’s Amazon’s DynamoDB and Microsoft’s Azure Tables, Cosmos DB, and SQL. The list continues with Memcached, Redis, Elasticsearch, Google Cloud Storage, and Amazon RDS and Aurora—all positive and statistically significant.
Older database technologies from legacy vendors dominate the negative end. This includes Oracle’s once ubiquitous databases, MySQL, SQL Server, MariaDB, and IBM’s Db2.
It’s fascinating to see the impact of the cloud revolution laid out in stark relief. There are meaningful income gains associated with working with next-gen cloud-enabled databases. This fact should be top of mind for developers looking to upgrade their skills and pay.
I’ll say it again—cloud. The various public clouds are the only platforms aside from Windows Phone with positive, statistically significant effects on developer earnings. Azure is associated with 4.1% higher pay, GCP 2.7%, AWS 1.7%.
And Windows Phone? Not going to try to explain that one.
Android, Arduino and Drupal had meaningful negative association with income—2.4%, 4.6%, and 8.5% respectively.
Android’s ubiquitous operating system needs no introduction, though it’s negative pay relationship might need an explanation. Unfortunately, I am stumped here.
Arduino is an open-source prototyping platform with associated hardware microcontrollers that eases development of electronic devices. Though used for serious development work, Arduino is also heavily used by engineering students as well, which likely drives the negative value.
Drupal is a PHP-based content management system whose popularity peaked in 2011 but has since been on a slow decline.
You might not think that a developer’s choice of IDE or text editors wouldn’t matter—and you’d be (largely) correct! The associated effects here are generally quite small.
However, IntelliJ stands out from the pack
Notepad++ and Atom are associated with slightly lower earnings, on the order of 2%, which is statistically significant. Notably, both are popular but are generally viewed more as text editors than fully-fledged IDEs.
PhpStorm, Xcode, and Coda also had negative effects that were statistically significant.
The incredibly popular Agile leads the pack among project management methodologies. Devs who use Agile in their development work earn 3.9% more than those who don’t.
The less common extreme programming methodology is associated with 2.7% higher pay.
Lean comes in at 2.5%, while Scrum developers earn 1.5% more, all else equal.
Prince2 appears to be at outlier, but this is mostly due to limited data—only 0.1% of developers in the sample use the methodology.
For the most part, the exact form of version control used does not matter.
The key takeaway here—any kind of version control is better than no version control, but please, please avoid copying and pasting files to a network share.
The best practice appears to be the best paid
Not to sound like a broken record, but the reverse causation caveat applies here as well. It is completely plausible that checking in code more often increases a developers pay. It’s also possible that companies that pay more are more likely to follow and mandate development best practices, such as frequent code check-ins.
It seems too convenient that in a survey run by Stack Overflow that its own enterprise product would be associated with significantly higher developer pay, but I can only go where the data takes me. Developers using Stack Overflow Enterprise earn 9.4% more than those who don’t.
This likely picks up some effect of simply working at a company that uses Stack Overflow Enterprise, which may be higher paying on average.
Other communication tools had smaller positive impacts like the incredibly popular Slack, Atlassian’s Confluence, HipChat, and internal intranet sites (wikis, Google sites etc), which all have earnings effects in the 2-3% range.
Trello and Facebook are negative enough to be statistically significant.
Surprisingly, developers who use two monitors earn 2.8% less than those who use only a single monitor, all else equal.
At this time, it’s not clear why more monitors would negatively impact earnings. The potential for enhanced productivity may be swamped by the temptation to use the additional screen real estate for non-work activities.
No other monitor count had a statistically significant effect on income relative to one.
These results are admittedly difficult to interpret. There is no clear linear time trend for the impact of wake up time on earnings.
After 7am, later does appear to be somewhat better—up to a point. Strangely, developers between 11am and 12pm see 15.4% higher pay than those who are up before 5am.
The most important takeaway is this—have a set schedule. This will do more good than optimizing for a specific wake-up time. Not having a regular wake up time was associated with 7.7% lower pay for software developers.
Exercise is strongly associated with earnings. While only exercising once or twice a week does not appear impactful, exercising 3 times or more per week is associated with 2-2.9% higher pay than a similar developer who does not exercise at all.
It is possible however that reverse causation could cause developers who earn more to work out more—perhaps because they have more time or can more easily afford a gym membership. Alternatively, higher paying companies often have gyms on-premises, making it easier to work out more.
Don’t skip meals. Any amount of meal skipping was associated with lower pay, though never statistically significant. Developers are not rewarded for this unhealthy work habit.
Among ergonomic devices, standing desks were associated with 3.2% higher pay, which is statistically significant.
Again, higher paying companies are potentially more like to provide employees with standing desks, so the direction of causality here is questionable.
Time spent at the computer each day does not have a meaningful relationship with pay. Handcuffing yourself to your laptop is not going to earn you higher pay as a developer.
Unlike time spent at the computer, time spent outside does have an impact on pay, with the ideal amount being 1-2 hours. Spending fewer than 30 minutes outside is associated with 2.4% lower pay than 30 minutes to an hour.
Spending more than 4 hours outside was associated with lower pay, but there were not enough developers who do this regularly to generate a precise estimate.
There are many important takeaways from the study and the charts above.
This analysis is a first attempt at exploring the various factors that affect developer pay. To that end, I hope that it is illuminating and informative.
However, as with many attempts to answer difficult questions, the analysis raises as many questions as it answers.
I am publishing the full code to reproduce this analysis because I believe open source, replicable research is the key to the robust advancement of knowledge
I care deeply about the technology industry. But solving its issues and compounding its strengths demands a rigorous understanding of its component elements. Developers are a critical piece of the tech puzzle, and they deserve our attention.
As in my last post, I leverage data from Stack Overflow’s annual software developer survey, which asks about income, in addition to many other questions of interest.
Examples include
I’m interested in how answers to these questions affect income. While it’s impossible to completely avoid issues of reverse causality or correlation rather than causation, by using regression augmented with machine learning techniques, described here, we can have greater confidence that our results accurately represent the true relationship income.
Essentially, we’ll analyze how each possible answer affects income, holding all other answers constant.
I limit the dataset to only non-retired US respondents above the age of 18 with income between $10,000-250,000. Responses above $250K have a higher tendency to be troll (i.e. made up) responses, which I’d like to exclude, and few answers come in above this threshold regardless.
This leaves us with a dataset of approximately 11,000 developers.
I estimate the following equation on the data:
$$\log(income) = \beta_0 + \beta_1T + \beta_2X + \epsilon$$
Where \(T\) is our trait of interest, \(\beta_1\) is the effect of that trait on income relative to the base category, \(X\) is a set of controls (in our case, the respondent's answers to other questions in the survey), \(\beta_2\) is the set of effects for each respective control, and \(\epsilon\) is the irreducible error in our estimate.
Assuming we’ve included a “complete” and “correct" set of controls, this should provide a reasonably accurate estimate of the relationship of the trait of interest, \(T\), with income. The log transformation of income means our results will be in roughly percentage terms, implying that trait \(T\) is associated with an increase in income of \(\beta_1*100\%\) relative to the “base" category, all else equal. The base category will vary by trait.
Selecting the right set of controls is non-trivial. One has any number of degrees of freedom to select a subset of controls among all those available, opening the doors to “p-hacking” and other infamous behavior, which can lead to incorrect (biased) estimates of the parameters of interest.
To avoid this, I leverage a powerful machine learning technique, Double Selection (specifically Double Lasso Selection), to do principled covariate selection for each trait \(T\), rerunning the regression for every trait. As described in Belloni, Chernozhukov, Hansen (2011), this should provide a more accurate estimate of the income effect of each trait than simply using all the covariates or attempting to manually select a subset. I won’t cover the method in detail here, but refer to the original paper for more information. This post provides a relatively intuitive explanation, and is also where I originally learned of the technique.
Long story short, Double Selection makes us much more confident that the results represent the accurate effects.
This post has been published on www.productschool.com communities