Does data science have a dangerous gender gap?
A new report from the Alan Turing Institute identifies a worrying gender imbalance in a fast-growing and significant sector. PublicTechnology talks to the report’s authors about the scale of the problem and how it might be addressed.
“There is a troubling and persistent absence of women.”
So begins a report, published today by the Alan Turing Institute. While it examines the gender gap in the field of data science and artificial intelligence, the words above could surely have been written about any and all parts of the technology sector at any point in recent history.
But the report identifies important reasons for its particular area of focus.
“The existing evidence base about gender diversity in the AI and data science workforce is severely limited,” it says. “The available data is fragmented, incomplete and inadequate for investigating the career trajectories of women and men in the fields.”
Top-line data from the World Economic Forum finds that just 26% of the global workforce in the field are female.
In the UK, the figure is 22% – and just 9% for engineering roles, and 14% for another emerging technology area: cloud computing.
Clarifying, contextualising – and, ultimately, improving – these numbers is significant because of the increasing importance and expansion of these sectors, report co-author Professor Judy Wajcman, of the London School of Economics and a fellow at the Turing, tells PublicTechnology.
“These new fields are exciting, well-paid and hugely expanding,” she says. “Discussions about the fourth industrial revolution are all about where are the jobs going to grow? What sectors are they going to grow in? The high end is going to be in this AI and data science area. And, so, we want equity to be represented in that area – particularly women's opportunities in that area being equal with men.”
She adds: “Here we have a completely new career in the 21st century – where we hope we've got equality, and we've got equality laws and all of these things. And yet we find, when we look in detail at the careers of men and women within these occupations within these fields, there does seem to be a very clear gender differentiation, right across status, pay, speciality, and skill.”
Data compiled by Wired magazine and software firm Element AI finds that, at major technology companies, no more than 15% of cited artificial intelligence researchers are women; Google AI’s records show that only about 60 of 641 listed machine intelligence specialists are women. Across the three biggest conferences in the sector in 2017, only 12% of contributors were female. Once again, the comparable figure for the UK, 8.2%, lags even this low number.
Participation in the main online communities for sector professionals reveals even greater under-representation of women. On each of Data Science Central, Kaggle and OpenML, the proportion of female users is 17-18%; on Stack Overflow, it is just 7.9%.
Women in the data science and AI field are also, the Turing report finds, liable to be employed in lower-status and lower-paid jobs than their male peers.
Infographic: proportion of people employed in AI and data science in the UK who are women; number of female machine intelligence specialists on Google AI's research pages, out of 641; proportion of women in the field with a degree, compared with 55% of men; proportion of female contributors to the world's three largest AI conferences in 2017. Source: Where are the women? Mapping the gender job gap in AI, Alan Turing Institute.
“Women have more data preparation and exploration skills, whereas men have more machine learning, big data, general purpose computing and computer science skills; the latter are traditionally associated with more prestigious and higher paying careers,” it says. “Men predominate in engineering, architecture and development jobs, while women do so in analytics and research.”
Despite this, the report reveals that women across the sector tend to be better qualified: 59% hold a graduate or post-graduate degree, compared with 55% of men, according to an analysis of data drawn from almost 20,000 LinkedIn profiles.
However, this assessment also finds that “women are more likely to self-report fewer skills than men”.
The largest cohorts of female profiles list between 15 and 25 skills, whereas by far the biggest share of men, 17%, list 50 separate areas of expertise. Only 11% of women do the same.
The importance of calculating – and closing – the gender gap goes beyond just ensuring equal opportunities and pay.
“Most importantly, these fields are massively shaping society,” says Dr Erin Young, another Turing fellow and co-author of the report. “And they will be threaded into, essentially, every aspect of society in the next couple of decades, so they're very, very important to look at. And, I think, alongside issues of equity, we have to also look at issues of bias in AI, and how these imbalances in AI and data science builds can shape biases in AI system – which then will be used across society.”
The report cites a number of examples in which such biases have already played out.
Among them are a machine learning-powered recruitment tool developed by Amazon that discriminated against female applicants, a social media chatbot that swiftly learned the language of racist and misogynistic hate speech, and an image-generation algorithm that tended to fill out cropped pictures of men by dressing them in suits, while women were kitted out in bikinis.
“Marketing algorithms have disproportionally shown scientific job advertisements to men,” the report adds. “The introduction of automated hiring is particularly concerning, as the fewer the number of women employed within the AI sector, the higher the potential for future AI hiring systems to exhibit and reinforce gender bias.”
Natural language processing – which involves the application of AI to analyse large volumes of audio or written language – is commonly used in programs such as autocorrect tools, email filters, automated translators, and smart assistants. It has even been used by the government to support the work of policymaking and negotiating teams focused on the UK’s post-Brexit trade agreements, and the consultation processes that informed them.
But the Turing notes that “research analysing bias in NLP systems reveals that word embeddings learned automatically from the way words co-occur in large text [collections] exhibit human-like gender biases”.
An even more serious potential issue identified in the report is posed by automated facial recognition technology, which has already been used by a number of police forces – albeit not without controversy and legal challenge.
“Facial recognition software successfully identifies the faces of white men but fails to recognise those of dark-skinned women,” according to the report.
A key way to address such bias in deployed systems is to ensure greater diversity and representation in the industry that creates these tools.
“If you've got an unrepresentative group of people, then their values, their visions, and their experience are going to be replicated in the technology that they make – including in the kinds of algorithms that they design,” Wajcman says. “Good data science depends on the data you put in: if you put in biased data, you're going to get a biased decision.”
Young adds: “Bias enters not just in the original data collected, but at every point across the data pipeline. So, it's really important that we have diversity in all of these different roles – everything from deciding what data is collected and how it's collected, to the types of models and processes through which it is analysed, right through to [asking] what are we actually using this data for? Do we need to use this data? Do we want this AI system? So, it hits every point in the data pipeline.”
Other potential remedial measures identified by the report include further consideration of the problem by government, followed by “proactive measures” – particularly those encouraging, or even compelling, transparency from AI firms.
“For example, the UK government should require companies to scrutinise and disclose the gender composition of their technical, design, management and applied research teams,” it says. “This must also include mandating responsible gender-sensitive design and implementation of data science research and machine learning. This is an issue of social and economic justice, as well as one of AI ethics and fairness.”
It is not just government that should assume responsibility for addressing this issue. The industry, and the businesses that make it up, can also do more to create an environment that supports a more diverse workforce.
Wajcman says: “Often these problems are seen as pipeline problems, [whereby] you just have to increase the number of women going in – teach coding to girls, do more training, and the problem will be solved in a generation. I have lived through this argument for a very long time! And I more and more think that there are a lot of subtle processes going on within organisations and organisational culture.”
She adds that the modern characterisation of so-called ‘tech bro’ culture is somewhat “crude” – but rooted in a reality that has persisted since the hackers and computer engineers of the 1980s and 90s, whose work was often fuelled by “hamburgers in the middle of the night”.
“Now we're very concerned about the replication of this, particularly at big tech companies where, you're encouraged to be there all day and all night, and [you can] eat and go to the gym and do all of this stuff. All of this is a world in which people without children and without caring responsibilities – which turns out are predominantly young men – can flourish in these organisations.”
Young adds that the AI and data science sector has a chance to learn from the history of computing, which offers numerous instances of women at the vanguard being marginalised. This is exemplified by the historically overlooked contributions of the female ‘computers’ of the 1940s, who supported the Allied war effort and went on to play a central role in the development of the first programmable digital computer, which was used by the US Army.
“During World War Two… women were key in the tech workforce,” Young says. “But the structural inequalities in the broader computing sector pushed them into lower-status, lower-pay subfields. And this is the kind of thing that we're seeing happening again, in data science and AI in particular. And this is partially because of this bro culture, this hustle culture.”
She adds: “And so I think something actionable and key that we can be doing is looking at this culture and ensuring that the same thing doesn't happen again, and to make sure that women are in positions where they can make key decisions, even down to what systems are, and are not, being built – and for what. And make sure they have power in leadership roles and in key technical roles as well.”
The full report, Where are the women? Mapping the gender job gap in AI, was published to mark International Women's Day 2021.