How to do bad data science well – multiple methods and complex data sources
Venue
Online for attendees outwith UoE. In-person for UoE staff and students:
Seminar Room 2, Chrystal Macmillan Building, The University of Edinburgh, 15a George Square, Edinburgh, EH8 9LD
Description
This event is open to staff and students at the University of Edinburgh. If you are external to the University, you can participate online by registering for the live-stream: how-to-do-bad-data-science-well-online.eventbrite.co.uk
This seminar forms part of the Research Training Centre's Talking Methods seminar series delivered in association with the Scottish Graduate School of Social Science.
Abstract
Data science was set to revolutionise statistics and social research, providing us with all kinds of new data sources and methods to open new windows into the social world. But, as many of us who’ve spent time working on data science projects will know, often the data is rubbish, the models don’t converge or converge suspiciously well, and the outputs make sense to us but are impossible to turn into a policy brief or short article. I’ll talk about research we did at the Cambridge Cybercrime Centre, which took half a dozen quite shaky data sources, a number of suspicious models, and some experimental practices that we chose to call ‘immersive data ethnography’ and turned them into a project with real findings and impact. In this research, we combined large-scale web scraping of cybercrime forums and chat channels with interviews and ethnographic study of online deviant communities, working back and forth between different kinds of data and analysis in an interdisciplinary team. By combining our strengths - in psychology, sociology, data science, ethnography, programming, and visualisation - we managed to turn these extremely complex and confusing sources (including forums with tens of millions of posts) into a fascinating new resource for social scientists, and built our own AI and ML tools for others to explore them. I’ll also show how we discovered that, far from being a world of magic hackers and techno-wizards, cybercrime is in reality far more boring than we ever thought possible.
Biography
Ben Collier is a Lecturer in Digital Methods at the University of Edinburgh. Formerly a statistician at the Scottish Government in Transport and Health, Ben has interdisciplinary research interests, blending qualitative, quantitative, and data science approaches. Current projects include research on digital infrastructure, including a forthcoming history of the Tor anonymity network; mixed-methods research on cybercrime ecosystems; and a large-scale study of the use of data-driven digital strategic communications in the UK public sector.
University Profile: www.sps.ed.ac.uk/staff/ben-collier
Event Schedule
13:00 - 13:10 - Introductions
13:10 - 13:40 - Talk
13:40 - 14:00 - Q&A
14:00 - 15:00 - Refreshments (optional)