How do you select participants for a workshop? That’s the question I asked myself around this time last year. Working through it turned out to be one of the most informative experiences of the entire organizing process, which is why I am devoting a whole (probably very long) blog post to it. Bear with me; I hope it will be useful.
Conferences generally provide a registration form, including payment. Once the maximum number of participants has registered, the form closes and nobody else can register. Based on the positive feedback from the first Astro Hack Week in Seattle, we wondered whether we would have more interested researchers than spots. We also weren’t sure whether admitting people on a first-come, first-served basis until we were full would give us the cross section of fields and experiences we hoped to have at Astro Hack Week.
So I chose an alternative route: I first set up an application form that was more of a “notification of interest”, based on which we could admit participants. The selected participants would then get a personalized link for the actual registration. This seemed simple and straightforward. That is, until the application form closed and I counted nearly 170 applications for 50 open slots. I was overwhelmed by the positive response to the workshop we were organizing, and more than daunted by the task of selecting the subset we’d admit to Astro Hack Week. How were we going to select participants in a manner that was fair? What criteria did we even want to apply to the selection? In hindsight, this is something I should have thought about more deeply before setting up the application form (I’ll come back to that later).
Thankfully, I share an office with a computer scientist. Brian McFee told me that since I work at a Center for Data Science, I should do what every good data scientist does and use a computer. Together with Brian, who is a maths wizard, I came up with an algorithm for automated participant selection based on a set of user-defined criteria.
It works like this: given a set of criteria I am interested in optimizing, I choose the fraction of participants I want in each subgroup. For example, let’s say I want to make sure every level of academic seniority is represented at my workshop. I could include a category “seniority” and define its target fractions like this: 10% undergraduate students, 30% graduate students, 30% postdocs, 15% tenure-track and tenured faculty, 15% other (e.g. research staff). At the same time, perhaps I’d also like a good mix of people with statistics and machine learning skills, so I include two more categories, “machine learning” and “statistics”, each with target fractions of 50% experts and 50% beginners.
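To make the idea of target fractions concrete, here is a small Python sketch that writes the example above down as data. The category and value names (`seniority`, `machine_learning`, `faculty` and so on) and the helpers `check_targets` and `target_counts` are hypothetical illustrations, not part of our actual code. The sketch checks that each category’s fractions add up to 1 and converts them into target head-counts for 50 slots.

```python
import math

# Hypothetical encoding of the example target fractions, one dict per category.
TARGETS = {
    "seniority": {
        "undergraduate": 0.10,
        "graduate": 0.30,
        "postdoc": 0.30,
        "faculty": 0.15,
        "other": 0.15,
    },
    "machine_learning": {"expert": 0.50, "beginner": 0.50},
    "statistics": {"expert": 0.50, "beginner": 0.50},
}

def check_targets(targets):
    """Raise if any category's fractions do not add up to 1."""
    for category, fractions in targets.items():
        total = sum(fractions.values())
        if not math.isclose(total, 1.0):
            raise ValueError(f"{category}: fractions sum to {total}, not 1")

def target_counts(targets, n_slots):
    """Convert fractions into (possibly non-integer) target head-counts."""
    return {
        category: {value: fraction * n_slots for value, fraction in fractions.items()}
        for category, fractions in targets.items()
    }

check_targets(TARGETS)
print(target_counts(TARGETS, n_slots=50)["seniority"])
```

With 50 slots, the 15% targets for faculty and for “other” each come out at 7.5 people, which is one reason to treat target fractions as something to get as close to as possible rather than as hard quotas.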
Of course, I could also include identity characteristics like, say, the home country of the applicant or their gender, and in fact ensuring that we’d have a reasonable fraction of female scientists at the workshop was one of several motivations behind using this approach. The idea is that the algorithm chooses the subset of participants whose combined attributes best match the target fractions across all the criteria of interest at once. In this important way, it is very different from a quota (something I learned I need to point out explicitly). Nobody will be chosen because they are a woman or an expert statistician. But someone might be chosen because they’re a woman exoplanet researcher who is also an expert statistician and a postdoc. None of the individual criteria would ensure admission, but the combination might be something that we think would make the workshop better.
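One simple way to get this “combination, not quota” behaviour is to score a candidate group by how well it fills each subgroup up to its target, and to build the group greedily. The sketch below assumes exactly that: each (category, value) pair contributes its head-count capped at its target, so overshooting a subgroup earns nothing. It is a hypothetical illustration (`coverage` and `greedy_select` are made-up names), not the code we used for Astro Hack Week, and a greedy search only approximates a true global optimum.

```python
from collections import Counter

def coverage(selected, applicants, targets, n_slots):
    """Score a group: each (category, value) pair contributes its count
    among `selected`, capped at its target head-count."""
    score = 0.0
    for category, fractions in targets.items():
        counts = Counter(applicants[i][category] for i in selected)
        for value, fraction in fractions.items():
            score += min(counts[value], fraction * n_slots)
    return score

def greedy_select(applicants, targets, n_slots):
    """Repeatedly add the applicant who most improves coverage.
    Exact ties go to the applicant listed first."""
    selected = []
    remaining = list(range(len(applicants)))
    while remaining and len(selected) < n_slots:
        best = max(remaining,
                   key=lambda i: coverage(selected + [i], applicants, targets, n_slots))
        selected.append(best)
        remaining.remove(best)
    return selected

applicants = [
    {"seniority": "graduate", "statistics": "expert"},
    {"seniority": "graduate", "statistics": "beginner"},
    {"seniority": "postdoc", "statistics": "beginner"},
    {"seniority": "postdoc", "statistics": "expert"},
]
targets = {
    "seniority": {"graduate": 0.5, "postdoc": 0.5},
    "statistics": {"expert": 0.5, "beginner": 0.5},
}
print(greedy_select(applicants, targets, n_slots=2))  # [0, 2]
```

All four applicants score equally for the first slot, so the first one listed wins. For the second slot, applicant 2 wins because a postdoc who is a statistics beginner fills two open subgroups at once, while applicants 1 and 3 each fill only one. No single attribute decided the outcome.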
I think at this point I need to mention the one thing the algorithm is explicitly not designed to do: it is very much not a tool to make any kind of merit-based selection. That is to say, if you wanted to use it for example to pick contributed talks for a conference, and you’d given grades to your speakers based on how good you think their abstracts are, you would *not* want these grades to be part of your automated selection procedure. That part, judging whether e.g. a talk is worth being given time at a conference, or, in the case of Astro Hack Week, whether an applicant is serious about coming to the workshop and contributing, still requires human judgment. We can’t change that. But one thing that I was hoping to do with the algorithm was make things a little fairer and more transparent after that initial choice is made. To go back to the example of conference speakers: if you had split your set of abstracts into two groups, one containing abstracts you’d be happy seeing as talks, and one you wouldn’t, you could take the first group and then optimize over another set of criteria (e.g. to make sure you’ve got some PhD students in the mix of speakers, and also that all your talks aren’t just about black holes).
Sometimes, you might still want to admit participants or select speakers in advance (e.g. for invited talks). We did this for Astro Hack Week, too, because we obviously needed all the lecturers to be there, and we wanted to ensure we would have at least a few people who had some previous experience with this sort of workshop who could help us guide others. In principle, you could just take these participants out of the pool and flag them as accepted by hand. Our code allows pre-selection of a subset of applicants as definitely admitted, but their attributes will still count toward the target fractions (if all your invited speakers are senior professors, but you told the algorithm you want 30% PhD student speakers, the optimization will probably increase the number of PhD students among contributed talks).
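Pre-selection fits naturally into a greedy scheme: seed the selection with the hand-picked applicants and let the optimizer fill the remaining slots, so the pre-selected people still count toward the targets. The sketch below is self-contained and hypothetical (`coverage` and `select_with_preselected` are illustrative names, not our code’s interface); it mirrors the example of invited senior speakers.

```python
from collections import Counter

def coverage(selected, applicants, targets, n_slots):
    """Each (category, value) pair scores its count, capped at its target head-count."""
    score = 0.0
    for category, fractions in targets.items():
        counts = Counter(applicants[i][category] for i in selected)
        for value, fraction in fractions.items():
            score += min(counts[value], fraction * n_slots)
    return score

def select_with_preselected(applicants, targets, n_slots, preselected):
    """Admit `preselected` unconditionally, then fill the remaining slots greedily.
    Pre-selected applicants still count toward the targets."""
    selected = list(preselected)
    remaining = [i for i in range(len(applicants)) if i not in selected]
    while remaining and len(selected) < n_slots:
        best = max(remaining,
                   key=lambda i: coverage(selected + [i], applicants, targets, n_slots))
        selected.append(best)
        remaining.remove(best)
    return selected

# Two invited professors (indices 0 and 1) are admitted in advance.
speakers = [{"seniority": s} for s in ["professor", "professor", "professor", "phd", "phd"]]
targets = {"seniority": {"professor": 0.5, "phd": 0.5}}
print(select_with_preselected(speakers, targets, n_slots=4, preselected=[0, 1]))  # [0, 1, 3, 4]
```

Because the two invited professors already fill the professor target, the optimizer spends both remaining slots on PhD students, and the third professor is not selected.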
Of course, using this approach requires careful thought about both the application form and the desired mix of criteria at the workshop. For Astro Hack Week, we very strongly prioritized diversity in every characteristic. We did include gender, but we were just as interested in having everything represented, from undergraduate students to senior faculty, from beginners at data analysis to weathered experts, from exoplanets to black holes to cosmology. I hadn’t really thought about that when we deployed the application form, which meant I spent quite a few evenings trying to translate long-form replies to some of our questions into numerical values that we could feed into the algorithm. I’ll be the first to admit that this approach isn’t ideal, since it comes with its own set of biases. For last year’s Astro Hack Week, we had no choice, since those were the data we had to work with. For future workshops, I’d advise anyone who wants to use this approach to design the application form carefully. Write out in advance which criteria matter to you, and think about how you can turn them into multiple-choice questions or drop-down menus so that you don’t need to spend a huge amount of time coding responses. Be aware that answering some of these questions (e.g. about personal identity characteristics like gender or ethnic background) may need to be voluntary.
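If the form uses fixed choices, turning responses into algorithm inputs becomes a validation step rather than evenings of manual coding. Here is a hypothetical sketch, assuming a form with the fields and answer options in `FORM_SCHEMA` (made up for illustration): answers are normalized, unexpected answers are rejected, and voluntary questions left blank are stored as `None` rather than guessed.

```python
# Hypothetical form: field name -> allowed (lower-case) answers.
FORM_SCHEMA = {
    "seniority": {"undergraduate", "graduate", "postdoc", "faculty", "other"},
    "machine_learning": {"beginner", "expert"},
    "gender": {"woman", "man", "non-binary", "prefer not to say"},
}
# Questions applicants may leave blank.
VOLUNTARY = {"gender"}

def encode_response(raw):
    """Normalize one applicant's raw answers into validated category values."""
    record = {}
    for field, allowed in FORM_SCHEMA.items():
        answer = (raw.get(field) or "").strip().lower()
        if not answer:
            if field in VOLUNTARY:
                record[field] = None  # blank voluntary answer: stored, not guessed
                continue
            raise ValueError(f"missing required field: {field}")
        if answer not in allowed:
            raise ValueError(f"unexpected answer {answer!r} for {field}")
        record[field] = answer
    return record

print(encode_response({"seniority": " Postdoc", "machine_learning": "Expert"}))
```

Keeping unanswered voluntary fields as `None` lets a selection step leave those applicants out of that one category without dropping them from the pool.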
Why would you want to use this approach? For me, it started out as a technical question more than anything. Beyond the difficulty of picking fifty participants out of 170 applicants (all of whom made it through our initial screening thanks to their very well-thought-out responses), it seemed impossible for a human to optimize over all the criteria we were interested in while also trying to control for all of my human biases.
As we developed and applied this method, and as I discussed it with participants at Astro Hack Week and later at the Moore-Sloan Data Science Environment Summit, the social and psychological dimensions of the idea became increasingly clear to me and the other organizers. I am not an expert on either, so what follows is my personal take on what I learned from the experience rather than a formal evaluation. I would love to hear what people with more expertise have to say about this.
I believe the most compelling reason to choose this algorithm over a more traditional selection procedure is transparency. It forces workshop organizers to be very explicit about the criteria they care about and, within each criterion, about the target fractions they wish to set. That is not to say it is necessarily unbiased: the selection criteria and the target fractions are still set by the organizers. It was rightly pointed out to me that while we did have gender as a selection criterion, it was the only identity characteristic represented in our selection. I left out many other criteria I could have chosen (e.g. ethnic background), and that was a choice I suddenly had to question (and I have to admit that for the next workshop, we need at the very least to think carefully about which criteria to choose and what impact they might have on the makeup of the group of participants).
Of course, nothing would stop organizers from consciously biasing the selection against certain groups through the target fractions. But while a member of a scientific organizing committee who works on black holes might unconsciously be more likely to accept black hole-related talks at a conference, setting the target fraction for talks about black holes to 90% in the algorithm would be a very obvious and glaring choice (one that said organizer might have to defend to their peers). In an ideal world, workshop organizers would publish their criteria and target fractions, so that participants, too, know what choices were made during the selection. For organizers, this can become quite uncomfortable, because these choices suddenly need to be justified (and I certainly started questioning some of our choices in the process). But ultimately, I think everyone would benefit.
We discussed our use of automated participant selection fairly openly with the workshop participants during Astro Hack Week. Anecdotally, I think most participants saw it favourably, though there were some concerns, too. Partly, I think, this was because the random number generator in the algorithm got more attention than it deserved. Unsurprisingly, it disconcerts participants to be told that chance was part of why they ended up at Astro Hack Week. The new version of the algorithm we hacked together during Astro Hack Week itself depends much less on a random number generator: it only uses one to break ties when two applicants have exactly the same score.
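Tie-breaking of this kind can be kept separate from the scoring itself. In the sketch below (hypothetical; `pick_best` and its `gain` callback are illustrative names, not our code), the random number generator is only consulted when two or more candidates share the top score, and it is seeded so that the selection can be re-run and checked.

```python
import random

def pick_best(candidates, gain, rng):
    """Return the candidate with the highest gain.

    `gain` maps a candidate to its score; `rng` (a random.Random instance)
    is consulted only when several candidates share the top score.
    """
    scores = {c: gain(c) for c in candidates}
    best_score = max(scores.values())
    tied = [c for c in candidates if scores[c] == best_score]
    if len(tied) == 1:
        return tied[0]  # clear winner: no randomness involved
    return rng.choice(tied)  # exact tie: break it with the seeded generator

rng = random.Random(0)  # fixed seed so any tie-break can be reproduced
print(pick_best(["ana", "ben", "chen"], {"ana": 2, "ben": 3, "chen": 1}.get, rng))  # ben
print(pick_best(["ana", "ben", "chen"], {"ana": 3, "ben": 3, "chen": 1}.get, rng))
```

Seeding the generator fits the transparency argument: anyone with the same inputs and the same seed gets the same admitted group.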
Some participants told me in conversations that they appreciated the thought we’d put into the selection procedure. On the other hand, we also wondered whether using an algorithm to select participants would increase or decrease impostor syndrome, another topic we talked about at Astro Hack Week (and the subject of a future blog post). Being told that they were chosen not because the organizers saw them as special, but because an algorithm picked them for their unique combination of attributes and experiences, might make participants more uncertain about their place at the workshop and thus increase impostor syndrome. We didn’t really get a chance to test that hypothesis, but I would love to do that in the future.
Now, with some distance from last year’s Astro Hack Week, and having discussed the approach with both colleagues and Astro Hack Week participants, I’ve come to the following conclusion for myself: I would rather be chosen by an algorithm with a clear set of criteria that the organizers had to work out and settle on in advance than by some opaque algorithm in someone’s head. This is not to say that organizers who don’t use this kind of method don’t do a careful job or don’t think about the selection very carefully. I think they do. But having sat in front of a spreadsheet full of names and responses to our questions, I’ve learned just how difficult that task can be.
I’d love to hear about your experiences, comments and questions. What do you think about the idea of automated participant selection? Do you have suggestions for improving it? If you’re interested in using this algorithm yourself, send me a message, or just have a look at the (open source!) code.