Wikimedia Research Showcase, December 21, 2016, “Editing Encyclopedias and other dangerous jobs” by Andrea Forte
- Transcript derived from video on YouTube
- From research study “Privacy, Anonymity, and Perceived Risk in Open Collaboration: A Study of Tor Users and Wikipedians”, by Andrea Forte, Rachel Greenstadt, and Nazanine Andalibi, 2017. Download paper (PDF)
- See also my summary and comments at “Review of ‘Privacy, Anonymity, and Perceived Risk in Open Collaboration‘”.
- Slide deck does not seem to be available online, I have provided screen shots.
[33:00] Andrea Forte: Okay. Hi there. So, I’m going to be talking about something completely different. This is work that I’ve been doing for the past couple of years with my collaborator Rachel Greenstadt, who’s also on the hangout here and also PhD student Nazanine Andalibi.
So, this is not the title of the paper that we wrote about this study, but as I’ve continued to think about the kind of research that we’re doing on privacy-related issues in Wikipedia, this is how I’ve started to think about it. There’s risk inherent in participating online, and people who are working on seemingly mundane tasks are encountering this risk like anyone else online.
And so, for about twelve years now – my first Wikipedia study started in 2004 – I’ve been looking at, sort of, the positive features of participation online: the things that people can learn through these processes, the exciting new things that people can do, you know, the new ways that people can do existing activities like writing encyclopedias, if only we understand how to design the infrastructures and policies that support these activities. So I’ve mostly been focusing on opportunity, and recently it’s become more and more important to me to think about the kinds of risks, and take more of a critical perspective on the things that people are doing online, Wikipedia included.
[34:42] So, there’s been a lot of research in Wikipedia over the past decade or so on what it takes to develop successful open collaboration projects. What does it mean to be able to attract participants and volunteers, how do we teach people to contribute to these projects well. How do we keep them contributing over time, or at least get them to come back if they go away. And so this last bit is something that I hadn’t considered, and so, that’s making sure people feel safe and able to participate in the ways that they want to.
[35:20] So, this is a story that people in the Wikipedia community know pretty well. Bassel Khartabil disappeared, and so a lot of the discussion around this has been about his participation in open culture projects. And so this is a really high profile, really sort of dramatic and public example of the kinds of dangers that people might face, but there are a lot of less spectacular examples of people encountering harassment, and people encountering different kinds of threats, as they negotiate their participation in online projects, like Wikipedia.
[36:09] So last year, about a year and a half ago, Rachel and Naz and I set out to interview people who participate in Wikipedia and are concerned about their privacy, and people who are known to seek out privacy – anonymity-seekers – so we talked to Tor users who also contribute to online projects; many of them contribute to Wikipedia, some contribute to other projects, and they are clearly concerned about their privacy. So we sampled from both of these populations, to collect experiences from people who are definitely high-level contributors – Wikipedians who are definitely participating online – and then Tor users who are definitely concerned about their privacy, and sort of started at these two ends of the spectrum and gathered data about all the points in between. So we recruited through the Tor Project blog, through various social media, and through Wikimedia lists.
And we ended up interviewing 23 people. Twelve of these were Tor participants and eleven were Wikipedia participants – you know, the division isn’t really that stark. Many of the people who use Tor had also edited Wikipedia; a smaller number of the people who edited Wikipedia had also used Tor. But they still represented pretty much two distinct populations, and they brought different perspectives, as you’ll see when I show you the kinds of threats that they perceived. So usually when you do a qualitative study like this you provide a participant table that sort of breaks down, you know, we had a male, age 40, with a Master’s degree… we didn’t do it that way for this study, because people had really serious privacy concerns, so we aggregated things to, you know, communicate what kinds of people we talked to, where they were from, and what their experiences were like.
[38:22] As far as analysis, this was a very standard inductive analysis. We interviewed these 23 individuals and recorded, transcribed, and analyzed the interviews – actually we recorded 22 of them, because one of our participants preferred to meet face to face without any recording. Then I took all of those data and did open coding, and shared the themes that emerged with the co-authors, and we reviewed and discussed them. So this is a very typical kind of approach to doing qualitative research on people’s experiences with something like privacy.
[39:04] So, what we found – and you can read about this in a lot more detail in the paper that’s linked from the wiki research page. We found out about the kinds of things people were worried about, and about the sources of those perceived threats. We found out what the conditions were for people not perceiving threats when they participate in open collaboration projects – everyone had some sort of privacy concern, but some people weren’t very worried when they were specifically contributing to open collaboration projects like Wikipedia. And then we gathered data about strategies: how do they deal with these risks, how do they deal with these threats? Some people modified their participation; other people enacted different degrees of anonymity. And we’ll have some examples of how that all shook out.
[40:00] So the kinds of perceived threats that are out there–not surprisingly the most general one was that there would be some unknown party surveilling everything that they did. And for Tor users, this was a really big concern; Wikipedia users, somewhat. Loss of employment, or loss of some sort of opportunity, like being able to get into grad school, this was something that Wikipedians and Tor users both talked about–Tor users were slightly more likely to talk about it, but these are small numbers, so we don’t really put much stock in the breakdowns, but it’s interesting to see what kind of things float to the top, and where people’s concerns generally lie. Safety and harassment, intimidation–these were concerns that tended to lean more towards the Wikipedian experience, and the reasons why Wikipedians wanted to seek out privacy, likewise reputation loss. And the sources of threats are not surprising, these are standard: governments, businesses, and other private citizens.
[41:18] So this is an example of the kind of data that we would have coded as a concern for safety. We had a Tor user who talks about why they started using Tor for their online contributions, and says, “they busted his door down” – he’s speaking about a friend who was politically active in similar ways as the participant – “and they beat the ever-living crap out of him. He was hospitalized for two and a half weeks and they told him, ‘if you and your family want to live, then you’re going to stop causing trouble.’” So here this person has a family, and decides that, okay, I’m going to start taking some of my human rights activities online into other identities through the Tor network. Wikipedians had similarly dire perceptions of threats to their safety; we had one Wikipedian who mentioned that he had been threatened with a drive-by shooting of his house. He didn’t really take it very seriously, but then he said, “I pulled back from some of that Wikipedia work when I could no longer hide in quite the same way. For a long time I lived on my own, so it’s just my own personal risk, but now my wife lives here, so I can’t take that risk.” So, you know, people were experiencing threats to their own safety, and threats that were directed against their loved ones, or that they perceived to be directed against their loved ones, in both of the populations that we interviewed.
Other threats participants mentioned included:
- having their head photoshopped onto porn,
- being beaten up,
- being swatted,
- being doxxed,
so these run the gamut of online and face-to-face threats.
[43:16] Now in some cases people didn’t talk about feeling threatened, and there were two reasons that tended to come up when people said that they weren’t worried about participating in projects like Wikipedia: one, they’re not interested in controversial topics, and two, they self-identified as members of a privileged class that didn’t need to worry. So, a Tor user says, “I come at it from a completely privileged position, I’m an employed white male, so I have no horse in the race. I have colleagues who get the death threats and rape threats, and all the rest of it.” So they recognized that features of their identity make them part of what they perceive to be a protected class. And then a Wikipedian says, “I’m in the position of not being interested in any of the topics that would be of particular interest to, say, the NSA, so being a white American who’s probably not on top of the watch lists to begin with”… this person does not have many concerns about contributing.
[44:16] So when people do perceive a risk, how do they deal with that, how do they mitigate it? Two basic ways. First, modifying their participation – this gets particularly problematic when you think about the impact on projects like Wikipedia. So here a female Wikipedia editor says, “I let them know who I am so I’m no fun to chase, but I don’t edit topics like, for example, women’s health topics, or sexuality, not because I think I might be wrong about it – I’ve got my giant obstetrics textbook open right next to me – but I don’t want the backlash.” So this one was pre-med; this is the person you want editing women’s health topics and sexuality. This is the person who’s genuinely concerned that people have good information, who’s actually studying to be an expert in this area, but she doesn’t want to deal with it. So that’s one way we heard people mitigated risk: just modifying the ways that they participate. The other was to attempt to enact different degrees of anonymity, and we heard lots of different strategies for doing this. Using multiple accounts was one; asking others to post things for them, whether in Wikipedia or forums or wherever, was another way that people maintained their ability to participate in different kinds of projects; and using privacy-enhancing tools like Tor.
[45:51] So, that’s a really kind of high-level overview of the kinds of things we found when we talked to people. The implications that we see are, first of all, that this challenges the assumption underlying discussions about open collaboration that knowledge and skills are equally sharable – that we just need to get people to show up, that we need to motivate them and incentivize them and teach them to write, and then they’ll show up and do it. It’s also about providing contributors with safe places where they feel like they can contribute their knowledge. And this is not just a concern for Wikipedia; it’s really relevant to all kinds of online participation. So there’s this threat of a new kind of digital divide between those people who describe themselves as privileged in various ways, and who feel like they can participate and contribute their views under their real identities, and those who don’t feel the freedom to do so. If some groups of people are systematically excluded, the information that we have access to becomes biased, and the big ideas underlying projects like Wikipedia – like democratizing knowledge, and creating a representative resource for human knowledge – become diminished if people don’t feel equally able to contribute. So it’s not just harmful to Wikipedia; it’s a critical concern for online participation in general. There are possible socio-technical solutions that could help mitigate some of this.
[47:49] So let’s talk a little bit about what we’re thinking about next. Rachel and I have been talking – I should give a shout-out to Mako Hill at the University of Washington – about ways that we could continue this work and think about some of the problems that were uncovered; this was a very exploratory interview study, so how do we continue it? For one, there was a disconnect between internet contributors’ threat models and the kinds of privacy protections that are supported by providers like Wikipedia and other places where they want to contribute. We had reports from people who said, “Well, when I realized I couldn’t use Tor, I just decided to stop contributing.” But if service providers aren’t supporting tools like Tor, it’s probably because they aren’t perceiving the same threat levels as their users. So to develop privacy-enhancing toolkits that can help with this situation, we really need to not only understand the threat models of the people who are contributing – which we tried to start on with this interview study – but also understand what’s informing the decisions that organizations are making, and not just Wikipedia but other organizations as well. So we’ve been talking to a whole bunch of places that support user-generated content development, to start to set up the foundations for studying this further.
[49:22] We saw chilling effects on participation – like this example of a pre-med student who didn’t want to edit women’s health issues. So we know that some people want to contribute but are limiting their contributions; we have at least some proof-of-concept experiences of that. But we don’t know the extent of the loss, and we don’t know how to measure it. How can we measure what isn’t being contributed? That’s a really hard problem, and we’d really love to talk to people about ideas for measuring the potential loss if protections for anonymity aren’t supported.
[50:11] And then we also have these anecdotal bits of stories that suggest edits are being reverted when participants don’t use well-known user names or don’t use well-known identities. So we had one participant who stopped editing under a well-known user name and started getting reverted, not because he changed the way he was editing, but because he started using a different account. And so we’re really curious about the extent to which perceptions of anonymity influence perceptions of quality when people contribute to projects like Wikipedia. Because if we can figure that out, then we might be able to develop reputation metrics or other kinds of approaches to help support good contributions that are coming from anonymous users and sources.
[51:06] So, in the paper we talk about a couple of ideas for directions that socio-technical solutions could take. For example, systematically, everyone we talked to who had become an administrator or taken some central role in Wikipedia said, “oh well, yeah, when I started, I had no idea that it would be problematic to, say, edit near my home, or edit about geographic locations near my home, or divulge things about who I am.” And then later on they became more concerned about these behaviors – so, one direction is handling these temporal features of privacy concerns as people move from peripheral to more central participation. There are things that open collaboration projects could do, like thinking about unlinking technical identities for admins from their past identities, while retaining markers of trustworthiness or experience; this might play out differently in different kinds of projects. Another is supporting users of the anonymous web – so, experimenting with existing tools that MediaWiki supports. Something like pending changes for Tor might help, but it needs to incorporate feedback from Tor users, to understand: does this actually solve the problems that they perceive when they come to something like Wikipedia?
[53:00] Dario: Thanks Andrea, a round of virtual applause from everyone at the hangout. We have time for questions – about five minutes – from IRC and the hangout.
Q & A
Jonathan: One question from Siko on IRC, they ask, “Privacy-enhancing tools are good for potential victims, but also for potential perpetrators, or does it not matter so much for this latter group?”
Andrea: So that’s a really great question, and I think this is sort of the wicked problem for the anonymous web: figuring out how we support people in using anonymous tools when they want to make good-faith contributions – whether it’s to Wikipedia or political discussions or whatever it is they want to do online – and balance that with service providers’ desire and need to quash abusive uses of their tools. We’ve discussed some different kinds of approaches to this. The successful ways of doing this are going to be tailored to individual projects; a general communication tool like Twitter, or discussion forums, isn’t going to require the same kind of solution as Wikipedia, right? So there are tools that could be developed, tailored to projects like Wikipedia, that would allow some sort of oversight, either by the community – probably by the community – or by some groups within the community. So I think this is a problem that can be dealt with from a social perspective, if we give people the right technical tools, and potentially from a technical perspective – but that’s not really my area of expertise, so I haven’t really explored the technical possibilities.
[55:23] Rachel: There has been a lot of research on things like anonymous blacklisting – approaches that try to sort of balance that area. What I think is missing in that work is a connection between the privacy-enhancing technology researchers who do it and the actual service providers, who have somewhat different perceptions of their own threats, so the solutions may not be a perfect fit for the exact situation. So I think that’s an area where, again, we need more conversations with service providers and so on. I think it also depends on what you think of in terms of perpetrators – whether it’s the people doing the harassment being anonymous, or just sort of other groups. There’s an interesting study done by the ADL recently on harassment on Twitter that seems to show that a lot of times the people doing the harassment are not particularly hiding behind anonymity – I mean, in some cases they are, but in many cases people are very overt about the harassment. So I think the perception that anonymity is sort of the enabling force of harassment is perhaps a little overstated.
[56:58] Aaron: So if there’s nobody else, I have a question. I’ve been looking at all sorts of fun and interesting ways to catch vandalism and spam and personal attacks and all sorts of negative things with machine learning models, and it seems like this might be a way to sort of calm concerns around Tor users contributing in spaces like Wikipedia. But the thing that I really want to know is: how much do we know about the negative things that come out of this, so that I can know whether we have appropriate models now, or whether I should work out a new strategy to be able to help out with this kind of project?
[57:38] Andrea: So, that’s a really excellent question, and this is precisely what I was saying when I said that, you know, we know there are people who want to contribute but are holding back because of concerns about revealing who they are – and this may not be something that Tor will solve in all cases. It’s not a panacea, that’s for sure. But there are also people who want to use tools like Tor. So there are two things: there’s measuring what kind of bad stuff is going to happen, and how much, and there’s measuring what kind of good stuff is going to happen, and how much. And it may be that lots of drivel would be let in if you opened the Tor floodgate – and I don’t know if it’s a floodgate or a trickle, frankly; I have no idea how many edits might be being turned away. That’s another open question. But for the kinds of edits that are being attempted from Tor, I think it’s really important to look at what kinds of geographic locations are… well (reconsidering), you know, what kinds of topics people are attempting to edit – what’s the nature of the loss versus the nature of what could be gained. I think it’s a much more nuanced question than how many edits are bad and how many are good; it’s about what kind of content is potentially being lost, because if it is coming from voices that are currently underrepresented, that could go a long way toward improving quality in areas that aren’t currently getting attention from those people.
[59:28] Aaron: So just a quick followup, so I wonder if you are saying what we need to do is open the gates, maybe for a short period of time, see what comes through, and then re-strategize based on that.
[59:44] Andrea: That’s kind of my dream, yes, is to have that data to see, you know, what actually would happen.
[59:52] Rachel: Mako’s done some really interesting research on wikis in terms of, like, real-name account policies versus not, and seeing how that affects things – which is not the same thing, but it might be able to inform things a little bit. He has a really interesting, suggestive graph in his work that seems to show that, you know, you lose the trolls and you lose some quality – you also lose some good-quality stuff – when you make these kinds of changes toward being more restrictive. But it does look like the trolls tend to come back, whereas some of the good stuff just gets lost. More work is clearly needed, but I think we’re definitely hoping to do that work.
[1:00:46] Dario: I wanted to jump in with two, I think, very relevant points. The first is that we are looking into harassment – we have been running a project for about three quarters now on improving ways to detect harassment, from the research team in coordination with folks at the WMF. We’re going to have some more announcements in the next few months, but I just want to highlight it as an area we’re looking into. On the issue you both mentioned around the lack of data: when we train these models, we have a giant survivorship bias – we will not capture all the issues that are out there – so it’s going to be interesting to see how we can better identify these issues so we can train the models. The second, bigger-picture point goes back to the slide that said the people who feel comfortable contributing are (a) not interested in controversial topics and (b) from a privileged class. I’m going to say a few things in a personal capacity – this isn’t representative of what everybody thinks about these issues – but I would say that people who are not from a privileged class, and people who are contributing on controversial topics, are exactly who we want. I think the issue of who controls the narrative is more timely and important than ever, and the last thing we probably want in our projects is a corpus of articles that are not controversial, or that are edited by only a very tiny privileged class. The value of Wikipedia is to provide a long-term memory on topics, including topics that are controversial, using a neutral tone and the best possible sources, so it is really critical that we address these biases. There is another reason why this is profoundly problematic: whoever controls the narrative doesn’t just control what Wikipedia says. You know, racial, gender, and cultural biases can be propagated into an entire ecosystem of consumers of data derived from Wikipedia.
That ecosystem is massive – we are training an entire generation of AI tools and search engines and whatnot, affected by the biases we have today. That’s my comment. Yes, more questions from IRC.
[1:03:58] Jonathan: So we have one question from Tilman Bayer. It’s spread over several messages, so I’m just going to read what he says. “I read the paper and the press release and I think this statement does not make sense: ‘Wikipedia allows people to edit without an account, but does not permit users to mask their IP addresses and blocks Tor users, except in special cases. So it is still possible to piece together an editor’s identity or location by looking at the things they’ve contributed.’” That’s the quote. The question continues, “The IP address, which Tor obfuscates, and the contributions, which are public by default, are entirely separate. And in my humble opinion, this also undermines the argument for relaxing restrictions on Tor editing. Any thoughts on that discrepancy?”
[1:05:08] Andrea: So, there were lots of different ways that people were concerned about being outed, in terms of who they are and where they are. Some of them were concerned about the idea that the Wikimedia Foundation has their IP address – that, you know, even though they logged in with a user name, they didn’t want to reveal where they were editing from, because of laws in their country or whatever – lots of different reasons they felt they might be sanctioned for participating. There were also people who were concerned that their history of edits over time, by being linked together – either through their user name or because they have a unique IP address or whatever it is – would reveal who they are, because of having edited things about, for example, their college, and their home town, and their place of work; or it would be obvious to other people, in the Wikipedia community sometimes, and they didn’t want that; they wanted to remain anonymous. And in some cases these were really long-term participants in the project. These weren’t just people who were dropping by; these were folks who wanted to contribute and were contributing in good faith, but didn’t want to reveal who they were, for whatever reason. So they found it very difficult to maintain anonymity, because of the combination of ways that you could piece together who they were, using both technical resources and features of their identity revealed by the content that they had contributed. So I’m not sure if that quite answers the question.
Rachel: And I think one difference between the Tor users and the Wikipedians in our study is that a lot of the Tor users knew that the IP histories of edits were not public to everybody – that you needed privileges to see them – but they didn’t necessarily trust that those privileges would be, you know, enough protection against somebody who subpoenaed those logs, or somebody who hacked Wikipedia, got access to them, and doxxed them, or things like that. People – especially Tor users, obviously – were still very concerned about that model, or didn’t necessarily think that it was, sort of, safe enough.
[1:07:51] Jonathan: Awesome, thank you. Another question from Siko on IRC: “Andrea, what do you think of allowing only registered editing?”
[1:08:03] Andrea: Registered, as in you have to make an account?
Jonathan: I think that’s it, yeah.
Andrea: Okay, I’m really impressed with the line that the Foundation has taken on allowing people to remain “anonymous” (air quotes). There’s sort of an interesting conflict of definitions that we ran into continuously in this work: from the Wikipedia perspective, anonymous editing means you have revealed your IP address – which is, you know, the least anonymous thing possible – to people who are using Tor specifically to avoid revealing their IP address. So yeah, I’m not a huge fan of the idea of making people register to contribute things; I think unregistered editing is a great strength of the project.
[1:09:04] Rachel: No, I think the Tor project themselves have suggested that maybe only allowing registered users to edit through Tor might be a reasonable compromise.
Andrea: Yeah. So I would agree that there are ways we could imagine facilitating editing using tools like Tor that would be sort of special cases, without having to turn Tor users away at the door, right – like, you can register, or your edits get reviewed, or… there are lots of different possible ways to enable something like that.
[1:09:43] Dario: Since we’re talking about this, I just want to make one important clarification of terminology and implications when we talk about registered versus “anonymous” or IP edits. First off, you mentioned the implications of revealing one’s identity, location, or gender just via public data. It is true that basically all the data about contributions and discussions that someone posts on Wikipedia, including on talk pages, creates a record that leaves traces. There has in fact been some great research on this – a series by Marian-Andrei Rizoiu, presented in a previous showcase that we hosted, called “Evolution of privacy loss in Wikipedia”, tackles exactly that problem, so I invite you all to look it up if you are interested. Those implications are based on public data. There’s a separate question about private data, and I want to reinforce something that may not be entirely clear: the Foundation does not retain private information such as the IP addresses of registered users past 90 days. What that means is that if you edit using a registered user name, your IP address is retained only for that period, during which it is mostly used for operational purposes, like detecting abuse, but also to protect users; past that, the information is purged from the servers. So the answer to the question about subpoenas is that it simply doesn’t apply: the data doesn’t exist, because it is not retained by the Foundation.
Andrea: This is definitely something where a lot of times people don’t have a clear idea of what’s happening, and that lack of understanding – of how data is being captured, how it can be captured, how it will be used, how it is being used – actually contributes to a lot of privacy concerns. So I think that could potentially be helpful to some of the people we talked to, if it were more widely understood.
Dario: I think you’re right. Real quick: it would be useful to have at least some pointers or explanation at the time of editing, or during the registration process for an account – that’s a great suggestion. Are there any more comments from IRC? Jon, are there any you want to relay?
Jonathan: There’s another comment from Tilman, just as a remark regarding Andrea and Rachel’s response: “I think that mixes the very practical concern of public edit histories with the in-almost-all-cases quite theoretical concern of subpoenas and checkuser abuse.” Also he wanted to clarify he did not see the latter mentioned in the paper specifically, although of course it does not cover all the interview material.
Andrea: Thanks, yes.
Dario: All right, are there any other questions about Andrea’s talk or Aaron’s talk, before we wrap up?
[1:13:40] Aaron: Maybe one thing that I should highlight while I have the chance is that Twitter kind of exploded about the quality trend change for WikiProject Women Scientists. There was a substantial project that was started by Keilana in mid-2013 that likely explains this substantial shift – we’ve already started to talk about it as the Keilana effect, because it’s actually affected a few other initiatives, like the Wiki Education Foundation and that sort of stuff. So I think we have an answer for at least that transition.
Dario: Yeah, thanks everyone for supporting these multi-channel conversations – we have IRC and the hangout going at the same time – and thanks to Brandon, who’s been helping behind the scenes. So that’s all. If there are no additional questions, I’d like to thank everybody again – a round of virtual applause for our speakers. This is the last showcase of 2016, and we’re going to come back in the new year with more research presentations. Thanks for watching today, and happy holidays everybody.