First Workshop on NLP for Internet Freedom
A workshop dedicated to NLP methods that potentially contribute (either positively or negatively) to the free flow of information
on the Internet, or to our understanding of the issues that arise in this area.
Venue: COLING 2018, Santa Fe, NM, USA, August 20
According to the recent report produced by Freedom House (freedomhouse.org), an “independent watchdog organization dedicated to the expansion of freedom and democracy around the world”, Internet freedom declined in 2016 for the sixth consecutive year. 67% of all Internet users live in countries where criticism of the government, military, or ruling family is subject to censorship. Social media users face unprecedented penalties, as authorities in 38 countries made arrests based on social media posts over the past year. Globally, 27% of all Internet users live in countries where people have been arrested for publishing, sharing, or merely “liking” content on Facebook. Governments are increasingly going after messaging apps like WhatsApp and Telegram, which can spread information quickly and securely.
Various barriers prevent citizens of a large number of countries from accessing information. Some are infrastructural and economic barriers; others are violations of user rights, such as surveillance, invasions of privacy, and repercussions for online speech and activities (imprisonment, extralegal harassment, or cyberattacks). Yet another area is limits on content, which include legal regulations on content, technical filtering and blocking of websites, and (self-)censorship.
Large Internet providers are effective monopolies, and themselves have the power to use NLP techniques to control information flow. Users are suspended or banned, sometimes without human intervention, and with little opportunity for redress. Users react by using coded, oblique, or metaphorical language and by taking steps to conceal their identity, such as using multiple accounts, which raises questions about who the real originating author of a post actually is.
This workshop should bring together NLP researchers whose work contributes to the free flow of information on the Internet.
The topics of interest include (but are not limited to) the following:
- Censorship detection: detecting deleted or edited text; detecting blocked keywords/banned terms;
- Censorship circumvention techniques: linguistically inspired countermeasures for Internet censorship such as keyword substitution, expanding coverage of existing banned terms, text paraphrasing, linguistic steganography, generating information morphs, etc.;
- Detection of self-censorship;
- Identifying potentially censorable content;
- Disinformation/Misinformation detection: fake news, fake accounts, rumor detection, etc.;
- Techniques to empirically measure Internet censorship across communication platforms;
- Investigations on covert linguistic communication and its limits;
- Identity and private information detection;
- Passive and targeted surveillance techniques;
- Ethics in NLP;
- “Walled gardens”, personalization and fragmentation of the online public space.
We hope that our workshop will promote Internet freedom in countries where access to and sharing of information are strictly controlled by censorship.
[Mailing list for the workshop](https://groups.google.com/forum/#!forum/nlp4if)
- workshop submission deadline: May 25, 2018
- notification: June 20, 2018
- camera-ready submission deadline: June 30, 2018
- workshop date: August 20, 2018
Submissions should be written in English and anonymized with regard to the authors and/or their institution (no author-identifying information on the title page nor anywhere in the paper), including referencing style as usual. Authors should also ensure that identifying meta-information is removed from files submitted for review.
Submissions must use the Word or LaTeX template files provided by COLING 2018 and conform to the format defined by the COLING 2018 style guidelines.
- Long paper submission: up to 8 pages of content, plus 2 pages for references; final versions of long papers may use one additional page (up to 9 pages of content), with unlimited pages for references
- Short paper submission: up to 4 pages of content, plus 2 pages for references; final versions of short papers may use one additional page (up to 5 pages of content), with unlimited pages for references
PDF files must be submitted electronically via the START submission system.
The recommended style files are available from the COLING repository.
Double submission policy: Parallel submission to other meetings or publications is possible, but the workshop contact person must be notified immediately. If a paper is accepted, withdrawal is only possible within two days after notification.
Organizers and Program Committee
- Chris Brew, Computational Research Scientist, Digital Operatives: email@example.com
- Anna Feldman, Professor of Linguistics and Computer Science at Montclair State University: firstname.lastname@example.org
- Chris Leberknight, Associate Professor of Computer Science at Montclair State University: email@example.com
### Program Committee
- Joan Bachenko, Deception Discovery Technologies, NJ
- Jedidiah Crandall, University of New Mexico, NM
- Chaya Hiruncharoenvate, Mahasarakham University
- Lifu Huang, Rensselaer Polytechnic Institute (RPI), NY
- Zubin Jelveh, The University of Chicago
- Judith Klavans, Columbia University, NY
- Jeffrey Knockel, University of New Mexico, NM
- Will Lowe, Princeton University
- Rada Mihalcea, University of Michigan, Ann Arbor, MI
- Prateek Mittal, Princeton University, NJ
- Rishab Nithyanand, Data & Society, NY
- Noah Smith, University of Washington
- Thamar Solorio, University of Houston, TX
- Mahmood Sharif, Carnegie Mellon University, PA
- Evan Sultanik, Trail of Bits, NY
- Svitlana Volkova, Pacific Northwest National Laboratory, WA
- Brook Wu, NJIT, NJ
How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, Not Engaged Argument
The Chinese government has long been suspected of hiring as many as 2 million people to surreptitiously insert huge numbers of pseudonymous and other deceptive writings into the stream of real social media posts, as if they were the genuine opinions of ordinary people. Many academics, and most journalists and activists, claim that these so-called 50c party posts vociferously argue for the government’s side in political and policy debates. As we show, this is also true of most posts openly accused on social media of being 50c. Yet almost no systematic empirical evidence exists for this claim or, more importantly, for the Chinese regime’s strategic objective in pursuing this activity. In the first large-scale empirical analysis of this operation, we show how to identify the secretive authors of these posts, the posts written by them, and their content. In contrast to prior claims, the Chinese regime’s strategy is to avoid arguing with skeptics of the party and the government, and to not even discuss controversial issues. The goal of this massive secretive operation is instead to distract the public and change the subject, as most of these posts involve cheerleading for China, the revolutionary history of the Communist Party, or other symbols of the regime.
How to Talk Dirty and Influence Machines
Prof. Jedidiah Crandall is an Associate Professor in the University of New Mexico Department of Computer Science. The principle that drives his research is this: those who censor and surveil the Internet should only be able to do so with full transparency. Towards this end his research group carries out Internet measurements, reverse engineering of applications, and social media analysis to shed light on censorship and surveillance around the world. Crandall is part of the Net Alerts project, a collaborative effort to protect at-risk populations (such as journalists and activists) on the Internet by educating them about the unique threats they face.
In the book, "How to Talk Dirty and Influence People," Lenny Bruce expounds on the idea that humans need to be able to communicate with words that draw on human experiences in order to communicate some ideas effectively. This suggests a lot of euphemism, vernacular, and, yes, dirty words. In this talk I'll give an NLP research outsider's perspective on the problem of understanding what humans are saying online algorithmically. I'll use real examples of posts and keywords that trigger censorship or surveillance in China to illustrate challenging problems in this space. For example, how can we do topical analysis on language that is deliberately obfuscated? China's netizens have affectionately coined the term "Martian Language" to refer to how they communicate online, and my research group has proposed pointillism as an approach to apply topical analysis in this environment. I'll also propose some even more challenging problems, like can we take hundreds of keyword blacklists containing hundreds of thousands of keywords---many of which return 0 hits on Google---and categorize them? (The answer is maybe).
Nancy Watzman is an award-winning investigative journalist, researcher, and strategist with a focus on launching data-rich journalism projects on emerging platforms. She has more than two decades of experience doing research, writing, strategy, communications, and policy analysis. Her reporting and commentary have appeared in many leading publications, including Harper’s Magazine, The Nation, The New Republic, USA Today, and The Washington Monthly, and she has appeared on NPR, Fox News, C-SPAN, and other networks. She has also worked with watchdog and other public interest groups, including the Internet Archive, Sunlight Foundation, Common Cause, Center for Public Integrity, Center for Responsive Politics, Public Campaign (now Every Voice), and Public Citizen. In 2016 she managed the launch of the Political TV Ad Archive, a collection of 2016 political ads with underlying, downloadable data on airings in key TV markets. She is co-author, with Micah Sifry, of Is That a Politician in Your Pocket? Washington on $2 Million a Day (John Wiley & Sons, 2004), and contributed to The Buying of the Congress, Center for Public Integrity (Avon Books, 1998). She was the recipient of the Century Fund grant and served as a fellow for the Center for Independent Media. She is a graduate of Swarthmore College, where she majored in History and English Literature.