Universitat Pompeu Fabra
Universitat Pompeu Fabra
Masashi Toyoda
Masashi Toyoda
Masashi
Toyoda
265ff069df1de1eaf69a39ab4fdc98fa8d8817d4
Detecting Collective Attention Spam
We examine the problem of collective attention spam, in which spammers target social media where user attention quickly coalesces and then collectively focuses around a phenomenon. Compared to many existing spam types, collective attention spam relies on the users themselves to seek out the content -- like breaking news, viral videos, and popular memes -- where the spam will be encountered, potentially increasing its effectiveness and reach. We study the presence of collective attention spam in one popular service, Twitter, and we develop a spam classifier to detect messages generated by collective attention spammers. Since many instances of collective attention are bursty and unexpected, it is difficult to build spam detectors to pre-screen them before they arise; hence, we examine the effectiveness of quickly learning a classifier from the first moments of a bursting phenomenon. Through initial experiments over a small set of trending topics on Twitter, we find encouraging results, suggesting that collective attention spam may be identified early in its life cycle and shielded from the view of unsuspecting social media users.
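The rapid-training idea in the abstract can be sketched with any fast text classifier fit on the first labeled messages of a bursting topic. The paper does not specify the learner; the minimal multinomial Naive Bayes and the toy tweets below are illustrative assumptions only:

```python
import math
from collections import Counter

class NaiveBayes:
    """Minimal multinomial Naive Bayes over token lists -- a sketch of
    quickly fitting a spam classifier on the first messages of a
    bursting topic (label 1 = spam, 0 = legitimate)."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}  # per-class term counts
        self.prior = Counter(labels)                # class frequencies
        self.vocab = set()
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
            self.vocab.update(doc)
        return self

    def predict(self, doc):
        best, best_lp = None, -math.inf
        v = len(self.vocab)
        n = sum(self.prior.values())
        for y in (0, 1):
            total = sum(self.counts[y].values())
            lp = math.log(self.prior[y] / n)
            for t in doc:
                # Laplace-smoothed log-likelihood of each token
                lp += math.log((self.counts[y][t] + 1) / (total + v))
            if lp > best_lp:
                best, best_lp = y, lp
        return best
```

Trained on just a handful of early messages, such a model can already separate obvious spam phrasing from on-topic updates, which is the point of classifying early in the burst's life cycle.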
social media
spam
collective attention
Detecting Collective Attention Spam
Adam Wierzbicki
Adam Wierzbicki
Adam
Wierzbicki
170c33247b74dde9b39ad98422dcd227b1702274
Christopher Horn
Christopher Horn
Christopher
Horn
0b49e47074c8adcf7b529fee32bec3bad19f08f1
University of Queensland
University of Queensland
Leticia Cagnina
Leticia Cagnina
Leticia
Cagnina
cfb91ec75e3b387abdc5b63aded3a393f897f81f
kaPoW Plugins: Protecting Web Applications Using Reputation-based Proof-of-Work
reputation
puzzles
spam
Comment spam is a fact of life if you have a blog or forum. Tools
like Akismet and CAPTCHA help prevent spam in applications like
WordPress or phpBB, but they are not without shortcomings. CAPTCHAs
are becoming easier for automated adversaries such as bots to solve,
and they pose usability issues. Akismet strives to detect spam, but
can't do much to reduce it. This paper presents the kaPoW plugin and
reputation service, which can complement existing anti-spam tools.
kaPoW creates disincentives for sending spam by slowing down
spammers. It uses a web-based proof-of-work approach wherein a
client is given a computational puzzle to solve before accessing a
service (e.g. comment posting). The idea is to set puzzle
difficulties based on a client's reputation, thereby issuing
``harder'' puzzles to spammers. The more time spammers spend solving
puzzles, the less time they have to send spam. Unlike CAPTCHAs,
kaPoW requires no additional user interaction since all the puzzles
are issued and solved in software. kaPoW can be used by any web
application that supports an extension framework (e.g. plugins) and
is concerned about spam.
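The reputation-scaled proof-of-work idea can be sketched as a hashcash-style puzzle whose required number of leading zero bits grows as reputation falls. The linear difficulty mapping and the use of SHA-256 are illustrative assumptions, not kaPoW's actual construction:

```python
import hashlib
import itertools

def puzzle_difficulty(reputation, max_bits=20):
    """Map a client reputation in [0, 1] to required leading zero
    bits: low reputation -> harder puzzle. (Illustrative mapping.)"""
    return max(1, int(round((1.0 - reputation) * max_bits)))

def solve(challenge, bits):
    """Brute-force a nonce so that SHA-256(challenge:nonce) falls
    below the target, i.e. has `bits` leading zero bits."""
    target = 1 << (256 - bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge, nonce, bits):
    """Server-side check: one hash, regardless of puzzle difficulty."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - bits))
```

Since expected solving cost is roughly 2^bits hashes while verification is a single hash, a spammer with low reputation pays orders of magnitude more compute per comment than a reputable client, with no user interaction on either side.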
kaPoW Plugins: Protecting Web Applications Using Reputation-based Proof-of-Work
Akshay Dua
Akshay Dua
Akshay
Dua
e40c937ec4cbfe3fe99b1854f27a957f9b44ed1a
Kazutoshi Sumiya
Kazutoshi Sumiya
Kazutoshi
Sumiya
5b2446a1901fd3311c05413595b23d35e7cccfee
Valentina Presutti
Valentina
Presutti
lexical quality
social media
On Measuring the Lexical Quality of the Web
Web text accessibility
On Measuring the Lexical Quality of the Web
In this paper we propose a measure for estimating the lexical quality of the Web, that is, the representational aspect of textual web content. Our lexical quality measure is based on a small corpus of spelling errors, and we apply it to English and Spanish. We first compute the correlation of our measure with web popularity measures to show that it provides independent information, and then we apply it to different web segments, including social media. Our results shed light on the lexical quality of the Web and show that authoritative websites have several orders of magnitude fewer misspellings than the overall Web. We also present an analysis of the geographical distribution of lexical quality across English- and Spanish-speaking countries.
Haijie Gu
Haijie Gu
Haijie
Gu
a37cfee0e0a399133173d8bce6be20bc70e4fce1
Software Engineering Institute, East China Normal University
Software Engineering Institute, East China Normal University
Polish Academy of Sciences
Polish Academy of Sciences
Marcelo Errecalde
Marcelo Errecalde
Marcelo
Errecalde
f3b7d75f846b91c693905f1f92d8c6c64f6a6d51
Wu-Chang Feng
Wu-Chang Feng
Wu-Chang
Feng
c9034a64f56c541a904dd5b46084df974b589c0d
Edgardo Ferretti
Edgardo Ferretti
Edgardo
Ferretti
028fd0811c14c3dbb9f437432b0a1f8284147e31
MTA SZTAKI, Budapest
MTA SZTAKI, Budapest
Jingwei Zhang
Jingwei Zhang
Jingwei
Zhang
f5b819a23c25c035bdefe463c24eb967be49346a
Wellesley College
Wellesley College
EPFL
EPFL
UCSB
UCSB
Benno Stein
Benno Stein
Benno
Stein
f99e406dc9986a385127475c99a7ebd26f3a3819
Texas A&M University
Texas A&M University
Brian Davison
Brian Davison
Brian
Davison
d5df1c16beb2a9f09dc9e898f385542068211838
Ching Man Au Yeung
Ching Man Au Yeung
Ching
Au Yeung
3864faf51fa80c3a0378f16a6ff66a86e2164dfd
Aoying Zhou
Aoying Zhou
Aoying
Zhou
5aa192f02e55a4b1c614fc3a9cd3f90c92f877a0
Microsoft Research
Microsoft Research
Dataset about webquality2012.
Tue May 03 19:04:53 CEST 2016
Elisabeth Lex
Elisabeth Lex
Elisabeth
Lex
ba49a50893084a4514054f2ec27f217dbb22af7d
Navigation System
A Deformation Analysis Method for Artificial Maps Based on Geographical Accuracy and Its Applications
Artificial maps are widely used for a variety of purposes, for instance as tourist guides that help people find geographical objects using simple figures. We aim to develop an editing system and a navigation system for artificial maps. Artificial maps made for tourists show objects suitable for traveling users; therefore, if an artificial map has a navigation system, users can obtain geographical information such as object positions and routes without performing any operations. However, artificial maps may contain incorrect or superfluous information, since some objects on the map are intentionally enlarged or omitted. Developing such a system poses two problems: (1) how to extract geographical information from the raster graphics of the artificial map, and (2) how to revise inaccurate geographical information on the artificial map. We propose a deformation-analysis method based on geographical accuracy, using optical character recognition techniques and comparison with gazetteer information. Our method detects the tolerance level for deformation according to the purpose of the artificial map, and then detects positions on the artificial map using deformation analysis. In this paper, we develop a prototype system and evaluate the accuracy of extracting information from the artificial map and of detecting positions.
GIS
Artificial Maps
A Deformation Analysis Method for Artificial Maps Based on Geographical Accuracy and Its Applications
Geographical Accuracy
Daisuke Kitayama
Daisuke Kitayama
Daisuke
Kitayama
f9e8e1517cec0ae8dbcb1668b0fa93504fe3bc21
Google
Google
Anna Lisa Gentile
Anna Lisa
Gentile
Carnegie Mellon University
Carnegie Mellon University
Tien Le
Tien Le
Tien
Le
7f36143dd4e9d85e1866fdef8d913cc649f70393
Andrea Giovanni Nuzzolese
Andrea Giovanni
Nuzzolese
Google Research
Google Research
East China Normal University
East China Normal University
Steve Webb
Steve Webb
Steve
Webb
2ae5a115af70b6cff4b85c873d4bbdc12e7b6444
University of Passau
University of Passau
Measuring the Quality of Web Content using Factual Information
Web Content
Factual Information
Information Quality
Measuring the Quality of Web Content using Factual Information
Nowadays, many decisions are based on information found on the Web. For the most part, the disseminating sources are not certified, and hence an assessment of the quality and credibility of Web content has become more important than ever. With factual density we present a simple statistical quality measure that is based on facts extracted from Web content using Open Information Extraction. In a first case study, we use this measure to identify featured/good articles in Wikipedia. We compare the factual density measure with word count, a measure that has successfully been applied to this task in the past. Our evaluation corroborates the good performance of word count in Wikipedia, since featured/good articles are often longer than non-featured ones. However, for articles of similar length the word count measure fails, while factual density can separate them with an F-measure of 90.4%. We also investigate the use of relational features for categorizing Wikipedia articles into featured/good versus non-featured ones, achieving an F-measure of 86.7% for articles of similar length and 84% otherwise.
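The factual density measure reduces to a simple ratio of extracted facts to document length; in the paper the fact counts come from Open Information Extraction, while the counts in this sketch are made-up numbers for illustration:

```python
def factual_density(num_facts, num_words):
    """Factual density: facts extracted from a document (e.g. via
    Open Information Extraction) divided by its length in words.
    The normalization makes articles of different lengths comparable,
    which is exactly where the plain word-count baseline fails."""
    return num_facts / num_words if num_words else 0.0

# Two articles of equal length: word count alone cannot separate
# them, but factual density can (illustrative counts).
featured = factual_density(120, 1500)   # fact-rich article
stub = factual_density(35, 1500)        # thin article of same length
```

The design choice is that length cancels out: a long but fact-poor article scores low, while a short, fact-dense one scores high.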
Portland State University
Portland State University
Karl Aberer
Karl Aberer
Karl
Aberer
a9877790616eb28af52fd602e67b0dbeb50f5399
Thanasis Papaioannou
Thanasis Papaioannou
Thanasis
Papaioannou
5d3fd89c934eddd5e3822d862ccd4b932e39e511
Aldo Gangemi
Aldo
Gangemi
Kyoto University
Kyoto University
Matt Cutts
Matt Cutts
Matt
Cutts
7caa0bbaf3a28358ffb044d751124b308221b911
Miriam Metzger
Miriam Metzger
Miriam
Metzger
9d9739824bd042398a6c3d44d5f4d16a940fa253
Computer and Automation Research Institute, Hungarian Academy of Sciences
Computer and Automation Research Institute, Hungarian Academy of Sciences
Polish-Japanese Institute of Information Technology (PJIIT)
Polish-Japanese Institute of Information Technology (PJIIT)
Kyumin Lee
Kyumin Lee
Kyumin
Lee
bb6bd050857abf42d5f4db27e05211c9d87dd373
Bauhaus-Universität Weimar
Bauhaus-Universität Weimar
NTT Communication Science Laboratories
NTT Communication Science Laboratories
Panagiotis Metaxas
Panagiotis Metaxas
Panagiotis
Metaxas
60520cde270b013434559cc049edde889b29cd4c
Adam Jatowt
Adam Jatowt
Adam
Jatowt
f47c4d88e54139b089dce542c6b5e82c90f09b18
Universidad Nacional de San Luis
Universidad Nacional de San Luis
Andras A. Benczur
Andras A. Benczur
Andras
Benczur
8d0c49540f79c6dc87f92a92741cce31efd166d0
Zoltan Gyongyi
Zoltan Gyongyi
Zoltan
Gyongyi
fa38ed4ef5eb6692c8f299bf26f7ab5042cc7034
Andrew Flanagin
Andrew Flanagin
Andrew
Flanagin
a4a8c332b755d81faa2c43a249ac041284ec129f
Lehigh University
Lehigh University
incentives
evolution
Research on Web credibility assessment can significantly benefit from new models that are better suited for the evaluation and study of adversary strategies. Currently employed models lack several important aspects, such as explicit modeling of Web content properties (e.g. presentation quality) and of users' economic incentives and assessment capabilities. In this paper, we introduce a new, game-theoretic model of credibility, referred to as the Credibility Game. We perform equilibrium and stability analysis of a simple variant of the game and then study it as a signaling game against naive and expert information consumers. Using a generic economic model of the player payoffs, we study more complex variants of the Credibility Game via simulation experiments, and demonstrate the effect of consumer expertise and of the signal for credibility evaluation on the evolutionarily stable strategies of the information producers and consumers.
Game-theoretic Models of Web Credibility
Game-theoretic Models of Web Credibility
signaling game
equilibrium
rationality
Luz Rello
Luz Rello
Luz
Rello
117c0c2e0c23b036f8c55afd7c191d3d1ce9ccc7
Web Quality
Machine Learning
Content-Based Trust and Bias Classification via Biclustering
In this paper we give probably the first practically useful result for trust, bias, and factuality classification over Web data at the domain level. Unlike the majority of the literature in this area, which aims at extracting opinion and handling short text at the micro level, we aim to aid a researcher or an archivist in obtaining a large collection that, at a high level, originates from unbiased and trustworthy sources. Our method generates features as Jensen-Shannon distances from centers in a host-term biclustering. On top of the distance features, we apply kernel fusion methods and also combine with baseline text classifiers. We test our method on the ECML/PKDD Discovery Challenge data set DC2010. Our method improves over the best achieved text classification AUC results by over 0.02 for both neutrality and trustworthiness.
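The distance features described in the abstract can be sketched directly: given a host's term distribution and a set of bicluster centers, each feature is one Jensen-Shannon distance. This is a minimal sketch; the biclustering step and any smoothing are omitted:

```python
import math

def js_distance(p, q):
    """Jensen-Shannon distance (square root of the base-2 JS
    divergence, so values lie in [0, 1]) between two discrete
    term distributions given as probability lists."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):  # KL divergence, skipping zero-probability terms
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return math.sqrt((kl(p, m) + kl(q, m)) / 2)

def host_features(host_dist, centers):
    """One JS-distance feature per bicluster center, forming the
    feature vector fed to the downstream classifier."""
    return [js_distance(host_dist, c) for c in centers]
```

Because the JS distance is bounded and symmetric, the resulting feature vectors are well behaved inputs for kernel methods, which fits the kernel-fusion step mentioned above.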
The fact that the ECML/PKDD Discovery Challenge 2010 participants reached an AUC of around 0.5 indicates the difficulty of the task.
Bias
Document Classification
Trust
Content-Based Trust and Bias Classification via Biclustering
Maik Anderka
Maik Anderka
Maik
Anderka
8e78ac805478883a7b5cab88ddc843f95581a2a8
Katarzyna Abramczuk
Katarzyna Abramczuk
Katarzyna
Abramczuk
81bca9ca08099dd5e72bd178e24ff9f52dbcf52b
University of Hyogo
University of Hyogo
Paulina Adamska
Paulina Adamska
Paulina
Adamska
53f1f40712b83f0757f46dffa2f688518dec166a
Know-Center GmbH
Know-Center GmbH
Georgia Institute of Technology
Georgia Institute of Technology
Zhiyuan Cheng
Zhiyuan Cheng
Zhiyuan
Cheng
82f41ad7c934830bba6baa785cee9e03381f60b9
Rishi Chandy
Rishi Chandy
Rishi
Chandy
58e4ae70ef04a2552675a2cdf0abc00ba4819951
mutual information
information theory
Sentiment classification is the task of classifying documents according to their overall sentimental inclination. It is important and popular in many web applications, such as analyzing the credibility of news sites on the web, recommendation systems, and mining online discussions. The vector space model is widely applied to model documents in supervised sentiment classification, in which the feature presentation (including feature types and the weighting method) is crucial for classification accuracy. The traditional feature presentation methods of text categorization do not perform well in sentiment classification, because the ways of expressing sentiment are more subtle. We analyze the relationships of terms with sentiment labels based on information theory, and propose applying an information-theoretic approach to the sentiment classification of documents. In this paper, the sentimental polarities of the terms in a document are quantified by mutual information. The terms are then weighted in the vector space based on their sentiment scores and their contribution to the document. We perform extensive experiments with SVM on sets of reviews of multiple products, and the experimental results show that our approach is more effective than the traditional ones.
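The mutual-information term scoring can be sketched as pointwise mutual information between each term and the positive class; the sign of the score gives the term's polarity. This is a sketch under stated assumptions: the paper's full weighting also folds in the term's contribution to the document, which is omitted here, and the toy corpus is invented:

```python
import math
from collections import Counter

def term_polarity(docs, labels):
    """Score each term by pointwise mutual information with the
    positive class (label 1): positive scores indicate positive
    polarity, negative scores negative polarity."""
    n = len(docs)
    df = Counter()       # number of documents containing each term
    df_pos = Counter()   # ...restricted to positive documents
    n_pos = sum(1 for y in labels if y == 1)
    for doc, y in zip(docs, labels):
        for t in set(doc):
            df[t] += 1
            if y == 1:
                df_pos[t] += 1
    p_pos = n_pos / n
    scores = {}
    for t, d in df.items():
        p_t = d / n
        p_t_pos = df_pos[t] / n
        # PMI(t, positive); small epsilon avoids log(0)
        scores[t] = math.log((p_t_pos + 1e-9) / (p_t * p_pos + 1e-9))
    return scores
```

Terms that co-occur mostly with one class get large-magnitude scores, while class-neutral terms score near zero, which is the property the vector-space weighting exploits.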
sentiment classification
An Information Theoretic Approach to Sentimental Polarity Classification
An Information Theoretic Approach to Sentimental Polarity Classification
feature presentation
Graz University of Technology
Graz University of Technology
Katsumi Tanaka
Katsumi Tanaka
Katsumi
Tanaka
4c9952bfa8fb1e620b1776715b16c37ebfbd5826
Yahoo! Research
Yahoo! Research
Xiaoling Wang
Xiaoling Wang
Xiaoling
Wang
8a2e5a3bbb9d7ad5046a643ef7cc98714296f749
opinion spam
mobile apps
review spam
Popular apps on the Apple iOS App Store can generate millions of dollars in profit and collect valuable personal user information. Fraudulent reviews could deceive users into downloading potentially harmful spam apps, or into unfairly ignoring apps that are victims of review spam; automatically identifying spam in the App Store is therefore an important problem. This paper introduces and characterizes novel datasets acquired by crawling the iOS App Store, compares a baseline Decision Tree model with a novel Latent Class graphical model for the classification of app spam, and analyzes preliminary results for clustering reviews.
Identifying Spam in the iOS App Store
Identifying Spam in the iOS App Store
Dávid Siklósi
Dávid Siklósi
Dávid
Siklósi
fc46bd6156d5c75697d8d8fba8219b969de2e711
Michael Voelske
Michael Voelske
Michael
Voelske
3647c622bc47b2129b9f4182e2ebd2943913d9a0
Krishna Kamath
Krishna Kamath
Krishna
Kamath
b75d283c7f23881685b7c84d0d77ba7ee9c540ff
A Breakdown of Quality Flaws in Wikipedia
Wikipedia
The online encyclopedia Wikipedia is a successful example of the increasing popularity of user generated content on the Web. Despite its success, Wikipedia is often criticized for containing low-quality information, which is mainly attributed to its core policy of being open for editing by everyone. The identification of low-quality information is an important task since Wikipedia has become the primary source of knowledge for a huge number of people around the world. Previous research on quality assessment in Wikipedia either investigates only small samples of articles, or else focuses on single quality aspects, like accuracy or formality. This paper targets the investigation of quality flaws, and presents the first complete breakdown of Wikipedia's quality flaw structure. We conduct an extensive exploratory analysis, which reveals (1) the quality flaws that actually exist, (2) the distribution of flaws in Wikipedia, and (3) the extent of flawed content. An important finding is that more than one in four English Wikipedia articles contains at least one quality flaw, 70% of which concern article verifiability.
Quality Flaws
User-generated Content Analysis
A Breakdown of Quality Flaws in Wikipedia
Information Quality
Department of Social Informatics, Graduate School of Informatics, Kyoto University
Department of Social Informatics, Graduate School of Informatics, Kyoto University
Dennis Fetterly
Dennis Fetterly
Dennis
Fetterly
1349f2f18fcf3e6954bb318cdee5d016b0ba6abf
Carlos Castillo
Carlos Castillo
Carlos
Castillo
e3e61ae410a57d39ac5e08822b1c19d72d04a87b
UC Santa Barbara
UC Santa Barbara
Ricardo Baeza-Yates
Ricardo Baeza-Yates
Ricardo
Baeza-Yates
21e323f1da1813fd1703d3a824675d5e63465b70
Xiaofang Zhou
Xiaofang Zhou
Xiaofang
Zhou
6b3cbbde7b0e4e9281b0a1ef9b059525e017974f
Michael Granitzer
Michael Granitzer
Michael
Granitzer
bdd721879814228c3059502b927444eeb559cafb
Bálint Daróczy
Bálint Daróczy
Bálint
Daróczy
81d92be95a26816477a36a58b10ef68ee2166725
University of Tokyo
University of Tokyo
James Caverlee
James
Caverlee
b23dfc9b2e4ee228cc40defec8efea21ca4344d1
James Caverlee
b89ac2a66fdc59ba171afe9337392e07bac6da87
Yuming Lin
Yuming Lin
Yuming
Lin
fb52dee078deff3b2d4d281366ddf03b5e54783d