Natural Language Processing - Learning

Today's Academic Horizons (2018.8.22)

2018-08-22  ZQtGe6

cs.AI - Artificial Intelligence
cs.AR - Hardware Architecture
cs.CL - Computation and Language
cs.CV - Computer Vision and Pattern Recognition
cs.CY - Computers and Society
cs.DC - Distributed, Parallel, and Cluster Computing
cs.DS - Data Structures and Algorithms
cs.IR - Information Retrieval
cs.IT - Information Theory
cs.LG - Machine Learning
cs.NE - Neural and Evolutionary Computing
cs.NI - Networking and Internet Architecture
cs.RO - Robotics
cs.SE - Software Engineering
cs.SI - Social and Information Networks
cs.SY - Systems and Control
eess.AS - Audio and Speech Processing
eess.SP - Signal Processing
math.CO - Combinatorics
math.OC - Optimization and Control
math.PR - Probability
math.ST - Statistics Theory
nucl-th - Nuclear Theory
physics.med-ph - Medical Physics
q-bio.QM - Quantitative Methods
stat.AP - Applied Statistics
stat.ME - Statistics Methodology
stat.ML - Machine Learning (Statistics)

• [cs.AI]Discovering Context Specific Causal Relationships
• [cs.AI]Learning to Dialogue via Complex Hindsight Experience Replay
• [cs.AI]Let CONAN tell you a story: Procedural quest generation
• [cs.AR]Wrangling Rogues: Managing Experimental Post-Moore Architectures
• [cs.CL]A Recipe for Arabic-English Neural Machine Translation
• [cs.CL]Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization
• [cs.CL]Adaptive Document Retrieval for Deep Question Answering
• [cs.CL]Automatic Detection of Vague Words and Sentences in Privacy Policies
• [cs.CL]Detecting cognitive impairments by agreeing on interpretations of linguistic features
• [cs.CL]Emoji Sentiment Scores of Writers using Odds Ratio and Fisher Exact Test
• [cs.CL]Hierarchical Neural Networks for Sequential Sentence Classification in Medical Scientific Abstracts
• [cs.CL]Learning to Compose over Tree Structures via POS Tags
• [cs.CL]Lexicosyntactic Inference in Neural Models
• [cs.CL]Linked Recurrent Neural Networks
• [cs.CL]Multi-Perspective Context Aggregation for Semi-supervised Cloze-style Reading Comprehension
• [cs.CL]Neural Machine Translation of Text from Non-Native Speakers
• [cs.CL]Post-Processing of Word Representations via Variance Normalization and Dynamic Embedding
• [cs.CL]Question Generation from SQL Queries Improves Neural Semantic Parsing
• [cs.CL]SeVeN: Augmenting Word Embeddings with Unsupervised Relation Vectors
• [cs.CL]SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
• [cs.CL]Source-Critical Reinforcement Learning for Transferring Spoken Language Understanding to a New Language
• [cs.CL]State-of-the-art Chinese Word Segmentation with Bi-LSTMs
• [cs.CL]XL-NBT: A Cross-lingual Neural Belief Tracking Framework
• [cs.CV]A Fast and Robust Matching Framework for Multimodal Remote Sensing Image Registration
• [cs.CV]CU-Net: Coupled U-Nets
• [cs.CV]CapsDeMM: Capsule network for Detection of Munro's Microabscess in skin biopsy images
• [cs.CV]CellLineNet: End-to-End Learning and Transfer Learning For Multiclass Epithelial Breast cell Line Classification via a Convolutional Neural Network
• [cs.CV]Class-Aware Fully-Convolutional Gaussian and Poisson Denoising
• [cs.CV]Concept Mask: Large-Scale Segmentation from Semantic Concepts
• [cs.CV]DeeSIL: Deep-Shallow Incremental Learning
• [cs.CV]Deep Multi-View Clustering via Multiple Embedding
• [cs.CV]Deep Multiple Instance Learning for Airplane Detection in High Resolution Imagery
• [cs.CV]Distractor-aware Siamese Networks for Visual Object Tracking
• [cs.CV]Dynamic Temporal Alignment of Speech to Lips
• [cs.CV]Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery
• [cs.CV]FusionNet and AugmentedFlowNet: Selective Proxy Ground Truth for Training on Unlabeled Images
• [cs.CV]GridFace: Face Rectification via Learning Local Homography Transformations
• [cs.CV]Haze Density Estimation via Modeling of Scattering Coefficients of Iso-depth Regions
• [cs.CV]In Defense of Single-column Networks for Crowd Counting
• [cs.CV]Incremental Learning in Person Re-Identification
• [cs.CV]Learning Monocular Depth by Distilling Cross-domain Stereo Networks
• [cs.CV]Learning from #Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods
• [cs.CV]Learning to Learn from Web Data through Deep Semantic Embeddings
• [cs.CV]Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality
• [cs.CV]Person Re-Identification by Semantic Region Representation and Topology Constraint
• [cs.CV]Simultaneous synthesis of FLAIR and segmentation of white matter hypointensities from T1 MRIs
• [cs.CV]Single-View Place Recognition under Seasonal Changes
• [cs.CV]Universal Image Manipulation Detection using Deep Siamese Convolutional Neural Network
• [cs.CV]Video-to-Video Synthesis
• [cs.CY]Characterizing Transgender Health Issues in Twitter
• [cs.CY]Deep learning, deep change? Mapping the development of the Artificial Intelligence General Purpose Technology
• [cs.CY]Detecting home locations from CDR data: introducing spatial uncertainty to the state-of-the-art
• [cs.CY]New Approaches and Trends in the Philosophy of Educational Technology for Learning and Teaching Environments
• [cs.CY]The Effect of Security Education and Expertise on Security Assessments: the Case of Software Vulnerabilities
• [cs.CY]What do the US West Coast Public Libraries Post on Twitter?
• [cs.DC]GPU PaaS Computation Model in Aneka Cloud Computing Environment
• [cs.DC]Pangea: Monolithic Distributed Storage for Data Analytics
• [cs.DS]Scalable Edge Partitioning
• [cs.IR]Attainment Ratings for Graph-Query Recommendation
• [cs.IR]Dynamic Intention-Aware Recommendation with Self-Attention
• [cs.IR]Heuristics for publishing dynamic content as structured data with schema.org
• [cs.IR]Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up and Enhance Recommendations
• [cs.IR]The Deconfounded Recommender: A Causal Inference Approach to Recommendation
• [cs.IT]Amplitude Quantization for Type-2 Codebook Based CSI Feedback in New Radio System
• [cs.IT]Configurable Distributed Physical Downlink Control Channel for 5G New Radio: Resource Bundling and Diversity Trade-off
• [cs.IT]Contract-based Incentive Mechanism for LTE over Unlicensed Channels
• [cs.IT]Improved Latency-Communication Trade-Off for Map-Shuffle-Reduce Systems with Stragglers
• [cs.IT]Non-Asymptotic and Asymptotic Fundamental Limits of Guessing Subject to Distortion
• [cs.IT]On cyclic codes of length 2^e over finite fields
• [cs.IT]On the compression of messages in the multi-party setting
• [cs.IT]Optimized Rate-Adaptive Protograph-Based LDPC Codes for Source Coding with Side Information
• [cs.IT]The Capacity of Some Pólya String Models
• [cs.IT]Ultra Reliable, Low Latency Vehicle-to-Infrastructure Wireless Communications with Edge Computing
• [cs.LG]A Distribution Similarity Based Regularizer for Learning Bayesian Networks
• [cs.LG]A Semi-Supervised and Inductive Embedding Model for Churn Prediction of Large-Scale Mobile Games
• [cs.LG]Effect of secular trend in drug effectiveness study in real world data
• [cs.LG]Exact Passive-Aggressive Algorithms for Learning to Rank Using Interval Labels
• [cs.LG]Faster Support Vector Machines
• [cs.LG]Fourier analysis perspective for sufficient dimension reduction problem
• [cs.LG]Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies
• [cs.LG]Optimizing Deep Neural Network Architecture: A Tabu Search Based Approach
• [cs.LG]PAC-learning is Undecidable
• [cs.LG]Reproducible evaluation of classification methods in Alzheimer's disease: framework and application to MRI and PET data
• [cs.LG]Synthetic Patient Generation: A Deep Learning Approach Using Variational Autoencoders
• [cs.LG]TLR: Transfer Latent Representation for Unsupervised Domain Adaptation
• [cs.LG]Tangent-Normal Adversarial Regularization for Semi-supervised Learning
• [cs.LG]Triangle Lasso for Simultaneous Clustering and Optimization in Graph Datasets
• [cs.NE]Progressive Operational Perceptron with Memory
• [cs.NI]Energy Efficiency of Server-Centric PON Data Center Architecture for Fog Computing
• [cs.NI]Impact of Link Failures on the Performance of MapReduce in Data Center Networks
• [cs.NI]Towards Fine Grained Network Flow Prediction
• [cs.RO]Proprioceptive Sonomyographic Control: A novel method of intuitive proportional control of multiple degrees of freedom for upper-extremity amputees
• [cs.RO]What Stands-in for a Missing Tool? A Prototypical Grounded Knowledge-based Approach to Tool Substitution
• [cs.SE]Learning-based Automatic Parameter Tuning for Big Data Analytics Frameworks
• [cs.SE]Towards Anticipation of Architectural Smells using Link Prediction Techniques
• [cs.SI]An incremental local-first community detection method for dynamic graphs
• [cs.SI]Community detection in networks with unobserved edges
• [cs.SI]Detecting Core-Periphery Structure in Spatial Networks
• [cs.SI]Ensemble-based Overlapping Community Detection using Disjoint Community Structures
• [cs.SI]Multi-dimensional Graph Convolutional Networks
• [cs.SI]Signed Graph Convolutional Network
• [cs.SY]Optimized Path Planning for Inspection by Unmanned Aerial Vehicles Swarm with Energy Constraints
• [eess.AS]Multimodal speech synthesis architecture for unsupervised speaker adaptation
• [eess.SP]On Geometric Analysis of Affine Sparse Subspace Clustering
• [math.CO]Reed-Solomon codes over small fields with constrained generator matrices
• [math.OC]Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions
• [math.PR]A General Framework of Multi-Armed Bandit Processes by Switching Restrictions
• [math.ST]Generalized Bregman and Jensen divergences which include some f-divergences
• [math.ST]On the error in Laplace approximations of high-dimensional integrals
• [math.ST]Optimal proposals for Approximate Bayesian Computation
• [math.ST]The Mismatch Principle: Statistical Learning Under Large Model Uncertainties
• [nucl-th]Revisiting the proton-radius problem using constrained Gaussian processes
• [physics.med-ph]Translational Motion Compensation for Soft Tissue Velocity Images
• [q-bio.QM]Peptide-Spectra Matching from Weak Supervision
• [stat.AP]Alzheimer's Disease Modelling and Staging through Independent Gaussian Process Analysis of Spatio-Temporal Brain Changes
• [stat.AP]An Assessment of Covariates of Nonstationary Storm Surge Statistical Behavior by Bayesian Model Averaging
• [stat.AP]Analyzing within Garage Fuel Economy Gaps to Support Vehicle Purchasing Decisions - A Copula-Based Modeling & Forecasting Approach
• [stat.AP]Bayesian Hidden Markov Tree Models for Clustering Genes with Shared Evolutionary History
• [stat.AP]Spatio-temporal prediction of crimes using network analytic approach
• [stat.ME]A Stepwise Approach for High-Dimensional Gaussian Graphical Models
• [stat.ME]A Structural-Factor Approach to Modeling High-Dimensional Time Series
• [stat.ME]A general approach to detect gene (G)-environment (E) additive interaction leveraging G-E independence in case-control studies
• [stat.ME]Analysis of "Learn-As-You-Go" (LAGO) Studies
• [stat.ME]Bayesian Regression for a Dirichlet Distributed Response using Stan
• [stat.ME]On Design of Problem Token Questions in Quality of Experience Surveys
• [stat.ME]Semiparametric estimation of structural failure time model in continuous-time processes
• [stat.ME]Spillover Effects in Cluster Randomized Trials with Noncompliance
• [stat.ME]The empirical likelihood prior applied to bias reduction of general estimating equations
• [stat.ML]Applying Machine Learning To Maize Traits Prediction
• [stat.ML]Causal Discovery by Telling Apart Parents and Children
• [stat.ML]Multi-View Graph Embedding Using Randomized Shortest Paths

·····································

• [cs.AI]Discovering Context Specific Causal Relationships
Saisai Ma, Jiuyong Li, Lin Liu, Thuc Duy Le
http://arxiv.org/abs/1808.06316v1

With the increasing need for personalised decision making, such as personalised medicine and online recommendations, growing attention has been paid to the discovery of the context and heterogeneity of causal relationships. Most existing methods, however, assume a known cause (e.g. a new drug) and focus on identifying from data the contexts of heterogeneous effects of the cause (e.g. patient groups with different responses to the new drug). No existing approach efficiently detects context-specific causal relationships directly from observational data, i.e. discovers the causes and their contexts simultaneously. In this paper, by taking advantage of highly efficient decision tree induction and the well-established causal inference framework, we propose the Tree-based Context Causal rule discovery (TCC) method for efficient exploration of context-specific causal relationships from data. Experiments with both synthetic and real-world datasets show that TCC can effectively discover context-specific causal rules from the data.
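
As a rough illustration of the idea (not the TCC algorithm itself), a decision tree grown over covariates can propose candidate contexts, with an association test run inside each leaf; everything below, including the synthetic data and the agreement-based splitting heuristic, is an assumption made for the sketch.

```python
# Loose illustration only: a decision tree proposes candidate contexts,
# and a Fisher exact test checks for a cause-effect link inside each leaf.
import numpy as np
from scipy.stats import fisher_exact
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 5))       # binary covariates (contexts)
cause = rng.integers(0, 2, size=2000)
# Synthetic ground truth: the cause acts only in the context X[:, 0] == 1.
effect = np.where(X[:, 0] == 1, cause, rng.integers(0, 2, size=2000))

# Crude context proposal: split on how often cause and effect agree.
tree = DecisionTreeClassifier(max_depth=1).fit(X, cause == effect)
leaves = tree.apply(X)
for leaf in np.unique(leaves):
    m = leaves == leaf                       # samples in this candidate context
    table = [[np.sum(m & (cause == 1) & (effect == 1)),
              np.sum(m & (cause == 1) & (effect == 0))],
             [np.sum(m & (cause == 0) & (effect == 1)),
              np.sum(m & (cause == 0) & (effect == 0))]]
    print(f"leaf {leaf}: Fisher p = {fisher_exact(table)[1]:.2e}")
```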

• [cs.AI]Learning to Dialogue via Complex Hindsight Experience Replay
Keting Lu, Shiqi Zhang, Xiaoping Chen
http://arxiv.org/abs/1808.06497v1

Reinforcement learning methods have been used for learning dialogue policies from the experience of conversations. However, learning an effective dialogue policy frequently requires prohibitively many conversations. This is partly because of the sparse rewards in dialogues, and the relatively small number of successful dialogues in the early learning phase. Hindsight experience replay (HER) enables an agent to learn from failure, but the vanilla HER is inapplicable to dialogue domains because dialogue goals are implicit (cf. the explicit goals in manipulation tasks). In this work, we develop two complex HER methods providing different trade-offs between complexity and performance. Experiments were conducted using a realistic user simulator. Results suggest that our HER methods perform better than standard and prioritized experience replay methods (as applied to deep Q-networks) in learning rate, and that our two complex HER methods can be combined to produce the best performance.
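
For readers unfamiliar with HER, the sketch below shows the vanilla hindsight relabeling that this paper extends; the dialogue-specific "complex HER" variants are not reproduced, and all names are illustrative rather than the authors' code.

```python
# Minimal sketch of vanilla hindsight experience replay (HER).
def her_relabel(episode, reward_fn):
    """episode: list of (state, action, next_state, goal) tuples."""
    achieved = episode[-1][2]          # pretend the final state was the goal
    relabeled = []
    for state, action, next_state, _ in episode:
        # Recompute the reward as if `achieved` had been the goal all along,
        # turning a failed trajectory into a useful training signal.
        relabeled.append((state, action, reward_fn(next_state, achieved),
                          next_state, achieved))
    return relabeled

reward_fn = lambda s, g: 1.0 if s == g else 0.0    # sparse goal reward
episode = [("s0", "a0", "s1", "goal"), ("s1", "a1", "s2", "goal")]
print(her_relabel(episode, reward_fn))             # last transition earns 1.0
```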

• [cs.AI]Let CONAN tell you a story: Procedural quest generation
Vincent Breault, Sebastien Ouellet, Jim Davies
http://arxiv.org/abs/1808.06217v1

This work proposes an engine for the Creation Of Novel Adventure Narrative (CONAN), which is a procedural quest generator. It uses a planning approach to story generation. The engine is tested on its ability to create quests, which are sets of actions that must be performed in order to achieve a certain goal, usually for a reward. The engine takes in a world description represented as a set of facts, including characters, locations, and items, and generates quests according to the state of the world and the preferences of the characters. We evaluate quests through the classification of the motivations behind the quests, based on the sequences of actions required to complete the quests. We also compare different world descriptions and analyze the difference in motivations for the quests produced by the engine. Compared against human structural quest analysis, the current engine was found to be able to replicate the quest structures found in commercial video game quests.

• [cs.AR]Wrangling Rogues: Managing Experimental Post-Moore Architectures
Will Powell, Jason Riedy, Jeffrey S. Young
http://arxiv.org/abs/1808.06334v1

The Rogues Gallery is a new experimental testbed that is focused on tackling "rogue" architectures for the Post-Moore era of computing. While some of these devices have roots in the embedded and high-performance computing spaces, managing current and emerging technologies presents challenges for system administration that are not always foreseen in traditional data center environments. We present an overview of the motivations and design of the initial Rogues Gallery testbed and cover some of the unique challenges that we have seen and foresee with upcoming hardware prototypes for future post-Moore research. Specifically, we cover the networking, identity management, scheduling of resources, and tools and sensor access aspects of the Rogues Gallery and techniques we have developed to manage these new platforms.

• [cs.CL]A Recipe for Arabic-English Neural Machine Translation
Abdullah Alrajeh
http://arxiv.org/abs/1808.06116v1

In this paper, we present a recipe for building a good Arabic-English neural machine translation system. We compare neural systems with traditional phrase-based systems using various parallel corpora, including UN, ISI and Ummah. We also investigate the importance of special preprocessing of the Arabic script. The presented results are based on test sets from NIST MT 2005 and 2012. The best neural system produces a gain of +13 BLEU points compared to an equivalent simple phrase-based system on the NIST MT12 test set. Unexpectedly, we find that tuning a model trained on the whole data using a small high-quality corpus like Ummah gives a substantial improvement (+3 BLEU points). We also find that training a neural system with a small Arabic-English corpus is competitive with a traditional phrase-based system.

• [cs.CL]Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization
Logan Lebanoff, Kaiqiang Song, Fei Liu
http://arxiv.org/abs/1808.06218v1

Generating an abstract from a set of relevant documents remains challenging. Despite the development of the neural encoder-decoder framework, prior studies focus primarily on single-document summarization, possibly because labelled training data can be automatically harvested from the Web. Nevertheless, labelled data for multi-document summarization are scarce. There is thus an increasing need to adapt the encoder-decoder framework from single- to multiple-document summarization in an unsupervised fashion. In this paper we present an initial investigation into a novel adaptation method. It exploits the maximal marginal relevance method to select representative sentences from multi-document input, and an abstractive encoder-decoder model to fuse disparate sentences into an abstractive summary. The adaptation method is robust and itself requires no training data. Our system compares favorably to state-of-the-art extractive and abstractive approaches judged by both automatic metrics and human assessors.

• [cs.CL]Adaptive Document Retrieval for Deep Question Answering
Bernhard Kratzwald, Stefan Feuerriegel
http://arxiv.org/abs/1808.06528v1

State-of-the-art systems in deep question answering proceed as follows: (1) an initial document retrieval selects relevant documents, which (2) are then processed by a neural network in order to extract the final answer. Yet the exact interplay between both components is poorly understood, especially concerning the number of candidate documents that should be retrieved. We show that choosing a static number of documents -- as used in prior research -- suffers from a noise-information trade-off and yields suboptimal results. As a remedy, we propose an adaptive document retrieval model. This learns the optimal candidate number for document retrieval, conditional on the size of the corpus and the query. We report extensive experimental results showing that our adaptive approach outperforms state-of-the-art methods on multiple benchmark datasets, as well as in the context of corpora with variable sizes.

• [cs.CL]Automatic Detection of Vague Words and Sentences in Privacy Policies
Logan Lebanoff, Fei Liu
http://arxiv.org/abs/1808.06219v1

Website privacy policies represent the single most important source of information for users to gauge how their personal data are collected, used and shared by companies. However, privacy policies are often vague and people struggle to understand the content. Their opaqueness poses a significant challenge to both Internet users and policy regulators. In this paper, we seek to identify vague content in privacy policies. We construct the first corpus of human-annotated vague words and sentences and present empirical studies on automatic vagueness detection. We investigate context-aware and context-agnostic models for predicting vague words, and explore auxiliary-classifier generative adversarial networks for characterizing sentence vagueness. Our experimental results demonstrate the effectiveness of proposed approaches. Finally, we provide suggestions for resolving vagueness and improving the usability of privacy policies.

• [cs.CL]Detecting cognitive impairments by agreeing on interpretations of linguistic features
Zining Zhu, Jekaterina Novikova, Frank Rudzicz
http://arxiv.org/abs/1808.06570v1

Linguistic features have shown promising applications for detecting various cognitive impairments. To improve detection accuracies, two applicable approaches have been to increase the amount of data or the number of linguistic features. However, acquiring additional clinical data can be expensive, and hand-crafting features is burdensome. In this paper, we take a third approach, putting forward Consensus Networks (CN), a framework that diagnoses after reaching agreements between modalities. We divide the linguistic features into non-overlapping subsets according to their natural categories and let neural networks ("ePhysicians") learn low-dimensional representations ("interpretation vectors") that agree with each other. These representations are passed into a neural network classifier, resulting in a framework for assessing cognitive impairments. We also present methods that empirically improve the performance of CN, namely the addition of a noise modality and allowing gradients to propagate to the interpreters while optimizing the classifier. We then present two ablation studies to illustrate the effectiveness of CN: dividing subsets along the natural modalities is more beneficial than doing so randomly, and models built with consensus settings outperform those without, given the same modalities of features. To understand further what happens in consensus networks, we visualize the interpretation vectors during training; they demonstrate symmetry in an aggregate manner. Overall, using all of the 413 linguistic features, our models significantly outperform the traditional classifiers used in state-of-the-art papers.

• [cs.CL]Emoji Sentiment Scores of Writers using Odds Ratio and Fisher Exact Test
Jose Berengueres
http://arxiv.org/abs/1808.06110v1

The sentiment of a given emoji is traditionally calculated by averaging the ratings {-1, 0 or +1} given by various users to a given context where the emoji appears. However, such a formula complicates statistical-significance analysis, particularly for low sample sizes. Here, we provide sentiment scores using odds, and a sentiment mapping to a 4-icon scale. We show how odds-ratio statistics lead to simpler sentiment analysis. Finally, we provide a list of sentiment scores with the often-missing exact p-values and CIs for the most common emoji.
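
As a concrete illustration of such scoring, the odds ratio and exact p-value for one emoji can be computed from a 2x2 contingency table of positive and negative contexts; the counts in this sketch are hypothetical, not from the paper.

```python
from scipy.stats import fisher_exact

# 2x2 contingency table: rows = [this emoji, all other emoji],
# columns = [positive contexts, negative contexts]; counts are made up.
table = [[120, 30],
         [900, 950]]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4g}")
```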

• [cs.CL]Hierarchical Neural Networks for Sequential Sentence Classification in Medical Scientific Abstracts
Di Jin, Peter Szolovits
http://arxiv.org/abs/1808.06161v1

Prevalent models based on artificial neural network (ANN) for sentence classification often classify sentences in isolation without considering the context in which sentences appear. This hampers the traditional sentence classification approaches to the problem of sequential sentence classification, where structured prediction is needed for better overall classification performance. In this work, we present a hierarchical sequential labeling network to make use of the contextual information within surrounding sentences to help classify the current sentence. Our model outperforms the state-of-the-art results by 2%-3% on two benchmarking datasets for sequential sentence classification in medical scientific abstracts.

• [cs.CL]Learning to Compose over Tree Structures via POS Tags
Gehui Shen, Zhi-Hong Deng, Ting Huang, Xi Chen
http://arxiv.org/abs/1808.06075v1

Recursive Neural Networks (RecNN), a class of models that compose words or phrases recursively over syntactic tree structures, have proven superior at obtaining sentence representations for a variety of NLP tasks. However, RecNN has an inherent limitation: a single compositional function shared across all tree nodes cannot capture complex semantic compositionality, which limits the expressive power of the model. In this paper, to address this problem, we propose Tag-Guided HyperRecNN/TreeLSTM (TG-HRecNN/TreeLSTM), which introduces a hypernetwork into RecNNs that takes the Part-of-Speech (POS) tags of words/phrases as input and generates the semantic composition parameters dynamically. Experimental results on five datasets for two typical NLP tasks show that the proposed models consistently obtain significant improvements over RecNN and TreeLSTM. Our TG-HTreeLSTM outperforms all existing RecNN-based models and achieves or is competitive with the state of the art on four sentence classification benchmarks. The effectiveness of our models is also demonstrated by qualitative analysis.

• [cs.CL]Lexicosyntactic Inference in Neural Models
Aaron Steven White, Rachel Rudinger, Kyle Rawlins, Benjamin Van Durme
http://arxiv.org/abs/1808.06232v1

We investigate neural models' ability to capture lexicosyntactic inferences: inferences triggered by the interaction of lexical and syntactic information. We take the task of event factuality prediction as a case study and build a factuality judgment dataset for all English clause-embedding verbs in various syntactic contexts. We use this dataset, which we make publicly available, to probe the behavior of current state-of-the-art neural systems, showing that these systems make certain systematic errors that are clearly visible through the lens of factuality prediction.

• [cs.CL]Linked Recurrent Neural Networks
Zhiwei Wang, Yao Ma, Dawei Yin, Jiliang Tang
http://arxiv.org/abs/1808.06170v1

Recurrent Neural Networks (RNNs) have been proven to be effective in modeling sequential data and have been applied to boost a variety of tasks such as document classification, speech recognition and machine translation. Most existing RNN models have been designed for sequences assumed to be identically and independently distributed (i.i.d.). However, in many real-world applications, sequences are naturally linked. For example, web documents are connected by hyperlinks, and genes interact with each other. On the one hand, linked sequences are inherently not i.i.d., which poses tremendous challenges to existing RNN models. On the other hand, linked sequences offer link information in addition to the sequential information, which enables unprecedented opportunities to build advanced RNN models. In this paper, we study the problem of RNNs for linked sequences. In particular, we introduce a principled approach to capture link information and propose the Linked Recurrent Neural Network (LinkedRNN), which models sequential and link information coherently. We conduct experiments on real-world datasets from multiple domains and the experimental results validate the effectiveness of the proposed framework.

• [cs.CL]Multi-Perspective Context Aggregation for Semi-supervised Cloze-style Reading Comprehension
Liang Wang, Sujian Li, Wei Zhao, Kewei Shen, Meng Sun, Ruoyu Jia, Jingming Liu
http://arxiv.org/abs/1808.06289v1

Cloze-style reading comprehension has been a popular task for measuring the progress of natural language understanding in recent years. In this paper, we design a novel multi-perspective framework, which can be seen as the joint training of heterogeneous experts and aggregate context information from different perspectives. Each perspective is modeled by a simple aggregation module. The outputs of multiple aggregation modules are fed into a one-timestep pointer network to get the final answer. At the same time, to tackle the problem of insufficient labeled data, we propose an efficient sampling mechanism to automatically generate more training examples by matching the distribution of candidates between labeled and unlabeled data. We conduct our experiments on a recently released cloze-test dataset CLOTH (Xie et al., 2017), which consists of nearly 100k questions designed by professional teachers. Results show that our method achieves new state-of-the-art performance over previous strong baselines.

• [cs.CL]Neural Machine Translation of Text from Non-Native Speakers
Alison Lui, Antonios Anastasopoulos, David Chiang
http://arxiv.org/abs/1808.06267v1

Neural Machine Translation (NMT) systems are known to degrade when confronted with noisy data, especially when the system is trained only on clean data. In this paper, we show that augmenting training data with sentences containing artificially-introduced grammatical errors can make the system more robust to such errors. In combination with an automatic grammar error correction system, we can recover 1.5 BLEU out of 2.4 BLEU lost due to grammatical errors. We also present a set of Spanish translations of the JFLEG grammar error correction corpus, which allows for testing NMT robustness to real grammatical errors.

• [cs.CL]Post-Processing of Word Representations via Variance Normalization and Dynamic Embedding
Bin Wang, Fenxiao Chen, Angela Wang, C. -C. Jay Kuo
http://arxiv.org/abs/1808.06305v1

Although embedded vector representations of words offer impressive performance on many natural language processing (NLP) applications, the information of ordered input sequences is lost to some extent if only context-based samples are used in the training. For further performance improvement, two new post-processing techniques, called post-processing via variance normalization (PVN) and post-processing via dynamic embedding (PDE), are proposed in this work. The PVN method normalizes the variance of principal components of word vectors while the PDE method learns orthogonal latent variables from ordered input sequences. The PVN and the PDE methods can be integrated to achieve better performance. We apply these post-processing techniques to two popular word embedding methods (i.e., word2vec and GloVe) to yield their post-processed representations. Extensive experiments are conducted to demonstrate the effectiveness of the proposed post-processing techniques.
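
A rough sketch of the variance-normalization idea follows; it is an assumption about the mechanism based on the abstract, not the paper's reference implementation, and the cutoff k and reference scale are arbitrary choices here.

```python
# Sketch: project word vectors onto their principal components and
# equalize the variance of the dominant ones (assumed mechanism).
import numpy as np

def variance_normalize(vectors: np.ndarray, k: int = 10) -> np.ndarray:
    """vectors: (n_words, dim) embedding matrix; damps the top-k PCs."""
    centered = vectors - vectors.mean(axis=0)
    # Principal directions via SVD of the centered matrix.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt.T                 # coordinates in the PC basis
    proj[:, :k] *= s[k] / s[:k]            # normalize dominant variances
    return proj @ vt                       # map back to the original basis

embeddings = np.random.randn(1000, 300)    # stand-in for word2vec/GloVe vectors
processed = variance_normalize(embeddings)
print(processed.shape)
```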

• [cs.CL]Question Generation from SQL Queries Improves Neural Semantic Parsing
Daya Guo, Yibo Sun, Duyu Tang, Nan Duan, Ming Zhou, Jian Yin
http://arxiv.org/abs/1808.06304v1

We study how to learn a semantic parser of state-of-the-art accuracy with less supervised training data. We conduct our study on WikiSQL, the largest hand-annotated semantic parsing dataset to date. First, we demonstrate that question generation is an effective method that empowers us to learn a state-of-the-art neural network based semantic parser with thirty percent of the supervised training data. Second, we show that applying question generation to the full supervised training data further improves the state-of-the-art model. In addition, we observe that there is a logarithmic relationship between the accuracy of a semantic parser and the amount of training data.

• [cs.CL]SeVeN: Augmenting Word Embeddings with Unsupervised Relation Vectors
Luis Espinosa-Anke, Steven Schockaert
http://arxiv.org/abs/1808.06068v1

We present SeVeN (Semantic Vector Networks), a hybrid resource that encodes relationships between words in the form of a graph. Different from traditional semantic networks, these relations are represented as vectors in a continuous vector space. We propose a simple pipeline for learning such relation vectors, which is based on word vector averaging in combination with an ad hoc autoencoder. We show that by explicitly encoding relational information in a dedicated vector space we can capture aspects of word meaning that are complementary to what is captured by word embeddings. For example, by examining clusters of relation vectors, we observe that relational similarities can be identified at a more abstract level than with traditional word vector differences. Finally, we test the effectiveness of semantic vector networks in two tasks: measuring word similarity and neural text categorization. SeVeN is available at bitbucket.org/luisespinosa/seven.

• [cs.CL]SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo, John Richardson
http://arxiv.org/abs/1808.06226v1

This paper describes SentencePiece, a language-independent subword tokenizer and detokenizer designed for Neural-based text processing, including Neural Machine Translation. It provides open-source C++ and Python implementations for subword units. While existing subword segmentation tools assume that the input is pre-tokenized into word sequences, SentencePiece can train subword models directly from raw sentences, which allows us to make a purely end-to-end and language independent system. We perform a validation experiment of NMT on English-Japanese machine translation, and find that it is possible to achieve comparable accuracy to direct subword training from raw sentences. We also compare the performance of subword training and segmentation with various configurations. SentencePiece is available under the Apache 2 license at https://github.com/google/sentencepiece.
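
Since the toolkit is open source, basic usage of its Python bindings looks roughly like this (the corpus path, model prefix and vocabulary size are placeholders):

```python
# pip install sentencepiece; 'corpus.txt' is a placeholder file of raw text.
import sentencepiece as spm

# Train a subword model directly from raw, untokenized sentences.
spm.SentencePieceTrainer.Train(
    '--input=corpus.txt --model_prefix=m --vocab_size=8000')

sp = spm.SentencePieceProcessor()
sp.Load('m.model')
pieces = sp.EncodeAsPieces('This is a test.')   # raw text in, subwords out
print(pieces)
print(sp.DecodePieces(pieces))                  # lossless detokenization
```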

• [cs.CL]Source-Critical Reinforcement Learning for Transferring Spoken Language Understanding to a New Language
He Bai, Yu Zhou, Jiajun Zhang, Liang Zhao, Mei-Yuh Hwang, Chengqing Zong
http://arxiv.org/abs/1808.06167v1

To deploy a spoken language understanding (SLU) model to a new language, language transfer is desired to avoid the trouble of acquiring and labeling a new big SLU corpus. Translating the original SLU corpus into the target language is an attractive strategy. However, SLU corpora contain plenty of semantic labels (slots), which general-purpose translators cannot handle well, not to mention additional cultural differences. This paper focuses on the language transfer task given a tiny in-domain parallel SLU corpus. The in-domain parallel corpus can be used as a first adaptation of the general translator. But more importantly, we show how to use reinforcement learning (RL) to further finetune the adapted translator, where translated sentences with more proper slot tags receive higher rewards. We evaluate our approach on Chinese-to-English language transfer for SLU systems. The experimental results show that the generated English SLU corpus, via adaptation and reinforcement learning, gives us over 97% in the slot F1 score and over 84% accuracy in domain classification. This demonstrates the effectiveness of the proposed language transfer method. Compared with naive translation, our proposed method improves domain classification accuracy by 22% relative, and the slot filling F1 score by more than 71% relative.

• [cs.CL]State-of-the-art Chinese Word Segmentation with Bi-LSTMs
Ji Ma, Kuzman Ganchev, David Weiss
http://arxiv.org/abs/1808.06511v1

A wide variety of neural-network architectures have been proposed for the task of Chinese word segmentation. Surprisingly, we find that a bidirectional LSTM model, when combined with standard deep learning techniques and best practices, can achieve better accuracy on many of the popular datasets as compared to models based on more complex neural-network architectures. Furthermore, our error analysis shows that out-of-vocabulary words remain challenging for neural-network models, and many of the remaining errors are unlikely to be fixed through architecture changes. Instead, more effort should be made on exploring resources for further improvement.

• [cs.CL]XL-NBT: A Cross-lingual Neural Belief Tracking Framework
Wenhu Chen, Jianshu Chen, Yu Su, Xin Wang, Dong Yu, Xifeng Yan, William Yang Wang
http://arxiv.org/abs/1808.06244v1

Task-oriented dialog systems are becoming pervasive, and many companies heavily rely on them to complement human agents for customer service in call centers. With globalization, the need for providing cross-lingual customer support becomes more urgent than ever. However, cross-lingual support poses great challenges: it requires a large amount of additional annotated data from native speakers. In order to bypass the expensive human annotation and achieve the first step towards the ultimate goal of building a universal dialog management system, we set out to build a cross-lingual state tracking framework without requiring any human labor. Specifically, we assume that there exists a source language with dialog belief tracking annotations while having no access to any form of dialogue data for the other target languages. Then, we pre-train a state tracker for the source language as a teacher, which is able to exploit easy-to-access parallel data and distill its own knowledge to the student state tracker in target languages. In this paper, we specifically discuss two different types of common parallel resources (bilingual corpus and bilingual dictionary) and design different strategies to realize our transfer learning framework. Experimentally, we successfully use an English state tracker as the teacher to transfer its knowledge to both Italian and German trackers and achieve promising results.

• [cs.CV]A Fast and Robust Matching Framework for Multimodal Remote Sensing Image Registration
Yuanxin Ye, Lorenzo Bruzzone, Jie Shan, Francesca Bovolo, Qing Zhu
http://arxiv.org/abs/1808.06194v1

While image registration has been studied in the remote sensing community for decades, registering multimodal data [e.g., optical, light detection and ranging (LiDAR), synthetic aperture radar (SAR), and map] remains a challenging problem because of significant nonlinear intensity differences between such data. To address this problem, we present a novel fast and robust matching framework integrating local descriptors for multimodal registration. In the proposed framework, a local descriptor (such as Histogram of Oriented Gradient (HOG), Local Self-Similarity or Speeded-Up Robust Feature) is first extracted at each pixel to form a pixel-wise feature representation of an image. Then we define a similarity measure based on the feature representation in the frequency domain using the Fast Fourier Transform (FFT) technique, followed by a template matching scheme to detect control points between images. We also propose a novel pixel-wise feature representation using orientated gradients of images, named channel features of orientated gradients (CFOG). This novel feature is an extension of the pixel-wise HOG descriptor, and outperforms it both in matching performance and computational efficiency. The major advantages of the proposed framework include (1) structural similarity representation using the pixel-wise feature description and (2) high computational efficiency due to the use of FFT. Moreover, we design an automatic registration system for very large-size multimodal images based on the proposed framework. Experimental results obtained on many different types of multimodal images show the superior matching performance of the proposed framework with respect to the state-of-the-art methods and the effectiveness of the designed system, which shows very good potential for large-size image registration in real applications.
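
A much-simplified sketch of the FFT-based similarity search follows; the paper's CFOG descriptor is multi-channel and the full pipeline includes descriptor extraction, so only the frequency-domain correlation step is illustrated here, on placeholder data.

```python
# Template matching via FFT-based cross-correlation (simplified sketch).
import numpy as np
from scipy.signal import fftconvolve

def fft_match(feature_map: np.ndarray, template: np.ndarray):
    """Return the (row, col) offset where the template fits best."""
    t = template - template.mean()
    # Cross-correlation = convolution with a flipped kernel.
    score = fftconvolve(feature_map, t[::-1, ::-1], mode='valid')
    return np.unravel_index(np.argmax(score), score.shape)

ref = np.random.rand(512, 512)        # stand-in pixel-wise feature channel
tmpl = ref[100:164, 200:264].copy()   # a known patch, as a sanity check
print(fft_match(ref, tmpl))           # expected near (100, 200)
```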

• [cs.CV]CU-Net: Coupled U-Nets
Zhiqiang Tang, Xi Peng, Shijie Geng, Yizhe Zhu, Dimitris N. Metaxas
http://arxiv.org/abs/1808.06521v1

We design a new connectivity pattern for the U-Net architecture. Given several stacked U-Nets, we couple each U-Net pair through the connections of their semantic blocks, resulting in the coupled U-Nets (CU-Net). The coupling connections could make the information flow more efficiently across U-Nets. The feature reuse across U-Nets makes each U-Net very parameter efficient. We evaluate the coupled U-Nets on two benchmark datasets of human pose estimation, comparing both accuracy and the number of model parameters. The CU-Net obtains accuracy comparable to state-of-the-art methods while having at least 60% fewer parameters than other approaches.

• [cs.CV]CapsDeMM: Capsule network for Detection of Munro's Microabscess in skin biopsy images
Anabik Pal, Akshay Chaturvedi, Utpal Garain, Aditi Chandra, Raghunath Chatterjee, Swapan Senapati
http://arxiv.org/abs/1808.06428v1

This paper presents an approach for automatic detection of Munro's Microabscess in the stratum corneum (SC) of human skin biopsies in order to realize machine-assisted diagnosis of Psoriasis. The challenge of detecting neutrophils in the presence of nucleated cells is solved using recent advances in deep learning algorithms. Separation of the SC layer and extraction of patches from the layer, followed by classification of patches with respect to the presence or absence of neutrophils, form the basis of the overall approach, which is effected through an integration of a U-Net based segmentation network and a capsule network for classification. The novel design of the present capsule net leads to a drastic reduction in the number of parameters without any noticeable compromise in the overall performance. The research further addresses the challenge of dealing with Mega-pixel images (in 10X) vis-à-vis Giga-pixel ones (in 40X). The promising result coming out of an experiment on a dataset consisting of 273 real-life images shows that a practical system is possible based on the present research. The implementation of our system is available at https://github.com/Anabik/CapsDeMM

• [cs.CV]CellLineNet: End-to-End Learning and Transfer Learning For Multiclass Epithelial Breast cell Line Classification via a Convolutional Neural Network
Darlington Ahiale Akogo, Vincent Appiah, Xavier-Lewis Palmer
http://arxiv.org/abs/1808.06041v1

Computer vision for analyzing and classifying cells and tissues often requires rigorous lab procedures, and so automated computer vision solutions have been sought. Most work in this field requires feature extraction before the analysis of such features via machine learning and machine vision algorithms. We developed a Convolutional Neural Network that classifies 5 types of epithelial breast cell lines comprised of two human cancer lines, 2 normal immortalized lines, and 1 immortalized mouse line (MDA-MB-468, MCF7, 10A, 12A and HC11) without requiring feature extraction. The Multiclass Cell Line Classification Convolutional Neural Network extends our earlier work on a Binary Breast Cancer Cell Line Classification model. CellLineNet is a 31-layer Convolutional Neural Network trained, validated and tested on a 3,252-image dataset of 5 types of epithelial breast cell lines (MDA-MB-468, MCF7, 10A, 12A and HC11) in an end-to-end fashion. End-to-end learning enables CellLineNet to identify and learn, on its own, the visual features and regularities most important to breast cancer cell line classification from the dataset of images. Using transfer learning, the 28-layer MobileNet Convolutional Neural Network architecture with pre-trained ImageNet weights is extended and fine-tuned to the multiclass epithelial breast cell line classification problem. CellLineNet simply requires an imaged cell line as input and outputs the type of breast epithelial cell line (MDA-MB-468, MCF7, 10A, 12A or HC11) as predicted probabilities for the 5 classes. CellLineNet scored 96.67% accuracy.

• [cs.CV]Class-Aware Fully-Convolutional Gaussian and Poisson Denoising
Tal Remez, Or Litany, Raja Giryes, Alex M. Bronstein
http://arxiv.org/abs/1808.06562v1

We propose a fully-convolutional neural-network architecture for image denoising which is simple yet powerful. Its structure allows to exploit the gradual nature of the denoising process, in which shallow layers handle local noise statistics, while deeper layers recover edges and enhance textures. Our method advances the state-of-the-art when trained for different noise levels and distributions (both Gaussian and Poisson). In addition, we show that making the denoiser class-aware by exploiting semantic class information boosts performance, enhances textures and reduces artifacts.

• [cs.CV]Concept Mask: Large-Scale Segmentation from Semantic Concepts
Yufei Wang, Zhe Lin, Xiaohui Shen, Jianming Zhang, Scott Cohen
http://arxiv.org/abs/1808.06032v1

Existing works on semantic segmentation typically consider a small number of labels, ranging from tens to a few hundreds. With a large number of labels, training and evaluation of such task become extremely challenging due to correlation between labels and lack of datasets with complete annotations. We formulate semantic segmentation as a problem of image segmentation given a semantic concept, and propose a novel system which can potentially handle an unlimited number of concepts, including objects, parts, stuff, and attributes. We achieve this using a weakly and semi-supervised framework leveraging multiple datasets with different levels of supervision. We first train a deep neural network on a 6M stock image dataset with only image-level labels to learn visual-semantic embedding on 18K concepts. Then, we refine and extend the embedding network to predict an attention map, using a curated dataset with bounding box annotations on 750 concepts. Finally, we train an attention-driven class agnostic segmentation network using an 80-category fully annotated dataset. We perform extensive experiments to validate that the proposed system performs competitively to the state of the art on fully supervised concepts, and is capable of producing accurate segmentations for weakly learned and unseen concepts.

• [cs.CV]DeeSIL: Deep-Shallow Incremental Learning
Eden Belouadah, Adrian Popescu
http://arxiv.org/abs/1808.06396v1

Incremental Learning (IL) is an interesting AI problem when the algorithm is assumed to work on a budget. This is especially true when IL is modeled using a deep learning approach, where two complex challenges arise due to limited memory, which induces catastrophic forgetting and delays related to the retraining needed in order to incorporate new classes. Here we introduce DeeSIL, an adaptation of a known transfer learning scheme that combines a fixed deep representation used as feature extractor and learning independent shallow classifiers to increase recognition capacity. This scheme tackles the two aforementioned challenges since it works well with a limited memory budget and each new concept can be added within a minute. Moreover, since no deep retraining is needed when the model is incremented, DeeSIL can integrate larger amounts of initial data that provide more transferable features. Performance is evaluated on ImageNet LSVRC 2012 against three state of the art algorithms. Results show that, at scale, DeeSIL performance is 23 and 33 points higher than the best baseline when using the same and more initial data respectively.

• [cs.CV]Deep Multi-View Clustering via Multiple Embedding
Bingqian Lin, Yuan Xie, Yanyun Qu, Cuihua Li
http://arxiv.org/abs/1808.06220v1

Exploring the information among multiple views usually leads to more promising clustering performance. Most existing multi-view clustering algorithms perform clustering separately: first extracts multiple handcrafted features or deep features, then conducts traditional clustering such as spectral clustering or K-means. However, the learned features may not often work well for clustering. To overcome this problem, we propose the Deep Multi-View Clustering via Multiple Embedding (DMVC-ME), which learns deep embedded features, multi-view fusion mechanism and clustering assignment simultaneously in an end-to-end manner. Specifically, we adopt a KL divergence to refine the soft clustering assignment with the help of a multi-view fused target distribution. The parameters are updated via an efficient alternative optimization scheme. As a result, more clustering-friendly features can be learned and the complementary traits among different views can be well captured. We demonstrate the effectiveness of our approach on several challenging image datasets, where significant superiority can be found over single/multi-view baselines and the state-of-the-art multi-view clustering methods.
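
The KL-based refinement of soft assignments resembles the target distribution used in Deep Embedded Clustering (Xie et al., 2016); the sketch below shows that standard formula as an assumption about the mechanism, not the paper's exact multi-view fusion.

```python
# DEC-style target distribution: sharpen soft assignments, then minimize
# KL(p || q) so the network moves toward its own confident predictions.
import numpy as np

def target_distribution(q: np.ndarray) -> np.ndarray:
    """q: (n_samples, n_clusters) soft assignments; returns a sharpened p."""
    weight = q ** 2 / q.sum(axis=0)            # emphasize confident assignments
    return weight / weight.sum(axis=1, keepdims=True)

q = np.random.dirichlet(np.ones(5), size=100)  # placeholder soft assignments
p = target_distribution(q)
kl = np.sum(p * np.log(p / q))                 # the loss minimized in training
print(kl)
```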

• [cs.CV]Deep Multiple Instance Learning for Airplane Detection in High Resolution Imagery
Mohammad Reza Mohammadi
http://arxiv.org/abs/1808.06178v1

Automatic airplane detection in aerial imagery has a variety of applications. Two of the major challenges in this area are variations in the scale and direction of airplanes. In order to address these challenges, we present a rotation-and-scale-invariant airplane proposal generator. This proposal generator is developed based on the symmetric and regular boundaries of airplanes from the top view, called symmetric line segments (SLS). Then, the generated proposals are used to train a deep convolutional neural network for removing non-airplane proposals. Since each airplane can have multiple SLS proposals, some of which are not in the direction of the fuselage, we collect all proposals that correspond to one ground truth as a positive bag and the others as negative instances. To enable multiple-instance deep learning, we modify the training approach of the network to learn at least one instance from each positive bag as well as all negative instances. Finally, we employ non-maximum suppression to remove duplicate detections. Our experiments on the NWPU VHR-10 dataset show that our method is a promising approach for automatic airplane detection in very high resolution images. Moreover, the proposed algorithm can estimate the direction of the airplanes using box-level annotations as an extra achievement.

• [cs.CV]Distractor-aware Siamese Networks for Visual Object Tracking
Zheng Zhu, Qiang Wang, Bo Li, Wei Wu, Junjie Yan, Weiming Hu
http://arxiv.org/abs/1808.06048v1

Recently, Siamese networks have drawn great attention in the visual tracking community because of their balanced accuracy and speed. However, features used in most Siamese tracking approaches can only discriminate the foreground from non-semantic backgrounds. The semantic backgrounds are always considered as distractors, which hinders the robustness of Siamese trackers. In this paper, we focus on learning distractor-aware Siamese networks for accurate and long-term tracking. To this end, the features used in traditional Siamese trackers are analyzed first. We observe that the imbalanced distribution of training data makes the learned features less discriminative. During the off-line training phase, an effective sampling strategy is introduced to control this distribution and make the model focus on the semantic distractors. During inference, a novel distractor-aware module is designed to perform incremental learning, which can effectively transfer the general embedding to the current video domain. In addition, we extend the proposed approach for long-term tracking by introducing a simple yet effective local-to-global search region strategy. Extensive experiments on benchmarks show that our approach significantly outperforms the state of the art, yielding a 9.6% relative gain on the VOT2016 dataset and a 35.9% relative gain on the UAV20L dataset. The proposed tracker can perform at 160 FPS on short-term benchmarks and 110 FPS on long-term benchmarks.

• [cs.CV]Dynamic Temporal Alignment of Speech to Lips
Tavi Halperin, Ariel Ephrat, Shmuel Peleg
http://arxiv.org/abs/1808.06250v1

Many speech segments in movies are re-recorded in a studio during postproduction, to compensate for poor sound quality as recorded on location. Manual alignment of the newly-recorded speech with the original lip movements is a tedious task. We present an audio-to-video alignment method for automating speech-to-lips alignment, stretching and compressing the audio signal to match the lip movements. This alignment is based on deep audio-visual features, mapping the lips video and the speech signal to a shared representation. Using this shared representation we compute the lip-sync error between every short speech period and every video frame, followed by the determination of the optimal corresponding frame for each short sound period over the entire video clip. We demonstrate successful alignment both quantitatively, using a human perception-inspired metric, as well as qualitatively. The strongest advantage of our audio-to-video approach is in cases where the original voice is unclear, and where a constant shift of the sound cannot give a perfect alignment. In these cases state-of-the-art methods will fail.
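
Determining the optimal corresponding frame for each sound period is a dynamic-programming alignment; the plain DTW sketch below stands in for it, with a random placeholder matrix where the paper would use learned lip-sync errors.

```python
# Plain dynamic time warping over a cost matrix (illustrative stand-in).
import numpy as np

def dtw_path(cost: np.ndarray):
    """cost[i, j]: mismatch between audio period i and video frame j."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],       # stretch the audio
                acc[i, j - 1],       # stretch the video
                acc[i - 1, j - 1])   # advance both together
    path, i, j = [], n, m            # backtrack to recover the warping path
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)],
                   key=lambda t: acc[t])
    return path[::-1]

cost = np.random.rand(40, 50)        # placeholder for learned lip-sync errors
print(dtw_path(cost)[:5])
```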

• [cs.CV]Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery
Grégoire Payen de La Garanderie, Amir Atapour Abarghouei, Toby P. Breckon
http://arxiv.org/abs/1808.06253v1

Recent automotive vision work has focused almost exclusively on processing forward-facing cameras. However, future autonomous vehicles will not be viable without more comprehensive surround sensing, akin to a human driver, as can be provided by 360° panoramic cameras. We present an approach to adapt contemporary deep network architectures developed on conventional rectilinear imagery to work on equirectangular 360° panoramic imagery. To address the lack of available annotated panoramic automotive datasets, we adapt a contemporary automotive dataset, via style and projection transformations, to facilitate the cross-domain retraining of contemporary algorithms for panoramic imagery. Following this approach we retrain and adapt existing architectures to recover scene depth and 3D pose of vehicles from monocular panoramic imagery without any panoramic training labels or calibration parameters. Our approach is evaluated qualitatively on crowd-sourced panoramic images and quantitatively using an automotive environment simulator to provide the first benchmark for such techniques within panoramic imagery.

• [cs.CV]FusionNet and AugmentedFlowNet: Selective Proxy Ground Truth for Training on Unlabeled Images
Osama Makansi, Eddy Ilg, Thomas Brox
http://arxiv.org/abs/1808.06389v1

Recent work has shown that convolutional neural networks (CNNs) can be used to estimate optical flow with high quality and fast runtime. This makes them preferable for real-world applications. However, such networks require very large training datasets. Engineering the training data is difficult and/or laborious. This paper shows how to augment a network trained on an existing synthetic dataset with large amounts of additional unlabelled data. In particular, we introduce a selection mechanism to assemble from multiple estimates a joint optical flow field, which outperforms that of all input methods. The latter can be used as proxy ground truth to train a network on real-world data and to adapt it to specific domains of interest. Our experimental results show that the performance of networks improves considerably, both in cross-domain and in domain-specific scenarios. As a consequence, we obtain state-of-the-art results on the KITTI benchmarks.
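
As a crude stand-in for the learned selection mechanism (the paper trains a network for this), the sketch below picks, per pixel, the flow candidate with the lowest photometric warping error; all of it is an assumption made for illustration.

```python
# Assemble a joint flow from candidates by per-pixel warping error (sketch).
import numpy as np
from scipy.ndimage import map_coordinates

def fuse_flows(img1, img2, flows):
    """flows: list of (2, H, W) arrays (x, y components); fused (2, H, W)."""
    h, w = img1.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    errors = []
    for f in flows:
        # Backward-warp img2 by the candidate flow and compare against img1.
        warped = map_coordinates(img2, [ys + f[1], xs + f[0]], order=1)
        errors.append(np.abs(warped - img1))
    best = np.argmin(np.stack(errors), axis=0)       # (H, W) winner per pixel
    sel = np.broadcast_to(best, (1, 2, h, w))
    return np.take_along_axis(np.stack(flows), sel, axis=0)[0]

img1 = np.random.rand(64, 64)
img2 = np.roll(img1, 2, axis=1)                      # img1 shifted right by 2
good = np.zeros((2, 64, 64)); good[0] = 2.0          # flow undoing the shift
bad = np.zeros((2, 64, 64))
fused = fuse_flows(img1, img2, [bad, good])          # mostly selects `good`
```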

• [cs.CV]GridFace: Face Rectification via Learning Local Homography Transformations
Erjin Zhou, Zhimin Cao, Jian Sun
http://arxiv.org/abs/1808.06210v1

In this paper, we propose a method, called GridFace, to reduce facial geometric variations and improve the recognition performance. Our method rectifies the face by local homography transformations, which are estimated by a face rectification network. To encourage the image generation with canonical views, we apply a regularization based on the natural face distribution. We learn the rectification network and recognition network in an end-to-end manner. Extensive experiments show our method greatly reduces geometric variations, and gains significant improvements in unconstrained face recognition scenarios.

• [cs.CV]Haze Density Estimation via Modeling of Scattering Coefficients of Iso-depth Regions
Jie Chen, Cheen-Hau Tan, Lap-Pui Chau
http://arxiv.org/abs/1808.06207v1

Vision-based haze density estimation has practical implications for precaution alarms and emergency reactions to disastrous hazy weather. In this paper, we introduce a haze density estimation framework based on modeling the scattering coefficients of iso-depth regions. A haze density metric, the Normalized Scattering Coefficient (NSC), is proposed to measure the current haze density level with reference to two reference scales. Iso-depth regions are determined via superpixel segmentation. Efficient searching and matching of iso-depth units can be carried out for measurements via unstationary cameras. A robust dark superpixel selection method is used to produce reliable predictions for most outdoor scenarios.

• [cs.CV]In Defense of Single-column Networks for Crowd Counting
Ze Wang, Zehao Xiao, Kai Xie, Qiang Qiu, Xiantong Zhen, Xianbin Cao
http://arxiv.org/abs/1808.06133v1

Crowd counting, usually addressed by density estimation, is becoming an increasingly important topic in computer vision due to its widespread applications in video surveillance, urban planning, and intelligence gathering. However, it is essentially a challenging task because of the greatly varied sizes of objects, coupled with severe occlusions and the vague appearance of extremely small individuals. Existing methods heavily rely on multi-column learning architectures to extract multi-scale features, which however suffer from heavy computational cost, especially undesired for crowd counting. In this paper, we propose the single-column counting network (SCNet) for efficient crowd counting without relying on multi-column networks. SCNet consists of residual fusion modules (RFMs) for multi-scale feature extraction, a pyramid pooling module (PPM) for information fusion, and a sub-pixel convolutional module (SPCM) followed by a bilinear upsampling layer for resolution recovery. These proposed modules enable our SCNet to fully capture multi-scale features in a compact single-column architecture and estimate high-resolution density maps in an efficient way. In addition, we provide a principled paradigm for density map generation and data augmentation for training, which shows further improved performance. Extensive experiments on three benchmark datasets show that our SCNet delivers new state-of-the-art performance and surpasses previous methods by large margins, which demonstrates the great effectiveness of SCNet as a single-column network for crowd counting.

• [cs.CV]Incremental Learning in Person Re-Identification
Prajjwal Bhargava
http://arxiv.org/abs/1808.06281v1

Person re-identification is still a challenging task in computer vision for a variety of reasons. Meanwhile, incremental learning remains an open issue, since deep learning models tend to suffer from catastrophic forgetting when trained on subsequent tasks. In this paper, we propose a model that can be used for multiple tasks in person re-identification, provides state-of-the-art results on a variety of tasks, and still retains considerable accuracy later on. We evaluated our model on three datasets: Market-1501, CUHK-03, and DukeMTMC. Extensive experiments show that this method can achieve incremental learning in person ReID efficiently, as well as for other tasks in computer vision.

• [cs.CV]Learning Monocular Depth by Distilling Cross-domain Stereo Networks
Xiaoyang Guo, Hongsheng Li, Shuai Yi, Jimmy Ren, Xiaogang Wang
http://arxiv.org/abs/1808.06586v1

Monocular depth estimation aims at estimating a pixelwise depth map for a single image, which has wide applications in scene understanding and autonomous driving. Existing supervised and unsupervised methods face great challenges: supervised methods require large amounts of depth measurement data, which are generally difficult to obtain, while unsupervised methods are usually limited in estimation accuracy. Synthetic data generated by graphics engines provide a possible solution for collecting large amounts of depth data. However, the large domain gap between synthetic and realistic data makes directly training on them challenging. In this paper, we propose to use a stereo matching network as a proxy to learn depth from synthetic data and to use the predicted stereo disparity maps to supervise the monocular depth estimation network. Cross-domain synthetic data can be fully utilized in this novel framework. Different strategies are proposed to ensure that the learned depth perception capability transfers well across domains. Our extensive experiments show state-of-the-art monocular depth estimation results on the KITTI dataset.
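
The distillation step reduces to a simple training loop: the frozen stereo network predicts a disparity map from an image pair, and that prediction supervises the monocular network. The sketch below uses single convolutions as stand-ins for both networks; the real architectures, loss weighting, and domain-adaptation strategies are not shown.

    import torch
    import torch.nn as nn

    stereo_net = nn.Conv2d(6, 1, 3, padding=1)   # stand-in teacher (left+right)
    mono_net = nn.Conv2d(3, 1, 3, padding=1)     # stand-in student (left only)
    opt = torch.optim.Adam(mono_net.parameters(), lr=1e-4)

    left, right = torch.randn(2, 3, 64, 64).chunk(2)   # dummy stereo pair
    with torch.no_grad():                        # teacher is not updated
        proxy_disp = stereo_net(torch.cat([left, right], dim=1))

    loss = nn.functional.l1_loss(mono_net(left), proxy_disp)
    opt.zero_grad(); loss.backward(); opt.step()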

• [cs.CV]Learning from #Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods
Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas
http://arxiv.org/abs/1808.06369v1

Massive tourism is becoming a big problem for some cities, such as Barcelona, due to its concentration in certain neighborhoods. In this work we gather Instagram data related to Barcelona, consisting of image-caption pairs, and, using the text as a supervisory signal, we learn relations between images, words, and neighborhoods. Our goal is to learn which visual elements appear in photos when people post about each neighborhood. We treat the data separately by language and show that this can be extrapolated to a separate analysis of tourists and locals, and that tourism is reflected in social media at a neighborhood level. The presented pipeline allows analyzing the differences between the images that tourists and locals associate with the different neighborhoods. The proposed method, which can be extended to other cities or subjects, proves that Instagram data can be used to train multi-modal (image and text) machine learning models that are useful for analyzing publications about a city at a neighborhood level. We publish the collected dataset, InstaBarcelona, and the code used in the analysis.

• [cs.CV]Learning to Learn from Web Data through Deep Semantic Embeddings
Raul Gomez, Lluis Gomez, Jaume Gibert, Dimosthenis Karatzas
http://arxiv.org/abs/1808.06368v1

In this paper we propose to learn a multimodal image and text embedding from Web and social media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the pipeline can learn from images with associated text without supervision, and we perform a thorough analysis of five different text embeddings on three different benchmarks. We show that the embeddings learnt from Web and social media data are competitive with supervised methods on the text-based image retrieval task, and that we clearly outperform the state of the art on the MIRFlickr dataset when training on the target data. Further, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed of Instagram images and their associated texts, which can be used for fair comparison of image-text embeddings.

• [cs.CV]Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality
Sajad Saeedi, Bruno Bodin, Harry Wagstaff, Andy Nisbet, Luigi Nardi, John Mawer, Nicolas Melot, Oscar Palomar, Emanuele Vespa, Tom Spink, Cosmin Gorgovan, Andrew Webb, James Clarkson, Erik Tomusk, Thomas Debrunner, Kuba Kaszyk, Pablo Gonzalez-de-Aledo, Andrey Rodchenko, Graham Riley, Christos Kotselidis, Björn Franke, Michael F. P. O'Boyle, Andrew J. Davison, Paul H. J. Kelly, Mikel Luján, Steve Furber
http://arxiv.org/abs/1808.06352v1

Visual understanding of 3D environments in real-time, at low power, is a huge computational challenge. Often referred to as SLAM (Simultaneous Localisation and Mapping), it is central to applications spanning domestic and industrial robotics, autonomous vehicles, and virtual and augmented reality. This paper describes the results of a major research effort to assemble the algorithms, architectures, tools, and systems software needed to enable the delivery of SLAM by supporting application specialists in selecting and configuring the appropriate algorithm, hardware, and compilation pathway to meet their performance, accuracy, and energy-consumption goals. The major contributions we present are (1) tools and methodology for the systematic quantitative evaluation of SLAM algorithms, (2) automated, machine-learning-guided exploration of the algorithmic and implementation design space with respect to multiple objectives, (3) end-to-end simulation tools to enable optimisation of heterogeneous, accelerated architectures for the specific algorithmic requirements of the various SLAM algorithmic approaches, and (4) tools for delivering, where appropriate, accelerated, adaptive SLAM solutions in a managed, JIT-compiled, adaptive runtime context.

• [cs.CV]Person Re-Identification by Semantic Region Representation and Topology Constraint
Jianjun Lei, Lijie Niu, Huazhu Fu, Bo Peng, Qingming Huang, Chunping Hou
http://arxiv.org/abs/1808.06280v1

Person re-identification is a popular research topic which aims at automatically matching a specific person across a multi-camera network. Feature representation and metric learning are two important issues for person re-identification. In this paper, we propose a novel person re-identification method, which consists of a reliable representation called Semantic Region Representation (SRR) and an effective metric learning with Mapping Space Topology Constraint (MSTC). The SRR integrates semantic representations to achieve effective similarity comparison between the corresponding regions by parsing the body into multiple parts, focusing on the foreground context against background interference. To learn a discriminant metric, the MSTC is proposed to take into account the topological relationship among all samples in the feature space. It considers two-fold constraints: the distribution of positive pairs should be more compact than the average distribution of negative pairs with regard to the same probe, while the average distance between different classes should be larger than that within the same class. These two aspects cooperate to maintain intra-class compactness as well as inter-class sparsity. Extensive experiments conducted on five challenging person re-identification datasets (VIPeR, SYSU-sReID, QMUL GRID, CUHK03, and Market-1501) show that the proposed method achieves competitive performance with state-of-the-art approaches.

• [cs.CV]Simultaneous synthesis of FLAIR and segmentation of white matter hypointensities from T1 MRIs
Mauricio Orbes-Arteaga, M. Jorge Cardoso, Lauge Sørensen, Marc Modat, Sébastien Ourselin, Mads Nielsen, Akshay Pai
http://arxiv.org/abs/1808.06519v1

Segmenting vascular pathologies such as white matter lesions in brain magnetic resonance images (MRIs) requires the acquisition of multiple sequences, such as T1-weighted (T1-w), on which lesions appear hypointense, and fluid-attenuated inversion recovery (FLAIR), on which lesions appear hyperintense. However, most existing retrospective datasets do not include FLAIR sequences. Existing missing-modality imputation methods separate the imputation process from the segmentation process. In this paper, we propose a method that links modality imputation and segmentation using convolutional neural networks. We show that by jointly optimizing the imputation network and the segmentation network, the method not only produces more realistic synthetic FLAIR images from T1-w images, but also improves the segmentation of white matter hypointensities (WMH) from T1-w images alone.
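
The joint optimization can be pictured as a single loss over both networks, so that segmentation gradients also shape the synthetic FLAIR. A minimal PyTorch sketch with single-convolution stand-ins and an assumed equal loss weighting:

    import torch
    import torch.nn as nn

    synth_net = nn.Conv2d(1, 1, 3, padding=1)    # T1 -> synthetic FLAIR
    seg_net = nn.Conv2d(2, 1, 3, padding=1)      # sees T1 + synthetic FLAIR

    t1 = torch.randn(4, 1, 64, 64)
    flair_gt = torch.randn(4, 1, 64, 64)
    mask_gt = torch.randint(0, 2, (4, 1, 64, 64)).float()

    flair_syn = synth_net(t1)
    logits = seg_net(torch.cat([t1, flair_syn], dim=1))
    loss = nn.functional.l1_loss(flair_syn, flair_gt) \
         + nn.functional.binary_cross_entropy_with_logits(logits, mask_gt)
    loss.backward()    # gradients flow into both networks jointly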

• [cs.CV]Single-View Place Recognition under Seasonal Changes
Daniel Olid, José M. Fácil, Javier Civera
http://arxiv.org/abs/1808.06516v1

Single-view place recognition, which we define as finding an image that corresponds to the same place as a given query image, is a key capability for autonomous navigation and mapping. Although there has been a considerable amount of research on the topic, the high degree of image variability (with viewpoint, illumination, or occlusions, for example) makes it a research challenge. One particular challenge, which we address in this work, is weather variation. Seasonal changes can produce drastic appearance changes that classic low-level features do not model properly. Our contributions in this paper are twofold. First, we pre-process and propose a partition for the Nordland dataset, which is frequently used for place recognition research without any consensus on its partitions. Second, we evaluate several neural network architectures, such as pre-trained, siamese, and triplet networks, for this problem. Our best results outperform the state of the art in the field. A video showing our results can be found at https://youtu.be/VrlxsYZoHDM. The partitioned version of the Nordland dataset is available at http://webdiis.unizar.es/~jmfacil/pr-nordland/.

• [cs.CV]Universal Image Manipulation Detection using Deep Siamese Convolutional Neural Network
Aniruddha Mazumdar, Jaya Singh, Yosha Singh Tomar, Prabin Kumar Bora
http://arxiv.org/abs/1808.06323v1

Detecting the different types of image editing operations carried out on an image is an important problem in image forensics. It gives information about the processing history of an image and can also expose forgeries. A few methods have been proposed to detect different types of image editing operations in a single framework. However, they all require the operations to be known a priori in the training phase, whereas in real forensic scenarios it may not be possible to know which editing operations were carried out on an image. To solve this problem, we propose a novel deep learning-based method that can differentiate between different types of image editing operations. The proposed method classifies image patches in a pairwise fashion as either similarly or differently processed using a deep siamese neural network. Once the network learns features that can discriminate between image editing operations, it can differentiate between operations not present in the training stage. The experimental results show the efficacy of the proposed method in detecting and discriminating different image editing operations.
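
The pairwise classification idea can be sketched as a shared encoder plus a small head over the feature difference; the encoder below is a toy stand-in, and the absolute-difference fusion is an assumption, since the abstract does not detail the pairing layer.

    import torch
    import torch.nn as nn

    class SiameseDetector(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(           # shared weights
                nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.head = nn.Linear(8, 1)             # logit: similarly processed?

        def forward(self, a, b):
            return self.head(torch.abs(self.encoder(a) - self.encoder(b)))

    model = SiameseDetector()
    logit = model(torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64))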

• [cs.CV]Video-to-Video Synthesis
Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro
http://arxiv.org/abs/1808.06601v1

We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video. While its image counterpart, the image-to-image synthesis problem, is a popular topic, the video-to-video synthesis problem is less explored in the literature. Without understanding temporal dynamics, directly applying existing image synthesis approaches to an input video often results in temporally incoherent videos of low visual quality. In this paper, we propose a novel video-to-video synthesis approach under the generative adversarial learning framework. Through carefully-designed generator and discriminator architectures, coupled with a spatio-temporal adversarial objective, we achieve high-resolution, photorealistic, temporally coherent video results on a diverse set of input formats including segmentation masks, sketches, and poses. Experiments on multiple benchmarks show the advantage of our method compared to strong baselines. In particular, our model is capable of synthesizing 2K resolution videos of street scenes up to 30 seconds long, which significantly advances the state-of-the-art of video synthesis. Finally, we apply our approach to future video prediction, outperforming several state-of-the-art competing systems.

• [cs.CY]Characterizing Transgender Health Issues in Twitter
Amir Karami, Frank Webb, Vanessa L. Kitzie
http://arxiv.org/abs/1808.06022v1

Although there are millions of transgender people in the world, a lack of information exists about their health issues. This issue has consequences for the medical field, which only has a nascent understanding of how to identify and meet this population's health-related needs. Social media sites like Twitter provide new opportunities for transgender people to overcome these barriers by sharing their personal health experiences. Our research employs a computational framework to collect tweets from self-identified transgender users, detect those that are health-related, and identify their information needs. This framework is significant because it provides a macro-scale perspective on an issue that lacks investigation at national or demographic levels. Our findings identified 54 distinct health-related topics that we grouped into 7 broader categories. Further, we found both linguistic and topical differences in the health-related information shared by transgender men (TM) as compared to transgender women (TW). These findings can help inform medical and policy-based strategies for health interventions within transgender communities. Also, our proposed approach can inform the development of computational strategies to identify the health-related information needs of other marginalized populations.

• [cs.CY]Deep learning, deep change? Mapping the development of the Artificial Intelligence General Purpose Technology
J. Klinger, J. Mateos-Garcia, K. Stathoulopoulos
http://arxiv.org/abs/1808.06355v1

General Purpose Technologies (GPTs) that can be applied in many industries are an important driver of economic growth and national and regional competitiveness. In spite of this, the geography of their development and diffusion has not received significant attention in the literature. We address this with an analysis of Deep Learning (DL), a core technique in Artificial Intelligence (AI) increasingly being recognized as the latest GPT. We identify DL papers in a novel dataset from ArXiv, a popular preprints website, and use CrunchBase, a technology business directory, to measure industrial capabilities related to it. After showing that DL conforms with the definition of a GPT, having experienced rapid growth and diffusion into new fields where it has generated an impact, we describe changes in its geography. Our analysis shows China's rise in AI rankings and relative decline in several European countries. We also find that initial volatility in the geography of DL has been followed by consolidation, suggesting that the window of opportunity for new entrants might be closing down as new DL research hubs become dominant. Finally, we study the regional drivers of DL clustering. We find that competitive DL clusters tend to be based in regions combining research and industrial activities related to it. This could be because GPT developers and adopters located close to each other can collaborate and share knowledge more easily, thus overcoming coordination failures in GPT deployment. Our analysis also reveals a Chinese comparative advantage in DL after we control for other explanatory factors, perhaps underscoring the importance of access to data and supportive policies for the successful development of this complex, 'omni-use' technology.

• [cs.CY]Detecting home locations from CDR data: introducing spatial uncertainty to the state-of-the-art
Maarten Vanhoof, Fernando Reis, Zbigniew Smoreda, Thomas Ploetz
http://arxiv.org/abs/1808.06398v1

Non-continuous location traces inferred from Call Detail Records (CDR) at population scale are increasingly becoming available for research and show great potential for the automated detection of meaningful places. Yet, a majority of Home Detection Algorithms (HDAs) suffer from a "blind" deployment of criteria to define homes and from limited possibilities for validation. In this paper, we investigate the performance and capabilities of five popular criteria for home detection based on a very large mobile phone dataset from France (~18 million users, 6 months). Furthermore, we construct a data-driven framework to assess the spatial uncertainty related to the application of HDAs. Our findings help to appropriately account for spatial uncertainty in HDAs and, by extension, in the detection of meaningful places. We show how spatial uncertainties at the individual level can be assessed in the absence of ground truth annotation, how they relate to traditional, high-level validation practices, and how they can be used to improve results for, e.g., nation-wide population estimation.

• [cs.CY]New Approaches and Trends in the Philosophy of Educational Technology for Learning and Teaching Environments
Ismail Ipek, Rushan Ziatdinov
http://arxiv.org/abs/1808.06063v1

The purpose of this study is to discuss instructional design and technology (IDT) model strategies for developing learning and teaching environments, based on philosophical approaches to educational technology theory. The study begins with a discussion of IDT models that traces the history of educational technology and instructional technology theories, based on instructional strategies and improvements. We discuss the strategies and steps that a design team should follow when designing learning environments in industry, business, and military scenarios, based on the philosophy of educational technology and the latest technologies, which should lead to effective learning environments. The steps include recognizing terminology in educational technology concepts, psychological and instructional foundations in instructional design (ID), and approaches to educational technology. To recap, our purpose is to combine the necessary IDT model strategies for the pedagogical design of learning environments with new technologies. We also discuss powerful IDT models that aim to meet the very high expectations of digital and humanist education. To develop a high-quality learning environment, we explain technology design steps and practices that improve the learning of tasks, complex cognitive skills, attitudes, motivations, and competencies in the future trends of educational technology. At the end of the study, integrated technologies in e-learning are discussed and presented, based on the foundations of IDT and the philosophy of educational technology.

• [cs.CY]The Effect of Security Education and Expertise on Security Assessments: the Case of Software Vulnerabilities
Luca Allodi, Marco Cremonini, Fabio Massacci, Woohyun Shim
http://arxiv.org/abs/1808.06547v1

In spite of the growing importance of software security and the industry demand for more cyber security expertise in the workforce, the effect of security education and experience on the ability to assess complex software security problems has only recently been investigated. As a proxy for the full range of software security skills, we considered the problem of assessing the severity of software vulnerabilities by means of a structured analysis methodology widely used in industry (the Common Vulnerability Scoring System, CVSS, v3), and designed a study to compare how accurately individuals with a background in information technology but different professional experience and education in cyber security are able to assess the severity of software vulnerabilities. Our results provide some structural insights into the complex relationship between the education or experience of assessors and the quality of their assessments. In particular, we find that individual characteristics matter more than professional experience or formal education; apparently it is the combination of skills that one owns (including actual knowledge of the system under study), rather than specialization or years of experience, that most influences assessment quality. Similarly, we find that the overall advantage conferred by professional expertise depends significantly on the composition of the individual's security skills as well as on the available information.

• [cs.CY]What do the US West Coast Public Libraries Post on Twitter?
Amir Karami, Matthew Collins
http://arxiv.org/abs/1808.06021v1

Twitter has provided a great opportunity for public libraries to disseminate information for a variety of purposes. Twitter data have been applied in different domains such as health, politics, and history. There are thousands of public libraries in the US, but no study has yet investigated the content of their social media posts, such as tweets, to find their interests. Moreover, traditional content analysis of Twitter content is not efficient for exploring thousands of tweets; automatic methods are needed to overcome the limitations of manual approaches. This paper proposes a computational approach to collecting and analyzing tweets using Twitter Application Programming Interfaces (APIs) and investigates more than 138,000 tweets from 48 US west coast libraries using topic modeling. We found 20 topics and assigned them to five categories: public relations, books, events, training, and social good. Our results show that the US west coast libraries are most interested in using Twitter for public relations and book-related events. This research has both practical and theoretical applications for libraries, as well as other organizations, to explore the social media activities of their customers and themselves.
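
As an illustration of the analysis stage, a few lines of scikit-learn reproduce the general recipe (bag-of-words plus topic modeling). LDA and the toy corpus are assumptions here, since the abstract does not name the exact topic model used.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    tweets = ["summer reading program starts today",
              "author talk and book signing this friday",
              "free coding workshop for teens"]      # toy stand-in corpus

    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(tweets)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

    terms = vec.get_feature_names_out()
    for k, comp in enumerate(lda.components_):
        print(f"topic {k}:", [terms[i] for i in comp.argsort()[-3:]])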

• [cs.DC]GPU PaaS Computation Model in Aneka Cloud Computing Environment
Shashikant Ilager, Rajeev Wankar, Raghavendra Kune, Rajkumar Buyya
http://arxiv.org/abs/1808.06332v1

Due to the surge in the volume of data generated and rapid advancement in Artificial Intelligence (AI) techniques like machine learning and deep learning, traditional computing models have become inadequate to process enormous volumes of data and the complex application logic for extracting intrinsic information. Computing accelerators such as graphics processing units (GPUs) have become the de facto SIMD computing system for many big data and machine learning applications. Meanwhile, the traditional computing model has gradually switched from conventional ownership-based computing to the subscription-based cloud computing model. However, the lack of programming models and frameworks for developing cloud-native applications that seamlessly utilize both CPU and GPU resources in the cloud has become a bottleneck for rapid application development. To support this demand for simultaneous heterogeneous resource usage, programming models and new frameworks are needed to manage the underlying resources effectively. Aneka has emerged as a popular PaaS computing model for the development of cloud applications using multiple programming models (Thread, Task, and MapReduce) within a single .NET-based container platform. Since Aneka addresses MIMD application development using CPU-based resources, while GPU programming models like CUDA are designed for SIMD application development, this chapter discusses a GPU PaaS computing model for Aneka clouds for rapid cloud application development on .NET platforms. Popular open-source GPU libraries are utilized and integrated into the existing Aneka task programming model. The scheduling policies are extended to automatically identify GPU machines and schedule the respective tasks accordingly. A case study on image processing is discussed to demonstrate the system, which has been built using the PaaS Aneka SDKs and the CUDA library.

• [cs.DC]Pangea: Monolithic Distributed Storage for Data Analytics
Jia Zou, Arun Iyengar, Chris Jermaine
http://arxiv.org/abs/1808.06094v1

Storage and memory systems for modern data analytics are heavily layered, managing shared persistent data, cached data, and non-shared execution data in separate systems such as distributed file systems like HDFS, in-memory file systems like Alluxio, and computation frameworks like Spark. Such layering introduces significant performance and management costs: data are copied redundantly across layers, and proper resource allocation must be decided for each layer. In this paper we propose a single system called Pangea that manages all data, both intermediate and long-lived, along with their buffering/caching, data placement optimization, and failure recovery, in one monolithic storage system without any layering. We present a detailed performance evaluation of Pangea and show that its performance compares favorably with several widely used layered systems such as Spark.

• [cs.DS]Scalable Edge Partitioning
Sebastian Schlag, Christian Schulz, Daniel Seemaier, Darren Strash
http://arxiv.org/abs/1808.06411v1

Edge-centric distributed computations have appeared as a recent technique to improve the shortcomings of think-like-a-vertex algorithms on large scale-free networks. In order to increase parallelism in this model, edge partitioning - partitioning the edges into roughly equally sized blocks - has emerged as an alternative to traditional (node-based) graph partitioning. In this work, we give a distributed-memory parallel algorithm to compute high-quality edge partitions in a scalable way. Our algorithm scales to networks with billions of edges and runs efficiently on thousands of processing elements (PEs). Our technique is based on a fast parallelization of split graph construction and the use of advanced node partitioning algorithms. Our extensive experiments show that our algorithm achieves high quality on large real-world networks and large hyperbolic random graphs, which have a power-law degree distribution and are therefore specifically targeted by edge partitioning.

• [cs.IR]Attainment Ratings for Graph-Query Recommendation
Hal Cooper, Garud Iyengar, Ching-Yung Lin
http://arxiv.org/abs/1808.05988v1

The video game industry is larger than both the film and music industries combined. Recommender systems for video games have received relatively scant academic attention, despite the uniqueness of the medium and its data. In this paper, we introduce a graph-based recommender system that makes use of interactivity, arguably the most significant feature of video gaming. We show that the use of implicit data that tracks user-game interactions and levels of attainment (e.g. Sony Playstation Trophies, Microsoft Xbox Achievements) has high predictive value when making recommendations. Furthermore, we argue that the characteristics of the video gaming hobby (low cost, high duration, socially relevant) make clear the necessity of personalized, individual recommendations that can incorporate social networking information. We demonstrate the natural suitability of graph-query based recommendation for this purpose.

• [cs.IR]Dynamic Intention-Aware Recommendation with Self-Attention
Shuai Zhang, Yi Tay, Lina Yao, Aixin Sun
http://arxiv.org/abs/1808.06414v1

Predicting missing values given an observed interaction matrix is the predominant task in recommendation research, and it is well suited to the case where long-term user preferences are considered. However, ignoring timestamp information makes it impossible to detect the interest drifts of individual users over time, while in many practical applications both long- and short-term intents are critical to the success of recommendation algorithms. In this paper, we tackle the sequential recommendation problem by modeling these two types of intent in an integrated manner. Our model is structured into two components: one for short-term intent learning and the other for long-term preference modeling. We capture a user's short-term interest with a self-attention mechanism that attends over past behaviors in a supervised learning approach. The model is finally learned in a metric learning framework, which can overcome the weaknesses of the dot product. Experiments on a wide range of datasets from different domains demonstrate that our approach outperforms the state of the art by a wide margin.
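
The short-term component boils down to scaled dot-product self-attention over the embeddings of a user's recent interactions. A NumPy sketch under the simplifying assumption Q = K = V = E; the paper's projections and metric-learning objective are omitted.

    import numpy as np

    def self_attention(E):
        # E: (seq_len, d) embeddings of recent items
        scores = E @ E.T / np.sqrt(E.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)          # row-wise softmax
        return w @ E                                # attended representations

    recent = np.random.randn(5, 16)                 # last 5 interactions
    short_term_intent = self_attention(recent).mean(axis=0)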

• [cs.IR]Heuristics for publishing dynamic content as structured data with schema.org
Elias Kärle, Dieter Fensel
http://arxiv.org/abs/1808.06012v1

Publishing fast-changing dynamic data as open data on the web in a scalable manner is not trivial. Existing approaches simply publish as much data as possible, which leads to problems like server capacity overload, network latency, or unwanted knowledge disclosure. In this paper we show how to publish dynamic data in a scalable, meaningful manner by applying context-dependent publication heuristics. The outcome shows that applying the right publication heuristics in the right domain can improve publication performance significantly. Good knowledge of the domain helps in choosing the right publication heuristic and hence leads to very good publication results.

• [cs.IR]Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up and Enhance Recommendations
Emanuel Lacic, Dominik Kowald, Elisabeth Lex
http://arxiv.org/abs/1808.06417v1

In this paper, we present work-in-progress on applying user pre-filtering to speed up and enhance recommendations based on Collaborative Filtering. We propose to pre-filter users in order to extract a smaller set of candidate neighbors, who exhibit a high number of overlapping entities and to compute the final user similarities based on this set. To realize this, we exploit features of the high-performance search engine Apache Solr and integrate them into a scalable recommender system. We have evaluated our approach on a dataset gathered from Foursquare and our evaluation results suggest that our proposed user pre-filtering step can help to achieve both a better runtime performance as well as an increase in overall recommendation accuracy.
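
The pre-filtering idea itself is simple and can be shown without Solr: keep only candidate neighbors that share enough interacted entities with the target user, then compute similarities on that candidate set alone. Jaccard similarity is a stand-in here for whatever similarity measure the deployed system uses.

    user_items = {
        "u1": {"a", "b", "c"},
        "u2": {"a", "b", "d"},
        "u3": {"x", "y"},
    }

    def candidate_neighbors(target, min_overlap=2):
        items = user_items[target]
        return [u for u, other in user_items.items()
                if u != target and len(items & other) >= min_overlap]

    def jaccard(u, v):
        a, b = user_items[u], user_items[v]
        return len(a & b) / len(a | b)

    cands = candidate_neighbors("u1")               # -> ["u2"]
    sims = {u: jaccard("u1", u) for u in cands}     # similarity on candidates only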

• [cs.IR]The Deconfounded Recommender: A Causal Inference Approach to Recommendation
Yixin Wang, Dawen Liang, Laurent Charlin, David M. Blei
http://arxiv.org/abs/1808.06581v1

The goal of a recommender system is to show its users items that they will like. In forming its prediction, the recommender system tries to answer: "what would the rating be if we 'forced' the user to watch the movie?" This is a question about an intervention in the world, a causal question, and so traditional recommender systems are doing causal inference from observational data. This paper develops a causal inference approach to recommendation. Traditional recommenders are likely biased by unobserved confounders, variables that affect both the "treatment assignments" (which movies the users watch) and the "outcomes" (how they rate them). We develop the deconfounded recommender, a strategy to leverage classical recommendation models for causal predictions. The deconfounded recommender uses Poisson factorization on which movies users watched to infer latent confounders in the data; it then augments common recommendation models to correct for potential confounding bias. The deconfounded recommender improves recommendation and it enjoys stable performance against interventions on test sets.
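
The two-stage structure can be sketched compactly: first factorize the binary exposure matrix (who watched what), then feed the reconstructed exposures into the rating model as a substitute confounder. The sketch below uses scikit-learn's NMF as a stand-in for the paper's Poisson factorization, and stage 2 is indicated only in outline.

    import numpy as np
    from sklearn.decomposition import NMF

    # Stage 1: factorize the exposure matrix (toy data).
    A = (np.random.rand(50, 30) < 0.2).astype(float)
    nmf = NMF(n_components=5, init="random", random_state=0)
    A_hat = nmf.fit_transform(A) @ nmf.components_   # substitute confounder

    # Stage 2 (outline): for each observed rating r_ui, include A_hat[u, i]
    # as an extra covariate in the rating model and fit jointly, which
    # corrects for the inferred confounding bias.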

• [cs.IT]Amplitude Quantization for Type-2 Codebook Based CSI Feedback in New Radio System
Honglei Miao, Markus D. Mueck, Michael Faerber
http://arxiv.org/abs/1808.06402v1

In the 3GPP new radio system, two types of codebook, namely Type-1 and Type-2, have been standardized for channel state information (CSI) feedback in support of advanced MIMO operation. Both types of codebook are constructed from a 2-D DFT-based grid of beams and enable CSI feedback of beam selection as well as PSK-based co-phase combining between two polarizations. Moreover, Type-2 codebook-based CSI feedback reports the wideband and subband amplitude information of the selected beams. As a result, more accurate CSI can be obtained from Type-2 codebook-based feedback, so that better precoded MIMO transmission can be employed by the network. To reduce the CSI feedback signaling, a 1-bit subband amplitude with only two quantization levels is supported in combination with a 3-bit wideband amplitude feedback. Typically, the wideband amplitude is calculated as the linear average amplitude of the beam over all subbands. However, due to the coarse subband amplitude quantization, it has been observed that, in the case of joint wideband and subband amplitude feedback, the average-based wideband amplitude can lead to large amplitude quantization errors. In this paper, we study two methods, one optimal and one sub-optimal, for joint wideband and subband amplitude calculation. The optimal method achieves the minimum amplitude quantization error at the cost of relatively large computational complexity. By virtue of a derived scaling factor, the sub-optimal method exhibits a clearly smaller quantization error than the conventional linear-average-based method, especially for channels with large frequency selectivity.
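
A toy NumPy comparison of the two wideband rules: brute-force search over the wideband codebook (optimal, up to this simplified model) versus the conventional linear average. Both codebooks and the amplitudes below are illustrative assumptions, not the 3GPP-specified values.

    import numpy as np

    wideband_levels = np.linspace(0.125, 1.0, 8)    # assumed 3-bit codebook
    subband_gains = np.array([1.0, 0.5])            # assumed 1-bit codebook
    true_amp = np.array([0.9, 0.85, 0.3, 0.25])     # per-subband amplitudes

    def total_error(w):
        # each subband picks its best 1-bit gain given wideband level w
        return np.min(np.abs(true_amp[:, None] - w * subband_gains), axis=1).sum()

    avg_w = wideband_levels[np.argmin(np.abs(wideband_levels - true_amp.mean()))]
    opt_w = min(wideband_levels, key=total_error)   # exhaustive search
    print(total_error(avg_w), total_error(opt_w))   # optimal <= average-based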

• [cs.IT]Configurable Distributed Physical Downlink Control Channel for 5G New Radio: ResourceBundling and Diversity Trade-off
Honglei Miao, Michael Faerber
http://arxiv.org/abs/1808.06397v1

New radio technologies for the fifth generation of wireless systems have been extensively studied globally, and the air interface protocols for the 5G radio access network will be standardized in the coming years by 3GPP. Due to its crucial function in a scheduled system, the physical layer downlink control channel (PDCCH) is a core element enabling all physical layer data transmissions. Recently, a configurable distributed PDCCH, intended to cope with different scenarios, has been developed in 3GPP. To build a comprehensive understanding of its technical advantages and potential scenario-dependent limitations, this paper presents a detailed performance analysis and evaluation of the configurable distributed PDCCH. In particular, exponential effective SNR mapping (EESM) is employed as the performance metric in different scenarios. The EESM results demonstrate that the configurable distributed PDCCH offers an additional degree of freedom to trade off frequency diversity against channel estimation gain by adjusting the resource bundling level according to the channel and interference scenario experienced by the control channel transmission.

• [cs.IT]Contract-based Incentive Mechanism for LTE over Unlicensed Channels
Kenza Hamidouche, Walid Saad, Mérouane Debbah, My T. Thai, Zhu Han
http://arxiv.org/abs/1808.06579v1

In this paper, a novel economic approach, based on the framework of contract theory, is proposed for providing incentives for LTE over unlicensed channels (LTE-U) in cellular networks. In this model, a mobile network operator (MNO) designs and offers a set of contracts to the users to motivate them to accept being served over the unlicensed bands. A practical model is considered in which the quality-of-service (QoS) requirements of each user are known neither to the MNO nor to the other users. For this contractual model, a closed-form expression for the price charged by the MNO to every user is derived, and the problem of spectrum allocation is formulated as a matching game with incomplete information. For the matching problem, a distributed algorithm is proposed to assign the users to the licensed and unlicensed spectra. Simulation results show that the proposed pricing mechanism can increase the fraction of users that achieve their QoS requirements by up to 45% compared to classical algorithms that do not account for user requirements. Moreover, the performance of the proposed algorithm in the case of incomplete information is shown to approach the performance of the same mechanism with complete information.

• [cs.IT]Improved Latency-Communication Trade-Off for Map-Shuffle-Reduce Systems with Stragglers
Jingjing Zhang, Osvaldo Simeone
http://arxiv.org/abs/1808.06583v1

In a distributed computing system operating according to the map-shuffle-reduce framework, coding data prior to storage can be useful both to reduce the latency caused by straggling servers and to decrease the inter-server communication load in the shuffling phase. In prior work, a concatenated coding scheme was proposed for a matrix multiplication task. In this scheme, the outer Maximum Distance Separable (MDS) code is leveraged to correct erasures caused by stragglers, while the inner repetition code is used to improve the communication efficiency in the shuffling phase by means of coded multicasting. In this work, it is demonstrated that it is possible to leverage the redundancy created by repetition coding in order to increase the rate of the outer MDS code and hence to increase the multicasting opportunities in the shuffling phase. As a result, the proposed approach is shown to improve over the best known latency-communication overhead trade-off.

• [cs.IT]Non-Asymptotic and Asymptotic Fundamental Limits of Guessing Subject to Distortion
Shota Saito, Toshiyasu Matsushima
http://arxiv.org/abs/1808.06190v1

This paper considers the problem of guessing random variables subject to distortion. We investigate both non-asymptotic and asymptotic fundamental limits of the minimum t-th moment of the number of guesses. These fundamental limits are characterized by the quantity related to the Rényi entropy. To derive the non-asymptotic bounds, we use our techniques developed in lossy compression. Moreover, using the results obtained in the non-asymptotic setting, we derive the asymptotic result and show that it coincides with the previous result by Arikan and Merhav.

• [cs.IT]On cyclic codes of length 2^e over finite fields
Binbin Pang, Shixin Zhu, Ping Li
http://arxiv.org/abs/1808.06338v1

Professor Cunsheng Ding gave cyclotomic constructions of cyclic codes whose length is the product of two primes. In this paper, we study cyclic codes of length n=2^e and dimension k=2^{e-1}, for which Ding's construction clearly does not hold. We describe two new types of generalized cyclotomy of order two, which differ from Ding's. Furthermore, we study two classes of cyclic codes of length n and dimension k and obtain their enumeration. Moreover, all of the codes from our construction are among the best cyclic codes. Finally, we study the hull of cyclic codes of length n over \mathbb{F}_q, obtain the range of \ell=\dim({\rm Hull}(C)), and construct and enumerate cyclic codes of length n having a hull of given dimension.

• [cs.IT]On the compression of messages in the multi-party setting
Anurag Anshu, Penghui Yao
http://arxiv.org/abs/1808.06449v1

We consider the following communication task in the multi-party setting, which involves a joint random variable XYZMN with the property that M is independent of YZN conditioned on X and N is independent of XZM conditioned on Y. Three parties Alice, Bob and Charlie, respectively, observe samples x,y and z from XYZ. Alice and Bob communicate messages to Charlie with the goal that Charlie can output a sample from MN having correct correlation with XYZ. This task reflects the simultaneous message passing model of communication complexity. Furthermore, it is a generalization of some well studied problems in information theory, such as distributed source coding, source coding with a helper and one sender and one receiver message compression. It is also closely related to the lossy distributed source coding task. Our main result is an achievable communication region for this task in the one-shot setting, through which we obtain a near optimal characterization using auxiliary random variables of bounded size. We employ our achievability result to provide a near-optimal one-shot communication region for the task of lossy distributed source coding, in terms of auxiliary random variables of bounded size. Finally, we show that interaction is necessary to achieve the optimal expected communication cost for our main task.

• [cs.IT]Optimized Rate-Adaptive Protograph-Based LDPC Codes for Source Coding with Side Information
Fangping Ye, Elsa Dupraz, Zeina Mheich, Karine Amis
http://arxiv.org/abs/1808.06509v1

This paper considers the problem of source coding with side information at the decoder, also called the Slepian-Wolf source coding scheme. In practical applications of this coding scheme, the statistical relation between the source and the side information can vary from one data transmission to another, and there is a need to adapt the coding rate to the current statistical relation. In this paper, we propose a novel rate-adaptive code construction based on LDPC codes for the Slepian-Wolf source coding scheme. The proposed code design method makes it possible to optimize the code degree distributions at all considered rates, while minimizing the number of short cycles in the parity check matrices at all rates. Simulation results show that the proposed method greatly reduces the source coding rate compared to the standard LDPCA solution.

• [cs.IT]The Capacity of Some Pólya String Models
Ohad Elishco, Farzad Farnoud, Moshe Schwartz, Jehoshua Bruck
http://arxiv.org/abs/1808.06062v1

We study random string-duplication systems, which we call Pólya string models. These are motivated by DNA storage in living organisms, and certain random mutation processes that affect their genome. Unlike previous works that study the combinatorial capacity of string-duplication systems, or various string statistics, this work provides exact capacity or bounds on it, for several probabilistic models. In particular, we study the capacity of noisy string-duplication systems, including the tandem-duplication, end-duplication, and interspersed-duplication systems. Interesting connections are drawn between some systems and the signature of random permutations, as well as to the beta distribution common in population genetics.

• [cs.IT]Ultra Reliable, Low Latency Vehicle-to-Infrastructure Wireless Communications with Edge Computing
Md Mostofa Kamal Tareq, Omid Semiari, Mohsen Amini Salehi, Walid Saad
http://arxiv.org/abs/1808.06015v1

Ultra reliable, low latency vehicle-to-infrastructure (V2I) communications is a key requirement for seamless operation of autonomous vehicles (AVs) in future smart cities. To this end, cellular small base stations (SBSs) with edge computing capabilities can reduce the end-to-end (E2E) service delay by processing requested tasks from AVs locally, without forwarding the tasks to a remote cloud server. Nonetheless, due to the limited computational capabilities of the SBSs, coupled with the scarcity of the wireless bandwidth resources, minimizing the E2E latency for AVs and achieving a reliable V2I network is challenging. In this paper, a novel algorithm is proposed to jointly optimize AVs-to-SBSs association and bandwidth allocation to maximize the reliability of the V2I network. By using tools from labor matching markets, the proposed framework can effectively perform distributed association of AVs to SBSs, while accounting for the latency needs of AVs as well as the limited computational and bandwidth resources of SBSs. Moreover, the convergence of the proposed algorithm to a core allocation between AVs and SBSs is proved and its ability to capture interdependent computational and transmission latencies for AVs in a V2I network is characterized. Simulation results show that by optimizing the E2E latency, the proposed algorithm substantially outperforms conventional cell association schemes, in terms of service reliability and latency.

• [cs.LG]A Distribution Similarity Based Regularizer for Learning Bayesian Networks
Weirui Kong, Wenyi Wang
http://arxiv.org/abs/1808.06347v1

Probabilistic graphical models compactly represent joint distributions by decomposing them into factors over subsets of random variables. In Bayesian networks, the factors are conditional probability distributions. For many problems, common information exists among those factors. Adding similarity restrictions can be viewed as imposing prior knowledge for model regularization, and with proper restrictions, learned models usually generalize better. In this work, we study methods that exploit such high-level similarities to regularize the learning process, and we apply them to the task of modeling wave propagation in inhomogeneous media. We propose a novel distribution-based penalization approach that encourages similar conditional probability distributions rather than explicitly forcing the parameters to be similar. We show experimentally that our proposed algorithm solves this wave propagation modeling problem, which the baseline methods are unable to solve.
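
The penalization idea is easy to state: add a divergence between the conditional probability distributions (CPDs) that prior knowledge says should be alike, rather than tying their parameters. A sketch using symmetrized KL, which is an assumption; the paper's exact similarity measure is not given in the abstract.

    import numpy as np

    def kl(p, q, eps=1e-12):
        return np.sum(p * np.log((p + eps) / (q + eps)))

    def regularized_loss(nll, cpd_a, cpd_b, lambda_reg=0.1):
        sym_kl = 0.5 * (kl(cpd_a, cpd_b) + kl(cpd_b, cpd_a))
        return nll + lambda_reg * sym_kl            # NLL + similarity penalty

    loss = regularized_loss(nll=12.3,
                            cpd_a=np.array([0.7, 0.2, 0.1]),
                            cpd_b=np.array([0.6, 0.3, 0.1]))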

• [cs.LG]A Semi-Supervised and Inductive Embedding Model for Churn Prediction of Large-Scale Mobile Games
Xi Liu, Muhe Xie, Xidao Wen, Rui Chen, Yong Ge, Nick Duffield, Na Wang
http://arxiv.org/abs/1808.06573v1

Mobile gaming has emerged as a promising market with billion-dollar revenues. A variety of mobile game platforms and services have been developed around the world. One critical challenge for these platforms and services is to understand user churn behavior in mobile games. Successful churn prediction will benefit many stakeholders such as game developers and platform operators. In this paper, we present the first large-scale churn prediction solution for mobile games. In view of the common limitations of the state-of-the-art methods built upon traditional machine learning models, we devise a novel semi-supervised and inductive embedding model that jointly learns the prediction function and the embedding function for user-app relationships. We model these two functions by deep neural networks with a unique edge embedding technique that is able to capture both contextual information and relationship dynamics. We also design a novel attributed random walk technique that takes into consideration both topological adjacency and attributes similarities. To evaluate the performance of our solution, we collect the real-world data from a commercial mobile gaming platform that includes tens of thousands of games and hundreds of millions of user-app interactions. The experimental results with this data demonstrate the superiority of our proposed model against existing state-of-the-art methods.

• [cs.LG]Effect of secular trend in drug effectiveness study in real world data
Sharon Hensley Alford, Piyush Madan, Shilpa Mahatma, Italo Buleje, Yanyan Han, Fang Lu
http://arxiv.org/abs/1808.06117v1

We discovered secular trend bias in a drug effectiveness study for a recently approved drug. We compared treatment outcomes between patients who received the newly approved drug and patients exposed to the standard treatment. All patients diagnosed after the new drug's approval date were considered. We built a machine learning causal inference model to determine patient subpopulations likely to respond better to the newly approved drug. After identifying the presence of secular trend bias in our data, we attempted to adjust for the bias in two different ways. First, we matched patients on the number of days from the new drug's approval date that the patient's treatment (new or standard) began. Second, we included a covariate in the model for the number of days between the date of approval of the new drug and the treatment (new or standard) start date. Neither approach completely mitigated the bias. We attribute the residual bias to differences in patient disease severity or other unmeasured patient characteristics. Had we not identified the secular trend bias in our data, the causal inference model would have been interpreted without consideration for this underlying bias. Being aware of, testing for, and handling potential bias in the data is essential to diminish the uncertainty in AI modeling.

• [cs.LG]Exact Passive-Aggressive Algorithms for Learning to Rank Using Interval Labels
Naresh Manwani, Mohit Chandra
http://arxiv.org/abs/1808.06107v1

In this paper, we propose exact passive-aggressive (PA) online algorithms for learning to rank. The proposed algorithms can be used even when we have interval labels instead of exact labels for examples. They solve a convex optimization problem at every trial, and we find the exact solutions to these problems to determine the updated parameters. We propose a support class algorithm (SCA) that finds the active constraints using the KKT conditions of the optimization problems. These active constraints form a support set, which determines the set of thresholds that need to be updated. We derive update rules for PA, PA-I, and PA-II, and show that the proposed algorithms maintain the ordering of the thresholds after every trial. We provide mistake bounds for the proposed algorithms in both the ideal and general settings. We also show experimentally that the proposed algorithms successfully learn accurate classifiers using interval labels as well as exact labels, and that they compare well with other approaches.
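
The flavor of an exact PA update under interval supervision can be shown for a single linear score; the full algorithm additionally maintains ordered thresholds via the support class algorithm, which this sketch omits.

    import numpy as np

    def pa_interval_update(w, x, lo, hi, C=1.0):
        # No update if the score already falls inside the interval [lo, hi].
        s = w @ x
        if lo <= s <= hi:
            return w
        loss, direction = (lo - s, 1.0) if s < lo else (s - hi, -1.0)
        tau = min(C, loss / (x @ x))                # PA-I step size
        return w + tau * direction * x

    w = pa_interval_update(np.zeros(4),
                           np.array([1.0, 0.5, -0.2, 0.3]), lo=1.0, hi=2.0)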

• [cs.LG]Faster Support Vector Machines
Sebastian Schlag, Matthias Schmitt, Christian Schulz
http://arxiv.org/abs/1808.06394v1

The time complexity of support vector machines (SVMs) prohibits training on huge datasets with millions of samples. Recently, multilevel approaches to training SVMs have been developed to allow for time-efficient training on huge datasets. While regular SVMs perform the entire training in one (time-consuming) optimization step, multilevel SVMs first build a hierarchy of problems decreasing in size that resemble the original problem and then train an SVM model for each hierarchy level, benefiting from the solved models of previous levels. We present a faster multilevel support vector machine that uses a label propagation algorithm to construct the problem hierarchy. Extensive experiments show that our new algorithm achieves speed-ups of up to two orders of magnitude while having similar or better classification quality than state-of-the-art algorithms.

• [cs.LG]Fourier analysis perspective for sufficient dimension reduction problem
Rustem Takhanov
http://arxiv.org/abs/1808.06191v1

A theory of sufficient dimension reduction (SDR) is developed from an optimization perspective. In our formulation of the problem, instead of dealing with raw data, we assume that our ground truth includes a mapping {\mathbf f}: {\mathbb R}^n\rightarrow {\mathbb R}^m and a probability distribution function p over {\mathbb R}^n, both given analytically. We formulate SDR as a problem of finding a function {\mathbf g}: {\mathbb R}^k\rightarrow {\mathbb R}^m and a matrix P\in {\mathbb R}^{k\times n} such that {\mathbb E}_{{\mathbf x}\sim p({\mathbf x})} \left|{\mathbf f}({\mathbf x}) - {\mathbf g}(P{\mathbf x})\right|^2 is minimal. It turns out that the latter problem allows a reformulation in the dual space, i.e. instead of searching for {\mathbf g}(P{\mathbf x}) we suggest searching for its Fourier transform. First, we characterize all tempered distributions that can serve as the Fourier transform of such functions. The reformulation in the dual space can be interpreted as a problem of finding a k-dimensional linear subspace S and a tempered distribution {\mathbf t} supported in S such that {\mathbf t} is "close" in a certain sense to the Fourier transform of {\mathbf f}. Instead of optimizing over generalized functions with a k-dimensional support, we suggest minimizing over ordinary functions but with an additional term R that penalizes a strong distortion of the support from any k-dimensional linear subspace. For a specific case of R, we develop an algorithm that can be formulated for functions given in the initial form as well as for their Fourier transforms. Finally, we report results of numerical experiments with a discretized version of the latter algorithm.

• [cs.LG]Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies
Alessandro Achille, Tom Eccles, Loic Matthey, Christopher P. Burgess, Nick Watters, Alexander Lerchner, Irina Higgins
http://arxiv.org/abs/1808.06508v1

Intelligent behaviour in the real-world requires the ability to acquire new knowledge from an ongoing sequence of experiences while preserving and reusing past knowledge. We propose a novel algorithm for unsupervised representation learning from piece-wise stationary visual data: Variational Autoencoder with Shared Embeddings (VASE). Based on the Minimum Description Length principle, VASE automatically detects shifts in the data distribution and allocates spare representational capacity to new knowledge, while simultaneously protecting previously learnt representations from catastrophic forgetting. Our approach encourages the learnt representations to be disentangled, which imparts a number of desirable properties: VASE can deal sensibly with ambiguous inputs, it can enhance its own representations through imagination-based exploration, and most importantly, it exhibits semantically meaningful sharing of latents between different datasets. Compared to baselines with entangled representations, our approach is able to reason beyond surface-level statistics and perform semantically meaningful cross-domain inference.

• [cs.LG]Optimizing Deep Neural Network Architecture: A Tabu Search Based Approach
Tarun Kumar Gupta, Khalid Raza
http://arxiv.org/abs/1808.05979v1

The performance of a feedforward neural network (FNN) depends fully upon the selection of its architecture and training algorithm. An FNN architecture can be tweaked using several parameters, such as the number of hidden layers, the number of hidden neurons at each hidden layer, and the number of connections between layers. The exponentially many combinations of these architectural attributes may be unmanageable manually, so an algorithm is needed that can automatically design an optimal architecture with high generalization ability. Numerous optimization algorithms have been utilized for FNN architecture determination. This paper proposes a new methodology for estimating the hidden layers and their respective neurons for an FNN. This work combines the advantages of tabu search (TS) and the gradient descent with momentum backpropagation (GDM) training algorithm to demonstrate how tabu search can automatically select the best architecture from the populated architectures based on a minimum testing error criterion. The proposed approach has been tested on four classification benchmark datasets of different sizes.
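
A compact sketch of the search loop: candidate architectures are evaluated by test error, the best non-tabu candidate is accepted, and visited architectures enter the tabu list. scikit-learn's MLPClassifier (with its default solver) stands in for the paper's GDM-trained FNN, and the candidate grid is an assumption.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = load_iris(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

    def test_error(arch):
        clf = MLPClassifier(hidden_layer_sizes=arch, max_iter=1000,
                            random_state=0).fit(Xtr, ytr)
        return 1.0 - clf.score(Xte, yte)

    # Candidates: 1-2 hidden layers with 4/8/16 neurons per layer.
    candidates = [(n,) * layers for layers in (1, 2) for n in (4, 8, 16)]
    best, best_err, tabu = None, 1.0, set()
    for _ in range(4):                              # illustrative iterations
        moves = {a: test_error(a) for a in candidates if a not in tabu}
        if not moves:
            break
        cand = min(moves, key=moves.get)            # best non-tabu neighbor
        tabu.add(cand)                              # forbid revisiting
        if moves[cand] < best_err:
            best, best_err = cand, moves[cand]
    print(best, best_err)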

• [cs.LG]PAC-learning is Undecidable
Sairaam Venkatraman, S Balasubramanian, R Raghunatha Sarma
http://arxiv.org/abs/1808.06324v1

The problem of attempting to learn the mapping between data and labels is the crux of any machine learning task. It is, therefore, of interest to the machine learning community, on both practical and theoretical grounds, to consider the existence of a test or criterion for deciding the feasibility of attempting to learn. We investigate the existence of such a criterion in the setting of PAC-learning, basing feasibility solely on whether the mapping to be learnt lends itself to approximation by a given class of hypothesis functions. We show that no such criterion exists, exposing a fundamental limitation in the decidability of learning. In other words, we prove that testing for PAC-learnability is undecidable in the Turing sense. We also briefly discuss some of the probable implications of this result for the current practice of machine learning.

• [cs.LG]Reproducible evaluation of classification methods in Alzheimer's disease: framework and application to MRI and PET data
Jorge Samper-González, Ninon Burgos, Simona Bottani, Sabrina Fontanella, Pascal Lu, Arnaud Marcoux, Alexandre Routier, Jérémy Guillon, Michael Bacci, Junhao Wen, Anne Bertrand, Hugo Bertin, Marie-Odile Habert, Stanley Durrleman, Theodoros Evgeniou, Olivier Colliot, for the Alzheimer's Disease Neuroimaging Initiative, the Australian Imaging Biomarkers, Lifestyle flagship study of ageing
http://arxiv.org/abs/1808.06452v1

A large number of papers have introduced novel machine learning and feature extraction methods for automatic classification of AD. However, they are difficult to reproduce because key components of the validation are often not readily available. These components include the selected participants and input data, image preprocessing and cross-validation procedures. The performance of the different approaches is also difficult to compare objectively. In particular, it is often difficult to assess which part of a method provides a real improvement, if any. We propose a framework for reproducible and objective classification experiments in AD using three publicly available datasets (ADNI, AIBL and OASIS). The framework comprises: i) automatic conversion of the three datasets into BIDS format, ii) a modular set of preprocessing pipelines, feature extraction and classification methods, together with an evaluation framework, that provides a baseline for benchmarking the different components. We demonstrate the use of the framework for a large-scale evaluation on 1960 participants using T1 MRI and FDG PET data. In this evaluation, we assess the influence of different modalities, preprocessing, feature types, classifiers, training set sizes and datasets. Performance was in line with the state of the art. FDG PET outperformed T1 MRI for all classification tasks. No difference in performance was found for the use of different atlases, image smoothing, partial volume correction of FDG PET images, or feature type. Linear SVM and L2-logistic regression resulted in similar performance and both outperformed random forests. The classification performance increased along with the number of subjects used for training. Classifiers trained on ADNI generalized well to AIBL and OASIS. All the code of the framework and the experiments is publicly available at: https://gitlab.icm-institute.org/aramislab/AD-ML.

• [cs.LG]Synthetic Patient Generation: A Deep Learning Approach Using Variational Autoencoders
Ally Salim Jr
http://arxiv.org/abs/1808.06444v1

Artificial intelligence in healthcare is a new and exciting frontier, and the possibilities are endless. With deep learning approaches beating human performance in many areas, the logical next step is to attempt their application in the health space. For these and other machine learning approaches to produce good results and realize their potential, large amounts of accurate data are of paramount importance. This is a challenge faced by many industries, and more so in the healthcare space. We present an approach that uses Variational Autoencoders (VAEs) to generate more data for training deeper networks, as well as to uncover underlying patterns in diagnoses and the patients suffering from them. By training a VAE on the available data, the model learns the latent distribution of the patient features given the diagnosis. After training, it is then possible to sample from the learnt latent distribution to generate new, accurate patient records for a given diagnosis.
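
A minimal sketch of the idea in PyTorch, assuming tabular patient features and a one-hot diagnosis vector; the abstract does not specify the architecture, so the layer sizes and the conditioning scheme here are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    """Conditional VAE: encode patient features x together with a one-hot
    diagnosis d, so sampling z ~ N(0, I) for a given d yields new records."""
    def __init__(self, n_feat, n_diag, n_latent=8, n_hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_feat + n_diag, n_hidden), nn.ReLU())
        self.mu = nn.Linear(n_hidden, n_latent)
        self.logvar = nn.Linear(n_hidden, n_latent)
        self.dec = nn.Sequential(
            nn.Linear(n_latent + n_diag, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_feat))

    def forward(self, x, d):
        h = self.enc(torch.cat([x, d], dim=1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.dec(torch.cat([z, d], dim=1)), mu, logvar

def vae_loss(x_hat, x, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="sum")                 # reconstruction
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL to N(0, I)
    return recon + kl

# After training, synthesize records for a chosen diagnosis d (one-hot row):
# z = torch.randn(100, 8)
# synthetic = model.dec(torch.cat([z, d.repeat(100, 1)], dim=1))
```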

• [cs.LG]TLR: Transfer Latent Representation for Unsupervised Domain Adaptation
Pan Xiao, Bo Du, Jia Wu, Lefei Zhang, Ruimin Hu, Xuelong Li
http://arxiv.org/abs/1808.06206v1

Domain adaptation refers to the process of learning prediction models in a target domain by making use of data from a source domain. Many classic methods solve the domain adaptation problem by establishing a common latent space, which may cause the loss of many important properties across both domains. In this manuscript, we develop a novel method, transfer latent representation (TLR), to learn a better latent space. Specifically, we design an objective function based on a simple linear autoencoder to derive the latent representations of both domains. The encoder in the autoencoder aims to project the data of both domains into a robust latent space. In addition, the decoder imposes a constraint to reconstruct the original data, which preserves the common properties of both domains and reduces the noise that causes domain shift. Experiments on cross-domain tasks demonstrate the advantages of TLR over competing methods.
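
To make the core construction concrete, here is a minimal NumPy sketch of a simple linear autoencoder trained by gradient descent on the pooled source and target data; the full TLR objective adds further alignment terms that are omitted here:

```python
import numpy as np

def linear_autoencoder(Xs, Xt, k=10, lr=1e-3, epochs=500):
    """Learn a shared k-dimensional latent space for source Xs and target Xt
    (each n_samples x d) with a linear encoder E and decoder D by minimising
    the reconstruction error ||X E D - X||^2 over the pooled data."""
    X = np.vstack([Xs, Xt])
    d = X.shape[1]
    rng = np.random.default_rng(0)
    E = rng.normal(scale=0.01, size=(d, k))   # encoder
    D = rng.normal(scale=0.01, size=(k, d))   # decoder
    for _ in range(epochs):
        Z = X @ E                 # latent representations of both domains
        R = Z @ D - X             # reconstruction residual
        E -= lr * (X.T @ (R @ D.T)) / len(X)  # gradient steps on the
        D -= lr * (Z.T @ R) / len(X)          # squared reconstruction error
    return E, D
```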

• [cs.LG]Tangent-Normal Adversarial Regularization for Semi-supervised Learning
Bing Yu, Jingfeng Wu, Zhanxing Zhu
http://arxiv.org/abs/1808.06088v1

The ever-increasing size of modern datasets, combined with the difficulty of obtaining label information, has made semi-supervised learning of significant practical importance in modern machine learning applications. Compared with supervised learning, the key difficulty in semi-supervised learning is how to make full use of the unlabeled data. In order to utilize the manifold information provided by unlabeled data, we propose a novel regularization called tangent-normal adversarial regularization, which is composed of two parts. The two terms complement each other and jointly enforce smoothness along two different directions that are crucial for semi-supervised learning. One is applied along the tangent space of the data manifold, aiming to enforce local invariance of the classifier on the manifold, while the other is performed on the normal space orthogonal to the tangent space, intending to impose robustness on the classifier against the noise that causes the observed data to deviate from the underlying data manifold. Both regularizers are achieved by the strategy of virtual adversarial training. Our method achieves state-of-the-art performance on semi-supervised learning tasks on both an artificial dataset and the FashionMNIST dataset.

• [cs.LG]Triangle Lasso for Simultaneous Clustering and Optimization in Graph Datasets
Yawei Zhao, Kai Xu, Xinwang Liu, En Zhu, Xinzhong Zhu, Jianping Yin
http://arxiv.org/abs/1808.06556v1

Recently, the network lasso has drawn much attention due to its remarkable performance on simultaneous clustering and optimization. However, it usually suffers from imperfect data (noise, missing values, etc.) and yields sub-optimal solutions. The reason is that it finds similar instances according to their features directly, which is usually impacted by imperfect data, and thus returns sub-optimal results. In this paper, we propose the triangle lasso to avoid this disadvantage. The triangle lasso finds similar instances according to their neighbours: if two instances have many common neighbours, they tend to become similar. Even when some instances are profiled with imperfect data, it is still able to find their similar counterparts. Furthermore, we develop an efficient algorithm based on the Alternating Direction Method of Multipliers (ADMM) to obtain a moderately accurate solution. In addition, we present a dual method to obtain the accurate solution with low additional time consumption. We demonstrate through extensive numerical experiments that the triangle lasso is robust to imperfect data. It usually yields better performance than the state-of-the-art method when performing data analysis tasks in practical scenarios.

• [cs.NE]Progressive Operational Perceptron with Memory
Dat Thanh Tran, Serkan Kiranyaz, Moncef Gabbouj, Alexandros Iosifidis
http://arxiv.org/abs/1808.06377v1

The Generalized Operational Perceptron (GOP) was proposed to generalize the linear neuron model of the traditional Multilayer Perceptron (MLP); it can mimic the synaptic connections of biological neurons that have nonlinear neurochemical behaviours. The Progressive Operational Perceptron (POP) is a multilayer network composed of GOPs and formed progressively, layer by layer. In this work, we propose major modifications that can accelerate as well as augment the progressive learning procedure of POP by incorporating an information-preserving, linear projection path from the input to the output layer at each progressive step. The proposed extensions can be interpreted as a mechanism that provides the network with direct information extracted from the previously learned layers, hence the term "memory". This allows the network to learn deeper architectures with better data representations. An extensive set of experiments shows that the proposed modifications can surpass the learning capability of the original POPs and other related algorithms.

• [cs.NI]Energy Efficiency of Server-Centric PON Data Center Architecture for Fog Computing
Sanaa Hamid Mohamed, Taisir E. H. El-Gorashi, Jaafar M. H. Elmirghani
http://arxiv.org/abs/1808.06113v1

In this paper, we utilize Mixed Integer Linear Programming (MILP) models to compare the energy efficiency and performance of a server-centric Passive Optical Network (PON)-based data center design with different data center networking topologies for use in fog computing. For representative MapReduce workloads, completion time results indicate that the server-centric PON-based design achieves a 67% reduction in energy consumption compared to DCell, with equivalent performance.

• [cs.NI]Impact of Link Failures on the Performance of MapReduce in Data Center Networks
Sanaa Hamid Mohamed, Taisir E. H. El-Gorashi, Jaafar M. H. Elmirghani
http://arxiv.org/abs/1808.06115v1

In this paper, we utilize Mixed Integer Linear Programming (MILP) models to determine the impact of link failures on the performance of shuffling operations in MapReduce when different data center network (DCN) topologies are used. For a set of non-fatal single- and multi-link failures, the results indicate that different DCNs experience completion time degradations ranging between 5% and 40%. The best performance under link failures is achieved by a server-centric PON-based DCN.

• [cs.NI]Towards Fine Grained Network Flow Prediction
Patrick Jahnke, Emmanuel Stapf, Jonas Mieseler, Gerhard Neumann, Patrick Eugster
http://arxiv.org/abs/1808.06453v1

One main challenge for the design of networks is that traffic load is not generally known in advance. This makes it hard to allocate resources so as to best prevent or mitigate bottlenecks. While several authors have shown how to predict traffic in a coarse-grained manner by aggregating flows, fine-grained prediction of traffic at the level of individual flows, including bursty traffic, is widely considered to be impossible. This paper presents, to the best of our knowledge, the first approach to fine-grained per-flow traffic prediction. In short, we introduce the Frequency-based Kernel Kalman Filter (FKKF), which predicts individual flows' behavior based on measurements. Our FKKF relies on the well-known Kalman filter in combination with a kernel to support the prediction of non-linear functions. Furthermore, we change the operating space from time to frequency space. In this space, into which we transform the input data via a Short-Time Fourier Transform (STFT), the peak structures of flows can be predicted after gleaning their key characteristics, with a Principal Component Analysis (PCA), from past and ongoing flows that stem from the same socket-to-socket connection. We demonstrate the effectiveness of our approach on popular benchmark traces from a university data center. Our approach predicts traffic across 17 out of 20 groups of flows with an average prediction error of 6.43%, around 0.49 seconds (on average) in advance, whilst existing coarse-grained approaches exhibit prediction errors of 77% at best.
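
The frequency-space feature extraction step can be sketched with standard tools; the synthetic signal, sampling rate and window length below are made-up stand-ins for a real per-flow throughput series:

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import PCA

# Toy stand-in for a per-flow throughput time series (values are made up).
rng = np.random.default_rng(0)
flow = np.abs(rng.normal(size=4096)) + np.sin(np.linspace(0, 60, 4096))

# 1. Move from time space to frequency space with a Short-Time Fourier
#    Transform; each column of |Z| is the spectrum of one short window.
freqs, times, Z = stft(flow, fs=100.0, nperseg=256)
spectra = np.abs(Z).T                      # (n_windows, n_freq_bins)

# 2. Glean key characteristics of the peak structure with PCA, keeping
#    only the leading components as a compact description of the flow.
pca = PCA(n_components=5).fit(spectra)
features = pca.transform(spectra)          # per-window frequency features

# These features would then feed the kernel Kalman filter's state model
# (the FKKF itself is beyond this sketch).
```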

• [cs.RO]Proprioceptive Sonomyographic Control: A novel method of intuitive proportional control of multiple degrees of freedom for upper-extremity amputees
Ananya S. Dhawan, Biswarup Mukherjee, Shriniwas Patwardhan, Nima Akhlaghi, Gyorgy Levay, Rahsaan Holley, Wilsaan Joiner, Michelle Harris-Love, Siddhartha Sikdar
http://arxiv.org/abs/1808.06543v1

Technological advances in multi-articulated prosthetic hands have outpaced the methods available to amputees to intuitively control these devices. Amputees often cite difficulty of use as a key contributing factor for abandoning their prosthesis, creating a pressing need for improved control technology. A major challenge of traditional myoelectric control strategies using surface electromyography electrodes has been the difficulty in achieving intuitive and robust proportional control of multiple degrees of freedom. In this paper, we describe a new control method, proprioceptive sonomyographic control, that overcomes several limitations of myoelectric control. In sonomyography, the mechanical deformation of muscle is sensed using ultrasound, rather than its electrical activation, and therefore the resulting control signals can directly control the position of the end effector. Compared to myoelectric control, which controls the velocity of the end-effector device, sonomyographic control is more congruent with proprioception in the residual limb. We tested our approach with 5 upper-extremity amputees and able-bodied subjects using a virtual target achievement and holding task. Amputee and able-bodied participants demonstrated the ability to achieve positional control for 5 degrees of freedom within an hour of training. Our results demonstrate the potential of proprioceptive sonomyographic control for intuitive dexterous control of multiarticulated prostheses.

• [cs.RO]What Stands-in for a Missing Tool? A Prototypical Grounded Knowledge-based Approach to Tool Substitution
Madhura Thosar, Christian A. Mueller, Sebastian Zug
http://arxiv.org/abs/1808.06423v1

When a robot is operating in a dynamic environment, it cannot be assumed that a tool required to solve a given task will always be available. In case of a missing tool, an ideal response would be to find a substitute to complete the task. In this paper, we present a proof of concept of a grounded knowledge-based approach to tool substitution. In order to validate the suitability of a substitute, we conducted experiments involving 22 substitution scenarios. The substitutes computed by the proposed approach were validated on the basis of experts' choices for each scenario. Our evaluation showed that in 63% of the scenarios, the approach identified exactly the same substitutes as the experts.

• [cs.SE]Learning-based Automatic Parameter Tuning for Big Data Analytics Frameworks
Liang Bao, Xin Liu, Weizhao Chen
http://arxiv.org/abs/1808.06008v1

Big data analytics frameworks (BDAFs) have been widely used for data processing applications. These frameworks provide a large number of configuration parameters to users, which leads to a tuning problem that overwhelms users. To address this issue, many automatic tuning approaches have been proposed. However, it remains a critical challenge to generate enough samples in a high-dimensional parameter space within a time constraint. In this paper, we present AutoTune--an automatic parameter tuning system that aims to optimize application execution time on BDAFs. AutoTune first constructs a smaller-scale testbed from the production system so that it can generate more samples, and thus train a better prediction model, under a given time constraint. Furthermore, the AutoTune algorithm produces a set of samples that provide wide coverage over the high-dimensional parameter space, and searches for more promising configurations using the trained prediction model. AutoTune is implemented and evaluated using the Spark framework and HiBench benchmark deployed on a public cloud. Extensive experimental results illustrate that AutoTune improves on default configurations by 63.70% on average, and on five state-of-the-art tuning algorithms by 6%-23%.
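
AutoTune's exact sampling strategy is not specified in the abstract; a Latin hypercube is one standard way to get wide coverage of a high-dimensional parameter space, sketched here with hypothetical Spark parameters and ranges:

```python
from scipy.stats import qmc

# Hypothetical Spark parameters and ranges (not AutoTune's actual list).
bounds = {
    "spark.executor.memory_gb":     (1, 16),
    "spark.executor.cores":         (1, 8),
    "spark.sql.shuffle.partitions": (8, 512),
}

sampler = qmc.LatinHypercube(d=len(bounds), seed=0)
unit = sampler.random(n=20)              # 20 samples in the unit hypercube
lows = [lo for lo, hi in bounds.values()]
highs = [hi for lo, hi in bounds.values()]
configs = qmc.scale(unit, lows, highs)   # stretch to the real ranges

for row in configs[:3]:
    print(dict(zip(bounds, row.round(1))))
```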

• [cs.SE]Towards Anticipation of Architectural Smells using Link Prediction Techniques
J. Andrés Díaz-Pace, Antonela Tommasel, Daniela Godoy
http://arxiv.org/abs/1808.06362v1

Software systems naturally evolve, and this evolution often brings design problems that cause system degradation. Architectural smells are typical symptoms of such problems, and several of these smells are related to undesired dependencies among modules. The early detection of these smells is important for developers, because they can plan ahead for maintenance or refactoring efforts, thus preventing system degradation. Existing tools for identifying architectural smells can detect the smells once they exist in the source code. This means that their undesired dependencies are already created. In this work, we explore a forward-looking approach that is able to infer groups of likely module dependencies that can anticipate architectural smells in a future system version. Our approach considers the current module structure as a network, along with information from previous versions, and applies link prediction techniques (from the field of social network analysis). In particular, we focus on dependency-related smells, such as Cyclic Dependency and Hublike Dependency, which fit well with the link prediction model. An initial evaluation with two open-source projects shows that, under certain considerations, the predictions of our approach are satisfactory. Furthermore, the approach can be extended to other types of dependency-based smells or metrics.
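
Concretely, off-the-shelf link prediction scores from social network analysis can be computed on a module-dependency graph; the toy modules below are hypothetical, and Adamic-Adar is just one of several scores this setting admits:

```python
import networkx as nx

# Toy module-dependency graph; vertices are modules, edges are dependencies
# (module names are hypothetical, not from the paper's dataset).
G = nx.Graph()
G.add_edges_from([("ui", "core"), ("ui", "io"), ("core", "io"),
                  ("core", "db"), ("io", "db"), ("db", "log")])

# Score non-adjacent module pairs; high scores flag likely future
# dependencies that could close a cycle or grow a hub.
for u, v, score in sorted(nx.adamic_adar_index(G), key=lambda t: -t[2]):
    print(f"{u} -- {v}: {score:.2f}")
```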

• [cs.SI]An incremental local-first community detection method for dynamic graphs
Hiroki Kanezashi, Toyotaro Suzumura
http://arxiv.org/abs/1808.06251v1

Community detection for large-scale real-world networks has become increasingly popular in social analytics. In particular, analyses of dynamically growing networks are important for finding long-term trends and detecting anomalies. Analyzing such networks usually requires obtaining many snapshots and applying the same analytic methods to each of them. However, it is inefficient to extract communities from each whole, newly generated network that differs only slightly from the previous one, and doing so makes it impossible to follow network growth in real time. We propose an incremental community detection algorithm for high-volume graph streams. It builds on a well-known batch-oriented algorithm named DEMON [1]. We evaluated the performance and precision of our proposed incremental algorithm on real-world big networks with up to 410,236 vertices and 2,439,437 edges; communities were detected incrementally in less than one second, up to 107 times faster than the original algorithm, without sacrificing accuracy.

• [cs.SI]Community detection in networks with unobserved edges
Till Hoffmann, Leto Peel, Renaud Lambiotte, Nick S. Jones
http://arxiv.org/abs/1808.06079v1

We develop a Bayesian hierarchical model to identify communities in networks for which we do not observe the edges directly, but instead observe a series of interdependent signals for each of the nodes. Fitting the model provides an end-to-end community detection algorithm that does not extract information as a sequence of point estimates but propagates uncertainties from the raw data to the community labels. Our approach naturally supports multiscale community detection as well as the selection of an optimal scale using model comparison. We study the properties of the algorithm using synthetic data and apply it to daily returns of constituents of the S&P100 index as well as climate data from US cities.

• [cs.SI]Detecting Core-Periphery Structure in Spatial Networks
Junteng Jia, Austin R. Benson
http://arxiv.org/abs/1808.06544v1

The core-periphery structure, which decomposes a network into a densely connected core and a sparsely connected periphery, emerges frequently in spatial networks such as traffic, biological and social networks. In this paper, we propose a random network model for spatial networks with core-periphery structure, inspired by the Kleinberg small-world model. In this model, we use a vertex core score to indicate the "coreness" of each vertex, and we connect each pair of vertices with a probability parameterized by their distance and core scores. We compute the optimal vertex core scores in a network by fitting it to our model using maximum likelihood estimation. Results on real-world networks indicate that the fitted vertex core scores are informative machine learning features for vertex metadata prediction and network classification. Furthermore, we develop near linear-time algorithms for network generation and model inference using the fast multipole method, which allows us to scale to networks with millions of vertices with minor tradeoffs in accuracy.
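
The abstract does not give the exact connection probability, so the generator below uses one plausible parameterization (a logistic function of the two core scores minus a scaled distance) purely for illustration:

```python
import numpy as np

def spatial_core_periphery(n=200, seed=0):
    """Sample a spatial network whose edge probability grows with the two
    endpoints' core scores and shrinks with their distance. The logistic
    form below is an illustrative choice, not necessarily the paper's."""
    rng = np.random.default_rng(seed)
    xy = rng.uniform(size=(n, 2))            # vertex locations
    theta = rng.normal(-1.0, 1.0, size=n)    # vertex core scores
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(xy[i] - xy[j])
            p = 1.0 / (1.0 + np.exp(-(theta[i] + theta[j] - 3.0 * dist)))
            adj[i, j] = adj[j, i] = rng.random() < p
    return xy, theta, adj
```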

• [cs.SI]Ensemble-based Overlapping Community Detection using Disjoint Community Structures
Tanmoy Chakraborty, Saptarshi Ghosh, Noseong Park
http://arxiv.org/abs/1808.06200v1

While there has been a plethora of approaches for detecting disjoint communities in real-world complex networks, some methods for detecting overlapping community structures have also been proposed recently. In this work, we argue that, instead of developing separate approaches for detecting overlapping communities, a promising alternative is to infer the overlapping communities from multiple disjoint community structures. We propose an ensemble-based approach, called EnCoD, that leverages the solutions produced by various disjoint community detection algorithms to discover the overlapping community structure. Specifically, EnCoD generates a feature vector for each vertex from the results of the base algorithms and learns, in an unsupervised way, which features lead to densely connected overlapping regions. It keeps iterating until the likelihood of each vertex belonging to its own community is maximized. Experiments on both synthetic and several real-world networks (with known ground-truth community structures) reveal that EnCoD significantly outperforms nine state-of-the-art overlapping community detection algorithms. Finally, we show that EnCoD is generic enough to be applied to networks in which the vertices are associated with explicit semantic features. To the best of our knowledge, EnCoD is the second ensemble-based overlapping community detection approach, after MEDOC [1].

• [cs.SI]Multi-dimensional Graph Convolutional Networks
Yao Ma, Suhang Wang, Charu C. Aggarwal, Dawei Yin, Jiliang Tang
http://arxiv.org/abs/1808.06099v1

Convolutional neural networks (CNNs) have shown great power for representation learning on regular grid data such as images and video. Recently, increasing attention has been paid to generalizing CNNs to graph or network data, which is highly irregular. Some approaches focus on graph-level representation learning, while others aim to learn node-level representations. These methods have been shown to boost the performance of many graph-level tasks, such as graph classification, and node-level tasks, such as node classification. Most of these methods have been designed for single-dimensional graphs, where a pair of nodes can only be connected by one type of relation. However, many real-world graphs have multiple types of relations, and they can be naturally modeled as multi-dimensional graphs, with each type of relation as a dimension. Multi-dimensional graphs bring about richer interactions between dimensions, which poses tremendous challenges to the graph convolutional neural networks designed for single-dimensional graphs. In this paper, we study the problem of graph convolutional networks for multi-dimensional graphs and propose a multi-dimensional convolutional neural network model, mGCN, that aims to capture rich information when learning node-level representations for multi-dimensional graphs. Comprehensive experiments on real-world multi-dimensional graphs demonstrate the effectiveness of the proposed framework.
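
For readers unfamiliar with the single-dimensional building block that mGCN generalizes, here is the standard graph-convolution propagation rule (Kipf-Welling style) in NumPy; the toy graph and feature sizes are arbitrary:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer:
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    H_new = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W
    return np.maximum(H_new, 0.0)               # ReLU

# Toy graph with 4 nodes, 3 input features, 2 output features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = np.random.default_rng(0).normal(size=(4, 3))
W = np.random.default_rng(1).normal(size=(3, 2))
print(gcn_layer(A, H, W))
```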

• [cs.SI]Signed Graph Convolutional Network
Tyler Derr, Yao Ma, Jiliang Tang
http://arxiv.org/abs/1808.06354v1

Because much of today's data can be represented as graphs, there has been a demand for generalizing neural network models to graph data. One recent direction that has shown fruitful results, and therefore growing interest, is the use of graph convolutional neural networks (GCNs). They have been shown to provide a significant improvement on a wide range of tasks in network analysis, one of which is node representation learning. The task of learning low-dimensional node representations has been shown to increase performance on a plethora of other tasks, from link prediction and node classification to community detection and visualization. Simultaneously, signed networks (graphs having both positive and negative links) have become ubiquitous with the growing popularity of social media. However, since previous GCN models have primarily focused on unsigned networks (graphs consisting of only positive links), it is unclear how they could be applied to signed networks, due to the challenges presented by negative links. The primary challenges are that negative links not only have a different semantic meaning than positive links, but their principles are inherently different and they form complex relations with positive links. We therefore propose a dedicated and principled effort that utilizes balance theory to correctly aggregate and propagate information across layers of a signed GCN model. We perform empirical experiments comparing our proposed signed GCN against state-of-the-art baselines for learning node representations in signed networks. More specifically, our experiments are performed on four real-world datasets for the classical link sign prediction problem, which is commonly used as the benchmark for signed network embedding algorithms.

• [cs.SY]Optimized Path Planning for Inspection by Unmanned Aerial Vehicles Swarm with Energy Constraints
Momena Monwar, Omid Semiari, Walid Saad
http://arxiv.org/abs/1808.06018v1

Autonomous inspection of large geographical areas is a central requirement for efficient hazard detection and disaster management in future cyber-physical systems such as smart cities. In this regard, exploiting unmanned aerial vehicle (UAV) swarms is a promising solution to inspect vast areas efficiently and with low cost. In fact, UAVs can easily fly and reach inspection points, record surveillance data, and send this information to a wireless base station (BS). Nonetheless, in many cases, such as operations at remote areas, the UAVs cannot be guided directly by the BS in real-time to find their path. Moreover, another key challenge of inspection by UAVs is the limited battery capacity. Thus, realizing the vision of autonomous inspection via UAVs requires energy-efficient path planning that takes into account the energy constraint of each individual UAV. In this paper, a novel path planning algorithm is proposed for performing energy-efficient inspection, under stringent energy availability constraints for each UAV. The developed framework takes into account all aspects of energy consumption for a UAV swarm during the inspection operations, including energy required for flying, hovering, and data transmission. It is shown that the proposed algorithm can address the path planning problem efficiently in polynomial time. Simulation results show that the proposed algorithm can yield substantial performance gains in terms of minimizing the overall inspection time and energy. Moreover, the results provide guidelines to determine parameters such as the number of required UAVs and amount of energy, while designing an autonomous inspection system.

• [eess.AS]Multimodal speech synthesis architecture for unsupervised speaker adaptation
Hieu-Thi Luong, Junichi Yamagishi
http://arxiv.org/abs/1808.06288v1

This paper proposes a new architecture for speaker adaptation of multi-speaker neural-network speech synthesis systems, in which an unseen speaker's voice can be built using a relatively small amount of speech data without transcriptions. This is sometimes called "unsupervised speaker adaptation". More specifically, we concatenate the layers to the audio inputs when performing unsupervised speaker adaptation, while we concatenate them to the text inputs when synthesizing speech from text. Two new training schemes for the new architecture are also proposed in this paper. These training schemes are not limited to speech synthesis; other applications are suggested. Experimental results show that the proposed model not only enables adaptation to unseen speakers using untranscribed speech, but also improves the performance of multi-speaker modeling and speaker adaptation using transcribed audio files.

• [eess.SP]On Geometric Analysis of Affine Sparse Subspace Clustering
Chun-Guang Li, Chong You, René Vidal
http://arxiv.org/abs/1808.05965v1

Sparse subspace clustering (SSC) is a state-of-the-art method for segmenting a set of data points drawn from a union of subspaces into their respective subspaces. It is now well understood that SSC produces subspace-preserving data affinity under broad geometric conditions but suffers from a connectivity issue. In this paper, we develop a novel geometric analysis for a variant of SSC, named affine SSC (ASSC), for the problem of clustering data from a union of affine subspaces. Our contributions include a new concept called affine independence for capturing the arrangement of a collection of affine subspaces. Under the affine independence assumption, we show that ASSC is guaranteed to produce subspace-preserving affinity. Moreover, inspired by the phenomenon that the ℓ_1 regularization no longer induces sparsity when the solution is nonnegative, we further show that subspace-preserving recovery can be achieved under much weaker conditions for all data points other than the extreme points of samples from each subspace. In addition, we confirm a curious observation that the affinity produced by ASSC may be subspace-dense, which could guarantee the subspace-preserving affinity of ASSC to produce correct clustering under rather weak conditions. We validate the theoretical findings on carefully designed synthetic data and evaluate the performance of ASSC on several real data sets.

• [math.CO]Reed-Solomon codes over small fields with constrained generator matrices
Gary Greaves, Jeven Syatriadi
http://arxiv.org/abs/1808.06306v1

We give constructive proofs of the existence of [n,k] Reed-Solomon codes over finite fields of size at least n and n+1 whose generator matrices have constrained support. Furthermore, we consider a generalisation of the GM-MDS conjecture proposed by Lovett in 2018. We show that Lovett's conjecture is false in general and we specify when the conjecture is true.

• [math.OC]Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions
Zaiyi Chen, Tianbao Yang, Jinfeng Yi, Bowen Zhou, Enhong Chen
http://arxiv.org/abs/1808.06296v1

Although the stochastic gradient descent (SGD) method and its variants (e.g., stochastic momentum methods, AdaGrad) are the algorithms of choice for solving non-convex problems (especially deep learning), there still remain big gaps between theory and practice, with many questions unresolved. For example, there is still a lack of convergence theory for SGD that uses a stagewise step size and returns an averaged solution. In addition, theoretical insight into why the adaptive step size of AdaGrad could improve on the non-adaptive step size of SGD is still missing for non-convex optimization. This paper aims to address these questions and fill the gap between theory and practice. We propose a universal stagewise optimization framework for a broad family of non-smooth non-convex problems with the following key features: (i) each stage calls a basic algorithm (e.g., SGD or AdaGrad) for a regularized convex problem that returns an averaged solution; (ii) the step size is decreased in a stagewise manner; (iii) an averaged solution is returned as the final solution, selected from all stagewise averaged solutions with sampling probabilities increasing with the stage number. Our theoretical results for stagewise AdaGrad exhibit its adaptive convergence, and therefore shed insight on its faster convergence for problems with sparse stochastic gradients compared with stagewise SGD. To the best of our knowledge, these new results are the first of their kind for addressing the unresolved issues of the existing theories mentioned earlier.
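
A schematic of features (i)-(iii) on a toy problem; the quadratic regularizer, the step-halving schedule and the sampling probabilities below are illustrative choices, not the paper's exact prescriptions:

```python
import numpy as np

def stagewise_sgd(grad, x0, stages=5, T=200, eta0=0.1, gamma=1.0, seed=0):
    """Stagewise SGD sketch: each stage runs SGD on the regularized problem
    f(x) + (1/(2*gamma))||x - x_ref||^2 with a fixed step size, returns the
    averaged iterate, then halves the step size."""
    rng = np.random.default_rng(seed)
    x_ref, eta = np.asarray(x0, float), eta0
    stage_avgs = []
    for s in range(stages):
        x, x_sum = x_ref.copy(), np.zeros_like(x_ref)
        for _ in range(T):
            g = grad(x, rng) + (x - x_ref) / gamma   # stochastic grad + reg
            x -= eta * g
            x_sum += x
        x_ref = x_sum / T                            # stagewise averaged solution
        stage_avgs.append(x_ref)
        eta /= 2.0                                   # stagewise step decay
    # Return a stage's average, sampled with probability increasing in the stage.
    probs = np.arange(1, stages + 1, dtype=float)
    probs /= probs.sum()
    return stage_avgs[rng.choice(stages, p=probs)]

# Toy noisy quadratic: f(x) = 0.5 * ||x||^2 with gradient noise.
grad = lambda x, rng: x + 0.1 * rng.normal(size=x.shape)
print(stagewise_sgd(grad, np.ones(3)))
```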

• [math.PR]A General Framework of Multi-Armed Bandit Processes by Switching Restrictions
Wenqing Bao, Xiaoqiang Cai, Xianyi Wu
http://arxiv.org/abs/1808.06314v1

This paper proposes a general framework for multi-armed bandit (MAB) processes by introducing a type of restriction on the switches among arms for arms evolving in continuous time. The Gittins index process is developed for any single arm subject to the restrictions on stopping times, and the optimality of the corresponding Gittins index rule is then established. The Gittins indices defined in this paper are consistent with those for MAB processes in continuous time, in discrete time, and in the semi-Markovian setting, so that the new theory covers the classical models as special cases and also applies to many other situations that have not yet been touched in the literature. While the proof of the optimality of Gittins index policies benefits from ideas in the existing theory of MAB processes in continuous time, new techniques are introduced that drastically simplify the proof.

• [math.ST]Generalized Bregman and Jensen divergences which include some f-divergences
Tomohiro Nishiyama
http://arxiv.org/abs/1808.06148v1

In this paper, we introduce new classes of divergences by extending the definitions of the Bregman divergence and the skew Jensen divergence. These new divergence classes (the g-Bregman divergence and the skew g-Jensen divergence) satisfy properties similar to those of the Bregman or skew Jensen divergence. We show that these g-divergences include divergences belonging to the class of f-divergences (the Hellinger distance, the chi-square divergence and the alpha-divergence, in addition to the Kullback-Leibler divergence). Moreover, we derive an inequality between the skew g-Jensen divergence and the g-Bregman divergence and show that this inequality is a generalization of Lin's inequality.
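
For reference, the two classical divergences being extended are, for a differentiable convex generator F, defined as follows (standard textbook definitions; the paper's g-variants modify these, and the exact construction is in the paper):

```latex
\[
  D_F(x, y) = F(x) - F(y) - \langle \nabla F(y),\, x - y \rangle ,
\]
\[
  J_F^{\alpha}(x, y) = \alpha F(x) + (1 - \alpha) F(y)
                       - F\bigl(\alpha x + (1 - \alpha) y\bigr),
  \qquad 0 < \alpha < 1 .
\]
```

With F(p) = sum_i p_i log p_i on the probability simplex, D_F recovers the Kullback-Leibler divergence.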

• [math.ST]On the error in Laplace approximations of high-dimensional integrals
Helen Ogden
http://arxiv.org/abs/1808.06341v1

Laplace approximations are commonly used to approximate high-dimensional integrals in statistical applications, but the quality of such approximations as the dimension of the integral grows is not well understood. In this paper, we prove a new result on the size of the error in first- and higher-order Laplace approximations, and apply this result to investigate the quality of Laplace approximations to the likelihood in some generalized linear mixed models.

• [math.ST]Optimal proposals for Approximate Bayesian Computation
Justin Alsing, Benjamin D. Wandelt, Stephen M. Feeney
http://arxiv.org/abs/1808.06040v1

We derive the optimal proposal density for Approximate Bayesian Computation (ABC) using Sequential Monte Carlo (SMC) (or Population Monte Carlo, PMC). The criterion for optimality is that the SMC/PMC-ABC sampler maximise the effective number of samples per parameter proposal. The optimal proposal density represents the optimal trade-off between favoring high acceptance rate and reducing the variance of the importance weights of accepted samples. We discuss two convenient approximations of this proposal and show that the optimal proposal density gives a significant boost in the expected sampling efficiency compared to standard kernels that are in common use in the ABC literature, especially as the number of parameters increases.

• [math.ST]The Mismatch Principle: Statistical Learning Under Large Model Uncertainties
Martin Genzel, Gitta Kutyniok
http://arxiv.org/abs/1808.06329v1

We study the learning capacity of empirical risk minimization with regard to the squared loss and a convex hypothesis class consisting of linear functions. While these types of estimators were originally designed for noisy linear regression problems, it has recently turned out that they are in fact capable of handling considerably more complicated situations, involving highly non-linear distortions. This work intends to provide a comprehensive explanation of this somewhat astonishing phenomenon. At the heart of our analysis stands the mismatch principle, which is a simple yet generic recipe for establishing theoretical error bounds for empirical risk minimization. The scope of our results is fairly general, permitting arbitrary sub-Gaussian input-output pairs, possibly with strongly correlated feature variables. Notably, the mismatch principle also generalizes, to a certain extent, the classical orthogonality principle for ordinary least squares. This adaptation allows us to investigate problem setups of recent interest, most importantly, high-dimensional parameter regimes and non-linear observation processes. In particular, our theoretical framework is applied to various scenarios of practical relevance, such as single-index models, variable selection, and strongly correlated designs. We thereby demonstrate the key purpose of the mismatch principle, that is, learning (semi-)parametric output rules under large model uncertainties and misspecifications.

• [nucl-th]Revisiting the proton-radius problem using constrained Gaussian processes
Shuang Zhou, P. Giuliani, J. Piekarewicz, Anirban Bhattacharya, Debdeep Pati
http://arxiv.org/abs/1808.05977v1

Background: The "proton radius puzzle" refers to an eight-year old problem that highlights major inconsistencies in the extraction of the charge radius of the proton from muonic Lamb-shift experiments as compared against experiments using elastic electron scattering. For the latter, the determination of the charge radius involves an extrapolation of the experimental form factor to zero momentum transfer. Purpose: To estimate the proton radius by introducing a novel non-parametric approach to model the electric form factor of the proton. Methods: Within a Bayesian paradigm, we develop a model flexible enough to fit the data without any parametric assumptions on the form factor. The Bayesian estimation is guided by imposing only two physical constraints on the form factor: (a) its value at zero momentum transfer (normalization) and (b) its overall shape, assumed to be a monotonically decreasing function of the momentum transfer. Variants of these assumptions are explored to assess the impact of these constraints. Results: So far our results are inconclusive in regard to the proton puzzle, as they depend on both, the assumed constrains and the range of experimental data used. For example, if only low momentum-transfer data is used, adopting only the normalization constraint provides a value compatible with the smaller muonic result, while imposing only the shape constraint favors the larger electronic value. Conclusions: We have presented a novel technique to estimate the proton radius from electron scattering data based on a non-parametric Gaussian process. We have shown the impact of the physical constraints imposed on the form factor and of the range of experimental data used. In this regard, we are hopeful that as this technique is refined and with the anticipated new results from the PRad experiment, we will get closer to resolve of the puzzle.

• [physics.med-ph]Translational Motion Compensation for Soft Tissue Velocity Images
Christina Koutsoumpa, Jennifer Keegan, David Firmin, Guang-Zhong Yang, Duncan Gillies
http://arxiv.org/abs/1808.06469v1

Purpose: Advancements in MRI Tissue Phase Velocity Mapping (TPM) allow for the acquisition of higher-quality velocity cardiac images, providing better assessment of regional myocardial deformation for accurate disease diagnosis, pre-operative planning and post-operative patient surveillance. Translation of TPM velocities from the scanner's reference coordinate system to the regional cardiac coordinate system requires decoupling of translational motion and motion due to myocardial deformation. Despite existing techniques for respiratory motion compensation in TPM, there is still a remaining translational velocity component due to the global motion of the beating heart. To compensate for translational motion in cardiac TPM, we propose an image-processing method, which we have evaluated on synthetic data and applied to in vivo TPM data. Methods: Translational motion is estimated from a suitable region of velocities automatically defined in the left-ventricular volume. The region is generated by dilating the medial axis of myocardial masks in each slice, and the translational velocity is estimated by integration over this region. The method was evaluated on synthetic data and on in vivo data corrupted with a translational velocity component (200% of the maximum measured velocity). Accuracy and robustness were examined, and the method was applied to 10 in vivo datasets. Results: The results on synthetic and corrupted in vivo data show excellent performance, with an estimation error of less than 0.3% and high robustness in both cases. The effectiveness of the method is confirmed by visual inspection of the results from the 10 datasets. Conclusion: The proposed method is accurate and suitable for translational motion correction of left-ventricular velocity fields. The current method for translational motion compensation could be applied to any annular contracting (tissue) structure.

• [q-bio.QM]Peptide-Spectra Matching from Weak Supervision
Samuel S. Schoenholz, Sean Hackett, Laura Deming, Eugene Melamud, Navdeep Jaitly, Fiona McAllister, Jonathon O'Brien, George Dahl, Bryson Bennett, Andrew M. Dai, Daphne Kohler
http://arxiv.org/abs/1808.06576v1

As in many other scientific domains, we face a fundamental problem when using machine learning to identify proteins from mass spectrometry data: large ground truth datasets mapping inputs to correct outputs are extremely difficult to obtain. Instead, we have access to imperfect hand-coded models crafted by domain experts. In this paper, we apply deep neural networks to an important step of the protein identification problem, the pairing of mass spectra with short sequences of amino acids called peptides. We train our model to differentiate between top-scoring results from a state-of-the-art classical system and hard-negative second- and third-place results. Our resulting model is much better at identifying peptides with spectra than the model used to generate its training data. In particular, we achieve a 43% improvement over standard matching methods and a 10% improvement over a combination of the matching method and an industry-standard cross-spectra reranking tool. Importantly, in a more difficult experimental regime that reflects current challenges facing biologists, our advantage over the previous state of the art grows to 15%, even after reranking. We believe this approach will generalize to other challenging scientific problems.

• [stat.AP]Alzheimer's Disease Modelling and Staging through Independent Gaussian Process Analysis of Spatio-Temporal Brain Changes
Clement Abi Nader, Nicholas Ayache, Philippe Robert, Marco Lorenzi, for the Alzheimer's Disease Neuroimaging Initiative
http://arxiv.org/abs/1808.06367v1

Alzheimer's disease (AD) is characterized by complex and largely unknown progression dynamics affecting the brain's morphology. Although the disease evolution spans decades, to date we cannot rely on long-term data to model the pathological progression, since most of the available measures are on a short-term scale. It is therefore difficult to understand and quantify the temporal progression patterns affecting the brain regions across the AD evolution. In this work, we tackle this problem by presenting a generative model based on probabilistic matrix factorization across temporal and spatial sources. The proposed method addresses the problem of disease progression modelling by introducing clinically inspired statistical priors. To promote smoothness in time and model plausible pathological evolutions, the temporal sources are defined as monotonic and independent Gaussian processes. We also estimate an individual time-shift parameter for each patient, to automatically position him/her along the sources' time axis. To encode the spatial continuity of the brain sub-structures, the spatial sources are modeled as Gaussian random fields. We test our algorithm on grey matter maps extracted from brain structural images. The experiments highlight differential temporal progression patterns mapping brain regions key to the AD pathology, and reveal a disease-specific time scale associated with the decline of volumetric biomarkers across clinical stages.

• [stat.AP]An Assessment of Covariates of Nonstationary Storm Surge Statistical Behavior by Bayesian Model Averaging
Tony E. Wong
http://arxiv.org/abs/1808.06440v1

Projections of storm surge return levels are a basic requirement for effective management of coastal risks. A common approach to estimate hazards posed by extreme sea levels is to use a statistical model, which may use a time series of a climate variable as a covariate to modulate the statistical model and account for potentially nonstationary storm surge behavior. Previous work using nonstationary statistical approaches, however, has demonstrated the importance of accounting for the many inherent modeling uncertainties. Additionally, previous assessments of coastal flood hazard using statistical modeling have typically relied on a single climate covariate, which likely leaves out important processes and leads to potential biases. Here, I build upon a recently developed approach to integrate stationary and nonstationary statistical models, and examine the effects of the choice of covariate time series on projected flood hazard. Furthermore, I expand upon this approach by developing a nonstationary storm surge statistical model that makes use of multiple covariate time series: global mean temperature, sea level, the North Atlantic Oscillation index and time. I show that a storm surge model that accounts for additional processes raises the projected 100-year storm surge return level by up to about 7 cm relative to a stationary model or one that employs a single covariate time series. I find that global mean sea level is the covariate with the highest model marginal likelihood (47%), time has the lowest (0.2%), and a stationary model has about 12%. These results shed light on how best to account for potentially nonstationary coastal surge behavior and to incorporate more processes into surge projections. By including a wider range of physical process information and considering nonstationary behavior, these methods will better enable modeling efforts to inform coastal risk management.

• [stat.AP]Analyzing within Garage Fuel Economy Gaps to Support Vehicle Purchasing Decisions - A Copula-Based Modeling & Forecasting Approach
Behram Wali, David Greene, Asad Khattak, Jun Liu
http://arxiv.org/abs/1808.05945v1

A key purpose of the U.S. government fuel economy ratings is to provide precise and unbiased fuel economy estimates to assist consumers in their vehicle purchase decisions. For the official fuel economy ratings to be useful, the numbers must be relatively reliable. This study focuses on quantifying the variations of on-road fuel economy relative to official government ratings (the fuel economy gap) and seeks proper characterizations of the degree of stochastic dependence between the fuel economy gaps of pairs of vehicles. Using unique data reported by customers of the U.S. government website www.fueleconomy.gov, the study presents an innovative copula-based joint modeling and forecasting framework for exploring the complex stochastic dependencies (both nonlinear and non-normal) between the fuel economy gaps of vehicles reported by the same person. While the EPA label estimates are similar to the average numbers reported by website customers, significant non-linear variation exists in the fuel economy gaps for the two vehicles across the sample. In particular, a positive dependence, characterized by a Student-t copula, is observed between the fuel economy gaps of the two vehicles, with significant dependencies in the tails of the bivariate distribution; a pair in which one vehicle achieves better (worse) fuel economy is likely to contain a second vehicle getting better (worse) fuel economy as well. However, the results also suggest that the strength of the overall association is weak (Kendall's tau = 0.28). This implies a lack of compelling agreement between fuel economy gaps, which could weaken consumers' confidence in making relative comparisons among vehicles.
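
A small illustration of the dependence structure named in the abstract: sampling from a bivariate Student-t copula and measuring Kendall's tau with SciPy (the correlation, degrees of freedom and normal margins below are made-up values, not estimates from the paper's data):

```python
import numpy as np
from scipy import stats

# Sample a bivariate Student-t copula: correlated t variates pushed through
# the t CDF give uniform margins with tail dependence.
rho, df, n = 0.4, 4, 5000
corr = np.array([[1.0, rho], [rho, 1.0]])
t_samples = stats.multivariate_t(shape=corr, df=df).rvs(n, random_state=0)
u = stats.t.cdf(t_samples, df=df)        # copula samples on [0, 1]^2

# Map to fuel-economy-gap margins (normal margins just for illustration),
# then check the rank dependence as the paper does with Kendall's tau.
gaps = stats.norm(loc=0.0, scale=3.0).ppf(u)
tau, _ = stats.kendalltau(gaps[:, 0], gaps[:, 1])
print(f"Kendall's tau: {tau:.2f}")
```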

• [stat.AP]Bayesian Hidden Markov Tree Models for Clustering Genes with Shared Evolutionary History
Yang Li, Shaoyang Ning, Sarah E. Calvo, Vamsi K. Mootha, Jun S. Liu
http://arxiv.org/abs/1808.06109v1

Determination of functions for poorly characterized genes is crucial for understanding biological processes and studying human diseases. Functionally associated genes are often gained and lost together through evolution. Therefore identifying co-evolution of genes can predict functional gene-gene associations. We describe here the full statistical model and computational strategies underlying the original algorithm, CLustering by Inferred Models of Evolution (CLIME 1.0) recently reported by us [Li et al., 2014]. CLIME 1.0 employs a mixture of tree-structured hidden Markov models for gene evolution process, and a Bayesian model-based clustering algorithm to detect gene modules with shared evolutionary histories (termed evolutionary conserved modules, or ECMs). A Dirichlet process prior was adopted for estimating the number of gene clusters and a Gibbs sampler was developed for posterior sampling. We further developed an extended version, CLIME 1.1, to incorporate the uncertainty on the evolutionary tree structure. By simulation studies and benchmarks on real data sets, we show that CLIME 1.0 and CLIME 1.1 outperform traditional methods that use simple metrics (e.g., the Hamming distance or Pearson correlation) to measure co-evolution between pairs of genes.

• [stat.AP]Spatio-temporal prediction of crimes using network analytic approach
Saroj Kumar Dash, Ilya Safro, Ravisutha Sakrepatna Srinivasamurthy
http://arxiv.org/abs/1808.06241v1

It is quite evident that a larger share of the population lives in urban areas today than at any time in human history, and this trend seems set to continue. A study [5] says that nearly 80.7% of the total population in the USA lives in urban areas, and by 2030 nearly 60% of the world's population will live in or move to cities. With the increase in urban population, it is important to keep an eye on criminal activities, so that governments can deploy intelligent policing systems; to this end, many government agencies and local authorities have made their crime data publicly available. In this paper, we analyze Chicago crime data fused with other social information sources, using network analytic techniques, to predict criminal activity for the next year. We observe that as we add more layers of data representing different aspects of society, the quality of prediction improves. Our prediction models do not just predict the total number of crimes for the whole of Chicago; rather, they predict the number of crimes for each crime type and for different regions of the City of Chicago.

• [stat.ME]A Stepwise Approach for High-Dimensional Gaussian Graphical Models
Ginette Lafit, Francisco J. Nogales, Marcelo Ruiz, Ruben H. Zamar
http://arxiv.org/abs/1808.06016v1

We present a stepwise approach to estimate high dimensional Gaussian graphical models. We exploit the relation between the partial correlation coefficients and the distribution of the prediction errors, and parametrize the model in terms of the Pearson correlation coefficients between the prediction errors of the nodes' best linear predictors. We propose a novel stepwise algorithm for detecting pairs of conditionally dependent variables. We show that the proposed algorithm outperforms existing methods such as the graphical lasso and CLIME in simulation studies and real life applications. In our comparison we report different performance measures that look at different desirable features of the recovered graph and consider several model settings.
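
The identity this parametrization rests on is easy to verify numerically: the partial correlation between two nodes equals the Pearson correlation between the residuals of their best linear predictors given the remaining variables. A minimal NumPy check:

```python
import numpy as np

def partial_corr(X, i, j):
    """Correlation between nodes i and j after removing the best linear
    predictions from all other variables (columns of X are variables)."""
    others = [k for k in range(X.shape[1]) if k not in (i, j)]
    Z = np.column_stack([X[:, others], np.ones(len(X))])
    # Residuals of each node's best linear predictor given the others.
    ri = X[:, i] - Z @ np.linalg.lstsq(Z, X[:, i], rcond=None)[0]
    rj = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
    return np.corrcoef(ri, rj)[0, 1]

# In a Gaussian graphical model, a near-zero partial correlation means
# conditional independence, i.e. no edge between i and j.
X = np.random.default_rng(0).normal(size=(500, 5))
print(partial_corr(X, 0, 1))
```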

• [stat.ME]A Structural-Factor Approach to Modeling High-Dimensional Time Series
Zhaoxing Gao, Ruey S Tsay
http://arxiv.org/abs/1808.06518v1

This paper considers a structural-factor approach to modeling high-dimensional time series where individual series are decomposed into trend, seasonal, and irregular components. For ease in analyzing many time series, we employ a time polynomial for the trend, a linear combination of trigonometric series for the seasonal component, and a new factor model for the irregular components. The new factor model can simplify the modeling process and achieve parsimony in parameterization. We propose a Bayesian Information Criterion (BIC) to consistently determine the order of the polynomial trend and the number of trigonometric functions. A test statistic is used to determine the number of common factors. The convergence rates for the estimators of the trend and seasonal components and the limiting distribution of the test statistic are established under the setting that the number of time series tends to infinity with the sample size, but at a slower rate. We use simulation to study the performance of the proposed analysis in finite samples and apply the proposed approach to two real examples. The first example considers modeling weekly PM2.5 data of 15 monitoring stations in the southern region of Taiwan and the second example consists of monthly value-weighted returns of 12 industrial portfolios.

• [stat.ME]A general approach to detect gene (G)-environment (E) additive interaction leveraging G-E independence in case-control studies
Eric J. Tchetgen Tchetgen, Xu Shi, Tamar Sofer, Benedict H. W. Wong
http://arxiv.org/abs/1808.06038v1

It is increasingly of interest in statistical genetics to test for the presence of a mechanistic interaction between genetic (G) and environmental (E) risk factors by testing for the presence of an additive GxE interaction. In case-control studies involving a rare disease, a statistical test of no additive interaction typically entails a test of no relative excess risk due to interaction (RERI). It is also well known that a test of multiplicative interaction exploiting G-E independence can be dramatically more powerful than standard logistic regression for case-control data. Likewise, it has recently been shown that a likelihood ratio test of a null RERI incorporating the G-E independence assumption (RERI-LRT) outperforms the standard RERI approach. In this paper, the authors describe a general, yet relatively straightforward, approach to test for GxE additive interaction exploiting G-E independence. The approach, which relies on regression models for G and E, is particularly attractive because, unlike the RERI-LRT, it allows the regression model for the binary outcome to remain unrestricted. Therefore, the new methodology is completely robust to possible mis-specification of the outcome regression. This is particularly important for settings not easily handled by the RERI-LRT, such as when E is a count or a continuous exposure with multiple components, or when there are several auxiliary covariates in the regression model. While the proposed approach avoids fitting an outcome regression, it nonetheless still allows for straightforward covariate adjustment. The methods are illustrated through an extensive simulation study and an ovarian cancer empirical application.

• [stat.ME]Analysis of "Learn-As-You-Go" (LAGO) Studies
Daniel Nevo, Judith J. Lok, Donna Spiegelman
http://arxiv.org/abs/1808.06310v1

In learn-as-you-go (LAGO) adaptive designs, the intervention is a package consisting of multiple components, and is adapted in stages during the study based on past outcomes. This formalizes standard practice, and desires for practice, in public health intervention studies. Typically, an effective intervention package is sought, while minimizing cost. The main complication when analyzing data from a learn-as-you-go design is that interventions in later stages depend upon the outcomes in the previous stages. Therefore, conditioning on the interventions would lead to effectively conditioning on the earlier stages' outcomes, which violates common statistical principles. We develop a method to estimate intervention effects from a learn-as-you-go study. We prove consistency and asymptotic normality using a novel coupling argument, and ensure the validity of the test for the hypothesis of no overall intervention effect. We further develop a confidence set for the optimal intervention package and confidence bands for the success probabilities under different package compositions. We illustrate our methods by applying them to the BetterBirth Study, which aimed to improve maternal and neonatal outcomes in India.

• [stat.ME]Bayesian Regression for a Dirichlet Distributed Response using Stan
Holger Sennhenn-Reulen
http://arxiv.org/abs/1808.06399v1

For an observed response composed of a set, or vector, of positive values that sum to 1, the Dirichlet distribution (Bol'shev, 2018) is a useful mathematical construction for quantifying the underlying data-generating mechanism. In applications, such responses are usually called proportions, or compositions of proportions, and the aim of regression is to reveal the underlying signal, i.e. how changes in covariate values lead to differently distributed response compositions. This article gives a brief introduction to this class of regression models and, based on a recently developed parameterization (Maier, 2014), illustrates their implementation in the Bayesian inference framework Stan.
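
As a minimal sketch of the likelihood such a model is built on, assuming Maier's "alternative" mean/precision parameterization with a softmax-linked mean (the Python function and variable names below are illustrative; the paper itself expresses the model in Stan):

```python
import numpy as np
from scipy.special import gammaln, softmax

def dirichlet_reg_loglik(beta, log_phi, X, Y):
    """Log-likelihood of a Dirichlet regression in the mean/precision
    parameterization: mu = softmax(X @ beta), alpha = mu * phi.
    X: (n, d) covariates; Y: (n, K) compositions summing to 1 per row;
    beta: (d, K) coefficients; log_phi: log of the scalar precision."""
    mu = softmax(X @ beta, axis=1)       # (n, K) mean compositions
    alpha = mu * np.exp(log_phi)         # (n, K) concentration parameters
    return np.sum(
        gammaln(alpha.sum(axis=1))       # Dirichlet normalizing constant ...
        - gammaln(alpha).sum(axis=1)
        + ((alpha - 1.0) * np.log(Y)).sum(axis=1)
    )
```

In Stan, the same structure is typically written with a softmax-transformed linear predictor and a `dirichlet` sampling statement for the response.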

• [stat.ME]On Design of Problem Token Questions in Quality of Experience Surveys
Jayant Gupchup, Ebrahim Beyrami, Martin Ellis, Yasaman Hosseinkashi, Sam Johnson, Ross Cutler
http://arxiv.org/abs/1808.06152v1

User surveys for Quality of Experience (QoE) are a critical source of information. In addition to the common "star rating" used to estimate Mean Opinion Score (MOS), more detailed survey questions (problem tokens) about specific areas provide valuable insight into the factors impacting QoE. This paper explores two aspects of problem-token questionnaire design. First, we study the bias introduced by a fixed question order, and second, we study the challenge of selecting a subset of questions to keep the token set small. Based on 900,000 calls gathered from a live system using a randomized controlled experiment, we find that order bias can be significantly reduced by randomizing the display order of tokens. The difference in response rate varies with token position and display design. Notably, users respond to the randomized-order variant at levels comparable to the fixed-order variant. An effective subset of token questions is selected by extracting the tokens that provide the highest information gain over user ratings, a selection problem known to be NP-hard. We apply a well-known greedy submodular maximization method on our dataset and capture 94% of the information using just 30% of the questions.
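
The subset-selection step can be viewed as standard greedy maximization of a monotone submodular set function; in the Python sketch below, `gain` stands in for the paper's information-gain criterion and all names are illustrative:

```python
def greedy_select(candidates, gain, budget):
    """Greedy forward selection for a monotone submodular objective.
    `gain(S)` returns the objective value for token subset S (here, a
    stand-in for information gain over user ratings)."""
    selected, current = [], 0.0
    for _ in range(budget):
        # marginal gain of each remaining candidate
        scored = [(gain(selected + [c]) - current, c)
                  for c in candidates if c not in selected]
        delta, pick = max(scored, key=lambda s: s[0])
        if delta <= 0:
            break                        # no remaining token adds information
        selected.append(pick)
        current += delta
    return selected
```

For monotone submodular objectives, this greedy scheme carries the classical (1 - 1/e) approximation guarantee, which is consistent with the reported 94%-of-information-from-30%-of-questions trade-off.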

• [stat.ME]Semiparametric estimation of structural failure time model in continuous-time processes
Shu Yang, Karen Pieper, Frank Cools
http://arxiv.org/abs/1808.06408v1

Structural failure time models are causal models for estimating the effect of time-varying treatments on a survival outcome. G-estimation and artificial censoring have been proposed to estimate the model parameters in the presence of time-dependent confounding and administrative censoring. However, most existing methods require preprocessing the data into regularly spaced observations, such as monthly data, and computation and inference are challenging due to the non-smoothness of artificial censoring. We propose a class of continuous-time structural failure time models and semiparametric estimators that are not restricted to regularly spaced data. We show that our estimators are doubly robust, in the sense that they are consistent if either the model for the treatment process or the failure time model is correctly specified, but not necessarily both. Moreover, we propose using inverse probability of censoring weighting to deal with dependent censoring. In contrast to artificial censoring, our weighting strategy does not introduce non-smoothness into the estimation and ensures that resampling methods can be used for inference.
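
As background, a canonical single-parameter structural failure time model (not necessarily the exact class proposed here) links the observed failure time T and the treatment process A(t) to a treatment-free counterfactual time via

T^{(0)} = \int_0^T \exp{\psi_0 A(t)} dt,

and g-estimation searches for the value of \psi for which the implied T^{(0)} is independent of treatment assignment given the past.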

• [stat.ME]Spillover Effects in Cluster Randomized Trials with Noncompliance
Hyunseung Kang, Luke Keele
http://arxiv.org/abs/1808.06418v1

Cluster randomized trials (CRTs) are popular in the social sciences for evaluating the efficacy of a new policy or program by randomly assigning one set of clusters to the new policy and the other set to the usual policy. Often, many individuals within a cluster fail to take advantage of the new policy, resulting in noncompliance. Individuals within a cluster may also influence each other through treatment spillovers, where those who comply with the new policy may affect the outcomes of those who do not. Here, we study the identification of causal effects in CRTs when both noncompliance and treatment spillovers are present. We first show that the standard instrumental-variables analysis of CRT data with noncompliance does not identify the usual complier average causal effect under treatment spillovers. We extend this result and show that no analysis of CRT data can unbiasedly estimate local network causal effects. Finally, we develop bounds for these network causal effects that require only a standard instrumental-variables analysis. We demonstrate these results with an empirical study of a new deworming intervention in Kenya. We find that, given high levels of compliance, we can place informative bounds on the total effect among compliers, and that the deworming intervention reduced infections among those who complied with their treatment assignment.
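
For context, the standard instrumental-variables analysis referenced above targets the complier average causal effect through the Wald ratio

CACE = (E[Y | Z=1] - E[Y | Z=0]) / (E[D | Z=1] - E[D | Z=0]),

where Z is the cluster-level assignment, D is actual uptake of the policy, and Y is the outcome; the paper's first result is that this ratio no longer identifies the complier effect once outcomes can spill over from compliers to noncompliers.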

• [stat.ME]The empirical likelihood prior applied to bias reduction of general estimating equations
Albert Vexler, Li Zou, Alan D. Hutson
http://arxiv.org/abs/1808.06222v1

The practice of employing empirical likelihood (EL) components in place of parametric likelihood functions in the construction of Bayesian-type procedures has been well addressed in the modern statistical literature. We rigorously derive the EL prior, a Jeffreys-type prior, which asymptotically maximizes the Shannon mutual information between the data and the parameters of interest. Our approach focuses on an integrated Kullback-Leibler distance between the EL-based posterior and prior density functions. The EL prior density is the density for which the corresponding posterior is asymptotically negligibly different from the EL. We show that this result can be used to reduce the asymptotic bias of solutions of general estimating equations and M-estimation schemes by removing the first-order bias term, in a manner similar to methods that reduce the asymptotic bias of maximum likelihood estimates by penalizing the underlying parametric likelihoods with their Jeffreys invariant priors. A real data example from a study of myocardial infarction illustrates the practical attractiveness of the proposed technique.
Keywords: asymptotic bias; biased estimating equations; empirical likelihood; expected Kullback-Leibler distance; penalized likelihood; reference prior.
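
The parametric device alluded to at the end is Firth-type penalization, in which the log-likelihood is corrected by the Jeffreys term (standard background, not the paper's EL construction):

l*(\theta) = l(\theta) + (1/2) log det I(\theta),

where I(\theta) is the Fisher information. Maximizing l* removes the O(n^{-1}) term of the maximum likelihood estimator's bias; the paper develops the analogous correction for general estimating equations via the EL prior.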

• [stat.ML]Applying Machine Learning To Maize Traits Prediction
Binbin Shi, Xupeng Chen
http://arxiv.org/abs/1808.06275v1

Heterosis is the improved or increased function of a biological quality in hybrid offspring. We study what is, to date, the largest maize SNP dataset for trait prediction. We develop linear and non-linear models that account for relationships between different hybrids as well as other effects. The specially designed models prove efficient and robust in predicting maize traits.

• [stat.ML]Causal Discovery by Telling Apart Parents and Children
Alexander Marx, Jilles Vreeken
http://arxiv.org/abs/1808.06356v1

We consider the problem of inferring the directed, causal graph from observational data, assuming no hidden confounders. We take an information-theoretic approach and make three main contributions. First, we show how algorithmic information theory yields SCI, a highly robust, effective and computationally efficient test for conditional independence, and show that it outperforms the state of the art when applied in constraint-based inference methods such as stable PC. Second, building upon SCI, we show how to tell apart the parents and children of a given node based on the algorithmic Markov condition. We give the Climb algorithm to efficiently discover the directed, causal Markov blanket, and show it is at least as accurate as inferring the global network while being much more efficient. Last, but not least, we detail how the Climb score can be used to direct the edges that state-of-the-art causal discovery algorithms based on PC or GES leave undirected, and show this improves their precision, recall and F1 scores by up to 20%.

• [stat.ML]Multi-View Graph Embedding Using Randomized Shortest Paths
Anuththari Gamage, Brian Rappaport, Shuchin Aeron, Xiaozhe Hu
http://arxiv.org/abs/1808.06560v1

Real-world data sets often provide multiple types of information about the same set of entities. This data is well represented by multi-view graphs, which consist of several distinct sets of edges over the same nodes. These can be used to analyze how entities interact from different viewpoints. Combining multiple views improves the quality of inferences drawn from the underlying data, which has increased interest in developing efficient multi-view graph embedding methods. We propose an algorithm, C-RSP, that generates a common (C) embedding of a multi-view graph using Randomized Shortest Paths (RSP). This algorithm generates a dissimilarity measure between nodes by minimizing the expected cost of a random walk between any two nodes across all views of a multi-view graph, in doing so encoding both the local and global structure of the graph. We test C-RSP on both real and synthetic data and show that it outperforms benchmark algorithms at embedding and clustering tasks while remaining computationally efficient.
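
A rough single-view sketch of the randomized-shortest-paths dissimilarity that C-RSP builds on, loosely following Kivimäki et al. (2014); the multi-view coupling that gives C-RSP its common embedding is not reproduced, and the Python names below are illustrative:

```python
import numpy as np

def rsp_dissimilarity(A, beta=1.0):
    """Single-view RSP dissimilarities for a connected graph with
    symmetric nonnegative adjacency matrix A (sketch only)."""
    n = A.shape[0]
    C = np.where(A > 0, 1.0 / np.maximum(A, 1e-12), 0.0)  # edge costs, 0 off-graph
    P_ref = A / A.sum(axis=1, keepdims=True)              # natural random walk
    W = P_ref * np.exp(-beta * C)                         # P_ref is 0 off-graph
    Z = np.linalg.inv(np.eye(n) - W)                      # fundamental matrix
    S = (Z @ (C * W) @ Z) / Z                             # expected RSP costs, shifted
    C_bar = S - np.outer(np.ones(n), np.diag(S))          # subtract diagonal terms
    return 0.5 * (C_bar + C_bar.T)                        # symmetrized dissimilarity
```

An embedding step would then be, for example, classical multidimensional scaling on the resulting dissimilarity matrix, with clustering performed on the embedded coordinates.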
