Today's Academic Horizons (2017.07.15)
cond-mat.stat-mech - Statistical Mechanics
cs.AI - Artificial Intelligence
cs.CL - Computation and Language
cs.CR - Cryptography and Security
cs.CV - Computer Vision and Pattern Recognition
cs.CY - Computers and Society
cs.DC - Distributed, Parallel, and Cluster Computing
cs.HC - Human-Computer Interaction
cs.IR - Information Retrieval
cs.IT - Information Theory
cs.LG - Machine Learning
cs.NE - Neural and Evolutionary Computing
cs.NI - Networking and Internet Architecture
cs.PL - Programming Languages
cs.SI - Social and Information Networks
math.ST - Statistics Theory
physics.soc-ph - Physics and Society
q-bio.QM - Quantitative Methods
quant-ph - Quantum Physics
stat.AP - Applied Statistics
stat.CO - Statistical Computation
stat.ME - Statistical Methodology
stat.ML - Machine Learning (Statistics)
• [cond-mat.stat-mech]Inferring the parameters of a Markov process from snapshots of the steady state
• [cond-mat.stat-mech]Prediction and Power in Molecular Sensors: Uncertainty and Dissipation When Conditionally Markovian Channels Are Driven by Semi-Markov Environments
• [cs.AI]A Brief Study of In-Domain Transfer and Learning from Fewer Samples using A Few Simple Priors
• [cs.AI]A Formal Framework to Characterize Interpretability of Procedures
• [cs.AI]Armstrong's Axioms and Navigation Strategies
• [cs.AI]Autoencoder-augmented Neuroevolution for Visual Doom Playing
• [cs.AI]Automatic Mapping of NES Games with Mappy
• [cs.AI]Clingo goes Linear Constraints over Reals and Integers
• [cs.AI]Constraints, Lazy Constraints, or Propagators in ASP Solving: An Empirical Analysis
• [cs.AI]Dependency Injection for Programming by Optimization
• [cs.AI]Identification and Interpretation of Belief Structure in Dempster-Shafer Theory
• [cs.AI]Independence, Conditionality and Structure of Dempster-Shafer Belief Functions
• [cs.AI]Learning like humans with Deep Symbolic Networks
• [cs.AI]Lithium NLP: A System for Rich Information Extraction from Noisy User Generated Text on Social Media
• [cs.AI]Mechanics Automatically Recognized via Interactive Observation: Jumping
• [cs.CL]A Critique of a Critique of Word Similarity Datasets: Sanity Check or Unnecessary Confusion?
• [cs.CL]A Web-Based Tool for Analysing Normative Documents in English
• [cs.CL]Automatic Speech Recognition with Very Large Conversational Finnish and Estonian Vocabularies
• [cs.CL]Do Convolutional Networks need to be Deep for Text Classification ?
• [cs.CL]Is writing style predictive of scientific fraud?
• [cs.CL]Learning Features from Co-occurrences: A Theoretical Analysis
• [cs.CL]Look Who's Talking: Bipartite Networks as Representations of a Topic Model of New Zealand Parliamentary Speeches
• [cs.CL]Negative Sampling Improves Hypernymy Extraction Based on Projection Learning
• [cs.CL]Parsing with Traces: An $O(n^4)$ Algorithm and a Structural Representation
• [cs.CL]Predicting Causes of Reformulation in Intelligent Assistants
• [cs.CL]Quasar: Datasets for Question Answering by Search and Reading
• [cs.CL]Representation Learning for Grounded Spatial Reasoning
• [cs.CR]Process Monitoring on Sequences of System Call Count Vectors
• [cs.CV]Automatic Recognition of Deceptive Facial Expressions of Emotion
• [cs.CV]Deep Learning with Topological Signatures
• [cs.CV]Discrete Multi-modal Hashing with Canonical Views for Robust Mobile Landmark Search
• [cs.CV]Disentangling Motion, Foreground and Background Features in Videos
• [cs.CV]Large-scale Video Classification guided by Batch Normalized LSTM Translator
• [cs.CV]Learning Photography Aesthetics with Deep CNNs
• [cs.CV]Leveraging the Path Signature for Skeleton-based Human Action Recognition
• [cs.CV]Merge or Not? Learning to Group Faces via Imitation Learning
• [cs.CV]Query-Aware Sparse Coding for Multi-Video Summarization
• [cs.CV]SaltiNet: Scan-path Prediction on 360 Degree Images using Saliency Volumes
• [cs.CV]The Surfacing of Multiview 3D Drawings via Lofting and Occlusion Reasoning
• [cs.CV]Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks
• [cs.CV]UTS submission to Google YouTube-8M Challenge 2017
• [cs.CV]Unsupervised body part regression using convolutional neural network with self-organization
• [cs.CY]Arianna: towards a new paradigm for assistive technology at home
• [cs.CY]Spatial Data Infrastructures
• [cs.DC]Buffer Size for Routing Limited-Rate Adversarial Traffic
• [cs.HC]Large-scale Multiview 3D Hand Pose Dataset
• [cs.IR]Neural Networks for Information Retrieval
• [cs.IT]Cooperative HARQ Assisted NOMA Scheme in Large-scale D2D Networks
• [cs.IT]Correction to "The Generalized Stochastic Likelihood Decoder: Random Coding and Expurgated Bounds"
• [cs.IT]Gradient Coding from Cyclic MDS Codes and Expander Graphs
• [cs.IT]MAC Resolvability: First And Second Order Results
• [cs.IT]Multi-Antenna Assisted Full-Duplex Relaying with Reliability-Aware Iterative Decoding
• [cs.IT]Robust Geometry-Based User Scheduling for Large MIMO Systems Under Realistic Channel Conditions
• [cs.IT]Synchronization Strings: Channel Simulations and Interactive Coding for Insertions and Deletions
• [cs.IT]Universal Sparse Superposition Codes with Spatial Coupling and GAMP Decoding
• [cs.LG]Be Careful What You Backpropagate: A Case For Linear Output Activations & Gradient Boosting
• [cs.LG]Distral: Robust Multitask Reinforcement Learning
• [cs.LG]Estimating the unseen from multiple populations
• [cs.LG]Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models
• [cs.LG]On Measuring and Quantifying Performance: Error Rates, Surrogate Loss, and an Example in SSL
• [cs.LG]Stable Distribution Alignment Using the Dual of the Adversarial Distance
• [cs.NE]Backpropagation in matrix notation
• [cs.NE]Capacity, Fidelity, and Noise Tolerance of Associative Spatial-Temporal Memories Based on Memristive Neuromorphic Network
• [cs.NI]Cost-Effective Cache Deployment in Mobile Heterogeneous Networks
• [cs.PL]Hot-Rodding the Browser Engine: Automatic Configuration of JavaScript Compilers
• [cs.SI]Distinguishing humans from computers in the game of go: a complex network approach
• [cs.SI]Human Sexual Cycles are Driven by Culture and Match Collective Moods
• [cs.SI]UrbanFACET: Visually Profiling Cities from Mobile Device Recorded Movement Data of Millions of City Residents
• [math.ST]Pretest and Stein-Type Estimations in Quantile Regression Model
• [math.ST]Testing High-dimensional Covariance Matrices under the Elliptical Distribution and Beyond
• [math.ST]Variable selection in multivariate linear models with high-dimensional covariance matrix estimation
• [physics.soc-ph]Network dynamics of innovation processes
• [q-bio.QM]Modeling Hormesis Using a Non-Monotonic Copula Method
• [quant-ph]Tight uniform continuity bound for a family of entropies
• [stat.AP]A New High-Dimensional Time Series Approach for Wind Speed, Wind Direction and Air Pressure Forecasting
• [stat.CO]ClustGeo: an R package for hierarchical clustering with spatial constraints
• [stat.ME]Hypoelliptic diffusions: discretization, filtering and inference from complete and partial observations
• [stat.ME]Inference under Missing Data Conditions in the Stochastic Block Model
• [stat.ME]Iterative Updating of Model Error for Bayesian Inversion
• [stat.ME]Quantifying and Estimating the Predictive Accuracy for Censored Time-to-Event Data with Competing Risks
• [stat.ME]Randomization-based Inference for Bernoulli-Trial Experiments and Implications for Observational Studies
• [stat.ML]Automation of Feature Engineering for IoT Analytics
• [stat.ML]Deep Gaussian Embedding of Attributed Graphs: Unsupervised Inductive Learning via Ranking
• [stat.ML]Distributionally Robust Optimization Techniques in Batch Bayesian Optimization
• [stat.ML]Improving Sparsity in Kernel Adaptive Filters Using a Unit-Norm Dictionary
• [stat.ML]Influence of Resampling on Accuracy of Imbalanced Classification
• [stat.ML]Kafnets: kernel-based non-parametric activation functions for neural networks
• [stat.ML]Large Scale Variable Fidelity Surrogate Modeling
• [stat.ML]Model Selection for Anomaly Detection
·····································
• [cond-mat.stat-mech]Inferring the parameters of a Markov process from snapshots of the steady state
Simon Lee Dettmer, Johannes Berg
http://arxiv.org/abs/1707.04114v1
We seek to infer the parameters of an ergodic Markov process from samples taken independently from the steady state. Our focus is on non-equilibrium processes, where the steady state is not described by the Boltzmann measure, but is generally unknown and hard to compute. To this end, we propose a quantity we call propagator likelihood, which takes on the role of the likelihood in equilibrium processes. This propagator likelihood is based on fictitious transitions between those configurations of the system which occur in the samples. The propagator likelihood can be derived by minimising the relative entropy between the empirical distribution and a distribution generated by propagating the empirical distribution forward in time. Maximising this propagator likelihood leads to an efficient reconstruction of the parameters of the underlying model in different systems, both with discrete configurations and with continuous configurations. We apply the method to different non-equilibrium models from statistical physics and theoretical biology, including the asymmetric simple exclusion process (ASEP), the kinetic Ising model, and replicator dynamics.
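The propagator-likelihood idea lends itself to a compact illustration. Below is a minimal sketch (not the authors' code) for a hypothetical two-state chain: the empirical distribution over steady-state snapshots is propagated one step under a candidate transition matrix, the samples are scored against the propagated distribution, and maximizing that score over the unknown rate recovers it.

```python
# Minimal, illustrative sketch of the "propagator likelihood": propagate the
# empirical distribution one step forward under a candidate transition matrix
# and score the samples against it. The two-state chain is a hypothetical toy.
import numpy as np
from scipy.optimize import minimize_scalar

def propagator_likelihood(samples, transition_matrix, n_states):
    """Sum_x p_emp(x) * log( (T^T p_emp)(x) ), estimated from snapshots."""
    counts = np.bincount(samples, minlength=n_states)
    p_emp = counts / counts.sum()
    propagated = transition_matrix.T @ p_emp       # one fictitious step
    return float(p_emp @ np.log(propagated + 1e-12))

def two_state_T(a, b=0.3):
    """Toy chain: P(0->1) = a, P(1->0) = b (b fixed for identifiability)."""
    return np.array([[1 - a, a],
                     [b, 1 - b]])

# Hypothetical data: snapshots from the steady state of the true chain (a=0.1),
# whose stationary distribution is (b, a) / (a + b) = (0.75, 0.25).
rng = np.random.default_rng(0)
samples = rng.choice(2, size=5000, p=[0.75, 0.25])

# Maximize the propagator likelihood over the unknown rate a.
res = minimize_scalar(lambda a: -propagator_likelihood(samples, two_state_T(a), 2),
                      bounds=(1e-3, 0.999), method="bounded")
print("inferred a ~", res.x)
```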
• [cond-mat.stat-mech]Prediction and Power in Molecular Sensors: Uncertainty and Dissipation When Conditionally Markovian Channels Are Driven by Semi-Markov Environments
Sarah E. Marzen, James P. Crutchfield
http://arxiv.org/abs/1707.03962v1
Sensors often serve at least two purposes: predicting their input and minimizing dissipated heat. However, determining whether or not a particular sensor is evolved or designed to be accurate and efficient is difficult. This arises partly from the functional constraints being at cross purposes and partly since quantifying the predictive performance of even in silico sensors can require prohibitively long simulations. To circumvent these difficulties, we develop expressions for the predictive accuracy and thermodynamic costs of the broad class of conditionally Markovian sensors subject to unifilar hidden semi-Markov (memoryful) environmental inputs. Predictive metrics include the instantaneous memory and the mutual information between present sensor state and input future, while dissipative metrics include power consumption and the nonpredictive information rate. Success in deriving these formulae relies heavily on identifying the environment's causal states, the input's minimal sufficient statistics for prediction. Using these formulae, we study the simplest nontrivial biological sensor model---that of a Hill molecule, characterized by the number of ligands that bind simultaneously, the sensor's cooperativity. When energetic rewards are proportional to total predictable information, the closest cooperativity that optimizes the total energy budget generally depends on the environment's past hysteretically. In this way, the sensor gains robustness to environmental fluctuations. Given the simplicity of the Hill molecule, such hysteresis will likely be found in more complex predictive sensors as well. That is, adaptations that only locally optimize biochemical parameters for prediction and dissipation can lead to sensors that "remember" the past environment.
• [cs.AI]A Brief Study of In-Domain Transfer and Learning from Fewer Samples using A Few Simple Priors
Marc Pickett, Ayush Sekhari, James Davidson
http://arxiv.org/abs/1707.03979v1
Domain knowledge can often be encoded in the structure of a network, such as convolutional layers for vision, which has been shown to increase generalization and decrease sample complexity, or the number of samples required for successful learning. In this study, we ask whether sample complexity can be reduced for systems where the structure of the domain is unknown beforehand, and the structure and parameters must both be learned from the data. We show that sample complexity reduction through learning structure is possible for at least two simple cases. In studying these cases, we also gain insight into how this might be done for more complex domains.
• [cs.AI]A Formal Framework to Characterize Interpretability of Procedures
Amit Dhurandhar, Vijay Iyengar, Ronny Luss, Karthikeyan Shanmugam
http://arxiv.org/abs/1707.03886v1
We provide a novel notion of what it means to be interpretable, looking past the usual association with human understanding. Our key insight is that interpretability is not an absolute concept and so we define it relative to a target model, which may or may not be a human. We define a framework that allows for comparing interpretable procedures by linking it to important practical aspects such as accuracy and robustness. We characterize many of the current state-of-the-art interpretable methods in our framework portraying its general applicability.
• [cs.AI]Armstrong's Axioms and Navigation Strategies
Kaya Deuser, Pavel Naumov
http://arxiv.org/abs/1707.04106v1
The paper investigates navigability with imperfect information. It shows that the properties of navigability with perfect recall are exactly those captured by Armstrong's axioms from the database theory. If the assumption of perfect recall is omitted, then Armstrong's transitivity axiom is not valid, but it can be replaced by two new weaker principles. The main technical results are soundness and completeness theorems for the logical systems describing properties of navigability with and without perfect recall.
• [cs.AI]Autoencoder-augmented Neuroevolution for Visual Doom Playing
Samuel Alvernaz, Julian Togelius
http://arxiv.org/abs/1707.03902v1
Neuroevolution has proven effective at many reinforcement learning tasks, but does not seem to scale well to high-dimensional controller representations, which are needed for tasks where the input is raw pixel data. We propose a novel method where we train an autoencoder to create a comparatively low-dimensional representation of the environment observation, and then use CMA-ES to train neural network controllers acting on this input data. As the behavior of the agent changes the nature of the input data, the autoencoder training progresses throughout evolution. We test this method in the VizDoom environment built on the classic FPS Doom, where it performs well on a health-pack gathering task.
• [cs.AI]Automatic Mapping of NES Games with Mappy
Joseph C. Osborn, Adam Summerville, Michael Mateas
http://arxiv.org/abs/1707.03908v1
Game maps are useful for human players, general-game-playing agents, and data-driven procedural content generation. These maps are generally made by hand-assembling manually-created screenshots of game levels. Besides being tedious and error-prone, this approach requires additional effort for each new game and level to be mapped. The results can still be hard for humans or computational systems to make use of, privileging visual appearance over semantic information. We describe a software system, Mappy, that produces a good approximation of a linked map of rooms given a Nintendo Entertainment System game program and a sequence of button inputs exploring its world. In addition to visual maps, Mappy outputs grids of tiles (and how they change over time), positions of non-tile objects, clusters of similar rooms that might in fact be the same room, and a set of links between these rooms. We believe this is a necessary step towards developing larger corpora of high-quality semantically-annotated maps for PCG via machine learning and other applications.
• [cs.AI]Clingo goes Linear Constraints over Reals and Integers
Tomi Janhunen, Roland Kaminski, Max Ostrowski, Torsten Schaub, Sebastian Schellhorn, Philipp Wanko
http://arxiv.org/abs/1707.04053v1
The recent series 5 of the ASP system clingo provides generic means to enhance basic Answer Set Programming (ASP) with theory reasoning capabilities. We instantiate this framework with different forms of linear constraints, discuss the respective implementations, and present techniques of how to use these constraints in a reactive context. More precisely, we introduce extensions to clingo with difference and linear constraints over integers and reals, respectively, and realize them in complementary ways. Finally, we empirically evaluate the resulting clingo derivatives clingo[dl] and clingo[lp] on common fragments and contrast them to related ASP systems. This paper is under consideration for acceptance in TPLP.
• [cs.AI]Constraints, Lazy Constraints, or Propagators in ASP Solving: An Empirical Analysis
Bernardo Cuteri, Carmine Dodaro, Francesco Ricca, Peter Schüller
http://arxiv.org/abs/1707.04027v1
Answer Set Programming (ASP) is a well-established declarative paradigm. One of the successes of ASP is the availability of efficient systems. State-of-the-art systems are based on the ground+solve approach. In some applications this approach is infeasible because the grounding of one or few constraints is expensive. In this paper, we systematically compare alternative strategies to avoid the instantiation of problematic constraints, that are based on custom extensions of the solver. Results on real and synthetic benchmarks highlight some strengths and weaknesses of the different strategies. (Under consideration for acceptance in TPLP, ICLP 2017 Special Issue.)
• [cs.AI]Dependency Injection for Programming by Optimization
Zoltan A. Kocsis, Jerry Swan
http://arxiv.org/abs/1707.04016v1
Programming by Optimization tools perform automatic software configuration according to the specification supplied by a software developer. Developers specify design spaces for program components, and the onerous task of determining which configuration best suits a given use case is determined using automated analysis tools and optimization heuristics. However, in current approaches to Programming by Optimization, design space specification and exploration relies on external configuration algorithms, executable wrappers and fragile, preprocessed programming language extensions. Here we show that the architectural pattern of Dependency Injection provides a superior alternative to the traditional Programming by Optimization pipeline. We demonstrate that configuration tools based on Dependency Injection fit naturally into the software development process, while requiring less overhead than current wrapper-based mechanisms. Furthermore, the structural correspondence between Dependency Injection and context-free grammars yields a new class of evolutionary metaheuristics for automated algorithm configuration. We found that the new heuristics significantly outperform existing configuration algorithms on many problems of interest (in one case by two orders of magnitude). We anticipate that these developments will make Programming by Optimization immediately applicable to a large number of enterprise software projects.
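The pattern itself is easy to illustrate. The sketch below (all names hypothetical, not the authors' tooling) exposes a design choice as a constructor-injected dependency, so an external configurator can explore the design space without wrappers or preprocessed language extensions.

```python
# Hedged sketch of dependency injection for automatic configuration: a design
# choice is injected via the constructor and an external configurator (here an
# exhaustive one) swaps implementations. All names are hypothetical.
import random
from typing import Callable, List

# Two interchangeable implementations of a "local search step" design choice.
def first_improvement(candidates: List[float]) -> float:
    return next((c for c in candidates if c < candidates[0]), candidates[0])

def best_improvement(candidates: List[float]) -> float:
    return min(candidates)

class Solver:
    """The design space is injected via the constructor, not hard-coded."""
    def __init__(self, step: Callable[[List[float]], float]):
        self.step = step

    def run(self, seed: int = 0) -> float:
        random.seed(seed)
        value = 10.0
        for _ in range(100):
            value = self.step([value] + [value + random.uniform(-1, 1)
                                         for _ in range(5)])
        return value

# A configurator explores the design space purely by injection.
design_space = {"first": first_improvement, "best": best_improvement}
scores = {name: Solver(step).run() for name, step in design_space.items()}
print(min(scores, key=scores.get), scores)
```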
• [cs.AI]Identification and Interpretation of Belief Structure in Dempster-Shafer Theory
Mieczysław A. Kłopotek
http://arxiv.org/abs/1707.03881v1
Mathematical Theory of Evidence, also called Dempster-Shafer Theory (DST), is known as a foundation for reasoning when knowledge is expressed at various levels of detail. Though much research effort has been committed to this theory since its foundation, many questions remain open. One of the most important open questions seems to be the relationship between frequencies and the Mathematical Theory of Evidence. The theory is criticized for leaving frequencies outside (or aside of) its framework. The seriousness of this accusation is obvious: (1) no experiment may be run to compare the performance of DST-based models of real world processes against real world data, (2) data may not serve as foundation for construction of an appropriate belief model. In this paper we develop a frequentist interpretation of DST that refutes the above argument against it. An immediate consequence is the possibility of developing algorithms that automatically acquire DST belief models from data. We propose three such algorithms for various classes of belief model structures: for tree-structured belief networks, for poly-tree belief networks, and for general-type belief networks.
• [cs.AI]Independence, Conditionality and Structure of Dempster-Shafer Belief Functions
Mieczysław A. Kłopotek
http://arxiv.org/abs/1707.03872v1
Several approaches to structuring (factorization, decomposition) of Dempster-Shafer joint belief functions from the literature are reviewed, with special emphasis on their capability to capture independence from the point of view of the claim that belief functions generalize Bayes's notion of probability. It is demonstrated that Zhu and Lee's [Zhu:93] logical networks and Smets' [Smets:93] directed acyclic graphs are unable to capture the statistical dependence/independence of Bayesian networks [Pearl:88]. On the other hand, though Shenoy and Shafer's hypergraphs can explicitly represent Bayesian network factorization of Bayesian belief functions, they disclaim any need for representation of independence of variables in belief functions. Cano et al. [Cano:93] reject the hypergraph representation of Shenoy and Shafer precisely on grounds of missing representation of variable independence, but in their framework some belief functions that are factorizable in the Shenoy/Shafer framework cannot be factored. The approach in [Klopotek:93f] combines the merits of both the Cano et al. and the Shenoy/Shafer approaches: no factorization simpler than that of [Klopotek:93f] exists in the Shenoy/Shafer framework, while all independences among variables captured in the Cano et al. framework, and many more, are captured by [Klopotek:93f].
• [cs.AI]Learning like humans with Deep Symbolic Networks
Qunzhi Zhang, Didier Sornette
http://arxiv.org/abs/1707.03377v2
We introduce the Deep Symbolic Network (DSN) model, which aims at becoming the white-box version of Deep Neural Networks (DNN). The DSN model provides a simple, universal yet powerful structure, similar to DNN, to represent any knowledge of the world, which is transparent to humans. The conjecture behind the DSN model is that any type of real world objects sharing enough common features are mapped into human brains as a symbol. Those symbols are connected by links, representing the composition, correlation, causality, or other relationships between them, forming a deep, hierarchical symbolic network structure. Powered by such a structure, the DSN model is expected to learn like humans, because of its unique characteristics. First, it is universal, using the same structure to store any knowledge. Second, it can learn symbols from the world and construct the deep symbolic networks automatically, by utilizing the fact that real world objects have been naturally separated by singularities. Third, it is symbolic, with the capacity of performing causal deduction and generalization. Fourth, the symbols and the links between them are transparent to us, and thus we will know what it has learned or not - which is the key for the security of an AI system. Fifth, its transparency enables it to learn with relatively small data. Sixth, its knowledge can be accumulated. Last but not least, it is more friendly to unsupervised learning than DNN. We present the details of the model, the algorithm powering its automatic learning ability, and describe its usefulness in different use cases. The purpose of this paper is to generate broad interest to develop it within an open source project centered on the Deep Symbolic Network (DSN) model towards the development of general AI.
• [cs.AI]Lithium NLP: A System for Rich Information Extraction from Noisy User Generated Text on Social Media
Preeti Bhargava, Nemanja Spasojevic, Guoning Hu
http://arxiv.org/abs/1707.04244v1
In this paper, we describe the Lithium Natural Language Processing (NLP) system - a resource-constrained, high-throughput and language-agnostic system for information extraction from noisy user generated text on social media. Lithium NLP extracts a rich set of information including entities, topics, hashtags and sentiment from text. We discuss several real world applications of the system currently incorporated in Lithium products. We also compare our system with existing commercial and academic NLP systems in terms of performance, information extracted and languages supported. We show that Lithium NLP is on par with, and in some cases outperforms, state-of-the-art commercial NLP systems.
• [cs.AI]Mechanics Automatically Recognized via Interactive Observation: Jumping
Adam Summerville, Joseph C. Osborn, Christoffer Holmgård, Daniel W. Zhang
http://arxiv.org/abs/1707.03865v1
Jumping has been an important mechanic since its introduction in Donkey Kong. It has taken a variety of forms and shown up in numerous games, with each jump having a different feel. In this paper, we use a modified Nintendo Entertainment System (NES) emulator to semi-automatically run experiments on a large subset (30%) of NES platform games. We use these experiments to build models of jumps from different developers, series, and games across the history of the console. We then examine these models to gain insights into different forms of jumping and their associated feel.
• [cs.CL]A Critique of a Critique of Word Similarity Datasets: Sanity Check or Unnecessary Confusion?
Minh Le
http://arxiv.org/abs/1707.03819v1
Critical evaluation of word similarity datasets is very important for computational lexical semantics. This short report concerns the sanity check proposed in Batchkarov et al. (2016) to evaluate several popular datasets such as MC, RG and MEN -- the first two reportedly failed. I argue that this test is unstable, offers no added insight, and needs major revision in order to fulfill its purported goal.
• [cs.CL]A Web-Based Tool for Analysing Normative Documents in English
John J. Camilleri, Mohammad Reza Haghshenas, Gerardo Schneider
http://arxiv.org/abs/1707.03997v1
Our goal is to use formal methods to analyse normative documents written in English, such as privacy policies and service-level agreements. This requires the combination of a number of different elements, including information extraction from natural language, formal languages for model representation, and an interface for property specification and verification. We have worked on a collection of components for this task: a natural language extraction tool, a suitable formalism for representing such documents, an interface for building models in this formalism, and methods for answering queries asked of a given model. In this work, each of these concerns is brought together in a web-based tool, providing a single interface for analysing normative texts in English. Through the use of a running example, we describe each component and demonstrate the workflow established by our tool.
• [cs.CL]Automatic Speech Recognition with Very Large Conversational Finnish and Estonian Vocabularies
Seppo Enarvi, Peter Smit, Sami Virpioja, Mikko Kurimo
http://arxiv.org/abs/1707.04227v1
Previously, very large vocabularies have been efficiently modeled in conventional n-gram language models either by splitting words into subword units or by clustering words into classes. While the vocabulary size is no longer as critical in modern speech recognition systems, training time and memory consumption become an issue when state-of-the-art neural network language models are used. In this paper we investigate techniques that address the vocabulary size issue by reducing the effective vocabulary size and by processing large vocabularies more efficiently. The experimental results in conversational Finnish and Estonian speech recognition indicate that properly defined word classes improve recognition accuracy. Subword n-gram models are not better on evaluation data than word n-gram models constructed from a vocabulary that includes all the words in the training corpus. However, when recurrent neural network (RNN) language models are used, their ability to utilize long contexts gives a larger gain to subword-based modeling. Our best results are from RNN language models that are based on statistical morphs. We show that the suitable size for a subword vocabulary depends on the language. Using time delay neural network (TDNN) acoustic models, we were able to achieve a new state of the art in Finnish and Estonian conversational speech recognition: 27.1% word error rate in the Finnish task and 21.9% in the Estonian task.
• [cs.CL]Do Convolutional Networks need to be Deep for Text Classification ?
Hoa T. Le, Christophe Cerisara, Alexandre Denis
http://arxiv.org/abs/1707.04108v1
We study in this work the importance of depth in convolutional models for text classification, either when character or word inputs are considered. We show on 5 standard text classification and sentiment analysis tasks that deep models indeed give better performances than shallow networks when the text input is represented as a sequence of characters. However, a simple shallow-and-wide network outperforms deep models such as DenseNet with word inputs. Our shallow word model further establishes new state-of-the-art performances on two datasets: Yelp Binary (95.9%) and Yelp Full (64.9%).
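As a point of reference for what "shallow-and-wide" means in practice, here is a minimal word-level sketch (hypothetical hyperparameters, not the authors' exact architecture): a single convolutional layer with several kernel widths in parallel, global max pooling, and a linear classifier.

```python
# Minimal sketch of a shallow-and-wide word-level CNN for text classification:
# one convolutional layer with several kernel widths in parallel, global max
# pooling, then a classifier. All sizes below are illustrative.
import torch
import torch.nn as nn

class ShallowWideTextCNN(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128,
                 num_filters=256, kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # (batch, num_classes)

# Example forward pass on a dummy batch of token ids.
logits = ShallowWideTextCNN()(torch.randint(0, 30000, (8, 100)))
print(logits.shape)   # torch.Size([8, 2])
```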
• [cs.CL]Is writing style predictive of scientific fraud?
Chloé Braud, Anders Søgaard
http://arxiv.org/abs/1707.04095v1
The problem of detecting scientific fraud using machine learning was recently introduced, with initial, positive results from a model taking into account various general indicators. The results seem to suggest that writing style is predictive of scientific fraud. We revisit these initial experiments, and show that the leave-one-out testing procedure they used likely leads to a slight over-estimate of the predictability, but also that simple models can outperform their proposed model by some margin. We go on to explore more abstract linguistic features, such as linguistic complexity and discourse structure, only to obtain negative results. Upon analyzing our models, we do see some interesting patterns, though: scientific fraud, for example, contains less comparison, as well as different types of hedging and ways of presenting logical reasoning.
• [cs.CL]Learning Features from Co-occurrences: A Theoretical Analysis
Yanpeng Li
http://arxiv.org/abs/1707.04218v1
Representing a word by its co-occurrences with other words in context is an effective way to capture the meaning of the word. However, the theory behind this approach remains a challenge. In this work, taking the example of a word classification task, we give a theoretical analysis of approaches that represent a word X by a function f(P(C|X)), where C is a context feature, P(C|X) is the conditional probability estimated from a text corpus, and the function f maps the co-occurrence measure to a prediction score. We investigate the impact of the context feature C and the function f. We also explain why using co-occurrences with multiple context features may be better than using just a single one. In addition, some of the results shed light on the theory of feature learning and machine learning in general.
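The representation being analyzed is easy to instantiate. The sketch below estimates P(C|X) from a toy corpus with a ±2-word context window and applies f = log with additive smoothing; the corpus, window, and smoothing constant are illustrative choices, not the paper's.

```python
# Illustrative sketch of representing a word X by f(P(C|X)), with context
# features C = words within a +/-2 window and f = log (smoothed).
from collections import Counter, defaultdict
import math

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
window = 2
cooc = defaultdict(Counter)
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                cooc[w][sent[j]] += 1

vocab = sorted({w for sent in corpus for w in sent})

def represent(word, alpha=1e-3):
    """Feature vector f(P(C|X)) with f = log, over the whole vocabulary."""
    total = sum(cooc[word].values())
    return [math.log((cooc[word][c] + alpha) / (total + alpha * len(vocab)))
            for c in vocab]

print(dict(zip(vocab, represent("cat"))))
```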
• [cs.CL]Look Who's Talking: Bipartite Networks as Representations of a Topic Model of New Zealand Parliamentary Speeches
Ben Curran, Kyle Higham, Elisenda Ortiz, Demi Vasques Filho
http://arxiv.org/abs/1707.03095v2
Quantitative methods to measure the participation of elected Members of Parliament, and the parties they belong to, in parliamentary debate and discourse are lacking. This is an exploratory study in which we propose the development of a new approach for a quantitative analysis of such participation. We utilize the New Zealand government's Hansard database to construct a topic model of parliamentary speeches consisting of nearly 40 million words in the period 2003 to 2016. A Latent Dirichlet Allocation topic model is implemented in order to reveal the thematic structure of our set of documents. This enables the detection of major themes or topics that are publicly discussed in the New Zealand parliament, as well as permitting their classification by MP. We observe patterns arising from time-series analysis of topic frequencies which can be related to specific social, economic and legislative events.
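A minimal version of such a pipeline can be put together with off-the-shelf LDA. The sketch below (toy documents, hypothetical parameters, not the Hansard corpus) fits a topic model and reads off per-speech topic mixtures, from which a speaker-topic bipartite network could then be built.

```python
# Sketch of the kind of LDA pipeline described: fit a topic model on speeches
# and read off per-speech topic mixtures. Documents and parameters are toys.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

speeches = [
    "housing prices and urban development in Auckland",
    "hospital funding and public health services",
    "rent affordability and housing supply reform",
    "health workforce and hospital waiting lists",
]
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(speeches)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)            # per-speech topic proportions

terms = vectorizer.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = comp.argsort()[-4:][::-1]
    print(f"topic {k}:", [terms[i] for i in top])
print(doc_topics.round(2))                    # rows: speeches, cols: topics
```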
• [cs.CL]Negative Sampling Improves Hypernymy Extraction Based on Projection Learning
Dmitry Ustalov, Nikolay Arefyev, Chris Biemann, Alexander Panchenko
http://arxiv.org/abs/1707.03903v1
We present a new approach to extraction of hypernyms based on projection learning and word embeddings. In contrast to classification-based approaches, projection-based methods require no candidate hyponym-hypernym pairs. While it is natural to use both positive and negative training examples in supervised relation extraction, the impact of negative examples on hypernym prediction was not studied so far. In this paper, we show that explicit negative examples used for regularization of the model significantly improve performance compared to the state-of-the-art approach of Fu et al. (2014) on three datasets from different languages.
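A hedged sketch of the projection-learning setup with explicit negatives follows: a matrix Phi is trained so that Phi·v(hyponym) lands near v(hypernym) for positive pairs and at least a margin away for sampled negative pairs. The embeddings, pairs, and hyperparameters are toy stand-ins, not the paper's data or exact objective.

```python
# Hedged sketch of projection learning for hypernymy with explicit negative
# examples: pull Phi @ v(hyponym) toward v(hypernym) for positives, push it
# at least `margin` away for negatives. All data below is synthetic.
import torch

d, n_pos, n_neg = 50, 200, 200
torch.manual_seed(0)
hypo = torch.randn(n_pos, d)                 # hyponym embeddings
hyper = torch.randn(n_pos, d)                # their hypernym embeddings
neg_hypo = torch.randn(n_neg, d)             # negative (non-hypernym) pairs
neg_other = torch.randn(n_neg, d)

phi = torch.eye(d, requires_grad=True)
opt = torch.optim.Adam([phi], lr=0.01)
margin = 1.0

for step in range(500):
    opt.zero_grad()
    pos_dist = ((hypo @ phi.T - hyper) ** 2).sum(dim=1)
    neg_dist = ((neg_hypo @ phi.T - neg_other) ** 2).sum(dim=1)
    # pull positives together, push negatives at least `margin` apart
    loss = pos_dist.mean() + torch.clamp(margin - neg_dist, min=0).mean()
    loss.backward()
    opt.step()

print("final loss:", float(loss))
```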
• [cs.CL]Parsing with Traces: An $O(n^4)$ Algorithm and a Structural Representation
Jonathan K. Kummerfeld, Dan Klein
http://arxiv.org/abs/1707.04221v1
General treebank analyses are graph structured, but parsers are typically restricted to tree structures for efficiency and modeling reasons. We propose a new representation and algorithm for a class of graph structures that is flexible enough to cover almost all treebank structures, while still admitting efficient learning and inference. In particular, we consider directed, acyclic, one-endpoint-crossing graph structures, which cover most long-distance dislocation, shared argumentation, and similar tree-violating linguistic phenomena. We describe how to convert phrase structure parses, including traces, to our new representation in a reversible manner. Our dynamic program uniquely decomposes structures, is sound and complete, and covers 97.3% of the Penn English Treebank. We also implement a proof-of-concept parser that recovers a range of null elements and trace types.
• [cs.CL]Predicting Causes of Reformulation in Intelligent Assistants
Shumpei Sano, Nobuhiro Kaji, Manabu Sassano
http://arxiv.org/abs/1707.03968v1
Intelligent assistants (IAs) such as Siri and Cortana conversationally interact with users and execute a wide range of actions (e.g., searching the Web, setting alarms, and chatting). IAs can support these actions through the combination of various components such as automatic speech recognition, natural language understanding, and language generation. However, the complexity of these components hinders developers from determining which component causes an error. To remove this hindrance, we focus on reformulation, which is a useful signal of user dissatisfaction, and propose a method to predict the reformulation causes. We evaluate the method using the user logs of a commercial IA. The experimental results have demonstrated that features designed to detect the error of a specific component improve the performance of reformulation cause detection.
• [cs.CL]Quasar: Datasets for Question Answering by Search and Reading
Bhuwan Dhingra, Kathryn Mazaitis, William W. Cohen
http://arxiv.org/abs/1707.03904v1
We present two new large-scale datasets aimed at evaluating systems designed to comprehend a natural language query and extract its answer from a large corpus of text. The Quasar-S dataset consists of 37000 cloze-style (fill-in-the-gap) queries constructed from definitions of software entity tags on the popular website Stack Overflow. The posts and comments on the website serve as the background corpus for answering the cloze questions. The Quasar-T dataset consists of 43000 open-domain trivia questions and their answers obtained from various internet sources. ClueWeb09 serves as the background corpus for extracting these answers. We pose these datasets as a challenge for two related subtasks of factoid Question Answering: (1) searching for relevant pieces of text that include the correct answer to a query, and (2) reading the retrieved text to answer the query. We also describe a retrieval system for extracting relevant sentences and documents from the corpus given a query, and include these in the release for researchers wishing to only focus on (2). We evaluate several baselines on both datasets, ranging from simple heuristics to powerful neural models, and show that these lag behind human performance by 16.4% and 32.1% for Quasar-S and -T respectively. The datasets are available at https://github.com/bdhingra/quasar .
• [cs.CL]Representation Learning for Grounded Spatial Reasoning
Michael Janner, Karthik Narasimhan, Regina Barzilay
http://arxiv.org/abs/1707.03938v1
The interpretation of spatial references is highly contextual, requiring joint inference over both language and the environment. We consider the task of spatial reasoning in a simulated environment, where an agent can act and receive rewards. The proposed model learns a representation of the world steered by instruction text. This design allows for precise alignment of local neighborhoods with corresponding verbalizations, while also handling global references in the instructions. We train our model with reinforcement learning using a variant of generalized value iteration. The model outperforms state-of-the-art approaches on several metrics, yielding a 45% reduction in goal localization error.
• [cs.CR]Process Monitoring on Sequences of System Call Count Vectors
Michael Dymshits, Ben Myara, David Tolpin
http://arxiv.org/abs/1707.03821v1
We introduce a methodology for efficient monitoring of processes running on hosts in a corporate network. The methodology is based on collecting streams of system calls produced by all or selected processes on the hosts, and sending them over the network to a monitoring server, where machine learning algorithms are used to identify changes in process behavior due to malicious activity, hardware failures, or software errors. The methodology uses a sequence of system call count vectors as the data format, which can handle large and varying volumes of data. Unlike previous approaches, the methodology introduced in this paper is suitable for distributed collection and processing of data in large corporate networks. We evaluate the methodology both in a laboratory setting and on a real-life setup, and provide statistics characterizing its performance and accuracy.
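The count-vector representation is straightforward to reproduce. The sketch below (with a hypothetical syscall list and window length) turns a per-process stream of system calls into a sequence of fixed-window count vectors that a downstream model could consume.

```python
# Minimal sketch of the data representation described: turn a per-process
# stream of system calls into a sequence of fixed-window count vectors.
from collections import Counter

SYSCALLS = ["open", "read", "write", "close", "socket", "connect"]

def count_vectors(events, window_sec=5.0):
    """events: list of (timestamp, syscall_name), assumed time-sorted."""
    if not events:
        return []
    t0 = events[0][0]
    windows = {}
    for ts, name in events:
        w = int((ts - t0) // window_sec)
        windows.setdefault(w, Counter())[name] += 1
    n_windows = max(windows) + 1
    return [[windows.get(w, Counter())[s] for s in SYSCALLS]
            for w in range(n_windows)]

# Hypothetical trace of one process.
trace = [(0.1, "open"), (0.2, "read"), (1.5, "read"),
         (5.3, "write"), (6.0, "close"), (11.2, "socket")]
for vec in count_vectors(trace):
    print(vec)
```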
• [cs.CV]Automatic Recognition of Deceptive Facial Expressions of Emotion
Ikechukwu Ofodile, Kaustubh Kulkarni, Ciprian Adrian Corneanu, Sergio Escalera, Xavier Baro, Sylwia Hyniewska, Juri Allik, Gholamreza Anbarjafari
http://arxiv.org/abs/1707.04061v1
Humans modify facial expressions in order to mislead observers regarding their true emotional states. Being able to recognize the authenticity of emotional displays is notoriously difficult for human observers. Evidence in experimental psychology shows that discriminative facial responses are short and subtle. This suggests that such behavior would be easier to distinguish when captured in high resolution at an increased frame rate. We are proposing SASE-FE, the first dataset of genuine and deceptive facial expressions of emotions for automatic recognition. We show that overall the problem of recognizing deceptive facial expressions can be successfully addressed by learning spatio-temporal representations of the data. For this purpose, we propose a method that aggregates features along fiducial trajectories in a deeply learnt feature space. Interesting additional results show that on average it is easier to distinguish among genuine expressions than deceptive ones and that certain emotion pairs are more difficult to distinguish than others.
• [cs.CV]Deep Learning with Topological Signatures
Christoph Hofer, Roland Kwitt, Marc Niethammer, Andreas Uhl
http://arxiv.org/abs/1707.04041v1
Inferring topological and geometrical information from data can offer an alternative perspective on machine learning problems. Methods from topological data analysis, e.g., persistent homology, enable us to obtain such information, typically in the form of summary representations of topological features. However, such topological signatures often come with an unusual structure (e.g., multisets of intervals) that is highly impractical for most machine learning techniques. While many strategies have been proposed to map these topological signatures into machine learning compatible representations, they suffer from being agnostic to the target learning task. In contrast, we propose a technique that enables us to input topological signatures to deep neural networks and learn a task-optimal representation during training. Our approach is realized as a novel input layer with favorable theoretical properties. Classification experiments on 2D object shapes and social network graphs demonstrate the versatility of the approach and, in case of the latter, we even outperform the state-of-the-art by a large margin.
• [cs.CV]Discrete Multi-modal Hashing with Canonical Views for Robust Mobile Landmark Search
Lei Zhu, Zi Huang, Xiaobai Liu, Xiangnan He, Jingkuan Song, Xiaofang Zhou
http://arxiv.org/abs/1707.04047v1
Mobile landmark search (MLS) recently receives increasing attention for its great practical values. However, it still remains unsolved due to two important challenges. One is high bandwidth consumption of query transmission, and the other is the huge visual variations of query images sent from mobile devices. In this paper, we propose a novel hashing scheme, named as canonical view based discrete multi-modal hashing (CV-DMH), to handle these problems via a novel three-stage learning procedure. First, a submodular function is designed to measure visual representativeness and redundancy of a view set. With it, canonical views, which capture key visual appearances of landmark with limited redundancy, are efficiently discovered with an iterative mining strategy. Second, multi-modal sparse coding is applied to transform visual features from multiple modalities into an intermediate representation. It can robustly and adaptively characterize visual contents of varied landmark images with certain canonical views. Finally, compact binary codes are learned on intermediate representation within a tailored discrete binary embedding model which preserves visual relations of images measured with canonical views and removes the involved noises. In this part, we develop a new augmented Lagrangian multiplier (ALM) based optimization method to directly solve the discrete binary codes. We can not only explicitly deal with the discrete constraint, but also consider the bit-uncorrelated constraint and balance constraint together. Experiments on real world landmark datasets demonstrate the superior performance of CV-DMH over several state-of-the-art methods.
• [cs.CV]Disentangling Motion, Foreground and Background Features in Videos
Xunyu Lin, Victor Campos, Xavier Giro-i-Nieto, Jordi Torres, Cristian Canton Ferrer
http://arxiv.org/abs/1707.04092v1
This paper introduces an unsupervised framework to extract semantically rich features for video representation. Inspired by how the human visual system groups objects based on motion cues, we propose a deep convolutional neural network that disentangles motion, foreground and background information. The proposed architecture consists of a 3D convolutional feature encoder for blocks of 16 frames, which is trained for reconstruction tasks over the first and last frames of the sequence. A preliminary supervised experiment was conducted to verify the feasibility of the proposed method by training the model with a fraction of videos from the UCF-101 dataset, taking as ground truth the bounding boxes around the activity regions. Qualitative results indicate that the network can successfully segment foreground and background in videos as well as update the foreground appearance based on disentangled motion features. The benefits of these learned features are shown in a discriminative classification task, where initializing the network with the proposed pretraining method outperforms both random initialization and autoencoder pretraining. Our model and source code are publicly available at https://allenovo.github.io/cvprw17_webpage/ .
• [cs.CV]Large-scale Video Classification guided by Batch Normalized LSTM Translator
Jae Hyeon Yoo
http://arxiv.org/abs/1707.04045v1
The YouTube-8M dataset advances the development of large-scale video recognition technology, just as the ImageNet dataset has encouraged image classification, recognition and detection in artificial intelligence. For this large video dataset, classifying a huge number of multi-labels is a challenging task. By a change of perspective, we propose a novel method that regards labels as words. In detail, we describe online learning approaches to multi-label video classification that are guided by deep recurrent neural networks acting as video-to-sentence translators. We designed the translator based on LSTMs and found that a stochastic gating before the input of each LSTM cell helps us design the structural details. In addition, we adopted batch normalization in our models to improve the LSTM models. Since our models are feature extractors, they can be used with other classifiers. Finally, we report improved validation results of our models on the large-scale YouTube-8M dataset and discuss directions for further improvement.
• [cs.CV]Learning Photography Aesthetics with Deep CNNs
Gautam Malu, Raju S. Bapi, Bipin Indurkhya
http://arxiv.org/abs/1707.03981v1
Automatic photo aesthetic assessment is a challenging artificial intelligence task. Existing computational approaches have focused on modeling a single aesthetic score or a class (good or bad); however, these do not provide any details on why the photograph is good or bad, or which attributes contribute to the quality of the photograph. To obtain both accuracy and human interpretability of the score, we advocate learning the aesthetic attributes along with the prediction of the overall score. For this purpose, we propose a novel multitask deep convolutional neural network, which jointly learns eight aesthetic attributes along with the overall aesthetic score. We report near-human performance in the prediction of the overall aesthetic score. To understand the internal representation of these attributes in the learned model, we also develop a visualization technique using backpropagation of gradients. These visualizations highlight the important image regions for the corresponding attributes, thus providing insights into the model's representation of these attributes. We showcase the diversity and complexity associated with different attributes through a qualitative analysis of the activation maps.
• [cs.CV]Leveraging the Path Signature for Skeleton-based Human Action Recognition
Weixin Yang, Terry Lyons, Hao Ni, Cordelia Schmid, Lianwen Jin, Jiawei Chang
http://arxiv.org/abs/1707.03993v1
Human action recognition in videos is one of the most challenging tasks in computer vision. One important issue is how to design discriminative features for representing spatial context and temporal dynamics. Here, we introduce a path signature feature to encode information from intra-frame and inter-frame contexts. A key step towards leveraging this feature is to construct the proper trajectories (paths) for the data stream. In each frame, the correlated constraints of human joints are treated as small paths, then the spatial path signature features are extracted from them. In video data, the evolution of these spatial features over time can also be regarded as paths from which the temporal path signature features are extracted. Eventually, all these features are concatenated to constitute the input vector of a fully connected neural network for action classification. Experimental results on four standard benchmark action datasets, J-HMDB, SBU Dataset, Berkeley MHAD, and NTURGB+D demonstrate that the proposed approach achieves state-of-the-art accuracy even in comparison with recent deep learning based models.
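The path signature mentioned here can be computed for a piecewise-linear trajectory with Chen's identity; the sketch below truncates at level two and uses a hypothetical 2D joint trajectory. Real pipelines typically use a dedicated signature library and higher truncation levels.

```python
# Hedged sketch of a truncated (level-2) path signature for a piecewise-linear
# path, accumulated segment by segment via Chen's identity.
import numpy as np

def signature_level2(path):
    """path: (T, d) array of points; returns (level-1 vector, level-2 matrix)."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    sig1 = np.zeros(d)
    sig2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # level-2 signature of a linear segment is 0.5 * outer(delta, delta);
        # the cross term outer(sig1, delta) comes from Chen's identity.
        sig2 += np.outer(sig1, delta) + 0.5 * np.outer(delta, delta)
        sig1 += delta
    return sig1, sig2

# Toy 2D trajectory of one joint over four frames (hypothetical coordinates).
joint_path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.5)]
s1, s2 = signature_level2(joint_path)
feature = np.concatenate([s1, s2.ravel()])   # concatenated signature feature
print(feature)
```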
• [cs.CV]Merge or Not? Learning to Group Faces via Imitation Learning
Yue He, Kaidi Cao, Cheng Li, Chen Change Loy
http://arxiv.org/abs/1707.03986v1
Given a large number of unlabeled face images, face grouping aims at clustering the images into individual identities present in the data. This task remains a challenging problem despite the remarkable capability of deep learning approaches in learning face representation. In particular, grouping results can still be egregious given profile faces and a large number of uninteresting faces and noisy detections. Often, a user needs to correct the erroneous grouping manually. In this study, we formulate a novel face grouping framework that learns clustering strategy from ground-truth simulated behavior. This is achieved through imitation learning (a.k.a apprenticeship learning or learning by watching) via inverse reinforcement learning (IRL). In contrast to existing clustering approaches that group instances by similarity, our framework makes sequential decision to dynamically decide when to merge two face instances/groups driven by short- and long-term rewards. Extensive experiments on three benchmark datasets show that our framework outperforms unsupervised and supervised baselines.
• [cs.CV]Query-Aware Sparse Coding for Multi-Video Summarization
Zhong Ji, Yaru Ma, Yanwei Pang, Xuelong Li
http://arxiv.org/abs/1707.04021v1
Given the explosive growth of online videos, it is becoming increasingly important to relieve the tedious work of browsing and managing the video content of interest. Video summarization aims at providing such a technique by transforming one or multiple videos into a compact one. However, conventional multi-video summarization methods often fail to produce satisfying results as they ignore the user's search intent. To this end, this paper proposes a novel query-aware approach by formulating the multi-video summarization in a sparse coding framework, where the web images searched by the query are taken as the important preference information to reveal the query intent. To provide a user-friendly summarization, this paper also develops an event-keyframe presentation structure to present keyframes in groups of specific events related to the query by using an unsupervised multi-graph fusion method. We release a new public dataset named MVS1K, which contains about 1,000 videos from 10 queries and their video tags, manual annotations, and associated web images. Extensive experiments on the MVS1K dataset validate that our approaches produce superior objective and subjective results against several recently proposed approaches.
• [cs.CV]SaltiNet: Scan-path Prediction on 360 Degree Images using Saliency Volumes
Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto, Noel E. O'Connor
http://arxiv.org/abs/1707.03123v3
We introduce SaltiNet, a deep neural network for scanpath prediction trained on 360-degree images. The first part of the network consists of a model trained to generate saliency volumes, whose parameters are learned by back-propagation computed from a binary cross entropy (BCE) loss over downsampled versions of the saliency volumes. Sampling strategies over these volumes are used to generate scanpaths over the 360-degree images. Our experiments show the advantages of using saliency volumes, and how they can be used for related tasks. Our source code and trained models are available at https://github.com/massens/saliency-360salient-2017 .
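One simple way to realize the sampling step described above is sketched below (an illustrative assumption, not the SaltiNet code): for each temporal slice of a saliency volume, a fixation is drawn with probability proportional to the slice's saliency values.

```python
# Illustrative sampling strategy over a saliency volume: one fixation per
# temporal slice, drawn proportionally to saliency. The random volume is a
# stand-in for a real predicted volume.
import numpy as np

def sample_scanpath(saliency_volume, rng=None):
    """saliency_volume: (T, H, W) non-negative array -> list of (row, col)."""
    rng = rng or np.random.default_rng()
    T, H, W = saliency_volume.shape
    scanpath = []
    for t in range(T):
        probs = saliency_volume[t].ravel()
        probs = probs / probs.sum()
        idx = rng.choice(H * W, p=probs)
        scanpath.append((idx // W, idx % W))
    return scanpath

volume = np.random.default_rng(0).random((8, 32, 64))   # hypothetical volume
print(sample_scanpath(volume))
```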
• [cs.CV]The Surfacing of Multiview 3D Drawings via Lofting and Occlusion Reasoning
Anil Usumezbas, Ricardo Fabbri, Benjamin Kimia
http://arxiv.org/abs/1707.03946v1
The three-dimensional reconstruction of scenes from multiple views has made impressive strides in recent years, chiefly by methods correlating isolated feature points, intensities, or curvilinear structure. In the general setting, i.e., without requiring controlled acquisition, limited number of objects, abundant patterns on objects, or object curves to follow particular models, the majority of these methods produce unorganized point clouds, meshes, or voxel representations of the reconstructed scene, with some exceptions producing 3D drawings as networks of curves. Many applications, e.g., robotics, urban planning, industrial design, and hard surface modeling, however, require structured representations which make explicit 3D curves, surfaces, and their spatial relationships. Reconstructing surface representations can now be constrained by the 3D drawing acting like a scaffold to hang on the computed representations, leading to increased robustness and quality of reconstruction. This paper presents one way of completing such 3D drawings with surface reconstructions, by exploring occlusion reasoning through lofting algorithms.
• [cs.CV]Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks
Hui Li, Peng Wang, Chunhua Shen
http://arxiv.org/abs/1707.03985v1
In this work, we jointly address the problem of text detection and recognition in natural scene images based on convolutional recurrent neural networks. We propose a unified network that simultaneously localizes and recognizes text with a single forward pass, avoiding intermediate processes like image cropping and feature re-calculation, word separation, or character grouping. In contrast to existing approaches that consider text detection and recognition as two distinct tasks and tackle them one by one, the proposed framework settles these two tasks concurrently. The whole framework can be trained end-to-end, requiring only images, the ground-truth bounding boxes and text labels. Through end-to-end training, the learned features can be more informative, which improves the overall performance. The convolutional features are calculated only once and shared by both detection and recognition, which saves processing time. Our proposed method has achieved competitive performance on several benchmark datasets.
• [cs.CV]UTS submission to Google YouTube-8M Challenge 2017
Linchao Zhu, Yanbin Liu, Yi Yang
http://arxiv.org/abs/1707.04143v1
In this paper, we present our solution to the Google YouTube-8M Video Classification Challenge 2017. We leveraged both video-level and frame-level features in the submission. For video-level classification, we simply used a 200-mixture Mixture of Experts (MoE) layer, which achieves GAP 0.802 on the validation set with a single model. For frame-level classification, we utilized several variants of recurrent neural networks, sequence aggregation with attention mechanism and 1D convolutional models. We achieved GAP 0.8408 on the private testing set with the ensemble model. The source code of our models can be found at https://github.com/ffmpbgrnn/yt8m .
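For intuition, the following is a simplified, hedged sketch of a Mixture-of-Experts video-level classifier: per-class softmax gates mix per-class logistic expert predictions over a pooled video feature. The sizes are illustrative, and the submission's actual 200-mixture layer and training setup are not reproduced here.

```python
# Simplified sketch of an MoE layer for video-level multi-label classification.
# Feature and class dimensions are illustrative, not the challenge's exact setup.
import torch
import torch.nn as nn

class MoEClassifier(nn.Module):
    def __init__(self, feature_dim=1152, num_classes=4716, num_experts=4):
        super().__init__()
        self.num_classes, self.num_experts = num_classes, num_experts
        self.gates = nn.Linear(feature_dim, num_classes * num_experts)
        self.experts = nn.Linear(feature_dim, num_classes * num_experts)

    def forward(self, x):                                  # (batch, feature_dim)
        b = x.shape[0]
        gate = self.gates(x).view(b, self.num_classes, self.num_experts)
        gate = torch.softmax(gate, dim=2)                  # per-class gating
        expert = torch.sigmoid(self.experts(x)).view(b, self.num_classes,
                                                     self.num_experts)
        return (gate * expert).sum(dim=2)                  # (batch, num_classes)

probs = MoEClassifier()(torch.randn(2, 1152))
print(probs.shape)     # torch.Size([2, 4716])
```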
• [cs.CV]Unsupervised body part regression using convolutional neural network with self-organization
Ke Yan, Le Lu, Ronald M. Summers
http://arxiv.org/abs/1707.03891v1
Automatic body part recognition for CT slices can benefit various medical image applications. Existing methods require large-scale datasets of labeled images; however, the intrinsic structure information in CT volumes has not been fully leveraged. In this paper, we propose an unsupervised body part regressor (UBR) to utilize this information. A novel learning framework based on a convolutional neural network and two inter-sample loss functions is designed. UBR builds a coordinate system for the body and outputs a continuous score for each slice, which represents the normalized position of the body part in the slice. The training process of UBR resembles a self-organization process: slice scores are learned from inter-slice relationships. The training samples are unlabeled volumes that are abundant in every hospital's database, thus zero annotation effort is needed. UBR is simple, fast, and accurate. Qualitative and quantitative experiments prove its effectiveness. Besides, we show potential applications of UBR in network initialization and anomaly detection.
• [cs.CY]Arianna: towards a new paradigm for assistive technology at home
Luca Buoncompagni, Barbara Bruno, Antonella Giuni, Fulvio Mastrogiovanni, Renato Zaccaria
http://arxiv.org/abs/1707.03988v1
Enabling the elderly and people with special needs to retain their independence as long as possible is one of the biggest challenges of the society of tomorrow. Teseo, a startup spun off from the University of Genoa, aims at accelerating the transition towards a sustainable healthcare system. Teseo's first concept and product, Arianna, allows for the automated recognition of activities of daily living at home and acts as a personalized wellbeing and healthcare assistant. This abstract outlines the main concepts underlying its features and capabilities.
• [cs.CY]Spatial Data Infrastructures
Yingjie Hu, Wenwen Li
http://arxiv.org/abs/1707.03969v1
Spatial data infrastructure (SDI) is the infrastructure that facilitates the discovery, access, management, distribution, reuse, and preservation of digital geospatial resources. These resources may include maps, data, geospatial services, and tools. As cyberinfrastructures, SDIs are similar to other infrastructures, such as water supplies and transportation networks, since they play fundamental roles in many aspects of the society. These roles have become even more significant in today's big data age, when a large volume of geospatial data and Web services are available. From a technological perspective, SDIs mainly consist of data, hardware, and software. However, a truly functional SDI also needs the efforts of people, supports from organizations, government policies, data and software standards, and many others. In this chapter, we will present the concepts and values of SDIs, as well as a brief history of SDI development in the U.S. We will also discuss the components of a typical SDI, and will specifically focus on three key components: geoportals, metadata, and search functions. Examples of the existing SDI implementations will also be discussed.
• [cs.DC]Buffer Size for Routing Limited-Rate Adversarial Traffic
Avery Miller, Boaz Patt-Shamir
http://arxiv.org/abs/1707.03856v1
We consider a slight variation of the adversarial queuing theory model, in which an adversary injects packets with routes into the network subject to the following constraint: For any link $e$, the total number of packets injected in any time window $[t,t')$ and whose route contains $e$, is at most $\rho(t'-t)+\sigma$, where $\rho$ and $\sigma$ are non-negative parameters. Informally, $\rho$ bounds the long-term rate of injections and $\sigma$ bounds the "burstiness" of injection: $\sigma=0$ means that the injection is as smooth as it can be. It is known that greedy scheduling of the packets (under which a link is not idle if there is any packet ready to be sent over it) may result in $\Omega(n)$ buffer size even on an $n$-node line network with very smooth injections ($\sigma=0$). In this paper we propose a simple non-greedy scheduling policy and show that, in a tree where all packets are destined at the root, no buffer needs to be larger than $\sigma+2\rho$ to ensure that no overflows occur, which is optimal in our model. The rule of our algorithm is to forward a packet only if its next buffer is completely empty. The policy is centralized: in a single step, a long "train" of packets may progress together. We show that in some sense central coordination is required, by presenting an injection pattern with $\sigma=0$ for the $n$-node line that results in $\Omega(n)$ packets in a buffer if local control is used, even for the more sophisticated "downhill" algorithm, which forwards a packet only if its next buffer is less occupied than its current one.
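The forwarding rule is simple enough to simulate directly. The toy sketch below (written for illustration, not taken from the paper) runs the rule on an n-node line with all packets destined to node 0: a packet advances only when the next buffer is completely empty, and sweeping the nodes from the root outward lets a whole "train" of packets move in one synchronous step.

```python
def step(buffers, injections):
    """One synchronous step of the non-greedy rule on a line toward node 0.

    buffers    : list of ints, buffers[i] = packets waiting at node i (node 0 = root/sink)
    injections : list of ints, packets injected at each node during this step
    A packet moves from node i to node i-1 only if buffer i-1 is empty;
    sweeping i = 1, 2, ... lets a contiguous train advance together.
    """
    buffers[0] = 0  # the root absorbs everything delivered to it
    for i in range(1, len(buffers)):
        if buffers[i - 1] == 0 and buffers[i] > 0:
            buffers[i - 1] += 1
            buffers[i] -= 1
    for i, x in enumerate(injections):
        buffers[i] += x
    return buffers

# Toy run: smooth injection (sigma = 0) of one packet per step at the far end.
buf = [0] * 6
for t in range(10):
    buf = step(buf, [0, 0, 0, 0, 0, 1])
print(buf, "max buffer:", max(buf[1:]))
```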
• [cs.HC]Large-scale Multiview 3D Hand Pose Dataset
Francisco Gomez-Donoso, Sergio Orts-Escolano, Miguel Cazorla
http://arxiv.org/abs/1707.03742v2
Accurate hand pose estimation at the joint level has several uses in human-robot interaction, user interfacing and virtual reality applications, yet it is currently not a solved problem. Novel deep learning techniques could greatly improve on this matter, but they need a huge amount of annotated data. The hand pose datasets released so far present issues that make them unsuitable for deep learning methods, such as a small number of samples, highly abstract annotations, or samples consisting only of depth maps. In this work, we introduce a multiview hand pose dataset in which we provide color images of hands and different kinds of annotations for each, i.e., the bounding box and the 2D and 3D locations of the joints in the hand. In addition, we introduce a simple yet accurate deep learning architecture for real-time, robust 2D hand pose estimation.
• [cs.IR]Neural Networks for Information Retrieval
Tom Kenter, Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke, Bhaskar Mitra
http://arxiv.org/abs/1707.04242v1
Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many different approaches for many different IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. Additionally, it is interesting to see what key insights into IR problems the new technologies are able to give us. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR research. It covers key architectures, as well as the most promising future directions.
• [cs.IT]Cooperative HARQ Assisted NOMA Scheme in Large-scale D2D Networks
Zheng Shi, Shaodan Ma, Hesham ElSawy, Guanghua Yang, Mohamed-Slim Alouini
http://arxiv.org/abs/1707.03945v1
This paper develops an interference-aware design for a cooperative hybrid automatic repeat request (HARQ) assisted non-orthogonal multiple access (NOMA) scheme for large-scale device-to-device (D2D) networks. Specifically, interference-aware rate selection and power allocation are considered to maximize the long-term average throughput (LTAT) and area spectral efficiency (ASE). The design framework is based on stochastic geometry and jointly accounts for the spatial interference correlation at the NOMA receivers as well as the temporal interference correlation across HARQ transmissions. It is found that ignoring the effect of the aggregate interference, or overlooking the spatial and temporal correlation in interference, highly overestimates the NOMA performance and produces misleading design insights. An interference-oblivious selection of the power and/or transmission rates leads to violating the network outage constraints. To this end, the results demonstrate the effectiveness of NOMA transmission and manifest the importance of cooperative HARQ for combating the negative effect of the network aggregate interference. For instance, compared to non-cooperative HARQ-assisted NOMA, the proposed scheme can reduce the outage probability by $32$%. Furthermore, an interference-aware optimal design that maximizes the LTAT under outage constraints leads to a $47$% throughput improvement over the HARQ-assisted orthogonal multiple access (OMA) scheme.
• [cs.IT]Correction to "The Generalized Stochastic Likelihood Decoder: Random Coding and Expurgated Bounds"
Neri Merhav
http://arxiv.org/abs/1707.03987v1
The purpose of this letter is to address a gap that was found in the proof of Theorem 2 of the paper "The generalized stochastic likelihood decoder: random coding and expurgated bounds."
• [cs.IT]Gradient Coding from Cyclic MDS Codes and Expander Graphs
Netanel Raviv, Itzhak Tamo, Rashish Tandon, Alexandros G. Dimakis
http://arxiv.org/abs/1707.03858v1
Gradient descent and its variants are popular methods for solving empirical risk minimization problems in machine learning. However, if the size of the training set is large, a computational bottleneck is the computation of the gradient, and hence, it is common to distribute the training set among worker nodes. Doing this in a synchronous fashion faces yet another challenge of stragglers (i.e., slow or unavailable nodes) which might cause a considerable delay, and hence, schemes for mitigation of stragglers are essential. It was recently shown by Tandon et al. that stragglers can be avoided by carefully assigning redundant computations to the worker nodes and coding across partial gradients, and a randomized construction for the coding was given. In this paper we obtain a comparable deterministic scheme by employing cyclic MDS codes. In addition, we propose replacing the exact computation of the gradient with an approximate one; a technique which drastically increases the straggler tolerance, and stems from adjacency matrices of expander graphs.
• [cs.IT]MAC Resolvability: First And Second Order Results
Matthias Frey, Igor Bjelakovic, Slawomir Stanczak
http://arxiv.org/abs/1707.04156v1
Building upon previous work on the relation between secrecy and channel resolvability, we revisit a secrecy proof for the multiple-access channel from the perspective of resolvability. We then refine the approach in order to obtain some novel results on the second-order achievable rates.
• [cs.IT]Multi-Antenna Assisted Full-Duplex Relaying with Reliability-Aware Iterative Decoding
Jiancao Hou, Sandeep Narayanan, Yi Ma, Mohammad Shikh-Bahaei
http://arxiv.org/abs/1707.04202v1
In this paper, a multi-antenna assisted full-duplex (FD) relaying scheme with reliability-aware iterative decoding at the destination node is proposed to improve system spectral efficiency and reliability. This scheme allows the multi-antenna FD relay node to cancel the self-interference via QR decomposition and to forward its decoded signal to the destination node regardless of decoding errors. Then, by deploying our proposed reliability-aware iterative detection/decoding process, the multi-antenna destination node mitigates inter-frame interference and the error propagation effect. Simulation results show that, without extra time delay or signalling overhead, our proposed scheme outperforms conventional selective decode-and-forward (S-DF) relaying schemes, such as cyclic redundancy check based S-DF relaying and threshold based S-DF relaying, by up to 8 dB in terms of bit-error-rate performance.
• [cs.IT]Robust Geometry-Based User Scheduling for Large MIMO Systems Under Realistic Channel Conditions
Manijeh Bashar, Alister G. Burr, Katsuyuki Haneda, Kanapathippillai Cumanan
http://arxiv.org/abs/1707.04088v1
The problem of user scheduling with reduced channel estimation overhead in the uplink of massive multiple-input multiple-output (MIMO) systems is considered. A geometry-based stochastic channel model (GSCM), the COST 2100 channel model, is used for a realistic analysis of the channels. In this paper, we propose a new user selection algorithm based on knowledge of the geometry of the service area and the location of clusters, without requiring full channel state information (CSI) at the base station (BS). The multi-user link correlation in GSCMs arises from the common clusters in the area, and the throughput depends on the positions of clusters and users in the system. Simulation results show that although the BS does not require the channel information of all users, the sum-rate achieved by the proposed geometry-based user scheduling algorithm is only slightly less than that of the well-known greedy weight clique scheme. Finally, the robustness of the proposed algorithm to inaccuracy in cluster localization is verified by the simulation results.
• [cs.IT]Synchronization Strings: Channel Simulations and Interactive Coding for Insertions and Deletions
Bernhard Haeupler, Amirbehshad Shahrasbi, Ellen Vitercik
http://arxiv.org/abs/1707.04233v1
We present many new results related to reliable (interactive) communication over insertion-deletion channels. Synchronization errors, such as insertions and deletions, strictly generalize the usual symbol corruption errors and are much harder to protect against. We show how to hide the complications of synchronization errors in many applications by introducing very general channel simulations which efficiently transform an insertion-deletion channel into a regular symbol corruption channel with an error rate larger by a constant factor and a slightly smaller alphabet. We generalize synchronization string based methods which were recently introduced as a tool to design essentially optimal error correcting codes for insertion-deletion channels. Our channel simulations depend on the fact that, at the cost of increasing the error rate by a constant factor, synchronization strings can be decoded in a streaming manner that preserves linearity of time. We also provide a lower bound showing that this constant factor cannot be improved to $1+\epsilon$, in contrast to what is achievable for error correcting codes. Our channel simulations drastically generalize the applicability of synchronization strings. We provide new interactive coding schemes which simulate any interactive two-party protocol over an insertion-deletion channel. Our results improve over the interactive coding scheme of Braverman et al. which achieves a small constant rate and requires exponential time computations. We provide the first computationally efficient interactive coding schemes for synchronization errors, the first coding scheme with a rate approaching one for small noise rates, and also improve over the maximal tolerable error rate. We also show tight connections between synchronization strings and edit-distance tree codes which allow us to transfer results from tree codes directly to edit-distance tree codes.
• [cs.IT]Universal Sparse Superposition Codes with Spatial Coupling and GAMP Decoding
Jean Barbier, Mohamad Dia, Nicolas Macris
http://arxiv.org/abs/1707.04203v1
Sparse superposition codes, or sparse regression codes, constitute a new class of codes which was first introduced for communication over the additive white Gaussian noise (AWGN) channel. It has been shown that such codes are capacity-achieving over the AWGN channel under optimal maximum-likelihood decoding as well as under various efficient iterative decoding schemes equipped with power allocation or spatially coupled constructions. Here, we generalize the analysis of these codes to a much broader setting that includes all memoryless channels. We show, for a large class of memoryless channels, that spatial coupling allows an efficient decoder, based on the Generalized Approximate Message-Passing (GAMP) algorithm, to reach the potential (or Bayes optimal) threshold of the underlying (or uncoupled) code ensemble. Moreover, we argue that spatially coupled sparse superposition codes universally achieve capacity under GAMP decoding by showing that the error floor vanishes and the potential threshold tends to capacity as one of the code parameters goes to infinity. Furthermore, we provide a closed-form formula for the algorithmic threshold of the underlying code ensemble in terms of a Fisher information. Relating an algorithmic threshold to a Fisher information has theoretical as well as practical importance. Our proof relies on the state evolution analysis and uses the potential method developed in the theory of low-density parity-check codes and compressed sensing.
• [cs.LG]Be Careful What You Backpropagate: A Case For Linear Output Activations & Gradient Boosting
Anders Oland, Aayush Bansal, Roger B. Dannenberg, Bhiksha Raj
http://arxiv.org/abs/1707.04199v1
In this work, we show that saturating output activation functions, such as the softmax, impede learning on a number of standard classification tasks. Moreover, we present results showing that the utility of softmax does not stem from the normalization, as some have speculated. In fact, the normalization makes things worse. Rather, the advantage is in the exponentiation of error gradients. This exponential gradient boosting is shown to speed up convergence and improve generalization. To this end, we demonstrate faster convergence and better performance on diverse classification tasks: image classification using CIFAR-10 and ImageNet, and semantic segmentation using PASCAL VOC 2012. In the latter case, using the state-of-the-art neural network architecture, the model converged 33% faster with our method (roughly two days of training less) than with the standard softmax activation, and with a slightly better performance to boot.
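A tiny numpy sketch of the saturation effect that motivates this line of work: for a confidently wrong prediction, a saturating output (here a sigmoid with squared error) produces a vanishing output-layer gradient, whereas softmax with cross-entropy and a plain linear output do not. This illustrates the phenomenon only; it is not the paper's proposed training recipe.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# A confidently wrong prediction for a 3-class problem (true class = 0).
z = np.array([-4.0, 6.0, 1.0])   # pre-activation outputs (logits)
y = np.array([1.0, 0.0, 0.0])    # one-hot target

p_sig = sigmoid(z)
p_soft = softmax(z)

# Output-layer error signals dL/dz for three common output/loss pairings.
grad_sigmoid_mse = (p_sig - y) * p_sig * (1 - p_sig)  # saturating output + squared error
grad_softmax_ce = p_soft - y                          # softmax + cross-entropy
grad_linear_mse = z - y                               # linear output + squared error

for name, g in [("sigmoid + MSE", grad_sigmoid_mse),
                ("softmax + CE ", grad_softmax_ce),
                ("linear  + MSE", grad_linear_mse)]:
    print(f"{name}: |grad| = {np.linalg.norm(g):.4f}")
```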
• [cs.LG]Distral: Robust Multitask Reinforcement Learning
Yee Whye Teh, Victor Bapst, Wojciech Marian Czarnecki, John Quan, James Kirkpatrick, Raia Hadsell, Nicolas Heess, Razvan Pascanu
http://arxiv.org/abs/1707.04175v1
Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from different tasks can interfere negatively, making learning unstable and sometimes even less data efficient. Another issue is the different reward schemes between tasks, which can easily lead to one task dominating the learning of a shared model. We propose a new approach for joint training of multiple tasks, which we refer to as Distral (Distill & transfer learning). Instead of sharing parameters between the different workers, we propose to share a "distilled" policy that captures common behaviour across tasks. Each worker is trained to solve its own task while constrained to stay close to the shared policy, while the shared policy is trained by distillation to be the centroid of all task policies. Both aspects of the learning process are derived by optimizing a joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. Moreover, the proposed learning process is more robust and more stable, attributes that are critical in deep reinforcement learning.
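A minimal sketch of the kind of coupling Distral relies on: each task policy is penalized for drifting (in KL) from a shared distilled policy, and the distilled policy is refit as the KL-centroid of the task policies. The tabular, discrete-action setting and the coefficient below are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    p, q = np.clip(p, eps, 1), np.clip(q, eps, 1)
    return float(np.sum(p * np.log(p / q)))

def distral_penalty(task_policies, distilled, beta=1.0):
    """Sum of KL(task policy || distilled policy) over tasks and states.

    task_policies : (tasks, states, actions) action probabilities per task
    distilled     : (states, actions) shared distilled policy
    """
    total = 0.0
    for pi in task_policies:
        for s in range(pi.shape[0]):
            total += kl(pi[s], distilled[s])
    return beta * total

def distill(task_policies):
    """Refit the shared policy as the per-state average of the task policies,
    which minimizes the sum of KL(task policy || shared policy)."""
    avg = task_policies.mean(axis=0)
    return avg / avg.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
pis = rng.dirichlet(np.ones(4), size=(3, 5))   # 3 tasks, 5 states, 4 actions
pi0 = distill(pis)
print("KL penalty:", distral_penalty(pis, pi0))
```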
• [cs.LG]Estimating the unseen from multiple populations
Aditi Raghunathan, Greg Valiant, James Zou
http://arxiv.org/abs/1707.03854v1
Given samples from a distribution, how many new elements should we expect to find if we continue sampling this distribution? This is an important and actively studied problem, with many applications ranging from unseen species estimation to genomics. We generalize this extrapolation and related unseen estimation problems to the multiple population setting, where population $j$ has an unknown distribution $D_j$ from which we observe $n_j$ samples. We derive an optimal estimator for the total number of elements we expect to find among new samples across the populations. Surprisingly, we prove that our estimator's accuracy is independent of the number of populations. We also develop an efficient optimization algorithm to solve the more general problem of estimating multi-population frequency distributions. We validate our methods and theory through extensive experiments. Finally, on a real dataset of human genomes across multiple ancestries, we demonstrate how our approach for unseen estimation can enable cohort designs that can discover interesting mutations with greater efficiency.
• [cs.LG]Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models
Jonas Rauber, Wieland Brendel, Matthias Bethge
http://arxiv.org/abs/1707.04131v1
Even today's most advanced machine learning models are easily fooled by almost imperceptible perturbations of their inputs. Foolbox is a new Python package to generate such adversarial perturbations and to quantify and compare the robustness of machine learning models. It is built around the idea that the most comparable robustness measure is the minimum perturbation needed to craft an adversarial example. To this end, Foolbox provides reference implementations of most published adversarial attack methods alongside some new ones, all of which perform internal hyperparameter tuning to find the minimum adversarial perturbation. Additionally, Foolbox interfaces with most popular deep learning frameworks such as PyTorch, Keras, TensorFlow, Theano and MXNet, provides a straightforward way to add support for other frameworks, and allows different adversarial criteria such as targeted misclassification and top-k misclassification as well as different distance measures. The code is licensed under the MIT license and is openly available at https://github.com/bethgelab/foolbox.
• [cs.LG]On Measuring and Quantifying Performance: Error Rates, Surrogate Loss, and an Example in SSL
Marco Loog, Jesse H. Krijthe, Are C. Jensen
http://arxiv.org/abs/1707.04025v1
In various approaches to learning, notably in domain adaptation, active learning, learning under covariate shift, semi-supervised learning, learning with concept drift, and the like, one often wants to compare a baseline classifier to one or more advanced (or at least different) strategies. In this chapter, we argue that if such classifiers, in their respective training phases, optimize a so-called surrogate loss, it may also be valuable to compare the behavior of this loss on the test set, next to the regular classification error rate. It can provide us with an additional view on the classifiers' relative performances that error rates cannot capture. As an example, limited but convincing empirical results demonstrate that we may be able to find semi-supervised learning strategies that can guarantee performance improvements with increasing numbers of unlabeled data in terms of log-likelihood. In contrast, such improvements may be impossible to guarantee for the classification error rate.
• [cs.LG]Stable Distribution Alignment Using the Dual of the Adversarial Distance
Ben Usman, Kate Saenko, Brian Kulis
http://arxiv.org/abs/1707.04046v1
Learning to align distributions by minimizing an adversarial distance between them has recently achieved impressive results. However, such models are difficult to optimize with gradient descent and they often do not converge without very careful parameter tuning and initialization. We investigate whether turning the adversarial min-max problem into an optimization problem by replacing the maximization part with its dual improves the quality of the resulting alignment. Our empirical results suggest that using the dual formulation for linear and kernelized discriminators results in a more stable convergence to a desirable solution. We test our hypothesis on the problem of aligning two synthetic point clouds on a plane and on a real-image domain adaptation problem using a subset of MNIST and USPS. In both cases, the dual formulation yields an iterative procedure that gives more stable and monotonic improvement over time.
• [cs.NE]Backpropagation in matrix notation
N. M. Mishachev
http://arxiv.org/abs/1707.02746v2
In this note we calculate the gradient of the network function in matrix notation.
• [cs.NE]Capacity, Fidelity, and Noise Tolerance of Associative Spatial-Temporal Memories Based on Memristive Neuromorphic Network
Dmitri Gavrilov, Dmitri Strukov, Konstantin K. Likharev
http://arxiv.org/abs/1707.03855v1
We have calculated the key characteristics of associative (content-addressable) spatial-temporal memories based on neuromorphic networks with restricted connectivity - "CrossNets". Such networks may be naturally implemented in nanoelectronic hardware using hybrid CMOS/memristor circuits, which may feature extremely high energy efficiency, approaching that of biological cortical circuits, at much higher operation speed. Our numerical simulations, in some cases confirmed by analytical calculations, have shown that the characteristics depend substantially on the method of information recording into the memory. Of the four methods we have explored, two look especially promising - one based on quadratic programming, and the other being a specific discrete version of gradient descent. The latter method provides a slightly lower memory capacity (at the same fidelity) than the former one, but it allows local recording, which may be more readily implemented in nanoelectronic hardware. Most importantly, for synchronous retrieval, both methods provide a capacity higher than that of the well-known Ternary Content-Addressable Memories with the same number of nonvolatile memory cells (e.g., memristors), though the input noise immunity of the CrossNet memories is somewhat lower.
• [cs.NI]Cost-Effective Cache Deployment in Mobile Heterogeneous Networks
Shan Zhang, Ning Zhang, Peng Yang, Xuemin Shen
http://arxiv.org/abs/1707.04179v1
This paper investigates one of the fundamental issues in cache-enabled heterogeneous networks (HetNets): how many cache instances should be deployed at different base stations, in order to provide guaranteed service in a cost-effective manner. Specifically, we consider two-tier HetNets with hierarchical caching, where the most popular files are cached at small cell base stations (SBSs) while the less popular ones are cached at macro base stations (MBSs). For a given network cache deployment budget, the cache sizes for MBSs and SBSs are optimized to maximize network capacity while satisfying the file transmission rate requirements. As cache sizes of MBSs and SBSs affect the traffic load distribution, inter-tier traffic steering is also employed for load balancing. Based on stochastic geometry analysis, the optimal cache sizes for MBSs and SBSs are obtained, which are threshold-based with respect to cache budget in the networks constrained by SBS backhauls. Simulation results are provided to evaluate the proposed schemes and demonstrate the applications in cost-effective network deployment.
• [cs.PL]Hot-Rodding the Browser Engine: Automatic Configuration of JavaScript Compilers
Chris Fawcett, Lars Kotthoff, Holger H. Hoos
http://arxiv.org/abs/1707.04245v1
Modern software systems in many application areas offer to the user a multitude of parameters, switches and other customisation hooks. Humans tend to have difficulties determining the best configurations for particular applications. Modern optimising compilers are an example of such software systems; their many parameters need to be tuned for optimal performance, but are often left at the default values for convenience. In this work, we automatically determine compiler parameter settings that result in optimised performance for particular applications. Specifically, we apply a state-of-the-art automated parameter configuration procedure based on cutting-edge machine learning and optimisation techniques to two prominent JavaScript compilers and demonstrate that significant performance improvements, more than 35% in some cases, can be achieved over the default parameter settings on a diverse set of benchmarks.
• [cs.SI]Distinguishing humans from computers in the game of go: a complex network approach
C. Coquidé, B. Georgeot, O. Giraud
http://arxiv.org/abs/1707.04044v1
We compare complex networks built from the game of go and obtained from databases of human-played games with those obtained from computer-played games. Our investigations show that statistical features of the human-based networks and the computer-based networks differ, and that these differences can be statistically significant on a relatively small number of games using specific estimators. We show that the deterministic or stochastic nature of the computer algorithm playing the game can also be distinguished from these quantities. This can be seen as a tool to implement a Turing-like test for go simulators.
• [cs.SI]Human Sexual Cycles are Driven by Culture and Match Collective Moods
Ian B. Wood, Pedro Leal Varela, Johan Bollen, Luis M. Rocha, Joana Gonçalves-Sá
http://arxiv.org/abs/1707.03959v1
It is a long-standing question whether human sexual and reproductive cycles are affected predominantly by biology or culture. The literature is mixed with respect to whether biological or cultural factors best explain the reproduction cycle phenomenon, with biological explanations dominating the argument. The biological hypothesis proposes that human reproductive cycles are an adaptation to the seasonal cycles caused by hemisphere positioning, while the cultural hypothesis proposes that conception dates vary mostly due to cultural factors, such as vacation schedule or religious holidays. However, for many countries, common records used to investigate these hypotheses are incomplete or unavailable, biasing existing analysis towards primarily Christian countries in the Northern Hemisphere. Here we show that interest in sex peaks sharply online during major cultural and religious celebrations, regardless of hemisphere location. This online interest, when shifted by nine months, corresponds to documented human birth cycles, even after adjusting for numerous factors such as language, season, and amount of free time due to holidays. We further show that mood, measured independently on Twitter, contains distinct collective emotions associated with those cultural celebrations, and these collective moods correlate with sex search volume outside of these holidays as well. Our results provide converging evidence that the cyclic sexual and reproductive behavior of human populations is mostly driven by culture and that this interest in sex is associated with specific emotions, characteristic of, but not limited to, major cultural and religious celebrations.
• [cs.SI]UrbanFACET: Visually Profiling Cities from Mobile Device Recorded Movement Data of Millions of City Residents
Lei Shi, Tao Jiang, Ye Zhao, Xiatian Zhang, Yao Lu
http://arxiv.org/abs/1707.04210v1
Cities are living systems where urban infrastructures and their functions are defined and evolve due to population behaviors. Profiling cities and functional regions has been an important topic in urban design and planning. This paper studies a unique big data set, which includes daily movement data of tens of millions of city residents, and develops a visual analytics system, namely UrbanFACET, to discover and visualize the dynamical profiles of multiple cities and their residents. This big user movement data set, acquired from mobile users' agnostic check-ins at thousands of phone apps, is utilized in an integrative study and visualization together with urban structure (e.g., road network) and POI (Point of Interest) distributions. In particular, we develop a novel set of information-theoretic metrics to characterize the mobility patterns of city areas and groups of residents. These multifaceted metrics, namely Fluidity, vibrAncy, Commutation, divErsity, and densiTy (FACET), categorize and manifest hidden urban functions and behaviors. The UrbanFACET system further allows users to visually analyze and compare the metrics over different areas and cities at metropolitan scales. The system is evaluated through both case studies on several big, heavily populated cities and user studies involving real-world users.
• [math.ST]Pretest and Stein-Type Estimations in Quantile Regression Model
Bahadır Yüzbaşı, Yasin Aşar, M. Şamil Şık
http://arxiv.org/abs/1707.03820v1
In this study, we consider preliminary test and shrinkage estimation strategies for quantile regression models. In the classical Least Squares Estimation (LSE) method, the relationship between the explanatory and explained variables in the coordinate plane is estimated with a mean regression line. In order to use LSE, three main assumptions on the error terms of the regression model, which describe a white noise process and are also known as the Gauss-Markov assumptions, must be met: (1) the error terms have zero mean, (2) the variance of the error terms is constant, and (3) the covariance between the errors is zero, i.e., there is no autocorrelation. However, data in many areas, including econometrics, survival analysis and ecology, do not satisfy these assumptions. First introduced by Koenker, quantile regression has been used to address this deficiency of classical regression analysis and to improve on least squares estimation. The aim of this study is to improve the performance of quantile regression estimators by using pre-test and shrinkage strategies. A Monte Carlo simulation study, including a comparison with quantile $L_1$-type estimators such as Lasso, Ridge and Elastic Net, is designed to evaluate the performance of the estimators. Two real data examples are given for illustrative purposes. Finally, we obtain the asymptotic results of the suggested estimators.
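For reference, quantile regression replaces the squared loss of LSE with the check (pinball) loss; the small sketch below shows this standard loss, which is not specific to the estimators proposed in the paper.

```python
import numpy as np

def pinball_loss(y, y_hat, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}), with u = y - y_hat."""
    u = y - y_hat
    return np.mean(np.where(u >= 0, tau * u, (tau - 1) * u))

y = np.array([1.0, 2.0, 3.0, 10.0])             # note the outlier
print(pinball_loss(y, np.median(y), tau=0.5))   # tau = 0.5 gives half the absolute loss
print(pinball_loss(y, 3.0, tau=0.9))            # upper quantiles penalize under-prediction more
```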
• [math.ST]Testing High-dimensional Covariance Matrices under the Elliptical Distribution and Beyond
Xinxin Yang, Xinghua Zheng, Jiaqi Chen, Hua Li
http://arxiv.org/abs/1707.04010v1
We study testing high-dimensional covariance matrices when data exhibit heteroskedasticity. The observations are modeled as $\mathbf Y_i=\omega_i\mathbf Z_i$, where $\mathbf Z_i$'s are i.i.d. $p$-dimensional random vectors with mean $\textbf{0}$ and covariance $\boldsymbol{\Sigma}$, and $\omega_i$'s are random scalars reflecting heteroskedasticity. The model is an extension of the elliptical distribution, and accommodates several stylized facts of real data including heteroskedasticity, heavy-tailedness, asymmetry, etc. We aim to test $H_{0}: \boldsymbol{\Sigma}\propto\boldsymbol{\Sigma}_0$, in the high-dimensional setting where both the dimension $p$ and the sample size $n$ grow to infinity proportionally. We remove the heteroskedasticity by self-normalizing the observations, and establish a CLT for the linear spectral statistic (LSS) of $\widetilde{\mathbf S}_n:=\frac{p}{n}\sum_{i=1}^n \mathbf Y_i\mathbf Y_i^T/|\mathbf Y_i|^2=\frac{p}{n}\sum_{i=1}^n \mathbf Z_i\mathbf Z_i^T/|\mathbf Z_i|^2$. The CLT is different from the existing ones for the LSS of the usual sample covariance matrix $\mathbf S_n:=\frac{1}{n}\sum_{i=1}^n \mathbf Z_i\mathbf Z_i^T$ (Bai and Silverstein [Ann. Probab. 32 (2004) 553--605], Najim and Yao [Ann. Appl. Probab. 26 (2016) 1837--1887]). Our tests based on the new CLT neither assume a specific parametric distribution nor involve the fourth moment of $\mathbf Z_i$. Numerical studies show that our tests work well even when $\mathbf Z_i$'s are heavy-tailed.
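A small numpy sketch of the self-normalized matrix $\widetilde{\mathbf S}_n$ defined above, together with a linear spectral statistic computed from its eigenvalues; note how the heteroskedastic scalars $\omega_i$ cancel exactly under the normalization. Sizes and the test function are illustrative.

```python
import numpy as np

def self_normalized_scm(Y):
    """\tilde S_n = (p/n) * sum_i Y_i Y_i^T / ||Y_i||^2 for rows Y_i of Y (n x p)."""
    n, p = Y.shape
    norms2 = np.sum(Y**2, axis=1, keepdims=True)
    Yn = Y / np.sqrt(norms2)            # unit-norm rows: omega_i cancels here
    return (p / n) * Yn.T @ Yn

def linear_spectral_statistic(S, f):
    """Average of f over the eigenvalues of the symmetric matrix S."""
    eig = np.linalg.eigvalsh(S)
    return float(np.mean(f(eig)))

rng = np.random.default_rng(0)
n, p = 400, 100
Z = rng.standard_normal((n, p))
omega = rng.exponential(size=(n, 1))    # heteroskedastic scalars
Y = omega * Z
S_tilde = self_normalized_scm(Y)
print(linear_spectral_statistic(S_tilde, f=lambda x: x))  # ~1, since tr(S_tilde)/p = 1
```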
• [math.ST]Variable selection in multivariate linear models with high-dimensional covariance matrix estimation
Marie Perrot-Dockès, Céline Lévy-Leduc, Laure Sansonnet, Julien Chiquet
http://arxiv.org/abs/1707.04145v1
In this paper, we propose a novel variable selection approach in the framework of multivariate linear models that takes into account the dependence that may exist between the responses. It consists in estimating beforehand the covariance matrix of the responses and plugging this estimator into a Lasso criterion, in order to obtain a sparse estimator of the coefficient matrix. The properties of our approach are investigated both from a theoretical and a numerical point of view. More precisely, we give general conditions that the estimators of the covariance matrix and its inverse have to satisfy in order to recover the positions of the null and non-null entries of the coefficient matrix when the size of the covariance matrix is not fixed and can tend to infinity. We prove that these conditions are satisfied in the particular case of some Toeplitz matrices. Our approach is implemented in the R package MultiVarSel, available from the Comprehensive R Archive Network (CRAN), and is very attractive since it benefits from a low computational load. We also assess the performance of our methodology using synthetic data and compare it with alternative approaches. Our numerical experiments show that including the estimation of the covariance matrix in the Lasso criterion dramatically improves the variable selection performance in many cases.
• [physics.soc-ph]Network dynamics of innovation processes
Iacopo Iacopini, Staša Milojević, Vito Latora
http://arxiv.org/abs/1707.04239v1
We introduce a model for the emergence of innovations, in which cognitive processes are described as random walks on the network of links among ideas or concepts, and an innovation corresponds to the first visit of a node. The transition matrix of the random walk depends on the network weights, while in turn the weight of an edge is reinforced by the passage of a walker. The presence of the network naturally accounts for the mechanism of the adjacent possible, and the model reproduces both the rate at which novelties emerge and the correlations among them observed empirically. We show this by using synthetic networks and by studying real data sets on the growth of knowledge in different scientific disciplines. Edge-reinforced random walks on complex topologies offer a new modeling framework for the dynamics of correlated novelties and are another example of co-evolution of processes and networks.
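A minimal sketch of an edge-reinforced random walk of the kind described: a walker moves on a weighted graph, every traversed edge has its weight increased, and an "innovation" is recorded the first time a node is visited. The graph, initial weights and reinforcement increment are illustrative choices, not the paper's calibrated model.

```python
import numpy as np

def reinforced_walk(adj, weights, start, steps, delta=1.0, seed=0):
    """Edge-reinforced random walk; returns times of first visits (innovations).

    adj     : dict node -> list of neighbour nodes
    weights : dict (u, v) -> edge weight (kept symmetric here)
    delta   : reinforcement added to an edge each time it is traversed
    """
    rng = np.random.default_rng(seed)
    visited, innovations, u = {start}, [(0, start)], start
    for t in range(1, steps + 1):
        nbrs = adj[u]
        w = np.array([weights[(u, v)] for v in nbrs], dtype=float)
        v = nbrs[rng.choice(len(nbrs), p=w / w.sum())]
        weights[(u, v)] += delta                 # reinforce the traversed edge
        weights[(v, u)] += delta
        if v not in visited:
            visited.add(v)
            innovations.append((t, v))           # first visit = a novelty
        u = v
    return innovations

# Toy usage on a small ring graph.
n = 10
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
weights = {(i, j): 1.0 for i in adj for j in adj[i]}
print(reinforced_walk(adj, weights, start=0, steps=50))
```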
• [q-bio.QM]Modeling Hormesis Using a Non-Monotonic Copula Method
Farzaneh Ghasemi Tahrir
http://arxiv.org/abs/1707.04171v1
This paper presents a probabilistic method for capturing non-monotonic behavior under the biphasic dose-response regime observed in many biological systems experiencing different types of stress. The proposed method is based on the rolling-pin method introduced earlier to estimate highly nonlinear and non-monotonic joint probability distributions from continuous-domain data. We show that the proposed method outperforms conventional parametric methods in terms of error (namely RMSE), and that it needs fewer parameters to be estimated a priori while offering high flexibility. The application and performance of the proposed method are demonstrated through an example.
• [quant-ph]Tight uniform continuity bound for a family of entropies
Eric P. Hanson, Nilanjana Datta
http://arxiv.org/abs/1707.04249v1
We prove a tight uniform continuity bound for a family of entropies which includes the von Neumann entropy, the Tsallis entropy and the $\alpha$-Rényi entropy, $S_\alpha$, for $\alpha\in (0,1)$. We establish necessary and sufficient conditions for equality in the continuity bound and prove that these conditions are the same for every member of the family. Our result builds on recent work in which we constructed a state which was majorized by every state in a neighbourhood ($\varepsilon$-ball) of a given state, and thus was the minimal state in majorization order in the $\varepsilon$-ball. This minimal state satisfies a particular semigroup property, which we exploit to prove our bound.
• [stat.AP]A New High-Dimensional Time Series Approach for Wind Speed, Wind Direction and Air Pressure Forecasting
Daniel Ambach, Wolfgang Schmid
http://arxiv.org/abs/1707.03258v2
Many wind speed forecasting approaches have been proposed in the literature. In this paper, a new statistical approach for jointly predicting wind speed, wind direction and air pressure is introduced. The wind direction and the air pressure are important for improving the accuracy of wind speed forecasts, and a good forecast of the wind direction helps to bring the turbine into the predominant wind direction. We combine a multivariate seasonal time-varying threshold autoregressive model with interactions (TVARX) with a threshold seasonal autoregressive conditional heteroscedastic (TARCHX) model. The model includes periodicity, conditional heteroscedasticity, interactions of different dependent variables and a complex autoregressive structure with non-linear impacts. In contrast to ordinary likelihood estimation approaches, we apply a high-dimensional shrinkage technique instead of a distributional assumption for the dependent variables. The iteratively re-weighted least absolute shrinkage and selection operator (LASSO) method allows us to capture conditional heteroscedasticity while keeping the computing time comparatively low. The proposed approach yields accurate predictions of wind speed, wind direction and air pressure for a short-term horizon. Prediction intervals of up to twenty-four hours are presented.
• [stat.CO]ClustGeo: an R package for hierarchical clustering with spatial constraints
Marie Chavent, Vanessa Kuentz-Simonet, Amaury Labenne, Jérôme Saracco
http://arxiv.org/abs/1707.03897v1
In this paper, we propose a Ward-like hierarchical clustering algorithm including spatial/geographical constraints. Two dissimilarity matrices $D_0$ and $D_1$ are inputted, along with a mixing parameter $\alpha \in [0,1]$. The dissimilarities can be non-Euclidean and the weights of the observations can be non-uniform. The first matrix gives the dissimilarities in the "feature space" and the second matrix gives the dissimilarities in the "constraint space". The criterion minimized at each stage is a convex combination of the homogeneity criterion calculated with $D_0$ and the homogeneity criterion calculated with $D_1$. The idea is then to determine a value of $\alpha$ which increases the spatial contiguity without deteriorating too much the quality of the solution based on the variables of interest, i.e., those of the feature space. This procedure is illustrated on a real dataset using the R package ClustGeo.
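A rough Python approximation of the idea: mix the two dissimilarity matrices with the parameter alpha and run a Ward-style hierarchical clustering on the result. This only illustrates the trade-off controlled by alpha; the actual ClustGeo criterion mixes the two homogeneity criteria inside the Ward-like update rather than mixing the raw dissimilarities.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))            # feature space
coords = rng.uniform(size=(30, 2))      # geographical coordinates

D0 = squareform(pdist(X))               # dissimilarities in the feature space
D1 = squareform(pdist(coords))          # dissimilarities in the constraint (geo) space

def geo_constrained_clusters(D0, D1, alpha, k):
    """Hierarchical clustering on the convex combination of two dissimilarities."""
    D_alpha = (1 - alpha) * D0 + alpha * D1
    Z = linkage(squareform(D_alpha, checks=False), method="ward")
    return fcluster(Z, t=k, criterion="maxclust")

for alpha in (0.0, 0.3, 0.7):
    labels = geo_constrained_clusters(D0, D1, alpha, k=4)
    print(alpha, np.bincount(labels)[1:])   # cluster sizes as alpha shifts toward geography
```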
• [stat.ME]Hypoelliptic diffusions: discretization, filtering and inference from complete and partial observations
Susanne Ditlevsen, Adeline Samson
http://arxiv.org/abs/1707.04235v1
The statistical problem of parameter estimation in partially observed hypoelliptic diffusion processes arises naturally in many applications. However, due to the noise structure, where the noise components of the different coordinates of the multi-dimensional process operate on different time scales, standard inference tools are ill-conditioned. In this paper, we propose to use a higher-order scheme to discretize the process and approximate the likelihood, such that the different time scales are appropriately accounted for. We show consistency and asymptotic normality with non-typical convergence rates. When only partial observations are available, we embed the approximation into a filtering algorithm for the unobserved coordinates, and use this as a building block in a Stochastic Approximation Expectation Maximization algorithm. We illustrate the approach on simulated data from three models: the harmonic oscillator, the FitzHugh-Nagumo model used to model the membrane potential evolution in neuroscience, and the synaptic inhibition and excitation model used for determination of neuronal synaptic input.
• [stat.ME]Inference under Missing Data Conditions in the Stochastic Block Model
Timothée Tabouy, Pierre Barbillon, Julien Chiquet
http://arxiv.org/abs/1707.04141v1
This paper deals with non-observed edges during the sampling of a network and consecutive issues in the Stochastic Block Model (SBM) inference. We start by defining missing data conditions with latent state spaces and thereby Missing At Random (MAR) and Not Missing At Random (NMAR) conditions when sampling a network generated under an SBM. We then include sampling design properties and knowledge about observed and non-observed edges in the inference: by means of variational approximations of the distributions of the missing and latent variables, we introduce several variants of the variational EM (VEM) algorithm for SBM that deal with various sampling designs (MAR and NMAR). Model selection criteria based on Integrated Classification Likelihood (ICL) are also derived for selecting both the number of blocks and the sampling design. We investigate the accuracy and the range of applicability of these algorithms with simulations. We finally explore two real-world networks from biology (protein-protein interaction network) and ethnology (seed circulation network), where the interpretations considerably change when the missingness is taken into account.
• [stat.ME]Iterative Updating of Model Error for Bayesian Inversion
Daniela Calvetti, Matthew M. Dunlop, Erkki Somersalo, Andrew M. Stuart
http://arxiv.org/abs/1707.04246v1
In computational inverse problems, it is common that a detailed and accurate forward model is approximated by a computationally less challenging substitute. The model reduction may be necessary to meet constraints in computing time when optimization algorithms are used to find a single estimate, or to speed up Markov chain Monte Carlo (MCMC) calculations in the Bayesian framework. The use of an approximate model introduces a discrepancy, or modeling error, that may have a detrimental effect on the solution of the ill-posed inverse problem, or it may severely distort the estimate of the posterior distribution. In the Bayesian paradigm, the modeling error can be considered as a random variable, and by using an estimate of the probability distribution of the unknown, one may estimate the probability distribution of the modeling error and incorporate it into the inversion. We introduce an algorithm which iterates this idea to update the distribution of the model error, leading to a sequence of posterior distributions that are demonstrated empirically to capture the underlying truth with increasing accuracy. Since the algorithm is not based on rejections, it requires only limited full model evaluations. We show analytically that, in the linear Gaussian case, the algorithm converges geometrically fast with respect to the number of iterations. For more general models, we introduce particle approximations of the iteratively generated sequence of distributions; we also prove that each element of the sequence converges in the large particle limit. We show numerically that, as in the linear case, rapid convergence occurs with respect to the number of iterations. Additionally, we show through computed examples that point estimates obtained from this iterative algorithm are superior to those obtained by neglecting the model error.
• [stat.ME]Quantifying and Estimating the Predictive Accuracy for Censored Time-to-Event Data with Competing Risks
Cai Wu, Liang Li
http://arxiv.org/abs/1707.03971v1
This paper focuses on quantifying and estimating the predictive accuracy of prognostic models for time-to-event outcomes with competing events. We consider the time-dependent discrimination and calibration metrics, including the receiver operating characteristics curve and the Brier score, in the context of competing risks. To address censoring, we propose a unified nonparametric estimation framework for both discrimination and calibration measures, by weighting the censored subjects with the conditional probability of the event of interest given the observed data. We demonstrate through simulations that the proposed estimator is unbiased, efficient and robust against model misspecification in comparison to other methods published in the literature. In addition, the proposed method can be extended to time-dependent predictive accuracy metrics constructed from a general class of loss functions. We apply the methodology to a data set from the African American Study of Kidney Disease and Hypertension to evaluate the predictive accuracy of a prognostic risk score in predicting end-stage renal disease (ESRD), accounting for the competing risk of pre-ESRD death.
• [stat.ME]Randomization-based Inference for Bernoulli-Trial Experiments and Implications for Observational Studies
Zach Branson, Marie-Abele Bind
http://arxiv.org/abs/1707.04136v1
We present a randomization-based inferential framework for experiments characterized by a strongly ignorable assignment mechanism where units have independent probabilities of receiving treatment. Previous works on randomization tests often assume these probabilities are equal within blocks of units. We consider the general case where they differ across units and show how to perform randomization tests and obtain point estimates and confidence intervals. Furthermore, we develop a rejection-sampling algorithm to conduct randomization-based inference conditional on ancillary statistics, covariate balance, or other statistics of interest. Through simulation we demonstrate how our algorithm can yield powerful randomization tests and thus precise inference. Our work also has implications for observational studies, which commonly assume a strongly ignorable assignment mechanism. Most methodologies for observational studies make additional modeling or asymptotic assumptions, while our framework only assumes the strongly ignorable assignment mechanism, and thus can be considered a minimal-assumption approach.
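A minimal sketch of the kind of procedure described: treatment is assigned independently with unit-specific probabilities, the randomization distribution of a test statistic is built by redrawing assignments, and conditioning (here, on the observed number of treated units) is done by rejection sampling. The data, statistic and conditioning choice are made up for illustration.

```python
import numpy as np

def randomization_test(y, w_obs, probs, n_draws=2000, condition_on_count=True, seed=0):
    """Randomization test of no treatment effect under independent Bernoulli assignment.

    y : outcomes, w_obs : observed 0/1 assignment, probs : unit-level P(treated).
    Test statistic: difference in means. Conditioning via rejection sampling.
    """
    rng = np.random.default_rng(seed)
    def stat(w):
        return y[w == 1].mean() - y[w == 0].mean()
    t_obs, draws = stat(w_obs), []
    while len(draws) < n_draws:
        w = (rng.random(len(y)) < probs).astype(int)
        if w.sum() in (0, len(y)):
            continue                      # statistic undefined, reject the draw
        if condition_on_count and w.sum() != w_obs.sum():
            continue                      # rejection sampling: keep only matching draws
        draws.append(stat(w))
    draws = np.array(draws)
    return np.mean(np.abs(draws) >= abs(t_obs))   # two-sided p-value

rng = np.random.default_rng(1)
probs = rng.uniform(0.2, 0.8, size=100)
w = (rng.random(100) < probs).astype(int)
y = 0.5 * w + rng.normal(size=100)               # true effect of 0.5
print("p-value:", randomization_test(y, w, probs))
```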
• [stat.ML]Automation of Feature Engineering for IoT Analytics
Snehasis Banerjee, Tanushyam Chattopadhyay, Arpan Pal, Utpal Garain
http://arxiv.org/abs/1707.04067v1
This paper presents an approach for automating interpretable feature selection for Internet of Things Analytics (IoTA) using machine learning (ML) techniques. The authors conducted a survey of people involved in different IoTA-based application development tasks. The survey reveals that feature selection is the most time-consuming and niche-skill-demanding part of the entire workflow. This paper shows how feature selection can be automated without sacrificing decision-making accuracy, thereby reducing project completion time and the cost of hiring expensive resources. Several pattern recognition principles and state-of-the-art (SoA) ML techniques are followed to design the overall approach for the proposed automation. Three data sets are considered to establish the proof of concept. Experimental results show that the proposed automation is able to reduce the time for feature selection to $2$ days, instead of the $4-6$ months that would have been required in the absence of automation. This reduction in time is achieved without any sacrifice in the accuracy of the decision-making process. The proposed method is also compared against a Multi-Layer Perceptron (MLP) model, as most state-of-the-art work on IoTA uses MLP-based deep learning. Moreover, the feature selection method is compared against the SoA feature reduction technique, namely Principal Component Analysis (PCA), and its variants. The results obtained show that the proposed method is effective.
• [stat.ML]Deep Gaussian Embedding of Attributed Graphs: Unsupervised Inductive Learning via Ranking
Aleksandar Bojchevski, Stephan Günnemann
http://arxiv.org/abs/1707.03815v2
Methods that learn representations of graph nodes play a critical role in network analysis since they enable many downstream learning tasks. We propose Graph2Gauss - an approach that can efficiently learn versatile node embeddings on large scale (attributed) graphs that show strong performance on tasks such as link prediction and node classification. Unlike most approaches that represent nodes as (point) vectors in a lower-dimensional continuous space, we embed each node as a Gaussian distribution, allowing us to capture uncertainty about the representation. Furthermore, in contrast to previous approaches we propose a completely unsupervised method that is also able to handle inductive learning scenarios and is applicable to different types of graphs (plain, attributed, directed, undirected). By leveraging both the topological network structure and the associated node attributes, we are able to generalize to unseen nodes without additional training. To learn the embeddings we adopt a personalized ranking formulation w.r.t. the node distances that exploits the natural ordering between the nodes imposed by the network structure. Experiments on real world networks demonstrate the high performance of our approach, outperforming state-of-the-art network embedding methods on several different tasks.
• [stat.ML]Distributionally Robust Optimization Techniques in Batch Bayesian Optimization
Nikitas Rontsis, Michael A. Osborne, Paul J. Goulart
http://arxiv.org/abs/1707.04191v1
We propose a novel, theoretically-grounded acquisition function for batch Bayesian optimisation informed by insights from distributionally robust optimization. Our acquisition function is a lower bound on the well-known Expected Improvement function, which requires a multi-dimensional Gaussian expectation over a piecewise affine function, and is computed by evaluating instead the best-case expectation over all probability distributions consistent with the same mean and variance as the original Gaussian distribution. We show that, unlike alternative approaches including Expected Improvement, our proposed acquisition function avoids multi-dimensional integrations entirely, and can be calculated exactly as the solution of a convex optimization problem in the form of a tractable semidefinite program (SDP). Moreover, we prove that the solution of this SDP also yields exact numerical derivatives, which enable efficient optimisation of the acquisition function. Numerical results suggest that our acquisition function performs very similarly to the computationally intractable exact Expected Improvement and considerably better than other heuristics.
• [stat.ML]Improving Sparsity in Kernel Adaptive Filters Using a Unit-Norm Dictionary
Felipe Tobar
http://arxiv.org/abs/1707.04236v1
Kernel adaptive filters, a class of adaptive nonlinear time-series models, are known for their ability to learn expressive autoregressive patterns from sequential data. However, for trivial monotonic signals, they struggle to perform accurate predictions while keeping computational complexity within desired bounds. This is because new observations are incorporated into the dictionary when they are far from what the algorithm has seen in the past. We propose a novel approach to kernel adaptive filtering that compares new observations against dictionary samples in terms of their unit-norm (normalised) versions, meaning that new observations that look like previous samples but have a different magnitude are not added to the dictionary. We achieve this by proposing the unit-norm Gaussian kernel and defining a sparsification criterion for this novel kernel. The methodology is validated on two real-world datasets against standard KAF approaches in terms of the normalised mean square error and the dictionary size.
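A minimal sketch of the unit-norm comparison described above: inputs are normalized before the Gaussian kernel is evaluated, so a new observation that differs from a dictionary sample only in magnitude looks identical to it and is not added. The kernel width and sparsification threshold are illustrative choices.

```python
import numpy as np

def unit_norm_gaussian_kernel(x, y, sigma=0.5, eps=1e-12):
    """Gaussian kernel evaluated on the unit-norm (normalised) versions of x and y."""
    xn = x / (np.linalg.norm(x) + eps)
    yn = y / (np.linalg.norm(y) + eps)
    return np.exp(-np.sum((xn - yn) ** 2) / (2 * sigma ** 2))

def should_add_to_dictionary(x_new, dictionary, threshold=0.95, sigma=0.5):
    """Novelty-style sparsification: add only if x_new is not too similar
    (in the unit-norm kernel sense) to every existing dictionary sample."""
    return all(unit_norm_gaussian_kernel(x_new, d, sigma) < threshold for d in dictionary)

d = [np.array([1.0, 0.0])]
print(should_add_to_dictionary(np.array([5.0, 0.0]), d))   # False: same direction, larger magnitude
print(should_add_to_dictionary(np.array([0.0, 1.0]), d))   # True: genuinely new direction
```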
• [stat.ML]Influence of Resampling on Accuracy of Imbalanced Classification
Evgeny Burnaev, Pavel Erofeev, Artem Papanov
http://arxiv.org/abs/1707.03905v1
In many real-world binary classification tasks (e.g. detection of certain objects in images), the available dataset is imbalanced, i.e., it has far fewer representatives of one class (the minority class) than of the other. Generally, accurate prediction of the minority class is crucial, but it is hard to achieve since there is not much information about the minority class. One approach to deal with this problem is to resample the dataset beforehand, i.e., add new elements to the dataset or remove existing ones. Resampling can be done in various ways, which raises the problem of choosing the most appropriate one. In this paper, we experimentally investigate the impact of resampling on classification accuracy, compare resampling methods, and highlight key points and difficulties of resampling.
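As a concrete example of the simplest resampling strategies compared in such studies, the sketch below randomly oversamples the minority class or, alternatively, undersamples the majority class before training; the class proportions are made up.

```python
import numpy as np

def resample(X, y, minority=1, mode="over", seed=0):
    """Random over-sampling of the minority class or under-sampling of the majority class."""
    rng = np.random.default_rng(seed)
    idx_min = np.flatnonzero(y == minority)
    idx_maj = np.flatnonzero(y != minority)
    if mode == "over":       # duplicate minority samples up to the majority size
        extra = rng.choice(idx_min, size=len(idx_maj) - len(idx_min), replace=True)
        keep = np.concatenate([idx_maj, idx_min, extra])
    else:                    # keep only as many majority samples as minority ones
        keep = np.concatenate([rng.choice(idx_maj, size=len(idx_min), replace=False), idx_min])
    rng.shuffle(keep)
    return X[keep], y[keep]

rng = np.random.default_rng(1)
y = (rng.random(1000) < 0.05).astype(int)        # 5% minority class
X = rng.normal(size=(1000, 3))
Xo, yo = resample(X, y, mode="over")
Xu, yu = resample(X, y, mode="under")
print(np.bincount(y), np.bincount(yo), np.bincount(yu))
```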
• [stat.ML]Kafnets: kernel-based non-parametric activation functions for neural networks
Simone Scardapane, Steven Van Vaerenbergh, Aurelio Uncini
http://arxiv.org/abs/1707.04035v1
Neural networks are generally built by interleaving (adaptable) linear layers with (fixed) nonlinear activation functions. To increase their flexibility, several authors have proposed methods for adapting the activation functions themselves, endowing them with varying degrees of flexibility. None of these approaches, however, have gained wide acceptance in practice, and research on this topic remains open. In this paper, we introduce a novel family of flexible activation functions that are based on an inexpensive kernel expansion at every neuron. Leveraging several properties of kernel-based models, we propose multiple variations for designing and initializing these kernel activation functions (KAFs), including a multidimensional scheme that allows information from different paths in the network to be combined nonlinearly. The resulting KAFs can approximate any mapping defined over a subset of the real line, either convex or nonconvex. Furthermore, they are smooth over their entire domain, linear in their parameters, and they can be regularized using any known scheme, including the use of $\ell_1$ penalties to enforce sparseness. To the best of our knowledge, no other known model satisfies all these properties simultaneously. In addition, we provide a relatively complete overview of alternative techniques for adapting the activation functions, which is currently lacking in the literature. A large set of experiments validates our proposal.
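A minimal numpy sketch of a kernel activation function as described: each neuron's nonlinearity is a weighted sum of Gaussian kernels evaluated over a fixed dictionary of points on the real line, with the mixing coefficients as the trainable parameters. The dictionary size, kernel width and initialization below are illustrative choices.

```python
import numpy as np

class KAF:
    """One-dimensional kernel activation function f(s) = sum_d alpha_d * k(s, d_i)."""
    def __init__(self, n_dict=20, gamma=1.0, seed=0):
        self.dictionary = np.linspace(-3.0, 3.0, n_dict)      # fixed dictionary on the real line
        self.gamma = gamma
        self.alpha = np.random.default_rng(seed).normal(scale=0.3, size=n_dict)  # trainable

    def __call__(self, s):
        s = np.asarray(s, dtype=float)[..., None]              # broadcast over the dictionary
        k = np.exp(-self.gamma * (s - self.dictionary) ** 2)   # Gaussian kernel expansion
        return k @ self.alpha

f = KAF()
print(f(np.array([-1.0, 0.0, 2.5])))   # a smooth, flexible elementwise nonlinearity
```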
• [stat.ML]Large Scale Variable Fidelity Surrogate Modeling
Evgeny Burnaev, Alexey Zaytsev
http://arxiv.org/abs/1707.03916v1
Engineers widely use the Gaussian process regression framework to construct surrogate models aimed at replacing computationally expensive physical models while exploring a design space. Thanks to Gaussian process properties, we can use both samples generated by a high fidelity function (an expensive and accurate representation of a physical phenomenon) and a low fidelity function (a cheap and coarse approximation of the same physical phenomenon) while constructing a surrogate model. However, if sample sizes exceed a few thousand points, the computational costs of Gaussian process regression become prohibitive both for learning and for prediction. We propose two approaches to circumvent this computational burden: one approach is based on the Nyström approximation of sample covariance matrices and another is based on an intelligent usage of a blackbox that can evaluate a low fidelity function on the fly at any point of the design space. We examine the performance of the proposed approaches on a number of artificial and real problems, including the engineering optimization of a rotating disk shape.
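A minimal sketch of the Nyström approximation mentioned above: the full n-by-n kernel matrix is approximated from a random subset of m landmark points, which is what makes large sample sizes tractable. The RBF kernel, sizes and jitter are illustrative choices, not the paper's exact construction.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def nystrom(X, m=50, lengthscale=1.0, jitter=1e-8, seed=0):
    """Low-rank approximation K ~= K_nm K_mm^{-1} K_mn using m random landmark points."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    K_nm = rbf(X, X[idx], lengthscale)
    K_mm = rbf(X[idx], X[idx], lengthscale) + jitter * np.eye(m)
    return K_nm @ np.linalg.solve(K_mm, K_nm.T)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
K_full = rbf(X, X)
K_approx = nystrom(X, m=100)
print("relative error:", np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full))
```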
• [stat.ML]Model Selection for Anomaly Detection
Evgeny Burnaev, Pavel Erofeev, Dmitry Smolyakov
http://arxiv.org/abs/1707.03909v1
Anomaly detection based on one-class classification algorithms is broadly used in many applied domains, such as image processing (e.g. detecting whether a patient is "cancerous" or "healthy" from a mammography image) and network intrusion detection. The performance of an anomaly detection algorithm crucially depends on the kernel used to measure similarity in a feature space. The standard approaches to kernel selection used in two-class classification problems (e.g. cross-validation) cannot be used directly due to the specific nature of the data (the absence of data from a second, abnormal class). In this paper, we generalize several kernel selection methods from the binary-class case to the one-class classification case and perform an extensive comparison of these approaches using both synthetic and real-world data.