papers-I-read

I am trying a new initiative - a-paper-a-week. This repository will hold all those papers and related summaries and notes.

List of papers

Toolformer - Language Models Can Teach Themselves to Use Tools
Hints for Computer System Design
Synthesized Policies for Transfer and Adaptation across Tasks and Environments
Deep Neural Networks for YouTube Recommendations
The Tail at Scale
Practical Lessons from Predicting Clicks on Ads at Facebook
Ad Click Prediction - a View from the Trenches
Anatomy of Catastrophic Forgetting - Hidden Representations and Task Semantics
When Do Curricula Work?
Continual learning with hypernetworks
Zero-shot Learning by Generating Task-specific Adapters
HyperNetworks
Energy-based Models for Continual Learning
GPipe - Easy Scaling with Micro-Batch Pipeline Parallelism
Compositional Explanations of Neurons
Design patterns for container-based distributed systems
Cassandra - a decentralized structured storage system
CAP twelve years later - How the rules have changed
Consistency Tradeoffs in Modern Distributed Database System Design
Exploring Simple Siamese Representation Learning
Data Management for Internet-Scale Single-Sign-On
Searching for Build Debt - Experiences Managing Technical Debt at Google
One Solution is Not All You Need - Few-Shot Extrapolation via Structured MaxEnt RL
Learning Explanations That Are Hard To Vary
Remembering for the Right Reasons - Explanations Reduce Catastrophic Forgetting
A Foliated View of Transfer Learning
Harvest, Yield, and Scalable Tolerant Systems
MONet - Unsupervised Scene Decomposition and Representation
Revisiting Fundamentals of Experience Replay
Deep Reinforcement Learning and the Deadly Triad
Alpha Net: Adaptation with Composition in Classifier Space
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Gradient Surgery for Multi-Task Learning
GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
TaskNorm: Rethinking Batch Normalization for Meta-Learning
Averaging Weights leads to Wider Optima and Better Generalization
Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions
When to use parametric models in reinforcement learning?
Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning
On the Difficulty of Warm-Starting Neural Network Training
Supervised Contrastive Learning
CURL - Contrastive Unsupervised Representations for Reinforcement Learning
Competitive Training of Mixtures of Independent Deep Generative Models
What Does Classifying More Than 10,000 Image Categories Tell Us?
mixup - Beyond Empirical Risk Minimization
ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators
Gradient based sample selection for online continual learning
Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
Massively Multilingual Neural Machine Translation in the Wild - Findings and Challenges
Observational Overfitting in Reinforcement Learning
Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour
Superposition of many models into one
Towards a Unified Theory of State Abstraction for MDPs
ALBERT - A Lite BERT for Self-supervised Learning of Language Representations
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Contrastive Learning of Structured World Models
Gossip based Actor-Learner Architectures for Deep RL
How to train your MAML
PHYRE - A New Benchmark for Physical Reasoning
Large Memory Layers with Product Keys
Abductive Commonsense Reasoning
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Assessing Generalization in Deep Reinforcement Learning
Quantifying Generalization in Reinforcement Learning
Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks
Measuring abstract reasoning in neural networks
Hamiltonian Neural Networks
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
Meta-Reinforcement Learning of Structured Exploration Strategies
Relational Reinforcement Learning
Good-Enough Compositional Data Augmentation
Multiple Model-Based Reinforcement Learning
Towards a natural benchmark for continual learning
Meta-Learning Update Rules for Unsupervised Representation Learning
GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks
To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
Model Primitive Hierarchical Lifelong Reinforcement Learning
TuckER - Tensor Factorization for Knowledge Graph Completion
Linguistic Knowledge as Memory for Recurrent Neural Networks
Diversity is All You Need - Learning Skills without a Reward Function
Modular meta-learning
Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
Efficient Lifelong Learning with A-GEM
Pre-training Graph Neural Networks with Kernels
Smooth Loss Functions for Deep Top-k Classification
Hindsight Experience Replay
Representation Tradeoffs for Hyperbolic Embeddings
Learned Optimizers that Scale and Generalize
One-shot Learning with Memory-Augmented Neural Networks
BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop
Poincare Embeddings for Learning Hierarchical Representations
When Recurrent Models Don’t Need To Be Recurrent
HoME - a Household Multimodal Environment
Emergence of Grounded Compositional Language in Multi-Agent Populations
A Semantic Loss Function for Deep Learning with Symbolic Knowledge
Hierarchical Graph Representation Learning with Differentiable Pooling
Imagination-Augmented Agents for Deep Reinforcement Learning
Kronecker Recurrent Units
Learning Independent Causal Mechanisms
Memory-based Parameter Adaptation
Born Again Neural Networks
Net2Net-Accelerating Learning via Knowledge Transfer
Learning to Count Objects in Natural Images for Visual Question Answering
Neural Message Passing for Quantum Chemistry
Unsupervised Learning by Predicting Noise
The Lottery Ticket Hypothesis - Training Pruned Neural Networks
Cyclical Learning Rates for Training Neural Networks
Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
Learning an SAT Solver from Single-Bit Supervision
Neural Relational Inference for Interacting Systems
Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks
Get To The Point: Summarization with Pointer-Generator Networks
StarSpace - Embed All The Things!
Emotional Chatting Machine - Emotional Conversation Generation with Internal and External Memory
Exploring Models and Data for Image Question Answering
How transferable are features in deep neural networks
Distilling the Knowledge in a Neural Network
Revisiting Semi-Supervised Learning with Graph Embeddings
Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
Higher-order organization of complex networks
Network Motifs - Simple Building Blocks of Complex Networks
Word Representations via Gaussian Embedding
HARP - Hierarchical Representation Learning for Networks
Swish - a Self-Gated Activation Function
Reading Wikipedia to Answer Open-Domain Questions
Task-Oriented Query Reformulation with Reinforcement Learning
Refining Source Representations with Relation Networks for Neural Machine Translation
Pointer Networks
Learning to Compute Word Embeddings On the Fly
R-NET - Machine Reading Comprehension with Self-matching Networks
ReasoNet - Learning to Stop Reading in Machine Comprehension
Principled Detection of Out-of-Distribution Examples in Neural Networks
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
One Model To Learn Them All
Two/Too Simple Adaptations of Word2Vec for Syntax Problems
A Decomposable Attention Model for Natural Language Inference
A Fast and Accurate Dependency Parser using Neural Networks
Neural Module Networks
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Conditional Similarity Networks
Simple Baseline for Visual Question Answering
VQA: Visual Question Answering
Learning to Generate Reviews and Discovering Sentiment
Seeing the Arrow of Time
End-to-end optimization of goal-driven and visually grounded dialogue systems
GuessWhat?! Visual object discovery through multi-modal dialogue
Semantic Parsing via Paraphrasing
Traversing Knowledge Graphs in Vector Space
PPDB: The Paraphrase Database
NewsQA: A Machine Comprehension Dataset
A Persona-Based Neural Conversation Model
“Why Should I Trust You?” Explaining the Predictions of Any Classifier
Conditional Generative Adversarial Nets
Addressing the Rare Word Problem in Neural Machine Translation
Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
Improving Word Representations via Global Context and Multiple Word Prototypes
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Skip-Thought Vectors
Deep Convolutional Generative Adversarial Nets
Generative Adversarial Nets
A Roadmap towards Machine Intelligence
Smart Reply: Automated Response Suggestion for Email
Convolutional Neural Network For Sentence Classification
Conditional Image Generation with PixelCNN Decoders
Pixel Recurrent Neural Networks
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Bag of Tricks for Efficient Text Classification
GloVe: Global Vectors for Word Representation
SimRank: A Measure of Structural-Context Similarity
How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation
Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge
WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia
WikiQA: A challenge dataset for open-domain question answering
Teaching Machines to Read and Comprehend
Evaluating Prerequisite Qualities for Learning End-to-end Dialog Systems
Recurrent Neural Network Regularization
Deep Math: Deep Sequence Models for Premise Selection
A Neural Conversational Model
Key-Value Memory Networks for Directly Reading Documents
Advances In Optimizing Recurrent Networks
Query Regression Networks for Machine Comprehension
Sequence to Sequence Learning with Neural Networks
The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training
Question Answering with Subgraph Embeddings
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
Visualizing Large-scale and High-dimensional Data
Visualizing Data using t-SNE
Curriculum Learning
End-To-End Memory Networks
Memory Networks
Learning To Execute
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
Large Scale Distributed Deep Networks
Efficient Estimation of Word Representations in Vector Space
Regularization and variable selection via the elastic net
Fractional Max-Pooling
TAO: Facebook’s Distributed Data Store for the Social Graph
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
The Unified Logging Infrastructure for Data Analytics at Twitter
A Few Useful Things to Know about Machine Learning
Hive - A Petabyte Scale Data Warehouse Using Hadoop
Kafka: a Distributed Messaging System for Log Processing
Power-law distributions in Empirical data
Pregel: A System for Large-Scale Graph Processing
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
Pig Latin: A Not-So-Foreign Language for Data Processing
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
MapReduce: Simplified Data Processing on Large Clusters
BigTable: A Distributed Storage System for Structured Data
Spark SQL: Relational Data Processing in Spark
Spark: Cluster Computing with Working Sets
Fast Data in the Era of Big Data: Twitter’s Real-Time Related Query Suggestion Architecture
Scaling Memcache at Facebook
Dynamo: Amazon’s Highly Available Key-value Store
f4: Facebook's Warm BLOB Storage System
A Theoretician’s Guide to the Experimental Analysis of Algorithms
Cuckoo Hashing
Never Ending Learning