Main Conference Accepted Papers
Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining. Short Text Topic Modeling with Topic Distribution Quantization and Negative Sampling Decoder. Unsupervised Commonsense Question Answering with Self-Talk. BERT-ATTACK: Adversarial Attack Against BERT Using BERT. Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT. Imitation Attacks and Defenses for Black-box Machine Translation Systems. Zero-Shot Cross-Lingual Transfer with Meta Learning. ALICE: Active Learning with Contrastive Natural Language Explanations. BLEU might be Guilty but References are not Innocent. How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking. Event Extraction by Answering (Almost) Natural Questions. Towards Medical Machine Reading Comprehension with Structural Knowledge and Plain Text. Structured Attention for Unsupervised Dialogue Structure Induction. Shallow-to-Deep Training for Neural Machine Translation. XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization. Augmented Natural Language for Generative Sequence Labeling. Be More with Less: Hypergraph Attention Networks for Inductive Text Classification. Double Graph Based Reasoning for Document-level Relation Extraction. Towards Persona-Based Empathetic Conversational Models. Facilitating the Communication of Politeness through Fine-Grained Paraphrasing. Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems. Retrofitting Structure-aware Transformer Language Model for End Tasks. Span-based discontinuous constituency parsing: a family of exact chart-based algorithms with time complexities from O(n^6) down to O(n^3). Incomplete Utterance Rewriting as Semantic Segmentation. Multi-turn Response Selection using Dialogue Dependency Relations. MedDialog: Large-scale Medical Dialogue Datasets. Connecting the Dots: Event Graph Schema Induction with Path Language Modeling. 
CSP: Code-Switching Pre-training for Neural Machine Translation. Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation. UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation. Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation. Multi-document Summarization with Maximal Marginal Relevance-guided Reinforcement Learning. A Synset Relation-enhanced Framework with a Try-again Mechanism for Word Sense Disambiguation. Adaptive Attentional Network for Few-Shot Knowledge Graph Completion. Coreferential Reasoning Learning for Language Representation. The World is Not Binary: Learning to Rank with Grayscale Data for Dialogue Response Selection. Mitigating Gender Bias for Neural Dialogue Generation with Adversarial Learning. Unsupervised Question Decomposition for Question Answering. Global-to-Local Neural Networks for Document-Level Relation Extraction. Near-imperceptible Neural Linguistic Steganography via Self-Adjusting Arithmetic Coding. F^2-Softmax: Diversifying Neural Text Generation via Frequency Factorized Softmax. Cold-start Active Learning through Self-supervised Language Modeling. DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL Dataset. Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias. VD-BERT: A Unified Vision and Dialog Transformer with BERT. Avoiding the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training. Exploring Logically Dependent Multi-task Learning with Causal Inference. Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze. Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis. Relation-aware Graph Attention Networks with Relational Position Encodings for Emotion Recognition in Conversations. Counterfactual Generator: A Weakly-Supervised Method for Named Entity Recognition.
Scene Restoring for Narrative Machine Reading Comprehension. Multi-modal Multi-label Emotion Detection with Modality and Label Dependence. Learn to Cross-lingual Transfer with Meta Graph Learning Across Heterogeneous Languages. Cross-lingual Spoken Language Understanding with Regularized Representation Alignment. Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness. Adversarial Self-Supervised Data-Free Distillation for Text Classification. Detecting Attackable Sentences in Arguments. Extracting Implicitly Asserted Propositions in Argumentation. Towards More Accurate Uncertainty Estimation In Text Classification. Non-Autoregressive Machine Translation with Latent Alignments. Competence-Level Prediction and Resume & Job Description Matching Using Context-Aware Transformer Models. Q-learning with Language Model for Edit-based Unsupervised Summarization. MIME: MIMicking Emotions for Empathetic Response Generation. Dialogue Distillation: Open-Domain Dialogue Augmentation Using Unpaired Data. MedFilter: Improving Extraction of Task-relevant Utterances through Integration of Discourse Structure and Ontological Knowledge. Discontinuous Constituent Parsing as Sequence Labeling. Active Learning for BERT: An Empirical Study. A Matter of Framing: The Impact of Linguistic Formalism on Probing Results. MLSUM: The Multilingual Summarization Corpus. Language Model Prior for Low-Resource Neural Machine Translation. Information Seeking in the Spirit of Learning: A Dataset for Conversational Curiosity. Conditional Causal Relationships between Emotions and Causes in Texts. Attention Is All You Need for Chinese Word Segmentation. SRLGRN: Semantic Role Labeling Graph Reasoning Network. Interactive Refinement of Cross-Lingual Word Embeddings. The Grammar of Emergent Languages. SubjQA: A Dataset for Subjectivity and Review Comprehension. Exploring Semantic Capacity of Terms. Understanding Procedural Text using Interactive Entity Networks. 
Attention is Not Only a Weight: Analyzing Transformers with Vector Norms. Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine Translation. Statistical Power and Translationese in Machine Translation Evaluation. XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning. Multistage Fusion with Forget Gate for Multimodal Summarization in Open-Domain Videos. Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News. Speakers Fill Lexical Semantic Gaps with Context. Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization. Where Are You? Localization from Embodied Dialog. Sparse Text Generation. Incorporating Multimodal Information in Open-Domain Web Keyphrase Extraction. DyERNIE: Dynamic Evolution of Riemannian Manifold Embeddings for Temporal Knowledge Graph Completion. Task-Completion Dialogue Policy Learning via Monte Carlo Tree Search with Dueling Network. Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning. Multi-view Story Characterization from Movie Plot Synopses and Reviews. Intrinsic Evaluation of Summarization Datasets. Dynamic Anticipation and Completion for Multi-Hop Reasoning over Sparse Knowledge Graph. Coarse-to-Fine Query Focused Multi-Document Summarization. Learning a Simple and Effective Model for Multi-turn Response Generation with Auxiliary Tasks. Towards Enhancing Faithfulness for Neural Machine Translation. VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles. Pre-training for Abstractive Document Summarization by Reinstating Source Text. Gradient-guided Unsupervised Lexically Constrained Text Generation. Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis. Transformer Based Multi-Source Domain Adaptation. UDapter: Language Adaptation for Truly Universal Dependency Parsing. Compositional Phrase Alignment and Beyond. 
Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses. AttnIO: Knowledge Graph Exploration with In-and-Out Attention Flow for Knowledge-Grounded Dialogue. On the Reliability and Validity of Detecting Approval of Political Actors in Tweets. Multi-Dimensional Gender Bias Classification. Modeling the Music Genre Perception across Language-Bound Cultures. Word class flexibility: A deep contextualized approach. Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications. Multilevel Text Alignment with Cross-Document Attention. Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation. COMET: A Neural Framework for MT Evaluation. Unsupervised Discovery of Implicit Gender Bias. Low-Resource Domain Adaptation for Compositional Task-Oriented Semantic Parsing. Joint Estimation and Analysis of Risk Behavior Ratings in Movie Scripts. LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through Non-Linear Mapping in Latent Space. Writing Strategies for Science Communication: Data and Computational Analysis. Multilingual AMR-to-Text Generation. XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation. Where Are the Facts? Searching for Fact-checked Information to Alleviate the Spread of Fake News. Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing. Uncertainty-Aware Semantic Augmentation for Neural Machine Translation. Can Automatic Post-Editing Improve NMT?. Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference. TeaForN: Teacher-Forcing with N-grams. LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention. Continuity of Topic, Interaction, and Query: Learning to Quote in Online Conversations. Acrostic Poem Generation. Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph.
TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions. A Joint Multiple Criteria Model in Transfer Learning for Cross-domain Chinese Word Segmentation. Information-Theoretic Probing with Minimum Description Length. Amalgamating Knowledge from Two Teachers for Task-oriented Dialogue System with Adversarial Training. Discourse Self-Attention for Discourse Element Identification in Argumentative Student Essays. BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance. A Diagnostic Study of Explainability Techniques for Text Classification. F1 is Not Enough! Models and Evaluation Towards User-Centered Explainable Question Answering. What Have We Achieved on Text Summarization?. Embedding Words in Non-Vector Space with Unsupervised Graph Learning. Slot Attention with Value Normalization for Multi-Domain Dialogue State Tracking. Enhancing Aspect Term Extraction with Soft Prototypes. Named Entity Recognition Only from Word Embeddings. Profile Consistency Identification for Open-domain Dialogue Agents. AnswerFact: Fact Checking in Product Question Answering. Knowledge-Grounded Dialogue Generation with Pre-trained Language Models. BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues. A Knowledge-Aware Sequence-to-Tree Network for Math Word Problem Solving. Knowledge Association with Hyperbolic Knowledge Graph Embeddings. UniConv: A Unified Conversational Neural Architecture for Multi-domain Task-oriented Dialogues. Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation. Context-Aware Answer Extraction in Question Answering. Few-Shot Learning for Opinion Summarization. STL-CQA: Structure-based Transformers with Localization and Encoding for Chart Question Answering. MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale. Local Additivity Based Data Augmentation for Semi-supervised NER. ToTTo: A Controlled Table-To-Text Generation Dataset.
Asking without Telling: Exploring Latent Ontologies in Contextual Representations. Alignment-free Cross-lingual Semantic Role Labeling. SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery. Learning to Represent Image and Text with Denotation Graph. Diverse, Controllable, and Keyphrase-Aware: A Corpus and Method for News Multi-Headline Generation. Iterative Domain-Repaired Back-Translation. GLUCOSE: GeneraLized and COntextualized Story Explanations. Grounded Adaptation for Zero-shot Executable Semantic Parsing. Joint Constrained Learning for Event-Event Relation Extraction. Multi-hop Inference for Question-driven Summarization. Chapter Captor: Text Segmentation in Novels. Grounded Compositional Outputs for Adaptive Language Modeling. What time is it? Temporal Analysis of Novels. GraphDialog: Integrating Graph Knowledge into End-to-End Task-Oriented Dialogue Systems. BERT-enhanced Relational Sentence Ordering Network. Self-Induced Curriculum Learning in Self-Supervised Neural Machine Translation. Learning from Context or Names? An Empirical Study on Neural Relation Extraction. Conversational Semantic Parsing. Neural Deepfake Detection with Factual Structure of Text. Towards Interpretable Reasoning over Paragraph Effects in Situation. TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue. Top-Rank-Focused Adaptive Vote Collection for the Evaluation of Domain-Specific Semantic Models. Lifelong Language Knowledge Distillation. KERMIT: Complementing Transformer Architectures with Encoders of Explicit Syntactic Interpretations. A Multi-Task Incremental Learning Framework with Category Name Embedding for Aspect-Category Sentiment Analysis. Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU. Masking as an Efficient Alternative to Finetuning for Pretrained Language Models. Uncertainty-Aware Label Refinement for Sequence Labeling.
An Unsupervised Joint System for Text Generation from Knowledge Graphs and Semantic Parsing. Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space. Leveraging Declarative Knowledge in Text and First-Order Logic for Fine-Grained Propaganda Detection. Generating Radiology Reports via Memory-driven Transformer. Benchmarking Meaning Representations in Neural Semantic Parsing. Investigating Cross-Linguistic Adjective Ordering Tendencies with a Latent-Variable Model. Inducing Target-Specific Latent Structures for Aspect Sentiment Classification. CLIRMatrix: A massively large collection of bilingual and multilingual datasets for Cross-Lingual Information Retrieval. Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding. NwQM: A neural quality assessment framework for Wikipedia. Zero-Shot Crosslingual Sentence Simplification. If beam search is the answer, what was the question?. X-SRL: A Parallel Cross-Lingual Semantic Role Labeling Dataset. Identifying Elements Essential for BERT’s Multilinguality. Visually Grounded Continual Learning of Compositional Phrases. Sequential Modelling of the Evolution of Word Representations for Semantic Change Detection. Debiasing knowledge graph embeddings. Simultaneous Machine Translation with Visual Context. Towards Debiasing NLU Models from Unknown Biases. Analogous Process Structure Induction for Sub-event Sequence Prediction. Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation. On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment. Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space. Experience Grounds Language. Pre-tokenization of Multi-word Expressions in Cross-lingual Word Embeddings. SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness. 
Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs. KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation. Plug and Play Autoencoders for Conditional Text Generation. Surprisal Predicts Code-Switching in Chinese-English Bilingual Text. CoDEx: A Comprehensive Knowledge Graph Completion Benchmark. Unsupervised Parsing via Constituency Tests. An Imitation Game for Learning Semantic Parsers from User Interaction. Sparse Parallel Training of Hierarchical Dirichlet Process Topic Models. A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT. Dynamic Context Selection for Document-level Neural Machine Translation via Reinforcement Learning. Cross Copy Network for Dialogue Generation. Entities as Experts: Sparse Memory Access with Entity Supervision. What Can We Learn from Collective Human Opinions on Natural Language Inference Data?. SelfORE: Self-supervised Relational Feature Learning for Open Relation Extraction. SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge. Pareto Probing: Trading Off Accuracy for Complexity. Understanding the Difficulty of Training Transformers. Conversational Semantic Parsing for Dialog State Tracking. Dynamic Data Selection and Weighting for Iterative Back-Translation. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing. SetConv: A New Approach for Learning from Imbalanced Data. Keep CALM and Explore: Language Models for Action Generation in Text-based Games. AmbigQA: Answering Ambiguous Open-domain Questions. Reformulating Unsupervised Style Transfer as Paraphrase Generation. CancerEmo: A Dataset for Fine-Grained Emotion Detection. GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems. Text Classification Using Label Names Only: A Language Model Self-Training Approach. Interpretation of NLP models through input marginalization.
Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models. ETC: Encoding Long and Structured Inputs in Transformers. Discern: Discourse-Aware Entailment Reasoning Network for Conversational Machine Reading. Hate-Speech and Offensive Language Detection in Roman Urdu. An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training. Planning and Generating Natural and Diverse Disfluent Texts as Augmentation for Disfluency Detection. MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models. Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering. XL-AMR: Enabling Cross-Lingual AMR Parsing with Transfer Learning Techniques. Probing Task-Oriented Dialogue Representation from Language Models. Generating similes effortlessly like a Pro: A Style Transfer Approach for Simile Generation. Systematic Comparison of Neural Architectures and Training Approaches for Open Information Extraction. Does my multimodal model learn cross-modal interactions? It’s harder to tell than you might think!. Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding. Multi-task Learning for Multilingual Neural Machine Translation. PlotMachines: Outline-Conditioned Generation with Dynamic Plot State Tracking. Accurate Word Alignment Induction from Neural Machine Translation. Dialogue Response Ranking Training with Large-Scale Human Feedback Data. RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling. ENT-DESC: Entity Description Generation by Exploring Knowledge Graph. POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training. SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup. Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting. Coarse-to-Fine Pre-training for Named Entity Recognition. 
Point to the Expression: Solving Algebraic Word Problems using the Expression-Pointer Transformer Model. Incremental Event Detection via Knowledge Consolidation Networks. Graph Convolutions over Constituent Trees for Syntax-Aware Semantic Role Labeling. HIT: Nested Named Entity Recognition via Head-Tail Pair and Token Interaction. Convolution over Hierarchical Syntactic and Lexical Graphs for Aspect Level Sentiment Analysis. Token-level Adaptive Training for Neural Machine Translation. EXAMS: A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering. IGSQL: Database Schema Interaction Graph Based Neural Model for Context-Dependent Text-to-SQL Generation. Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment. Look at the First Sentence: Position Bias in Question Answering. Like hiking? You probably enjoy nature: Persona-grounded Dialog with Commonsense Expansions. Multi-Unit Transformers for Neural Machine Translation. MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer. AxCell: Automatic Extraction of Results from Machine Learning Papers. Latent Geographical Factors for Analyzing the Evolution of Dialects in Contact. Discriminatively-Tuned Generative Classifiers for Robust Natural Language Inference. On Extractive and Abstractive Neural Document Summarization with Transformer Language Models. Don't Read Too Much Into It: Adaptive Computation for Open-Domain Question Answering. MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems. Let's Stop Incorrect Comparisons in End-to-end Relation Extraction!. Online Conversation Disentanglement with Pointer Networks. "What Do You Mean by That?" A Parser-Independent Interactive Approach for Enhancing Text-to-SQL. End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems. SLM: Learning a Discourse Language Representation with Sentence Unshuffling. 
Are All Good Word Vector Spaces Isomorphic?. Predicting Clinical Trial Results by Implicit Evidence Integration. New Protocols and Negative Results for Textual Entailment Data Collection. STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation. Character-level Representations Improve DRS-based Semantic Parsing Even in the Age of BERT. Explainable Clinical Decision Support from Text. Semantic Role Labeling as Syntactic Dependency Parsing. Compositional and Lexical Semantics in RoBERTa, BERT and DistilBERT: A Case Study on CoQA. "I'd rather just go to bed": Understanding Indirect Answers. A Rigorous Study on Named Entity Recognition: Can Fine-tuning Pretrained Model Lead to the Promised Land?. Question Directed Graph Attention Network for Numerical Reasoning over Text. Event Extraction as Machine Reading Comprehension. Revisiting Modularized Multilingual NMT to Meet Industrial Demands. Cold-Start and Interpretability: Turning Regular Expressions into Trainable Recurrent Neural Networks. Adversarial Semantic Collisions. De-Biased Court’s View Generation with Causality. Learning Explainable Linguistic Expressions with Neural Inductive Logic Programming for Sentence Classification. Personal Information Leakage Detection in Conversations. MAVEN: A Massive General Domain Event Detection Dataset. Response Selection for Multi-Party Conversations with Dynamic Topic Tracking. PRover: Proof Generation for Interpretable Reasoning over Rules. Translation Artifacts in Cross-lingual Transfer Learning. Comparative Evaluation of Label-Agnostic Selection Bias in Multilingual Hate Speech Datasets. Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts. Automatic Extraction of Rules Governing Morphological Agreement. Improving Out-of-Scope Detection in Intent Classification by Using Embeddings of the Word Graph Space of the Classes. Textual Data Augmentation for Efficient Active Learning on Tiny Datasets. 
Variational Hierarchical Dialog Autoencoder for Dialog State Tracking Data Augmentation. PowerTransformer: Unsupervised Controllable Revision for Biased Language Correction. TED-CDB: A Large-Scale Chinese Discourse Relation Dataset on TED Talks. When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models. Exploring and Predicting Transferability across NLP Tasks. BERT Knows Punta Cana is not just beautiful, it's gorgeous: Ranking Scalar Adjectives with Contextualised Representations. Improving AMR Parsing with Sequence-to-Sequence Pre-training. Interpreting Open-Domain Modifiers: Decomposition of Wikipedia Categories into Disambiguated Property-Value Pairs. Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks. Generating Fact Checking Briefs. Intrinsic Probing through Dimension Selection. Multi-Fact Correction in Abstractive Text Summarization. The Secret is in the Spectra: Predicting Cross-lingual Task Performance with Spectral Similarity Measures. APE: Argument Pair Extraction from Peer Review and Rebuttal via Multi-task Learning. doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset. Direct Segmentation Models for Streaming Speech Translation. An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels. COGS: A Compositional Generalization Challenge Based on Semantic Interpretation. H2KGAT: Hierarchical Hyperbolic Knowledge Graph Attention Network. Consistency of a Recurrent Language Model With Respect to Incomplete Decoding. MEGA RST Discourse Treebanks with Structure and Nuclearity from Scalable Distant Sentiment Supervision. Learning to Pronounce Chinese Without a Pronunciation Dictionary. Parallel Interactive Networks for Multi-Domain Dialogue State Generation. Bridging the Gap between Prior and Posterior Knowledge Selection for Knowledge-Grounded Dialogue Generation. When BERT Plays the Lottery, All Tickets Are Winning. 
Solving Historical Dictionary Codes with a Neural Language Model. Digital Voicing of Silent Speech. DORB: Dynamically Optimizing Multiple Rewards with Bandits. Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multihop Question-Answering. With Little Power Comes Great Responsibility. With More Contexts Comes Better Performance: Contextualized Sense Embeddings for All-Round Word Sense Disambiguation. Multi-Step Inference for Reasoning Over Paragraphs. What do Models Learn from Question Answering Datasets?. Multi-resolution Annotations for Emoji Prediction. The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions. LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool. Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems. META: Metadata-Empowered Weak Supervision for Text Classification. Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation. An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction. Learning a Cost-Effective Annotation Policy for Question Answering. Universal Natural Language Processing with Limited Annotations: Try Few-shot Textual Entailment as a Start. Generationary or "How We Went beyond Word Sense Inventories and Learned to Gloss". Improving Grammatical Error Correction Models with Purpose-Built Adversarial Examples. Probing Pretrained Language Models for Lexical Semantics. Text Segmentation by Cross Segment Attention. Word Rotator's Distance. Centering-based Neural Coherence Modeling with Hierarchical Discourse Segments. ProtoQA: A Question Answering Dataset for Prototypical Common-Sense Reasoning. Substance over Style: Document-Level Targeted Content Transfer. Improving Text Generation with Student-Forcing Optimal Transport. RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark.
Message Passing for Hyper-Relational Knowledge Graphs. Human-centric dialog training via offline reinforcement learning. Analyzing Individual Neurons in Pre-trained Language Models. ConjNLI: Natural Language Inference Over Conjunctive Sentences. An Analysis of Natural Language Inference Benchmarks through the Lens of Negation. Fast semantic parsing with well-typedness guarantees. Online Back-Parsing for AMR-to-Text Generation. DagoBERT: Generating Derivational Morphology with a Pretrained Language Model. Knowledge-guided Open Attribute Value Extraction with Reinforcement Learning. Do sequence-to-sequence VAEs learn global features of sentences?. Dissecting Span Identification Tasks with Performance Prediction. Interview: Large-scale Modeling of Media Dialog with Discourse Patterns and Knowledge Grounding. Seq2Edits: Sequence Transduction Using Span-level Edit Operations. Compressive Summarization with Plausibility and Salience Modeling. DualTKB: A Dual Learning Bridge between Text and Knowledge Base. On Losses for Modern Language Models. OCR Post Correction for Endangered Language Texts. From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers. Evaluating the Factual Consistency of Abstractive Text Summarization. Weakly Supervised Subevent Knowledge Acquisition. PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long Text Generation. QADiscourse - Discourse Relations as QA Pairs: Representation, Crowdsourcing and Baselines. VolTAGE: Volatility Forecasting via Text Audio Fusion with Graph Convolution Networks for Earnings Calls. PARADE: A New Dataset for Paraphrase Identification Requiring Computer Science Domain Knowledge. Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning. Affective Event Classification with Discourse-enhanced Self-training. Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank. 
Condolence and Empathy in Online Communities. Hierarchical Evidence Set Modeling for Automated Fact Extraction and Verification. Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning. Which *BERT? A Survey Organizing Contextualized Encoders. An Empirical Study of Generation Order for Machine Translation. Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifiers. Training Question Answering Models From Synthetic Data. Dense Passage Retrieval for Open-Domain Question Answering. Template Guided Text Generation for Task-Oriented Dialogue. Compositional Demographic Word Embeddings. Biomedical Event Extraction as Sequence Labeling. INSPIRED: Toward Sociable Recommendation Dialog Systems. X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models. Autoregressive Knowledge Distillation through Imitation Learning. Cross-Thought for Sentence Encoder Pre-training. An Exploration of Arbitrary-Order Sequence Labeling via Energy-Based Inference Networks. Self-Supervised Knowledge Triplet Learning for Zero-Shot Question Answering. End-to-End Slot Alignment and Recognition for Cross-Lingual NLU. An Embedding Model for Estimating Legislative Preferences from the Frequency and Sentiment of Tweets. Distilling Multiple Domains for Neural Machine Translation. Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation. Tackling the Low-resource Challenge for Canonical Segmentation. Quantitative Argument Summarization and Beyond: Cross-Domain Key Point Analysis. On the Ability and Limitations of Transformers to Recognize Formal Languages. Visually Grounded Compound PCFGs. Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics. A Computational Approach to Understanding Empathy Expressed in Text-Based Mental Health Support. Zero-Shot Stance Detection: A Dataset and Model using Generalized Topic Representations. 
Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations. Program Enhanced Fact Verification with Verbalization and Graph Attention Network. DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks. CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs. Neural Topic Modeling with Cycle-Consistent Adversarial Training. Hierarchical Graph Network for Multi-hop Question Answering. AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. Data and Representation for Turkish Natural Language Inference. Routing Enforced Generative Model for Recipe Generation. TernaryBERT: Distillation-aware Ultra-low Bit BERT. We Can Detect Your Bias: Predicting the Political Ideology of News Articles. Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks. Lightweight, Dynamic Graph Convolutional Networks for AMR-to-Text Generation. Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning. Evaluating and Characterizing Human Rationales. Ensemble Distillation for Structured Prediction: Calibrated, Accurate, Fast---Choose Three. Efficient Meta Lifelong-Learning with Limited Memory. What is More Likely to Happen Next? Video-and-Language Future Event Prediction. IGT2P: From Interlinear Glossed Texts to Paradigms. TESA: A Task in Entity Semantic Aggregation for Abstractive Summarization. Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models. HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training. MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics. Learning VAE-LDA Models with Rounded Reparameterization Trick. Disentangle-based Continual Graph Representation Learning. Knowledge Graph Alignment with Entity-Pair Embedding. Improving Word Sense Disambiguation with Translations.
Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually). Adversarial Attack and Defense of Structured Prediction Models. A Spectral Method for Unsupervised Multi-Document Summarization. T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack. Controllable Meaning Representation to Text Generation: Linearization and Data Augmentation Strategies. Do "Undocumented Workers" == "Illegal Aliens"? Differentiating Denotation and Connotation in Vector Spaces. Annotating Temporal Dependency Graphs via Crowdsourcing. A Visually-grounded First-person Dialogue Dataset with Verbal and Non-verbal Responses. Improving Neural Topic Models using Knowledge Distillation. FedED: Federated Learning via Ensemble Distillation for Medical Relation Extraction. Entity Enhanced BERT Pre-training for Chinese NER. FIND: Human-in-the-Loop Debugging Deep Text Classifiers. Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data. Fact or Fiction: Verifying Scientific Claims. Domain Knowledge Empowered Structured Neural Net for End-to-End Event Temporal Relation Extraction. IIRC: A Dataset of Incomplete Information Reading Comprehension Questions. Counterfactual Off-Policy Training for Neural Dialogue Generation. Content Planning for Neural Story Generation with Aristotelian Rescoring. Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition. PyMT5: multi-mode translation of natural language and Python code with transformers. SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning. CMU-MOSEAS: A Multimodal Language Dataset for Spanish, Portuguese, German and French. Less is More: Attention Supervision with Counterfactuals for Text Classification. Unsupervised Stance Detection for Arguments from Consequences. Plan ahead: Self-Supervised Text Planning for Paragraph Completion Task. 
Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning. MODE-LSTM: A Parameter-efficient Recurrent Network with Multi-Scale for Sentence Classification. Pre-training Entity Relation Encoder with Intra-span and Inter-span Information. Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training. Modeling Protagonist Emotions for Emotion-Aware Storytelling. A Dataset for Tracking Entities in Open Domain Procedural Text. Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning. Unsupervised Cross-Lingual Part-of-Speech Tagging for Truly Low-Resource Scenarios. Natural Language Processing for Achieving Sustainable Development: the Case of Neural Labelling to Enhance Community Profiling. PathQG: Neural Question Generation from Facts. A Method for Building a Commonsense Inference Dataset based on Basic Events. Diversified Multiple Instance Learning for Document-Level Multi-Aspect Sentiment Classification. Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation. Help! Need Advice on Identifying Advice. Keeping Up Appearances: Computational Modeling of Face Acts in Persuasion Oriented Discussions. Structured Pruning of Large Language Models. TeMP: Temporal Message Passing for Temporal Knowledge Graph Completion. Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings. Methods for Numeracy-Preserving Word Embeddings. Friendly Topic Assistant for Transformer Based Abstractive Summarization. Word Frequency Does Not Predict Grammatical Knowledge in Language Models. HABERTOR: An Efficient and Effective Deep Hatespeech Detector. Multimodal Joint Attribute Prediction and Value Extraction for E-commerce Product. Unsupervised Adaptation of Question Answering Systems via Generative Self-training. What Do Position Embeddings Learn? 
An Empirical Study of Pre-Trained Language Model Positional Encoding. Explainable Automated Fact-Checking for Public Health Claims. Regularizing Dialogue Generation by Imitating Implicit Scenarios. Generating Dialogue Responses from a Semantic Latent Space. MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering. Partially-Aligned Data-to-Text Generation with Distant Supervision. Multi-Instance Multi-Label Learning Networks for Aspect-Category Sentiment Analysis. Inquisitive Question Generation for High Level Text Comprehension. Semi-Supervised Bilingual Lexicon Induction with Two-way Interaction. Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference. Sparsity Makes Sense: Word Sense Disambiguation Using Sparse Contextualized Word Representations. Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses. Learning Adaptive Segmentation Policy for Simultaneous Translation. Unsupervised Natural Language Inference via Decoupled Multimodal Contrastive Learning. SLURP: A Spoken Language Understanding Resource Package. Unsupervised Parsing with S-DIORA: Single Tree Encoding for Deep Inside-Outside Recursive Autoencoders. Back to the Future: Unsupervised Backprop-based Decoding for Counterfactual and Abductive Commonsense Reasoning. Assessing Phrasal Representation and Composition in Transformers. Position-Aware Tagging for Aspect Sentiment Triplet Extraction. Two are Better than One: Joint Entity and Relation Extraction with Table-Sequence Encoders. Causal Inference of Script Knowledge. VCDM: Leveraging Variational Bi-encoding and Deep Contextualized Word Representations for Improved Definition Modeling. A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised Learning. Wasserstein Distance Regularized Sequence Representation for Text Matching in Asymmetrical Domains. CHARM: Inferring Personal Attributes from Conversations. 
Assessing the Helpfulness of Learning Materials with Inference-Based Learner-Like Agent. Training for Gibbs Sampling on Conditional Random Fields with Neural Scoring Factors. Quantifying Intimacy in Language. Sentiment Analysis of Tweets using Heterogeneous Multi-layer Network Representation and Embedding. A Bilingual Generative Transformer for Semantic Sentence Embedding. Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision. ChrEn: Cherokee-English Machine Translation for Endangered Language Revitalization. Introducing Syntactic Structures into Target Opinion Word Extraction with Deep Learning. Better Highlighting: Creating Sub-Sentence Summary Highlights. Coding Textual Inputs Boosts the Accuracy of Neural Networks. Contrastive Distillation on Intermediate Representations for Language Model Compression. Stepwise Extractive Summarization and Planning with Structured Transformers. Blank Language Models. Selection and Generation: Learning towards Multi-Product Advertisement Post Generation. Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models. Neural Extractive Summarization with Hierarchical Attentive Heterogeneous Graph Network. Generating Diverse Translation from Model Distribution with Dropout. Identifying Exaggerated Language. Deep Weighted MaxSAT for Aspect-based Opinion Extraction. Learning from Task Descriptions. PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation. An Empirical Investigation of Contextualized Number Prediction. Form2Seq : A Framework for Higher-Order Form Structure Extraction. A Predicate-Function-Argument Annotation of Natural Language for Open-Domain Information eXpression. AutoQA: From Databases To QA Semantic Parsers With Only Synthetic Training Data. Recurrent Interaction Network for Jointly Extracting Entities and Classifying Relations. EmoTag1200 👍: Understanding the Association between Emojis 😄 and Emotions 😻. 
Combining Self-Training and Self-Supervised Learning for Unsupervised Disfluency Detection. Modularized Transfomer-based Ranking Framework. Analyzing Redundancy in Pretrained Transformer Models. Semantic Evaluation for Text-to-SQL with Distilled Test Suites. RNNs can generate bounded hierarchical languages with optimal memory. Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments. Re-evaluating Evaluation in Text Summarization. Measuring the Similarity of Grammatical Gender Systems by Comparing Partitions. Multi-Stage Pre-training for Automated Chinese Essay Scoring. To Schedule or not to Schedule: Extracting Task Specific Temporal Entities and Associated Negation Constraints. Unified Feature and Instance Based Domain Adaptation for Aspect-Based Sentiment Analysis. MovieChats: Chat like Humans in a Closed Domain. Conundrums in Entity Coreference Resolution: Making Sense of the State of the Art. End-to-End Emotion-Cause Pair Extraction based on Sliding Window Multi-Label Learning. Localizing Open-Ontology QA Semantic Parsers in a Day Using Machine Translation. An Unsupervised Sentence Embedding Method by Mutual Information Maximization. Temporal Knowledge Base Completion: New Algorithms and Evaluation Protocols. On the Sentence Embeddings from Pre-trained Language Models. Weakly Supervised Learning of Nuanced Frames for Analyzing Polarization in News Media. OpenIE6: Iterative Grid Labeling and Coordination Analysis for Open Information Extraction. Interpretable Multi-dataset Evaluation for Named Entity Recognition. A Knowledge-driven Generative Model for Multi-implication Chinese Medical Procedure Entity Normalization. RethinkCWS: Is Chinese Word Segmentation a Solved Task?. Few-Shot Complex Knowledge Base Question Answering via Meta Reinforcement Learning. Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information.
Self-Paced Learning for Neural Machine Translation. "You are grounded!": Latent Name Artifacts in Pre-trained Language Models. The role of context in neural pitch accent detection in English. How Much Knowledge Can You Pack Into the Parameters of a Language Model?. Small but Mighty: New Benchmarks for Split and Rephrase. Learning Physical Common Sense as Knowledge Graph Completion via BERT Data Augmentation and Constrained Tucker Factorization. Learning Structured Representations of Entity Names using Active Learning and Weak Supervision. Semantically Inspired AMR Alignment for the Portuguese Language. Reactive Supervision: A New Method for Collecting Sarcasm Data. BAE: BERT-based Adversarial Examples for Text Classification. A Simple Approach to Learning Unsupervised Multilingual Embeddings. Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube. Some Languages Seem Easier to Parse Because Their Treebanks Leak. A Simple and Effective Model for Answering Multi-span Questions. Detecting Independent Pronoun Bias with Partially-Synthetic Data Generation. The Thieves on Sesame Street are Polyglots - Extracting Multilingual Models from Monolingual APIs. Distilling Structured Knowledge for Text-Based Relational Reasoning. Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles. Reasoning about Goals, Steps, and Temporal Ordering with WikiHow. An information theoretic view on selecting linguistic probes. Exploiting Sentence Order in Document Alignment. Semantic Role Labeling Guided Multi-turn Dialogue ReWriter. Adversarial Semantic Decoupling for Recognizing Open-Vocabulary Slots. Bootstrapped Q-learning with Context Relevant Observation Pruning to Generalize in Text-based Games. Is Graph Structure Necessary for Multi-hop Question Answering?. Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT. 
Simulated Multiple Reference Training Improves Low-Resource Machine Translation. Don't Use English Dev: On the Zero-Shot Cross-Lingual Evaluation of Contextual Embeddings. Severing the Edge Between Before and After: Neural Architectures for Temporal Ordering of Events. The Multilingual Amazon Reviews Corpus. Sound Natural: Content Rephrasing in Dialog Systems. Modularized Syntactic Neural Networks for Sentence Classification. Incremental Neural Coreference Resolution in Constant Memory. The importance of fillers for text representations of speech transcripts. Table Fact Verification with Structure-Aware Transformer. Public Sentiment Drift Analysis Based on Hierarchical Variational Auto-encoder. Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering. PatchBERT: Just-in-Time, Out-of-Vocabulary Patching. Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems. An Element-aware Multi-representation Model for Law Article Prediction. Keep it Surprisingly Simple: A Simple First Order Graph Based Parsing Model for Joint Morphosyntactic Parsing in Sanskrit. Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages. Pretrained Language Model Embryology: The Birth of ALBERT. Train No Evil: Selective Masking for Task-Guided Pre-Training. Deconstructing word embedding algorithms. On the importance of pre-training data volume for compact language models. Sequence-Level Mixed Sample Data Augmentation. Iterative Feature Mining for Constraint-Based Data Collection to Increase Data Diversity and Model Robustness. Conversational Document Prediction to Assist Customer Care Agents. Towards Better Context-aware Lexical Semantics: Adjusting Contextualized Representations through Static Anchors. Improving Low Compute Language Modeling with In-Domain Embedding Initialisation. Learning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning. 
Utility is in the Eye of the User: A Critique of NLP Leaderboards. Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers. Text Graph Transformer for Document Classification. Structure Aware Negative Sampling in Knowledge Graphs. Learning to Ignore: Long Document Coreference with Bounded Memory Neural Networks. Denoising Relation Extraction from Document-level Distant Supervision. Exploring Contextualized Neural Language Models for Temporal Dependency Parsing. BioMegatron: Larger Biomedical Domain Language Model. Generating Label Cohesive and Well-Formed Adversarial Claims. AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network. Revealing the Myth of Higher-Order Inference in Coreference Resolution. Parsing Gapping Constructions Based on Grammatical and Semantic Roles. Multi-label Few/Zero-shot Learning with Knowledge Aggregated from Multiple Label Graphs. Domain Adaptation of Thai Word Segmentation Models using Stacked Ensemble. Exploring the Role of Argument Structure in Online Debate Persuasion. Task-oriented Domain-specific Meta-Embedding for Text Classification. Unsupervised Text Style Transfer with Padded Masked Language Models. Suicidal Risk Detection for Military Personnel. Towards Modeling Revision Requirements in wikiHow Instructions. Don't Neglect the Obvious: On the Role of Unambiguous Words in Word Sense Disambiguation. Please Mind the Root: Decoding Arborescences for Dependency Parsing. Beyond [CLS] through Ranking by Generation. Modeling Content Importance for Summarization with Pre-trained Language Models. On the Sparsity of Neural Machine Translation Models. Multilingual Offensive Language Identification with Cross-lingual Embeddings. Translation Quality Estimation by Jointly Learning to Score and Rank. Multi-Stage Pre-training for Low-Resource Domain Adaptation.