{"id":228,"date":"2025-03-12T08:49:23","date_gmt":"2025-03-12T08:49:23","guid":{"rendered":"https:\/\/cilc2025.usal.es\/?page_id=228"},"modified":"2025-05-02T07:26:21","modified_gmt":"2025-05-02T07:26:21","slug":"plenarios","status":"publish","type":"page","link":"https:\/\/cilc2025.usal.es\/en\/plenarios\/","title":{"rendered":"Keynote Speakers"},"content":{"rendered":"\n<div class=\"panel-group kt-accordion\" id=\"accordionname338\"><div class=\"panel panel-default panel-even\"><div class=\"panel-heading\"><a class=\"accordion-toggle collapsed\" data-toggle=\"collapse\" data-parent=\"#accordionname338\" href=\"#collapse3380\"><h5><i class=\"icon-minus kt-icon-minus primary-color\"><\/i><i class=\"icon-plus kt-icon-plus\"><\/i>Marc Alexander and James Balfour<\/h5><\/a><\/div><div id=\"collapse3380\" class=\"panel-collapse collapse \"><div class=\"panel-body postclass\">\n<p>&nbsp;<\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-236\" src=\"https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Alexander-Balfour-1024x526.png\" alt=\"\" width=\"682\" height=\"350\" srcset=\"https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Alexander-Balfour-1024x526.png 1024w, https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Alexander-Balfour-300x154.png 300w, https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Alexander-Balfour-768x394.png 768w, https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Alexander-Balfour-1536x788.png 1536w, https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Alexander-Balfour.png 1623w\" sizes=\"auto, (max-width: 682px) 100vw, 682px\" \/><\/p>\n<p style=\"text-align: center;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif; font-size: 12pt;\"><strong>\u2018up yours at the end of life\u2019: Opposition, Emotion, and Overlap in a Corpus of Scottish Debates 
on\u00a0Assisted\u00a0Dying<\/strong><\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif; font-size: 10pt;\">Marc Alexander and James Balfour<\/span><br \/>\n<span style=\"font-family: 'trebuchet ms', geneva, sans-serif; font-size: 10pt;\">University of Glasgow<\/span><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">In this plenary, we report findings from a recent corpus-based study of public discourse surrounding the Assisted Dying for Terminally Ill Adults (Scotland) Bill, which proposes to legalize medically assisted death for terminally ill, mentally competent adults. Public opinion is sharply divided: faith groups generally oppose the bill, while polls show over 70% public support. The study analyzes both public consultation responses and media coverage to understand the key attitudes and arguments.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">The research comprises two complementary studies. The first analyzes public responses to government consultations, drawing on two datasets: 12,314 written submissions from 2022 (2.1 million words) and 7,236 responses from 2024 (1.8 million words). The second examines media coverage of assisted dying in the UK between 2022 and 2024 (6,360 texts). 
By examining language patterns in the two datasets in tandem, we reflect on the complex interaction between public attitudes towards a sensitive and contentious topic and the role media framing plays in shaping public debate.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">To examine oppositional discourse in the public responses, we first identify lexical choices exclusive to each group: supporters used terms such as &#8220;hideous,&#8221; &#8220;abject,&#8221; &#8220;urine,&#8221; and &#8220;linger,&#8221; while opponents used words such as &#8220;eroded,&#8221; &#8220;burden,&#8221; &#8220;wedge,&#8221; and &#8220;shalt.&#8221; Second, we tag the corpus with Wmatrix and compare key semantic domains across the two groups, which reveals that supporters&#8217; responses contain more emotional content (particularly in the &#8220;Sad&#8221; domain), while opponents&#8217; responses are more analytical. Third, we use n-gram analysis to identify common ground between opposing viewpoints and to detect potentially copy-pasted responses within each group.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">For the second part of the study, we investigate media influence on the debate by analyzing 6,360 news articles published between 2022 and 2024 that explicitly reference assisted dying. Using keyword analysis, n-grams, and concordance analysis, we examine how newspapers with different political affiliations framed the debate. 
Our findings suggest significant overlap between media narratives and public consultation responses, with some phraseology nearly identical across the two datasets.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">The research extends beyond the specific assisted dying debate to address broader methodological questions about analyzing oppositional discourse in public life. The study revealed distinct rhetorical patterns: supporters of the bill tended to use more emotionally charged language and personal narratives, while opponents favored analytical and consequence-based arguments. The media analysis shows how news coverage may reinforce these polarized positions through consistent framing patterns. The bill&#8217;s consideration in Scotland takes place against the backdrop of similar debates in England, where recent legislative efforts have faced setbacks. The research also notes the particular challenges of analyzing representative corpora in highly polarized debates, where responses may range from deeply personal experiences to organized campaign submissions.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">This analysis of both public and media discourse offers insights into how contentious healthcare policy debates are framed and argued across different platforms and stakeholder groups. 
The methodological approach developed here offers a framework for analyzing other polarized public debates, while the findings contribute to our understanding of how public opinion forms and expresses itself on complex ethical issues.<\/span><\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-odd\"><div class=\"panel-heading\"><a class=\"accordion-toggle collapsed\" data-toggle=\"collapse\" data-parent=\"#accordionname338\" href=\"#collapse3381\"><h5><i class=\"icon-minus kt-icon-minus primary-color\"><\/i><i class=\"icon-plus kt-icon-plus\"><\/i>Pascual Cantos<\/h5><\/a><\/div><div id=\"collapse3381\" class=\"panel-collapse collapse \"><div class=\"panel-body postclass\">\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-255\" src=\"https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Pascual-Cantos-298x300.png\" alt=\"\" width=\"348\" height=\"350\" srcset=\"https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Pascual-Cantos-298x300.png 298w, https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Pascual-Cantos-150x150.png 150w, https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Pascual-Cantos-768x773.png 768w, https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Pascual-Cantos.png 846w\" sizes=\"auto, (max-width: 348px) 100vw, 348px\" \/><\/p>\n<p style=\"text-align: center;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif; font-size: 12pt;\"><strong>Generative Artificial Intelligence in the Labyrinth of Specialized Translation<\/strong><\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt; font-family: 'trebuchet ms', geneva, sans-serif;\">Pascual Cantos<\/span><br \/>\n<span style=\"font-size: 10pt; font-family: 'trebuchet ms', geneva, sans-serif;\">Universidad de Murcia<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, 
En este estudio">
sans-serif;\">This study analyzes the effectiveness of generative artificial intelligence (AI) as a machine translation tool for specialized domains. The research rests on a methodological approach that combines corpus linguistics with advanced quantitative techniques, using parallel corpora in the biomedical, legal, and technical fields.\u00a0<strong>These corpora make it possible to assess<\/strong>\u00a0the ability of generative AI systems to handle complex tasks such as lexical precision, discourse cohesion, and contextual adaptation, all of which are critical in specialized translation.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">The research design centers on extensive quantitative analysis. The techniques used include lexical frequency analysis and n-gram pattern analysis, both aimed at assessing terminological consistency and\u00a0<strong>at detecting possible inconsistencies<\/strong>\u00a0in the use of technical vocabulary. Translation quality is evaluated with metrics widely recognized in the machine translation field, such as BLEU (<em>Bilingual Evaluation Understudy<\/em>) and COMET (<em>Crosslingual Optimized Metric for Evaluation of Translation<\/em>). BLEU measures the n-gram overlap between the generated translations and\u00a0<strong>the<\/strong>\u00a0reference translations, whereas COMET takes a more nuanced approach, using neural models trained to predict translation quality from human judgments.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">The study also applies analysis of variance (ANOVA) to explore the contextual adequacy of the generated translations across the different domains. This approach makes it possible to measure how the AI models handle variation in linguistic context and maintain discourse coherence, a key requirement in specialized texts. The methodology also compares the output of the AI systems with human translations, considering parameters such as semantic accuracy and stylistic adaptability.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">An essential part of the methodological design is the preparation and curation of the corpora. These parallel corpora are built from authoritative sources in the three selected domains, ensuring adequate representation of the terminology and discourse styles specific to each field. Priority is given to authentic texts that reflect realistic use of technical language, which allows a more accurate evaluation of the capabilities of generative models.\u00a0<strong>The careful selection of these texts is key to ensuring the validity of the conclusions drawn.<\/strong><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">The main objective of this study is to provide a comprehensive evaluation of generative AI in specialized machine translation. The research not only examines the current capabilities of these models but also lays the groundwork for optimizing their performance through more domain-specific training corpora and better evaluation strategies.\u00a0<strong>It also seeks to establish a methodological framework that can be replicated in future studies of machine translation in other specialized domains.<\/strong>\u00a0This approach, which combines detailed linguistic analysis with quantitative evaluation techniques, contributes to a deeper understanding of the potential of generative AI in professional translation settings.<\/span><\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-even\"><div class=\"panel-heading\"><a class=\"accordion-toggle collapsed\" data-toggle=\"collapse\" data-parent=\"#accordionname338\" href=\"#collapse3382\"><h5><i class=\"icon-minus kt-icon-minus primary-color\"><\/i><i class=\"icon-plus kt-icon-plus\"><\/i>Mark Davies<\/h5><\/a><\/div><div id=\"collapse3382\" class=\"panel-collapse collapse \"><div class=\"panel-body postclass\">\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-249\" 
src=\"https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Mark-Davis-1-300x295.png\" alt=\"\" width=\"355\" height=\"350\" srcset=\"https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Mark-Davis-1-300x295.png 300w, https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Mark-Davis-1-768x756.png 768w, https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/Mark-Davis-1.png 852w\" sizes=\"auto, (max-width: 355px) 100vw, 355px\" \/><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 12pt; font-family: 'trebuchet ms', geneva, sans-serif;\"><strong>The Relevance of Large, Structured Corpora in the Age of Large Language Models<\/strong><\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt; font-family: 'trebuchet ms', geneva, sans-serif;\">Mark Davies<\/span><br \/>\n<span style=\"font-size: 10pt; font-family: 'trebuchet ms', geneva, sans-serif;\">Brigham Young University<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">I will provide a summary of the in-depth data from several \u201cwhite papers\u201d at <u>English-Corpora.org,<\/u> on how well the predictions of two prominent Large Language Models (LLMs) match the actual data from several robust corpora, including corpora from Sketch Engine, and several corpora from English-Corpora.org (COCA, COCA, GloWbE, NOW, iWeb, the TV and Movie corpora, and more). 
I will also provide limited data from the three corpora in the Corpus del espa\u00f1ol and the three corpora in the Corpus do portugu\u00eas.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">In terms of strengths, the LLMs arguably provide:<\/span><\/p>\n<ul style=\"text-align: justify;\">\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Much richer collocational data than even the 40 to 50 billion-word corpora from Sketch Engine (especially for low-frequency words). This is due to the advanced word embeddings in high-dimensional space that LLMs use, which are much more powerful than the simple surface-level association measures used in corpus linguistics.<\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Better comparisons of contrasting words (e.g., <em>entire \/ complete<\/em>, <em>nuance \/ subtlety<\/em>, <em>perceive \/ discern<\/em> for English; we will also provide data from Spanish)<\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Much more insightful analyses (generated by the LLMs themselves) of what the collocates tell us about the meaning and usage of words<\/span><\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">The LLMs are surprisingly good (perhaps at the level of some of the best corpora) at:<\/span><\/p>\n<ul style=\"text-align: justify;\">\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Estimating word and phrase frequency (such as rank-ordering a list of 10 to 20 words)<\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Categorizing words and phrases by genre, historical period, and dialect<\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Analyzing variation in word meaning across genres, historical periods, and dialects<\/span><\/li>\n<li><span 
style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Predicting syntactic variation \u2013 between genres, historical periods, and dialects.<\/span><\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">However, there LLMs have the following significant limitations, as far as providing language data and carrying out linguistic analyses:<\/span><\/p>\n<ul style=\"text-align: justify;\">\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">They are much worse at <em>generating<\/em> word and phrase lists (such as those at WordFrequency.info) than in analyzing \/ categorizing existing lists<\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">We can never be sure if they are actually <em>generating<\/em> useful linguistic data themselves (for example, actual data on syntactic variation between genres, time periods, or dialects), or whether they are simply \u201cparroting\u201d something that they have scraped from an article or a web page.<\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">They provide \u201cstatic data\u201d, whereas \u201cfull-featured\u201d corpus sites like English-Corpora.org and Corpusdelespanol.org allow us to see and use links between different words, phrases, and constructions<\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Most importantly, LLMs do not allow us to \u201ccheck the data\u201d (via KWIC entries, metadata, etc) in the same way that we can with structured corpora.<\/span><\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">At the end of the day, it is not an either\/or proposition (either LLMs <em>or<\/em> structured corpora). LLMs are best used <em>in conjunction with<\/em> reliable corpus data. 
Corpus linguists can make use of the rich lexical data from LLMs, and AI\/ML researchers can use corpus data for fine-tuning, distillation, and Retrieval-Augmented Generation (RAG) with LLMs.<\/span><\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-odd\"><div class=\"panel-heading\"><a class=\"accordion-toggle collapsed\" data-toggle=\"collapse\" data-parent=\"#accordionname338\" href=\"#collapse3383\"><h5><i class=\"icon-minus kt-icon-minus primary-color\"><\/i><i class=\"icon-plus kt-icon-plus\"><\/i>Rebekah Wegener<\/h5><\/a><\/div><div id=\"collapse3383\" class=\"panel-collapse collapse \"><div class=\"panel-body postclass\">\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-243 aligncenter\" src=\"https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/rebekah-300x300.png\" alt=\"\" width=\"348\" height=\"350\" srcset=\"https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/rebekah-150x150.png 150w, https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/rebekah-768x772.png 768w, https:\/\/cilc2025.usal.es\/wp-content\/uploads\/sites\/152\/2025\/03\/rebekah.png 835w\" sizes=\"auto, (max-width: 348px) 100vw, 348px\" \/><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 12pt;\"><strong><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">From Big Data to Smart Data: Building Better Datasets for Human-Centric AI with Meaning in Mind<\/span><\/strong><\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Rebekah Wegener<\/span><br \/>\n<span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Paris Lodron University Salzburg<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Recent developments in artificial intelligence, particularly 
those surrounding large language models, have sparked a renewed interest in foundational questions about the nature of human language and how machines process and generate language. These questions echo Halliday&#8217;s (2003) early insights about language as meaning potential and what this means for computational approaches to language. At the same time, these advances highlight fundamental questions about meaning, context, and the relationship between the quantity and quality of data. As Dingemanse and Liesenfeld (2022) argue, creating more representative and meaningful datasets requires going beyond text collection to capture the complexity of human communication.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">To explore these questions further, I want to consider human-centric AI systems, focusing specifically on how such systems are designed and built in industry and academia. These systems have explicit requirements for understanding meaning-making in context, for understanding abstract concepts such as importance, and for understanding multimodal interaction. Drawing on previous work (e.g., Cassens &amp; Wegener, 2018; Wegener, in press), I will demonstrate how such systems showcase the importance of high-quality datasets for AI, particularly human-centric AI, and show how strong theoretical frameworks can inform the design of \u201csmart\u201d datasets that capture the complexity of human meaning-making.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Such endeavours are not without their challenges, and in many respects these challenges mirror long-standing questions in corpus linguistics about context, annotation, and the nature of meaning itself. 
These problems become particularly apparent when working with multimodal data, where meaning emerges not just from individual modes but from their integration and interaction in context (Bateman, Wildfeuer &amp; Hiippala, 2017; O&#8217;Halloran, Tan &amp; Wignell, 2019). The development of tools and methods for handling such complexity requires careful consideration of both theoretical and practical concerns.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">While some of these questions are destined to remain core philosophical debates, others are the driving force behind new tool development. Such tools provide opportunities for addressing methodological challenges within corpus linguistics and, in particular, hold the potential to assist in the study of meaning. I will briefly consider how new tools can be recruited for corpus linguistics and how human-centric AI can also benefit from tools and methods already popular within corpus linguistics (Driess et al., 2023; Henlein et al., 2024).<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">As Dingemanse &amp; Liesenfeld (2022) argue, \u201ccorpora represent an important and mostly untapped resource for language technology.\u201d Understanding meaning and the process of meaning-making requires more than just collecting large amounts of data; amongst many other things, it requires theoretically informed approaches to dataset design, representation, annotation and analysis. For meaning-focused corpus research, \u201cdata comes in levels of granularity. 
A well-curated corpus&#8230;harbour(s) important insights about human interactional infrastructure\u201d (Dingemanse &amp; Liesenfeld, 2022).<\/span><\/p>\n<\/div><\/div><\/div><\/div>\n\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":189,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-228","page","type-page","status-publish","hentry"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/cilc2025.usal.es\/en\/wp-json\/wp\/v2\/pages\/228","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cilc2025.usal.es\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/cilc2025.usal.es\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/cilc2025.usal.es\/en\/wp-json\/wp\/v2\/users\/189"}],"replies":[{"embeddable":true,"href":"https:\/\/cilc2025.usal.es\/en\/wp-json\/wp\/v2\/comments?post=228"}],"version-history":[{"count":19,"href":"https:\/\/cilc2025.usal.es\/en\/wp-json\/wp\/v2\/pages\/228\/revisions"}],"predecessor-version":[{"id":272,"href":"https:\/\/cilc2025.usal.es\/en\/wp-json\/wp\/v2\/pages\/228\/revisions\/272"}],"wp:attachment":[{"href":"https:\/\/cilc2025.usal.es\/en\/wp-json\/wp\/v2\/media?parent=228"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}