Invited Talks
Lama Ahmad | Third party assessments and assurances for deploying safer AI systems | (Speaker) |
David Duvenaud | What if we succeed? Lab plans and possible post-AGI futures | (Speaker) |
Mandy Wang | LLMs and Scientific Research: Promise and Challenges | (Speaker) |
Key Dates
Submissions Open on OpenReview | June 5, 2025 |
Submission Deadline | June 23, 2025 (11:59 PM AoE) |
Acceptance Notification | July 24, 2025 (11:59 PM AoE) |
Camera-Ready Deadline | August 15, 2025 (11:59 PM AoE) |
Workshop Date | October 10, 2025 |
All deadlines are specified in AoE (Anywhere on Earth).
Description/Call For Papers
The Socially Responsible Language Modelling Research (SoLaR) workshop at COLM 2025 is an interdisciplinary gathering that aims to foster responsible and ethical research in the field of language modelling. Recognizing the significant risks and harms [33–37] associated with the development, deployment, and use of language models, the workshop emphasizes the need for researchers to address these risks starting from the earliest stages of development. The workshop brings together experts and practitioners from various domains and academic fields with a shared commitment to promoting fairness, equity, accountability, transparency, and safety in language modelling research.
Given the wide-ranging impacts of LMs, our workshop will welcome a broad array of submissions. We will review work in these areas in two separate tracks with separate reviewer pools. The ML track is for mathematical, algorithmic and computational papers related to responsible language modelling (including typical ML-style papers). The socio-technical track is for empirical work that falls outside of standard ML paradigms (e.g., human participant research, focus on broader societal impacts) or theoretical, philosophical and policy contributions (including position papers). We provide a brief illustrative list of works we would welcome.
Some specific topic areas and an illustrative selection of pertinent works for the ML track are:
- Security and privacy concerns of LMs [13, 30, 25, 49, 55].
- Bias and exclusion in LMs [12, 2, 26, 53, 44].
- Analysis of the development and deployment of LMs, including crowdwork [42, 50], deployment protocols [52, 47], and societal impacts from deployment [10, 21].
- Safety, robustness, and alignment of LMs [51, 8, 35, 32, 7].
- Auditing, red-teaming, and evaluations of LMs [41, 40, 29, 15, 11].
- Examination of risks and harms from any novel input and/or output modalities that are introduced in LMs [14, 28, 54].
- Transparency, explainability, interpretability of LMs [39, 17, 3, 46, 22, 38].
- Applications of LMs for social good, including sector-specific applications [9, 31, 16] and LMs for low-resource languages [4, 5, 36].
- Perspectives from other domains that can inform socially responsible LM development and deployment [48, 1].
- Studies on economic impacts of LMs, e.g., labor-market disruptions [18, 34].
- Risk assessment [33, 24, 37, 23].
- Regulation and governance of LMs [45, 6, 27].
- Philosophical examination of concepts related to alignment and safety [19, 43, 20].
The previous edition of the workshop was co-located with NeurIPS 2024. Details can be found here.
References
[1] The Grey Hoodie Project: Big Tobacco, Big Tech, and the Threat on
Academic Integrity. In AIES 2021.
[2] Persistent Anti-Muslim Bias in Large Language Models. In AIES 2021.
[3] Post hoc Explanations may be Ineffective for Detecting Unknown
Spurious Correlation. In ICLR 2022.
[4] A Few Thousand Translations Go a Long Way! Leveraging Pre-trained
Models for African News Translation. In NAACL 2022.
[5] MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity
Recognition. In EMNLP 2022.
[6] Frontier AI Regulation: Managing Emerging Risks to Public Safety, Sept. 2023. URL
http://arxiv.org/abs/2307.03718.
[7] Foundational challenges in assuring alignment and safety of large
language models. arXiv preprint arXiv:2404.09932, 2024.
[8] Training a Helpful and Harmless Assistant with Reinforcement Learning
from Human Feedback, Apr. 2022. URL http://arxiv.org/abs/2204.05862.
arXiv:2204.05862 [cs].
[9] Fine-tuning language models to find agreement among humans with
diverse preferences. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho,
editors, Advances in Neural Information Processing Systems, 2022.
[10] On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and
Transparency, FAccT ’21.
[11] AI auditing: The broken bus on the road to AI accountability. arXiv
preprint arXiv:2401.14462, 2024.
[12] Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness
Benchmark Datasets. In Proceedings of the 59th Annual Meeting of the
Association for Computational Linguistics and the 11th International Joint
Conference on Natural Language Processing (Volume 1: Long Papers), pages
1004–1015, Online, Aug. 2021.
[13] What Does it Mean for a Language Model to Preserve Privacy? In 2022
ACM Conference on Fairness, Accountability, and Transparency. ACM, June
2022.
[14] Are aligned neural networks adversarially aligned? Advances in Neural
Information Processing Systems, 36, 2023.
[15] Black-box access is insufficient for rigorous AI audits. arXiv
preprint arXiv:2401.14446, 2024.
[16] Analyzing Polarization in Social Media: Method and Application to
Tweets on 21 Mass Shootings. In Proceedings of the 2019 Conference of the
North American Chapter of the Association for Computational Linguistics:
Human Language Technologies, Volume 1 (Long and Short Papers), pages
2970–3005, Minneapolis, Minnesota, June 2019.
[17] Towards A Rigorous Science of Interpretable Machine Learning, Mar.
2017. URL http://arxiv.org/abs/1702.08608.
[18] GPTs are GPTs: An Early Look at the Labor Market Impact Potential of
Large Language Models, Aug. 2023. arXiv:2303.10130.
[19] Artificial intelligence, values, and alignment. Minds and Machines,
30(3):411–437, 2020.
[20] The ethics of advanced AI assistants. arXiv preprint arXiv:2404.16244,
2024.
[21] Predictability and Surprise in Large Generative Models. In 2022 ACM
Conference on Fairness, Accountability, and Transparency, FAccT ’22, pages
1747–1764, New York, NY, USA, June 2022. Association for Computing
Machinery.
[22] Datasheets for datasets. Communications of the ACM, 64(12):86–92,
Dec. 2021.
[23] The false promise of risk assessments. In Proceedings of the 2020
Conference on Fairness, Accountability, and Transparency. ACM, Jan. 2020.
[24] Algorithmic Risk Assessments Can Alter Human Decision-Making
Processes in High-Stakes Government Contexts. Proceedings of the ACM on
Human-Computer Interaction, 5(CSCW2):418:1–418:33, Oct. 2021.
[25] Predictability and Surprise in Large Generative Models. In 2022 ACM
Conference on Fairness, Accountability, and Transparency, FAccT ’22, pages
1747–1764, New York, NY, USA, June 2022.
[26] Datasheets for datasets. Communications of the ACM, 64(12):86–92,
Dec. 2021.
[27] The false promise of risk assessments. In Proceedings of the 2020
Conference on Fairness, Accountability, and Transparency. ACM, Jan. 2020.
[28] Algorithmic Risk Assessments Can Alter Human Decision-Making
Processes in High-Stakes Government Contexts. Proceedings of the ACM on
Human-Computer Interaction, 5(CSCW2):418:1–418:33, Oct. 2021.
[29] Not what you’ve signed up for: Compromising Real-World LLM-Integrated
Applications with Indirect Prompt Injection, May 2023.
[30] Bias runs deep: Implicit reasoning biases in persona-assigned LLMs.
arXiv preprint arXiv:2311.04892, 2023.
[31] A Real-World WebAgent with Planning, Long Context Understanding, and
Program Synthesis, Feb. 2024. URL http://arxiv.org/abs/2307.12856.
arXiv:2307.12856 [cs].
[32] The Future of AI Governance, Apr. 2023. URL
http://arxiv.org/abs/2304.04914.
[33] Uncovering bias in large vision-language models with counterfactuals.
arXiv preprint arXiv:2404.00166, 2024.
[34] Automatically Auditing Large Language Models via Discrete
Optimization, Mar. 2023. URL http://arxiv.org/abs/2303.04381.
[35] Deduplicating Training Data Mitigates Privacy Risks in Language
Models. In Proceedings of the 39th International Conference on Machine
Learning, pages 10697–10707. PMLR, June 2022.
[36] ChatGPT for good? On opportunities and challenges of large language
models for education. Learning and Individual Differences, 103:102274,
Apr. 2023.
[37] Alignment of Language Agents, Mar. 2021. URL
http://arxiv.org/abs/2103.14659.
[38] Model Cards for Model Reporting. In Proceedings of the Conference on
Fairness, Accountability, and Transparency, pages 220–229, Jan. 2019.
[39] In-context Learning and Induction Heads. Transformer Circuits Thread,
2022.
[40] Do the Rewards Justify the Means? Measuring Trade-Offs Between
Rewards and Ethical Behavior in the MACHIAVELLI Benchmark, Apr. 2023.
[41] Discovering Language Model Behaviors with Model-Written Evaluations,
Dec. 2022. URL http://arxiv.org/abs/2212.09251.
[42] The Coloniality of Data Work in Latin America. In Proceedings of the
2021 AAAI/ACM Conference on AI, Ethics, and Society. ACM, July 2021.
[43] A human rights-based approach to responsible AI. arXiv preprint
arXiv:2210.02667, 2022.
[44] AI's regimes of representation: A community-centered study of
text-to-image models in South Asia. In Proceedings of the 2023 ACM
Conference on Fairness, Accountability, and Transparency, pages 506–517,
2023.
[45] Outsider Oversight: Designing a Third Party Audit Ecosystem for AI
Governance. In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics,
and Society. ACM, July 2022.
[46] Stop explaining black box machine learning models for high stakes
decisions and use interpretable models instead. Nature Machine
Intelligence, 1(5):206–215, May 2019.
[47] Structured access: an emerging paradigm for safe AI deployment, Apr.
2022. URL http://arxiv.org/abs/2201.05159. arXiv:2201.05159 [cs].
[48] The Offense-Defense Balance of Scientific Knowledge: Does Publishing
AI Research Reduce Misuse? In Proceedings of the AAAI/ACM
Conference on AI, Ethics, and Society, AIES ’20, pages 173–179, New York,
NY, USA, Feb. 2020.
[49] Detecting pretraining data from large language models. arXiv preprint
arXiv:2310.16789, 2023.
[50] Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing. In
Proceedings of the 2021 Conference of the North American Chapter of the
Association for Computational Linguistics: Human Language Technologies,
pages 3758–3769, Online, June 2021. Association for Computational
Linguistics.
[51] Defining and Characterizing Reward Hacking. In A. H. Oh, A.
Agarwal, D. Belgrave, and K. Cho, editors, Advances in Neural Information
Processing Systems, 2022.
[52] The Gradient of Generative AI Release: Methods and Considerations,
Feb. 2023.
[53] “Kelly is a warm person, Joseph is a role model”: Gender biases in
LLM-generated reference letters. arXiv preprint arXiv:2310.09219, 2023.
[54] Debiasing large visual language models. arXiv preprint
arXiv:2403.05262, 2024.
[55] Universal and transferable adversarial attacks on aligned language
models. arXiv preprint arXiv:2307.15043, 2023.