
AI and Language Models (LLM)

Environment and Prompts

We have to recognize the negative impact that AI has on the environment. Yet, some of us are going to use some of these tools. If you're using generative AI, these articles will help you build sustainable prompts.

BatchPrompt: Accomplish More with Less  The common approach of placing a single data point in each prompt, which this paper calls “SinglePrompt,” uses tokens inefficiently: when the data input is small relative to the instructions and few-shot examples, token utilization is low compared with encoder-based models such as fine-tuned BERT. This cost inefficiency, which affects both inference speed and compute budget, counteracts many of the benefits that LLMs offer. The paper alleviates this problem by batching multiple data points into each prompt, a strategy it calls “BatchPrompt.” Batching improves token utilization by increasing the “density” of data points, but it cannot be done naively: simple batching degrades performance, especially as batch size increases, and a data point can yield different answers depending on its position within the prompt. To address the quality issue while retaining high token utilization, the authors introduce Batch Permutation and Ensembling (BPE) for BatchPrompt, a simple majority vote over repeated permutations of the data that recovers label quality at the cost of some additional token usage.
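The permutation-and-vote idea can be sketched in a few lines. Here `classify_batch` is a hypothetical stand-in for a single batched LLM call that returns one label per data point; the function name, its signature, and the default round count are illustrative assumptions, not the paper's actual interface:

```python
import random
from collections import Counter

def batch_prompt_bpe(data_points, classify_batch, num_rounds=3, seed=0):
    """Sketch of Batch Permutation and Ensembling (BPE).

    classify_batch(batch) -> list of labels (one per item) stands in for
    a single batched LLM call; its name and signature are assumptions.
    """
    rng = random.Random(seed)
    votes = {i: [] for i in range(len(data_points))}
    for _ in range(num_rounds):
        order = list(range(len(data_points)))
        rng.shuffle(order)  # each round, items sit at different prompt positions
        labels = classify_batch([data_points[i] for i in order])
        for pos, idx in enumerate(order):
            votes[idx].append(labels[pos])
    # majority vote per data point across all permutations
    return [Counter(votes[i]).most_common(1)[0][0] for i in range(len(data_points))]
```

The extra rounds are where the "more token usage" trade-off comes from: each permutation is another full batched call, traded for labels that no longer depend on a single arbitrary ordering.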

Think Before You Prompt: Reduce Your AI Carbon Footprint with ROCKS  The way AI models process our prompts has an impact on the environment. When you type a question into an LLM like ChatGPT, it does not read your message word for word the way a human does. These systems understand us by processing natural language, breaking our words down into manageable, computationally tractable pieces called tokens. The ease with which we can now generate responses using GenAI belies the immense computational power, and environmental cost, behind each interaction.
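Real tokenizers split text into subword units, so exact counts vary by model, but for budgeting prompts a common rule of thumb is roughly four characters of English text per token. A minimal sketch of that heuristic, an approximation only:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters of English per token.

    Actual tokenizers use learned subword vocabularies, so treat this
    as a ballpark for prompt budgeting, not an exact count.
    """
    return max(1, round(len(text) / 4))
```

Shorter, tighter prompts mean fewer tokens processed per request, which is the lever these articles suggest pulling.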

Top Tips on Using Generative AI in an Environmentally Friendly Way  Six quick tips to help ensure that the transformational benefits of generative AI are not outweighed by its environmental costs.

Research Articles

Artificial Intelligence (AI) end-to-end: The Environmental Impact of the Full AI Lifecycle Needs to be Comprehensively Assessed - Issue Note 
The United Nations Environment Programme (UNEP) is the leading global environmental authority: it sets the global environmental agenda, promotes the environmental dimension of sustainable development within the United Nations system, and serves as an authoritative advocate for the global environment, with a mandate to keep the world environmental situation under review. Against this mandate, UN Member States have requested that UNEP consider the environmental dimensions of digital technologies, assessing both their potential to enable environmental sustainability and the impact they can have on the environment. This note outlines key areas identified by UNEP regarding the environmental impact of Artificial Intelligence (AI) across its lifecycle. It aims to inform Member States, civil society, the private sector and the public, while encouraging the research community to develop and use scientific methods that allow objective measurement of AI's environmental footprint.

The Climate and Sustainability Implications of Generative AI
The rapid expansion of generative artificial intelligence (Gen-AI) is propelled by its perceived benefits, significant advancements in computing efficiency, corporate consolidation of artificial intelligence innovation and capability, and limited regulatory oversight. As with many large-scale technology-induced shifts, the current trajectory of Gen-AI, characterized by relentless demand, neglects consideration of negative effects alongside expected benefits. This incomplete cost calculation promotes unchecked growth and a risk of unjustified techno-optimism with potential environmental consequences, including expanding demand for computing power, larger carbon footprints, shifts in patterns of electricity demand, and an accelerated depletion of natural resources. This prompts an evaluation of our currently unsustainable approach toward Gen-AI’s development, underlining the importance of assessing technological advancement alongside the resulting social and environmental impacts.

Energy Costs of Communicating with AI
This study presents a comprehensive evaluation of the environmental cost of large language models (LLMs) by analyzing their performance, token usage, and CO2 equivalent emissions across 14 LLMs ranging from 7 to 72 billion parameters. Each LLM was tasked with answering 500 multiple-choice and 500 free-response questions from the MMLU benchmark, covering five diverse subjects. Emissions were measured using the Perun framework on an NVIDIA A100 GPU and converted through an emission factor of 480 gCO2/kWh. Our results reveal strong correlations between LLM size, reasoning behavior, token generation, and emissions. While larger and reasoning-enabled models achieve higher accuracy, up to 84.9%, they also incur substantially higher emissions, driven largely by increased token output. Subject-level analysis further shows that symbolic and abstract domains such as Abstract Algebra consistently demand more computation and yield lower accuracy. These findings highlight the trade-offs between accuracy and sustainability, emphasizing the need for more efficient reasoning strategies in future LLM developments.
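The conversion the study describes is simple arithmetic: measured energy in kWh multiplied by a grid emission factor. A minimal sketch, using the study's factor of 480 gCO2/kWh as the default (your local grid mix will differ):

```python
def co2e_grams(energy_kwh: float, emission_factor_g_per_kwh: float = 480.0) -> float:
    """Convert measured energy (kWh) to grams of CO2-equivalent.

    480 gCO2/kWh is the emission factor used in the study cited above;
    substitute the factor for your own grid for local estimates.
    """
    return energy_kwh * emission_factor_g_per_kwh
```

For example, a run that draws half a kilowatt-hour maps to 240 grams of CO2e under that factor, which is why longer token outputs from reasoning-enabled models translate directly into higher emissions.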

Green Prompting
Large Language Models (LLMs) have become widely used across domains spanning search engines, code generation, and text creation. However, a major concern associated with their adoption is the high cost of inference, which affects both their sustainability and their financial feasibility. In this work, we empirically study how different prompt and response characteristics directly impact LLM inference energy cost. We conduct experiments with three open-source transformer-based LLMs across three task types: question answering, sentiment analysis, and text generation. For each inference, we analyzed prompt and response characteristics (length, semantic meaning, time taken, energy consumption). Our results demonstrate that even when presented with identical tasks, models generate responses with varying characteristics and consequently exhibit distinct energy consumption patterns. We found that prompt length matters less than the semantic meaning of the task itself. In addition, we identified specific keywords associated with higher or lower energy usage that vary across tasks. These findings highlight the importance of prompt design in optimizing inference efficiency. We conclude that the semantic meaning of prompts and certain task-related keywords significantly impact inference costs, paving the way for deeper exploration toward energy-adaptive LLMs.