
Global Investigative Journalism Network
The field of Artificial Intelligence (AI) is rife with stories for investigative reporters to tap. This new reporting guide, a collaboration between the Pulitzer Center’s AI Accountability team and GIJN, is meant to help reporters understand some of the nitty-gritty of the technology underlying AI and to give them a framework through which they can examine it.
It is written by Gabriel Geiger, an investigative journalist at Lighthouse Reports, and Lam Thuy Vo, an investigative journalist at Documented and a data journalism professor at the Craig Newmark Graduate School of Journalism at CUNY, who helped design the Pulitzer Center’s AI Spotlight Series.
The following is a section taken from the full guide, which can be found here. In the guide, Geiger and Thuy Vo break down what AI actually means, and then provide a framework for AI Accountability Stories:
What Is AI?
Many people were first introduced to the idea of artificial intelligence through ChatGPT. For that reason, people think of ChatGPT as AI and of AI as just ChatGPT.
But the truth is far more complex than that. Artificial intelligence describes the process of using machines to mimic human decision-making, and it is better thought of as a “grab bag” term that encompasses a large number of technologies.
Scientists and researchers coined the term in the 1950s and have since found many different ways to recreate human intelligence through technology.
One of the most popular and prolific AI methods these days is machine learning and all the forms it takes, including its subsets deep learning and generative AI.
Machine learning is the process of analyzing data to find patterns that allow us to make predictions or decisions based on those findings. These analyses use various mathematical methods, from simple statistics to complex neural networks, often depending on the amount of data that’s being processed. The result of this training is a computer program, or AI model, that can take in new data and make predictions or generate new information based on this old data. In many ways, you can imagine machine learning outputs as a remix of old data. In one use case, simple machine learning models may be used by government agencies that are assigning risk scores to potential welfare recipients or to people who are applying for housing benefits.
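To make the idea of training and prediction concrete, here is a minimal sketch in Python of a simple risk-scoring model. Everything in it is an assumption made for illustration: the two input features, the eight training records, and the function names are invented, and the method shown (logistic regression, a basic statistical technique) is only one of many that a real system might use. It is not a description of any agency's actual model.

```python
import math

# Hypothetical, made-up records for illustration only.
# Each record: (missed_payments, years_at_address) -> 1 if flagged in the past, else 0.
TRAINING_DATA = [
    ((0, 10), 0),
    ((1, 8), 0),
    ((0, 6), 0),
    ((1, 5), 0),
    ((3, 2), 1),
    ((4, 1), 1),
    ((5, 3), 1),
    ((4, 0), 1),
]


def sigmoid(z):
    """Squash any number into a score between 0 and 1."""
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def train(data, learning_rate=0.05, epochs=2000):
    """Find the pattern: nudge the weights until scores match the old labels."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            score = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = score - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def risk_score(model, features):
    """Apply the trained model to a new, unseen record."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


model = train(TRAINING_DATA)
low = risk_score(model, (0, 9))   # resembles the records labeled 0
high = risk_score(model, (4, 2))  # resembles the records labeled 1
print(f"low-risk example: {low:.2f}, high-risk example: {high:.2f}")
```

The model never sees a rule written by a person. Its scores for new people come entirely from the patterns in the old records, which is why questions about where the training data came from, and who labeled it, matter so much for reporting.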
Deep learning is a subset of machine learning that requires a large number of data entries, often in the millions, and uses complex analytical methods such as neural networks to make sense of the data. Neural networks are mathematical methods that mimic the structure of the brain and consist of interconnected nodes. (You can learn more about neural networks here.) Big Tech companies often use this kind of machine learning to predict terms in search engines or to power recommendation systems for streaming services.
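As a rough illustration of what “interconnected nodes” means, the Python sketch below wires three nodes together by hand. The weights are chosen manually for this example rather than learned from data, and the function names are invented. Real deep learning systems learn millions or billions of such weights during training.

```python
def relu(value):
    """A common node activation: pass positive values through, block negatives."""
    return max(0.0, value)


def node(inputs, weights, bias):
    """One node: weigh each input, add them up, then apply the activation."""
    return relu(sum(w * x for w, x in zip(weights, inputs)) + bias)


def tiny_network(x1, x2):
    """Two hidden nodes feed one output node (hand-picked weights, not trained)."""
    hidden_a = node([x1, x2], [1.0, 1.0], 0.0)   # fires if any input is on
    hidden_b = node([x1, x2], [1.0, 1.0], -1.0)  # fires only if both are on
    # Output node: combine the hidden nodes' signals.
    return 1.0 * hidden_a - 2.0 * hidden_b


# The network outputs 1.0 only when exactly one of the two inputs is on.
results = {(a, b): tiny_network(a, b) for a in (0, 1) for b in (0, 1)}
for inputs, output in results.items():
    print(inputs, "->", output)
```

No single node here could produce that “one but not both” answer on its own. It emerges from how the nodes are connected, which is the basic intuition behind stacking many layers of them.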
Then there is generative AI, a subset of machine learning that requires even more data and, during its training phase, even more energy and intricate mathematical methods to make its models. Generative AI differs from many other machine learning methods in that it does not just produce a recommendation for a timeline or a predictive score, but also creates new content in the form of text or imagery. That’s the technology we now encounter through Large Language Models (LLMs) in the form of chatbots like ChatGPT or Gemini, as well as through apps that create images from text prompts like Midjourney.
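The toy Python sketch below gives a feel for how a program can produce new text out of old text. It is a deliberately simplified stand-in, not how LLMs work internally: it only counts which word follows which in a short, made-up training sentence and then picks the next word at random from those observed followers. The training text and function names are assumptions for illustration.

```python
import random
from collections import defaultdict

# Made-up training text for illustration only.
TRAINING_TEXT = (
    "the reporter filed the story and the editor read the story "
    "and the reporter called the source"
)


def build_bigram_model(text):
    """Record, for every word, the words that followed it in the training text."""
    words = text.split()
    followers = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word].append(next_word)
    return followers


def generate(model, start_word, max_words=8, seed=0):
    """Build a new sentence by repeatedly picking a word seen after the last one."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    output = [start_word]
    while len(output) < max_words:
        options = model.get(output[-1])
        if not options:  # the last word never had a follower in training
            break
        output.append(rng.choice(options))
    return " ".join(output)


model = build_bigram_model(TRAINING_TEXT)
sentence = generate(model, "the")
print(sentence)
```

The output may be a sentence that never appeared in the training text, yet every pair of neighboring words in it did. Modern generative models are vastly more sophisticated, but they share the dependence on training data that this toy makes visible.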
The diagram below lays out all versions of machine learning.
Knowing how machine learning works in broad strokes can help journalists find ways to speak about it, ask informed questions about the technology, and find ways to better tap into the various stages of AI development for their reporting.
Framework for AI Accountability Stories
When we first began developing the AI Spotlight Series with Karen Hao, we kept returning to a simple question: What do we wish we had known when we first started reporting on AI? The answer was a framework for how to identify and frame AI stories.
AI encompasses a broad set of technologies and issues, and it can be overwhelming to figure out where to start. Our framework revolves around the four stages of modern AI development. At the foundation are the two inputs, data and compute, that make today’s systems possible. From there, models are built and trained, shaped by data and design choices. Finally, these models are applied in the real world. Each of these development stages comes with its own set of related issues, actors involved, and impacted people or structures.