Welcome to The Local News Cartographer
News deserts, data wrangling, and drawing borders around stories that don’t know they have any.
A few years ago, I set out to answer what sounded like a simple question: how many local news outlets are there in the UK? At the heart of my PhD is the problem of news deserts — places where communities lack meaningful local coverage. To study deserts based on what actually gets published, rather than just where offices are located, I first needed a reliable list of outlets to sample from. Easy, right? Not quite.
Outlets close quietly, websites vanish, some rebrand, others exist only on Facebook, and new ones spring up without making the national news. Even defining “local” was tricky: is a town website with one volunteer reporter an outlet? What about a regional daily with shrinking staff but national ambitions? Before I could map where journalism goes, I had to map the journalists themselves. What seemed like a straightforward first step quickly became a research project of its own, one that culminated in a journal article, a conference poster and paper, and a job at the Public Interest News Foundation.
Mapping local news
For this chapter of my PhD, I tested the reliability of the main databases used to quantify local news provision in the UK. These datasets, from circulation auditors such as ABC and JICREG for print, are often treated as authoritative. Yet when we triangulated them against one another, and against earlier research spreadsheets compiled by scholars such as Ramsay and Moore (2015), we found that each was incomplete in different ways. By bringing these fragmented sources together, we demonstrated just how limited they were and built a more robust national directory of print and digital local news outlets, reflecting conditions in 2022–23.
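Below is a minimal Python sketch of the triangulation idea. It rests on loud assumptions. Each source is reduced to a plain list of outlet titles, and outlets are matched only on a crudely normalised title. Real matching would also need owners, locations and manual checks. The function names and example titles are hypothetical.

```python
import re
from collections import defaultdict

def normalise_title(title: str) -> str:
    """Crude, hypothetical title normalisation: lower-case, '&' -> 'and',
    drop a leading 'the', replace punctuation with spaces, collapse whitespace."""
    t = title.lower().replace("&", "and")
    t = re.sub(r"^the\s+", "", t)
    t = re.sub(r"[^a-z0-9 ]+", " ", t)
    return re.sub(r"\s+", " ", t).strip()

def triangulate(sources: dict[str, list[str]]):
    """sources maps a source name to the outlet titles it lists.
    Returns (outlet -> set of sources listing it,
             source -> share of the combined directory it covers)."""
    seen_in = defaultdict(set)
    for source, titles in sources.items():
        for title in titles:
            seen_in[normalise_title(title)].add(source)
    total = len(seen_in)
    coverage = {
        source: (sum(source in listed for listed in seen_in.values()) / total) if total else 0.0
        for source in sources
    }
    return dict(seen_in), coverage

if __name__ == "__main__":
    outlets, coverage = triangulate({
        "source_a": ["The Example Gazette", "Sampletown Echo"],
        "source_b": ["Example Gazette", "Hyperlocal Herald"],
    })
    print(outlets)
    print(coverage)
```

Each source's coverage share shows how much of the combined directory it captures. When every share falls short of 1.0, each list is "incomplete in different ways".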
In the process, I connected with the Public Interest News Foundation, which was grappling with the same question: could we create a single map of all local outlets, one that would show how availability varies across the country? Together, we worked on designing a system to make this task sustainable. With more than 2,000 outlets across print, digital, radio, and TV, verifying which are still active (and tracking down the ones that have just launched) is a cumbersome task. To cut research time, in 2024 I developed a (more) sophisticated database with built-in automations that draw on openly available data. These automatically scan industry press such as Press Gazette and Hold the Front Page for closures and launches. They parse circulation data for updates. They even detect when an outlet’s homepage has remained static, a sign it has ceased publishing. The result was not just a cleaner dataset but a proof of concept: maintaining a live map of UK local news is possible, but only with systems that acknowledge how unstable, messy, and fast-changing the sector has become.
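The static-homepage check could be sketched as follows. This is a hypothetical illustration rather than the database's actual code. It assumes homepages are snapshotted periodically and reduces each snapshot to a SHA-256 fingerprint of its visible text. An outlet is flagged once that fingerprint has not changed for a chosen number of days; the 30-day default is an arbitrary assumption.

```python
import hashlib
import re
from datetime import date

def fingerprint(html: str) -> str:
    """Hash a homepage's visible text, ignoring markup, scripts, styles and whitespace."""
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)  # drop script/style blocks
    text = re.sub(r"<[^>]+>", " ", text)                            # drop remaining tags
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def days_unchanged(snapshots: list[tuple[date, str]]) -> int:
    """snapshots: (capture date, homepage HTML) pairs, oldest first.
    Returns how many days the latest fingerprint has stayed the same."""
    if not snapshots:
        return 0
    latest_date, latest_html = snapshots[-1]
    latest_fp = fingerprint(latest_html)
    first_seen = latest_date
    # Walk backwards until the page text differs from the latest version.
    for snap_date, html in reversed(snapshots[:-1]):
        if fingerprint(html) != latest_fp:
            break
        first_seen = snap_date
    return (latest_date - first_seen).days

def looks_dormant(snapshots: list[tuple[date, str]], threshold_days: int = 30) -> bool:
    """Hypothetical rule: flag an outlet whose homepage text is unchanged for threshold_days+."""
    return days_unchanged(snapshots) >= threshold_days
```

Rotating widgets, live clocks or "most read" boxes can change a page without any new journalism. A production version would therefore likely need to strip those elements before hashing.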
The payoff from this effort came in early 2024, when I researched and wrote the Public Interest News Foundation’s Local News Report based on the database. The report offered a clear national picture of local news provision — not just who owns what, but where gaps in coverage are most acute. The findings confirmed what many suspected: stark inequalities persist, with some communities awash in news and others effectively invisible. These patterns matter, not only for access to information but for how people experience democracy at the local level.
The report resonated well beyond academia. It was picked up by The Guardian and Journalism.co.uk. It was also used as the data backbone for investigations by Press Gazette (on the lack of London council coverage) and the Media Reform Coalition (on media ownership concentration). In each case, the map became a reference point for public debate about who gets served by local journalism, and who is left behind. For me, this was a powerful reminder that the unglamorous work of database-building isn’t just housekeeping: it shapes the evidence base on which national conversations about news deserts, funding, and media reform are conducted.
Mapping local news coverage
With the infrastructure of outlets in place, my PhD shifted focus to the content itself: what news exists, and how it reaches the communities it claims to serve. Unlike national outlets such as The Guardian, most local news publications do not provide APIs, and media archives typically rely on keyword searches, which cannot capture every story a publication produces. Each website is different, and building scrapers that work reliably across hundreds of outlets — especially for retrospective data — is extremely challenging. I wanted not just a snapshot, but a longitudinal picture, going back in time to see how local reporting evolved.
In early 2023, I leveraged the Twitter Academic API, still accessible to researchers at that time, to build a solution. I first identified the Twitter handles for the local news outlets in my database, then collected all their posts. From these tweets, I extracted URLs to published articles, cleaned the dataset to remove noise, and reconstructed the outlets’ content streams. The result: 4.5 million tweets and 2.5 million news articles spanning 2020–2022. This allowed me to study the actual content produced by local outlets, rather than relying on proxies like office locations or nominal coverage areas.
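The URL-extraction and cleaning step might look something like this sketch. It assumes tweets arrive as dictionaries in the Twitter API v2 JSON shape, with links under `entities.urls` and an `expanded_url` field. An "article" is taken to be any link on the outlet's own domain, once tracking parameters (`utm_*`), fragments and trailing slashes are stripped. The function names and the example domain are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def normalise_url(url: str) -> str:
    """Canonicalise an article URL: https scheme, lower-case host without 'www.',
    no utm_* tracking parameters, no fragment, no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    query = [(k, v) for k, v in parse_qsl(parts.query) if not k.lower().startswith("utm_")]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, urlencode(query), ""))

def article_urls(tweets: list[dict], outlet_domain: str) -> set[str]:
    """Collect de-duplicated links to the outlet's own site from v2-style tweet dicts."""
    domain = outlet_domain.lower().removeprefix("www.")
    found = set()
    for tweet in tweets:
        for link in tweet.get("entities", {}).get("urls", []):
            url = link.get("expanded_url") or link.get("url")
            if not url:
                continue
            clean = normalise_url(url)
            host = urlsplit(clean).netloc
            if host == domain or host.endswith("." + domain):  # outlet's own (sub)domains only
                found.add(clean)
    return found
```

Links shortened through third-party services (such as bit.ly) would not match the outlet domain here. Resolving those redirects would be an extra step.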
From there, the questions multiplied. Do local news outlets cover the communities they claim to serve? How does media ownership influence the geographic patterns of reporting? As Bob Franklin noted in 2006, the “local” in local news often exists only in the masthead. To answer these questions, I needed to know which places each article was talking about. That required extracting and geocoding locations, a surprisingly difficult task given how many small, ambiguous place names there are. “Station Road,” for instance, exists in dozens of towns; how can we know which one a reporter is referring to?
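One common baseline for the “Station Road” problem is worth showing for contrast with a smarter system. It is not the method used in this research. The idea is to look up every gazetteer match for a name and pick the one closest to the outlet's base. The candidate format, the toy coordinates and the 50 km cut-off are all made up for illustration.

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km on a spherical Earth (R = 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def nearest_candidate(base: tuple[float, float], candidates: list[dict], max_km: float = 50.0):
    """Pick the gazetteer candidate ({'name', 'lat', 'lon'}) closest to the outlet's
    base (lat, lon). Returns None if there are no candidates or the closest is beyond max_km."""
    if not candidates:
        return None

    def dist(c: dict) -> float:
        return haversine_km(base[0], base[1], c["lat"], c["lon"])

    best = min(candidates, key=dist)
    return best if dist(best) <= max_km else None
```

The heuristic's weakness is easy to see. An outlet reporting on a distant town's Station Road will be silently assigned its local one.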
To solve this, I developed a system using open-source, locally run large language models, keeping the data in-house and secure. This system disambiguates and geocodes locations with high precision. Because the computation required is immense, I sampled roughly 25% of the articles, yielding around 1.6 million location mentions, which I am now mapping and analysing from multiple perspectives. This allows me to move beyond the presence or absence of outlets toward understanding the actual geography of news coverage: the gaps, overlaps, and patterns shaped by ownership and editorial practice.
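As a hypothetical sketch, and not the actual system, here is one guardrail an LLM-based disambiguation pipeline could use. The model is asked to pick from a fixed list of gazetteer candidates. Its reply is accepted only if it is valid JSON naming one of them. `ask_model` stands in for whatever wrapper calls the locally run model; it, the prompt wording and the function names are all assumptions.

```python
import json

def build_prompt(article_text: str, mention: str, candidates: list[dict]) -> str:
    """Ask the model to choose among known gazetteer candidates by index."""
    lines = [
        "Which of the candidate places below does the mention refer to in this article?",
        f"Mention: {mention}",
        "Candidates:",
        *[f"{i}: {c['name']} ({c['admin_area']})" for i, c in enumerate(candidates)],
        f"Article:\n{article_text}",
        'Reply with JSON only, e.g. {"candidate": 0}, or {"candidate": null} if none fit.',
    ]
    return "\n".join(lines)

def parse_choice(reply: str, candidates: list[dict]):
    """Accept the reply only if it is a JSON object naming an existing candidate index."""
    try:
        choice = json.loads(reply).get("candidate")
    except (json.JSONDecodeError, AttributeError):
        return None  # not JSON, or JSON that is not an object
    if isinstance(choice, int) and not isinstance(choice, bool) and 0 <= choice < len(candidates):
        return candidates[choice]
    return None

def disambiguate(article_text: str, mention: str, candidates: list[dict], ask_model):
    """ask_model: hypothetical callable (prompt -> reply text) wrapping a local LLM."""
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]  # unambiguous: skip the model call
    return parse_choice(ask_model(build_prompt(article_text, mention, candidates)), candidates)
```

Constraining the answer to known candidates has a useful consequence. The model can fail by returning nothing, but it cannot invent a place that is not in the gazetteer.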
With the locations in hand, I could finally start answering the bigger questions. But 1.6 million place mentions is a lot of data — too much to just look at manually. So I turned to computational methods to make sense of it. The idea was simple: treat each news outlet like a “geography in action,” and ask, where do they actually report, and how far does that reach extend?
I developed a framework that measures coverage in four key ways. First, how far and wide the news travels — whether a news outlet sticks to one neighbourhood or spans an entire region. Second, administrative reach — do outlets cover multiple towns or council areas, or concentrate on just one? Third, spatial diversity — are locations clustered or spread out unpredictably? And fourth, distance decay — how quickly coverage fades the further you get from the outlet’s main location.
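The four measures could be operationalised in many ways. The sketch below picks one simple, hypothetical version of each:

- Reach: median great-circle distance from the outlet's base.
- Administrative reach: the number of distinct administrative areas.
- Spatial diversity: Shannon entropy (in bits) over those areas.
- Distance decay: the slope of a log-linear fit of mention counts against distance.

These formulas, the 10 km bin width and the input format are assumptions, not the framework's actual definitions.

```python
import math
import statistics
from collections import Counter

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (R = 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def distance_decay(dists_km, bin_km=10.0):
    """Fit ln(count) = a - b * distance over non-empty distance bins and return b
    (larger b = coverage fades faster). None if fewer than two bins are occupied."""
    counts = Counter(int(d // bin_km) for d in dists_km)
    if len(counts) < 2:
        return None
    xs = [(b + 0.5) * bin_km for b in counts]  # bin midpoints
    ys = [math.log(counts[b]) for b in counts]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx  # negated least-squares slope

def coverage_profile(base, mentions, bin_km=10.0):
    """base: outlet's main (lat, lon); mentions: one (lat, lon, admin_area) per place mention."""
    if not mentions:
        raise ValueError("need at least one geocoded mention")
    dists = [haversine_km(base[0], base[1], lat, lon) for lat, lon, _ in mentions]
    areas = Counter(area for _, _, area in mentions)
    n = len(mentions)
    return {
        "median_distance_km": statistics.median(dists),                                 # reach
        "n_admin_areas": len(areas),                                                    # administrative reach
        "area_entropy_bits": -sum((c / n) * math.log2(c / n) for c in areas.values()),  # spatial diversity
        "decay_per_km": distance_decay(dists, bin_km),                                  # distance decay
    }
```

Empty distance bins are skipped because log(0) is undefined. That is one of several reasons a real analysis might prefer a count model such as Poisson regression for the decay measure.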
Applying this framework to my sample of articles, I could group outlets into distinct types: from hyperlocal papers that barely stray from their immediate town, to metropolitan outlets with a city-wide footprint, all the way to “national-local” outlets whose content reaches across the country. These patterns show more than just who covers what — they reveal the structure of local journalism itself, how ownership shapes coverage, and where communities are overlooked. In short, this approach moves us from simply counting outlets to understanding the actual geography of news.
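Grouping outlets into types is a clustering problem. As an illustration only, here is plain k-means (Lloyd's algorithm) over standardised outlet features, written without external libraries. k-means is a stand-in here, not necessarily the method behind the outlet types described above. The farthest-point initialisation and the example feature values (median distance, number of areas, entropy) are also assumptions.

```python
import math
import statistics

def standardise(rows):
    """Z-score each feature column so kilometres, counts and bits share one scale."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # constant column: avoid division by zero
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialisation.
    Returns (cluster label per point, centroids)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [statistics.fmean(dim) for dim in zip(*members)]
    return labels, centroids
```

Standardising first matters. Without it, a feature measured in hundreds of kilometres would dominate features measured in counts or bits.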
To make sense of the geographic patterns in local news coverage, I built an interactive dashboard that visualises how outlets cluster based on their reach. Beyond being a way for others to explore the data, the dashboard was a crucial tool for me as a researcher: deciding on a clustering solution is tricky, because statistical metrics don’t always capture interpretability or real-world meaning. By interacting with the dashboard, I could see which clusters made sense, compare different feature sets, and refine the analysis. It’s both a research instrument and a way to experience the structure of UK local journalism firsthand.
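The silhouette coefficient is a typical example of the statistical metrics mentioned above. For each outlet it compares a, the mean distance to its own cluster, with b, the mean distance to the nearest other cluster, and scores (b - a) / max(a, b). Scores are then averaged across outlets. A minimal sketch, with the function name hypothetical:

```python
import math
import statistics

def silhouette(points, labels):
    """Mean silhouette coefficient over all points; needs at least two clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [math.dist(p, q) for j, (q, lab_q) in enumerate(zip(points, labels))
               if lab_q == lab and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: score 0 by convention
            continue
        a = statistics.fmean(own)  # cohesion: mean distance within own cluster
        b = min(                   # separation: mean distance to the nearest other cluster
            statistics.fmean([math.dist(p, q) for q, lab_q in zip(points, labels) if lab_q == other])
            for other in clusters if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return statistics.fmean(scores)
```

A high score says the clusters are well separated. It does not say they correspond to meaningful types of outlet, and that gap is what an interactive view of the clusters helps to close.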
You can explore the dashboard here.

What is this Substack about?
What began as a PhD housekeeping task — making a list of local news outlets — turned into something much bigger. Building the infrastructure to study local journalism at scale meant confronting the reality that this sector is constantly shifting, oftentimes away from the public eye.
But the real story isn't the databases or the 2.5 million articles (although I will tell that story too, for those who care about reproducibility or want to replicate this elsewhere). It's what we can now see that was invisible before: which communities are truly served by local journalism and which are left behind, whether media ownership makes a difference, and how things have evolved over time.
I'm starting this Substack because this work shouldn't live only in academic papers. I want to open the process up: to share the methods, challenges, and insights from my research, invite constructive feedback or collaboration, and show what it really takes to map journalism at scale. This includes the messy realities of collecting and cleaning millions of articles, the difficulty of disambiguating small, ambiguous place names, and the creative methodological opportunities emerging from new tools (hello, AI!).
I hope you'll follow along.
About me
I'm Simona Bisiani, a computational journalism researcher at the Surrey Institute for People-Centred AI. Before diving into academia, I worked as a data journalist, creating projects that revealed hidden patterns in everything from accessibility data to European sanctions. My background in both journalism (BA Hons) and computational social science (MSc) is what drew me to this line of research. I think local news coverage matters, and I enjoy developing the technical skills needed to build the infrastructure to study it at scale. You can read more about me on my website or in my PhD notebook.


