‘Consistently more accurate than humans’ – Cabinet Office algorithm helps sort and delete millions of files


Central department releases transparency info with details of automated tool that is being used by knowledge and information management team to determine whether files need to be deleted or archived

The Cabinet Office has found an algorithm used to analyse, sort – and delete – millions of government documents is “consistently more accurate than humans” in performing such checks.

Developed in the first half of 2022, the Automated Digital Document Review tool has been used by the central department’s Digital Knowledge and Information Management (DKIM) team to “review 5.1 million legacy files to date”, according to newly published transparency information outlining the background and operation of the automated tool. The goal of these reviews was to ascertain which files “need to be retained as the official record and information that should be destroyed because it is redundant, outdated, trivial or ephemeral in nature, or has reached its retention period”.

A further 300,000 files are expected to be analysed over the course of the 2023-24 year and, thereafter, the DKIM team believes the program will be used to review a total of between 30,000 and 80,000 files annually. This data will come from the operations of teams across the Cabinet Office, before the department’s information-management team is tasked with conducting reviews and deleting or permanently storing files, as appropriate.

The algorithm, which is based on technology from specialist supplier Automated Intelligence, is programmed to identify “key words or phrases commonly used by civil servants in documents that are recognised as important records [and], conversely, patterns of language… that are commonly found in redundant, outdated and trivial information which is of little or no value”.

The tool will create a relevancy score for files based on the occurrence of keywords – both overall volume and frequency. The precise lexicon used will be subject to “tuning” for each collection of files to which the algorithm is applied.

“A term that significantly increases the likelihood that a document is valuable in documents created in 2005 does not always have the same result on files created in 2023,” transparency documents said.

The lexicon will be reviewed at least annually.
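The keyword-based scoring described above can be illustrated with a minimal Python sketch. It is not the Cabinet Office's implementation: the lexicon terms, their weights, the function name and the way volume and frequency are reported are all assumptions made for illustration, since the transparency information does not publish the actual lexicon or formula.

```python
import re

# Hypothetical lexicons (assumption): positive weights mark terms associated
# with valuable records, negative weights mark terms typical of ephemeral
# material. Separate lexicons stand in for per-collection "tuning".
LEXICON_2005 = {"minister": 2.0, "submission": 1.5, "decision": 1.5,
                "lunch": -1.0, "out of office": -2.0}
LEXICON_2023 = {"minister": 1.0, "submission": 1.5, "decision": 2.0,
                "lunch": -1.0, "out of office": -2.0}


def relevancy_score(text: str, lexicon: dict[str, float]) -> tuple[float, float]:
    """Return (volume_score, frequency_score) for one document.

    volume_score sums the weight of every lexicon hit; frequency_score
    divides that sum by the document's word count.
    """
    lowered = text.lower()
    total_words = len(re.findall(r"[a-z']+", lowered))
    if total_words == 0:
        return 0.0, 0.0
    volume = 0.0
    for term, weight in lexicon.items():
        # Whole-word (or whole-phrase) matches only.
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        volume += weight * hits
    return volume, volume / total_words


record = "Submission to the Minister: decision required on policy."
ephemeral = "Out of office. Back after lunch."

print(relevancy_score(record, LEXICON_2005))
print(relevancy_score(ephemeral, LEXICON_2005))
print(relevancy_score(record, LEXICON_2023))
```

In this sketch the same document receives a different score under each lexicon, which is the effect the per-collection tuning is meant to capture.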


The automated software can reportedly perform checks and determine outcomes not only at a far greater scale than human reviewers could manage, but also more accurately, according to the newly released guidance.

“Automation is consistently more accurate than humans at making decisions about the records value of a document; our tests showed that human error was [about] 1% but the automation showed an error rate likely to be [less than] 0.6%,” it said. “We estimate that a human reviewer could reasonably review up to 200,000 documents per annum. With automation we could achieve a review of several million files with no increase in human resource required to accommodate the higher volumes of files to be reviewed.”
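To put the quoted figures in proportion, the short Python calculation below applies them to the 5.1 million files reviewed to date. The inputs are the numbers quoted in the transparency information; applying the test error rates to the whole collection is an extrapolation made here for illustration, and the variable names are hypothetical.

```python
FILES_REVIEWED = 5_100_000          # legacy files reviewed to date
HUMAN_CAPACITY_PER_YEAR = 200_000   # quoted as an upper limit ("up to")
HUMAN_ERROR_RATE = 0.01             # quoted as about 1%
AUTOMATION_ERROR_RATE = 0.006       # quoted as an upper bound (less than 0.6%)

# Because 200,000 is a ceiling, this is the minimum human effort required.
reviewer_years = FILES_REVIEWED / HUMAN_CAPACITY_PER_YEAR

# Extrapolated (assumption): expected misjudged files at each error rate.
human_errors = round(FILES_REVIEWED * HUMAN_ERROR_RATE)
automation_errors = round(FILES_REVIEWED * AUTOMATION_ERROR_RATE)

print(f"Reviewer-years needed: at least {reviewer_years}")
print(f"Misjudged files at human rate: about {human_errors:,}")
print(f"Misjudged files at automation rate: fewer than {automation_errors:,}")
```

On these assumptions, a manual review of the same collection would take at least 25.5 reviewer-years and misjudge about 51,000 files, against fewer than 30,600 for the automated tool.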

The document added: “The previous method of disposal before the creation of the lexicon model consisted of digital archivists manually reviewing files both at a folder level and individual file level. This method was fairly accurate but extremely slow compared with our now automated solution, and was exclusively carried out using paper documents. Review of digital documents had not previously been attempted at scale.”

Human touch
Once the automated tool has finished its analysis, it creates a “report detailing the recommended files to be deleted”. A DKIM professional will then “review this report, analyse and test the final results” to make sure the algorithm’s rules have been correctly applied.

The ongoing role for humans will also include deciding whether a collection of files is suitable to be analysed by the algorithm, and monitoring its performance to ensure it is meeting the “minimum acceptable level as approved by departmental governance, which is 1% or less of files reviewed are incorrectly identified for destruction”.
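A check against that 1% level could look like the following Python sketch. The transparency information does not describe how the department measures the rate, so the function names and the idea of counting wrongly flagged files among a checked set are assumptions.

```python
def destruction_error_rate(files_checked: int, wrongly_flagged: int) -> float:
    """Share of checked files that were incorrectly identified for destruction."""
    if files_checked <= 0:
        raise ValueError("files_checked must be positive")
    if not 0 <= wrongly_flagged <= files_checked:
        raise ValueError("wrongly_flagged must be between 0 and files_checked")
    return wrongly_flagged / files_checked


def meets_minimum_level(files_checked: int, wrongly_flagged: int,
                        max_rate: float = 0.01) -> bool:
    """True when the error rate is at or below the approved level (1% or less)."""
    return destruction_error_rate(files_checked, wrongly_flagged) <= max_rate


# Hypothetical figures: 60 wrongly flagged files out of 10,000 checked.
print(meets_minimum_level(10_000, 60))
# Hypothetical figures: 150 wrongly flagged files out of 10,000 checked.
print(meets_minimum_level(10_000, 150))
```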

The detailed information on the automated tool was provided under the publication scheme of the Algorithmic Transparency Recording Standard. The standard, which is overseen by government’s Centre for Data Ethics and Innovation (CDEI), was first unveiled two years ago and is intended to provide a consistent framework through which public bodies can provide information on their use of algorithmic tools and the decision-making contexts in which they are being used.

The Cabinet Office’s report on the use of the digital file-review tool is the seventh transparency report released so far – and the first since 2022.

In a recent interview with PublicTechnology, executive director Felicity Burch discussed her ambitions for CDEI – which is part of the Department for Science, Innovation and Technology – to engage with government bodies over the coming months to help release more information.

“It’s a really important question, and the answer to whether we want more organisations to use this is: yes, absolutely,” she said. “And, indeed, we are working with a number of organisations at the moment to get them ready to publish their records. We have done this in quite an iterative way and are working with our colleagues in the public sector to make this a tool that’s easy and practical for them to use. We want to [continue] that iterative rollout – but I would absolutely like to see this scale up.”

Sam Trendall
