Internal software engineers working for the tax department have implemented a machine learning tool which identifies and helps disable websites charging citizens for services offered free via government’s own site.
HM Revenue and Customs has developed an algorithm to help detect and crack down on unauthorised use of official government branding.
On government’s online repository of algorithmic transparency records, the tax agency has this week published details of the Logo Detection and Classification Toolkit (LDK). The tool is “an innovative machine learning solution developed in-house by HMRC civil servants [which] uses open-source deep-learning technology… [and] provides fraud analysts with a user interface to combat misuse of HMRC and HMG trademark logos by third parties”.
The transparency record explains that the technology is designed to work via three modules, the first of which uses “Google and Bing search engines to look for URLs that are hosting” one or more of eight specified logos linked to HMRC or government more widely.
The second module is dedicated to “webscraping the suspicious URLs and saving snapshot images on local disk”, while the third “consists of running the deep-learning engine that scans for trademark logos” via an object-detection system. URLs that include the use of one or more of the logos are then checked against an allowed list.
The tool does not directly take down sites it has detected as being fraudulent or unauthorised but, rather, it provides a list of URLs which analysts can then follow up on and, if required, disable through manual processes.
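As an illustration of the final step described above, in which URLs with detected logos are checked against an allowed list before being passed to analysts, the following is a minimal sketch rather than HMRC's implementation. The host names in `ALLOWED_HOSTS`, the function names and the shape of the `detections` input are all assumptions made for the example; the record does not publish the allowed list or the code.

```python
from urllib.parse import urlparse

# Hypothetical allowed list; the real list is not given in the transparency record.
ALLOWED_HOSTS = {"gov.uk", "hmrc.gov.uk"}


def is_allowed(url: str) -> bool:
    """Return True if the URL's host is an allowed host or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in ALLOWED_HOSTS)


def flag_for_review(detections: dict[str, list[str]]) -> list[str]:
    """Given {url: [logos detected]}, return the URLs that show at least one
    logo and are not on the allowed list, sorted for a stable analyst report."""
    return sorted(
        url for url, logos in detections.items()
        if logos and not is_allowed(url)
    )
```

Matching on the parsed host name, rather than searching the whole URL string, means that an address such as `gov.uk.example.net` is not mistaken for an official site. The output is only a list for human follow-up, in line with the record's statement that takedowns are handled manually.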
First deployed in October 2021, the LDK algorithm has turned a process that previously took three days into one that can be completed in three to four hours, according to the record.
The tool is now run once a month; “on average it webscrapes about 400 URLs” and operates with an accuracy rate of 97%, HMRC indicated.
The system has typically enabled the department’s cybersecurity teams to take down 10 URLs each month for their unauthorised use of logos – “some innocently, others not so innocently to help add credibility to their services by attempting to suggest they are affiliated to HMRC”. This includes sites that implore visitors to pay to be connected to services that are available for free on GOV.UK.
To ensure that the algorithm continues to function as intended, HMRC updates it regularly.
“There are times when the algorithm generates a certain number of false positives and false negatives due to unseen patterns in the data that is being analysed,” the record says. “To overcome this circumstance, we tend to re-train the model every six month on new data that it has failed to recognise and use the new set of weights and biases in production.”
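The retraining step quoted above depends on knowing which URLs the model got wrong. A minimal sketch of how such cases might be gathered is shown below; it assumes that analysts record a reviewed verdict for each URL, which the record does not spell out, and the function name and data shapes are hypothetical.

```python
def split_errors(predicted: dict[str, bool], reviewed: dict[str, bool]):
    """Compare model output with analyst verdicts for the same URLs.

    predicted[url] is True when the model reported a logo on that URL;
    reviewed[url] is True when an analyst confirmed a logo was present.
    Returns (false_positives, false_negatives, accuracy).
    """
    false_pos = sorted(u for u, p in predicted.items() if p and not reviewed[u])
    false_neg = sorted(u for u, p in predicted.items() if not p and reviewed[u])
    correct = len(predicted) - len(false_pos) - len(false_neg)
    # Guard against an empty run so the function never divides by zero.
    accuracy = correct / len(predicted) if predicted else 0.0
    return false_pos, false_neg, accuracy
```

Under this reading, the false positives and false negatives form the “new data that it has failed to recognise”, and the accuracy value is simply the share of URLs on which the model and the analysts agree.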
HMRC’s release of details on LDK is one of 14 records of central government algorithms published at the start of this week. These releases come hot on the heels of 10 local government records posted in late January.
The publication of information using the blueprints of the algorithmic transparency recording standard is now mandatory for Whitehall departments, with many more records expected to be released over the coming weeks and months.