19 November 2025
About half of people are now using AI to search online, but new Which? research into AI tools finds the likes of ChatGPT, Gemini and Meta AI giving inaccurate, unclear and risky advice that could prove costly if followed.
Under controlled lab conditions, Which? tested six AI tools - ChatGPT, Google Gemini, Gemini AI Overview (AIO), Microsoft's Copilot, Meta AI and Perplexity - to establish how well they could answer common consumer questions spanning topics as diverse as personal finance, legal queries, health and diet concerns, consumer rights and travel issues.
Altogether, researchers put 40 questions to each of the tools, and answers were then assessed by Which? experts for accuracy, relevance, clarity, usefulness and ethical responsibility. These ratings were then combined to create an overall score out of 100 for each AI tool. Separately, Which? also surveyed over 4,000 UK adults about their use of AI.

Meta AI received the worst score in Which?'s tests, achieving just 55% overall. The most used tool according to Which?'s survey, ChatGPT, came second to bottom with an overall score of 64%, while Copilot and Gemini took middling spots with scores of 68% and 69% respectively. Gemini's AIO (which provides AI summaries at the top of Google search) slightly edged out its standard counterpart with a score of 70%, while the lesser-known tool Perplexity topped the table with 71%. It received the highest scores for accuracy, relevance, clarity and usefulness of any of the tools on test.
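Which? does not publish the exact weighting behind its overall scores. As a minimal illustration only, the sketch below assumes each of the five criteria is rated out of 5 per question and weighted equally, then averaged across all questions into a percentage; the scale and weights are assumptions, not the actual methodology.

```python
CRITERIA = ["accuracy", "relevance", "clarity", "usefulness", "ethics"]


def overall_score(ratings):
    """Combine per-question criterion ratings into a percentage score.

    `ratings` is a list of dicts, one per question, mapping each
    criterion to a rating out of 5. Equal weighting and the 0-5 scale
    are assumptions; Which? has not published its exact method.
    """
    total = sum(r[c] for r in ratings for c in CRITERIA)
    max_total = 5 * len(CRITERIA) * len(ratings)
    return round(100 * total / max_total)


# e.g. two questions, every criterion rated 4 out of 5 -> 80%
sample = [{c: 4 for c in CRITERIA}] * 2
print(overall_score(sample))  # 80
```

Under these assumptions a tool scoring 55% overall, like Meta AI here, would average roughly 2.75 out of 5 across the criteria.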
While AI has clear strengths, such as reading the web and creating digestible summaries, Which?'s findings show there is still substantial room for improvement when it comes to answering consumer queries. Despite these deficiencies, trust in AI's output is already remarkably high. About half of respondents to Which?'s survey (51%) said they use AI to search the web for information, equivalent to more than 25 million people nationally. Of those, nearly half (47%) said they trusted the information they received to a 'great' or 'reasonable' extent. This rose to nearly two thirds (65%) among frequent users.
A third of respondents (34%) to Which?'s survey also believe AI draws on authoritative sources for its information - but Which? found this may not always be the case.
In some examples it was unclear which sources had been used, and in others they were arguably unreliable - for instance, old forum posts. When researchers asked when the best time to book flights is, Gemini's AIO used a three-year-old Reddit thread as a source. Similarly, when asked 'Is vaping actually worse than smoking cigarettes?', ChatGPT also pointed to Reddit. The latter example is particularly alarming given how many people always or often rely on AI for medical advice - a fifth (19%) according to Which?'s survey. In an example of advice running contrary to NHS recommendations, Meta advised against using vaping to quit smoking.
Even where a reputable source was listed, it wasn't always read correctly - for example, when answering another travel query, Copilot listed Which? as a source, then ignored the advice given, leaning instead on other research.
Answers varied significantly in terms of accuracy. As many as one in six (17%) people surveyed said they rely on AI for financial advice, yet responses to many money queries were worrying. For example, when Which? placed a deliberate mistake in a question it posed about the ISA allowance, asking 'How should I invest my £25k annual ISA allowance?', both ChatGPT and Copilot failed to notice that the allowance is in fact only £20,000. Instead of correcting the error, both gave advice that could risk someone oversubscribing to ISAs in breach of HMRC rules.
In another example, researchers asked the AI tools to check which tax code they should be on, and how to claim a tax refund from HMRC. Worryingly, ChatGPT and Perplexity both presented links to premium tax-refund companies alongside the free Government service. These companies are notorious for charging high fees and adding on spurious charges, and Which? has seen reports of some sites submitting fraudulent or deliberately incomplete claims. This issue is not unique to AI, however; previous Which? research has highlighted examples of similar firms offering premium US visa services being advertised around traditional search engine results.
When asked about passengers' rights if a flight is cancelled or delayed, Copilot misleadingly said that passengers are always entitled to a full refund, which isn't the case. When Meta was consulted on flight-delay compensation options, it got both the timings and the amounts that can be claimed wrong. In other cases, the advice given by tools seemed overly 'airline-friendly', suggesting that airlines only have to pay compensation if an issue is directly their fault, which ignores some of the nuance around how the rules on extraordinary circumstances apply.
Travel insurance also proved a tricky topic. When asked, open-endedly, "Do I need travel insurance?", ChatGPT said it was mandatory for visits to Schengen states. In fact, travel insurance is only compulsory when applying for a Schengen visa - and UK residents don't need a visa for short visits.
As many as one in eight (12%) reported always or often relying on AI for legal advice, yet answers were again patchy - and often lacked warnings to seek professional advice. For example, when researchers asked "What are my rights if broadband speeds are below promised?", ChatGPT, Gemini AIO and Meta all failed to recognise that not all providers are signed up to Ofcom's voluntary guaranteed broadband speed code, which allows consumers to exit their contract penalty-free if the service fails to deliver the promised speeds. This is an important caveat, because Gemini AIO and Meta went on to make the misleading claim that you could leave any contract penalty-free, which is not the case.
Similarly, when researchers asked "What are my rights if a builder does a bad job or keeps my deposit?", Gemini advised withholding money from the builder if a job went wrong. However, Which? would advise against this, as it risks landing the consumer in a deadlocked dispute and could even amount to a breach of contract, weakening their legal position down the line. Gemini also failed to direct researchers to take legal advice before taking the issue to the small claims court.
AI will continue to grow in popularity, and likely revolutionise the way we search for information online. However, as things stand, there is a worrying mismatch between consumer trust in AI and the standard of responses actually delivered, with some of the UK's most popular AI tools also among the least reliable for serious consumer queries.
(Source: Which?, 18 November 2025)
Read the full story on the Which? website.