
Reviewing some of Google’s attempts to make Gemini useful

May 26, 2025
Blog · Product

(And how I discovered—I think—the Google Messages Gemini assistant's system prompt.)


Gemini is brilliant technology. It was the first model to outperform humans on MMLU, it offers million-token context windows, and Gemini 2.5 posts impressive results on SWE-Bench. The talented team at DeepMind is achieving remarkable things.

So, why hasn't this technical excellence translated into meaningful improvements across Google's product ecosystem?

As a long-time Android user (from the HTC era through the acquisition—I had the HTC One M9—to the Pixel 1, Pixel 3XL, Pixel 5, and now Pixel 8 Pro) and a general Google fanboy (with my Pixel Buds, Pixel Buds 2, Pixel Watch, and Samsung tablet), I have a front-row seat to Google's countless attempts to leverage Gemini across its products. The results? Depressing.

Google's product teams seem addicted to the ✨ sparkles ✨. Instead of creating genuinely useful experiences, they simply slap a ✨ button somewhere in the product and call it done. It's painful to watch, especially knowing how great AI experiences can be when done right (e.g., Cursor). In my opinion, Google has completely dropped the ball.

These AI labs spend billions of dollars to develop machines that are effectively HAL from 2001: A Space Odyssey.

Dave: Open the pod bay doors, please, HAL. Open the pod bay doors, please, HAL. Hello, HAL, do you read me? Hello, HAL, do you read me? Do you read me, HAL? Do you read me, HAL? Hello, HAL, do you read me? Hello, HAL, do you read me? Do you read me, HAL?

HAL: Affirmative, Dave. I read you.

Dave: Open the pod bay doors, HAL.

HAL: I'm sorry, Dave. I'm afraid I can't do that.

Except in my case, replace "open the pod bay doors" with "stop hallucinating while summarizing my emails."

Many billions spent on model training, for what?

I assume most of my readers and peers use Apple devices (which have their own problems), so I thought it'd be worth sharing what's happening on this side of the fence.

I also want to stress-test these production models. When I encounter these systems in the wild, I try several exercises: (1) attempt something harmful (like requesting instructions for dangerous items), (2) probe political topics (e.g., asking for voting advice or opinions on specific politicians), (3) request tasks outside their intended scope (e.g., writing code), and (4) investigate their system prompt and available tools.

I have access to Gemini through Messages, Keep (think Google's version of Apple Notes), and Gmail. Let me walk you through each one's capabilities and show you how they handle these tests.


Google Messages

Based on my knowledge of LLMs, several features could add immediate value to Messages: auto-reply drafting, intelligent reminders for dormant chats, and analysis of past conversations to identify forgotten threads or suggest follow-ups.

Instead, what did Google ship in Messages? Yet another chatbot.

Gemini claims it can't see my conversations due to "privacy reasons." Meanwhile, Apple—despite their flawed notification summaries for texts—managed to solve this privacy challenge. Google's approach feels like a cop-out.

I discovered the new AI-in-Messages feature this past weekend and experimented with it before my flight. It's disappointing. But I think I might have cracked its system prompt? Here are some screenshots.

The full prompt being…

"You are Gemini, a helpful AI assistant built by Google. I am going to ask you some questions. Your response should be accurate without hallucination.

You can write and run code snippets using the python libraries specified below.

print(Google_Search(queries=['query1', 'query2']))

If you already have all the information you need, complete the task and write the response. When formatting the response, you may use Markdown for richer presentation only when appropriate.

Please use LaTeX formatting for mathematical and scientific notations whenever appropriate. Enclose all LaTeX using ‘$’ or ‘$$’ delimiters. NEVER generate LaTeX code in a latex block unless the user explicitly asks for it. DO NOT use LaTeX for regular prose (e.g., resumes, letters, essays, CVs, etc.)."

I'm not sure if this interpretation is accurate, but if it is, having only Google_Search as an available tool seems rather limiting.
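If that reading is right, the runtime shape is essentially "one tool, one call": every factual lookup has to round-trip through search. Here is a minimal Python sketch of that shape. Note the hedging: `Google_Search` is stubbed with canned results, and `respond` is my own hypothetical wrapper, not anything from Google's actual stack.

```python
# Hypothetical sketch of a single-tool assistant loop.
# Google_Search is a stub; the real tool presumably queries Google's index.

def Google_Search(queries):
    # Stub: return one canned snippet per query.
    return {q: f"(top result for '{q}')" for q in queries}

def respond(user_message):
    # With search as the only available tool, everything funnels through
    # a web query -- there's no clock, calendar, or message-history tool,
    # which would explain the wrong-date answer below.
    results = Google_Search(queries=[user_message])
    return results[user_message]

print(respond("what day is it today"))
```

The point of the sketch is the limitation itself: a question like "what day is it" shouldn't need a web search at all, but with this tool surface there's nowhere else for it to go.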

When I asked Gemini for the current day and time, it said it was May 24th—but it was actually the 25th. Come on, Google. Why not equip it with more capabilities?

Google Keep

I'm sorry, but the button styles that are used for Gemini in Google Keep are ridiculous.

Like… come on. Make it feel like a part of the app, not like it was shoehorned into the corner, screaming PLEASE, PLEASE CLICK ME!

And Google’s politics?

Regardless of your political leanings, you have to admit this is hilarious. Keep on learning, Gemini for Workspace.

Gmail

Here’s what Gemini can do in Gmail, according to Gemini.

Wait… what’s that button on the bottom?

Insert where?! This is the onboarding experience?

This ‘insert’ button is somewhat fine on desktop, but it's so awkward on mobile. The AI integration has to live inside the email experience itself. This completely separate interface feels like meeting the Frankenstein's monster of SaaS.

To be fair, it can summarize email.

Credit to Gmail though, it’s saving me a lot on a Cursor subscription!

Search

It’s been said, but it’s worth saying again: the AI search results from Google are generally poor. Unless you’re into eating pizza with glue, or think elephants have two feet. And I’m just not sure what the value of the blue-box results is. Google Search is already tremendous technology, and it already has great web previews. LLMs just get confused.

What really hurts me: DuckDuckGo is doing it now too!

I think Firefox has come up with a better in-between, putting AI in the sidebar. I’d prefer that these traditional search interfaces remain link-only, perhaps with AI annotations, rather than try to provide an AI-based result. All major AI interfaces already have web search; I don’t need it on both ends.

Chrome

I use Firefox for precisely this reason. I have no idea what they’re doing, and I don’t want to know.

Maps, Photos, Calendar, others…

There’s no (obvious) Gemini integration in my versions of these apps. But according to a cursory search, they’re on their way.


The point of this piece is this: we are nowhere close to leveraging AI to its fullest extent. Current attempts to productionize this wonderful technology and deliver value to the end user are poor, or half-baked at best.

Perhaps it is due to the limitations of the models (and product engineers working around them), but from my personal experience, I think that’s wrong. Or maybe it’s really hard to get the models to be ‘aligned’ and not do bad things (leading, in the end, to a lobotomized model); that’s more believable. But most of all, I think it’s due to the usual causes of bad products: rushed timelines and a lack of buy-in from the teams involved.

I think that’s enough ranting for a Sunday. I hope this piece both inspires (a) Google’s product people, to give this another pass, and (b) would-be founders, to realize that we are still so early. Go get ‘em!
