How to spot human writing on the internet?


Shreyas Prakash

In the classic Turing Test, a computer is considered intelligent if it can convince a human that it’s another human in a conversation. At that time, human-generated content dominated the internet.

But that era is behind us. Today, the landscape has shifted dramatically. AI-generated content now rivals, and in some cases outpaces, human-created material.

According to the ‘expanding dark forest’ theory, the open web is being hollowed out. A conspiracy theory that 4chan proposed years ago claimed that most of the internet is “empty and devoid of people” and has been taken over by artificial intelligence. A milder version of this theory is simply that we’re overrun with bots. Most of us take that for granted at this point.

As this dark forest expands, we will become deeply sceptical of one another’s realness.

The dark forest theory of the web points to the increasingly life-like but life-less state of being online. Most open and publicly available spaces on the web are overrun with bots, advertisers, trolls, data scrapers, clickbait, keyword-stuffing “content creators,” and algorithmically manipulated junk.

Source: Maggie Appleton

As the web keeps getting infested with this content, it’s a good junction for us to think of the absolute reverse of the Turing test experiment: in this flood of bot-generated word salads, how human are you?

In a reverse Turing test, instead of a machine trying to pass as human, a human attempts to pass as a machine. This can involve humans trying to convince an AI, or a panel of robots, that they are one of them, often by mimicking machine-like responses or behaviours. CAPTCHA is the classic example of a reverse Turing test, used on websites to distinguish human users from bots.

CAPTCHA, a standard example of a reverse Turing test.

Amazon’s Mechanical Turk is another example: a crowdsourcing internet marketplace that enables individuals and businesses (Requesters) to coordinate the use of human workers (Workers/Turkers) to perform tasks that computers are currently unable to do.

There are also more recent examples in which one of the NPC characters in a game is impersonated by a human, and the rest of the AI characters need to figure out who amongst them is the human. This video is eerie and fascinating at the same time:

A group of the world’s most advanced AIs tries to figure out who among them is the human.

How might we, as humans, distinguish ourselves from an LLM?

At this point in time, bot-generated writing is relatively easy to detect: wonky analogies, weird sentence structures, repetition of common phrases, and some pseudo-profound bullshit. For example, “Wholeness quiets infinite phenomena” is total bullshit. It means nothing, and people still rank the phrase as profound. If you’re still not convinced, try searching Google for common terms. You will find SEO-optimiser bros pumping out billions of perfectly coherent but predictably dull informational articles on every long-tail keyword combination under the sun.

However, as time passes, distinguishing human writing from AI-generated content might get increasingly harder.

This is what we’re competing against. And as we continue to read more such articles, we can just ‘smell’ an AI-generated article from a distance. Crawl the web long enough, and you will find certain words used repeatedly. ‘Elevate’ and ‘delve’ are perhaps the worst culprits, with the former often appearing in titles, headings and subheadings.

Language models also have this peculiar habit of making everything seem like a B+ college essay.

‘When it comes to…’.

‘Remember that…’.

‘In conclusion…’.

‘Let’s dive into the world of…’.

Remarkable. Breakthrough. State-of-the-art. The rapid pace of development. Unprecedented. Rich tapestry. Revolutionary. Cutting-edge. Push the boundaries. Transformative power. Significantly enhances.

Apart from some favourite phrases, language models also have some favourite words.

Explore. Captivate. Tapestry. Leverage. Embrace. Resonate. Dynamic. Delve. Elevate. And so on.

The final giveaway that a piece of content is a copy-and-paste job from a generative AI tool is the formatting. I recently asked ChatGPT to write a guide on language models, and this is what it came up with:

GPT-4 generated output from Perplexity

Each item is often highlighted in bold, and then ChatGPT likes to throw in a colon to expand upon each point. There’s nothing inherently wrong with presenting lists in this fashion, but it has become its signature style and can therefore be easily identified as AI content.
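The giveaways above (favourite words, stock phrases, and the bold-plus-colon list format) can be folded into a crude heuristic. Here is a minimal sketch in Python; the term list, the regex, and the `ai_smell_score` function are illustrative assumptions of mine, not a validated detector:

```python
import re

# A handful of the AI-flavoured words and stock phrases listed above.
# The list and the scoring are illustrative assumptions, not a
# validated detector.
AI_FLAVOURED_TERMS = [
    "delve", "elevate", "tapestry", "leverage", "captivate",
    "when it comes to", "in conclusion", "let's dive in",
    "cutting-edge", "transformative power", "unprecedented",
]

# ChatGPT's signature list style: a bolded lead with a colon either
# just after the bold ("**Scalability**: ...") or inside it
# ("**Scalability:** ...").
BOLD_COLON_ITEM = re.compile(r"\*\*[^*\n]+\*\*\s*:|\*\*[^*\n]+:\s*\*\*")

def ai_smell_score(text: str) -> int:
    """Count crude giveaways of machine-generated prose: the number of
    AI-flavoured terms plus the number of bold-plus-colon list items."""
    lowered = text.lower()
    term_hits = sum(lowered.count(term) for term in AI_FLAVOURED_TERMS)
    format_hits = len(BOLD_COLON_ITEM.findall(text))
    return term_hits + format_hits
```

A score of zero proves nothing, but a paragraph that trips several of these filters at once starts to smell distinctly machine-made.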

Another way to pass the reverse Turing test is to communicate intentionality. One of the best ways to prove you’re not a predictive language model is to demonstrate critical and sophisticated thinking: you’re not just mashing random words together.

Humans can humanise their content even further by making it personal. In his essay titled The Goldilocks Zone, Packy McCormick argues that so far we’ve thought of AI progress in terms of OOMs (orders of magnitude) of Effective Compute. It’s a great measure, and can help us with a lot of useful things.

Continuing to scale knowledge won’t magically scale agency. 

Packy argues that even if the OOMs of Effective Compute increase linearly (and we should trust the curve here), we should disagree on the mappings. Language models don’t have agency and drive, and continuing to scale knowledge won’t magically scale agency. It’s on a different axis altogether.

Coming back to our original topic, sprinkling in a personal narrative is our best bet in the short term to differentiate our writing from predictive language models, as the AIs are not scaling up their ability to have agency and drive (not yet).

To summarise (this might sound like an AI-generated closing phrase, but I promise I’m a human being), these are some tactics humans could adopt to distinguish themselves from AI while writing:

  • Hipsterism (Packy McCormick’s and Tim Urban’s visuals, with their pencil sketches and handwritten annotations, are great examples of non-conformism)

  • Recency bias (some language models might have a knowledge cutoff)
  • Referencing obscure concepts. Friedrich Nietzsche’s writing seems more Nietzschean because of his use of Übermensch, ressentiment, the herd, the Dionysian/Apollonian, etc. Michel Foucault’s writing sounds more Foucauldian because of his use of the archaeology of knowledge, genealogy, biopower, panopticism, etc.
  • Referencing friends who are real but not famous
  • Niche interests (nootropics, jhanas, nondual meditation, the Alexander Technique, artisanal discourse, metta, end time-theft, Gurdjieff, Zhuangzi, flat pattern drafting, the parent figure protocol, etc. (you get the drift))
  • Increasing reliance on neologisms, protologisms, niche jargon, euphemistic emojis, unusual phrases, ingroup dialects, and memes-of-the-moment.

Examples of neologisms and protologisms: lingo used by Google engineers that may not be widely known outside the company could be considered protologisms within that specific context.

  • Referencing recent events you might have attended, in-person gatherings, etc. Your current social context acts as a differentiator. LLMs are predominantly trained on the generalised views of a majority English-speaking, Western population who wrote a lot on Reddit and lived between 1900 and 2023. These are what anthropologists term Western, Educated, Industrialised, Rich, Democratic (WEIRD) societies, and their opinions and perspectives clearly do not represent all human cultures, languages and ways of being.
  • Last but not least, referencing personal events, narratives and lived experiences.

Communicate your drive and personal narrative wherever possible.

Language models don’t have that, yet.

