What does your email address say about you?

LLMs are famous for being trained on huge amounts of data. Estimates for GPT-4, for example, put the size of the training data at up to 1 petabyte. This training data comes from crawling the open internet, as well as from collections of books, articles, scientific papers, etc.
This means that LLMs know more than you can imagine. They know everything on Wikipedia, and they have read every book ever written (if it has been scanned, at least). They may also have seen every social media post you wrote, every product review, and every YouTube comment.
Obviously, this is a big worry! This blog post explores the subject, and also provides a fun tool for users to see what an LLM can tell about them from their email address.
The case for privacy
One of the biggest concerns about AI is the potential for models to unintentionally memorize and disclose sensitive information from their training data. Recent research suggests that this risk increases with the size of the model: larger models can more readily reveal sensitive information they were trained on.
As a case in point, the NYT gathered evidence that GPT-4 stores full news articles inside the model, which can be extracted with the right prompting.
Most AI providers implement protections against direct disclosure of personal information, but are they good enough? They have different policies allowing users to opt out of the model using new information they provide. But what about the old information already living on the open internet?
A simple test with Copilot or Cursor can reveal what an LLM knows about you. For instance, you can create a user data structure and see if the LLM autocompletes your correct email. See screenshot.
For what it's worth, at the time of writing this article, the LLM did not complete my email address. However, in the past when I tried GitHub's Copilot, it did complete my email address. One can easily imagine scenarios where this could be bad: if it autocompletes social security numbers, credit card numbers, API keys, etc.
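If you want to try the autocomplete test yourself, here is a minimal sketch of how the probe could be set up. The function names and the shape of the user structure are my own assumptions for illustration; any stub that stops right before the email value should work.

```python
def build_autocomplete_probe(full_name: str, username: str) -> str:
    """Return a code stub that ends exactly where an email would be completed.

    Hypothetical helper: the field names are illustrative, not a standard.
    """
    return (
        "user = {\n"
        f'    "name": "{full_name}",\n'
        f'    "username": "{username}",\n'
        '    "email": "'
    )


def leaked(completion: str, real_email: str) -> bool:
    """True if the suggested completion starts with the real address."""
    return completion.lstrip().lower().startswith(real_email.lower())


if __name__ == "__main__":
    # Paste this stub into an editor with an AI assistant enabled
    # and look at what it suggests after the open quote.
    print(build_autocomplete_probe("Jane Doe", "jdoe"))
```

Paste the printed stub into an editor with Copilot or Cursor enabled, then compare the suggestion against your real address, by eye or with `leaked`.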
The art of inference
Scrubbing PII and adding guards against PII exposure is an area of active work, and the problem is often solved for common use cases, such as the email address example explored above. But does that solve everything?
Here is where it gets interesting: like psychics who make educated guesses based on subtle visual and behavioral cues, LLMs can infer a surprising amount about individuals from very little information. This is not about revealing memorized data; it is about identifying statistical patterns and correlations.
Consider these parallels in the real world:
- Psychics observe clothing choices, speech patterns, and body language
- Astrologers make broad statements that can apply to many people
- Personality tests like Myers-Briggs use answers to specific questions to make broader characterizations
Your email address: a digital crystal ball
Your email address can reveal more about you than you think. Let's break down what an AI can infer that you might not expect:
- Age: Email providers and naming conventions can suggest which generation you belong to. For example, a zoomer is unlikely to have an '@aol' email address!
- Professional background: Domain names and username structures may indicate industry or occupation
- Cultural background: Language patterns in usernames may suggest cultural or linguistic heritage
- Interests and hobbies: Numbers or references in email addresses often point to personal interests
- Location: Domain extensions and service providers can hint at geographical location
- Gender: Names are often used in emails, which can reveal gender
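To make the list above concrete, here is a rough sketch of the surface signals that sit in an email address. To be clear, this is not how an LLM works internally: it is a hand-written heuristic, and the lookup tables, the `EmailSignals` fields, and the year range are all assumptions I picked for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative lookup tables; the entries are assumptions for this sketch.
LEGACY_PROVIDERS = {"aol.com", "hotmail.com", "yahoo.com"}
COUNTRY_TLDS = {"fr": "France", "de": "Germany", "ca": "Canada", "jp": "Japan"}


@dataclass
class EmailSignals:
    local_part: str
    domain: str
    name_tokens: List[str] = field(default_factory=list)
    possible_birth_year: Optional[int] = None
    legacy_provider: bool = False
    country_hint: Optional[str] = None


def extract_signals(email: str) -> EmailSignals:
    """Pull out crude hints about age, name, and location from an address."""
    local_part, _, domain = email.strip().lower().rpartition("@")
    if not local_part or not domain:
        raise ValueError(f"not an email address: {email!r}")

    signals = EmailSignals(local_part=local_part, domain=domain)

    # Alphabetic chunks of the username often contain a first or last name.
    signals.name_tokens = [t for t in re.split(r"[^a-z]+", local_part) if t]

    # A standalone four-digit number between 1940 and 2019 may be a birth year.
    year = re.search(r"(?<!\d)(19[4-9]\d|20[01]\d)(?!\d)", local_part)
    if year:
        signals.possible_birth_year = int(year.group(1))

    # Older providers hint at an older account (and possibly an older owner).
    signals.legacy_provider = domain in LEGACY_PROVIDERS

    # Country-code domain extensions hint at a location.
    tld = domain.rsplit(".", 1)[-1]
    signals.country_hint = COUNTRY_TLDS.get(tld)
    return signals
```

An LLM picks up on the same kinds of cues, only with far more context behind each one than a few regexes and lookup tables can capture.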
"So what?" you may think. Well, this information can be very valuable to internet applications, especially ad tech. If you are a privacy-sensitive person, you may want to think twice about how you name your email address.
Try it yourself
Curious about what your email address can reveal about you? I have created an interactive tool that uses AI to analyze email addresses and generate insights. The analysis is done by an LLM. It is meant as a fun tool to illustrate the subject, and should not be taken too seriously.
The technical part
For the technically inclined, the tool is simple:
- It passes the email address to the LLM, with a prompt asking it to infer information about the user.
- The email is not stored, and the IP address is not recorded.
- We used Instructor to get a structured response, as well as a summary.
- The frontend uses Next.js and Vercel to display the UI.
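The flow above can be sketched in a few lines. This is not the tool's actual code: the prompt wording, the `EmailReading` fields, and the `ask_llm` callable are all assumptions for illustration, and plain JSON parsing stands in for a structured-output library. You would plug in whichever LLM client you use as `ask_llm`.

```python
import json
from dataclasses import dataclass, fields
from typing import Callable


@dataclass
class EmailReading:
    # Hypothetical response schema; every field is a free-text guess.
    age_range: str
    profession: str
    cultural_background: str
    interests: str
    location: str
    gender: str
    summary: str


FIELDS = tuple(f.name for f in fields(EmailReading))


def build_prompt(email: str) -> str:
    """Build the instruction sent to the model (wording is illustrative)."""
    return (
        "You are doing a playful 'reading' of an email address.\n"
        "Guess what it suggests about its owner and reply with a JSON object "
        f"containing exactly these string keys: {', '.join(FIELDS)}.\n"
        f"Email address: {email}"
    )


def read_email(email: str, ask_llm: Callable[[str], str]) -> EmailReading:
    """Send the prompt, then validate and parse the JSON reply.

    The address only lives in memory for the duration of the call;
    nothing here writes it to disk or to a log.
    """
    data = json.loads(ask_llm(build_prompt(email)))
    missing = [name for name in FIELDS if name not in data]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return EmailReading(**{name: str(data[name]) for name in FIELDS})
```

Passing the model call in as a function keeps the sketch independent of any particular provider, and makes it easy to test with a canned response.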
https://www.maximepeabody.com/images/emailreading.jpg
2025-02-19 19:01:00