It cannot “analyze” it. That’s fundamentally not how LLMs work. An LLM has a finite vocabulary of “tokens”: words and word-pieces like “dog” and “house”, but also pieces like “straw”, “berry”, and “rasp”. When it reads the input, it splits the words into these recognized tokens, like a lookup table. The input becomes “token15, token20043, token1923, token984, token1234, …” and so on. The LLM “thinks” of these tokens as coordinates in a very high-dimensional space, but it cannot go back and examine the actual contents (letters) of each token. It has to get the information about the number of “r”s from somewhere else. So it has likely ingested some texts where the number of “r”s in “strawberry” is discussed. But it can never actually “test” it.
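A toy sketch of that lookup-table step (the vocabulary and token IDs here are made up for illustration; real tokenizers like BPE are more involved, but the point is the same: once a word becomes IDs, the letters are gone):

```python
# Made-up vocabulary mapping word-pieces to token IDs
vocab = {"straw": 15, "berry": 20043, "rasp": 1923, "dog": 984}

def tokenize(word):
    # Greedy longest-prefix matching, a simplified stand-in for BPE
    tokens = []
    while word:
        for piece in sorted(vocab, key=len, reverse=True):
            if word.startswith(piece):
                tokens.append(vocab[piece])
                word = word[len(piece):]
                break
        else:
            raise ValueError(f"no token for {word!r}")
    return tokens

print(tokenize("strawberry"))   # [15, 20043] -- the letters are gone
# A program can trivially "test" the letter count; the model, seeing
# only the IDs 15 and 20043, cannot:
print("strawberry".count("r"))  # 3
```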
A completely new architecture or paradigm would be needed to make these LLMs capable of reading letter by letter and keeping some kind of running count in memory.
Why do they have the data in the first place?
Your communications on Telegram are not end-to-end encrypted by default. You can have e2e-encrypted one-on-one conversations, but group chats are wide open for them to read everything.
They had a hilarious argument where they claimed that the key to unlock your chats is stored on a different server than the chats themselves, and that therefore they cannot access them. A company that argues like that (“trust us”) isn’t trustworthy.
Signal has been audited over and over by internationally respected cryptographers. By design, they cannot decrypt your chats. No need for “trust us, bro”.