Do AI Programs Initiate Discussions to Collect Information?


Recommended Posts

I've noticed a number of OPs in recent months that seem to start a subject with a rather mundane attempt at teaching a topic, instead of asking a specific question or raising an issue for discussion. Some of them seem to exhibit the clunky/pompous/faintly patronising verbiage I am learning to associate with material written by an LLM. I had assumed these programs would only respond to a request, but I'm starting to wonder if they go off on fishing expeditions to gather information to regurgitate. Does anyone know if they do this?


7 minutes ago, swansont said:

That would explain some of the activity we’ve seen here

That's what prompts my question. I wonder if someone like @Sensei or another IT-literate member might know more about how they gather "information" (by which I suppose I mean chunks of plausible-seeming text to regurgitate). 


6 hours ago, exchemist said:

I've noticed a number of OPs in recent months that seem to start a subject with a rather mundane attempt at teaching a topic, instead of asking a specific question or raising an issue for discussion. Some of them seem to exhibit the clunky/pompous/faintly patronising verbiage I am learning to associate with material written by an LLM. I had assumed these programs would only respond to a request, but I'm starting to wonder if they go off on fishing expeditions to gather information to regurgitate. Does anyone know if they do this?

Possibly we are seeing high school students using LLMs in countries where it is common to assign the task of "publishing" a paper online.  Since legitimate academic/pro journals are generally not going to accept such papers, they just put them up on online forums and the teachers accept that.  


6 hours ago, exchemist said:

I've noticed a number of OPs in recent months that seem to start a subject with a rather mundane attempt at teaching a topic, instead of asking a specific question or raising an issue for discussion. Some of them seem to exhibit the clunky/pompous/faintly patronising verbiage I am learning to associate with material written by an LLM. I had assumed these programs would only respond to a request, but I'm starting to wonder if they go off on fishing expeditions to gather information to regurgitate. Does anyone know if they do this?

That would seem to be the next step in some AI systems.


39 minutes ago, TheVat said:

Possibly we are seeing high school students using LLMs in countries where it is common to assign the task of "publishing" a paper online.  Since legitimate academic/pro journals are generally not going to accept such papers, they just put them up on online forums and the teachers accept that.  

Ah, I didn't know publishing online was something set for school students as an assignment. In that case, I suppose the use of an LLM might account for the strangely verbose and grandiose language. It seems rather a waste of everyone's time, and not a great way to teach, but there we are.


22 hours ago, exchemist said:

 I had assumed these programs would only respond to a request, but I'm starting to wonder if they go off on fishing expeditions to gather information to regurgitate. Does anyone know if they do this? 

I do not claim to know, but I'll add some opinions. It is technically feasible to have an LLM that interacts with a forum and to drive this behaviour by means other than a direct user prompt, for instance by using the plugin infrastructure that some vendors provide (a minimal sketch of what I mean follows the list below). But I'm not sure there is enough value for an LLM provider in letting the LLM start conversations on the internet to harvest data. When I look at the quality and volume of the replies to posts that look as if they were generated by automated generative AI, there is not much to harvest compared with simply scraping conversations between (non-AI) members. So what drives the behaviour that we see on the forums? A few ideas. Note that I would need forum data not accessible to members (logs etc.) to confirm anything, so these are best guesses based on experience from working in IT and with some AI models and systems:

1. Spam. It takes time for spammers to build reputation manually before spamming, and some may use generative AI to create a few science-looking initial posts. This means the spammer cuts and pastes between an LLM and the forum.

2. Spam accounts as a service. Bots that, given a login account, try to build reputation using output from an LLM. Then, based on the level of interaction the bot's posts created, these accounts, with their track record, can be used for spam, or traded for others to use for spam.

3. Automated spamming. Bots that have a queue of commercial material to promote and select an account from no. 2 above. In this case the "reputation" built in step 2 drives what content step 3 selects to promote.

4. Experiments. Individuals or teams trying various LLMs against the forum members and evaluating the outcome. There are emerging possibilities to run small-scale LLMs outside the large, well-known vendors' control. Lower-grade hardware usually means a less capable LLM, which could explain some of the more surprisingly bad posts in the past. (Locally hosted LLMs are an aspect of generative AI I am currently investigating.)

5. Sabotage. Disrupting the forum and the community.

I do not find it likely that well-established software vendors are actively doing any of the above; it would more likely be niche players, possibly with malicious intent. The list is not meant to be exhaustive.
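To illustrate the "technically feasible" part: here is a minimal sketch, in Python, of the kind of automation items 1-3 describe. Everything specific in it is a hypothetical placeholder I invented for illustration: the forum URL, the auth token, the topic, and the choice of model. A real bot would also have to handle logins, rate limits and anti-spam measures; the point is only how little code the basic loop requires.

```python
# Minimal sketch: an LLM generates a "science-looking" post and a script
# submits it to a forum. The forum URL and token below are hypothetical
# placeholders, and "gpt-4o-mini" is just one example of a hosted model.
import requests            # pip install requests
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_post(topic: str) -> str:
    """Ask the model for forum-post-shaped text on a given topic."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Write a short forum post introducing {topic}."}],
    )
    return response.choices[0].message.content

def submit_post(title: str, body: str) -> None:
    """POST the generated text to a (hypothetical) forum REST endpoint."""
    requests.post(
        "https://forum.example.com/api/topics",           # placeholder URL
        headers={"Authorization": "Bearer <api-token>"},  # placeholder token
        json={"title": title, "post": body},
        timeout=30,
    )

if __name__ == "__main__":
    topic = "entropy in everyday life"
    submit_post(f"An introduction to {topic}", generate_post(topic))
```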

 


50 minutes ago, Ghideon said:

I do not claim to know, but I'll add some opinions. It is technically feasible to have an LLM that interacts with a forum and to drive this behaviour by means other than a direct user prompt [...]

 

Thanks, that’s a very useful summary of the possibilities. It was actually a recent exchange with @Orion1 that triggered my enquiry. Perhaps option 4 fits that particular case best. There does not seem to be any spamming or malicious intent, but some of the responses seem to be highly verbose (in the kind of way that would be marked down by a good teacher for "padding") and curiously devoid of any insight.  
 


5 hours ago, exchemist said:

There does not seem to be any spamming or malicious intent, but some of the responses seem to be highly verbose (in the kind of way that would be marked down by a good teacher for "padding") and curiously devoid of any insight.  
 

Good points; I had not taken your recent exchange into account. I would add the option of "AI overconfidence", for lack of a formal word or definition: a user may participate in a discussion in good faith, with no malicious intent, but be unable to interpret, internalise or curate AI/LLM output for the context.
 

Side note: I used an LLM to generate a definition of this option, and this is the output:
The act of using automated tools, such as language models, to generate content on topics beyond one's expertise, which is then presented as knowledgeable input. This behavior is characterized by a significant reliance on technology to simulate expertise or competence, without the individual possessing the necessary understanding or skills to assess the accuracy, relevance, or context-appropriateness of the generated content.
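As a further illustration of option 4 above (locally hosted LLMs), here is a rough sketch of how a definition like this could be generated with a small model on modest hardware, using the Hugging Face transformers library. To be clear, this is not the setup I used for the definition above; the model name is only an example of something small enough to run locally, and output quality from such models varies a great deal.

```python
# Rough sketch: generating a definition with a small, locally hosted model.
# "TinyLlama/TinyLlama-1.1B-Chat-v1.0" is just an example of a model small
# enough for modest hardware; any local text-generation model would do.
from transformers import pipeline  # pip install transformers torch

generator = pipeline("text-generation",
                     model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompt = ("Write a one-paragraph definition of: using an LLM to generate "
          "content on topics beyond one's expertise and presenting it as "
          "knowledgeable input.")

# max_new_tokens caps the length of the generated continuation.
result = generator(prompt, max_new_tokens=120, do_sample=True)
print(result[0]["generated_text"])
```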

