What’s stopping me from doing this? Here we go:
I’m going to start an instance and federate with everyone who will allow it, which is most instances including this one, I believe.
Then I’m going to feed all that data into my new website, called Open Lemmy Stats, where anyone can query the user data I’ve accumulated. The homepage will be rife with insights, leaderboards and all kinds of data on prolific users.
Additionally, I’ll display a snapshot/profile of a random user by feeding that user’s data to GPT-4 to make inferences about the user’s political affiliations and display the results.
Worst of all, I’m not going to out my instance for everyone to know it as the one to defederate. In fact, I’m spinning up a few instances that will host innocuous communities that I plan to mod and support to give my instances cover for their true purpose: redundant fediverse datastreams for my site, Open Lemmy Stats.
I’ll also have a store where anyone can buy my collected fediverse data for a handsome sum.
Just kidding, I’m not doing any of this. But someone absolutely will, or is already working on it. They’ll make a good bit of money too, I’d bet.
This is inspired by a recent post on youshouldknow@lemmy.world where someone highlighted what kind of data instance admins have access to, even for users not on their instance.
I wanted to share this to start a discussion that I find interesting. I’m interested in your thoughts, or in hearing more on why this may or may not be possible and, if it is, maybe some ideas on how to fix that. Because obviously such a site would be problematic, but no doubt popular for oh so many reasons.
Edit: typo, I called admins adminis. Corrected.
Edit 2: wanted to credit the post I was referencing from YSK, here it is - https://lemmy.world/post/1033769
I was thinking, yay, that sounds like an awesome data visualization platform, that would be great. Until I got to the “just kidding” part.
You are right, all this information is readily available. And we would be really naive if we think that no one is collecting this yet.
You, or someone else, should build this, so that it is clearly visible to everyone what data is available. And not just visible to the select few who build their own closed data mining systems.
This would be a pretty bad idea. Not only are companies going to steal all the data from that site, but it’s going to lead to people going through every user’s history to block people who don’t have the same “color politics” as them. It’s going to lead to hyper echo chambers, even worse than other social platforms.
I think it would be better if this data is obfuscated even from instance admins. Does this present a bigger challenge in identifying malicious users? Probably, yes. However, it protects the Fediverse first and foremost from the vampire companies stealing consumer data, and protects the Fediverse from becoming the loudest echo chamber on the planet.
Differing opinions, viewpoints, and politics are important to genuine discussion. These “color politics” don’t have to even be part of the discussion to influence what people say. I don’t know about you, but being in a thread where people only ever agree with me and offer no alternative ideas is not a place I want to spend a lot of time in. Because who knows, maybe my ideas are wrong, and I might (shudder), change my mind.
Hey, I completely agree with you, in that the most interesting discussions are among groups where I don’t agree with everyone. This is where I learn and grow as a person.
But in saying that, aren’t you also saying that some people, like you and me, would not use such a database to filter out the users we do not agree with?
And would it not be a logical conclusion that people who like to build and stay in their echo chambers would not be more inclined to listen to different opinions just because they don’t have a more efficient tool to sort out people they disagree with?
What I am saying is, all information that is technically available will be collected and analysed. Better to make a public and open platform showing everything, so that everyone can see exactly what can be collected and surmised from the already public information, than to keep users blind to what information they actually leak publicly.