Interesting difference from Reddit: Upvotes/Downvotes are not anonymous

o_o@programming.dev · 1 year ago

Agreed from a technical standpoint.

But the implications are still interesting. One might (big might) trust Reddit as an organization not to use this data for evil, but with federation, there’s nothing stopping an instance from simply releasing all users’ voting history to be public.

Of course, my instance didn’t even ask for an email to sign up, so my entire account is anonymous that way.

I wonder if there are technical ways to federate votes anonymously?

UrbenLegend@lemmy.ml · 1 year ago

Yeah, I wonder how you can federate anonymously while still maintaining defenses against vote manipulation.

Zagorath@aussie.zone · 1 year ago

I think you could probably do something like have the votes be reported in aggregate by the instance.

Any individual instance admin could use defences against vote manipulation by their own users, and other instances’ admins could use defences against one particular instance being widely used for vote manipulation.
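A minimal sketch of what aggregate reporting might look like on the sending instance. This is not how Lemmy federates votes today. The `Vote` record, the `aggregate_votes` helper, and the report shape are all hypothetical names made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    user: str      # local username; in this sketch it never leaves the instance
    post_id: str
    score: int     # +1 for an upvote, -1 for a downvote


def aggregate_votes(votes):
    """Collapse per-user votes into per-post totals, one vote per user per post."""
    latest = {}
    for v in votes:
        # A later vote by the same user on the same post replaces the earlier one.
        latest[(v.user, v.post_id)] = v.score

    up, down = Counter(), Counter()
    for (_, post_id), score in latest.items():
        if score > 0:
            up[post_id] += 1
        elif score < 0:
            down[post_id] += 1

    # Only these totals would be sent to other instances; no usernames included.
    return {p: {"up": up[p], "down": down[p]} for p in set(up) | set(down)}


votes = [
    Vote("alice", "post-1", 1),
    Vote("bob", "post-1", 1),
    Vote("bob", "post-1", -1),   # bob changes his vote
    Vote("carol", "post-2", 1),
]
report = aggregate_votes(votes)
```

The home instance still sees every individual vote, so it can deduplicate and police its own users. Other instances only see the totals and have to decide how much they trust them.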

UrbenLegend@lemmy.ml · 1 year ago

I know some privacy-oriented services (Brave Browser comes to mind) aggregate telemetry data like that to preserve privacy. Perhaps something like that is possible for Lemmy as well.

hare_ware@pawb.social · 1 year ago

Someone could just run a rogue instance and host all their bots there, hiding them from everyone else.

Zagorath@aussie.zone · 1 year ago

Right, but that’s where defederation comes in. Good-faith admins can detect manipulation by their own users and selectively ban them, while a bad-faith admin running a server full of brigaders can be defederated if, for example, other admins detect anomalous voting patterns coming from that instance.

WalrusDragonOnABike@kbin.social · edit-2 1 year ago

“but with federation, there’s nothing stopping an instance from simply releasing all users’ voting history to be public.”

Which kbin.social does.

JackbyDev@programming.dev · 1 year ago

Maybe you could hash the user and post together somehow. That way the vote is hashed but the hash is also unique per post. If you only hashed the username, the user’s entire voting history would be exposed if the hash were ever reversed.

o_o@programming.dev · 1 year ago

Could be hashed and salted, with a random salt.

The trouble is, then, that it’s harder to disallow users from voting multiple times if the voting user isn’t on the post’s home instance.
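A rough sketch of that trouble, assuming SHA-256 and a fresh random salt that is discarded after each vote. The `vote_token` helper and the example username are made up for illustration.

```python
import hashlib
import secrets


def vote_token(username: str, post_id: str) -> str:
    """Hash a vote with a fresh random salt that is then thrown away."""
    salt = secrets.token_bytes(16)
    data = salt + username.encode() + b"\x00" + post_id.encode()
    return hashlib.sha256(data).hexdigest()


# The same user voting on the same post twice yields two unrelated tokens.
first = vote_token("alice@example.social", "post-1")
second = vote_token("alice@example.social", "post-1")

# A remote instance that only stores tokens cannot spot the repeat vote.
seen = {first}
is_duplicate = second in seen
```

Because the salt differs every time, the remote instance has nothing stable to compare against, so the second vote looks like it came from a new voter.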

CarbonIceDragon@pawb.social · 1 year ago

Couldn’t someone vote multiple times anyway by just having a bunch of different accounts?

o_o@programming.dev · 1 year ago

Yes, true, the current system does allow that. But the current system also prevents users from accidentally voting twice (and it remembers your vote). This is the feature I think would be more challenging to implement if we were to hash & salt the user’s ID.

frostphunk@lemmy.world · 1 year ago

That’s always been a problem on Reddit, and it is on Lemmy now too, though.

kevincox@lemmy.ml · 1 year ago

Hashing can’t effectively protect known values. If you want to know if someone voted for a post you can just hash their username and post ID. This is trivial and cheap.

If you want to know who voted on a post, you just collect every username you can find and hash each one. It isn’t super cheap, but it isn’t very expensive either. There are only 8 billion people on the planet, and many bitcoin rigs can calculate that many hashes in seconds. Sure, you can use a more expensive hash, and there may be more accounts than people, but it will remain feasible.

This is the same reason you can’t hash phone numbers in a meaningful way.
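To make the enumeration concrete, here is a small sketch. It assumes votes are published as an unsalted SHA-256 of the username and post ID; that scheme, the `vote_hash` and `unmask_voters` helpers, and the usernames are hypothetical.

```python
import hashlib


def vote_hash(username: str, post_id: str) -> str:
    """The 'anonymous' identifier an instance would publish for a vote."""
    return hashlib.sha256(f"{username}\x00{post_id}".encode()).hexdigest()


def unmask_voters(published_hashes, known_usernames, post_id):
    """Recover voters by hashing every known username and comparing."""
    lookup = {vote_hash(u, post_id): u for u in known_usernames}
    return sorted(lookup[h] for h in published_hashes if h in lookup)


# What the federation would see for post-1.
published = {
    vote_hash("alice@example.social", "post-1"),
    vote_hash("carol@example.social", "post-1"),
}

# Usernames are public, so an attacker can simply list them.
known = ["alice@example.social", "bob@example.social", "carol@example.social"]
voters = unmask_voters(published, known, "post-1")
```

The attacker never reverses the hash. They only need one hash per candidate username, which is cheap because usernames are public and the list is short.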

The best option is probably just for the instance to report counts and you just have to trust it. If it is noticed that an instance seems to be inflating votes you stop counting its votes. People can work together to create blocklists for known cheating instances. Your instance would still know this but at least it is within your trust, not federated publicly.
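On the receiving side, counting only the instances you trust could be as simple as the sketch below. The report format, the `total_score` helper, and the instance names are assumptions for illustration, not an existing Lemmy feature.

```python
def total_score(reports, blocklist):
    """Sum per-instance vote counts, skipping instances on the blocklist."""
    return sum(
        counts["up"] - counts["down"]
        for instance, counts in reports.items()
        if instance not in blocklist
    )


# Aggregate counts reported by each instance for a single post.
reports = {
    "a.example": {"up": 12, "down": 3},
    "b.example": {"up": 40, "down": 5},
    "botfarm.example": {"up": 5000, "down": 0},
}

# A shared list of instances known to inflate votes.
blocklist = {"botfarm.example"}

score = total_score(reports, blocklist)
raw_score = total_score(reports, set())
```

With the blocklist applied the post scores 44. Without it the single suspicious instance dominates the total.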

JackbyDev@programming.dev · 1 year ago

Nah, if you can properly hash a password such that it doesn’t match the same properly hashed password from a different website then you can properly hash usernames in this case such that others couldn’t reverse it or put in the same input and get the same output you created. The technology is there. It’s more of a question if it’s really worth it. At least for now I’m not concerned with a malicious admin leaking someone’s vote history.

https://en.wikipedia.org/wiki/Salt_(cryptography)

kevincox@lemmy.ml · 1 year ago

No, hashing passwords is a different case because you know who the user is, so you can use a unique salt. The password itself is also high entropy. For this use case you can have at best a per-post salt.

Think about it. The task that you are asking for is to quickly check if a user has voted for a post to prevent duplicates. So literally the operation you want is the same as the one you are trying to prevent. If you can enumerate users then you can by definition check if they have voted for a post.
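A sketch of why the two operations coincide, assuming a per-post salt used as an HMAC-SHA256 key. The `PostVotes` class and its methods are hypothetical. The salt has to be available to every instance that checks for duplicates, so it cannot be treated as a secret.

```python
import hashlib
import hmac


def vote_token(post_salt: bytes, username: str) -> str:
    """Keyed hash of a username under the post's salt."""
    return hmac.new(post_salt, username.encode(), hashlib.sha256).hexdigest()


class PostVotes:
    def __init__(self, post_salt: bytes):
        self.post_salt = post_salt
        self.tokens = set()

    def has_voted(self, username: str) -> bool:
        # Needed for duplicate detection, but also answers "did X vote?"
        return vote_token(self.post_salt, username) in self.tokens

    def cast(self, username: str) -> bool:
        """Record a vote; return False if this user already voted."""
        if self.has_voted(username):
            return False
        self.tokens.add(vote_token(self.post_salt, username))
        return True


post = PostVotes(post_salt=b"salt-for-post-1")
accepted_first = post.cast("alice@example.social")
accepted_second = post.cast("alice@example.social")   # rejected as a duplicate

# Anyone who can run the duplicate check can run it for any username.
alice_voted = post.has_voted("alice@example.social")
bob_voted = post.has_voted("bob@example.social")
```

`cast` cannot reject the second vote without calling `has_voted`, and `has_voted` is exactly the query a snooping admin would run.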

CorrodedCranium@lemmy.fmhy.ml · edit-2 1 year ago

“But the implications are still interesting. One might (big might) trust Reddit as an organization not to use this data for evil, but with federation, there’s nothing stopping an instance from simply releasing all users’ voting history to be public.”

Another potential privacy issue is that deleted content stays on the server, and I believe it’s similar with posted images.

o_o@programming.dev · 1 year ago

I think this issue is overblown. Instances of Lemmy might run modified code and choose to save things that the user intended to delete, of course, but the default setup of Lemmy seems reasonable to me in terms of how it treats deletion.

Currently it keeps deleted posts forever to allow users to un-delete if they choose, but deleting your account clears everything. And I believe there’s work in progress to discard deleted posts after 30 days. Details here: https://github.com/LemmyNet/lemmy/issues/2977

CorrodedCranium@lemmy.fmhy.ml · edit-2 1 year ago

Thank you for pointing this out. I was looking into privacy in relation to Lemmy and came across this post, where I got the wrong idea, I guess. I couldn’t find much else online at the time.

“And I believe there’s work in progress to discard deleted posts after 30 days.”

That would be a nice addition

sinnerdotbin@lemmy.ca · edit-2 1 year ago

This keeps on being asserted but it is far from true. If defederation happens or your local instance goes offline, your posts, comment history, profile, and votes will remain on other widely used instances and out of your control.

A large instance has already defederated from 2 other larger instances. If you run a personal instance, I feel it will become very, very common to be locked out of managing your data.

You can expect defederation to happen all the time as that is a deliberate part of the open federated model.

And that is to say nothing about federation simply breaking sometimes.

I have already been locked out of content that exists on other instances and will remain there forever, and I’ve only been around a short while. I don’t care personally, but people keep asserting this claim that only bad actors or scrapers will dupe your data. Federated data is very different from a non-federated copy for many reasons, and that matters to some people. Everyone should understand that deleting your account or modifying your content will often not remove your content outside your instance, and many people engage outside their local instance. It will likely exist in federated, Lemmy-searchable form forever in some capacity (in the current iteration anyway).

Not trying to spread FUD, but if we want to maintain users they have to be educated as they will find out eventually and not be happy.

I have some working drafts on policies for admins to help them navigate and explain their responsibilities to their users.

It is a bit of a weird read outside of the context, but this is an optional primer I have drafted that will hopefully help explain the distinctions:

https://github.com/BanzooIO/federated_policies_and_tos/blob/main/optional-privacy-policy-intro.md

o_o@programming.dev · edit-2 1 year ago

Yes, that’s a fair point. Just because you send an “I have deleted this message” signal out into the universe doesn’t mean that everyone will receive or obey it.

I assumed that was understood.

But that’s very different from instances intentionally and malevolently keeping data despite indicating to users that it was deleted, which is what I think folks’ privacy concerns are about.

EDIT: What I mean is that the federation model is inherently non-private in a certain sense (but in the same sense that someone could take a screenshot of your Reddit comment and your deleting your comment won’t delete their copy). But Lemmy is not egregiously misusing data.

sinnerdotbin@lemmy.ca · edit-2 1 year ago

This is largely assumed by someone like you or me who understands the implications. I am finding it evident that a lot of people are not aware.

There is also a distinction between a potential screenshot, a scrape or archive no one visits, and a federated copy on a widely used instance you have lost access to.

I edited my comment above to include a project I am working on to hopefully help admins get this across and educate users on how to appropriately engage to their comfort level.

o_o@programming.dev · 1 year ago

I appreciate your commitment to this privacy consideration. I personally don’t think it’s the hill I’d prefer to die on, but I welcome your contributions.

sinnerdotbin@lemmy.ca · 1 year ago

Thanks! I’m for mass adoption and want admins to succeed. That starts with keeping users educated (and admins covered).