• 1 Post
  • 38 Comments
Joined 1 year ago
Cake day: June 5th, 2023


  • Servers not having the same content in their “all” feeds is not a bug; it’s by design. The design philosophy of Mastodon (and, I’d say, the fediverse as a whole) is to let users curate their own feeds instead of showing them everything or algorithmically guessing what they might be interested in. A server only receives posts from accounts that at least one of its own accounts follows (a rough sketch of that delivery logic follows this comment). Having every post federate to every server, even if nobody there is interested in those posts, would be a waste of resources.

    Yes, that makes discovery of new content significantly harder, but that’s the tradeoff for being able to host your own small instance without needing a super powerful server. I can run my instance, which serves just a couple of users, on a 10-year-old server that runs a dozen other things at the same time. We see the stuff we’re interested in and don’t have to spend disk space, processing power and network bandwidth on content none of us will ever read, nor do we have to spend those resources on sending our posts to other instances where nobody will read them.
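    A minimal, hypothetical Python sketch of that delivery model (the data structures and function names are made up for illustration and are not the ActivityPub or Mastodon API): a post is only delivered to servers where at least one local account follows the author.

```python
# Hypothetical sketch of follower-based federation delivery.
# Maps an author handle to the set of accounts that follow them.
followers = {
    "alice@example.social": {"bob@tiny.instance", "carol@big.instance"},
}

def home_server(account: str) -> str:
    """Return the server part of a 'user@server' handle."""
    return account.split("@", 1)[1]

def delivery_targets(author: str) -> set[str]:
    """Only servers hosting at least one follower receive the post;
    every other server never sees it, which is why 'all' feeds differ."""
    return {home_server(f) for f in followers.get(author, set())}

print(delivery_targets("alice@example.social"))
# e.g. {'tiny.instance', 'big.instance'}
```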






  • No joke here. Large language models (which people keep calling AI) have no way of checking whether what they’re saying is correct. They are essentially just fancy text-completion machines that answer the question “what word comes next?” over and over (a toy sketch of that loop follows this comment). The result looks like natural language but tends to have logical and factual problems. The screenshot shows an extreme example of this.

    In general, never rely on any information an LLM gives you. It can’t look up external information that wasn’t in its training set. It can’t solve logic problems. It can’t even reliably count. It was made to give you a plausible answer, not a correct one. It’s not a librarian or a teacher; it’s an improv actor who will “yes, and” everything. LLMs will often make up information rather than admit that they don’t know. As an easy demonstration, ask ChatGPT for a list of restaurants in your home town that offer both vegan and meat-based options. More often than not, it will happily make you a list with plausible names and descriptions, but when you google them, none of the restaurants actually exist.
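    As a rough illustration of the “what word comes next” loop mentioned above, here is a toy Python sketch: a simple bigram frequency model, nothing like a real LLM’s neural network, with a made-up corpus. It picks a plausible next word based only on what it has seen before and has no notion of whether the result is true.

```python
import random
from collections import Counter, defaultdict

# Toy "training set": the model only ever saw these words.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count which word follows which (a bigram model).
next_words = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_words[current][following] += 1

def generate(start: str, length: int = 6) -> str:
    """Repeatedly answer "what word comes next?" by sampling from
    observed frequencies: plausible-looking output, no fact checking."""
    words = [start]
    for _ in range(length):
        candidates = next_words.get(words[-1])
        if not candidates:
            break
        choices, weights = zip(*candidates.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the cat ate the fish . the"
```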


  • Plus, people apparently don’t know what “algorithm” means. Sorting by average rating is an algorithm. Filtering by genre is an algorithm. Anything that takes an input (a database of books), performs a discrete set of steps and produces an output (an ordered list of books) is an algorithm, even if it’s performed not by a computer but by you standing in front of your bookshelf.
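    For example, here is what those two “algorithms” look like as a few lines of Python (the book data is made up for illustration):

```python
# A tiny "database" of books: (title, genre, average rating).
books = [
    ("Dune", "sci-fi", 4.3),
    ("The Hobbit", "fantasy", 4.6),
    ("Neuromancer", "sci-fi", 4.0),
]

# Algorithm 1: filter by genre.
sci_fi = [b for b in books if b[1] == "sci-fi"]

# Algorithm 2: sort by average rating, highest first.
by_rating = sorted(sci_fi, key=lambda b: b[2], reverse=True)

print(by_rating)  # [('Dune', 'sci-fi', 4.3), ('Neuromancer', 'sci-fi', 4.0)]
```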




  • When will people learn that LLMs have no understanding of truth or facts? They just generate something that looks like it was written by a human, with some amount of internal consistency, while making baseless assumptions about anything that doesn’t show up (often enough) in their training set.

    That makes them great for writing fiction, but try asking ChatGPT for the best restaurants in a small town. It will gladly and without hesitation list ten restaurants that have never existed, including links to websites that may belong to a completely different restaurant.





  • I guess there are different opinions about what downvotes are for. Personally, I think they shouldn’t reflect whether something matches my opinion but whether it’s worth reading. For me, a downvote says “this is badly written”, “this is rude” or, in general, “this shouldn’t be on people’s front page”. I will gladly upvote a post or comment that contradicts my personal beliefs if the author put effort into it.


  • They can siphon your data no matter what you do. As I’ve said in other comments, everything on the internet has been crawled and scraped for literal decades. This post is already indexed by a bunch of different search engines and most likely by other scrapers that harvest our data for AI training or ad profiles, and you can do nothing about that without hurting your legitimate audience. Nothing at all. There’s robots.txt as a mechanism to tell a crawler what it should or shouldn’t index (a minimal example follows this comment), but that’s just asking nicely; it mostly exists to keep search engines from indexing pages that don’t contain actual content. You could in theory block certain IP ranges or user agents, but those change faster than you can identify them. This dilemma is the whole reason Twitter implemented rate limiting: they wanted to protect their stuff from scrapers. See where it got them.

    Most important rule of the internet: if you don’t want something archived forever, don’t post it!
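    For reference, here is what “asking nicely” looks like: a minimal robots.txt plus a check using Python’s standard-library urllib.robotparser (the bot name and paths are made up for illustration). Nothing here is enforced; a crawler is free to ignore it.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt: politely asks all crawlers to stay out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("SomeBot", "https://example.com/private/page"))  # False
print(rp.can_fetch("SomeBot", "https://example.com/posts/123"))     # True
```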




  • I don’t know if gatekeeping is the right word, but some people treat you like a traitor if you even suggest that federating with Meta might be a valid option. Just look at the upvote/downvote ratio of this post and of some comments I got. Some people are very entrenched in their opinion, and I wouldn’t be surprised to soon see posts saying “We must defederate from everyone who federates with Meta”.