[Discussion] Reddit-like aspects of Lemmy that make no sense in a federation.

Lvxferre@lemmy.ml · 11 months ago

Aaaaah. I really, really wanted to complain about the excessive amount of keys.

(My comment above is partially a joke - don’t take it too seriously. Even if a new key was added it would be a bit more clutter, but not that big of a deal.)

Lvxferre@lemmy.ml · edit-2 11 months ago

The source that I’ve linked mentions semantic embedding; so does further literature on the internet. However, the operations are still being performed with the vectors resulting from the tokens themselves, with said embedding playing a secondary role.

This is evident for example through excerpts like

The token embeddings map a token ID to a fixed-size vector with some semantic meaning of the tokens. These brings some interesting properties: similar tokens will have a similar embedding (in other words, calculating the cosine similarity between two embeddings will give us a good idea of how similar the tokens are).

Emphasis mine. A similar conclusion (that the LLM is still handling the tokens, not their meaning) can be reached by analysing the hallucinations that your typical LLM bot outputs, and asking why each hallucination is there.
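As a rough illustration of the cosine similarity that the excerpt mentions, here is a minimal sketch. The three-dimensional vectors are made up for the example (real embeddings have hundreds or thousands of dimensions and come from a trained model), so treat the numbers as assumptions and not as data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, invented for illustration only.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "bucket": [0.1, 0.2, 0.9],
}

cat_vs_kitten = cosine_similarity(toy_embeddings["cat"], toy_embeddings["kitten"])
cat_vs_bucket = cosine_similarity(toy_embeddings["cat"], toy_embeddings["bucket"])
print(cat_vs_kitten > cat_vs_bucket)
```

With these toy vectors "cat" ends up closer to "kitten" than to "bucket", which is the property the excerpt describes.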

What I’m proposing is deeper than that. It’s to use the input tokens (i.e. morphemes) only to retrieve the sememes (units of meaning; further info here) that they’re conveying, then discard the tokens themselves, and perform the operations solely on the sememes. Then for the output you translate the sememes obtained by the transformer into morphemes (i.e. tokens) again.
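To make the idea a bit more concrete, here is a toy sketch of the three steps (tokens to sememes, work on sememes only, sememes back to tokens). Everything in it is hypothetical: the `LEXICON` table, the sememe labels and the function names are invented for illustration, and the "operations" step is reduced to a plain comparison instead of an actual transformer.

```python
# Hypothetical toy lexicon: each token (morpheme) maps to the sememes it conveys.
LEXICON = {
    "big": {"SIZE:LARGE"},
    "large": {"SIZE:LARGE"},
    "small": {"SIZE:SMALL"},
    "cat": {"ANIMAL:FELINE"},
    "kitten": {"ANIMAL:FELINE", "AGE:YOUNG"},
}

def tokens_to_sememes(tokens):
    """Input step: keep only the sememes; the tokens are discarded."""
    return [frozenset(LEXICON[token]) for token in tokens]

def sememes_to_tokens(sememe_sets):
    """Output step: pick a token that conveys exactly each sememe set."""
    output = []
    for sememes in sememe_sets:
        candidates = sorted(t for t, meaning in LEXICON.items() if meaning == sememes)
        output.append(candidates[0])  # alphabetically first matching token
    return output

def same_meaning(tokens_a, tokens_b):
    """Stand-in for 'operations on sememes': compare meanings, not spellings."""
    return tokens_to_sememes(tokens_a) == tokens_to_sememes(tokens_b)

print(same_meaning(["big", "cat"], ["large", "cat"]))
print(sememes_to_tokens(tokens_to_sememes(["large", "kitten"])))
```

In this sketch "big" and "large" are indistinguishable once the tokens are discarded, so the output step is free to pick either of them.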

I believe that this would have two big benefits:

The amount of data necessary to “train” the LLM will decrease. Perhaps by orders of magnitude.
A major type of hallucination will go away: self-contradiction (for example: states that A exists, then that A doesn’t exist).

And it might be an additional layer, but the whole approach is considerably simpler than what’s being done currently - pretending that the tokens themselves have some intrinsic value, then playing whack-a-mole with situations where the token and the contextually assigned value (by the human using the LLM) differ.

[This could even go deeper, handling a pragmatic layer beyond the tokens/morphemes and the units of meaning/sememes. It would be closer to what @njordomir@lemmy.world understood from my other comment, as it would then deal with the intent of the utterance.]

Lvxferre@lemmy.ml · 11 months ago

Soap and water do wonders for 90% of the restroom cleaning.

The problem is that the other 10% are important too.

Lvxferre@lemmy.ml · 11 months ago

Not quite. I’m focusing on chatbots like Bard, ChatGPT and the likes, and their technology (LLM, or large language model).

At the core those LLMs work like this: they take words, split them into “tokens”, and then perform a few operations on those tokens, across multiple layers. But at the end of the day they still work with the words themselves, not with the meaning being encoded by those words.

What I want is an LLM that assigns multiple meanings for those words, and performs the operations above on the meaning itself. In other words the LLM would actually understand you, not just chain words.

Lvxferre@lemmy.ml · 11 months ago

Yup, that’s the stuff. It’s mostly a finishing touch, to get rid of bacteria.

Lvxferre@lemmy.ml · edit-2 11 months ago

At the very least, I’d recommend you:

gloves - because you’ll get really close to that gross shit. You don’t want to touch it.
a sponge - it doesn’t need to be new; your old kitchen sponge is enough, just don’t use it again in the kitchen. Use the yellow side to spread the cleaning agent, and the green side to remove obnoxious grime stuck to something. (Do it gently, and only with a really old sponge, to avoid scratching the surface.)
a bucket - mostly to mix some soap and water.
a dry rag - mostly for finishing/drying. A cringey old shirt that you won’t be using again is usually enough.
toilet brush - don’t use the sponge to clean inside the toilet bowl; you’ll be spreading the bacteria from your shit and piss to the rest of the restroom.

Everyone has the cleaning agents that they swear upon, so look for something that works for you. For me it’s

alcohol vinegar - to get rid of that brown crust in the sink (water in my city is hard as a brick) and around the shower drain. I usually apply it, wait a few minutes, then use the sponge to scrub it a bit. Then I remove the vinegar with the rag.
bleach - exclusively used inside the toilet bowl. I squish some bleach there, then scrub it with the toilet brush, then flush it off, making sure that there’s no bleach left behind.
disinfecting agent - I squish a bit of that inside the toilet bowl and just leave it there. It smells good, and it gets rid of the bacteria.
an ammonium-based cleaning agent - I squish it on obvious grime on the walls (except the above), then scrub it with the sponge.
soap and water - to “wash” the walls with the sponge.
plain water with some disinfecting agent - to rinse it. Then I just remove the excess water with the rag and let the restroom dry naturally (with the doors closed, otherwise my cats will step into the bathroom, step outside, and now I’ve got to clean the bathroom again plus the corridor and furniture).

Important detail: do not mix any two of the cleaning agents that I’ve mentioned. Especially not ammonium and bleach.

For reference, the disinfecting agent that I use is called “pinho sol”, but I have no idea if it’s sold outside Brazil. You probably have some similar product wherever you live.

Lvxferre@lemmy.ml · 11 months ago

Complexity does not mean sophistication when it comes to AI and never has and to treat it as such is just a forceful way to make your ideas come true without putting in the real effort.

It’s a bit off-topic, but what I really want is a language model that assigns semantic values to the tokens, and handles those values instead of directly working with the tokens themselves. That would probably be far less complex than current state-of-the-art LLMs, but way more sophisticated, and require far less data for “training”.

Lvxferre@lemmy.ml · 11 months ago

creating a label and checking the skip invoice box

That works great too, especially if you want to use less foolproof filters. Or even a mix of both strategies.

Lvxferre@lemmy.ml · 11 months ago

Oh “great”, more crap between Ctrl and Alt.

[Grumpy grandpa] In my times, the space row only had five keys! And we did more than those youngsters do with eight, now nine keys!

Lvxferre@lemmy.ml · 11 months ago

Thank you! It’s working now.

Lvxferre@lemmy.ml · 11 months ago

It’s giving me an error, “Error Finding Entity // Make sure you spelled the entity correctly and that it exists!”, when I use my username for lemmy.ml; curiously it works well when I do it for my beehaw.org account.

Lvxferre@lemmy.ml · edit-2 11 months ago

[Note: this is my personal take, not Chomsky’s]

We can recognise colours and things even without properly labelling them. (Colour example: I have no clue what to call the colour of my cat’s fur, but I’m fairly certain that I’d remember it, and thus recognise it.) However, it’s hard to handle them logically this way.

if you are outside and it is raining, then you get wet
if you get wet, you might get sick
so if you are outside and it is raining, you might get sick

And at least for me this is the main role of the internal monologue. It isn’t just about repeating the state of the things, it’s about connecting pieces of info together, as if I was explaining the link to another person.

Perhaps those without verbal internal monologue/dialogue have a more persistent innate language, that is not overwritten by common external language?

Possible; I don’t know, really. It’s also possible that the “innate language” doesn’t really exist, only the innate ability to learn a language; but that ability is already enough to structure simple reasoning.

Lvxferre@lemmy.ml · edit-2 11 months ago

If you want, you could use Gmail filters to delete those emails automatically. Here’s how:

1. click the gear button (settings), then “see all settings”, then “filters and blocked addresses”.
2. click “create a new filter”. Add “top of Google search” to the field “has the words”, leave other fields blank.
3. click “create filter”, then check the “delete it” box, then “create filter” again.
4. repeat steps 2-3 for other shit that SEO spam is likely to mention.

Important: never use as a filter anything that legitimate users might reasonably say. Only things that you’re fairly certain come from a spammer.
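If you’d rather script it than click through the settings, the same filters can in principle be created through the Gmail API. The sketch below only builds the request bodies; it assumes that the `query` criterion behaves like the “has the words” field, that adding the system label `TRASH` is the equivalent of the “delete it” box, and that wrapping the phrase in quotes gives an exact-phrase match. The `build_delete_filter` helper and the `SPAM_PHRASES` list are names made up for this example.

```python
def build_delete_filter(phrase):
    """Build a Gmail API filter body that trashes mail containing `phrase`.

    Assumption: criteria.query works like the "has the words" field, and
    adding the TRASH system label mirrors the "delete it" checkbox.
    """
    return {
        "criteria": {"query": f'"{phrase}"'},  # quoted for an exact-phrase match
        "action": {"addLabelIds": ["TRASH"]},
    }

# Only phrases that you are fairly certain come from spammers.
SPAM_PHRASES = ["top of Google search"]

filter_bodies = [build_delete_filter(phrase) for phrase in SPAM_PHRASES]
for body in filter_bodies:
    print(body)
```

With an authenticated client from google-api-python-client, each body would then be sent with something like `service.users().settings().filters().create(userId="me", body=body).execute()`; the authentication part is left out here.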

EDIT: I repeated two steps without noticing it. My bad.

Lvxferre@lemmy.ml · 11 months ago

I don’t understand, why are you calling the other poster racist? I’m so confused… everything that he said is true. Source: I’m a gratch.

Lvxferre@lemmy.ml · 11 months ago

Chomsky’s concept of UG (universal grammar) is able to handle this. Since there would be a chunk of language that is innate (universal), that feral child would share it. So, as a conclusion from that, even if the feral child isn’t expressing it through vocalisation, since they lack an “application” of the UG (like Nahuatl, Mandarin, Quechua, English, Kikongo etc.), they’d still have some rather simple internal monologue.

…that said, I think that Chomsky’s UG is full of shit. I do agree with him that the faculty of language might have developed first to structure thought, but my reasoning is a bit closer to yours: the role of language would be to formalise thought. Thinking without language is possible in the same way as moving across a village without roads - it’s doable but clunky, and it’ll likely take you far more effort than with proper roads / a language.

Not to challenge Chomsky on his own turf

Don’t worry. Everyone and their dog challenges him. Including himself; he’s often contradicting his own earlier statements.

Lvxferre@lemmy.ml · 11 months ago

Got it - mostly politics, then. That explains a lot about why you guys are seeing far more toxicity than I do: I don’t generally join political discussions. (And when I do, since I’m a communist myself, perhaps I don’t even notice it.)

Lvxferre@lemmy.ml · edit-2 11 months ago

That hints to me that what people here are calling “toxic” is politics-related, since I’m a lemmy.ml user and I certainly would not say that my experience here is overall “toxic”.

And, funnily enough, most of the issues that I had were with users from either lemmy.world or sh.itjust.works; sometimes lemm.ee.

Lvxferre@lemmy.ml · edit-2 11 months ago

It depends a lot on what you consider “toxic”.

If it’s just about intrusive off-topic political discussion, then I fully agree with you: it’s far more common in Lemmy than in Reddit, and sometimes it reaches a point that even people who’d otherwise enjoy discussing politics roll their eyes and say “not this shit again”.

However, if “toxic” includes other forms of undesirable behaviour, then Lemmy is probably less toxic than Reddit. For example: while sometimes you do see here disingenuous and deliberate stupidity, “waah TL;DR!!”, the “I don’t understand” conveying disagreement, or passive aggressiveness, in Reddit they pop up all the time.

So, what do you consider toxic? Depending on that, the other users’ experiences might be really similar or really different from yours.

Lvxferre@lemmy.ml · 11 months ago

Chomsky would say that the original purpose of language is to structure thought, with communication being merely secondary. (Or something like this; I don’t recall it word for word.)

If that’s correct, then internal monologues are simply a result of your brain processing your thoughts.

Lvxferre@lemmy.ml · 11 months ago

Ah, got it. My bad. Yeah, not providing anything is even lazier, and unlike “lazy” bash scripts it leaves the user clueless.

Lvxferre@lemmy.ml · edit-2 1 year ago

[Discussion] Reddit-like aspects of Lemmy that make no sense in a federation.

Lvxferre@lemmy.ml · edit-2 1 year ago

Simple script for PulseAudio, to quickly switch between headphones and speakers

Lvxferre@lemmy.ml · 1 year ago

Small tips, tricks, and guidelines for newbie community mods

Lvxferre@lemmy.ml · edit-2 1 year ago

How Reddit handles competition
