Selfhosted “plagiarism” checker with custom sources?

inspxtr@lemmy.world · 11 months ago

They don’t seem to allow account deletions. Does that mean this could include accounts they still keep even though the owners no longer use their services?

inspxtr@lemmy.world · 11 months ago

forgive my naivety, how does such a community avoid promoting ageism?

inspxtr@lemmy.world · edit-2 1 year ago

This suggests that either these people are very detached from reality, or they are pitching this to a very specific set of people under the guise of a general appeal.

inspxtr@lemmy.world · 1 year ago

the whole premise of OP is that this monitors people, and many organizations use TOTP, which one could also use without internet connections or phones AFAIK.

I’m in academia and I wish this were implemented more widely. Data breaches are getting quite common, and GitHub is so entwined with software engineering that it is critical to increase security measures.
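To illustrate the offline point about TOTP: the code is derived only from a shared secret and the current time, so no network connection is needed once the secret is enrolled. Below is a minimal sketch of the RFC 6238 computation (HMAC-SHA1 variant). The function name `totp` and the demo secret are assumptions for illustration; the secret is the published RFC test key, not a real credential, and real authenticator apps usually receive the secret base32-encoded.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at_time=None, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) from a shared secret."""
    if at_time is None:
        at_time = time.time()
    # The moving factor is the number of whole time steps since the Unix epoch.
    counter = int(at_time) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks an offset.
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    demo_secret = b"12345678901234567890"  # RFC 6238 test key, illustration only
    print(totp(demo_secret))
```

Everything here runs locally with the standard library; the only external input is the clock, which is why a hardware token or an offline device can produce the same codes.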

inspxtr@lemmy.world · 1 year ago

or maybe most of them in a folder? and one file that defines their locations for environment variables

inspxtr@lemmy.world · 1 year ago

Unless they know the root cause, or something close to it (I didn’t read the paper, so I’m not sure whether they do), something like this may still be exploitable.

inspxtr@lemmy.world · 1 year ago

what are the other alternatives to ENV that are more preferred in terms of security?

inspxtr@lemmy.world · edit-2 1 year ago

yeah, I guess maybe the formatting and the verbosity seem a bit annoying? I wonder what alternative solutions could better engage people from Mastodon, which is what this bot is trying to address.

edit: just to be clear, I’m not affiliated with the bot or its creator. This is just my observation from multiple posts where I’ve seen this bot comment.

inspxtr@lemmy.world · 1 year ago

I’m curious, why is this bot currently being downvoted for almost every comment it makes?

inspxtr@lemmy.world · edit-2 1 year ago

Thanks for the suggestions! I’m actually also looking into LlamaIndex for more conceptual comparison, though I haven’t gotten to building an app yet.

Any general suggestions for a locally hosted LLM with LlamaIndex, by the way? I’m also running into some issues with hallucination. I’m using Ollama with llama2-13b and the bge-large-en-v1.5 embedding model.

Anyway, aside from conceptual comparison, I’m also looking for a more literal comparison. AFAIK, the choice of embedding model affects how similarity is defined. Most current LLM embedding models are fairly abstract, so the similarity they capture is conceptual: “I have 3 large dogs” and “There are three canines that I own” will probably come out as very similar. Do you know which embedding model I should choose to get a more literal comparison?
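For the literal side, one option that does not involve an embedding model at all is word n-gram ("shingle") overlap. The sketch below is a swapped-in technique, not something from the thread: it scores two texts by the Jaccard similarity of their word trigrams. The function names `shingles` and `jaccard` and the default `n = 3` are assumptions for illustration.

```python
import re


def shingles(text: str, n: int = 3) -> set:
    """Return the set of word n-grams in the text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        # Too short for a full n-gram: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    # Conceptually similar, literally different: no shared trigrams.
    print(jaccard("I have 3 large dogs", "There are three canine that I own"))
    # Literally close: most trigrams are shared.
    print(jaccard("I have 3 large dogs", "I have 3 large cats"))
```

A score like this only rewards shared wording, so paraphrases score near zero while copy-and-tweak edits score high, which is the opposite trade-off from a semantic embedding.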

That aside, like you indicated, there are some issues. One of them involves length. I hope to find something that can iteratively build up from similar sentences to similar paragraphs. I can take a stab at coding it up, but was just wondering if there are similar frameworks out there already that I can model it after.
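A rough sketch of the sentence-to-paragraph idea, under stated assumptions: sentences are split naively on end punctuation, sentence similarity uses the standard library's `difflib.SequenceMatcher`, and consecutive matching sentences are merged into one run. The names `split_sentences`, `best_match`, `matching_runs`, and the 0.8 threshold are all hypothetical choices, not an existing framework.

```python
import re
from difflib import SequenceMatcher


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def best_match(sentence, candidates, threshold):
    """Index of the most similar candidate, or None if nothing reaches the threshold."""
    best_index, best_score = None, 0.0
    for index, candidate in enumerate(candidates):
        score = SequenceMatcher(None, sentence.lower(), candidate.lower()).ratio()
        if score > best_score:
            best_index, best_score = index, score
    return best_index if best_score >= threshold else None


def matching_runs(query, source, threshold=0.8):
    """Return (query_start, query_end, source_start, source_end) sentence-index runs."""
    q_sents, s_sents = split_sentences(query), split_sentences(source)
    runs = []
    for qi, sentence in enumerate(q_sents):
        si = best_match(sentence, s_sents, threshold)
        if si is None:
            continue
        # Extend the previous run when both texts advance by exactly one sentence.
        if runs and runs[-1][1] == qi - 1 and runs[-1][3] == si - 1:
            start_q, _, start_s, _ = runs[-1]
            runs[-1] = (start_q, qi, start_s, si)
        else:
            runs.append((qi, qi, si, si))
    return runs


if __name__ == "__main__":
    source = ("Cats sleep a lot. Dogs like to play fetch in the park. "
              "They also enjoy long walks. Birds sing at dawn.")
    query = ("My essay begins here. Dogs like to play fetch in the park. "
             "They also enjoy long walks!")
    print(matching_runs(query, source))
```

Longer runs are the paragraph-level candidates; the sentence scorer could be swapped for an embedding-based similarity without changing the run-merging step.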

inspxtr@lemmy.world · 1 year ago

yeah agreed with your sentiment. I think it’s good to have an intuition about something, but it’s much better when there’s data to back it up.

Cuz then they can do the same with others, say YouTube or other streaming services, and start to compare the numbers, like % of ads, what types of ads, how long the ads are relative to content, how many of these ads are political, how many of these ads may be harmful, …

Having these numbers can be quite handy for other researchers and regulators to look into these issues more concretely, rather than just saying, “as your brothers and sisters already know, TikTok serves ads”

inspxtr@lemmy.world · edit-2 1 year ago

Selfhosted “plagiarism” checker with custom sources?

inspxtr@lemmy.world · 1 year ago

how bout baserow.io or nocodb cloud? Haven’t used them but I think they’re open source. But they don’t have mobile apps AFAIK for editing.

inspxtr@lemmy.world · 1 year ago

while the following is not really my threat model, wouldn’t a person who’s being targeted, say a journalist/activist, have a higher chance of their device being compromised (possibly even physically)? If so, would Session still be a valid option for them?

inspxtr@lemmy.world · 1 year ago

how so?

inspxtr@lemmy.world · 1 year ago

deleted by creator

inspxtr@lemmy.world · 1 year ago

I’m curious about how to verify that these bots respect the rules. I don’t doubt that they do, since it might be a PR nightmare for these big tech companies if they don’t, but I don’t know how to verify them. Asking because I’m also doing this for my website.

By the way, LLMs are usually also trained on Common Crawl data (not sure to what extent), but I’m not sure whether you want to block Common Crawl.

Another thing to consider is whether your website is indexed and crawled by web archive, and whether web archive has some policy on AI bot crawlers and scrapers.
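One partial check is to at least confirm that the robots.txt rules say what is intended for a given crawler. The sketch below uses Python's standard `urllib.robotparser` for that. The helper name `blocked_agents`, the example rules, and the URL are assumptions for illustration; `CCBot` and `GPTBot` are used as example user-agent tokens, and each operator's documentation should be checked for the current ones. This only tests the rules themselves; whether a bot actually honors them would still have to be checked against server access logs.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt: str, agents, url: str) -> dict:
    """Map each user-agent token to True if the given rules disallow it for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


# Hypothetical rules for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: CCBot
Disallow: /

User-agent: GPTBot
Disallow: /
"""

if __name__ == "__main__":
    result = blocked_agents(
        EXAMPLE_ROBOTS,
        ["CCBot", "GPTBot", "ExampleBrowser"],
        "https://example.com/post",
    )
    for agent, is_blocked in result.items():
        print(f"{agent}: {'blocked' if is_blocked else 'allowed'}")
```

To test a live site instead of a string, `RobotFileParser` also provides `set_url()` and `read()` to fetch the file before calling `can_fetch()`.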

inspxtr@lemmy.world · edit-2 1 year ago

this is an interesting story, but for those who prefer to read, here is the article linked in the video description:

https://thefourth.media/apartments/

I also ran this through smmry to summarize. Below is the result:

The Apartments With No Entrance

A shady land sale has left the residents of Sea Park Apartments locked in a decades-long land dispute, with no control over their own homes.

These apartments are “Enclosed” in more ways than one: The original developer of the apartments sold the apartment’s carpark and common areas - which surround the apartment blocks - to an individual, leaving residents in the unusual position of having their homes completely encircled by someone else’s private land.

Built in the 70s and completed in the early 80s, Sea Park Apartments is one of the earliest apartments in Petaling Jaya, if not the earliest, constructed at a time when most residential developments in the area still involved landed properties.

This meant residents had no way to access their homes without first trespassing on private property, and no control over the common facilities sited on that private land.

The individual who purchased the disputed lands is Yap Say Tee, who once managed a hotel owned by the developer, and was earlier approached by the developer to manage the car park at Sea Park Apartments.

With the developer’s sale of these lands to Yap, the rules of the game changed: The developer is no longer the registered owner of the disputed lands nor responsible for addressing the remonstrations of the residents, which reached a peak in 2013.

With the facilities on private land, access road on private land, the property value will go down, and residents will have no agency.

inspxtr@lemmy.world · 1 year ago

I’m quoting the warning from the privacyguides page that I linked:

These messengers do not have Forward Secrecy, and while they fulfill certain needs that our previous recommendations may not, we do not recommend them for long-term or sensitive communications. Any key compromise among message recipients would affect the confidentiality of all past communications.

inspxtr@lemmy.world · 1 year ago

Suggestion for Airtable alternative with mobile options?

inspxtr@lemmy.world · 1 year ago

Is there a day of any given year that is least special?

inspxtr@lemmy.world · 1 year ago

Trying to delete DigitalOcean acc after locked

inspxtr@lemmy.world · 1 year ago

Comment systems for static pages (Jekyll)?