I don’t know what to think, really.
The Dekaif channel has 434 videos, but YouTube is only showing 275 to clients, whether logged in or not, whether yt-dlp or official access.
This isn’t the first channel where I’ve seen this, or weirder stuff. Another example is this video - “Belt” meme - which is accessible on Grayjay, yet not on YouTube, meaning (I think) that publicly shared videos are being deindexed while still being hosted.
You used to be able to take the video code from the URL (everything after ‘?v=’ and before ‘&’) and get the exact video in search results. Not now. The second YouTuber, Sparky, has 35 uploads, only 9 of which are visible. And I can attest that at least one of the remaining 26 is hosted, but invisible. I don’t even know how it came up using Grayjay but not YouTube or Revanced.
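For anyone who wants to try the video-code lookup described above, here is a minimal Python sketch of pulling the code out of a watch URL (the value of the `v` parameter, i.e. everything after ‘?v=’ and before the next ‘&’). The function name `video_id` and the example URL are made up for illustration, and this only handles the `?v=` style of URL mentioned in the post.

```python
from urllib.parse import urlparse, parse_qs

def video_id(url):
    """Return the value of the 'v' query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None

# Placeholder URL, not a real video.
print(video_id("https://www.youtube.com/watch?v=abc123XYZ_-&t=10s"))
```

Pasting the returned code into search is the check described above; whether the video shows up is the thing being tested.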
Basically, there’s a TON of shady underhanded shit happening at YTHQ and everyone needs to jump ship to Odysee, Peertube or some platform that won’t be clogged with AI. This is bad for everyone.
I’m posting it here mainly because I verified my findings with yt-dlp, and this new bs is successfully thwarting my attempts to archive.
3rd Oct edit: I am seeing massive differences between indexed videos and archived videos. I am still aggregating, but so far the share of videos that are definitely affected ranges from 10% to 50%.
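A rough sketch of how that kind of comparison can be done, assuming you already have two lists of video codes: one from your own archive and one from whatever the channel listing currently returns. How the lists are collected is not shown here, and the name `missing_report` and the sample codes are hypothetical.

```python
def missing_report(archived_ids, indexed_ids):
    """Return (sorted list of archived IDs not in the index, share missing)."""
    archived = set(archived_ids)
    indexed = set(indexed_ids)
    missing = archived - indexed
    # Guard against an empty archive so we never divide by zero.
    share = len(missing) / len(archived) if archived else 0.0
    return sorted(missing), share

# Made-up sample data: four archived videos, two still listed.
missing, share = missing_report(["a", "b", "c", "d"], ["a", "c"])
print(missing, f"{share:.0%}")
```

The share is computed against the archive, so it only reflects videos you already have a record of; uploads you never archived cannot be counted this way.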
Finding search terms is the one task I consistently use LLMs for. That isn’t what they said, though; they talked about replacing traditional search with LLMs, and said that traditional search is about to “go the way of the dinosaur”. I don’t trust any local LLM to accurately recall anything it read.
Not to mention that once we become dependent on LLMs, which is something big tech is trying really hard to achieve right now, it will not be all that difficult for the creators to introduce biases that give us many of the same problems as search engines. Product placement, political censorship, etc. There would not be billions of dollars in investment if they thought they weren’t going to get anything out of it.
Local LLMs (the best ones, at least) are FOSS, though. If bias is introduced, it can be detected and the user base can shift to another version, unlike centralized cloud LLMs, which are private silos.
I also don’t think LLMs of any kind will fully replace search engines, but I do think they will be one of a suite of ML tools that will enable running efficient local (or distributed) indexing and search of the web.
First of all, they are not FOSS. I know it seems tangential to the discussion, but it’s important because biases cannot be reliably detected without the starting data. You should also not trust humans to see bias because humans themselves are quite biased and will generally assume that the LLM is behaving correctly if it aligns with their biases, which can be shifted in various ways over time, too.
Second, local LLMs don’t have the benefit of free software where we can modify them freely or make forks if there are problems. Sure, there’s fine tuning, but you don’t get full control that way, and you need access to your own tuning data set. We would really just have the option to switch products, which doesn’t put us much further ahead than using the closed off products available online.
I’m all for adding them to the arsenal of tools, but they are deceptively difficult to use correctly, which makes it so hard for me to be excited about them. I hardly see anyone using these tools for the purposes they are actually good for, and the things they are good for are also deceptively limited.