ActualScan Tech

Analytic results (Roads into search #2)

Posted at Jan 2, 2021

Happy New Year, everyone! Because of my less-than-fantastic health late last year, and annoying problems with the crawler, work on ActualScan has progressed more slowly lately. But I hope to make one major announcement about the project this January.

For now, let’s continue the discussion of how ActualScan differs from existing search engines. In the last entry, I explained how social indexing, by focusing on niches and user interests, can help us fight spam and cope with the sheer size of the World Wide Web.

Analytic search, on the other hand, concerns scoring and ordering search results. As I already mentioned when mapping discontent with Google, search result pages have been badly hacked by the Search Engine Optimization (SEO) industry. Its efforts have left the results dominated by cookie-cutter content made for quick riches, often making useful information hard to find.

A constant fight goes on in the background between the SEO guys and Google. You can check out a timeline of updates to Google search here, although the exact nature of most updates is kept secret and inferred after the fact, often by observing which old SEO tricks stopped working (at least partially). A good example is networks of fake, machine-generated blogs linking to each other to boost their Google rankings. This is one of the methods that were curbed over the years, in part by mass banning.

But even though page scoring is now presumably done by machine learning models, spammers still have an incentive to probe how the algorithm works and figure out how to contrive content that appeases it with minimal effort. A particularly widespread problem nowadays is burying cooking recipes and answers to simple queries in labyrinthine prose about everything and nothing. If you do this as a webmaster, Google thinks you have a quality website packed with lots of detailed information. (And technically there is lots of (useless) information!) Another symptom is catching a wide range of specific queries with overly general, simplistic content. It won’t really answer your questions, but it is well crafted to land on the first results page.

Analytic search results aim to, at once:

How is this accomplished? Every indexed page is saved with a collection of statistical and linguistic features. When you search the index, you can control how the results are ranked according to those features, and modify the exact rules in real time, simply by moving sliders. You can also name and save your rulesets for easy reuse later. Because most people, understandably, cannot be bothered to tinker with sliders, you will also get a package of well-crafted, pre-made rules.
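The exact features and scoring formula ActualScan uses aren’t spelled out here, so what follows is only a minimal sketch of the general idea, assuming a linear weighted sum: each slider becomes a weight on one precomputed page feature, and a named ruleset is simply a saved set of weights. The feature names (length, readability, answer_density), the example.org URLs and the helpers score, rank and save_ruleset are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    # Hypothetical features computed at indexing time, each normalized to 0..1.
    features: dict[str, float] = field(default_factory=dict)


def score(page: Page, rules: dict[str, float]) -> float:
    """Weighted sum of page features; each weight stands in for one slider."""
    return sum(weight * page.features.get(name, 0.0) for name, weight in rules.items())


def rank(pages: list[Page], rules: dict[str, float]) -> list[Page]:
    """Return pages ordered by score, best first."""
    return sorted(pages, key=lambda page: score(page, rules), reverse=True)


def save_ruleset(store: dict[str, dict[str, float]], name: str, rules: dict[str, float]) -> None:
    """Store a copy under a name, so later slider moves don't alter the saved version."""
    store[name] = dict(rules)


pages = [
    Page("https://example.org/recipe-with-life-story",
         {"length": 0.9, "readability": 0.4, "answer_density": 0.1}),
    Page("https://example.org/plain-recipe",
         {"length": 0.3, "readability": 0.8, "answer_density": 0.7}),
]
# Hypothetical slider positions: favor pages that get to the point.
rules = {"length": 0.2, "readability": 0.5, "answer_density": 0.8}

saved: dict[str, dict[str, float]] = {}
save_ruleset(saved, "quick answers", rules)

for page in rank(pages, rules):
    print(f"{score(page, rules):.2f}  {page.url}")
```

With these made-up numbers the plain recipe scores 1.02 and the padded one 0.46, so the plain recipe comes first. Moving a slider just means changing one value in the rules dictionary and re-running rank, which is cheap enough to do in real time over a page of results.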

Screenshot: an early shot of tuning an analytic search, showing the interface for browsing text snippets and configuring rules with sliders.

This setup also makes SEO manipulation much more difficult. It’s true that the rules are public rather than concealed, as they are in the old search engines, but they are also easily changed. If you optimize for a set of statistics, the user can just adjust their preferences a little to get rid of your content (like turning the dial on an old-school radio to improve reception). And new features can be added to the index, letting the admins and the user base sort the Web with even more sophistication.
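To make the radio-dial argument concrete, here is a toy sketch, again assuming a simple linear weighted sum over hypothetical normalized features (length, readability) rather than ActualScan’s real scoring. A page padded to max out length wins under one slider setting, and nudging that single slider down is enough to push a concise answer back to the top.

```python
def score(features: dict[str, float], weights: dict[str, float]) -> float:
    # Linear weighted sum over hypothetical, normalized page features.
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def top_result(pages: dict[str, dict[str, float]], weights: dict[str, float]) -> str:
    # Name of the highest-scoring page under the given slider weights.
    return max(pages, key=lambda name: score(pages[name], weights))


pages = {
    # Hypothetical spam page padded to max out the "length" feature.
    "seo-padded": {"length": 1.0, "readability": 0.3},
    "concise-answer": {"length": 0.4, "readability": 0.9},
}

before = {"length": 0.6, "readability": 0.4}
after = {"length": 0.35, "readability": 0.4}  # the user nudges one slider down

print(top_result(pages, before))
print(top_result(pages, after))
```

Under before, the padded page scores 0.72 against 0.60; under after, it drops to 0.47 against 0.50. A spammer tuning content to one fixed set of weights has no stable target when every user can shift those weights.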

I believe that breaking up the current Web technology monocultures has to involve tackling the technical problems with different methods. The user should have more to gain from a different tool than just the good feeling of avoiding Google. In fact, answering the technical challenges may matter more than worrying about the tech giants and replicating traditional products.

ActualScan, as I have tried to describe it, will let users collectively map the Web while making their own decisions. The Internet as a place for blogs, discussions and articles is made by individual people, and maybe it should be explored in a similarly human-controlled way.