ActualScan Tech

Open sourcing ActualScan

Posted at Jan 21, 2021

I recently published the full source code of ActualScan on GitHub, along with instructions that should let you (with some fiddling) run it on your own computer.

I encourage you to try it, or at least read the documentation! Hopefully it will give you some ideas about building an alternative search engine in 2020-2021. If you want to see this in the context of more mature projects, you can refer to my post about social indexing and to the detailed writeups by the team behind the Cliqz search engine, which sadly shut down in the first half of 2020.

Although this is not yet an ActualScan version I would run on a public server, many things work. You can scan subreddits and some tested websites (you may also have luck with other ones). You have access to a sophisticated search system with filtering by tags and sites, and real-time changes in scoring.
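To make the idea of tag and site filtering with adjustable scoring more concrete, here is a minimal sketch. It is not ActualScan's actual code: the `Doc` class, the `search` function, the feature names and the example sites are all hypothetical, and I assume a score that is just a weighted sum of per-document features.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """A hypothetical indexed document; field names are illustrative only."""
    site: str
    tags: set
    features: dict  # feature name -> numeric value (assumed precomputed)


def search(docs, sites=None, tags=None, weights=None):
    """Filter docs by site and tag, then rank by a weighted sum of features."""
    weights = weights or {}
    scored = []
    for doc in docs:
        # Keep only documents from the requested sites (if any were given).
        if sites and doc.site not in sites:
            continue
        # Keep only documents sharing at least one requested tag.
        if tags and not (doc.tags & set(tags)):
            continue
        score = sum(weights.get(name, 0.0) * value
                    for name, value in doc.features.items())
        scored.append((score, doc))
    # Highest score first; sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


docs = [
    Doc("a.example", {"python"}, {"length": 1.0, "recency": 0.2}),
    Doc("b.example", {"python", "search"}, {"length": 0.3, "recency": 0.9}),
    Doc("c.example", {"cooking"}, {"length": 0.9, "recency": 0.9}),
]

# Same filter, two different weightings: only the ranking step is repeated.
by_length = search(docs, tags={"python"}, weights={"length": 1.0})
by_recency = search(docs, tags={"python"}, weights={"recency": 1.0})
print([d.site for d in by_length])
print([d.site for d in by_recency])
```

In this sketch the weights are supplied at query time, so changing them only re-ranks the already filtered documents and nothing has to be re-indexed. A real system would push the filtering and scoring down into the index itself.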

One thing that ActualScan shares with Cliqz and some of the other projects I mentioned is a focus on building our own index, as opposed to working with the API of some incumbent engine (in practice, this is often Bing). This is mostly because I believe alternative search engines should differentiate themselves through features, not only by not being or containing spyware (i.e. not having invasive tracking). And for a radically different approach to search, we need full control of our data model.

Where do we go next?

The big milestone that I want to work towards is having a public-facing server at actualscan.com. The dream would probably be to have a federation of servers (like the services in the Fediverse), with each specializing in different topics (tags) and sites.

For some time, between jobs and some personal stuff, I managed to work on the project sort-of-fulltime. This obviously wasn’t very realistic in the long term, but I’m happy I brought the codebase to a point where it can be published and worked on – with its basic architecture in place.

That being said, I will now be working on ActualScan as a hobby project, between other things. This is why I’m avoiding making any big declarations. If you care about the project and want to help, please send me an email (contact \at/ actualscan.com) or use GitHub issues.

Even if you can’t program or aren’t interested in programming, I’d like to know about little niches and microuniverses of blogs, forums etc. where people hang out. I want to tailor ActualScan for such places on the Internet, and your knowledge may be very important in constructing the index(es).