ActualScan Tech

Deep Probe

Posted on Sep 17, 2019

The main subject of this site is a text research application I call, for now, Deep Probe.

The purpose of Deep Probe is somewhat similar to that of a typical search engine, but it works differently. You enter your query, but instead of a list of places that seem relevant, you get a list of the main recurring themes.

Deep sea sabertooth fish. Stuff one can encounter down there.

For example, if you ask Deep Probe about particular headphones, you may see as the main themes:

Or, querying about a book:

Keep in mind that these are not random bits from the first blog you found on Google, or the top three reviews on Amazon. These are big currents in the flow of what people say on the Internet (or at least in the corpus that Deep Probe is using). You get actual sentences or fragments gathered from many sources. You can see, at a glance, what is said most often, and investigate further if you wish.
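To make the idea a bit more concrete, here is a toy sketch in Python. It is not Deep Probe's actual code: the function names, the tiny stopword list, the sample corpus, and the choice to rank a theme by how many distinct sources mention it are all assumptions made up for illustration. A real system would need far better linguistic processing than matching single words.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not a real linguistic resource).
STOPWORDS = {"the", "a", "an", "is", "are", "and", "but", "of", "to", "in",
             "it", "for", "on", "have", "has", "this", "these", "my", "i"}


def content_words(text):
    """Lowercase the text and return its set of non-stopword words."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def recurring_themes(corpus, query, min_sources=2):
    """Find words that recur across sources in sentences matching the query.

    corpus is a list of (source_name, sentence) pairs. Returns a list of
    (word, number_of_sources, sentences) tuples, most widespread first.
    """
    query_words = content_words(query)
    sources_by_word = defaultdict(set)
    sentences_by_word = defaultdict(list)
    for source, sentence in corpus:
        words = content_words(sentence)
        if not words & query_words:
            continue  # the sentence does not mention the query at all
        for word in words - query_words:
            sources_by_word[word].add(source)
            sentences_by_word[word].append(sentence)
    themes = [(word, len(sources), sentences_by_word[word])
              for word, sources in sources_by_word.items()
              if len(sources) >= min_sources]
    # Most widely repeated themes first; ties broken alphabetically.
    themes.sort(key=lambda theme: (-theme[1], theme[0]))
    return themes


# Made-up sample corpus, purely for illustration.
SAMPLE_CORPUS = [
    ("blog_a", "These headphones have great bass."),
    ("forum_b", "The bass on these headphones is muddy."),
    ("shop_c", "Headphones arrived late but the bass is great."),
    ("blog_a", "My cat is asleep."),
]

for word, source_count, sentences in recurring_themes(SAMPLE_CORPUS, "headphones"):
    print(word, source_count)
# prints:
# bass 3
# great 2
```

Counting distinct sources rather than raw mentions is one simple way to keep a single chatty blog from dominating the result. The sentences stored with each theme are what you would follow up on if you wanted to investigate further.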

a diver among fish

by Leonard G.

I have been thinking about some kind of “text digesting machine” for a long time now. What has fascinated me most in linguistics and language processing has revolved around understanding texts (and also assembling things in systematic ways).

While many existing solutions do a similar job, text summarizers for example, they don’t provide the broad analysis I am talking about. And I think we should have better tools, given what is already known in the science of language and computation.

As of mid-September 2019, I finally have an embarrassingly hacky way of performing theme analysis with some code run from the command line (or from inside Emacs, to be precise). Although very primitive, this is an important beginning. It shows that the overall idea makes sense technologically, and that it now needs to be developed much further and packaged in some kind of sane interface.

So here I will document my journey onwards – while hopefully noting interesting developments in linguistics and natural language processing.