Elusive Information

2010 June 19
by Hélène Martin

I made a marginally provocative (and slightly childish) tweet and was asked to explain.

Ok, world: I’ve decided that search engines, INCLUDING YOUR PRECIOUS GOOGLE, are basically useless. Please improve. Love, Hélène.

I call it search malaise.  Until maybe 2004 or so, I was super excited every time I searched for information and got tons of amazing stuff back.  Web search (and Usenet) triggered my interest in computing and allowed me to learn to program among other things of various utility and tastefulness.  At first, the whole process was beyond thrilling.  Over the last several years, I’ve gone from seeing search as a magical gift to treating it as an unexceptional tool to finally being incredibly irritated by it.

As I’ve been building up a high school computer science program, I’ve been starved for teaching resources, connection to other high school CS teachers and relevant CS education research.  I know there’s a ton of stuff out there but I keep running into the same resources or rather lack thereof.  Blogs and mailing lists occasionally point out real gems and I find myself frustrated that my searches never brought me to them.  Of course, it’s not entirely a search problem.  In fact, it’s probably primarily an information representation issue — if only information creators had tools to better characterize their information, maybe I wouldn’t be so irritated.  Where’s this semantic web business I keep hearing about?  Oh, right, it’s a Hard Problem.

Let me illustrate the type of issues I run into.  Let’s say I need some ideas for giving my students practice on while loops.

Oops.  “Assignment” also means giving a value to a variable, so nothing very useful comes out.  Ok.

The first and third hits are useful, but the rest really aren’t relevant to what I’m looking for.  Isn’t it obvious, though?!  I want problem statements I can give to my students that encourage them to practice the while construct!  Deep breath.  What if I try adding something about intro to programming courses?

Unfortunately course numbers are entirely unpredictable between organizations.

Come on!  APCS hasn’t been taught in C++ since 2003!  Did the quotes help? Sort of.  Without them the third result was about choosing APC loop capacitors.

This goes on and on until I give up and look through a good old-fashioned book.  There’s no overlap between the results of semantically identical queries and the whole endeavor becomes a game of figuring out which search terms are most likely to bear fruit.
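One way to put a number on that lack of overlap is the Jaccard index: the share of distinct results that two queries have in common.  Below is a minimal sketch in Python, assuming each query's hits have been collected into a list of URLs.  The function name `jaccard` and the example URLs are made-up placeholders for illustration, not real search output.

```python
def jaccard(results_a, results_b):
    """Share of distinct URLs that two result lists have in common (0.0 to 1.0)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0  # two empty lists are trivially identical
    return len(a & b) / len(a | b)


# Hypothetical placeholder URLs, not real search results.
query_1_hits = ["example.org/loops", "example.org/cs1", "example.org/apcs"]
query_2_hits = ["example.org/apcs", "example.org/capacitors", "example.org/forum"]

overlap = jaccard(query_1_hits, query_2_hits)
print(f"overlap: {overlap:.2f}")
```

A score near 0 for two queries that mean the same thing is the frustration described above; a score near 1 would mean the wording barely mattered.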

In reality, there are great resources on while loops for me to use as inspiration and to point my students to.  CodingBat did make an appearance in the results, which is good.  I happen to know about Practice-It and UW’s great assignments because I know and love their creators.  I’ve also come across Princeton’s CS1 assignments, The Practice of Using Python assignments, Mr. Hanley’s assignments, Roger Frank’s labs and a bunch of others I know to turn to when I have a need for ideas.  I have no idea which magical query led me to them.

More and more, I find myself giving up on a straight web search and instead searching my own Delicious bookmarks.  Of course, that has its own issues.  Did I tag a particular thing I’m looking for as assignments?  apcs?  creativecomputing?  cs1?  Is there any type of logic to how I tag information?!

What’s being done about this?  I don’t really know and I’d like to know more.  A couple of years ago I got very excited to hear about Powerset’s ambitious goals.  I hear now they’re part of Bing.  There are various notations for representing relationships between information and hopefully giving search engines more of a clue, but if I can’t figure out a sensible way to tag my small collection of bookmarks, the efforts seem doomed to fail: these systems place the burden of establishing semantic links on content creators and we’re probably too lazy/busy/incompetent to do a good job.

There’s so much data and information that’s out of reach because I just can’t find it… it makes me sick to think about it!

2 Responses
  1. June 20, 2010

    The problem here is not so much that search engines and their writers don’t know how to make them more intelligent; it’s that they don’t know how to do so in an efficient manner. MapReduce works well for finding keywords and creating a basic graph of what keywords point to what and in what order, because that’s all static information that can be very quickly precalculated. Holistically parsing a document for meaning is an entirely different process that doesn’t lend itself nearly as well to scale and parallelization.

    And, on the other side of the pipe, 3-8 word queries generally don’t give enough context to search engines for them to figure out which of two possible meanings you want. Even a human wouldn’t necessarily be able to extract your intent out of your first query.

    Sadly, RDF(a) won’t really solve this problem, as it’s really just about more keywords.

    (PS “while loop exercises” seems to yield decent results)

    • Hélène Martin
      June 20, 2010

      Insightful points. You’re absolutely right that there’s a very real tradeoff between speed and usefulness. I’d be willing to wait five minutes for truly excellent results but that would be a suicide move on Google’s part.

      I agree that my first query was overly ambiguous… except Google “sees” me search for class materials day after day and saves my search history. It’d be nice if some of that could come into play. Similarly, it knows when I’ve clicked on a link and immediately left it or if I’ve been shown a particular result before… then again, maybe I just think using that information would make my experience better but it actually wouldn’t. I’m sure people much smarter than me have been thinking about these problems for a long time! Google did have the up/down arrows for a bit which would have been great if only it weren’t so easy to abuse.
