Wednesday, July 8, 2009

Wolfram Alpha and hubristic user interfaces

I feel it's been too long since we had a purely technical discussion here on UR. Gotta mix it up a little more. I know UR has some technical readers. For everyone else, the summer is long.

Aside from being Billy Getty's freshman roommate (it is no longer a secret that Mr. Getty owned an illegal ferret named "Earwig"), your author was a graduate student in computer science around the same time as Messrs. Brin and Page, at a similar though different institution. Unfortunately, his interest was not search, but operating systems. This turned out to be the "old thing." Thus your author is doubly familiar with proximity to great wealth and success.

Basically, MM thought search was lame. It reminded him uncomfortably of "library and information science." As for the Web, it seemed a laudable improvement on FTP - without that nasty reverse TCP connection. It certainly didn't involve any distributed shared memory or process migration, that's for sure.

Your author has certainly seen the error of his young ways. He now agrees not only that full-text search is a good idea, but that distributed shared memory is almost always a bad one, and process migration is always a bad one. Indeed, it's not really clear to me that operating systems is a valid academic field at all. If someone had axed its funding (the dirtiest word in the English language has seven letters and starts with "F") in 1980, how different would your computer be? Bear in mind: someone would also have a few billion dollars to spend on something else.

But there were (and, sadly, still are) a lot of very bright people in OS, which is really what attracted young MM. And he did learn a trick or two. Some of which work for problems other than process migration. And so, when in the year 2009, he sees people (also very bright) making one of the same mistakes that these bright people taught him not to make in 1992 - he feels obliged to comment.

Indeed (as we'll see), every decade since the '80s, billions of dollars and gazillions of man-hours have been invested in this fundamental error, to end routinely in disaster. It's as though the automotive industry had a large ongoing research program searching for the perpetual-motion engine.

The error is the intelligent control interface. Control interfaces must not be intelligent. Briefly, intelligent user interfaces should be limited to applications in which the user does not expect to control the behavior of the product. If the product is used as a tool, its interface should be as unintelligent as possible. Stupid is predictable; predictable is learnable; learnable is usable.

I was reminded of this lesson by a brief perusal of Wolfram Alpha, the hype machine's latest gift. Briefly: there is actually a useful tool inside Wolfram Alpha, which hopefully will be exposed someday. Unfortunately, this would require Stephen Wolfram to amputate what he thinks is the beautiful part of the system, and leave what he thinks is the boring part.

WA is two things: a set of specialized, hand-built databases and data visualization apps, each of which would be cool, the set of which almost deserves the hype; and an intelligent UI, which translates an unstructured natural-language query into a call to one of these tools. The apps are useful and fine and good. The natural-language UI is a monstrous encumbrance, which needs to be taken out back and shot. It won't be.

This is hilariously illustrated by WA's own Technology Review puff piece. Our writer, par for the course, spends seven pages more or less fellating Dr. Wolfram (for real technology journalism: L'Inq and El Reg), but notes:

The site was also bedeviled by an inflexible natural-language interface. For example, if you searched for "Isaac Newton birth," you got Newton's birth date (December 25, 1642; you also learned that the moon was in the waxing-crescent phase that day). But if you searched for "Isaac Newton born," Alpha choked. Aaronson tested it with me and found it couldn't answer "Who invented the Web?" and didn't know state-level GDP figures, only national ones.

"Why won't it work with two cups of flour and two eggs?" Gray asked, finally.

"Well," Williams replied, "there's a bug."

But if you gave Wolfram Alpha every allowance--that is, if you asked it about subjects it knew, used search terms it understood, and didn't care to know the primary source--it was detailed, intelligent, and graphically stunning.

"Wolfram Alpha is an important advance in search technology in that it raises expectations about how content that is stored in databases should be searched," Marti Hearst, a computer scientist at the University of California, Berkeley, and the author of Search User Interfaces, told me. But she added that it "has a long way to go before achieving its ambitious goals."

Fun fact: when the author was a junior grad student, Marti Hearst was a senior grad student. How long will it be before intelligent search interfaces achieve their ambitious goals? Will Professor Hearst have, or have not, retired by then? Suppose we cut off her funding, etc? And how exactly did CS get to be this field that goes around in a circle, sucking cash and getting nowhere? That's certainly not why I spent all my Friday nights in the Sun lab.

But what do I mean by control interface? The hypothesis turns on this definition.

Let's examine this difference between Google and WA. Basically, Google is the exception: the UI that is not a control interface. Because Google's search interface is not a control interface, it should be an intelligent interface, as of course it is.

Google is not a control interface because intrinsic to the state of performing a full-text search is the assumption that the results are to some extent random. Let's say I've heard of some blog called "Unqualified Reservations" and I type it into Google.

Am I sure that the first result will be the blog itself? I suppose I'm about 95% sure. Do I have any idea what will come next? Of course not. Will I automatically click on the first result? Certainly not. I will look first. Because for all I know, the million lines of code that parsed my query could be having a bad hair day, and send me to Jim Henley instead.

Google is not a control interface, because no predictable mapping exists between control input and system behavior, and none can be expected. A screwdriver is a control interface because if I am screwing in a screw and I turn the handle clockwise, I expect the screw to want to go in. If the screw is reverse threaded, it will want to come out instead, confusing me dreadfully. Fortunately, this mapping is not random; it is predictable. (Yes, Aspies, by "random" I mean "arbitrary.")

Because of this predictable mapping, people who screw in large numbers of screws are saved a large amount of cognitive load. The feedback loop becomes automatic. It embeds itself in muscle memory. Billions of lives made easier. Give it up for the standardization of the screw.

But any such mapping is inherently impossible for full-text search. Google's problem is an intrinsically heuristic one. The result of the search is always a starting point for further analysis. There is never any automatic next step.

The advantage of this inherent unpredictability is that since a search request never implies any precise rules for the prioritization of results, a search engine can use arbitrarily fuzzy and complex heuristics to get the best results to the top. And, indeed, should. Thus, Google can be Google, and Google should be Google. And Google is Google. Give it up to teh Goog.

And here we come to Wolfram Alpha. WA is not the same thing as Google. Everyone knows this. Everyone does not seem to realize the implications, however. Let me explain why the natural-language interface of WA is such an awful idea.

WA is not a full-text search engine. It is a database query and visualization tool. More precisely, it is a large (indeed, almost exhaustive) set of such tools. These things may seem similar, but they are as different as popes and partridges.

Google is not a control interface; WA is. When you use WA, you know which of these tools you wish to select. You know that when you type "two cups of flour and two eggs" (which now works) you are looking for a Nutrition Facts label. It is only Stephen Wolfram's giant electronic brain which has to run ten million lines of code to figure this out. Inside your own brain, it is written in glowing letters across your forehead.

So the giant electronic brain is doing an enormous amount of work to discern information which the user knows and can enter easily: which tool she wants to use.

When the giant electronic brain succeeds in this task, it has saved the user from having to manually select and indicate her actual data-visualization application of choice. This has perhaps saved her some time. How much? Um, not very much.

When the giant electronic brain fails in this task, you type in Grandma's fried-chicken recipe and get a beautiful 3-D animation of a bird-flu epidemic. (Or, more likely, "Wolfram Alpha wasn't sure what to do with your input." Thanks, Wolfram Alpha!) How do you get from this to your Nutrition Facts? Rearrange some words, try again, bang your head on the desk, give up. What we're looking at here is a classic, old-school, big steaming lump of UI catastrophe.
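
To make the failure mode concrete, here is a toy sketch of a "guess the tool" dispatcher. Everything in it is an assumption made up for illustration: the tool names, the keyword lists, and the scoring rule are hypothetical and say nothing about how Wolfram Alpha actually parses input. It only shows how a guesser that works on one phrasing can choke on a near-synonym, or land confidently on the wrong tool.

```python
# Toy "guess the tool" dispatcher. Hypothetical: tool names and keyword
# lists are invented for this sketch, not taken from Wolfram Alpha.
KEYWORDS = {
    "nutrition_facts": {"cups", "flour", "eggs"},
    "person_facts": {"birth", "death", "biography"},
    "epidemic_model": {"flu", "epidemic", "bird", "chicken"},
}

def guess_tool(query):
    """Return the tool whose keyword set overlaps the query most, or None."""
    words = set(query.lower().split())
    scores = {tool: len(words & kws) for tool, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # A score of zero means no keyword matched at all: give up.
    return best if scores[best] > 0 else None

print(guess_tool("Isaac Newton birth"))       # person_facts
print(guess_tool("Isaac Newton born"))        # None: "born" is not a keyword
print(guess_tool("grandma's fried chicken"))  # epidemic_model: wrong tool
```

The user knew which tool she wanted in all three cases. The only party that had to guess was the dispatcher.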

And does the giant electronic brain fail? Gosh, apparently it does. After many years of research, WA is nowhere near achieving routine accuracy in guessing the tool you want to use from your unstructured natural-language input. No surprise. Not only is the Turing test kinda hard, even an actual human intelligence would have a tough time achieving reliability on this task.

The task of "guess the application I want to use" is actually not even in the domain of artificial intelligence. AI is normally defined by the human standard. To work properly as a control interface, Wolfram's guessing algorithm actually requires divine intelligence. It is not sufficient for it to just think. It must actually read the user's mind. God can do this, but software can't.

Of course, the giant electronic brain is an algorithm, and algorithms can be remembered. For instance, you can be pretty sure that the example queries on the right side of your screen ("June 23, 1988") will always send you to the same application. If you memorize these formats and avoid inappropriate variations, you may not end up in the atomic physics of the proton.

This is exactly what people do when circumstances force them to use this type of bad UI. They create an incomplete model of the giant electronic brain in their own, non-giant, non-electronic brains. Of course, since the giant electronic brain is a million lines of code which is constantly changing, this is a painful, inadequate and error-prone task. But if you are one of those people for whom one of Wolfram's data-visualization tools is useful, you have no choice.

This effect is unavoidable in any attempt at an intelligent control interface. Because any attempt at intelligence is inherently complex, the UI is effectively byzantine and incomprehensible. It isn't actually random, but it might as well be. There is no human way of knowing when it will work and when it will crap out.

But because the attempt fails (the algorithm is incapable of producing actual divine awareness of the user's intent), a user who is actually trying to use the control interface to get something done, ie achieve the normal task of selecting the dataset to query and visualize, cannot simply delegate that task to the UI. At least, not reliably. So she is constantly pounding her head on her desk. (As a toy, of course, Wolfram Alpha is great - a toy is not a tool, ie, not a control interface. As a toy, it would never have been built.)

Thus, the "flexible" and "convenient" natural-language interface becomes one which even Technology Review, not exactly famous for its skepticism, describes as "inflexible." The giant electronic brain has become a giant silicon portcullis, standing between you and your application of choice. You can visualize all sorts of queries with Wolfram Alpha - but first you have to trick, cajole, or otherwise hack a million lines of code into reading your mind.

For serious UI geeks, one way to see an intelligent control interface is as a false affordance - like a knob that cannot be turned, or a chair that cannot be sat in. The worst kind of false affordance is an unreliable affordance - a knob that can be turned except when it can't, a chair that's a cozy place to sit except when it rams a hidden metal spike deep into your tender parts.

Wolfram's natural-language query interface is an unreliable affordance because of its implicit promise of divine intelligence. The tool-guessing UI implicitly promises to read your mind and do what you want. Sometimes it even does. When it fails, however, it leaves the user angry and frustrated - a state of mind seldom productive of advertising revenue.

Now: as I said, we have seen this pattern before. In the department of intelligent control interfaces, everyone above a certain age will be reminded of one great fiasco of the past: the Apple Newton, and its notorious cursive handwriting recognition. (The Doonesbury clip is a perfect four-panel dramatization of the effect of an unreliable affordance.)

Again we see an intelligent algorithm attempt to insinuate itself into the control loop. Again, we see risible disaster. (One difference is that handwriting recognition is not a problem requiring divine intelligence - at least, not for everyone. But human intelligence is equally impossible. Apple actually still ships the Newton handwriting engine, but no one uses it and it still sucks.)

With both these examples under our belt, we may consider the general problem of hubristic user interfaces. For at the spiritual level, the sin here is clearly that of hubris - overweening pride that angers the gods, and drives them to ate, or divine destruction. By presuming to divine intelligence, of course, Wolfram Alpha has committed hubris in the highest degree. (Dr. Wolfram is certainly no stranger to hubris.)

At a more mundane level, however, we may ask: how do these obvious disasters come about? Man is flawed and hubris is eternal, of course. But really. Why, year after year, does the software industry piss away zillions of dollars, and repeatedly infuriate whatever gods there be, butting its head against this wall like a salmon trying to climb Boulder Dam? Why on earth do these mistakes continue to be designed, implemented, and shipped? By smart, smart people?

The simple answer is that both academia and the industry are, to a substantial extent, driven by hype. Hype gets press, and hype also gets funding. The press (Inquirer and Register excepted) is not a critical audience. The NSF is an even less critical audience - at least, for projects it is already pursuing. Again, if abject failure were an obstacle to continued funding, most of "computer science" would have ceased to exist sometime in the '90s. Instead, Professor Hearst will no doubt be able to pursue her ambitious goals until a comfortable retirement in the 2030s. Long live science!

Hype also generates funding because it generates exaggerated sales projections. For instance:

"What Wolfram Alpha will do," Wolfram says, "is let people make use of the achievements of science and engineering on an everyday basis, much as the Web and search engines have let billions of people become reference librarians, so to speak."

It could do things the average person might want (such as generating customized nutrition labels) as well as things only geeks would care about (such as generating truth tables for Boolean algebraic equations).

Generating customized nutrition labels! The average person! I just laughed so hard, I needed a complete change of clothing.

Dr. Wolfram, may I mention a word to you? That word is MySpace. If there is any such person as this average person, she has a MySpace account. Does she generate customized nutrition labels? On a regular basis, or just occasionally? In what other similar activities does she engage - monitoring the population of Burma? Graphing the lifecycle of stars? Charting Korean copper consumption since the 1960s? Perhaps you should feed MySpace into your giant electronic brain, and see what comes out.

Like most hubristic UIs, Wolfram Alpha is operating with a completely fictitious user narrative. The raison d'etre of the natural-language interface, stated baldly, is to create a usable tool for stupid people who might be confused or intimidated by a tree of menus. The market of stupid people is indeed enormous. The market of stupid people who like to use data-visualization tools is, well, not. (And since the interface is not in fact easy but actually quite difficult, it achieves the coveted status of a non-solution to a non-problem.)

But there is a more subtle and devilish answer to the question of why hubristic UIs happen.

Strangely, to the developers of intelligent control interfaces, these interfaces appear to work perfectly well. Moreover, when the developers demo these interfaces, the demo comes off without a hitch - and is often quite impressive. This is not the normal result of broken software. This "demo illusion" convinces the developers that the product is ready to ship, although it is not and will never be ready to ship.

Demo illusion is caused, I think, by the same compensation mechanism that allows users to grit their teeth and use a hubristic UI. Again, the user who has no choice but to use such a monster develops her own internal mental model of its algorithm. If you are forced to use a Newton, you can, and this is what you do.

For example, the Newton user may note that when she writes a T with the bar sloping up, it is recognized as a T, whereas when the bar slopes down it has an ugly tendency to come out as a lambda. So she trains herself to slope her Ts upwards, or to always enter "one cup of flour" rather than "two cups of flour" and double the Nutrition Facts herself, or to jump through any other trivial and unnecessary hoop in order to placate the angry god inside the "intelligent" UI. By slow painful effort, she constructs a crude subset of the system's functionality which happens to work for her, and sticks to it thereafter.

But for the actual developers, this compensation mechanism is far more effective. The actual developers (a) have enormous experience with the hubristic UI, (b) have enormous patience with its flaws, and (c) most important, know how it actually works. So their internal model can be, and typically is, orders of magnitude better than that of any naive user. So the product actually seems to work for them, and does. Unfortunately, it's hard to make money by selling a product to yourself.

Now. UR is a positive, upbeat blog, and we never explore problems without offering solutions. And one of the reasons that Newton is such a fine example of hubristic UI is that Palm, a few years later, came along and did pen input right. It turns out, as some of us had always suspected, that pen computing is just not a very good idea, and the real solution is little keyboards. However, it is not impossible to make pen input work as a product - and Palm proved it, with Graffiti.

What Jeff Hawkins realized is that the human skull contains an organ called a "brain," which has spent several million years learning to use tools. Therefore, if you are building a control interface, ie a tool, the prudent way to proceed is to (a) assume your users will need to learn to use your tool, (b) make it as easy as possible to learn the tool, and (c) make the tool as effective as possible once it is learned.

The big win of Graffiti was that the Graffiti recognizer was simple - perhaps an order of magnitude simpler than Newton's, maybe more like two. If you invested the small amount of mental effort to learn Graffiti, which was not at all out of proportion to the cost or utility of the Palm, you had a predictable and reliable control mapping with a low error rate, because your brain's internal model of Graffiti was reasonably close to the actual algorithm. Moreover, the process of learning it was actually kind of fun.
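
A minimal sketch of what "simple recognizer" means, under loud assumptions: the stroke shapes below are invented for illustration and are not Palm's actual Graffiti alphabet, and a real recognizer works on pen coordinates rather than pre-labeled directions. The point is only that the whole mapping fits in a table a user can memorize, and that an unknown stroke is rejected rather than guessed at.

```python
# Hypothetical single-stroke alphabet: each character is ONE fixed sequence
# of pen directions. These shapes are invented for this sketch; they are
# not Palm's actual Graffiti strokes.
STROKES = {
    ("down",): "I",
    ("down", "right"): "L",
    ("right", "down"): "T",
    ("up", "down"): "A",
}

def recognize(directions):
    """Exact-match lookup: a stroke is either a known character or rejected."""
    return STROKES.get(tuple(directions))

print(recognize(["down", "right"]))  # L
print(recognize(["left"]))           # None: no guessing, no lambda
```

Because the algorithm is a table, the user's mental model of it can be the table itself.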

Applying this realization to put a good UI on Wolfram Alpha would not be difficult at all. It would not even require removing the giant electronic brain, which could remain as a toy or exploratory feature. Again, it is a perfectly decent toy, and it may even be a reasonable way to explore the space of visualization tools and datasets that WA provides.

But if you are an actual flow user who actually needs to get something done, WA could give you an alternative, manual interface for selecting your tool. You might perform the discovery task by browsing, say, a good old-fashioned menu. For example, the Nutrition Facts tool might come with its own URL, which you could bookmark and navigate to directly. There might even be a special form for entering your recipe. Yes, I know none of this is very high-tech. (Obviously the coolest thing would be a true command line - but the command line is truly not for all.)
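
As a rough sketch of such a manual interface, assuming a hypothetical path scheme and a hypothetical structured recipe form (neither is anything WA actually offers, and the calorie figures are arbitrary placeholders, not real nutrition data):

```python
# Hypothetical manual interface: each tool has a stable, bookmarkable path
# and takes structured form data. Paths, tool names, and the per-unit
# numbers are placeholders invented for this sketch.
CALORIES_PER_UNIT = {"cup flour": 100, "egg": 10}  # arbitrary, not real data

def nutrition_facts(ingredients):
    """ingredients: list of (quantity, unit) pairs from a structured form."""
    return sum(qty * CALORIES_PER_UNIT[unit] for qty, unit in ingredients)

TOOLS = {"/tools/nutrition-facts": nutrition_facts}

def run_tool(path, form_data):
    """Dispatch on the path the user chose. Nothing is inferred."""
    if path not in TOOLS:
        raise KeyError(f"no such tool: {path}; available: {sorted(TOOLS)}")
    return TOOLS[path](form_data)

print(run_tool("/tools/nutrition-facts", [(2, "cup flour"), (2, "egg")]))  # 220
```

The failure case is also predictable: a wrong path produces an error that lists the tools that do exist, instead of a bird-flu animation.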

A more intriguing question is whether the Graffiti approach can be applied to full-text search. Many modern search engines, notably the hideous, awfully-named Bing, are actually multiple applications under the hood - just like WA. If Bing figures out that you are searching for a product, it will show you one UI. If it figures out that you are searching for a celebrity, it will show you another UI. It may also switch algorithms, data sets, etc, etc. I'm sure Google has all kinds of analogous, if more subtle, meta-algorithms.

While generic full-text search, unlike generic data visualization, remains a viable application and a very useful one, specialized search might (or might not - this is not my area of expertise) be an even more useful one. If the user has an affordance by which to tell the algorithm the purpose or category of her search, the whole problem of guessing which application to direct the query to disappears and is solved perfectly. A whole class of category errors ceases to exist.
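
One way such an affordance might look, sketched under assumptions: the category names and the "category: terms" prefix syntax are hypothetical, chosen only to show that once the user states the category, routing is a lookup and not a guess.

```python
# Hypothetical category affordance: the user may prefix a query with
# "category:". Category names and the prefix syntax are assumptions
# made for this sketch, not any real search engine's operators.
CATEGORIES = {"web", "product", "celebrity"}

def parse_query(raw, default="web"):
    """Split 'category: terms' into (category, terms)."""
    head, sep, tail = raw.partition(":")
    category = head.strip().lower()
    if sep and category in CATEGORIES:
        return category, tail.strip()
    # No recognized prefix: treat the whole input as a generic search.
    return default, raw.strip()

print(parse_query("product: palm pilot"))  # ('product', 'palm pilot')
print(parse_query("palm pilot"))           # ('web', 'palm pilot')
print(parse_query("aspect ratio 3:4"))     # ('web', 'aspect ratio 3:4')
```

Unprefixed queries fall through to the generic engine, so the user who does no extra work loses nothing.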

My guess is that if there is any "next thing" in search interfaces, it will come not from smarter UIs, but from dumber ones in which the user does more work - the Graffiti effect. If a small quantity of user effort can produce a substantial improvement in user experience (which is a big if), the user will accept the bargain. Hey, it made Jeff Hawkins rich.


Anonymous sean said...

Sounds like what you're referring to is a Chinese Room, or that WA is a sort of unfinished Chinese Room. Sure, the results sound semi-coherent, but the algorithm doesn't really understand the input.

July 9, 2009 at 6:44 AM  
Blogger HealthyFoods said...

They create an incomplete model of the giant electronic brain in their own, non-giant, non-electronic brains. Of course, since the giant electronic brain is a million lines of code which is constantly changing, this is a painful, inadequate and error-prone task. But if you are one of those people for whom one of Wolfram's data-visualization tools is useful, you have no choice.

This is true, but it is equally true of typical search engines such as yahoo, bing or google. learning the types and combinations of keywords which will generally get you to a page with reasonable results seems to require the exact same kind of annoying simulation, at least in my experience.

July 9, 2009 at 7:09 AM  
Blogger newt0311 said...

Perhaps I am just being pedantic here but s/intelligent/deterministic+comprehensible+well-defined/g seems to be in order here. For example, most advanced programming languages have function call semantics which are extremely complex (especially ones with multi-methods and patterns). However, they work quite well because they are comprehensible and deterministic and well-defined.

July 9, 2009 at 7:50 AM  
Anonymous Anonymous said...

Beautifully written. "Divine intelligence"

July 9, 2009 at 7:58 AM  
Anonymous Anonymous said...

My father got into programming full time in the early 70's after his science job dried up. He told me when he retired in the late 90's he had gotten tired of it because it was all about the interface at that point.

I can see why having a cool interface is attractive from a functional standpoint but I think programmers want to be cool, and get attention from non-geeks, and designing a cool interface makes you a cool guy who the chicks dig.

July 9, 2009 at 8:50 AM  
Blogger Aaron Davies said...

see also applescript.

and btw, i learned graffiti so thoroughly, it influences my physical handwriting to this day, two years and change after i retired my palm--i still write 'f's and 'B's in a style very obviously influenced by palm.

for another example of a crap interface, see blogger's awful captchas.

July 9, 2009 at 8:58 AM  
Anonymous NG said...

Can this please not be your only post for the week?

July 9, 2009 at 9:25 AM  
Blogger Ole - André Johansen said...

I agree that the attempt of making the WA interface into natural language is kinda silly, but I also think you may have got WA a little wrong.

As far as I understand WA is not meant to be used in the way you describe. It is meant to be used for combining the information in the database in any way you want, to create new information. It is hence not an information search-engine, it is a big calculator that happens to have a lot of constants defined. If I want to find information about Isaac Newton, WA is not the right place to be. If I want to multiply the number of years Newton lived with the current velocity of the ISS, WA is the place to be. Catch my drift? It is not meant for you to find the information you want, it is made so you can make new information via correlation and creative combining of objective information.

But as I said, the whole natural language thing is silly, and will almost certainly never work. They need a syntax, one that is close to natural language, but makes the job for the interpreter much easier. Then you can actually learn the syntax instead of just guessing all the time.

July 9, 2009 at 10:02 AM  
Anonymous John said...

"Now. UR is a positive, upbeat blog" - I love the sarcasm. It is of course as integral to the blog as the constant linking to primary sources. As a non-uber-geek I may not have understood everything in this post, but the writing style made it enjoyable nonetheless.

July 9, 2009 at 11:09 AM  
Anonymous Patung said...

You're being pretty hard on Bing, it's a well functioning enough search engine.....and I detect a little over-respect for Google....example, they employ thousands upon thousands of people to manually check websites and rate them, those that rate very lowly, ie are marked 'spam', are effectively booted out of the index, or minus 30-ed, or 40-ed....Google doesn't even trust Google's algorithm.

July 9, 2009 at 11:22 AM  
Anonymous Tonio Loewald said...

A really interesting, insightful, and entertaining post, which certainly matches my frustration with Wolfram|Alpha.

What puzzles me is why there isn't a simple and precise but non "AI" method of dealing with the back end (e.g. why can't we even see some kind of list of available databases and what they contain?).

I think your use of the Newton handwriting recognition system as an example is actually off the mark. Even Doonesbury was referring to the first iteration. The shipping version is based on the second iteration which was pretty amazingly good, and far better than Graffiti which somewhat dealt with character recognition as an issue but was terrible for actually writing or editing passages of text. The link you post is a person who actually likes it well enough to be frustrated by inkwell's UI limitations (which actually didn't exist on the Newton, since if the Newton wouldn't recognise your quotation marks you could just bring up a soft keyboard).

July 9, 2009 at 11:23 AM  
Blogger Ian Danforth said...

Wonderful post. First time here. Thanks!

Do note that this hubristic drive is ultimately powered by the fact that people who know us well do read our minds regularly and we simply want the same from all information producing entities.

To ward off the skepticism of "yeah but the brain is really hard." So was flight. :)

July 9, 2009 at 12:38 PM  
Blogger Stephen Ball said...

Rome wasn't built in a day. You expect this software to do everything all at once? It's a stupendous achievement that is so amazing that we can't help but look at it and see its even more incredible potential.

July 9, 2009 at 12:43 PM  
Blogger Daniel A. Nagy said...

Annotate this article over at

July 9, 2009 at 1:01 PM  
Blogger Alex Bunardzic said...

Interestingly, the free range search seems to work best on Twitter. Using google and/or other old school search engines is a guaranteed frustrating experience, but somehow Twitter seems to work the best, almost like a mind reader.

The secret lies not in implementing fancy algorithms and advanced facets of AI, but in helping people supply pure info. By enforcing the 140 characters per post, Twitter had achieved the sweet spot where no one has the time for entertaining any silliness (such as micro formats, to give but one example). No silliness, just pure text, seems to be the most powerful way toward building a fully functional, fully searchable web that cooperates with humans on all levels.

July 9, 2009 at 1:02 PM  
Anonymous Brent said...

I'm almost sure that Mencius intends an analogy between WA's GUI and the Cathedral's democratic facade.

July 9, 2009 at 1:10 PM  
Anonymous Anonymous said...

Nice post! A series of posts in the WA community that express essentially the same concepts in somewhat more detail:

July 9, 2009 at 1:51 PM  
Blogger jehiah said...

Of course google is a control interface as well. You tell it which type of search you are doing with triggers like 'phonebook:' or you select the image search, or news search, or book search or .... This is what WA needs to do, give a way to target or hint at a specific dataset or visualization.

July 9, 2009 at 2:17 PM  
Anonymous Anonymous said...

A bit long-winded, but essentially fair. Language understanding is not only incredibly hard, but in fact, those of us who do it for a living know that it's incredibly harder than even laymen think it is. Given its enormous complexity and the paucity of our knowledge, dumb beats smart any day of the week - let humans do what they're good at (think, categorize, judge) and let computers do what they're good at (churn through massive amounts of data without getting bored or distracted).

But I think you get one thing wrong here. WA is definitely trying to solve a problem, which a cascade of menus doesn't solve: how to navigate through a huge space of information and specify a structured request based on that space. Humans do that through language, mostly; computer programmers do it by programming. But computers can't understand language, and laymen can't code. So I wouldn't dismiss the UI challenge so casually.

July 9, 2009 at 2:36 PM  
Blogger un1crom said...

Nice post. Let's dissect what we're doing with Wolfram|Alpha a bit (I'm part of the team).

a) yes, there are improvements in linguistic parsing that need to be made. in some cases, major improvements.

b) many of those improvements come as a result of a bigger, more robust data and algorithmic warehouse. Many of the inputs are linguistically understood but we lack a coherent model to respond. or the heuristics used to select algos and data to compute with override other reasonable options.

c) It would be hard to conceive of an alternative interface for general use of the system. Certainly it would be possible to make an index of formal calculators and let people select them. This someone will likely do with the API. However, in doing so much of the functionality is cut off - especially where you want to mix and match algos and data from completely different domains. It's even possible to build a query-by-image or audio file interface. But again, in refining that input control structure we might lose some of the functionality.

d) this is just the beginning. with research and analysis such as this post, we'll continue to improve W|A, the site and platform. We're very open to your feedback and suggestions.

e) we're not doing straight NLP. so many of the statements people enter are a mix of fact query plus operators plus parameters. Sometimes they are simple, other times not so much. It might be possible to offer a compound experience where users put something simple in or something complex and we prompt them in various ways. However, we think we can keep chipping away and make this an interface where you really can just dump an input in and the system does what you expect. Again, we're just at the beginning. learning how people expect to use a system and how they evolve with it is very useful. A co-evolution of use and function, of sorts.

I'll write more and answer folks' questions if anyone has more specific comments/questions/ideas.

Thanks for taking the time to write your thoughts up!

July 9, 2009 at 2:37 PM  
Blogger G. M. Palmer said...

Stephan Ball --

You sound like an Obama supporter -- he's only had 6 months!

Meh. Ship it right or don't ship it. Great artists don't display their sketches to the world.

July 9, 2009 at 2:50 PM  
Blogger G. M. Palmer said...

"Stephen," sorry.

July 9, 2009 at 2:50 PM  
Blogger Jon Ericson said...

I played with Alpha for a few days and you hit the nail squarely on the head. It was fun to play with (especially for small numbers of elements in a set) but as I added more elements I invariably hit a variation that no longer worked. There are few things more annoying than knowing the software has the answers you are looking for, but you can't convince it to actually tell you what they are.

July 9, 2009 at 2:52 PM  
Blogger Pies said...

I think that WA should first of all expose its datasets rather than applications, because (I imagine, since I don't know any) WA users are usually interested in data. When the user selects one or more datasets, show the output of all applications that can accept the particular datasets.
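
The dataset-first idea can be sketched in a few lines. This is only an illustration of the proposal, not anything Wolfram|Alpha is known to do: the application names, the dataset names, and the `applications_for` helper are all hypothetical.

```python
# Hypothetical registry: each "application" declares the datasets it needs.
# None of these names come from Wolfram|Alpha; they are made up for illustration.
APPLICATIONS = {
    "nutrition label": {"foods"},
    "travel time calculator": {"cities", "vehicles"},
    "elevation comparison": {"cities"},
    "unemployment vs. revenue chart": {"cities", "companies"},
}


def applications_for(selected_datasets):
    """Return every application whose required datasets are all selected."""
    selected = set(selected_datasets)
    return sorted(
        name for name, required in APPLICATIONS.items() if required <= selected
    )


print(applications_for(["cities"]))
print(applications_for(["cities", "companies"]))
```

Selecting more datasets can only widen the list of applicable applications, so the user always sees what the system can do with what they picked.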

I think WA should also publish APIs to their apps and then just copy the best ideas from mashups.

Good critique, and very funny.

July 9, 2009 at 3:50 PM  
Blogger Nathan Hawks said...

Mmm, I have to be a sore thumb. I think W|A is pretty awesome and I've found a lot of little uses for it.

As a writer, I like making comparisons and looking at information that is way, way over my head. The greatest usefulness for me has been in science fiction questions, using their models of the universe and speed / distance / travel time calculations. Can I get this information elsewhere? Yes. Can I get it as easily, or traverse it as easily, or switch gears to do info calculations about a completely different subject without leaving the site, elsewhere? Not to my knowledge. Meanwhile, I can't even read the calculations I'm copying and pasting from Wikipedia to get the information W|A is calculating for me. Not a clue. I just know how to apply it to a question after a bit of fetching.

Now, someone linked to this article with the tagline "Why W|A fails," and it was my vehement distaste of that outlook that brought me to clicking. The hubris of their UI, on the other hand, flows into "W|A = fail" for me about as smoothly as water flowing uphill during a cooling season. How long did Gmail refuse to give us a "Delete" button? Did Gmail fail?

Some of your article could be confused for derogation hurled at W|A's audience. If this was part of your intent in writing those passages, then ask yourself if Mr. Hawking is being hubristic when he dumbs-down bleeding-edge physics for the McDonalds and junior high school markets, or if he's just being generous with the fruits of human progress. It seems like a valid comparison to me.

Now I'll expose myself to some real ire by saying I didn't even read the entire blog post. I did read the comments, and all I can say is, it's quite early to nitpick a project of this scope. So it doesn't know every synonym and word form for an otherwise valid query. So its pointy-clicky interface isn't as fleshed-out as the natural language access to the data. If there wasn't a great big "beta" attached to this free public service, I'd be more interested in considering your criticisms, but all I see right now is a bunch of people trashing a pretty unique and powerful tool that I've both been awed by, and found to be useful.

I just had to say that I just don't think it's time to criticize them yet. The perfect being the enemy of the good, and all that.

July 9, 2009 at 5:49 PM  
Anonymous Anonymous said...

This is a great example of how my job, librarian, works. Patrons come with unstructured free-text queries which I interpret and feed to our system. It generally works (well, actually, research says about 55% of the time). I'm not sure how specific you expect users to be about what sources they want to use, but my experience says that most users are familiar with two or three sources, tops. They also tend to come with incompletely formed questions, which makes everything more fun and hard. Thanks for the elegant layout of the problems--best thing I've read all week.

July 9, 2009 at 5:58 PM  
Blogger HA HA HA said...

"You know that when you type 'two cups of flour and two eggs' ... you are looking for a Nutrition Facts label."

Nope. I'm looking for recipes. If nutrition facts are the first and only thing it's got to say about flour and eggs, the problem with WA is as Ole - André Johansen implies above: This thing's actual purpose is not immediately obvious, and you have to grok that purpose before you can hit any nails with it. Maybe premature to discuss the thing.

July 9, 2009 at 6:31 PM  
Anonymous Maynard Handley said...

@un1crom and the author:

I think it would go some way to defusing the misunderstandings to clarify what is the added value that WA provides over Mathematica 7 APART from the natural language stuff.
Surely if one wants a fully-fledged command-line control interface (complete with all the fun of such interfaces, like useful features added in Mathematica 6 that STILL FSCKING aren't documented even in Mathematica 7), then Mathematica is it?

For those who last used Mathematica some years ago, one of the newer features is a "database extraction" capability whereby certain Mathematica functions "evaluate" by querying a (curated) database. Just as always in Mathematica, these functions all take one, or two, or three, or a squillion options, sometimes take options that are lists, frequently return results that are lists, etc etc --- all nicely tied into all the rest of Mathematica's pattern recognition/list processing/graphing/calculating goodness.

July 9, 2009 at 7:23 PM  
Blogger Brainslug said...

That Alpha's parser is an unreliable affordance is the root of the problem. The black box nature of the parser leaves you with insane results. Great, I can ask it:

number of feet in a mile => 5280 (I get a bunch of other crap I don't care about, but that's fine.)

number of feet in a mile / 3 => 1760 feet

Great! I'm sensing a pattern here. The database is great on sports, so:

number of runs boston red sox 1998 => 876

Okay! So now:

number of runs boston red sox 1998 / 3 => 0

That's right, zero. Okay, I passed Algebra 1; maybe there's a parenthesis problem:

(number of runs boston red sox 1998) / 3 => Wolfram|Alpha isn't sure what to do with your input.

Aw, hell, maybe I have a postgraduate degree in mathematics and I've used Mathematica:

[number of runs boston red sox 1998] / 3 => Wolfram|Alpha isn't sure what to do with your input.

Why does this not work? I know that "number of runs boston red sox 1998" has a representation as a scalar value, and I know that I can divide some scalar values by 3. Why can't I do this?

And so I stopped, frustrated. It couldn't guess what I'm thinking, but that's okay: I'm subtle. But I couldn't guess what it was thinking, and that's not okay.

If I could ask Wolfram Alpha, "Take the tabular data you have for the Boston Red Sox, strip out all but the 1998 season, and strip out all statistics but the number of runs scored, take the value (units intact) and divide it by a number," then I'd be finding all sorts of things to do with this, even if my Mom rolled her eyes at how complicated it was. But if you make an interface that my Mom is supposed to use, you'd better make sure I can too.
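
The explicit request described above (filter the table, keep one statistic, then divide) might look like this as a sketch. The table layout and the `lookup` helper are hypothetical; only the 1998 figure of 876 runs is taken from the comment, and the second row is a placeholder.

```python
# Hypothetical table rows. Only the 1998 runs figure (876) comes from the
# comment above; the other row is a placeholder, not a real statistic.
TEAM_SEASONS = [
    {"team": "Boston Red Sox", "season": 1998, "runs": 876},
    {"team": "Boston Red Sox", "season": 1999, "runs": None},  # placeholder
]


def lookup(table, team, season, statistic):
    """Filter to one team and season, then project a single statistic."""
    matches = [
        row[statistic]
        for row in table
        if row["team"] == team and row["season"] == season
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one row, found {len(matches)}")
    return matches[0]


runs = lookup(TEAM_SEASONS, "Boston Red Sox", 1998, "runs")
print(runs / 3)  # 292.0
```

Every step is visible, so when something fails the user knows which step it was.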

July 9, 2009 at 8:14 PM  
Blogger un1crom said...

@maynard - indeed, much of what one can do in W|A can be done in Mathematica. No mystery there, we built this on top of the mathematica stack. However, we've written thousands of new algos specific to certain domains, others speed ups for massive data access and others for all the coupling of algos and linguistics.

For reference, there are over 10,000 pages of mathematica documentation. Yes, a lot and yet there will always be things you can do that aren't documented fully or at all. That's true of any computer language.

Documenting and cataloging every feature and command is even harder for Wolfram|Alpha. There simply is no way to produce a manual for all the ways to input in Wolfram|Alpha and there's no way to exhaust the "calculators" that you can create. That said, there is a strategy for input that generally gets you what you want. There are even keyword triggers to force contexts and trigger certain algos. All these tips, tricks and strategies are being documented.

Personally, I don't take any offense at all to this analysis and its humor. We're very aware of many of the issues and we acknowledge the dangers of "being too close to your own product" to know its imperfections. This is why we put it out there and flat out tell everyone that this is just the beginning. We're only going to figure out how to improve this for "non-experts" and experts alike by letting everyone in and talking with folks.

July 9, 2009 at 9:05 PM  
Blogger un1crom said...


I do want to express as much as I can that there's more to W|A than just a collection of calculators and pre configured visuals.

examples are better to explain:

Here's something quite basic, but relevant: carbon emissions

The G8 had their summit this week, and a big topic was emissions and environmental stuff. The example above is a simple query but what's going on is a bit more complex (not too hard though). First, one must understand what the G8 is. Then one must be able to figure out what you want by carbon emissions, finally one must actually get the data and visualize.

Alternatively, one might just want some news info, which is what web search does very well.

What's kind of interesting to think about is that prior to the last week, what would the web search engines come up with? probably nothing as rich as they have now that it's been published.

My point with this example is to illustrate some of what W|A is trying to do and how it's very different than web search. And yet, in the context of this blog post, is there a better way to let a user pose this question? query the system? perhaps.

here's another similar example:

Yes, this analysis could be done in mathematica. but man that would take some time to hook up the visuals and get the data in, etc. etc.

July 9, 2009 at 9:05 PM  
Blogger un1crom said...

Ok, let's try more challenging examples.

Arbitrary ranks and lists are kind of fun and useful.

Fatty foods:

Probably not too hard to figure out how to do that. But to be able to do that in lots of domains without having to explicitly define every ranking query is a bit more challenging.

22nd longest bridge
9th tallest mountain

Let's look at computations with bolts and fasteners.

here we see something different. The idea that you can immediately calculate with the real physical properties of an object. Instead of having to look up the properties, then do a calculation, you just do it all right there.

Perhaps bolts is too nerdy.

Let's do something top of mind

How about looking at City unemployment rates by major employer revenues.

Perhaps nothing surprising here, but it is nice to get some formalization of what we all know to be true.

Another top of mind thing that's fun to compare. Salaries and inflation:

So teacher salaries are finally outpacing inflation.

It's actually mildly difficult to get that type of thing out of the web because of the number of older articles that have high rank...

Here's a nice handy dandy calculation using real world data, localization plus a parameter

yes, we could have a menu for "Sales Tax" things but where do we draw the line with menu-izing things? even this comment full of examples is fairly broad to make a browsing system seem somewhat challenging.

Another example that's neat to show is vision.

Yes, I suppose this is a calculation with data visualizations. But it's more than a trend chart. It's a simulation of sorts. This also is a good example of how the linguistic interface is kinda nice. It didn't try to divide 20/100... Yes, we could offer a menu for this. but again, this system is going to grow so big (and already is) that any browsing interface simply becomes tedious and limiting.

Lastly, here's another fun thing to play with that is at least a bit different

This is some concept of how much money facebook could be making if it was 100% sold out with its advertising.

I did that example to show how bringing together linguistics, a model (calculation), real world data, and some adjustable parameters in a one line interface is quite useful.

Alas, no amount of examples is quite convincing enough, as you can find an infinite number of things that don't work. In many cases they should, other cases not yet, other cases maybe never.

We're in this for the long haul and REALLY appreciate all the feedback, suggestions, support, skepticism... software only gets better through use and questioning of assumptions.

...argh out of time for now. apologize for the scattershot comment. Blogger comment field is hard to think straight in ;)

@pies we do have an API forthcoming. So folks will get a chance to play with their own models, own visuals... many of the data sets are linked directly in the Source Information link at the bottom of most results.

July 9, 2009 at 9:06 PM  
Blogger un1crom said...


Yup, that's a bug. The sports data and parsing in current form is not particularly good. It's trying to be too clever in parsing... it's taking the 3 as a month reference... oh, and many other things.

Generally your strategy for [entity] [property] [operation] is good for inputs.

For reference, sports is getting a major effort soon.

July 9, 2009 at 9:16 PM  
Blogger ogewochavelli said...

Unfortunately, this would require Stephen Wolfram to amputate what he thinks is the beautiful part of the system...
I don't see how removing "Wolfram" from the name would fix anything. (Rimshot!)

July 9, 2009 at 11:01 PM  
Anonymous Anonymous said...

This is a very funny post. very. thanks.

July 10, 2009 at 12:30 AM  
Blogger David McCabe said...

Respectfully, un1crom, I think you miss the point of the article.

Wolfram can do amazing, useful, wondrous things with data, and querying and calculating with curated data is often more useful than searching the web and putting together information by hand.

The problem is that it seldom works. My personal success rate for Alpha queries -- and these are not ridiculous English sentences, but simple queries modeled on the demos -- is perhaps 10%, maybe less. One sentence will work, and then a slightly different sentence will not work.

If you want to do more research and development to make Alpha's language system accurate and useful, that's wonderful. You're advancing the state of the art. And it's great that you're letting outsiders see what you're working on. However, providing a less intelligent front-end would, in the mean-time, make Alpha a more useful product.

July 10, 2009 at 1:36 AM  
Blogger Afifov said...

Semantically speaking, search engines, be it Bing, Google, Ask or what have you, have an upper bound of what they can output in the search results, which remains the text entered in the search bar. The user shall always retain control over the results because he is the initiator. i.e. search engines can only do so much - and as Einstein said, only two things are infinite: the universe and human stupidity. (More explanation if you missed my point )

I, being an absolute nobody, think that the assumption that the internet has made our life easier is false. Sure, I no longer need to do complicated integrals, WA can save me the trouble. Although, the silver lining is, that if I don't know what I am integrating, then WA won't help. (Turing test why machines can't think goes here). It's like doing research using Google Scholar: it's just a tool to help me find what I want, yet what I want depends entirely on myself. Advanced UI or else, is just variations of the same thing. If I know what I want, voila. If not, then it's like saying "I don't know the person's name or phone number, but when I see it I'll know it" and going through the London phone book one by one.
Sadly, people more often than not don't know what they want. So they google it. (insert social impact of technology on human behavior here)

July 10, 2009 at 3:07 AM  
Blogger adion said...


I am indeed impressed with some of the things WA knows, but one of the problems I'm having with it is that I have no clue what it knows and what it doesn't.
Therefore I'm not even sure if the problem was the way I formulate my question, or that there is simply no data on the subject.

I also think I understand what the author of the blog means when he says the search engine would have to know what you are thinking.
When I saw your link to 'unemployment+rate+detroit+vs+GM+revenue' what I thought I would get was a correlation, a graph with unemployment rate on one axis and revenue on the other, maybe even with correlation calculated for me.

The problem is, how will I ever be able to know if WA can get me this information?
Again I will never be sure if my question was formulated incorrectly, or if it is something that WA simply can't do.

So while I'm sure a lot can be improved, I don't think it's a fact yet that in the end it will even be possible to create such a system that will be able to answer any question in a satisfying way.

July 10, 2009 at 4:48 AM  
Anonymous hgs said...

@un1crom: The examples that work don't really help because we know there are plenty that work, but we cannot generalize from them correctly.

Example: the vision simulation is nice, but whilst I can get 6/60 vision to work, I can't see how to get this to work:

6/60 vision +12 diopters

"Wolfram Alpha isn't sure what to do with your input" is not feedback we can use. Which part doesn't it understand? Is there ambiguity? (Could it give us a choice?) Does it need a preposition in there? Could it offer a number of plausible responses? Maybe if it told us the ways it had tried to interpret the query before it gave up, we could reframe it better? Play some modern text adventure games (interactive fiction) and see how they handle this kind of thing.

An "unnatural" language interface could be reached through the same UI by punctuation, XML-style query and /query tags around it or something. You can have both interfaces from the same web page that way. Also, blogging about things that have actually been fixed would help.
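
One way to picture such an "unnatural" interface, together with feedback that says which part failed, is the sketch below. The `entity | property | operation` syntax, the data store, and `run_query` are all assumptions made for illustration; nothing here reflects how Wolfram|Alpha actually parses input.

```python
# Hypothetical data store. Only the value 876 comes from the thread above.
KNOWN_ENTITIES = {"boston red sox 1998": {"runs": 876}}

OPERATORS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def run_query(text):
    """Evaluate 'entity | property | operator number'.

    Returns (True, value) on success, or (False, reason) naming the field
    that could not be understood.
    """
    fields = [field.strip() for field in text.split("|")]
    if len(fields) != 3:
        return False, f"expected 3 fields separated by '|', got {len(fields)}"
    entity, prop, operation = fields
    properties = KNOWN_ENTITIES.get(entity.lower())
    if properties is None:
        return False, f"no data for entity {entity!r}"
    if prop.lower() not in properties:
        return False, f"entity {entity!r} has no property {prop!r}"
    tokens = operation.split()
    if len(tokens) != 2 or tokens[0] not in OPERATORS:
        return False, f"could not read operation {operation!r}, expected e.g. '/ 3'"
    try:
        operand = float(tokens[1])
        return True, OPERATORS[tokens[0]](properties[prop.lower()], operand)
    except ValueError:
        return False, f"operand {tokens[1]!r} is not a number"
    except ZeroDivisionError:
        return False, "division by zero"


print(run_query("Boston Red Sox 1998 | runs | / 3"))
print(run_query("Boston Red Sox 1998 | hits | / 3"))
```

The point of the rigid syntax is that every failure can name the field that caused it, which free-form input cannot easily do.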

July 10, 2009 at 5:25 AM  
Blogger un1crom said...

@David M

Oh, the article makes the point quite clear! I'd be very naive to not get the point. Yes, there is a very strong case to make a simplified, more browse-able interface ('less intelligent'). With the API, this is most certainly going to be what people do as well as make domain specific interfaces.

Should we do this right now? Hmmm. My point, which I did not make clearly, is that it would be almost impossible to make an interface that contained everything you can do with UI that wasn't some free form input. We could make many of the things available in that way, and that could be quite useful.

@adion -completely agree with you! it is an open question of whether all queries can be answered in an obvious and satisfactory way. Yes, we can make improvements to tell you whether WA really didn't understand the query or whether it simply doesn't have the data... some of these improvements are on the way. (i know, i know... everything is always a fix away!)

July 10, 2009 at 6:37 AM  
Blogger un1crom said...


a) we do blog about fixes. not frequently enough. Probably need to keep a regular fix log in some obvious spot.

b) yes, you're right that some of the examples are hard to generalize. what you are trying to do in vision seems like it should work. Understand the frustration there. One thing we're trying to figure out is how to compute within the context of the previous computation. That would enable this example and many of the other ones people point out that should work.


Agree on getting some of the statistics of a multiple series graph.

July 10, 2009 at 6:43 AM  
Blogger G. M. Palmer said...

WA did just do a good job of telling me how large 6 trillion square feet of land is.

July 10, 2009 at 7:55 AM  
Blogger Glen said...

un1crom wrote:
"So teacher salaries are finally outpacing inflation"

The teacher salary query and the conclusion you drew from it illustrates the problem nicely. The word "finally" suggests you think teachers' salaries *weren't* outpacing inflation in the past. W/A gives us a pretty chart to show us the trend since...2001? Why start then? I suspect the trend is a lot longer but need more context, so I tried adding "since 1970" to the query; no dice. okay, how about "since 1990"? Nope. There's no way to know whether the x-axis limit is due to limits in the data or in the UI.

July 10, 2009 at 9:26 AM  
Blogger Glen said...

The "egg freckles" problem with the original newton was due to an ordinary software bug. The apple hype machine built up huge expectations for a product that wasn't quite ready yet, and then the management failure is that the product was allowed to ship using an untested preference combination. Later versions were much better but Newton never recovered from the bad PR due to the poor performance of the first version of the product.

(There were several issues with the 1.0 software but the biggest was a checkbox called "only allow dictionary words" which should not have been checked in the shipping version. The effect of this preference was that if your own name wasn't in the dictionary, writing it would return the nearest word that *was* in the dictionary. This reduced the chance of small input errors at the cost of much bigger ones)
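
The effect of that checkbox can be imitated with a toy example. This is a sketch only: the word list is a tiny stand-in, `recognize` is a hypothetical name, and Python's `difflib` similarity is a substitute for whatever matching the Newton actually used.

```python
import difflib

# A tiny stand-in dictionary; a real recognizer's word list would be far larger.
DICTIONARY = ["egg", "freckles", "catching", "on", "hand", "writing"]


def recognize(raw_guess, dictionary_words_only):
    """Return the final word for a raw recognizer guess.

    With dictionary_words_only=True, a guess that is not in the dictionary is
    replaced by the most similar dictionary word, however poor the match.
    """
    if not dictionary_words_only or raw_guess.lower() in DICTIONARY:
        return raw_guess
    # cutoff=0.0 means "always return something", mimicking the forced snap.
    return difflib.get_close_matches(
        raw_guess.lower(), DICTIONARY, n=1, cutoff=0.0
    )[0]


print(recognize("Glen", dictionary_words_only=False))  # left as written
print(recognize("Glen", dictionary_words_only=True))   # snapped to a dictionary word
```

With the option on, a correctly read name is still replaced, which is the "much bigger error" described above.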

Newton 2.0 finally lived up to the promise of the original product, but was too big, too expensive, too incompatible. Palm launched at the right time with the right product to capture that market niche. Me, I'm still hoping Apple will bring back the option of pen input in a modern product one of these days. The latest newton versions had an incredible "printed only" recognizer and even had the option of graffiti-style input for those who were hooked on that. But the learning curve was too steep - good handwriting recognition turns out to currently be a power-user input feature, not a mass-market input feature. It works great for people who are willing to tweak all the settings to make it work. (If you decide how you want to write and tell the software to expect that, this can produce excellent results.)

July 10, 2009 at 10:02 AM  
Anonymous Maynard Handley said...


Thanks for the clarification.

(BTW I'm not just trying to be bitchy about Mathematica documentation. If some feature is not in the documentation it might as well not be in the program. And I'm not talking about minor technical things. For example, consider the new Table[] notation added in 6, iterating over a list, eg:

Table[2i , {i, data}]

Note specifically the use of 2i rather than 2 data[[i]]. Nothing about this in even the 7 docs.

It's a real shame, and I just hope that if enough of us outside the company complain, these sorts of niggling features will be fixed. )

And regarding your carbon emissions example --- that's actually a perfect example of what drives all us users crazy. Your example works with G8. Now replace G8 with EU15 --- and this is not a contrived example, I was trying to do EXACTLY this about a week ago. Sadly WA does not understand either EU15 or EU-15. Given my experience with that, I'd never even have bothered assuming it might understand G8. What's the solution? Beats me? Hook it up to Cyc maybe?

July 10, 2009 at 11:34 AM  
Blogger un1crom said...


Yup. EU 15 should be there. Another country class that needs to be added. And yes, I agree on your thought that if you try something and it doesn't work it's not reinforcing to assume something similar will work.

The page you get on EU15 actually does suggest EU, which is an indicator that there are perhaps related or broader/narrower options. We're finding that people do not use those features as much as they can on "don't know" pages so we're working on improving the utility there to give you more navigation options, more data and more obvious instruction on what it does and doesn't know.

As for your Mathematica example: the old way still works
Table[2 data[[i]], {i, Length[data]}]

The documentation does not make it clear that you can use old or new way. You're right. I've made note internally of both your concerns. Thanks!

July 10, 2009 at 11:51 AM  
Blogger altracaz said...

Hasn't the whole of AI been "hubristic"?

July 10, 2009 at 3:45 PM  
Anonymous Anonymous said...

By bringing up the Newton's handwriting: since it couldn't be done properly then should all research into the area be stopped as it is hopeless?

Give WA some time. I also think your "issac newton born" example was somewhat google-centric. From what I understand WA is trying to combine multiple sources of information and come up with something new. Perhaps, for example, asking it "how much should I insure my oceanside house in Florida for in 2011?" and it knowing enough to look up weather trends as that influences the price of insurance, etc.

July 10, 2009 at 4:07 PM  
Anonymous P.M.Lawrence said...

To help with this sort of problem, here are two words: Jef Raskin. Here are another two: humane interface. Google them, follow up the links and learn much about this area.

July 10, 2009 at 7:05 PM  
Blogger un1crom said...

@ PM Lawrence...

just a thought. from the wikipedia on Jef R.

"An end to stand-alone applications - every software package should be structured as a set of tools available to users on any document. For example, in the middle of writing a text document, a user should be able to do a mathematical computation by writing out the computation in the document, then hitting some "calculate" function."

and there we get to the heart of some of what we're trying to do with W|A. Can we make the system that allows for that function in any document, medium... obviously not yet... but just a thought!

July 10, 2009 at 8:14 PM  
Anonymous Anonymous said...

If there's one thing I think you're missing here (and there's not much, this was an excellent article), there's one more reason why hubris - not just in interfaces, but throughout Computer Science - remains.

It's that technology has already achieved so much of what was once thought to be impossible. Alright, it's 2009 and we don't live on the moon, but computers that used to fill a room now live in our wristwatches. OK, we don't have video-phones (well, we do, but nobody uses them), but everyone in the civilised world has a mobile phone now. True, we've yet to bring about World Peace, but it's been 65 years since the last World War and we're looking good so far (touch wood)...

Hubris is so easy to fall into because without it, you guess far too conservatively and fall behind. I once predicted mobile phones would never take off - I was completely wrong by every possible metric (there are now more mobile phones in the United Kingdom than there are people). So when the next "big thing" comes along - there is a strong temptation to back it to the hilt and hype it to hyperbole, because if you don't there's a risk you will be a dinosaur.

July 11, 2009 at 7:08 AM  
Anonymous Katto said...

A proverb:

"Never take advice from a loser."

July 11, 2009 at 7:29 AM  
Blogger altracaz said...

@bittermanandy This is just a partial excuse for hubris. For example, the AI syllogism ("brain is a machine that processes symbols, computer too, and so computer will be able to make everything the brain does") is just plain wrong, and no computing power will make it right.

July 11, 2009 at 7:34 AM  
Blogger AZDean said...


Thanks for showing those examples of what WA can do. They are intriguing. Unfortunately, like others have said, the percentage of answers that I get out of WA is exceedingly low. And once it does answer something, I really dislike that I can't go further with it.

For example, I do like that it can tell me what the 7th highest mountain is, or even the 7th highest mountain in the US, but it can't tell me what the tallest mountain in Arizona is, nor can it show me a picture of the mountain that it does tell me about, or for that matter anything else about it. All I get is the mountain's name. Why am I forced to cut-and-paste the mountain's name into Google to find out more about this mountain and actually get a picture of it?

Or even better, why can't WA offer possible other queries that I might find useful, like when it was discovered, or who first climbed it, or how far away it is from where I am locally, or whether the mountain is growing or not? Why not incorporate an algorithm like NetFlix uses that can offer suggestions based on users who have had similar queries as mine?

Why oh why can't WA simply give me Google-like links to information on the answers it provides? Why does every query leave me feeling like I've reached a dead-end?? Argh!!

And why is it so hard to do math with a given answer it's already given me? Why can't there be a syntax as simple as a parenthesis that allows me to tell WA to treat the thing in the parenthesis as one query that I will then use to do a computation with? Why can't I say something as simple as, "(9th tallest mountain in the US) in miles"?

In my view, you absolutely have to avoid dead-ends, and you have to allow people to do calculations with the results you provide them.
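
The parenthesis syntax asked for above can be sketched as "resolve the inner query first, then compute with its result". Everything here is hypothetical: `lookup` stands in for the knowledge base, and the height is a placeholder chosen so the arithmetic is easy to check, not a real elevation.

```python
import re

FEET_PER_MILE = 5280


def lookup(subquery):
    """Hypothetical knowledge-base lookup returning (value, unit)."""
    # Placeholder figure, not a real elevation.
    fake_answers = {"9th tallest mountain in the us": (10560.0, "feet")}
    return fake_answers[subquery.lower()]


def evaluate(query):
    """Handle '(<subquery>) in <unit>' by resolving the subquery first."""
    match = re.fullmatch(r"\((.+)\)\s+in\s+(\w+)", query.strip())
    if match is None:
        raise ValueError("expected '(<subquery>) in <unit>'")
    value, unit = lookup(match.group(1))
    target = match.group(2).lower()
    if (unit, target) == ("feet", "miles"):
        return value / FEET_PER_MILE, target
    if unit == target:
        return value, unit
    raise ValueError(f"no conversion from {unit} to {target}")


print(evaluate("(9th tallest mountain in the US) in miles"))  # (2.0, 'miles')
```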

I think WA would be much better if it NEVER came back saying it doesn't know what to do. It should ALWAYS offer something, offering a whole list of possibilities based on what previous people have queried for. You have to let people explore. You have to give them options. Never let people get stuck not knowing where to turn.

Or they'll turn away from WA and never come back.


July 11, 2009 at 8:39 AM  
Blogger Greg Little said...

@Brainslug: You give an example in a comment where you try typing "number of runs boston red sox 1998 / 3" into WA, and it returns 0.

You also complain "But I couldn't guess what it was thinking, and that's not okay."

The first box returned by WA called "Input interpretation:" gives the following "Boston Red Sox | Runs | March 1998".

My guess is that it is interpreting the 3 as the 3rd month of the year. I verified this by replacing 3 with 4, and got "April" instead of "March".

I think WA tries very hard to make sense of your input. If you type "(number of runs boston red sox 1998) / 3" as you tried, it does give "Related inputs to try:", the first two of which seem relevant: "boston red sox 1998" and "runs boston red sox".

It seems like you tried an input, were satisfied with the results, and kept making it more complicated until you were not satisfied with the results.
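
The month-versus-division ambiguity can be made explicit rather than resolved silently. The sketch below lists both readings of a trailing "<year> / <n>". It is a guess at the kind of ambiguity involved, not a description of Wolfram|Alpha's parser, and `interpretations` is a hypothetical helper.

```python
import re

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def interpretations(query):
    """List the readings of a trailing '<year> / <n>' instead of picking one."""
    match = re.search(
        r"(?P<head>.*?)(?P<year>\d{4})\s*/\s*(?P<n>\d+)\s*$", query
    )
    if match is None:
        return [query]
    head = match.group("head").strip()
    year = match.group("year")
    n = int(match.group("n"))
    readings = [f"({head} {year}) divided by {n}"]
    if 1 <= n <= 12:
        # A small number after a year could also be read as a month.
        readings.append(f"{head} {MONTHS[n - 1]} {year}")
    return readings


for reading in interpretations("number of runs boston red sox 1998 / 3"):
    print(reading)
```

Showing both readings and letting the user choose would address the "I couldn't guess what it was thinking" complaint for this case.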

July 11, 2009 at 11:13 AM  
Blogger BlakeyRat said...


OS X has the option of pen input right now. The catch is, of course, that since Apple doesn't build a tablet (and seemingly hates the entire concept), you have to buy and provide your own input device.

July 11, 2009 at 1:41 PM  
Blogger BlakeyRat said...

I should also mention that Windows Vista's handwriting recognition is much, much better than Apple's. (So is its voice recognition.) Things flip-flop very quickly in the computer world.

July 11, 2009 at 1:46 PM  
Blogger Glen said...

Newton 1.0's mixed recognizer was based on Paragraph's Calligrapher; many Windows portable devices were based on a later version of this engine. The Newton 2.0 'printed' recognizer was developed in-house and called "Rosetta"; this was still the state of the art even as late as 2000, but I wouldn't be surprised if it's since been surpassed. The Holy Grail for me is the Newton Notepad experience - word-based recognition at least as good as that, with editing gestures and the ability to mix text with graphics... but in a roughly iPhone-sized programmable device.

July 11, 2009 at 5:58 PM  
Blogger Richard said...

The thing that bothers me about W|A is my inability to jump from interesting demos to actual useful data manipulation. There's no way for me to jump from an extensional set to intensional, or ask for multiple properties, so I can ask

"weather last year Cody, WY and Casper, WY and San Jose, CA"


"elevation Cody, WY and Casper, WY and San Jose, CA"

but I have no way of correlating temperature information and elevation, such as

"elevation temperature Cody, WY and Casper, WY and San Jose, CA"

nor do I have any way of working with computed sets of places:

"elevation of cities in Wyoming"

This leads to the ridiculous situation of being able to find the elevation of every city individually, and compute the set of cities in Wyoming (though not get a list without a maddening stream of clicks on 'more'), but not perform the simple select/map/projection that I want. Copying and pasting the {Name,State,Country} fragments doesn't work.
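The select/map/projection being asked for is small enough to sketch. The following is an illustration only, under stated assumptions: the `City` record, the `cities_in` and `project` helpers, and the three sample rows (with invented names and numbers) are all hypothetical and say nothing about how W|A stores its data.

```python
from typing import NamedTuple


class City(NamedTuple):
    name: str
    state: str
    elevation_ft: float


# Placeholder records; the names and numbers are invented for illustration.
CITIES = [
    City("Alpha", "WY", 5000.0),
    City("Beta", "WY", 6200.0),
    City("Gamma", "CA", 80.0),
]


def cities_in(state, cities=CITIES):
    """Select: compute the set of entities matching a predicate."""
    return [c for c in cities if c.state == state]


def project(entities, attribute):
    """Map/project: pull one typed attribute from each entity."""
    return {e.name: getattr(e, attribute) for e in entities}


# "elevation of cities in Wyoming" as a composition of the two steps.
elevations = project(cities_in("WY"), "elevation_ft")
```

The composition on the last line is the whole request: a computed set fed into an attribute lookup, rather than each city typed in by hand.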

No combination of syntax can convince W|A that I know how to group my input, or that I know it has the information that I want.

W|A has such fantastic potential, but it falls short, leaving Freebase Parallax — a research project! — more immediately useful because of its ability to manipulate sets. I should be able to perform detailed comparisons of the population of cities versus elevation, or other properties of these entities… but I can't, because there's no way to phrase my query to get at the data I know is hidden underneath.


July 11, 2009 at 8:42 PM  
Blogger S said...

The databases and the data visualization apps are still accessible without the 'natural language' web UI, in Mathematica. It makes sense that they would not give away full-featured, direct access to that for free.

July 12, 2009 at 8:47 AM  
Blogger Greg Little said...

@Richard, you say "No combination of syntax can convince W|A that I know how to group my input, or that I know it has the information that I want."

It may be that WA doesn't support the operation you want to perform (like @S says), in which case the problem is not with the syntax or UI.

It's like complaining that the syntax of Google doesn't support the notion of searching for a page with certain keywords, and then using the contents of that page as a new query. No Google syntax will do this (that I know of), so does that mean that Google has a poor UI?

A more compelling example would be something that WA *can* do, but which is hard to express using the syntax.

July 12, 2009 at 11:41 AM  
Blogger shillo said...


The point of the article was not the criticism of the WA capabilities, but the interface. For instance:

water phase transition
phase diagram of water
phases of water
water phase
water phase diagram

Only the last one works (#4 is suggested by WA for some of the other inputs, but isn't what I wanted). There is no indication at all that I should think in an argument-property way, nor that 'phase diagram' is the correct magical word I was looking for. There is no way to select it from a menu, and there is no way to just *tell* WA to do stuff. I couldn't even find out if WA knows about phase transition diagrams without playing the guessing game.

Oh, and 'mercury phase diagram' fails with no indication of what went wrong - it doesn't even tell me that it thinks I asked about phase transition diagram for mercury but that this information hadn't been entered yet.

July 12, 2009 at 4:49 PM  
Blogger Faré said...

My proposed solution:
offer both the natural language interface AND a direct access to the Mathematica query language, and MOST IMPORTANTLY, always include the fully expanded Mathematica query with the output of every natural language query. Thus, you provide a way for the users to learn how to map their simple natural language queries into bits of more complex queries that they can later compose.
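A minimal sketch of that proposal, assuming a toy phrase table: every answer carries the expanded formal query next to it. The `lookup(...)` notation below is made up for this example and is not Mathematica syntax. `Result`, `PHRASES` and `ask` are likewise hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Result:
    answer: str
    formal_query: str  # the expanded form the user can learn from and reuse


# Hypothetical phrase table: natural-language input -> (entity, property).
PHRASES = {
    "water phase diagram": ("water", "phase diagram"),
}


def ask(text):
    """Answer a natural-language query and expose its formal expansion."""
    entity, prop = PHRASES[text.strip().lower()]
    # Invented notation, shown to the user alongside the answer.
    formal = f'lookup(entity="{entity}", property="{prop}")'
    return Result(answer=f"<{prop} of {entity}>", formal_query=formal)
```

A user who sees the `formal_query` string for one successful input has a template for the next one, which is the learning path the proposal describes.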

July 12, 2009 at 5:59 PM  
Anonymous James Ashley said...

I plugged "Mencius Moldbug" into the Wolfram-Alpha website. It came back with "Wolfram|Alpha isn't sure what to do with your input."

Sounds fairly accurate to me.

I then threw the Chinese Room problem at it and the website went down like a Star Trek computer facing a logical paradox.

What a peculiar search engine.

July 12, 2009 at 7:30 PM  
Blogger Paul Prescod said...

"The worst kind of false affordance is an unreliable affordance - a knob that can be turned except when it can't, a chair that's a cozy place to sit except when it rams a hidden metal spike deep into your tender parts."

What about a touch-screen typing accelerator that sometimes changes your correct word to an incorrect one, while your eyes are predictably elsewhere (i.e. on the soft keyboard).

July 13, 2009 at 12:33 AM  
Blogger Christopher said...

More importantly, I think that Wolfram can do what Wolfram wants. I, too, am skeptical and think that we should let humans do what humans are good at and let computers do what they're good at.

But the fact of the matter is that Wolfram Alpha is a private company, not taking tax dollars, so if it takes them 50 years and billions of dollars to achieve a perfect WA, so be it. If they were a public institution that relied on public money, I think harsh criticism would be fair and worthy of our effort. But until then, who cares? I don't think that anyone has proved without a doubt that this cannot work.

July 13, 2009 at 11:07 AM  
Anonymous Anonymous said...

You, sir, got some fine humor.

Getting no response from a query is imo much more distracting than getting the wrong information. If i get the wrong answer, i try to refine the query to get the desired one. When i get none at all, i just think WTF and walk away.

July 13, 2009 at 12:08 PM  
Blogger Brainslug said...

@Greg Little:

No, actually, I didn't start with a result I liked and worked to get one I didn't like. In fact, I started with a complicated query, got "Wolfram Unsure," and then worked backward.

I initially wanted to ask it for a runs scored / runs scored against calculation, and I was dissatisfied with the results.

I posted to the Wolfram forum with an example similar to this months ago and never heard anything from them.

July 13, 2009 at 1:48 PM  
Blogger Greg Little said...

@Brainslug: I see, that makes sense.

I wonder if there is any syntax on WA that will do what you want, or if your query is something WA simply doesn't support.

July 13, 2009 at 4:09 PM  
Blogger Christophe said...

The author misses an important truth. Yes, Wolfram Alpha is a million lines of code running on Intel (?) CPUs.

And the author's mind is 100 million billion synapse values running on a set of about 100 billion neurons.

Needless to say the author's mind is also running some kind of algorithm.

Wolfram Alpha is not a database interface, since it's not about reading values out of a database. It's about constructing that database first, then presenting the reader with answers from it, which is a wholly different thing.

There is no real way to predict what wolfram alpha will answer to any non-trivial query.

And, frankly, a good use for wolfram alpha is in the middle of some calculation where you're not in the mood of looking up the actual formula. Just type in the values and wolfram alpha will present you with usually the correct interpretation.

The only real difference (ie. not a difference of scale) between the author's mind and wolfram alpha's mind is that wolfram alpha's mind is bound by the rules of mathematics and will only provide consistent answers, whereas the author's mind is merely bound by whatever rules, consistent or inconsistent (yes, thank you, Lucas, most certainly very inconsistent, now stop arguing).

The real thing you can learn from wolfram alpha are the limits of consistent thinking. No amount of correct theorems will help you open a door, given a human body, yet somehow every human succeeds in that task thousands of times daily. Talking about the problem of disliking Michael Jackson's circus, I mean funeral, that's just beyond what wolfram alpha will ever do.

That is, of course, why, in AI research, all but the most desperate of researchers have dropped the requirement of consistent thinking for an AI. This has led to astonishingly better results than wolfram alpha (including lots of programs that pass the written Turing test). The most successful of those programs do something very very simple: they imitate humans.

If anyone's wondering: what does one human identify as another human, given only text communication? Just about any imitation machine, from the trivial Eliza to the more modern multidimensional Markov chains. All these things do, in essence, is imitate whatever behavior they see before them ("during training"). In fact, if you merely repeated what someone said back to him literally, that person would only start thinking "hey, this is a machine" after 4-5 such repetitions. Eliza has talked for days to some people, and Markov chains have succeeded in convincing university professors that they were human.

Needless to say, a human mind (or anything we can't distinguish from a human mind) does not provide scientifically correct answers, or even consistent ones. A "human mind" search engine would be of very limited use.

July 13, 2009 at 4:24 PM  
Anonymous M said...

Have you guys seen this on Obama's science tsar, John Holdren? The guy is insane and very, very evil.

July 13, 2009 at 6:22 PM  
Blogger Boris said...

un1crom, that's a very interesting example with the G8 carbon emissions. Of course that data is pointless; what matters is per-capita data. Sadly, adding "per capita" or "per-capita" to the search doesn't produce any of the useful plots/graphs that break things down by country...

July 14, 2009 at 4:34 PM  
Blogger Richard said...

@Greg Little: W|A patently knows the elevations and both current and historical temperatures for all of these places.

It allows me to enter them one by one in my query string.

It is able to plot continuous values against each other.

I see no reason why it should be able to answer

but not

— both queries compute a set of entities and find a typed numeric attribute of each. That I can find the elevation of explicitly listed cities, find the set of cities, but not combine the two when I can do exactly that in a different domain is maddening.

You are arguing that I shouldn't expect it to be able to answer that query because it's not able to answer it. That is ridiculous. It should be able to answer it: it can perform identical operations, and the data are available. That it cannot is an infuriating failing.

Whether it cannot because of an internal oversight, a syntactic omission, or some other UI failing is irrelevant.

July 14, 2009 at 10:28 PM  
Anonymous Anonymous said...

If you were to pay for Mathematica, you would get access to the command line interface.

July 15, 2009 at 5:02 AM  
Blogger jevans said...

The Newton is a perfect example of the public condemning a product forever because of initial problems. I used a Newton MessagePad 2000 quite productively for 10 years. For me the handwriting recognition just worked and the interface design was spot on. I'd still prefer it over anything since (except maybe an iPhone with the same handwriting recognition :-)

July 15, 2009 at 10:30 AM  
Anonymous Anonymous said...

hilarious :)

July 18, 2009 at 4:30 AM  
Anonymous Anonymous said...


Well, you are certainly making the original post's point in spades. So I can get the G8 annual carbon emissions. Now, how much do the other 12 countries in the G20 emit?

gives "270 billions of gravitational constant metric tons of carbon dioxide equivalent per year"

July 19, 2009 at 9:30 PM  
Blogger un1crom said...

Anonymous -

yes, i agree with you and everyone else and the author. In my comments I acknowledge the great amount of work that needs to happen on the linguistic interface.

I listed examples only to show that W|A provides more than a finite set of calculators. If it were a finite set of calculators we could, indeed, just have a browsable menu system to get to all the calculators.

I sincerely am asking for ideas on how to create controls and interfaces that improve the experience. We have an API launching where experimentation with alternative interfaces should be quite easy to test out.

July 19, 2009 at 10:21 PM  
Anonymous Anonymous said...

(1) A general, accurate plain-language interface into a general system like W|A is probably impossible for all the reasons the blog points out. You will never be able to divine what the user thought. The fact that W|A requires or encourages a fragmented shortcut language is making it worse. If W|A could parse and understand "Build a custom nutritional table for a food consisting of 2 cups of flour", it might make a bit of sense, but by making the user strip his/her language, you are losing the context. Even a human can't figure out what answer is expected for "two cups of water, two eggs". Is it a recipe for cake, a shopping list, a pancake batter, paint, ...?

(2) For all its impressive size, W|A's database is too small to accommodate even a tiny fraction of all the possible queries, which, as in my G20 example, makes it very frustrating. This and (1) make every expansion on a query a Hail Mary pass. This is very unlike Google, where a result is always returned and it's fairly straightforward to figure out how to backtrack. Obviously, the information returned by Google or Bing is messy, but dealing with that is something humans are reasonably good at.

Unless there is a quantum leap in mind reading, the general problem will probably never be solved.

Short term, two huge improvements would be a way to tag the dataset in the query ("G20 carbon emissions +environment", "carbon emissions +chemistry", "LSD +chemistry", "LSD +medical") and getting rid of simple parsing errors. For example, in that G20 G8 example, take into account that almost nobody writes 20*G as G20.
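The tagging suggestion could be handled before any natural-language guessing happens. A small sketch, assuming tags are whitespace-separated tokens with a leading "+"; the `split_tags` function is hypothetical and not part of any W|A interface.

```python
def split_tags(query):
    """Separate '+tag' tokens from the rest of the query."""
    terms, tags = [], []
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            # A domain hint such as +chemistry: strip the marker and keep it.
            tags.append(token[1:].lower())
        else:
            terms.append(token)
    return " ".join(terms), tags
```

For "G20 carbon emissions +environment" this yields the free text "G20 carbon emissions" plus the tag list `["environment"]`, so the interpreter could restrict itself to that dataset before deciding what "G20" means.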

July 20, 2009 at 2:03 PM  
Anonymous Anonymous said...

Interesting article, and an interesting viewpoint. A couple points:

First off, seriously, The Inquirer and The Register? Not sure why so many folks think those two British publications are such fine sources of technology news, but I've always found Technology Review to be a pretty good source, and it contains many in-depth articles, which seem to outweigh the stinkers. The coverage of the Inquirer and the Register, by contrast, seems a lot more fluffy (though I have to give the BBC credit for being far too dumbed-down and diluted in science and technology coverage to be worth anything).

Full disclosure: I am an MIT graduate, so I get Technology Review as a free subscription. So I may have some bias.

Also, any particular reason for the Aspie baiting? I mean, if the word "arbitrary" would have been more understandable than "random" to an Aspie, and if your article was technical in nature anyway, why not use the more precise term so that everyone could understand it? Why the parenthetical, seemingly snarky remark? Do you think Asperger's sufferers like being singled out? Your tone certainly seemed ... hm... irritable.

Oh, about the Newton's handwriting engine: I always found that a mixture of block print and cursive worked well. I disagree that the engine still sucks -- it definitely improved dramatically with Newton OS 1.3 (IIRC), and once I had a MessagePad 2000 (running Newton OS 2.x, again this is from memory), it worked quite remarkably well. I don't use Inkwell only because I have no need of a Wacom tablet, but I don't think it's fair to say nobody uses it. Some special needs users probably do use it, and it's better than nothing at all.

Even Apple had a sense of humor about the original handwriting software, such that the Doonesbury strip you mention was included as an easter egg in one of the MessagePad models I owned.

July 31, 2009 at 7:41 PM  
Blogger Dr. Richard S. Wallace said...

I found this post rather late I am afraid, but it resonated so I thought I might post a comment anyway. "Control interfaces must not be intelligent" echoes our own experience in the chat bot field. I would say I agree, with one caveat (below). The landscape is littered with the wreckage of dead chat bot companies whose business plan was based on replacing call centers with chat bots. But people do not want to talk to bots when they want specific answers. Another example is, who dropped their original natural language "Ask Jeeves" in favor of keyword search.

On the flip side, the most successful applications of chat bots (so far) fall under the category of "entertainment". Talking to a bot can be fun, if there is no expectation of it providing truthful or accurate information. The responses may even be untruthful or inaccurate, so long as they are entertaining.

The caveat is that if artificial intelligence ever evolves to a sophisticated enough level, to be like the imagined talking computers of science fiction, then yes, it may be possible to get call center help or navigate a spacecraft with natural language.

"Control interfaces will not be intelligent" until A.I. at least reaches the level of the human brain.

September 20, 2009 at 10:02 AM  
