
Tyler Cowen sends us to an essay by Gary Marcus arguing that modern artificial intelligence (AI) has failed: not only because it has failed to live up to its goals, but because it has the wrong goals. At the lowest common denominator – between computer scientists and everyone else – AI's chief mandate is to pass the so-called "Turing Test", wherein a human "judge" holds a natural language conversation with a machine and cannot tell that it is not human. Modern AI agents, like Siri, Watson, or your neighborhood NSA, basically mine troves of data, trawling for correlations that look significant. Marcus argues this is not "intelligence" because even the smartest and most powerful machines cannot answer simple, commonsensical questions like:

“Can an alligator run the hundred-metre hurdles?”

or

Joan made sure to thank Susan for all the help she had given. Who had given the help?

a) Joan
b) Susan

because billions of Google searches will not yield any patterns to this effect, since nobody has asked these questions before. He also points out that most machines that "pass" a Turing Test do so with misdirection and deception, not with signs of innate brilliance.

At a theoretical and conceptual level, maybe Marcus is right. There is something profoundly unintelligent about using "big data" to solve human problems. That is because, "theoretically", anything that fails to use theory derived from axiomatic laws is by definition not intelligent. It is why mathematical economists – even after their time has come and perhaps gone (with apologies to Al Roth, a mechanism designer who has used theory to profoundly improve our lives) – hold their noses high towards the empiricists. It is why theoretical computer scientists have, mid-career, forgotten how to actually program.

Theory is brilliant for some things. Thank god we are not using randomized experiments and inductive reasoning to conclude that, for a right-angled triangle, the sum of the squares of the two legs does indeed equal the square of the hypotenuse. But a practitioner, a capitalist, or your average Joe would look at Marcus' critique another way. Who really cares that computers cannot answer questions nobody has asked? Computers that are deeply commonsensical are, by definition, not targets for artificial intelligence.

I would normally not write about computer science. Contrary to my choice of major, I'm quite a bit more confident in my command of economics than of abstract philosophy or computer science. But I do understand the Turing Test. And any computer witty enough to "trick" humans is smart enough for me. I do not recall Alan Turing issuing an exception for remarkably sarcastic computers.

I also know a bit about capitalism and why it works. We may experience any sort of “market failure”. Maybe it’s too cheap to pollute. Maybe money demand is too high in a liquidity trap. But, by and large, markets work. That means useless companies go out of business and good ones stay. It means Apple’s iPad brings in treasure chests of profit while Microsoft’s Surface does… I do not know what.

It means artificial intelligence answers questions people give a shit about. Private enterprise has done wonders for the tech world. And the tech world is busy fixing problems that substantially improve our lives. 

Marcus does not consider the flip side of his claim. He is embittered by an industry that attempts to "trick" humans (not in general, but only when specifically asked to), yet upset that computers cannot answer heroically contrived questions designed, in turn, to trick insightful algorithms by exploiting the failures of a non-formal language like English. In fact, computers can understand basically every colloquially important part of English. It would take a computer scientist to design a question that computers cannot effectively answer.

Markets work. Artificial intelligence may not change the world as steam engines or double-entry bookkeeping once did. But it is answering questions profound to our social existence. Some, like Marcus, are upset that the artificial intelligentsia is keen on making programs only to trick human readers. He believes this does not take Turing's argument in good faith, for Turing surely could not have foreseen the preponderance of big, evil data! Yet, at a theoretical level, Turing's very insight demanded that machines trick minds: if an algorithm convincing a judge that it is human is not a "trick", I do not know what is. More importantly, passing a silly test hardly defines the profession anymore. The profession is about answering questions that generate mass profit, and hence mass welfare.

Contemporary AI has "forgotten" about the question of philosophical intelligence because it is not a well-defined phenomenon. An essay in the New Yorker a definition of intelligence does not make; indeed, there is a far more philosophically elegant beauty in the Turing Test than in querying a computer about the habits of alligators. And I have a theory for that.

Ritwik Priya sent me an intriguing paper by Philip Maymin arguing that an efficient solution to NP-complete problems is reducible to efficient markets, and vice versa. In other words, the weak-form efficient market hypothesis holds true if and only if P = NP. This result is not published in a peer-reviewed journal (as far as I can tell), but it purports a remarkable discovery. My bad, it looks like it is published in Algorithmic Finance, which has Ken Arrow and Myron Scholes on its editorial board. I'm still surprised NBER doesn't point me to any future citations. As Maymin himself notes, he seems to have married the definitive question in finance (are markets efficient?) with the holy grail of computer science (does P = NP?).

There are reasons to be skeptical, and a lot of questions to ask, but first I want to summarize the paper. I Googled around to see where the paper showed up and, maybe not surprisingly, this MarginalRevolution post was a top hit. But Tyler Cowen seems unfairly dismissive when he says, "Points like this seem to be rediscovered every ten years or so; I am never sure what to make of them. What ever happened to Alain Lewis?"

I don't know much about this Alain Lewis. I can see he has written papers like "On turing degrees of Walrasian models and a general impossibility result in the theory of decision making". I can't even pretend to understand the abstract. On the other hand, reading Maymin's paper didn't really change my mind about efficient markets, but it does give an intriguing example of what markets are capable of. Anyone with thirty minutes and a mild interest in computer science should read the whole paper, because I think it offers a very good heuristic for understanding the EMH debate itself, even if it does not resolve it.

Indeed, some commenters on the net (at MarginalRevolution and elsewhere) joke that this paper is just another guy hating on the EMH. They say this, of course, because they have an incredibly high subjective belief that P ≠ NP (more on this later). They clearly have not read the paper: the charge ignores the fact that the author is a blatant libertarian who cites Hayek favorably within it.

Before I give a brief explanation of Maymin's proof, I will add that I am skeptical, as this result seems not to have been replicated (with regard to its empirical evidence) in any prominent journal, economic or mathematical. While one may cite the "natural conservativeness" of the former profession as an explanation, the proof is simply too cool not to receive more attention. My understanding of theoretical computer science is limited, but to the extent that I am a good judge, the paper makes sense on first read. (Strike one against comparing him to Alain Lewis, whose very titles make me shiver?) I do have some quibbles, which I note along the way.

A Summary

Maymin ventures to prove a biconditional between the weak form of the EMH and P = NP. He notes this would be an interesting result, as the majority of financial economists have a fair degree of belief that markets are efficient, while computer scientists very much doubt that P = NP. (It is this relationship that I will critique, but later.) The first part of the proof shows that efficient markets imply that P = NP.

The weak-form of the EMH asserts the following:

  • If a certain pattern – such as Amazon always moves with Google with a lag of seven days – is observed, it will immediately disappear as the market incorporates this information into natural prices.
  • Because markets are informationally efficient, said pattern will be found immediately.

Richard Thaler calls these, respectively, the "no free lunch" and "price is right" claims of the EMH. Maymin's result suggests that for the latter to be true, there must exist polynomial-time algorithms for NP-complete problems (specifically, the Knapsack problem). Assume there are n past price changes, each either UP (1) or DOWN (0), and that a given investor can take a long, short, or neutral position at each price change. The total number of strategies is therefore 3^n. Verifying whether a given strategy earns a statistically significant profit requires only a linear pass through the n past price changes, so the problem is in NP. (That is, given some model, is there a 95% chance that your strategy beats a monkey throwing darts at the WSJ over coffee each morning?) Remember, this whole thought experiment is an exercise in finding some pattern of past ups and downs, associated with future ups and downs, that will always hold true and hence may be exploited.
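To make the asymmetry concrete, here is a toy sketch of my own (not Maymin's code, and with made-up numbers): checking any one strategy's profit is a single linear pass, while enumerating candidates means walking through all 3^n of them.

    from itertools import product

    # Hypothetical observed per-period returns of a single asset.
    returns = [0.01, -0.02, 0.015, 0.01, -0.01, 0.02]

    def profit(strategy, returns):
        # Verification: one linear pass; a position is +1 (long), 0 (neutral), or -1 (short).
        return sum(pos * r for pos, r in zip(strategy, returns))

    # Search: brute force over all 3^n candidate strategies.
    best = max(product((-1, 0, 1), repeat=len(returns)), key=lambda s: profit(s, returns))
    print(best, round(profit(best, returns), 4))

With n = 6 this is instant; with a realistic price history the verification step stays trivial while the search blows up exponentially.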

Maymin notes that, in practice, popular quantitative strategies are based on momentum, and hence on some fixed lookback window t. He notes the joint-hypothesis problem from Fama (1970): the EMH says only that, given some equilibrium model with a normal profit K, we cannot earn in excess of K for a sustained period of time. He also resolves quite well what I find to be an important debate among EMH skeptics: how do we reasonably search across the 3^n possible strategies? Some argue that we should stick strictly to economic theory, others submit blind data mining, and still others machine learning. Maymin notes that this is irrelevant to the question at hand, as the employed search strategy is endogenous to the determination of K.

Maymin agrees that determining whether a strategy iterated on one asset can earn supernormal profits takes only polynomial time. However, he notes that under the assumptions that investors (a) do not have infinite leverage and (b) operate under a budget constraint, answering the question "does there exist a portfolio of assets earning supernormal profits within our budget?" is akin to solving the Knapsack problem.

For those who do not know, the Knapsack problem – a canonical introduction to discrete optimization and intractable problems – asks one, given n items each represented as a {value, size} pair, to maximize the total value while keeping the total size under a constraint C. In the analogy, size is the price of an asset following a t-length time series (where t is the lookback window), value is the future return of the asset following the same time series, and the strategy on each asset is picked from a t-sized set U with {long, neutral, short} as possible options. Hence, in Maymin's words, "the question of whether or not there exists a budget-conscious long-or-out strategy that generates statistically significant profit relative to a given model of market equilibrium is the same as the knapsack problem, which itself is NP-complete. Therefore, investors would be able to quickly compute the answer to this question if and only if P = NP."
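To see the analogy in code, here is a minimal brute-force sketch of my own (the prices and returns are hypothetical, not from the paper): each asset's price plays the role of an item's size, its expected future return plays the role of value, and the budget is the knapsack's capacity.

    from itertools import combinations

    # Hypothetical assets: name -> (price, expected future return).
    assets = {"A": (50, 7.0), "B": (30, 4.5), "C": (20, 3.5), "D": (40, 5.0)}
    budget = 70

    best_value, best_picks = 0.0, ()
    for r in range(len(assets) + 1):
        for picks in combinations(assets, r):  # 2^n subsets of assets in the worst case
            cost = sum(assets[p][0] for p in picks)
            value = sum(assets[p][1] for p in picks)
            if cost <= budget and value > best_value:
                best_value, best_picks = value, picks
    print(best_picks, best_value)

Checking any single portfolio against the budget is easy; it is the search over subsets that is intractable in general.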

Maymin concludes that this algorithm is exponential in t, the size of the lookback window. He suggests that because t grows linearly with n (the length of the entire price history), markets become intractable rapidly. I must quibble, theoretically if not empirically (where the evidence seems soundly in Maymin's favor). Is there reason to assume that t ~ n? Is it not possible that, for asymptotically large n, t ~ log n? Indeed, if this were the case, the problem would be merely polynomial in n for the market as a whole. Empirically, however, linearity seems to be a fair assumption. I might add that time series analyses are restricted by the assumption of stationarity; in the future, the window over which stationarity can reasonably be assumed might grow more than linearly relative to today. That would work in Maymin's favor.

I have not yet explained why this means markets are efficient if and only if P = NP. Say a group of investors is searching through the total strategy set U, which is 3^n in size, for a supernormally profitable strategy. Say that, by some miracle, I happen to find one such strategy on one of my first guesses. If P = NP, theory suggests that most everyone else will also find this strategy almost immediately, and hence it will be "priced into the market".

However, if P ≠ NP, it might take years for someone else to find the same strategy, allowing me to earn a supernormal profit for a long period of time. This would render even the weak form of the EMH false. What are the empirics in favor of this idea? Well, this is something that probably deserves further research, and I'm not happy with what's provided, but Maymin cites Jegadeesh and Titman (1993) as a plausible example. Jegadeesh and Titman are credited with developing an investment strategy based on market momentum. Their strategy was largely unknown over the experiment window (1965–1989) and therefore not priced into the market. Maymin's result would suggest that such a strategy becomes increasingly effective against the market as other participants contend with a linearly growing history of events fed into an exponential-time search. He offers this as evidence:

[Image: data from Maymin's Table 1]

I don't see it as such. First, assuming stationarity from 1927 to 1989 is incredibly iffy. Second, what does backtesting a current strategy on historical trends tell us? I am positive I could also find some strategy (not momentum-6) that shows just the opposite. So what? Maymin does, however, touch on the empirical evidence that would work in his favor: NASDAQ was added to the data set in 1972, vastly increasing the number of data points. If some strategy earned supernormal profits, it would be exponentially harder to mine after the inclusion of the new data. To the extent that the strategy remained broadly unknown, its performance against the market should increase relative to baseline after 1972. But he doesn't cite this data.

On the one hand, I'm glad he offers the correct framework on which to make his prediction falsifiable. On the other, presenting the above data from "Table 1" as support for his hypothesis seems somewhat sly. I read this part quite favorably on my first parse, but this dataset is plainly the wrong one for the hypothesis he attempts to prove.

The Corollary

Just as interestingly, and more convincingly, Maymin argues that an efficient market implies that P = NP. To do this, he assumes that markets allow participants to place order-cancels-order transactions. I can say that I want to buy the Mona Lisa if it falls to $10, or sell David if it hits $15, but as soon as market conditions fulfill one order, the other is automatically cancelled. We must further assume that such orders may be placed with three legs. Computer science nerds will know where this is going. Maymin wants to program the market to efficiently solve 3-SAT, quite literally the mother of NP-complete problems. Explaining the dynamics of the problem is beyond the scope of this post; it is enough to know that every problem in NP reduces to it, so an efficient solution would also crack many other intractable problems, including factoring large numbers and hence breaking into your bank account.

The logical form of the problem is such:

Let y = (a | b | !c) & (b | !d | a) & (z | m | a) & … & (a | b | !m), where all variables are boolean

Within the literature, this is known as "conjunctive normal form". Each parenthetical phrase is a clause, consisting of a disjunction of three "literals" (a variable or its negation). Solving 3-SAT means finding an assignment (true or false) of each variable such that the whole statement is true, or determining that no such assignment exists. No known algorithm does this in better than exponential time in the worst case.
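For concreteness, here is a toy brute-force solver of my own (the instance is made up, mirroring the form above): checking a candidate assignment is fast, but the search tries up to 2^v assignments of the v variables.

    from itertools import product

    # Hypothetical 3-SAT instance; a "!" prefix marks negation.
    clauses = [("a", "b", "!c"), ("b", "!d", "a"), ("!a", "c", "d")]
    variables = sorted({lit.lstrip("!") for clause in clauses for lit in clause})

    def satisfies(assignment, clauses):
        # Each clause needs at least one true literal; this quick check is why 3-SAT is in NP.
        return all(any(assignment[l.lstrip("!")] != l.startswith("!") for l in c) for c in clauses)

    # The search itself: up to 2^v candidate assignments.
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(assignment, clauses):
            print(assignment)
            break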

We can think of each clause as an order-cancels-order (OCO) option consisting of three possible transactions: a plain literal can map to a purchase and a negated literal to a sale, or vice versa. Now price each asset (one stock per variable) at the midpoint of its bid-ask spread. Each order therefore carries a supernormal expected profit, and will be arbitraged away immediately if markets are efficient.
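As a sketch of the encoding (my own illustration, not code from the paper; I arbitrarily use buy-for-literal and sell-for-negation), each clause becomes one three-legged OCO with one order per literal:

    # Hypothetical 3-SAT instance; one stock per variable, "!" marks negation.
    clauses = [("a", "b", "!c"), ("b", "!d", "a"), ("!a", "c", "d")]

    def clause_to_oco(clause):
        # One OCO group per clause: a plain literal becomes a buy order, a negated one a sell,
        # each priced at the midpoint of the current bid-ask spread.
        return [("SELL" if lit.startswith("!") else "BUY", lit.lstrip("!")) for lit in clause]

    for clause in clauses:
        print(clause_to_oco(clause))

Roughly speaking, which leg of each OCO the market chooses to fill corresponds to which literal satisfies that clause.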

Once we place the set of OCOs, they should all be executed within an arbitrarily small time period, as each by itself contradicts the "no free lunch" condition of efficient markets. In fact, the OCOs must be executed so as to maximize profits, and that is exactly what proponents of the EMH suppose markets do. Maymin does not state his implicit assumption that the time it takes to clear these transactions on the open market need not be instantaneous, only within the weak EMH bounds of "as quickly as possible". I would say this is a big ding to the theoretical equivalence of the two efficiencies (he does not offer a mathematical or logical reason why one must equal the other, but rather equivocates on the terminology), and I wish he had made it clearer. Still, I think it is an important result, because the EMH would be a toothless truth if "as quickly as possible" included the years it would take to solve a sufficiently large 3-SAT instance. Even without the theoretical equivalence, the structural similarities are striking.

Note that the example I provided above is rather easy, as there are almost as many variables as there are clauses; in reality the instances of interest are much harder. A market mechanism that could "solve" such a set of OCOs in polynomial time would therefore have done something remarkable. If I want to solve some given 3-SAT problem, I just pick a stock for each variable, encode plain literals as buy orders and negated literals as sell orders, and place the full set of transactions, which must then be completed in the profit-maximizing manner.

I found this result far more compelling than the first; perhaps that reflects my propensity towards computer science over finance.

Discussion

I've read quite a bit of stuff on how Walrasian equilibria are NP-hard, this or that. There seems to be a lot of literature relating economic games and equilibria to computational tractability. The question of the EMH is inherently different, and logical – not mathematical – in nature. The symmetry between the two fields here is one-to-one, even self-evident. So I disagree with Cowen's quip that stuff like this comes up once every ten years. I can't quite put my finger on it, but the previous literature suggesting such similarities had more to do with the hardness of solving for some equilibrium, rather than with processing the market itself.

Specifically, this is why I find the second result (encoding 3-SAT directly into the market) to be mind-boggling.

Regardless, something in my gut rejects the logical biconditional between P = NP and the EMH. However, I think this result supports the idea that one should form one's priors on both with a similar heuristic (which may yield a different answer for each, depending on the heuristic used).

For example, Maymin notes the contradiction between most finance professionals believing the EMH to be true and most computer scientists rejecting P = NP. Let's take the latter case. How exactly can one form a prior that P ≠ NP? Well, one can believe that when the problem is solved, it will be resolved in favor of P ≠ NP. But that's just turtles all the way down. A better way of expressing such a prior would be: "I believe that if I asked a magical black box 'does P = NP?', the answer would be in the negative." You agree that's a fair way of forming a subjective belief, right? This belief can rest on any number of things, but for most computer scientists it seems to rest on the radical implications of a positive result (breaking RSA cryptosystems, etc.).

But to form such a prior, you must accept that in a Popperian world such black boxes can exist. However, the existence of any such magical truth-teller is ipso facto a stronger and more absurd reality than P = NP. Therefore, this question is not falsifiable in any normal sense (other than the standard turtles all the way down, which only speaks to the resolution of the problem rather than its true implications).

I would argue that even if the biconditional between P = NP and the EMH fails for whatever reason, the structural analogy holds. That is to say, submitting that the EMH is falsifiable would be akin to believing in such magical black boxes. As a general rule, it is better not to form priors about efficient markets in the abstract. Better to ask whether certain forms of markets are efficient.

The analogy to computer science holds even here. Though NP-complete problems are, in the worst case, exponential in scale (as far as we know), heuristics and randomized approximations allow computer scientists to design remarkably efficient solutions for most cases, or for the average case. This is scientifically falsifiable, and the correct question to ask. Similarly, we may talk about the informational efficiency of a practical market, within certain bounds, granted certain approximations. And, crucially, granted some margin of error and the risks thereof. What are the chances a randomized input to a backtracking Knapsack solver will fall to exponential time? What are the chances a market will fail to weed out inefficiencies given a level of technology? Indeed, Maymin suggests that it is such approximations and shortcuts that make a market perhaps inefficient, but a government even more so. He compares this result to Hayek's famous economic calculation problem, which suggests something similar.
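As a small illustration of that point (my own sketch, with made-up numbers): a greedy value-per-size heuristic runs instantly and often lands close to the optimum, even though exact Knapsack is NP-complete in the worst case.

    # Hypothetical items as (size, value) pairs, and a capacity.
    items = [(12, 4.0), (7, 3.0), (11, 3.5), (8, 2.5)]
    capacity = 20

    chosen, used = [], 0
    # Greedy heuristic: take items in order of value density while they still fit.
    for size, value in sorted(items, key=lambda x: x[1] / x[0], reverse=True):
        if used + size <= capacity:
            chosen.append((size, value))
            used += size
    print(chosen, sum(v for _, v in chosen))

It is not guaranteed to be optimal, but asking how often, and by how much, it fails is a falsifiable question, which is exactly the shift in framing suggested above.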

To me, this is an absolutely non-trivial comparison. After reading this paper I genuinely believe it is futile to guess either whether P = NP (or, practically, how the question will be resolved when it finally is) or whether markets are efficient. However, according to the St. Louis Fed, this paper has never been cited. (I intend to change that, one day.)

Within appropriate bounds, Maymin’s paper illuminates not necessarily how we view efficient markets, but how we view the debate thereof.

Andrew Plotkin tells us that “if you already know what recursion is, just remember the answer. Otherwise, find someone who is standing closer to Douglas Hofstadter than you are; then ask him or her what recursion is.”

Recursion is a delight for any computer science student first learning it. There's something magical about sorting a list by asking your program to sort the left half, sort the right half, and then merge. There's something magical about taking that ugly code festering with loops and switches and replacing it with two lines.

My high school computer science book (it sucked) kept telling me that recursion is inefficient but could never tell me why. Indeed, in terms of time complexity, a recursive solution is sometimes the best there is (quicksort, for example). But as anyone with a slightly more nuanced understanding of memory allocation knows, the computer's dashboard (if you will) looks a little bit like this:

[Diagram of a program's memory layout, showing the stack and the heap. Credit to David Malan]

Every time a method (function, subroutine, procedure, whatever you fancy) is called, it's placed on top of this so-called "stack". The canonical explanation of a stack goes something along the lines of having a stack (no, literally) of tasks to complete: you look at whatever's on top and begin with that, but when your boss drops by and adds another task, you drop what you're doing and work on that instead. As nerds say, LIFO (last-in, first-out).
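In code, a stack is just pushing and popping from the same end; a quick sketch using a plain Python list as the backing store:

    tasks = []
    tasks.append("write report")     # push
    tasks.append("boss's new ask")   # push: this lands on top
    print(tasks.pop())               # pop: last in, first out
    print(tasks.pop())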

The problem with recursion is that each time a method recurses, another frame is pushed onto the stack. Calling a method costs both memory and time, inconveniences that a boring old loop might not incur. Indeed, runaway recursion must be the inspiration for the famous website Stack Overflow, named after what happens when you max out your pre-allocated stack space.
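A minimal sketch of what that looks like in practice (Python caps recursion depth precisely to protect the stack; the exact limit is implementation-dependent):

    import sys

    def countdown(n):
        # Each call pushes one more frame onto the call stack.
        if n == 0:
            return 0
        return countdown(n - 1)

    print(sys.getrecursionlimit())  # the interpreter's cap on stack depth (often 1000)
    print(countdown(10))            # fine
    # countdown(10**6)              # would blow past the cap and raise RecursionError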

So now let's make the leap to real life. For the most part, in computers, the time required to place a procedure on the call stack is infinitesimal. So recursion is pretty cool here, because you get a rich variety of data structures and algorithms that are elegant and effective.

Now, about ten years ago, before Jimmy Wales invented Wikipedia (or, hell, before Tim Berners-Lee conceived the hyperlink), imagine reading one of those terrible World Book Encyclopedias. I remember my Dad buying a whole, grand set, for god knows how many dollars, when I was seven. I think I might have read two entries. It served better as a decoration, something to make us seem polished and nuanced, than as anything else (much like an expensive globe).

When reading an article in one of these books, when you came across a term you didn’t know, chances are you just kept reading. The cost of finding out what that term means was just too high. You’d have to locate the volume or dictionary in which that entry would be stored, alphabetically navigate to that page, read it, and damnit there’s another term you don’t know… Before you know it, you have the whole bloody encyclopedia strewn on the floor and you haven’t learned a thing. Time for a beer.

So, knowing that it's just not worth your time to chase down every small term, your algorithm might have looked something like this:

    # (notUnderstood, isImportant, findEntryOn, and haveDrink stand in for the reader's own faculties.)
    def readArticle(words):
        for word in words:
            if notUnderstood(word) and isImportant(word):
                # Recurse: go read the encyclopedia entry on this term before continuing.
                readArticle(findEntryOn(word))

    haveDrink()

And if you're the average person, the bar for "importance" will likely be quite high, unless you want to end up reading the whole encyclopedia.

Wikipedia has fundamentally changed how we run this algorithm. We've all but abandoned the importance qualifier. The "cost of calling" a link, if you will, of recursing, is barely anything anymore (except at my house in Chennai, which might still be on dial-up). The old encyclopedias offered us one thing in return for their cumbersome and pretentious existence: we were compelled to read only that which we needed and which was important.

Today, I barely find myself reading whole Wikipedia pages anymore, and it’s a shame. I’m too busy reading an intermediary link because it’s so computationally cheap to do so.

Look, I'm not a cognitive psychologist or a behavioral economist or whatever, but I wonder if this fundamentally changes the way our brains are wired – just as fundamentally as stacks are different from queues. The iPhone and Mountain Lion notification centers draw your attention away from whatever you were doing at a moment's buzz.

Here’s the problem. We’re probably hitting something of the computational equivalent of stack overflow. Our brains don’t have that much memory (I read on the incredibly reliable source of YahooAnswers). Do you ever start researching for a paper, get lost in Wikipedia and then eventually just quit? Yeah, that’s probably stack overflow, or something…

Would we be more efficient if the cost of calling a function was greater, and if so, to what extent?