CMU Researchers Break Speed Barrier In Solving Important Class of Linear Systems

drats · on Oct 22, 2010

As much as the cult of genius is often overwhelming and contributing people have been written out of histories of discovery I still feel uneasy when I see a Prof and a PhD student listed as discovering something as you never know what the Prof contributed with the current state of the academy. It can always be a collaboration yes, so I am not passing judgment on this case, but there seem to be many cases where it is not a collaboration but a payment of dues to seniority. Then again, at my faculty in another discipline we have so little support or contact with the Profs it's embarrassing: so I suppose having any kind of working relationship as peers would be better than what goes on in some places.

ced · on Oct 22, 2010

On the other hand, I did some research assistant work for my prof one summer, and he put my name on the paper the came out of it (as second author). But I didn't even understand half of what the paper said! I was a first year undergrad, and he was presumably hoping to help me "build a resume" and "get my name out there". It's an awesome gesture, and I think it happens a lot.

Furthermore, "Einstein and Noname grad student's method for solving X" sounds a lot better than "Noname's method for X". People will read it more and it will disseminate faster. The student will ultimately be better off.

hugh3 · on Oct 22, 2010

I still feel uneasy when I see a Prof and a PhD student listed as discovering something as you never know what the Prof contributed with the current state of the academy

On the other hand, quite frequently the whole thing is the professor's idea, and the grad student does the drudge work of coding it up and verifying it and writing the paper, and they co-publish.

Of course if you're going to lament "the current state of the academy" you should be aware that the current state of the academy is far more likely than any previous state to give credit to the junior researchers. And in Europe this kind of thing still goes on, with professors butting in to take primary authorship on papers where their students have done 90% of the work.

Qz · on Oct 22, 2010

As a recent CMU Alum, I can say that in this case you needn't be suspect.

gojomo · on Oct 22, 2010

Do you mean, "you need not just suspect, I can confirm that in my experience at CMU the professor is over-credited", or "you need not be suspect, in my experience at CMU the credit is properly allocated"?

jgershen · on Oct 23, 2010

I interpreted Qz's comment as "this professor is deserving of the allocated credit." As a recent CMU CS grad (and student of Professor Miller) I would agree and say that this is also generally the case.

gojomo · on Oct 23, 2010

I was leaning towards that positive interpretation, given CMU's positive reputation.

But given the thread root, and not knowing any more about Qz -- indeed being unable to trivially associate that handle with a real name/project via the profile page -- a cynical comment from a disgruntled grad couldn't be ruled out.

Qz · on Oct 23, 2010

Apologies for the lack of clarity -- my original wording was possibly more ambiguous than the one I posted and I thought I had fixed that, but I guess I failed.

d4ft · on Oct 22, 2010

The article mentions a few fairly vague examples of what this type of algorithm is used for. To all of you non-neophytes, what are some specific implementations of this algorithm and where are they used?

dmlorenzetti · on Oct 22, 2010

If you write the equations describing the flows in a mass-conserving system, you can end up with a symmetric diagonally-dominant system.

As you might expect, there's a physical interpretation.

Diagonal dominance basically means that the diagonal entry of each row of the matrix is at least as big (in magnitude) as the sum of the off-diagonal entries.

Suppose the diagonal entry gives the rate of change of flow (say of a fluid, or of electric charge) into some physical location as you change some other property of that location (like its pressure, or voltage). Then the off-diagonals in the same column reflect the rates of change of flows to other locations as you change its pressure. And the off-diagonals on the same row reflect the rates of change of flows out of that location as you change the pressures in other locations.

If the flows are driven by pressure or voltage differences-- which is often the case, especially when you've linearized the system mathematically-- you get symmetry (because adding 1 volt to node A has the same effect on the A-to-B flow as subtracting 1 volt from node B).

If the system conserves mass (or electric charge or whatever), then the diagonals must at least add up to off-diagonals (in magnitude).

So then all you need is a connection to an outside node of known pressure or voltage (like ground), that doesn't change and hence doesn't contribute to the off-diagonals. That kicks one node over into having a diagonal entry greater than its off-diagonals, and then you have a nonsingular matrix and you can apply the algorithm.

Disclaimer-- it's been a while since I worked on this, so I probably messed up at least one of the directions or row-wise vs. column-wise relationships.

mturmon · on Oct 23, 2010

And it's also worth saying that diagonal dominance is a stronger property than positive definiteness. That is, something can easily be PD but not DD.

gaius · on Oct 22, 2010

Let's say you're manufacturing a product with thousands of individual components, each of which has a different cost, a different lead time, a different shelf life, and so on. These thousands of components are made into hundreds of sub assemblies, then tens, then are finally assembled and you have your product. You can crunch all of that and answer questions such as, in what order should we do this? How much of each component should we keep in stock? How does the rate at which we work affect the final cost? Would it actually be cheaper per unit to open another assembly line? And so on.

binomial · on Oct 22, 2010

Please elaborate. Solving linear equations with software is nothing new, this algorithm just does it faster. So I'd like to see prior art for what you're describing here, sounds interesting. How exactly would one apply linear algebra to solving the first and third questions for example?

jpren · on Oct 23, 2010

Back in college a few years ago, we mostly used Simplex and other tableau-pivot based algorithms for solving large linear programming problems. Link to wikipedia article on the Simplex algorithm http://en.wikipedia.org/wiki/Simplex_algorithm

For most problems that individuals face each day, linear programs are fairly simple to model and solve. However, there are lots of complex problems that are solved each day.

A few examples of large-scale linear programming problems: - For Amazon.com: what quantity of each item should be stocked at each warehouse each day to minimize inventory while also optimizing for shipping time and cost to demand nodes (customers). - For an airline: how to plan and schedule flights to all domestic and international airports to maximize profit - In shipping logistics: how to allocate trucks and set routes to minimize fuel cost while satisfying delivery time.

For complex systems, you can easily run up a linear program with millions of independent variables (producing millions of rows in the linear system).

lmkg · on Oct 23, 2010

Solving a linear system of equations is not the same as solving a linear program. The article is about the former, not the latter. Linear programming is an optimization problem, subject to constraints that are described by inequalities. Linear systems of equations are finding values of relationships defined by equalities. There are lots of matrix multiplications involved in both, but other than that the solutions don't really resemble each other. For example, Linear Programming is not even known to be solvable in strongly polynomial time (simplex is exponential in the worst case, and the best method is weakly polynomial), while the algorithm in the paper improves from one strongly polynomial result ot another.

jmckib · on Oct 22, 2010

Link to the actual paper, for the mathematically inclined: http://www.cs.cmu.edu/~glmiller/Publications/Papers/KoutisAp...

jedbrown · on Oct 23, 2010

Funny that they don't mention multigrid which is O(n) and very reliable for M-matrices (more general than SDD, which is a rarely used term in the multigrid literature). It's hard to take this seriously if they don't demonstrate that the constants are good enough to beat the various multigrids for this problem. Also note that most physical problems which lack optimal sparsity are also not M-matrices, and indeed lack H-ellipticity (necessary and sufficient condition for the existence of a pointwise smoother), even though they are often still SPD.

wtallis · on Oct 24, 2010

Wouldn't this method at least have the advantage of being a drop-in replacement, where multigrid methods require a bit more work to integrate in to your solver?

jedbrown · on Oct 24, 2010

Algebraic multigrid is usually called in the same way as a direct solver. For example, there are three distributed-memory parallel algebraic multigrid packages (and a bunch of other methods, direct and iterative) that may be used as runtime options with PETSc's same Solve call.

jasondavies · on Oct 23, 2010

> The current theoretically best max flow algorithm uses, at its core, an SDD solver.

Does anyone know how this improvement in solving SDD affects the complexity of the current best max-flow algorithm [1] i.e. (N+L)^(4/3) ?

[1]: http://web.mit.edu/newsoffice/2010/max-flow-speedup-0927.htm...

fbcocq · on Oct 22, 2010

I'm terribly sorry if this is an ignorant question but does this somehow apply to manually solving a, say, 5x5?

dmlorenzetti · on Oct 22, 2010

Probably not worth the effort on such a small system.

They specifically target large systems, because their approach (I think) iteratively approximates the matrix, then uses the solution of each simpler version as the initial guess to the solution of the next closest one to the original.

Presumably all that overhead pays off in the long run, but not on small systems.

joshuasmyth · on Oct 22, 2010

In the article it mentions the runtime is s[log(s)]2 - Does this mean slog(s)log(s) or 2slog(s) ?

jessriedel · on Oct 22, 2010

It's the former, and it looks fine to me: http://i.imgur.com/g38Ix.jpg

andyv · on Oct 22, 2010

I had to chuckle about this too. Journalism and mathematics just don't work together at all...