jimfleming's comments | Hacker News

From the tree docs:

> tree has originally been part of TensorFlow and is available as tf.nest.

The tf.nest docs can be found here and may be more useful for now: https://www.tensorflow.org/api_docs/python/tf/nest

I'm glad this is being moved out of TensorFlow. It's a really useful library on its own and I've found myself reaching for it on projects without wanting to import all of TensorFlow.


I wonder what the utility of this library is?


> That needs to work before moving to more complexity.

It really depends on what level of abstraction you care to simulate. OpenWorm is working at the physics and cellular level, far below the concept level at which most deep learning research looks to apply neuroscience discoveries, for example. It's likely easier to get the concepts of a functional nematode model working, or a functional model of memory, attention, or consciousness, than a full cellular model of these.

More specifically, a thousand cells sounds small in comparison to a thousand-layer ResNet with millions of functional units, but the mechanics of those cells are significantly more complex than a ReLU unit. Yet simple ReLU units are functionally very useful and can do far more complex things than we can currently simulate with spiking neurons.

The concepts of receptive fields, cortical columns, local inhibition, winner-take-all, functional modules and how they communicate / are organized may all be relevant and applicable learnings from mapping an organism even if we can’t fully simulate every detail.


The trouble is that (assuming sufficient computational power) if we can't simulate it then we don't really understand it. It's one thing to say "that's computationally intractable", but entirely another to say "for some reason our computationally tractable model doesn't work, and we don't know why".

Present day ANNs may well be inspired by biological systems but (as you noted) they're not even remotely similar in practice. The reality is that for a biological system the wiring diagram is just the tip of the iceberg - there's lots of other significant chemical things going on under the hood.

I don't mean to detract from the usefulness of present day ML, just to agree with and elaborate on the original point that was raised (ie that "we have a neural wiring diagram" doesn't actually mean that we have a complete schematic).


I'm aware of that and I've done quite a bit of work on both spiking neural networks and modern deep learning. My point is that those complexities are not required to implement many important functional aspects of the brain: most basically "learning" and more specifically, attention, memory, etc. Consciousness may fall into the list of things we can get functional without all of the incidental complexities that evolution brought along the way. It may also critically depend on complexities like multi-channel chemical receptors but since we don't know we can't say either way.

It's a tired analogy but we can understand quite a lot about flight and even build a plane without first birthing a bird.


> It's a tired analogy but we can understand quite a lot about flight and even build a plane without first birthing a bird.

The problem is we don't know if we're attempting to solve something as "simple" as flight with a rudimentary understanding of airflow and lift, or if we're attempting to achieve stable planetary orbit without fully understanding gravity and with a rudimentary understanding of chemistry.

I think it's still worth trying stuff because it could be closer to the former, and trying more stuff may help us better understand where it is on that spectrum. And if it is closer to the harder end, the stuff we're doing is probably so cheap and easy compared to what needs to be done to get to the end that it's a drop in the bucket relative to the eventual output required, even if it adds nothing.


Your analogy is actually quite apt here - the Wright brothers took inspiration from birds but clearly went with a different model of flight, just as the ANN field has. The fundamental concept of the neuron is the same, but that doesn't mean the complexity is similar.

Minimally, whatever the complexity inside a biological neuron may be, one fundamental property we need to obtain is the connection strengths for the entire connectome, which we don't have. Without that we don't actually know the full connectome even of the simplest organisms, and to my knowledge no one has therefore actually studied the kinds of algorithms that are running in these systems. I would love to be corrected here, of course.


Even with connection strengths I still don't think we would really have the full connectome. Such a model would completely miss many of the phenomena related to chemical synapses, which involve signal transduction pathways, which are _astoundingly_ complex. Those complexities are part of the algorithm being run though!

(Of course we might still learn useful things from such a model, I just want to be clear that it wouldn't in any sense be a complete one.)


This. I simply cannot even begin to go into the sheer magnitude of the number of ways the fundamental state of a neural simulator changes once you understand that nothing exists monotonically. It's all about the loops, and the interplay between them. So much of our conscious experience is shaped by the fact that at any one time billions upon billions of neural circuits are firing along shared pathways; each internal action fundamentally coloring each emergent perception through the timbre it contributes to the perceptual integration of external stimuli.

It isn't enough to flip switches on and off, and to recognize weights, or even to take a fully formed brain network and simulate it. You have to understand how it developed, what it reacts to, how body shapes mind shapes body, and so on and so forth.

What we're doing now with NNs is mistaking them for the key to making an artificial consciousness, when all we're really playing with is the ML version of one of those TI calculators with the paper roll that accountants and bookkeepers use. They are subunits that may compose together to represent crystallized functional units of expert system logic; but they are no closer to a self-guided, self-aware entity than a toaster.


Agreed, though continuously monitoring the propagation of the signals in vivo would allow us to at least start forming models of temporal or context-specific modulation of connection strengths (which in the end is what determines the algorithms of the nervous system, I presume).


It's easy to see if something flies or not. How would you know if your simulation is conscious?


This is, of course, the key problem.

I mean, I know that I'm conscious. Or at least, that's how it occurs for me.

But there's no way to experience another's consciousness. So behavior is all we have, and that's why we have the Turing test. For other people, though, we mostly grant it because they resemble us.


^ This. The AGI crowd constantly abuses the bird/plane parable.


> we can understand quite a lot about flight and even build a plane without first birthing a bird

Or fully understanding fluid dynamics and laminar flow. No doubt that the Wright Brothers didn't fully grok it, at least.


But we understand tons of things without simulating them.


Can you give some examples? I'm guessing there is a difference in definitions of understanding here.

As I interpret GP, the claim is that if you can't describe something in sufficient detail to simulate it, then you don't actually understand it. You may have a higher-order model that generally holds, or holds given some constraints, but that's more of a "what" understanding rather than the higher bar of "why".


I don't think that's what they're saying. We could have the detail and understanding but lack compute.

It seems that they are saying that a simulation is required for proof. We write proofs for things all the time without exhaustively simulating the variants.


I explicitly called out the case where issues arise solely due to lack of compute in my original comment.

I never claimed that a simulation is required for proof, just that an unexpectedly broken (but correctly implemented) simulation demonstrates that the model is flawed.


> (but correctly implemented)

Do you ensure this by simulating it?


No? It honestly seems like you're being intentionally obtuse. The simulation being correctly implemented is an underlying assumption; in the face of failure the implementer is stuck determining the most likely cause.

Take for example cryptographic primitives. We often rely on mathematical proofs of their various properties. Obviously there could be an error in those proofs in which case it is understood that the proof would no longer hold. But we double (and triple, and ...) check, and then we go ahead and use them on the assumption that they're correct.


> Can you give some examples? I'm guessing there is a different in definition of understanding here.

I'm not the previous poster, but how about the Halting Problem? The defining feature is that you can't just simulate it with a Turing machine. Yet the proof is certainly understandable.


If you think you understand something, write a simulation which you expect to work based on that understanding, and it doesn't work - did you really understand it?


Maybe, maybe your simulation is just buggy. I can write a simulator of how my wife would react to the news I'm cheating on her, and fail miserably, but I'm quite positive I understand how she would actually react.


Yes, you have to debug your code. I suspect that the people who implemented OpenWorm are capable of and have done that.


OK, what if the simulation works, did you understand it before?


Not necessarily. A working simulation (for some testable subset of states) doesn't carry any hard and fast logical implications about your understanding.

On the other hand, assuming no errors in implementation then a broken simulation which you had expected to work directly implies that your understanding is flawed.


I recently saw this video of living neurons: https://www.youtube.com/watch?v=2TIK9oXc5Wo (I don't actually know the original source of this)

and just looking at the way they dance around - they're in motion, they're changing their connections, they're changing shape - is so entirely unlike the software idea of a neural network that it makes me really doubt that we're even remotely on the right track with AI research


Amazing video! And these poor neurons are squished between glass, imagine them crawling around in a 3D space.


> It really depends on what level of abstraction you care to simulate.

The article starts out "At the Allen Institute for Brain Science in Seattle, a large-scale effort is underway to understand how the 86 billion neurons in the human brain are connected. The aim is to produce a map of all the connections: the connectome. Scientists at the Institute are now reconstructing one cubic millimeter of a mouse brain, the most complex ever reconstructed."

So the article is about starting with the wiring diagram and working up. My point is that, even where we already have the wiring diagram for a biological neural system, simulating what it does is just barely starting to work.


A good comparison exists with emulators, where transistor-level emulation is ill-advised for most hardware.


This functionality is built into OpenCV[0]. If you're using a reference image (or if you know the lens properties) it doesn't require ML. It's mostly just a matrix transform.

[0] https://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_cal...
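
For example, with a checkerboard reference this is roughly what it looks like in Python (a minimal sketch, not a drop-in solution; the grid size and file paths are placeholders):

    # Estimate the camera matrix and distortion coefficients from checkerboard
    # images, then undistort a photo. Grid size and filenames are placeholders.
    import glob
    import cv2
    import numpy as np

    pattern = (9, 6)  # inner corners of the checkerboard
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

    obj_points, img_points = [], []
    for path in glob.glob("calib_*.png"):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            obj_points.append(objp)
            img_points.append(corners)

    _, camera_matrix, dist_coeffs, _, _ = cv2.calibrateCamera(
        obj_points, img_points, gray.shape[::-1], None, None)

    undistorted = cv2.undistort(cv2.imread("photo.png"), camera_matrix, dist_coeffs)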


Ah yes! It looks like this is related to what I was looking for. It was a long time ago that I saw a demo about this and I was under the impression that ML was used to calculate the correction matrix.


I've noticed that many JAX libraries (including those from Google) seem to adopt an object-oriented style more similar to Torch/Keras rather than JAX's functional style demonstrated in modules like jax.experimental.stax. This is disappointing since stax is quite clean and these libraries seem to use a lot of hacks to make OO work with JAX. Is there an effort to implement and maintain more full-featured functional libraries in the jax/stax style?
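
For reference, the stax style I mean looks roughly like this (a minimal sketch; module paths vary by JAX version, with newer releases moving this to jax.example_libraries.stax):

    # Purely functional layer definitions: serial() returns an (init_fn, apply_fn)
    # pair, and parameters are passed around explicitly rather than stored on objects.
    import jax
    import jax.numpy as jnp
    from jax.experimental import stax

    init_fn, apply_fn = stax.serial(
        stax.Dense(128), stax.Relu,
        stax.Dense(10), stax.LogSoftmax,
    )

    rng = jax.random.PRNGKey(0)
    _, params = init_fn(rng, (-1, 784))

    def loss(params, x, y):
        log_probs = apply_fn(params, x)
        return -jnp.mean(jnp.sum(y * log_probs, axis=-1))

    x = jnp.ones((32, 784))
    y = jnp.eye(10)[jnp.zeros(32, dtype=jnp.int32)]  # one-hot dummy labels
    grads = jax.grad(loss)(params, x, y)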


I've been involved w. jax/stax/trax/flax - I think the real issue w. the stax-like functional form is that it gets unwieldy very quickly when dealing w. more complicated models that are natively general-graphs as opposed to simple sequential pipelines that can be trivially mapped to a combinator expression tree. Of course there are many solutions here, but ultimately if you're building an NN library you need to build something that ML researchers actually want to use daily, and that tends to look closer to hackable pytorch-like DSLs rather than higher-order functional code - which often looks elegant but tends to hurt readability and rework speed.


More anecdata: we consistently outperform lightgbm, xgboost, random forests, linear models, etc. using neural networks even on smaller datasets. This applies whether we implemented the other algorithms ourselves or simply compared to someone else’s results with them. In my experience it really comes down to how many “tricks” you know for each algorithm and how well can you apply and combine these “tricks”. The difference is that neural networks have many more of these tricks and a broader coverage of research detailing the interactions between them.

I call them “tricks” but really they’re just design decisions based on what current research indicates about certain problems. This is largely where the “art” part of neural networks comes from that many people refer to. The search space is simply too big to try everything and hope for the best. Therefore, how a problem is approached and how solutions are narrowed and applied really matter. Even simple things like which optimizer you use, how you leverage learning rate schedules, how the loss function is formulated, how weights are updated, feature engineering (often neglected in neural networks), and architectural priors make a big difference to both sample efficiency and overall performance. Most people, if they’re not just fine-tuning an existing model, simply load up a neural network framework, stack some layers together and throw data at it expecting better results than other approaches. But there’s a huge spectrum from that naive approach to architecting a custom model.
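
To make that concrete, here is a toy sketch of the kinds of knobs I mean for a plain tabular MLP (PyTorch; the architecture, schedule, and loss here are arbitrary placeholders, and `loader` is an assumed DataLoader of (features, targets)):

    # Each of these choices (optimizer, weight decay, LR schedule, normalization,
    # dropout, loss formulation) is a design decision with research behind it.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(64, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.2),
        nn.Linear(256, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.2),
        nn.Linear(256, 1),
    )
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)
    loss_fn = nn.SmoothL1Loss()  # the loss itself is a design decision

    for epoch in range(100):
        for x, y in loader:  # assumed DataLoader of (features, targets)
            opt.zero_grad()
            loss_fn(model(x).squeeze(-1), y).backward()
            opt.step()
        sched.step()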

This is why neural networks are so powerful and why we tend to favor them (though not for every problem). It’s much easier to design a model from the ground up with neural networks than it is for e.g. xgboost, because not only are the components more easily composable thanks to the available frameworks, but there’s also a ton more research on the specific interactions between those components.

That doesn’t mean that every problem is appropriate for neural networks. I completely agree with you that no matter what the problem is, you should never jump to an approach just because it’s popular. Neural networks are a tool, and for many problems you need to be comfortable with every one of those decision points to get the best results; even if you’re comfortable, it can take time, and that isn’t always appropriate for every problem. My other point is that I wouldn’t draw too many conclusions about a particular algorithm being better or worse than another. I’m not saying that was the intention with your comment, but I know many people in the ML industry tend to take a similar position. It really depends on current experience with the applied algorithms, not just experience with ML in general.


This was a really interesting and insightful comment, thanks for sharing. I think the conclusion I shared in my sibling comment was probably a little too broad.

I particularly like this:

> In my experience it really comes down to how many “tricks” you know for each algorithm and how well can you apply and combine these “tricks”. The difference is that neural networks have many more of these tricks and a broader coverage of research detailing the interactions between them.

This is pretty true - the lack of knobs to turn on something like XGBoost or LightGBM makes it both pretty easy to get good results and harder to fine-tune results for your specific problem. Maybe this isn't the most correct way to look at it, but I've always sort of pictured it as a curve where you are plotting effort vs. results, and the one for LightGBM/XGBoost starts out higher but flatter, while the NN one is steeper.

I guess reading your post makes me wonder where the two curves cross? Do you have good intuition for that, or do you feel so comfortable with neural networks that they are sort of your default? I peeked at the company you have listed in your bio, and it looks like you have pretty deep experience with neural networks and work with other people who have been in research roles in that area too, and I wonder how that changes your curve compared to the average ML practitioner? Certainly figuring out how to pick the best layer combinations, optimizer, loss functions, etc benefits hugely from intuition gained over years of experience.


I think your conclusions are accurate. For many problems LightGBM or XGBoost can yield decent results in a short amount of time, and often that’s sufficient. A lot of the work we do is about pushing the results as far as we can take them, and the business case justifies the extra time it can take to get there. For those types of problems, today, we would probably choose a neural network because then we have a lot more knobs, as you mentioned.

Just like the rest of ML, whether neural networks are the right choice still depends on the problem at hand and the team implementing the solution. It definitely impacts where the performance / time curves intersect. If we just need something decent fast, or we’re working with another team that doesn’t have the same background, we tend to focus on approaches with fewer moving pieces. If we need the best possible performance, have a qualified team to get there, and have the time to iterate on development then the curves would favor neural networks.


Do you have any advice on how to increase the performance of NNs? I'd be interested to see some examples of NNs doing better than an LGBM benchmark on medium-sized tabular data. What black magic is needed to achieve this? It would be super valuable for my job :)

This tuning approach gets good results for Lightgbm. I'd recommend using TimeSeriesSplit.

https://www.kaggle.com/nanomathias/bayesian-optimization-of-...

I've seen colleagues do something like this, or random search over NN architecture (number of layers, nodes per layer, learning rate, dropout rate), always falling short of the results this achieves, despite far longer time to code up and tune the model.
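
Roughly the setup I have in mind, for reference (a sketch; X and y are assumed to be numpy arrays sorted by time, and the parameters are placeholders):

    # Cross-validated LightGBM with time-ordered splits so later data never
    # leaks into earlier folds.
    import lightgbm as lgb
    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    params = {"objective": "regression", "learning_rate": 0.05, "num_leaves": 63}
    scores = []
    for train_idx, valid_idx in TimeSeriesSplit(n_splits=5).split(X):
        train_set = lgb.Dataset(X[train_idx], label=y[train_idx])
        booster = lgb.train(params, train_set, num_boost_round=500)
        preds = booster.predict(X[valid_idx])
        scores.append(np.mean((preds - y[valid_idx]) ** 2))
    print("mean CV MSE:", np.mean(scores))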


It’s really very problem-dependent. I allude to a few low-hanging things in my post above: e.g. feature engineering. Just because neural networks have an easier time learning non-linear feature transformations doesn’t mean it’s good to ignore feature engineering entirely.

Possibly more important is to focus on the process for how you derive and apply model changes. You get some model performance and then what? Rather than throwing something else at the model in a “guess-and-check” fashion, be methodical about what you try next. Have analysis and a hypothesis going into each change that you make, why it’s worth spending time on, and why it will help. Back that hypothesis with research, when possible, to save yourself some time verifying something that someone else has already done the legwork on. Then verify the hypothesis with empirical results and further analysis to understand the impact of the change. This sounds obvious (it’s just the scientific method) but in my experience ML practitioners and data scientists tend to forget or were never taught the “science” part. (I’m not accusing you of this; it just tends to be my experience.)

Random search, AutoML, hyperparameter searches, etc. are incredibly inefficient at model development so they’ll rarely land you in a better place unless a lot of upfront work has been put in. For us, they’re useful for two things: analysis and finalization. For analysis the search should be heavily constrained since you’re trying to understand something specific. For finalization of a model before going into production, a search on only the most sensitive parameters identified during development usually yields additional gains.


Any good references on the art part, or is it just an intuition you develop over time? In my experience, all the ML education will teach you a ton of theory and basics but none of the practical details you're referring to.


It takes time and a lot of hands-on experience. Many ML teams tend to work on one or just a few tightly coupled projects for years. By contrast, we’ve worked on a lot of unique projects with real-world constraints, so it gives us a different perspective. An important part has been developing a rigorous process; sort of a framework for applying the “art”. As you mention, this often isn’t covered in ML or data science education, which tends to focus on (important) fundamentals.


> Seems indie frameworks in AI can't survive?

AI frameworks are enormously complex pieces of software—a mixed bag of GPU acceleration, math utilities, low and high-level implementations of state-of-the-art components, and a (hopefully, eventually, but not usually) friendly interface to tie it all together.

While Keras is certainly useful, it was only possible because of the underlying libraries (i.e. Theano, TensorFlow, etc.). However, maintaining useful frameworks on a shifting landscape of underlying libraries often results in mismatches, which lead to weird incompatibilities, performance regressions, numerical instabilities, etc. Even the big players deal with these issues within their own AI frameworks.

Moreover, due to performance and deployment requirements, and the rapid development of the field, these underlying libraries are often tightly coupled or have poorly defined API interfaces for third-parties outside of "model development".

It's not impossible to have well-defined abstractions but "AI" is too new, too varied and too ill-defined itself to have developed those abstractions sufficiently for _large_ indie frameworks to be successful to a broader community. It would be similar to trying to design a high-level framework for "software development". Again, not impossible—Keras did much better than most—but exceedingly difficult for independent developers. Therefore, we're left with only the biggest players.

It's not all bad news, though. There's certainly room for independent development, but outside of hobby projects or thought experiments, I don't think these contributions are effective at the broadest level of scope. The best way for independent developers to contribute and keep their sanity is on implementations of individual components, utilities, or abstractions for specific use cases which don't try to be everything to everyone.


The article's demonstration of a counting model is horribly inaccurate to the point where I'm not sure why it was included. Most people see "AI" as being either good at something or not. There's little nuance such as some models being better than others. This kind of demonstration just weakens the reader's confidence in more fully developed results or different approaches. I'm not sure this brief statement offsets the prominent visuals:

> On the day of the protest, Mr. Yip and the A.I. team used technology that is much more advanced. They spent weeks training their program to improve its accuracy in analyzing crowd imagery.

Setting aside the presentation, from the photos the researchers appear to be using object detection rather than density estimation. This choice is problematic given the quantities involved and the need for temporal consistency.

I'm also skeptical of using human volunteers and surveys to calibrate the model. Humans are terrible at counting large numbers of people in real-time. That's a central point of the article with different groups of people providing wildly different counts.


You can count from a still picture or a video segment and use that to test or calibrate. So it may just be inaccurate reporting to give the impression that the calibration depends on humans surveying live action, which is exactly what they are working to replace. If the researchers publish their results the exact methodology used will tell but I assume they are competent.


To those that are upvoting this and previous Swift + TF announcements: What are you excited about, specifically? Why Swift? Why not Julia? Is it the syntax? Types? Compilation? Performance? Community?

I like Swift and all but our ML/DL/RL/DS tools and libraries are in Python (and occasionally R). Most are missing for Swift without an awkward Python compatibility layer and I don't see a compelling reason to adopt it.


I upvoted it not because I use Swift (or intend to use it) but because I find it interesting that the concept of differentiable programming is being pushed further and further. Having a host language used to write and compile a second language (which is implemented in a third language) just feels restrictive in many ways.

I do think what is being done with Julia, Cassette, Flux and Zygote is more interesting, since it's Julia all the way down (while TensorFlow's backend is still C++) and the compiler work is focused on not being specific to one implementation or technique, but on allowing any such language extensions (such as auto-parallelization and other forms of source-to-source transformations) to be done by any 100% Julia library. So if TensorFlow for Swift (regardless of the actual reasons behind the choice of language) proves that the technique is a significant upgrade over what currently exists, it could spark interest in the competing approaches, and I think Julia can help push the concept even further.


"I like Swift and all but our ML/DL/RL/DS tools and libraries are in Python (and occasionally R"

The Python interoperability allows you to use all Python libraries, but in Swift.


From the perspective of someone who writes Julia, which can also directly call any Python method (and R, Fortran, C and C++), that's a nice stopgap, but you really want a true native ecosystem. Not only is there more mental effort in dealing with two languages at the same time (which might lead people to just use Python in the first place), but the whole purpose of Swift for TensorFlow is having a language with first-class differentiation support, which is pointless when the ecosystem is fragmented across multiple languages.

And there is the risk that the community simply ends up considering that good enough and just makes wrappers (since it takes a lot of work to create something nearly as good from scratch). Thankfully that didn't happen with the Julia community, and the key is probably making the creation of the tools much easier so they can catch up to mature but constantly evolving environments.


As far as I know, Swift is the only one embedding the TensorFlow graph flow directly in its compiler, so those idioms can translate from IR to machine code with parallelism and fine-tuned machine code.

As it's very common to target this sort of code to GPUs, and LLVM can target them as output, I think it's mostly because you can design and shape the language right at its high-level representation and tune for highly performant code at the low level.

The why, I think, is about opportunity: Chris Lattner going to Google, and people from the language and compiler side of the fence being open to baking this right into the compiler when necessary.


I think this article by Jeremy from fast.ai lays out a compelling case:

https://www.fast.ai/2019/03/06/fastai-swift/


”To those that are upvoting this and previous Swift + TF announcements […] Why not Julia?”

You seem to imply that people who upvote both Swift/TF articles won’t upvote Julia ones.

For me personally, I have upvoted earlier Swift/TF articles because they were well-written, and integrating differentiation fairly deeply into the compiler seemed novel to me.

I think I also have upvoted some Julia articles in the past, not because they were about Julia, but because they were interesting and well-written.

”tools and libraries are in Python (and occasionally R). Most are missing for Swift without an awkward Python compatibility layer”

An upvote need not mean “that’s immediately useful for me”. It could also mean “I like Swift, and this looks quite an improvement for it, making it more competitive with the leading technologies for machine learning” (you like Swift, so you _could_ upvote articles like these for that reason, too)


Fastai is also working on S4TF integration in the future. Here's Jeremy Howard's blogpost on WHY Swift?

Edit: I apologise, just after posting this comment I saw this link had already been posted. https://www.fast.ai/2019/03/06/fastai-swift/


You mentioned it but, for those who don’t know:

As a stopgap while all the data science tooling is being built out for Swift you can use anything from the Python ecosystem (including np and pd) by using the Python interop:

let np = Python.import("numpy")

https://github.com/tensorflow/swift/blob/master/docs/PythonI...


How does this work? Does it actually compile/embed the Python interpreter, or does it make calls to the system interpreter somehow?


From the linked document:

> To accomplish this, the Swift script/program simply links the Python interpreter into its code.

I would imagine that having the interpreter in another process would be a gigantic performance hit.


Yeah, I realized it's the obvious thing: load the DLL.


I just tried this out and I'm blown away that it works. Even matplotlib. Does anyone know how to point it at a different interpreter? A venv, for example?


My top four reasons: 1) the eventual MLIR for LLVM will empower new algorithms; 2) Lattner and team have a track record; 3) vast community of Swift application developers; 4) fastai.


I assume it is because you can use it on iOS/OSX.

And people are still doing a lot of Data Science work in R and Scala so I wouldn't say it is at all Python centric.


That's not the reason. Read this link: https://www.fast.ai/2019/03/06/fastai-swift/


This is sort of true. Birds are significantly more energy-efficient than planes for some kinds of flight. Birds are also able to perform maneuvers that planes cannot, such as landing on a branch. For transportation, we just don't care about these features.

I agree that CV does not need to mimic everything about human vision, but we still have a long way to go before having vision as robust and efficient as humans'. Moreover, there are benefits in interpretability and explainability to mimicking human vision more closely. If CV perceived the world similarly, it could fail in more similar and predictable ways.


Hyperparameter optimization (including architectures) is not really meta-learning. Meta-learning, also known as "learning to learn", is more like MAML[0], RL2[1], L2RL[2], etc.

0. https://arxiv.org/abs/1703.03400

1. https://arxiv.org/abs/1611.02779

2. https://arxiv.org/abs/1611.05763
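
To make the distinction concrete, the core of the MAML idea is an inner adaptation step nested inside an outer meta-update, roughly like this (a toy PyTorch sketch; `tasks` yielding (support, query) batches is assumed, and real MAML averages the outer update over a batch of tasks and may take several inner steps):

    # "Learning to learn": optimize an initialization so that one gradient step
    # of adaptation on a new task's support set works well on its query set.
    import torch

    theta = torch.nn.Parameter(torch.zeros(10))   # stand-in for model parameters
    meta_opt = torch.optim.SGD([theta], lr=1e-3)
    inner_lr = 0.1

    def loss_fn(params, batch):
        x, y = batch
        return ((x @ params - y) ** 2).mean()     # toy linear model

    for support, query in tasks:                  # assumed task sampler
        # Inner loop: adapt to the task with one gradient step on the support set.
        grad = torch.autograd.grad(loss_fn(theta, support), theta, create_graph=True)[0]
        adapted = theta - inner_lr * grad
        # Outer loop: update the initialization based on query-set performance.
        meta_opt.zero_grad()
        loss_fn(adapted, query).backward()
        meta_opt.step()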

