DJ Strouse

the rantings of a baby scientist

Four Big Ideas from the Open Science Summit 2010

August 4th, 2010 by djstrouse

Last weekend, half of my RSS, FriendFeed, and Twitter feeds assembled in Berkeley for the first major conference ever devoted to open science** – the Open Science Summit 2010. The talks ranged from invigorating to completely inappropriate, but the real action was not on stage; it was in the hallways. Put a couple hundred hackers, scientists, and open science fanboys in a conference hall in Berkeley, add after-hours pub crawls, and simmer for three days and you’ve got a recipe for disruptive ideas. I’ll outline my favorite four below.

1. The Synergy Between Microfinance and Open Science
At least in the US, the most typical flow of funding for science follows the pattern: taxpayer -> government -> scientists. FundScience, SciFlies, and EurekaFund ask, “Why not cut out the middle man?” Their idea is to enable citizens to fund scientific projects directly. While any one citizen probably can’t afford to fund anything but mathematics (coffee is cheap), the collective donations of many science groupies can easily add up to support more resource-intensive projects.

I really like this idea because it beefs up the incentive for scientists to adopt open science practices. Why? Consider which projects are most likely to be funded by microfinance. If I’m a citizen about to throw several hundred dollars into a scientific project, I want to be able to see the science. A published paper every few months (or year) is not enough. I want to see the process, I want live updates, and I want to feel like my donation is moving science forward. In other words, citizens will be more likely to fund open science projects than traditional proprietary projects.

Microfinance needs open science because it needs a way to attract citizens and get them excited about the ongoing science of a particular lab. Open science needs microfinance in order to create clearer incentives for scientists to share their science.

2. Reproducibility as the Standard for Open Science
Science is supposed to provide a systematic way for us bumbling fools to avoid deceiving ourselves. One way it does so is by enforcing that our theories be based on results that are reproducible. Yet consider the last paper you read. Where was the raw data from which plots were produced? Where was the simulation code? Where were the exact experimental protocols? Could you really reproduce the results of that paper without this information?

Science should not require trust in another’s scientific infallibility. If you publish an interesting new discovery, I should have the opportunity to convince myself of your discovery by reproducing it. Science that is not reproducible is not science; it’s marketing.*

The standard of reproducibility provides an answer to the question: “Just how open should science be?” If we truly mean to do good science and avoid deceiving ourselves, we need to release every bit of data, code, protocol, and communication that would allow a colleague to reliably reproduce our results.

If you’re interested, you can read, listen, or watch more on this idea from computational scientist and policy wonk Victoria Stodden.

3. Come for the Closed, Stay for the Open
There’s a problem with websites whose main benefits come from a large community of users – they’re only useful once plenty of people sign up, and early adopters will be bored in the meantime. Successful websites should be useful to single users or small groups, even if all their friends & colleagues haven’t signed up yet.

For web apps promoting open science, this means that the successful sites will be those that prove useful to individual researchers or research groups, regardless of whether or not their colleagues also use the site. For CoLab (a website enabling online scientific collaboration that Casey Stark and I built and demoed at OSS 2010), this means creating a rich set of tools that is useful for managing the workflow of individual scientists or groups.

Doing so is essential to convincing those that are on the fence about open science to give it a try. The goal is to draw scientists in with slick project management tools for their closed group activities, expose them to the lively discussion and new collaborations being formed over the open projects on the site, and gradually convince them that openness makes science more efficient and fun.

(Thanks to Jason Hoyt at Mendeley for pointing this out.)

4. New Vision for CoLab – Enable Scientific Debate Around Any Piece of Scientific Content
CoLab was inspired by PolyMath, Quantiki, and a few other experiments in open science from the theoretical physics & mathematics communities and was built by a pair of physics and math majors. Not surprisingly, the site is currently optimized for collaborating over projects that focus on discussion and equations. But Casey and I are aiming to make it stupid easy for all scientists to collaborate openly online, not just physicists and mathematicians. After a series of long discussions with Jean-Claude Bradley, Lee Worden, and other experimentalists who want to share more than equations, I think we’ve got a better idea of how to do so.

Our new vision for CoLab is to enable scientific debate around any piece of scientific content. We want to make it stupid easy to center a discussion around protocols, data, plots, published papers, papers in progress, simulations, code, or any other component of scientific research. As an experimentalist, I should be able to import a lab protocol, raw data, or manipulable plots based on a live feed from that raw data and discuss it online with collaborators across the globe. As a computational scientist, I should be able to import code or live simulations and troubleshoot online with anyone in the world who might be able to help. As a member of a journal club, I should be able to import a published paper and collaboratively highlight and annotate in-line with colleagues, from those in the lab next door to those in another country. As a researcher ready to publish, I should be able to host a working version of my paper online, collaboratively edit with any of my colleagues, and submit a link directly to a journal, without being forced to download the paper and make finishing touches offline. In short, as a scientist, I should be able to easily and openly discuss any piece of my science with my entire scientific community.

That’s no small task, but it’s what science needs and what we will continue to build.

*Update (August 4, 2010): After a fruitful discussion with Michael Nielsen (@michael_nielsen) and Seb Paquet (@sebpaquet) on Twitter, I should clarify that certain fields, such as astronomy, have fundamental barriers to reproducibility. As much as they might love to, physicists cannot summon supernovas on command. Thus, in observation-based fields, we should stress that data analysis be reproducible but not necessarily data collection. The key point is that information exchange between researchers should not be a barrier to reproducibility.

**Update (August 7, 2010): As pointed out by Greg Wilson in the comments below and Lisa Green of Creative Commons over lunch today, there have been plenty of open science conferences over the past decade. This sentence should really read: “…first major conference devoted to open science that this baby scientist & web dev noob had ever seen.”

The State of Theory in Neuroscience or: How I Learned to Stop Worrying and Love the Data

July 25th, 2010 by djstrouse

If my initial foray into the territory of the biologists has taught me anything, it’s that theory in neuroscience is a very different game than that of theory in physics. Theoretical physicists are able to temporarily retreat into pure thought and calculation, minimizing communication with experimentalists, and yet still make significant scientific progress. Theoretical neuroscientists, on the other hand, are currently chained to their experimentalist brethren, doomed to empty speculation and crackpot theories if they naively strike off on their own.

Why the Physicists Can
There is a cultural myth that progress in theoretical physics is made by emaciated hermit-geniuses who go off into the woods for months, scribble equations and day-dream in solitude, and return with profound insights into Nature’s inner workings. Although this is an exaggeration, there is some truth to it. Theoretical physicists can go off into the woods and do their work (albeit usually in the company of others) and can make progress while spending a great deal of time in solitude (although most don’t). Though theorists and experimentalists do work closely with one another, theorists are capable of running off without experimentalists for months or years on end and still making scientific progress.

Albert Einstein famously popularized the Gedankenexperiment, the thought experiment meant to elucidate scientific truth based solely on previous knowledge, logic, scientific intuition, and imagination. Though Einstein did not work alone (contrary to popular belief, he had many collaborators), many of his ideas were inspired by thought experiments. For instance, his inspiration for special relativity was based on his mental simulation of chasing a beam of light. More recently, string theorists have argued that, while decades or more ahead of realizable experiments, their approaches to describing the fundamental laws of nature represent a new kind of science that places great confidence in both the ingenuity of the human mind and the beauty and symmetry of Nature (no reference here – this is just what I’ve heard on the physicist circuit).

For our purposes though, the important point is not to what degree theoretical physics can temporarily decouple from experiment; it’s that progress can be made by theorists at all without constantly holding hands with experimentalists.

Why is this possible? What is special about physics that allows this to occur?

First of all, this wasn’t always possible in physics. Go back to the toddler years of science when Galileo, Newton, and friends were paving the way for modern science, ask them whether they consider themselves “theorists” or “experimentalists,” and you are bound to get blank stares. The distinction between theory and experiment wasn’t made until centuries later. Galileo proposed in precise mathematical terms our modern concept of inertia and built pendulums and telescopes. Newton laid down his famous three laws and played with prisms and mirrors. In the early days of physics, theory could not stray far from experiment.

What allowed theory to periodically decouple from experiment was the establishment of a sufficient theoretical framework to begin with.

The Gedankenexperiment requires solidly tested laws from which to launch intuitive explorations. Galileo and Newton had no such thing. They had to stick very close to Nature and experiments because at that time, we knew very little about Nature. Einstein, on the other hand, had a little more going for him. He had Newtonian mechanics and Maxwell’s electromagnetic theory upon which to base his dreams about chasing beams of light. He didn’t have to actually try to chase a beam of light (which would have been a bit difficult) because he had a solid theoretical framework within which to mentally simulate at least parts of the experience. In other words, a solidly tested theoretical framework can allow us to replace many basic physical experiments with thought experiments. Fast forward to today and physicists have established a much richer basis of well-tested laws, an environment that supports entire intellectual castles of theory, well-protected from the toils of experiment.

What I want to emphasize is that without an established base of theory, there can be no decoupling of theory and experiment. The modern state of theoretical physics has spoiled many scientists who tend to borrow their ideas of what theory work should look like in other fields from physics. But physics is quite different from other fields. Physics is very old, often deals with relatively simple phenomena, and has centuries’ worth of solidly tested theories. In other words, borrowing assumptions from modern physics is a grave mistake. For many fields, including neuroscience, it would be far better to borrow ideas about the coupling of theory and experiment from early physics.

Why the Neuroscientists Can’t
Modern neuroscience has very little theory to build upon. Sure, they’ve got the neuron doctrine, that the brain is made of individual cells, but that’s not much more than a special case of a more general law in biology. Beyond that, even the widely accepted notion that spiking neurons are the sole transmitter of information in the brain is a bit shaky. We are quickly gathering plenty of anatomical data, correlations between brain activity and behavior, and other interesting nuggets of phenomena, but broad theories to help us understand this sea of data are either non-existent or highly speculative.

Worse, it’s not even clear whether such theories will exist or what they will look like. While physicists simplified their game by focusing on “fundamental laws”, neuroscientists face the menacing challenge of historical accidents and messy hacks built by millions of years of evolution. Many (including myself) are banking on the existence of some basic laws that govern brain structure and dynamics, but these laws may look very different from the somewhat more “ahistorical” laws of physics.

What the Neuroscientists Can Do
So if you’re a bright-eyed, bushy-tailed, naive young physicist/mathematician who dreams of building theories of the brain, what do you do?

1. Avoid brain theories built by rogue engineers, physicists, and mathematicians who have never met a biologist.

To clarify my earlier comments, it’s not that brain theories don’t yet exist; it’s that good brain theories don’t yet exist. There are plenty of electrical engineers debuting their latest computer architecture models of the brain, computer scientists proposing their shiny new graphical models of learning, mathematicians arguing that synchronized feed-forward neural networks clearly solve the binding problem, and other rogue scientists who base their theories of the brain on their intuitions about how the brain should work rather than data about how the brain does work*. Plato & Descartes may have had to base their theories of how the mind works on pure introspection and phenomenology, but ever since Ramon y Cajal started poking around neural tissue, we’ve had real live data to guide our intuitions. Decades from now, we will look back on the state of theory in modern neuroscience and wonder how so much nonsense was published. No field in science today is more polluted with bunk theories and outrageous publications than neuroscience. If you want to understand actual brains, don’t fill your own with this drivel.

2. Form tight collaborations with experimentalists (and maybe even do a few experiments yourself).

There is room for theory in neuroscience – theory tightly coupled to ongoing experiments. Find people with patch clamps and MRI machines. Understand what they do and how they do it. Propose new experiments and try to help explain the results of old ones. The interesting and achievable projects in theoretical neuroscience of today are not the grand challenges such as explaining the hard problem of consciousness; they are explaining the tiny anomalies in experimental data that make you pause for a moment and scratch your head. Why does the distribution of synaptic strengths in rat visual cortex follow a lognormal distribution? Why does it consistently seem that roughly 90% of neurons are inactive? Why do ferret visual cortex activity rate and variance seem to rise throughout early development, peak, and then decline in maturation? These are the types of questions we need to tackle first before we can explain consciousness, thought, love, and all of that fun stuff. As much as I look forward to the possibility of understanding the brain well enough to support genuine Gedankenexperiment-style theory work, we’re not there yet. Now is the time for theoretical neuroscientists to imitate the physicists of Renaissance Europe, not those of Princeton and Waterloo. In other words, either get your hands dirty with experiments or make friends with someone who does.

Either way, don’t stray far from the data.

*For the sake of not making too many enemies, I’ll avoid references here, but you know how to use Google.

Progress Report: Explorations in Mathematica & Complex Analysis Boot Camp

May 30th, 2010 by djstrouse

My progress reports may have been dormant for a few weeks, but the search for Levinson’s theorem on graphs has not!  (My laptop was stolen last week so my recent time with computers has been necessarily precious and could not be wasted on blog ramblings.  More on this below.)

Explorations in Mathematica
As you may remember, the last we heard from our heroes, they were in search of a version of Levinson’s theorem (relating the number of bound states to the winding of the phase of scattered states) that could be applied to graphs.  Given that neither Andrew nor I were quite sure how this relationship between bound states and the phase shift would play out, we decided to do some experiments.  First, we cooked up equations for the simplest graphs we could think of – a single weighted edge and then (drumroll) a single weighted edge with a self-loop on one vertex.  It quickly became apparent that if we wanted to take a look at meatier graphs, we were going to need some help.  So next, I hacked together a series of programs in Mathematica that could take graphs, plug a tail on some arbitrary node, calculate the transcendental equation whose solutions correspond to the existence of bound states, calculate the phase shift of scattering states, and then plot the transcendental equations alongside the phase shift, all with manipulable graph parameters.  In this way, I could jiggle the weights on the graph, watch bound states come into and go out of existence, and simultaneously monitor the phase shift.  I’ve posted my current Mathematica notebook here, so you can make sense of the above jibba-jabba and do some explorations of your own.  (WARNING: The documentation is limited to my stream of consciousness as I code.  Also, I’ll probably kill the link when I need the server space, so you might have to email me for the updated notebook if the link is dead.)

This was a really fun approach to research that I hadn’t taken before.

  1. Recognize a hand-wavy possible connection between two quantities.
  2. Investigate (by hand) a few simple cases and try to get a flavor for the relationship.
  3. Investigate (by computer) much more interesting cases and try to pinpoint the details of the relationship.
  4. Prove the relationship rigorously.

It was also a welcome opportunity to polish off the increasingly dusty programming portions of my brain and to expand my Mathematica repertoire.  (Mathematica can be an incredibly powerful aid to theory work if you take the time to learn it.  Its visualization tools are especially nifty.)

Bound State Zoology
Those interested in taking a peek at my Mathematica notebook will need a quick intro to bound state zoology.  It turns out that there are at least three distinct species of bound states.

  1. Confined bound states – these guys live only on the graph and have zero amplitude on the tail
  2. (Standard) bound states – these guys “leak” onto the tail; that is, they have an exponentially decaying amplitude on the tail
  3. Half-bound states – these bound/scattering chimera have a constant amplitude on the tail and exhibit some characteristics of bound states and some of scattering states

From our investigations, it looks like the first two types contribute one winding of the phase and the third type contributes (go figure) half a winding.
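
To make the zoology concrete, here is a minimal Python sketch (separate from the Mathematica notebook above).  It rests on assumptions this post does not spell out: the Hamiltonian is taken to be the weighted adjacency matrix, the tail is a semi-infinite path with unit edge weights, and the "graph" is a single vertex carrying a self-loop of weight w.  With the ansatz psi(n) = z^n, the equations on the tail give E = z + 1/z and the equation at the graph vertex gives z = 1/w, so in this toy model a standard bound state appears when |w| > 1 and a half-bound state when |w| = 1.  The function names are hypothetical, and the winding count simply encodes the tentative rule stated above.

```python
# Toy model (an assumption, not the setup from the notebook):
#   vertex 0 is the "graph" and carries a self-loop of weight w,
#   vertices 1, 2, 3, ... form the tail, joined by edges of weight 1,
#   and the Hamiltonian H is the weighted adjacency matrix.


def tail_decay_factor(w):
    """Return z such that psi(n) = z**n solves H psi = E psi (z = 1/w)."""
    if w == 0:
        raise ValueError("without a self-loop the ansatz psi(n) = z**n has no solution")
    return 1.0 / w


def classify(w, tol=1e-12):
    """Classify the ansatz solution by its behaviour on the tail."""
    z = tail_decay_factor(w)
    if abs(abs(z) - 1.0) <= tol:
        return "half-bound"  # constant amplitude on the tail
    if abs(z) < 1.0:
        return "bound"       # exponentially decaying amplitude on the tail
    return "none"            # growing solution, not a bound state


def residual(w, n_sites=50):
    """Largest |(H psi)(n) - E psi(n)| over the first n_sites vertices."""
    z = tail_decay_factor(w)
    energy = z + 1.0 / z
    psi = [z ** n for n in range(n_sites + 1)]
    # Vertex 0: self-loop term plus the one neighbour on the tail.
    worst = abs(w * psi[0] + psi[1] - energy * psi[0])
    # Tail vertices: two neighbours each.
    for n in range(1, n_sites):
        worst = max(worst, abs(psi[n - 1] + psi[n + 1] - energy * psi[n]))
    return worst


def expected_windings(confined, bound, half_bound):
    """Tentative rule: one winding per confined or standard bound state,
    half a winding per half-bound state."""
    return confined + bound + 0.5 * half_bound
```

Under these assumptions, classify(2.0) returns "bound", classify(1.0) returns "half-bound", and classify(0.5) returns "none", which mimics a bound state coming into existence as the self-loop weight is turned up.  The residual function is only a sanity check that the ansatz really satisfies the eigenvalue equation.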

An Unexpected Hiatus or “The First Crime in Waterloo Since Wellington Spanked Napoleon”
Mid-mathematical adventure, we hit a snag.  Given the abundance of Mennonites, smiles, and unlocked doors in Waterloo, I simply assumed that Canada was crime-free and kept my backpack in an unlocked locker while working out at the university gym.  Little did I realize just how far the local citizenry would go to snag copies of my Mathematica notebooks and, alas, my laptop and wallet were stolen.  Unfortunately, I had not made backups of my latest research.  Every setback is of course an opportunity to learn and improve and this was no exception.

The lesson: sync your research to multiple computers and ideally a trustworthy server (Dropbox is my tool of choice for this).

The opportunity to improve: as any programmer knows, code you wrote a week ago always looks painfully clunky.  In rebuilding my Mathematica work from scratch, I was able to integrate plenty of tricks and lessons I’d learned along the way.  The result – a way sexier notebook.

Complex Analysis Boot Camp
In the last week, Andrew and I converged on the general flavor of the version of Levinson’s theorem that we think we can prove on graphs.  Thus, away goes the computer and out come the physicist’s favorite tools – pen and paper.  In our first stabs at proving the theorem, Andrew and I took completely different approaches.  My caveman approach was to try to adapt a simple trick involving the spectral resolution of the identity that Marcel Wellner used to prove a version of Levinson’s theorem for continuous potentials back in the 1960s.  Andrew’s far more sophisticated and elegant approach was to apply the topological reasoning of complex analysis to our problem.  Since my caveman club began to look a little too primitive mid-way through my proof attempt, I decided to take this opportunity to learn a little complex analysis, one of the (many) major holes in my undergrad math education.

And sweet Feynman has it been a fun last couple days!  I picked up Tristan Needham’s Visual Complex Analysis from the UW library and this book has reminded me why I fell in love with math as a wee lad.  The book’s pedagogical approach is to teach math the way mathematicians actually think about it – visually.  Needham’s book is chock full of nifty pictures of Riemann spheres, conformal mappings, branches, and more.  (I’ll post a review sometime for those interested.)  Complex analysis is an absolutely beautiful subject when couched in geometric terms and is accessible to anyone with a bit of calculus under their belt.

This was also a change of approach to learning for me.  Usually, I get generally interested in a subject, pick a textbook, and read it cover to cover, doing the problems as I go.  This time, however, my explorations in complex analysis were motivated by a very specific application, so I dove right into the middle of the book to extract the particular pieces I needed.  Two great results:

  1. I’m homing in on a proof of our version of Levinson’s theorem using tools from complex analysis.
  2. I became so enamored with complex analysis and Needham’s book in particular that I’ve spent the whole weekend hopping from chapter to chapter learning all sorts of interesting things about the topological properties of complex functions.

I’ve noticed that I seem to learn much more quickly with an application in mind like this.  Further investigations are needed, but this might hint at a more efficient and productive approach to learning for me – first, find a particular problem that requires the tools you’re interested in and then, go off and learn them.  This is fun because every new topic I stumble upon in Needham’s book gives me new insights into the problem I’m working on.

I still want to take a crack at a proof using my original approach as well, as I might be able to get it to work with a little more insight from my adventures in complex analysis.  It would be really cool to prove our theorem using two entirely different approaches, each of which gives a unique insight into the problem.
