Wednesday, August 28, 2013

Becoming a better scientist (reproducibility edition)

If you follow this blog, you'll know that one of the main themes of my research is data- one of the main use cases for it is reproducibility and transparency in science. I've been attending and speaking at quite a few talking about data sharing, reproducibility and . I've even published on these topics.



In this context, I've been thinking about my own process as a scientist and whether I'm "". Indeed, at the conference in March, I stood up at the end and, in front of ~200 people, said that I would change my work practice: we have enough tools to really change how we do science. I knew I could do better.




So this post is about doing just that. In general, my research consists of larger infrastructure projects done in collaboration, and smaller pieces of work developing experimental prototypes and mucking about with new algorithms. For the former, the projects use all the standard software development tools (GitHub, Jira, wikis), so the work gets documented fairly well.



The bit that's not as good as it should be is the smaller-scale work. I think my co-authors and I do an OK job of publishing the and the associated with our publications, although this could be improved. (Too often it's on our own websites.) The major issue is that the methods are probably not as reproducible or transparent as they should be: it's messy for other people to figure out exactly what I was up to when doing something new. It's not in one place, nor is it clearly documented. It also hurts my own process, in that a lot of the mucking about I do gets lost or takes time to find. I see this as a particular problem as I do more web science research, where gathering, cleaning, and reanalyzing data is a critical part of the endeavor.



With that in mind, I've decided to get my act together and follow in the footsteps of the likes of and and do more of my science in a reproducible and open fashion.



To do this, I've decided to adopt as my new note-taking environment. This lets me try different things out and keep track of all the parts of a project together. Additionally, it lets me "": that is, mix commentary with my code, which is pretty cool. and also contains information about how my system is set up, including the versions of the libraries I'm relying on.
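
As a rough illustration of what recording that kind of system information can look like, here is a minimal Python sketch. It assumes Python 3.8 or later; the function name `environment_snapshot` and the example package list are hypothetical, chosen for illustration, and are not part of any particular note-taking tool.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Return a dict describing the interpreter, OS, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical package list; replace with whatever the analysis imports.
    snapshot = environment_snapshot(["numpy", "pandas"])
    print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Saving this output next to the notes for each experiment makes it easier to tell later which library versions a given result depended on.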



There's still a long way to go to pass (see also ), but I think this is a step in the right direction.



To honor this step, I'm giving $100 to to spread the word about how we can make scholarship better.
