Late in June, a MATLAB Discussions post by Mike Croucher caught my eye and my interest. It was called, “Why should you share code?” I wanted to reply then, but I was getting ready for a long trip, and so I had to wait until now.
Mike’s posts
This all started with Mike’s blog post, “Do these 3 things to increase the reach of your open source MATLAB toolbox.” In a LinkedIn discussion that followed, someone asked Mike this question:
“Could you elaborate on why someone might consider opening/sharing their code? Thinking of early-career researchers, what might be in it for them?”
Mike used his MATLAB Discussions post to share this question with other MATLAB users, give his own answers, and invite others to share their thoughts.
Mike’s answer was insightful and on-point, which was no surprise given Mike’s many years of experience helping scientific researchers with software engineering issues. Mike made these points in favor of code sharing (paraphrased and condensed by me):
- A published research paper is incomplete without the vital details contained in the code.
- Providing code facilitates the use of your research by other researchers, leading to citations and even collaboration.
- Sharing code teaches good software engineering practices, which “makes it more likely that your software will give the right answers.”
Mike elaborated on these points and provided examples for some of them. In response, several MATLAB Discussions readers provided their own thoughtful perspectives. The thread fascinated and resonated with me.
My thoughts
I’d like to elaborate on a couple of Mike’s points, based on my 30 years’ experience as a MathWorks software developer, as well as my prior experience in engineering research in the area of image processing.
First, it has long been my experience that a research paper based on a significant software component is unlikely to be reliably reproducible unless the code is provided. Specifically, two independent researchers who attempt to reproduce the paper’s results, using only the information in the published paper, are unlikely to end up seeing exactly the same results. And sometimes those differences can be significant.
Here’s an example from Image Processing Toolbox history. In the late 1990s and early 2000s, others on the development team and I found that all the publicly available implementations of the famous Canny edge detector produced significantly different results from each other. With further investigation, we concluded that the variations were caused by some vague wording and missing details in the 1986 IEEE TrPAMI paper that everyone was citing. We eventually worked out exactly what the toolbox implementation should do, but it took a lot of time, and we had to consult Canny’s master’s thesis for clarification about certain aspects of the TrPAMI paper.
I saw this sort of thing happen so often with published research that I eventually added something about it to my interviewing questions when hiring someone at the Ph.D. level.
Second, successfully sharing your code for use by others greatly improves the likelihood that you will be able to reproduce your own work, should it become necessary. Inexperienced software developers, and sometimes even experienced ones, tend to:
- Discount the need to reproduce one’s own work, sometimes years later
- Overlook or forget about computer and OS dependencies, or data dependencies, or certain pieces of code, that are necessary for complete and accurate reproduction
I was lucky to learn this lesson early in my career, while I was still in graduate school. When I was near the end of writing my doctoral thesis, I found that I needed to modify and rerun some of my experiments from a couple of years earlier. I managed it, but it was unexpectedly challenging, and I was a bit lucky. This is one of the reasons that I started using software version control so early, several years before software development became my career.
Thanks, Mike, for prompting this discussion.
