Having leverage and using it for pushing open source adoption

Featured image: JD Weiher | Unsplash (photo)

Back in late August and early September, I attended the 4th CP2K Tutorial organized by CECAM in Zürich. I had the pleasure of meeting Joost VandeVondele’s Nanoscale Simulations group at ETHZ and working with them on improving CP2K. It was both fun and productive; we overhauled the wiki homepage and introduced an acronyms page, among other things. During a coffee break, there was a discussion on the JPCL viewpoint that speaks against open source quantum chemistry software, which I countered in the previous blog post.

But there is a story from the workshop which somehow remained untold, and I wanted to tell it at some point. One of the attendees, Valérie Vaissier, told me how she used a proprietary quantum chemistry software package during her PhD; if I recall correctly, it was Gaussian. Eventually she decided to learn CP2K and made the switch. She liked CP2K better than the proprietary software package because it is available free of charge, the reported bugs get fixed quicker, and the group of developers behind it is very enthusiastic about their work and open to outsiders who want to join the development.

She is now a postdoc in the Van Voorhis Group at MIT. Interestingly enough, Professor Troy Van Voorhis happens to be one of the scientists behind Q-Chem, a proprietary quantum chemistry software package. I am sure most of us in academia, knowing MIT’s reputation and having utmost respect for the achievements of MIT’s scientists, don’t imagine having an interview for a position at MIT and acting like “I would like to continue using software I choose and this is my condicio sine qua non”. I am also sure that we are even less likely to imagine an interview going like this if the software in question was a direct competitor (in conventional economic terms) to the software the group leader is developing.

Yet, this is precisely how the interview went in Valérie’s situation. Furthermore, standing up in this way for more openness in existing academic institutions is how the world of science moves towards open source software. This is precisely how you get to use your work time to support the things you believe in.

I am sure there are skeptics. They may say: “Yeah, that might be the right thing to do in the ideal world, but you know, it does not work that way in the real world. Just give up trying, you will never pull it off.” Or maybe: “Do you really want to risk your career for promotion of your ideals?” I heard the former too many times to count, and the latter at a Marie Curie fellowship workshop. For me, there was no question about it: “Yes, of course.” Academia should be about freedom and openness first.

Surely, it can work in practice. There is no either/or relationship here; you really can demand both: excellent science in a highly respected institution and science done using open source software. There is, however, something you need in order to be in a position to set the terms: leverage. If you have done your work, you know your stuff, and someone knows what you can do and finds your skills useful, they might be willing to bend on the software choices in your favor. After all, the group leaders want to hire excellent scientists and want them to be motivated.

For me, standing up in this way is open source scientific software activism at its finest, and I hope to see it being done more and more.

What is the price of open-source fear, uncertainty, and doubt?

Featured image: Lili Popper | Unsplash

The Journal of Physical Chemistry Letters, published by the American Chemical Society, recently put out two Viewpoints discussing open source software:

  1. Open Source and Open Data Should Be Standard Practices by J. Daniel Gezelter, and
  2. What Is the Price of Open-Source Software? by Anna I. Krylov, John M. Herbert, Filipp Furche, Martin Head-Gordon, Peter J. Knowles, Roland Lindh, Frederick R. Manby, Peter Pulay, Chris-Kriton Skylaris, and Hans-Joachim Werner.

Viewpoints are not detailed reviews of the topic, but instead present the author’s view on the state of the art of a particular field.

The first of the two articles stands for open source and open data. The article describes the Quantum Chemical Program Exchange (QCPE), which was used in the 1980s and 1990s for the exchange of quantum chemistry codes between researchers and is roughly equivalent to the modern-day GitHub. The second of the two articles questions the open source software development practice, advocating the usage and development of proprietary software. I will dissect and counter some of the key points from the second article below.

Just to be clear: I will not discuss the issues of Open Data and Open Access; they are very important and they deserve a separate post. I will focus solely on the use of free and open source software (FOSS) and proprietary software in computational chemistry research.

Reactions and replies by others

There are reactions to both articles already posted on the Internet. Christoph Jacob replied with a blog post titled How Open are Commercial Scientific Software Packages? Among other things, he says:

To develop, test, and finally use a new idea, it needs to be implemented in software. Usually, this requires using a lot of well-established tools, such as integral codes, basic methods developed many decades ago, and advanced numerical algorithms. All of these are a prerequisite for new developments, but not “interesting” by itself anymore today. Even though all these tools are well-documented in the scientific literature, recreating them would be a major effort that cannot be repeated every time and by every research group – because both time and funding are limited resources, especially for young researches with rather small groups such as myself.

Therefore, method developers in quantum chemistry need some existing program package as a “development platform”. Both open-source and commercial codes can offer such a platform. Open-source codes have the advantage that there is no barrier to access. Anyone can download the source code and start working on a new method.

I fully agree with this idea and the rest of his post, so I will not repeat it here. What is interesting to note, however, is that Christoph is a contributor to ADF, a proprietary quantum chemistry software package.

Another well-put reply was posted by Maximilian Kubillus. This part is particularly good:

A letter about scientific open source software, written by TEN authors that own, work for or are founding members of closed-source software companies, saying that [open source software] could never reach the quality of good closed-source software. It also states that good code review won’t happen in [open source] environments and efficient algorithms can only be developed by professional scientific programmers, using words like cyberinfrastructure without any reference to what they mean here and calling promoters of [open source software] naive without giving a real foundation on why their presented open software models don’t work (in their eyes).

I agree with this post as well and I will not repeat it here.

Dissecting the Viewpoint arguments

All quotations in the following text are copied from the Full Text HTML version of the What Is the Price of Open-Source Software? Viewpoint.

Is open source mostly mandated?

The notion that all scientific software should be open-source and free has been actively promoted in recent years, mostly from the top down via mandates from funding agencies but occasionally from the bottom up, as exemplified by a recent Viewpoint in this journal.

It is true that both funding agencies and individuals promote FOSS. However, the authors did not cite any data from which they could conclude that the promotion happened mostly in a top-down way, and only occasionally in a bottom-up way. In fact, I would argue that the opposite is true.

As someone who has been involved in the open source community since the early 2000s, I am well aware of the efforts that open-source supporters put into getting governments and funding agencies to understand the importance of open source software and mandate it in regulations. As one example of the monumental effort it took to move the public sector to open source software and OpenDocument, see Open source lobbying, a story about writing a national policy of open source in the Netherlands, presented at the 24th CCC in Berlin, 2007.

To summarize: for any top-down mandate of open source to happen, a lot of bottom-up work is required. This work is not trivial; it usually takes a long time and a lot of (mostly volunteer) effort.

Does it really matter who has the software development skills?

To bring new ideas to the production level, with software that is accessible to (and useful for) the broader scientific community, contributions from expert programmers are required. These technical tasks usually cannot—and generally should not—be conducted by graduate students or postdocs, who should instead be focused on science and innovation. To this end, Q-Chem employs four scientific programmers.

The notion that a certain percentage of scientists (graduate students, postdocs) do not possess the technical skills (i.e. software engineering) required for the development of complex codes makes sense; whether this still applies to most of the scientists studying quantum chemistry today can be discussed. Still, the argument does not imply anywhere that the software these “four scientific programmers” are developing should be proprietary or closed source.

Is selling licenses a sustainable model of software development, let alone the only sustainable model?

Sales revenue cannot support the entire development cost of an academic code, but it contributes critically to its sustainability. The cost that the customer pays for a code like Q-Chem reflects this funding model: it is vastly lower than the development cost, particularly for academic customers but also for industry. It primarily reflects the sustainability cost.

Software like Q-Chem earns money by selling licenses and uses this money to fund the programmers who develop it, which is the traditional proprietary software business model. This model is very simple, but has many flaws.

Imagine an academic lab using Q-Chem in their protocols. Suddenly, Q-Chem changes a feature the lab cares about, or the lab’s university buys an HPC system based on an architecture that is not very common on the market and happens to be unsupported by Q-Chem. Or even worse, imagine the company behind Q-Chem disappearing.

Should any of these scenarios occur, the lab in our story is left with a binary which runs on their current computer system. Since the lab has no access to the source code, they are unable to port the code to the new systems. There is only one provider of the service they need (i.e. the improvement of the software they use): the company behind Q-Chem, which owns the intellectual property rights to the Q-Chem code. If the lab then decides to switch to another software package, they are met with the unexpected additional license costs of buying another proprietary product. That is not all, however, as the update of lab protocols and retraining of lab scientists takes time and effort.

The lab from our story is in a vendor lock-in situation: they cannot easily change the vendor providing support for the software they use, because the software is closed source and the property of one and only one vendor. Imagine instead that the software used by the lab was FOSS. Suddenly, the vendor providing support and maintenance changes its conditions so they no longer suit the requirements of the lab, or the vendor goes bankrupt altogether. Another vendor can easily start working on the source code (since it is available to everyone), and this vendor can start providing support to the lab under any mutually agreed contract.

Is the scientific software equivalent to a sophisticated machine in the physical world?

Nevertheless, the software itself is a product, not a scientific finding, more akin to, say, an NMR spectrometer—a sophisticated instrument—than to the spectra produced by that instrument.

This is true. This is why projects such as Open Source Ecology provide blueprints (the source-code equivalent) for industrial machines. The reason why the open source movement succeeded first in the domain of software is the low cost of making copies of the software (both the source code and the binaries). The cost of data transmission and storage became so low with technological advancement that today the only significant cost in producing new software is the development itself. Basically, once developed, the software can be distributed to any number of users at a very low cost.

Is free as in free beer the same as free as in free speech when it comes to software?

There Is No Free Software.

Nice try. There is free software, there is even the Free Software Foundation. The authors should clearly separate the issues of software cost and software freedom, which they did not. The following text demonstrates this clearly.

Are free and open source software companies and customers naïve?

Gezelter acknowledges the cost of maintaining scientific software and suggests alternative models to defray these costs including selling support, consulting, or an interface, all the while making the source code available for free. These suggestions strike us as naïve, something akin to giving away automobiles but charging for the mechanic who services them. Such a model creates a financial incentive to release a less-than-stellar product into the public domain, then charge to make it useful and usable. It is better to release a top-of-the-line product for a nominal fee.

A free and open source quantum chemistry tool can have a graphical user interface (GUI) which would specifically target common lab protocols in, say, materials science. If the vendor makes the GUI proprietary, no functionality of the original software is lost. The GUI just makes the same functionality available in a different way, potentially simpler to use. If you want to use the GUI and save time, you have to pay the license fee. If you want to use the quantum chemistry software without the GUI, you are free to do so, and you can even write your own GUI, and even give it out under a free software license. Your freedom to use the original FOSS tool is preserved.

It is in the interest of the software vendor to make both the software and its GUI as high quality and as easy to maintain as possible, to attract code contributions from the outside. In terms of proprietary software, these contributions are equivalent to getting the development work done for free (i.e. without paying the programmer doing it).

As for this business model being naïve, consider the open source leader, Red Hat, which has 1.5 billion dollars in revenue per year. As for Fortune Global 500 companies, 100% of airlines, telcos, healthcare companies, commercial banks, and U.S. Executive Departments rely on Red Hat. If you look up the names of the companies, you will find out they are anything but naïve.

Finally, the car analogy the authors use is completely flawed. Red Hat did not score the business from NASA, NYSE, or any other organization by giving crappy software for free and then charging for service fees; had they, they would have been easily overtaken by a competitor and out of business by now. Since all of the Red Hat supported software is free and open source, the potential competitor would just take the source code, improve it, and build support around it. Red Hat would find itself in a situation where they have to understand the changes their competitor is making, and the competitor would find that supporting better code would be easier and much cheaper.

To summarize: Red Hat’s offering has to be high-quality, as the company does not have the luxury of owning the source code, which would automatically exclude the competitors from the market.

Can a researcher decide which free software to support without paying the license?

Is “free” software genuinely free of charge to individual researchers? Consider software developed in the U.S. national laboratories. These ventures are supported by full-time scientific programmers employed specifically for the task, and the cost to support and develop these products is subtracted from the pool of research funding available to the rest of the community. The individual researcher pays for these codes, in a sense, with his rejected grant proposals in times of lean funding. In contrast to using one’s own performance metrics to guide software purchases, within this system, one has no choice in what one pays for. In other words, “free software” is not free for you; the only sense in which it is “free” is that you are freed from making a choice about how to spend your research money.

Research funding comes from public money, and the public should be granted full access to the research results it, in a sense, bought. In particular, this access includes access to the source code of the software developed using public funding. By transferring the ownership of the source code to any company we are basically funding private ventures with public money. Furthermore, we are letting the company that gets the ownership of the source code dictate the terms under which the public will access the results it has already paid for.

The interested companies are free to sell support for the software, additional functionality (such as a GUI) designed for the software, or even their development services (say, an implementation or an integration of a particular feature in the open source way), but the part of the software developed using the public money must remain available to everyone under a FOSS license.

Claiming that the individual researcher who did not receive funding for research paid anything is a flawed argument in any sense. However, an individual researcher has a choice of which FOSS he will support: as a group leader, he can assign the implementation of his research requirements in a particular software of his choosing to his students and postdocs, he can do the implementation himself, or he can pay an external contractor to do it for him. Furthermore, he can choose an external contractor among many, which is impossible in the case of proprietary software since, again, the company behind the software has exclusive access to the code.

Is saving time worth losing freedom?

Computational chemistry software must balance the needs of two audiences: users, who gauge their productivity based on the speed, functionality, and user-friendliness of a given program; and developers, who may be more concerned with whether the structure “under the hood” provides an environment that fosters innovation and ease of implementation. As a quantitative example, consider that the cost of supporting a postdoctoral associate (salary plus benefits) is perhaps $4,800/month. If the use of well-supported commercial software can save 2 weeks of a postdoc’s time, then this would justify an expense of ≳$2,000 to purchase a software license. This amount exceeds the cost of an academic license for many computational chemistry programs. Given the choice between a free product and a commercial one, a scientist should make a decision based on her own needs and her own criteria for doing innovative research.

This is a sensible argument. However, it does not address the freedom issue already discussed above. The lab that buys the license to use the software depends on a single vendor to maintain the software for them. Furthermore, the lab is not granted the rights to modify the code to fit their needs and to redistribute their modifications to their colleagues. The issue here is not the price of the license, but the freedom that is taken away from the paying user.
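For what it is worth, the arithmetic in the quoted passage does hold up. Here is a rough check in Python, a minimal sketch assuming an average month of 52/12 weeks (that conversion is my assumption; the Viewpoint does not say how it counted weeks). The variable names are mine as well.

```python
# Figures taken from the quoted Viewpoint passage.
postdoc_cost_per_month = 4800.0  # USD, salary plus benefits
weeks_saved = 2

# Assumption (not from the Viewpoint): an average month has 52/12 weeks.
weeks_per_month = 52 / 12

# Value of the postdoc time saved by using the commercial software.
value_of_time_saved = postdoc_cost_per_month * weeks_saved / weeks_per_month

print(f"Value of {weeks_saved} weeks of postdoc time: ${value_of_time_saved:,.0f}")
```

Under that assumption the value of two weeks comes out slightly above $2,000, in line with the “≳$2,000” in the quote. The number is not what I dispute; the missing freedom is.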

What Is “Open Source”?

The term “open source” is ubiquitous but its meaning is ambiguous. Some codes are “free” but are not open, whereas others make the source code available, albeit without binary executables, so that responsibility for compilation and installation is left to the user. Insofar as the use of commercial quantum chemistry software is a mainstay of modern chemical research and teaching, there exists a broad consensus that the commercial model offers the stability and user support that the community desires.

Wikipedia provides a definition for open source that says: “open source as a development model promotes a universal access via a free license to a product’s design or blueprint, and universal redistribution of that design or blueprint, including subsequent improvements to it by anyone”. A simple, clear-cut definition.

The authors again confuse freeware with FOSS, and then talk about the requirement to compile and install FOSS from source as an issue, which it simply is not. GNU/Linux distributions such as Debian Science and Fedora Scientific provide ready-to-use binaries for end users who prefer to avoid compiling software.

Finally, the data supporting “broad consensus that the commercial model offers the stability and user support that the community desires” is lacking (and the term itself is ambiguous); I would honestly like to see relevant market share data of quantum chemistry tools presented and discussed. For illustrative purposes, let’s take the number of citations as a rough metric for the number of users, and therefore for a consensus on the usage of proprietary vs. FOSS codes. Searching for citations in Web of Science gives 542 articles for the Q-Chem 2.0 paper and 1581 articles for the Advances in methods paper; on the other hand, the CP2K QUICKSTEP paper has 938 citations. Despite CP2K having a lower total number of articles citing the relevant paper than Q-Chem, the number of citations is comparable in the order of magnitude. Since there are other FOSS codes (for example, NWChem and Quantum ESPRESSO) as well as other proprietary codes, this result does not prove much. However, this result questions the “broad consensus” claimed by the authors.
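To make the “comparable in the order of magnitude” statement concrete, here is a minimal Python sketch using only the three citation counts quoted above. Two things are my assumptions: summing the two Q-Chem papers (this double-counts any article citing both, so it is an upper bound), and reading “comparable” as “within a factor of ten”. The dictionary keys are just labels I made up.

```python
import math

# Web of Science citation counts as quoted in this post.
citations = {
    "Q-Chem 2.0 paper": 542,
    "Q-Chem Advances in methods paper": 1581,
    "CP2K QUICKSTEP paper": 938,
}

# Assumption: summing both Q-Chem papers gives an upper bound,
# because an article citing both papers is counted twice.
qchem_total = (
    citations["Q-Chem 2.0 paper"]
    + citations["Q-Chem Advances in methods paper"]
)
cp2k_total = citations["CP2K QUICKSTEP paper"]

ratio = qchem_total / cp2k_total

# Assumption: "comparable in the order of magnitude" means
# the two totals differ by less than a factor of ten.
comparable = abs(math.log10(ratio)) < 1

print(f"Q-Chem (upper bound): {qchem_total}, CP2K: {cp2k_total}")
print(f"ratio: {ratio:.2f}, within a factor of ten: {comparable}")
```

Even with the generous upper bound for Q-Chem, the ratio stays between two and three, which is a long way from the kind of gap a “broad consensus” would suggest.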

Does being open source imply anyone can modify (and break) main source code repository?

Strict coding guidelines can be enforced within a model where source code access is limited to qualified developers, and this kind of stability offers one counterbalance to the “reproducibility crisis”. To the extent that such a crisis exists, it has occurred in spite of the existence of open-source electronic structure codes such as GAMESS, NWChem, and CP2K.

Strict coding guidelines can be enforced in any project, be it FOSS or proprietary software. The ns-3 network simulator and Linux kernel are both good examples of FOSS projects with strict rules on coding style, API usage, and on not breaking the existing functionality.

The “reproducibility crisis” is two separate issues: being able to run the code someone else had run previously, and having the code produce the same result within tolerance despite changes over time. The first issue is actually better solved by open source software since anyone can access the code, and the second one is unrelated to the code being open or proprietary, as described above.

Does a good description of the algorithm make the implementation code unnecessary?

Occasionally the open-source model is touted on the grounds that one can use the source code to learn about the underlying algorithms, but this hardly seems relevant if the methods and algorithms are published in the scientific literature. Source code itself rarely constitutes enjoyable reading, and using source code to learn about an algorithm is a last resort forced by poorly written scientific papers. Better peer review is a more desirable solution.

This is true, and we should also note that having both the source code and its detailed description is an ideal situation: you can study both, you can learn implementation tricks which could easily have been omitted from the description, and you can modify the algorithm without having to reimplement it first.

Is freely available to academics free enough?

A more practical use of openly available source code is to reuse parts of it in other programs, provided that the terms of the software license allow this. Often, they do not. Some ostensibly “open” chemistry codes forbid reuse, or even redistribution.

Here the authors cite ORCA and GAMESS. ORCA and GAMESS are not free and open source software. (They are available to academics free of charge – it might be that this fact was the source of confusion.)

Is a viral license the problem for adoption?

Others, such as CP2K, use the restrictive General Public License that requires any derivative built on the original code to be open-source itself. Variation in design structure from one program to the next also severely hampers transferability, even if the license terms are amenable.

The GNU General Public License (GPL) is a viral license, meaning that any code which reuses GPL code must also be licensed under the GPL. This way, more and more code becomes FOSS over time. The authors are trying to imply that FOSS quantum chemistry tools are a problem for the quantum chemistry software ecosystem due to the GPL. Such an implication is a misunderstanding of how FOSS works. An analogous misunderstanding was presented by Steve Ballmer back in 2001, when he said Linux was “a cancer that attaches itself in an intellectual property sense to everything it touches” due to it being licensed under the GPL. Microsoft’s lost decade followed, and one can argue there is causation, since companies like Apple, Google, and Facebook gladly reused FOSS when they could, and subsequently contributed their improvements back to the FOSS community.

Is it open enough if the software project is open by invite?

To facilitate innovation by developers, source code needs only to be available to people who intend to build upon it. This is commonly accomplished in the framework of “closed-source” software projects by granting academic groups access to the source code for development purposes.

This is too easy to counter. Contributions to a FOSS project can come from anywhere. A student studying a particular variant of an algorithm may want to implement it and contribute it back. A professor may try different algorithms and contribute the best one. With so much FOSS out there, the potential contributor is not going to bother with software source code that is behind an NDA, an email form, or an invitation of some kind. Such a potential contributor is unlikely to give his freedom away and sign his code away for the benefit of a particular private venture.

Will open source destroy proprietary software?

What would the impact be on computational chemistry of destroying other teamware projects such as Molpro, Turbomole, Jaguar, Molcas, PQS, or ONETEP, in the interest of satisfying some “open-source” mandate?

I fail to see how the open source mandate per se destroys any proprietary software. Namely, all these proprietary software projects have the option to open source their code and change their business model to compete based on quality, not code ownership. Alternatively, if they desire to continue to maintain the present development practices, they are still free to find other sources of income.

Is proprietary software more optimized?

[Open source mandate] would, in our view, detract from the merit-based review process. When evaluating grant proposals that involve software development, the questions to be asked should be:
1. What will be the quality of the software in terms of the new science that it enables, either on the applications side or on the development side?
2. How will the software foster productivity? For example, how computationally efficient is it for a given task? How usable will the software be, and how quickly will other scientists be able to learn to use it for their own research?
A rigid, mindless focus on an open-source mantra is a distraction from these more important criteria. It can even be an excuse to ignore them, and creates an uneven playing field in which developers who prefer to work with a commercial platform are put at a disadvantage and potentially forced to adopt less efficient practices.

The quality argument presented in the first point has already been addressed above. The second point does not say much without accompanying benchmarks that would support the idea that, when both are implementing the same method, the proprietary software is computationally more efficient than FOSS. However, there is a report on CP2K performance from Bethune, Reid, and Lazzaro showing the improvement of computational performance over time. These measurements only prove that there is a FOSS project that specifically cares about computational efficiency; they do not say anything in absolute terms.

Does open source force a scientist to open everything straight away?

Open-source requirements potentially force a scientist to choose between pursuing a funding opportunity versus implementing an idea in the quickest, most efficient, and highest-impact way. A strictly open-source environment may furthermore disincentivize young researchers to make new code available right away, lest their ability to publish papers be short-circuited by a more senior researcher with an army of postdocs poised to take advantage of any new code.

Using open source software under GPL version 2 or later allows a researcher to make private changes and never release them to the public. Namely, GPL version 2 only mandates that a release of the software in binary form be accompanied by the release of the matching source code. Therefore, if anything, the scientist has more options for how (and whether) to release the code, not fewer.

As for “army of postdocs jumping on any new code”, I see many more advantages than disadvantages of this particular situation. Namely, since nobody can claim the authorship of anyone else’s code, one can use this heightened interest in the new code to explain to other scientists the research he is doing, and open opportunities for collaboration.

Is orphaned code more common in open source than in proprietary software?

This would contribute directly to the scenario that Gezelter wishes to avoid, namely, one where students leave behind “orphaned” code that will never be incorporated into mainstream, production-level software. Viewed in these terms, an open-source mandate degrades, rather than enhances, cyberinfrastructure.

If the students were developing proprietary instead of open source software, their “orphaned” code would automatically not be available to other researchers for further development. Whether or not any code will be incorporated in production-level software depends on the code quality, its usefulness, and community interest.

Are software freedom and software quality competing features?

How should the impact of software be measured? Scientific publications are a more sound metric than either the price of a product or whether its source code is available in the public domain. Software is meant to serve scientific research, in the same way that any other scientific instrument is intended. As such, the question should not be whether software is free or open source, but rather, what new science can be accomplished with it?

True, scientific publications are one possible way to measure the impact of software. However, open source software is certainly not released in the public domain. As for the question, why not require both quality and freedom from software? Are these two requirements really competing against each other?

Is software freedom political rhetoric?

Let us not allow political rhetoric to dictate how we are to do science. Let different ideas and different models (including open source!) compete freely and flourish, and let the community focus instead on the most important metric of all: what is good for scientific discovery.

The issue of software freedom is both an ethical issue and a practical one, as described above, so it is hardly “political rhetoric”. I would propose instead that we let the community choose the software to use, based on both freedom and quality. While at it, we should stand firm on the requirement that publicly funded scientific software development results in free and open source software. Whether the proprietary vendors will be willing to adapt their business models is ultimately their choice.

Conflict of interest statement

I am a contributor to the CP2K and GROMACS open source molecular dynamics software packages. So far, I have attended two CP2K developer meetings, one remotely and one in person in Zürich. For my contributions in general and for these attendances in particular, I have not received any monetary compensation from ETH Zürich, University of Zürich, or any other party involved in CP2K development.

Joys and pains of interdisciplinary research

Featured image: Leo Rivas-Micoud | Unsplash

In 2012, the University of Rijeka became an NVIDIA GPU Education Center (back then it was called a CUDA Teaching Center). For non-techies: NVIDIA is a company producing graphics processors (GPUs), the computer chips that draw 3D graphics in games and the effects in modern movies. In the last couple of years, NVIDIA and other manufacturers have enabled the use of GPUs for general computations, so one can use them for really fast multiplication of large matrices, finding paths in graphs, and other mathematical operations.

Partnership with NVIDIA

For us to become a GPU Education Center, NVIDIA required that we have at least one recurring course in the curriculum and also hold regular workshops. In return, we got the GPUs to work with. Aside from allowing us to teach, having this hardware gave us an opportunity to initiate research projects using GPU computing. If we are successful in research, we can take the next step and become a GPU Research Center, and hopefully end up being a GPU Center of Excellence at some point. Either of these would give us access to special events, pre-release hardware, special pricing, etc.

Nvidia headquarters on San Tomas Expressway. We might end up visiting this location at some point, pretty awesome ain’t it? (Image source: Wikimedia Commons.)

Roughly a year later, in September 2013, we had the Researchers’ Night in Rijeka. The goal was to get researchers from various disciplines to showcase their work, and potentially find collaborators or options for joint projects. I came there to find scientists interested in applying computation in their research, ideally using GPU computing. I was inspired by Assistant Professor Željko Svedružić’s enthusiasm, and saw the potential for collaboration. A bit later I joined BioSFLab to do research work in computational chemistry in my spare time. At that point I had a PhD thesis to finish and there was little time to do other things.

Dipping toes in computational chemistry

However, computational chemistry seemed worth gambling my spare time on, for a number of reasons. First, I had hardware that would eventually become obsolete, used or unused; second, there were open source computational chemistry packages which I could contribute to; third, I wanted to move the GPU Education Center closer to becoming a GPU Research Center. Very soon Patrik Nikolić and I were in the lab AM to PM, five to six days a week. GROMACS was running day and night, and we were juggling visualizations in VMD, Chimera, Avogadro, and Marvin (occasionally we hated each of these packages). At some point, we also figured out how to do “simple” quantum mechanics calculations in NWChem and CP2K.

Ethene orbitals
Some orbitals of ethene, computed by CP2K, and visualized by VMD.

Regardless of the extra work, the experience was very rewarding for a number of reasons. First, both GROMACS and CP2K are meant to run on Linux. A biochemist might or might not have experience with compiling Linux software and linking it with GPU compute libraries such as NVIDIA CUDA; however, a biochemist does not want to be blocked by taking time to do these things. A computer scientist, on the other hand, is used to working with different operating systems and software. Software, and specifically scientific software, is what you do as a computer scientist. In my particular case, this expertise includes both Linux and CUDA. Suddenly, the research group I was a part of started to iterate very fast, since we did not all have to learn each other’s domains to move forward.

The exchange of knowledge

Second, knowledge flowed both ways. After a couple of months, Patrik was using Linux as his primary OS, and I had no problem reading through Professor Svedružić’s copy of Lehninger Principles of Biochemistry. With each new method (e.g. molecular dynamics or nudged elastic band) we exchanged more knowledge. “Let’s try to plot this using Gnuplot” from my side was met with “why don’t we try a Diels-Alder reaction” from Patrik’s. Eventually, I could assess approximations of forces resulting from different force fields as good or bad, and Patrik benchmarked GROMACS on one or more GPUs to decide how to run it.

GPUs on motherboard
Multiple GPUs on a single motherboard. GROMACS benefits from using two GPUs for calculation instead of one, kudos to the developers for making that possible. We would test with three GPUs, but our most powerful systems have “only” two. (Image source: GBPublic_PR Flickr.)

There are a number of downsides as well. Instead of taking time to expand my horizons, I could have just followed the (un)written rules and taken my time to work on projects that will result in papers strictly in the field of computer science, because these count. I could have explored opportunities to squeeze out more papers by re-exploiting my previous research work. This would have enabled me to avoid learning to use new software or to postpone developing new features in existing software packages. I could have done either, but I did not because I believed and still believe there are many more productive ways to use my time. (Just to be clear: we did create a publication resulting from this work. Namely, a book chapter written by our group will appear in a book by Elsevier in 2016.)

Formal criteria for professorship in support of individual passion and creativity

The present classification of areas of sciences, engineering, biomedicine, biotechnology, social sciences, humanities, and arts in Croatia recognizes interdisciplinary fields of science, but only a handful of them. However, the minimal criteria for professorship in Croatia recognize interdisciplinary papers only in sciences, biotechnology, and humanities. I am well aware it is hard to write precise criteria about a myriad of possible interdisciplinary combinations of different fields. But I am also aware that having such criteria would expand the number of possibilities one has to get a professorship, and in turn motivate more researchers to look into their options.

I might be an idealist, writing all this. I don’t expect to motivate anyone to do the same; people have very different motivations for doing the work they do. Regardless, I have sort of an addiction to epic quotes, so here is one from Ralph Waldo Emerson.

Every revolution was first a thought in one man’s mind; and when the same thought occurs to another man, it is the key to that era.

That is, we are not the only group in Croatia combining life sciences and computer science. I’m very happy to say that Mile Šikić from the University of Zagreb Faculty of Electrical Engineering and Computing, working in the area of computer science, has a number of papers in the field of bioinformatics (look for papers published in Nucleic Acids Research). Do these papers count for professorship? I have no idea, I guess we will find out eventually, but I doubt that getting counts up was the primary motivation for writing those papers.

Open source magic all around the world

Featured image: Anders Jildén | Unsplash

Last week brought us two interesting events related to the open source movement: 2015 Red Hat Summit (June 23-26, Boston, MA) and Skeptics in the pub (June 26, Rijeka, Croatia).

2015 Red Hat Summit

Red Hat provided live streaming of keynotes (kudos to them); Domagoj, Luka and I watched the one from Craig Muzilla where they announced a partnership with Samsung. We made stupid jokes by substituting FICO (a company name) for fićo (a car that is legendary in the Balkans due to its price and popularity). Jim Whitehurst was so inspired he almost did not want to speak, but luckily spoke nonetheless. The interesting part was where he spoke about how the predicted economics of the information revolution is already coming true.

opensource.com cat
Cats are always welcome. Open source cats even more. (Image source: opensource.com Flickr.)

Paul Cormier picked up where Jim Whitehurst left off in terms of showing how predictions come true; his keynote starts with the story of how Microsoft and VMware changed their attitude towards Linux and virtualization. He also presents a study (starting at 1:45) showing that only Linux and Windows remain operating in the datacenter, and also that Windows is falling in market share, while Linux is rising. This is great news; let’s hope it inspires Microsoft to learn from Red Hat how to be more open. Finally, Marco Bill-Peter presents advances in customer support (in a heavy German accent).

Skeptics in the pub

Watching streaming events is cool, but having them in your city is even cooler. I was invited to speak at the local Skeptics in the pub on whether proprietary technologies are dying. Aside from being happy to speak about open source in public, I was also happy to speak about computer science more broadly. Too often people make the mistake of thinking that computer science researchers look for better ways to repair iPhones, clean up printers, and reinstall Windows. Well, in some way we do, but we don’t really care about those particular products and aren’t really focused on how to root the latest generation of Samsung phones, or clean up some nasty virus that is spreading.

That isn’t to say that any of these things should be undervalued, as most of them aren’t trivial. It’s just that our research work approaches technology in a much broader way, and (somewhat counter-intuitively) solves very specific problems. For example, one such broad view would be to look at the historical development of Internet protocols at Berkeley in the 80’s and later; one such specific problem would be the implementation of Multipath TCP on Linux or Android.

As usual, the presentation was recorded, so it will appear on their YouTube channel at some point.

Luna Morado and I
Luna Morado introducing me as a speaker. (Photo by Igor Kadum.)

The short version of the presentation is: computers are not just PCs anymore, but a range of devices. Non-proprietary software vendors recognized this expansion first, so open technologies are leading the way in many new areas (e.g. mobile phones and tablets), and have also taken the lead in some of the more traditional ones (e.g. infrastructure, web browsers). The “moving to the open source” trend is very obvious, at least in software. However, even in software, the proprietary vendors are far from being dead yet. Regardless, there are two aspects of the move towards open source technologies that make me particularly happy.

The first aspect is that, with open technologies taking the lead, we can finally provide everyone with highly sophisticated tools for education. Imagine a kid downloading Eclipse (for free) to learn programming in Java, C, or C++; the kid gets to experience the real-world development environment that developers all around the world use on a daily basis. Imagine if, instead of the fully functional Eclipse integrated development environment, the kid got some kind of stripped-down toy software or a functionally limited demo version. The kid’s potential for learning would be severely restricted by the software limitations. (This is not a new idea; most people who cheer for open source have been using this argument in open source advocacy for many years. I was very happy to hear Thomas Cameron use the same argument in the 2015 Red Hat Summit welcome speech.)

The second aspect is that the LEGO blocks of the software world are starting to emerge in a way, especially considering the container technologies. Again, imagine a kid wanting to run a web, mail, or database server at home. Want to set it up all by yourself from scratch? There are hundreds of guides available. Want to see how it works when it works? Use a container, or a pre-built virtual appliance image, and spin up a server in minutes. Then play with it until it breaks to see how to break it and how to fix it, preferably without starting from scratch. But even when you have to start from scratch, rebuilding a working operating system takes minutes (if not seconds), not tens of minutes or hours.

When I was learning my way around Linux in the 00’s, virtualization was not yet widespread. So, whenever you broke something beyond repair, you had to reinstall your Linux distribution or restore it from a disk/filesystem image of some kind. If you dual-booted Linux and Windows, you could destroy the Windows installation if you did not know what you were doing. Finally, prior to the Ubuntu and subsequent Fedora Live media, installation used to take longer than it does today. And if you think further and consider the geeks who grew up in the 90’s, it’s easy to see that they had even less ease of use in open source software available to them. Yet, both 00’s and 90’s generations of geeks are creating awesome things in the world of the open source software today.

OK, enough talk, where is the code?

I was so inspired by all this that I got down to coding. I made a small contribution to systemd (which made my OCD hurt less by wrapping the output text in the terminal better) and a bit larger one to CP2K (which crossed off an item from the dev:todo list). On the ns-3 front, we have just finished the Google Summer of Code 2015 midterm reviews, with all our students passing. Good times ahead.