Static compilation

Is there a possibility of generating statically compiled programs with gcc or gfortran, to guarantee the portability of the programs to any Linux distribution?


Welcome, @leoalvar!

The answer is, yes… but also no. To a certain extent, it’s possible — but there are some pretty big conditions and caveats. It’s both more difficult and less effective, in practice, than it might seem at first glance. Rarely does it turn out to be what you really want to do.

In order to have a fully static binary, you not only have to compile the program statically, but you need to have compiled all of its dependencies statically. And all of their dependencies. And all of their dependencies’ dependencies. And so on, and so on…

And once you’ve done that, you’ll have a giant binary that will load all of those dependencies into memory all at once when you run it, instead of (as is the case with shared libraries) only loading what’s required.
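(For the record, here is roughly what attempting that looks like with gfortran: a minimal sketch, not a recommendation, and it will only actually link if static versions of libgfortran, libquadmath, and glibc are installed, e.g. the glibc-static package and friends on Fedora. myprog.f90 stands in for your own source file.)

    # Ask the linker to resolve everything statically, then inspect the result.
    gfortran -O2 -static myprog.f90 -o myprog

    file myprog    # should report "statically linked" if it worked
    ldd  myprog    # typically prints "not a dynamic executable"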

Plus, there are limits to how far you can take it:

  • If your application is a graphical program, it’s possible (though not especially recommended by most of these library projects) to statically link it with libjpeg, libpng, zlib, the ffmpeg libraries, etc. Presumably Gtk+ too, though I’m not sure.
  • It’s even possible to create static builds of Qt. (Although that’s really targeted almost exclusively towards embedded/automotive applications, and not something anyone should be considering for a desktop Linux application.)
  • But you can’t bring, say, your own video card drivers, because if the ones linked into your binary don’t match the version running on the system, the hardware will reject any attempt to access it.
  • And you should NEVER statically link in security libraries like libcrypto or OpenSSL. Because when the next Heartbleed is discovered and everyone is scrambling to update their libraries with the patched version that mitigates that vulnerability, your statically-linked binary with the old version lurking inside suddenly becomes an attack vector, and a threat to any system it’s running on.

The typical statically-linked binary is a small, self-contained command line tool or service worker that has a very focused, limited job, and benefits from (or requires) not having any dependencies. I can give you one obvious example, the one binary that’s guaranteed to be statically linked on every Linux system out there: ldconfig. The tool that manages the system’s shared libraries CANNOT have any shared-library dependencies, because it’s the tool you use to fix things when you’ve accidentally deleted a crucial symlink in /usr/lib and nothing that depends on libc.so will run! :laughing:
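(That’s easy to check, by the way. The exact path and wording vary between distributions and glibc versions, and on Debian-family systems the real binary may sit behind a small wrapper script at /sbin/ldconfig.real, but something along these lines:)

    file /sbin/ldconfig    # usually "statically linked" (or "static-pie linked" on newer glibc)
    ldd  /sbin/ldconfig    # "not a dynamic executable"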

For more traditional, graphical desktop applications, rather than statically linked binaries, the trend in system- and distro-agnostic application distribution these days is to still use shared libraries, but to package all of the libraries your application needs right in with it, so that it’s a self-contained bundle of binaries and shared-library dependencies.
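(If you’re curious what such a bundle would have to contain for one of your own programs, ldd lists the shared libraries a binary resolves at load time; ./myprog below is just a placeholder.)

    # Every line of this output is a dependency that either has to be bundled
    # with the application or assumed to exist on the host system.
    ldd ./myprog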

  • The AppImage format is one such packaging standard (one of the earliest). It bundles applications and their shared-library dependencies into a compressed filesystem archive that gets mounted under /tmp/ on launch. The binary loads its shared libraries from that filesystem by default, although it still has full access to the system libraries as well, for the things that can’t be packaged (libc, OpenSSL, the Nvidia, AMD, or Intel display drivers, etc.).

    AppImage… works, for the most part. It’s not quite the panacea its designers present it as, for the simple reason that the notion of a “baseline Linux system” (the set of libraries that can be assumed to be available on the host and shouldn’t be packaged into the AppImage) isn’t nearly as realistic as they assume. In theory, the advice “Build on the oldest system you support, so that your application has the lowest possible requirements, which any newer system should be able to satisfy” sounds good. But in practice, Linux systems offer neither forwards nor backwards compatibility beyond a year or two’s time, at most. If you build on an Ubuntu 18.04 machine and package that binary into an AppImage, it’ll run on any Ubuntu 18.04 system you take it to, sure. It’ll also likely run on any CentOS/RHEL 6 system, or Fedora 28 system.

    But try to run it on a Debian 11, Ubuntu 21.10, or Fedora 35 system (all current releases or due out before the end of this year), and chances are you’ll discover that the libraries installed there are no longer backwards-compatible with a binary that has three-year-old dependencies baked into it.

    And of course, if you build on Ubuntu 21.10 or Fedora 35, then you won’t be able to run your package on any older version of any release, because you’ll have libraries packaged with it that require brand-new APIs that none of those previous releases support. It becomes a balancing act, and not a particularly thrilling one to perform or observe. (The quick check after this list shows one way to see that version coupling directly.)

  • Both Snap and Flatpak are competing attempts to change that situation, by using containerized sandbox environments to run packaged applications in isolation, cordoned off from the host system and its dependencies. Applications packaged in snaps or flatpaks are provided with resources managed by the container system and accessed through explicitly defined APIs; they don’t directly interface with the host system. That way, the ambiguities regarding what’s available or supported on the host system are minimized: the runtime environment provided by the container system normalizes all of those disparate host resources. Security is also greatly increased, since each application by default even has to ask permission to access local resources like the host filesystem, any hardware devices, or the network.
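Coming back to that compatibility balancing act under AppImage: one concrete way to see the version coupling is to list the versioned glibc symbols a binary requires. This is only a rough diagnostic, and ./myprog is again a placeholder for one of your own binaries, but if the highest GLIBC_x.y version it references is newer than the glibc on the target system, the dynamic linker there will refuse to load it.

    # Show the newest glibc symbol versions the binary was built against.
    objdump -T ./myprog | grep -o 'GLIBC_[0-9.]*' | sort -Vu | tail -n 3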


Dear colleague:
Thank you very much for your answer. My problem seems to be the simplest case. I have written a lot of specific programs for my specialty (seismology) in FORTRAN (gfortran); they run on the command line and don’t make any graphical calls. All the graphics I do with gnuplot (and maps with GMT), while my programs create the input files for them. I would like to run those programs on any Linux machine without recompiling them each time.
Thanks again
Leonardo

That won’t work without a VM. A native binary has to be recompiled for every architecture. You can see an example of how different architectures are incompatible in exactly the way you would like them to be compatible:

https://www.debian.org/mirror/list

When you scroll down, you can see which architectures each mirror does or does not carry. That’s because you can’t compile one binary for every platform, unless your code is compiled in a way that allows it to be run in a VM.
The most famous example of a language built on exactly that premise is Java.

Write once, run anywhere.
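(For natively compiled code you can see that architecture dependence directly; ./myprog below is just a placeholder for one of your binaries.)

    uname -m        # the host's architecture, e.g. x86_64 or aarch64
    file ./myprog   # e.g. "ELF 64-bit LSB executable, x86-64, ..." and it will
                    # only run natively on hosts of that same architecture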

Alternatively, you could run them in Docker, which would make them more or less architecture independent.

You could also set up a CI/CD pipeline where everything gets compiled for the various CPU architectures automatically, for example on every pushed commit (the sketch below shows the kind of cross-compilation such a job would run).
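As a rough local stand-in for what such a CI job would do, assuming an x86_64 build host with the Debian/Ubuntu cross-toolchain packages gfortran-aarch64-linux-gnu and gfortran-arm-linux-gnueabihf installed (and with myprog.f90 again standing in for your own source):

    # Build the same source for several CPU architectures with cross compilers.
    gfortran                     -O2 myprog.f90 -o myprog-x86_64
    aarch64-linux-gnu-gfortran   -O2 myprog.f90 -o myprog-aarch64
    arm-linux-gnueabihf-gfortran -O2 myprog.f90 -o myprog-armhf

    file myprog-*   # each binary reports a different ELF target architecture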

All this is doable and I would recommend choosing one of these paths instead of chasing something that simply isn’t possible in the way described.


Dear colleague:
Thank you very much for your explanation.
With my best regards
Leonardo

Docker is definitely a promising avenue to explore, for scientific computations in particular. You could create a container image that includes your compiled program and any shared libraries it uses, and then it’ll be runnable anywhere you can download that container image.

Researchers running scientific computations are one of the major audiences for container-based computing, because it’s also a great way to scale up across multiple heterogeneous compute servers. You just build your Docker image, upload it to a registry (either Docker Hub or, if the code is sensitive/proprietary, a private one), and then pretty much anywhere you can connect to that registry, it’s just a couple of commands to pull the image and start a container running your program.
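Something along these lines, as a rough sketch rather than a finished recipe. The file name myprog.f90, the image tag, the Ubuntu base image, and its libgfortran5 runtime package are all placeholders and assumptions you’d adapt to your own code. First, a two-stage Dockerfile that compiles with gfortran and then copies only the binary and its Fortran runtime library into a smaller image:

    FROM ubuntu:22.04 AS build
    RUN apt-get update && apt-get install -y --no-install-recommends gfortran
    COPY myprog.f90 /src/
    RUN gfortran -O2 /src/myprog.f90 -o /usr/local/bin/myprog

    FROM ubuntu:22.04
    RUN apt-get update && apt-get install -y --no-install-recommends libgfortran5 \
     && rm -rf /var/lib/apt/lists/*
    COPY --from=build /usr/local/bin/myprog /usr/local/bin/myprog
    ENTRYPOINT ["/usr/local/bin/myprog"]

Building and running it are then one command each (and docker tag plus docker push are all it takes to publish the image to a registry):

    docker build -t myprog:latest .
    # Run against the current directory, so the program can read its input files
    # and write the gnuplot/GMT input files next to them:
    docker run --rm -v "$PWD:/data" -w /data myprog:latest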

(If the image is stored in a public registry, it’s pretty trivial to run container jobs on a cloud compute platform like AWS or Azure, devoting as many or as few processors as you need (and want to pay for) based on the size and scalability of the task.)
