So to be honest, 90% of the time the base of the logarithm doesn’t really matter as long as we are consistent. The main property we use logarithms for is that log_b(xy) = log_b(x) + log_b(y), and this holds for any base b. In fact, the change-of-base formula tells us that we can get from one base to another just by multiplying by a constant (log_a(x) = log_b(x) * 1/log_b(a)), and so there is a strong desire to pick one canonical “logarithm” function, and just take care of any base silliness by multiplying your final result by a scaling factor if needed.
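For example, here's a quick numerical sanity check of both identities in Python (just my own toy illustration, nothing from the article):

```python
import math

x, y = 7.0, 13.0

# The product rule holds in any base b:
for b in (2.0, math.e, 10.0):
    assert math.isclose(math.log(x * y, b), math.log(x, b) + math.log(y, b))

# Change of base is just multiplication by a constant: log_2(x) = ln(x) * (1 / ln(2))
scale = 1.0 / math.log(2.0)
assert math.isclose(math.log(x, 2.0), math.log(x) * scale)

print("both identities check out numerically")
```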
Given that, the natural logarithm is quite “natural” because it is the inverse of the exponential function, exp(x) = e^x. The exponential function itself is quite natural as it is the unique function f such that f(0) = 1 and f’(x) = f(x). Really, I would argue that the function exp(x) is the fundamentally important mathematical object – the natural logarithm is important because it is that function’s inverse, and the number e just happens to be the value of exp(1).
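If you want to see that characterization concretely, here's a tiny numerical check (again, just my own illustration of the properties above):

```python
import math

h = 1e-6  # step size for a central finite-difference derivative

# f(0) = 1
assert math.isclose(math.exp(0.0), 1.0)

for x in (-2.0, 0.5, 3.0):
    # f'(x) = f(x): numerically estimate the derivative of exp at x
    numeric_derivative = (math.exp(x + h) - math.exp(x - h)) / (2.0 * h)
    assert math.isclose(numeric_derivative, math.exp(x), rel_tol=1e-6)
    # ln(exp(x)) = x: the natural log inverts exp
    assert math.isclose(math.log(math.exp(x)), x)

print("exp(0) = 1, exp is (numerically) its own derivative, and ln inverts it")
```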
That isn’t really the case; while many neural network implementations use nondeterministic optimizations, floating point arithmetic is in principle entirely deterministic, and it isn’t too hard to get a neural network to run deterministically if needed. Neural networks are perfectly applicable to lossless compression, which is what is done in this article.
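As a rough sketch of what “run deterministically” looks like in practice (assuming PyTorch here, which this particular article may or may not use; the tiny linear model is just a stand-in):

```python
# Force a network to run deterministically, so e.g. a compressor and decompressor
# see bit-identical probabilities. Assumes PyTorch; the model is a stand-in.
import os
import torch

os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # needed for deterministic cuBLAS on GPU
torch.manual_seed(0)                               # fixed initialization
torch.use_deterministic_algorithms(True)           # error out on nondeterministic kernels
torch.backends.cudnn.benchmark = False             # don't auto-tune kernel selection

model = torch.nn.Linear(16, 4)  # hypothetical stand-in for the real model
x = torch.randn(8, 16)

out1 = model(x)
out2 = model(x)
assert torch.equal(out1, out2)  # same inputs, same weights -> bit-identical floats
print("outputs are bit-identical across runs")
```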