Java performance

Early JVM implementations were interpreters; they simulated the virtual operations one-by-one rather than translating them into machine code for direct hardware execution.

Thus, any Java performance test or comparison should always report the version, vendor, OS and hardware architecture of the JVM used.
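As an illustration of the reporting requirement above, the following minimal sketch collects the relevant details from the standard Java system properties. The class name BenchmarkEnvironment and the choice of properties are assumptions made for this example; a real benchmark report might record more (for instance JVM flags or CPU model).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for this example; not part of any benchmark framework.
final class BenchmarkEnvironment {

    /** Collects standard system properties that identify the JVM and the platform. */
    static Map<String, String> describe() {
        Map<String, String> env = new LinkedHashMap<>();
        String[] keys = {
            "java.version", "java.vendor", "java.vm.name", "java.vm.version",
            "os.name", "os.version", "os.arch"
        };
        for (String key : keys) {
            // Fall back to "unknown" if a property is not set on this JVM.
            env.put(key, System.getProperty(key, "unknown"));
        }
        env.put("available.processors",
                Integer.toString(Runtime.getRuntime().availableProcessors()));
        return env;
    }

    public static void main(String[] args) {
        describe().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Printing this map alongside the measured numbers makes results from different JVMs and machines easier to compare.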

Using this framework, the Java virtual machine continually analyses program performance for hot spots which are executed frequently or repeatedly.

These are then targeted for optimizing, leading to high performance execution with a minimum of overhead for less performance-critical code.
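The effect of hot spot detection can sometimes be observed from ordinary Java code. The sketch below, with hypothetical names and arbitrary workload sizes, times the same small method over several rounds. On a JVM with a just-in-time compiler the first rounds are often slower than later ones, but the actual timings depend entirely on the JVM, its flags and the hardware, so no particular output is claimed here.

```java
// Illustrative timing loop; the class name and the constants are assumptions of this sketch.
final class WarmupDemo {

    /** A small, frequently called method: a candidate hot spot. */
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    /** Calls the workload `calls` times and returns the elapsed time in nanoseconds. */
    static long timeRound(int calls, int n) {
        long sink = 0;
        long start = System.nanoTime();
        for (int c = 0; c < calls; c++) {
            sink += sumOfSquares(n);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the loop cannot be discarded as dead code.
        if (sink == 42) {
            System.out.println("unexpected sink value");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 10; round++) {
            long nanos = timeRound(20_000, 1_000);
            System.out.println("round " + round + ": " + nanos / 1_000 + " us");
        }
    }
}
```

A hand-written loop like this is only a rough probe; serious measurements normally use a dedicated benchmarking harness.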

[9][10] Adaptive optimizing is a method in computer science that performs dynamic recompilation of parts of a program based on the current execution profile.

With a simple implementation, an adaptive optimizer may simply make a trade-off between just-in-time compiling and interpreting instructions.

This allows performing aggressive (and potentially unsafe) optimizations, while still being able to later deoptimize the code and fall back to a safe path.
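The idea of speculative optimization with a fallback can be sketched as a toy model in plain Java. This is not how any real JVM is implemented: the class AdaptiveCall, the threshold of three calls and the "non-negative input" assumption are all invented for illustration. The general path plays the role of the interpreter, the guarded shortcut plays the role of aggressively optimized code, and a failed guard triggers a deoptimization back to the safe path.

```java
// Toy model of adaptive optimization; every name and constant here is hypothetical.
final class AdaptiveCall {
    static final int HOT_THRESHOLD = 3; // calls matching the profile before "optimizing"

    private int matchingCalls = 0;
    private boolean optimized = false;
    private int deoptimizations = 0;

    /** General, always-correct path (stands in for interpreted execution). */
    private static int slowAbs(int x) {
        return x < 0 ? -x : x;
    }

    /** Computes |x|, switching between the safe path and a speculative fast path. */
    int call(int x) {
        if (optimized) {
            if (x >= 0) {
                return x; // guard holds: speculative fast path, no negation needed
            }
            // Guard failed: discard the optimized version and fall back.
            optimized = false;
            matchingCalls = 0;
            deoptimizations++;
            return slowAbs(x);
        }
        int result = slowAbs(x);
        if (x >= 0) {
            // Profile says the assumption held; optimize once the call is hot.
            if (++matchingCalls >= HOT_THRESHOLD) {
                optimized = true;
            }
        } else {
            matchingCalls = 0; // profile contradicts the assumption
        }
        return result;
    }

    boolean isOptimized() { return optimized; }

    int deoptimizationCount() { return deoptimizations; }

    public static void main(String[] args) {
        AdaptiveCall abs = new AdaptiveCall();
        for (int x : new int[] {1, 2, 3, 4, -5, 6}) {
            System.out.println("abs(" + x + ") = " + abs.call(x)
                    + ", optimized=" + abs.isOptimized()
                    + ", deopts=" + abs.deoptimizationCount());
        }
    }
}
```

The result is correct on every call; only the path taken changes, which is the property that makes the speculation safe.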

[11][12] The 1.0 and 1.1 Java virtual machines (JVMs) used a mark-sweep collector, which could fragment the heap after a garbage collection.
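Why a non-moving mark-sweep collector can fragment the heap is easiest to see in a small simulation. The sketch below is a toy model with invented names (FragmentedHeap, slot-based allocation), not the actual collector of the early JVMs: objects occupy contiguous slots, and sweeping frees dead objects in place without moving survivors.

```java
import java.util.Arrays;

// Toy heap model; the class and its slot-based layout are assumptions of this sketch.
final class FragmentedHeap {
    private final int[] slots; // 0 = free, any other value = id of the owning object

    FragmentedHeap(int size) {
        slots = new int[size];
    }

    /** First-fit allocation of `size` contiguous slots (size > 0, id != 0); returns start index or -1. */
    int allocate(int id, int size) {
        int run = 0;
        for (int i = 0; i < slots.length; i++) {
            run = (slots[i] == 0) ? run + 1 : 0;
            if (run == size) {
                int start = i - size + 1;
                Arrays.fill(slots, start, i + 1, id);
                return start;
            }
        }
        return -1; // no contiguous block is large enough
    }

    /** Sweep step: frees the slots of a dead object in place; survivors are not moved. */
    void sweep(int deadId) {
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == deadId) {
                slots[i] = 0;
            }
        }
    }

    int totalFree() {
        int free = 0;
        for (int slot : slots) {
            if (slot == 0) free++;
        }
        return free;
    }

    int largestFreeBlock() {
        int best = 0;
        int run = 0;
        for (int slot : slots) {
            run = (slot == 0) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        FragmentedHeap heap = new FragmentedHeap(8);
        for (int id = 1; id <= 4; id++) {
            heap.allocate(id, 2); // heap is now full: 1 1 2 2 3 3 4 4
        }
        heap.sweep(1);
        heap.sweep(3);            // heap is now: 0 0 2 2 0 0 4 4
        System.out.println("total free slots:   " + heap.totalFree());
        System.out.println("largest free block: " + heap.largestFreeBlock());
        System.out.println("allocate 3 slots:   " + heap.allocate(5, 3));
    }
}
```

After the sweep, four slots are free in total, yet a three-slot request fails because no free block is longer than two slots. A compacting or copying collector avoids this by relocating the surviving objects.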

It splits the verification of Java bytecode into two phases.[15] In practice, this method works by capturing knowledge that the Java compiler has of class flow and annotating the compiled method bytecodes with a synopsis of the class flow information.

Other compilers almost always target a specific hardware and software platform, producing machine code that will stay virtually unchanged during execution[citation needed].

Very different and hard-to-compare scenarios arise from these two different approaches: static vs. dynamic compilations and recompilations, the availability of precise information about the runtime environment and others.

[42] While it is not specified how the data was measured (for example, whether the original Quake II executable compiled in 1997 was used, which would be a weak baseline because current C compilers may achieve better optimizations for Quake), it notes how the same Java source code can gain a large speed boost just by updating the VM, something impossible to achieve with a 100% static approach.

[43] At the other extreme, an academic benchmark performed in 2012 with a 3D modelling algorithm showed the Java 6 JVM being from 1.09 to 1.91 times slower than C++ under Windows.

[56] Automatic memory management in Java allows for efficient use of lockless and immutable data structures that are extremely hard or sometimes impossible to implement without some kind of garbage collection.

[citation needed] Java offers a number of such high-level structures in its standard library in the java.util.concurrent package, while many languages historically used for high performance systems like C or C++ are still lacking them.
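A classic example of such a structure is a lock-free stack (often called a Treiber stack) built from immutable nodes and a single atomic reference. The sketch below is a minimal illustration rather than a production class; the name LockFreeStack is an assumption of this example, while AtomicReference and its compareAndSet method come from java.util.concurrent.atomic. Because the garbage collector only reclaims a node once no thread can still reach it, a popped node cannot be freed and reused while another thread still holds a reference to it, which is the hard part of writing this structure with manual memory management.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal lock-free stack sketch; the class name is hypothetical.
final class LockFreeStack<T> {

    /** Immutable node: once published, neither field ever changes. */
    private static final class Node<E> {
        final E value;
        final Node<E> next;

        Node(E value, Node<E> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    /** Pushes a value by atomically swinging head to a new node; retries on contention. */
    void push(T value) {
        Node<T> current;
        Node<T> updated;
        do {
            current = head.get();
            updated = new Node<>(value, current);
        } while (!head.compareAndSet(current, updated));
    }

    /** Pops the most recently pushed value, or returns null if the stack is empty. */
    T pop() {
        Node<T> current;
        do {
            current = head.get();
            if (current == null) {
                return null;
            }
        } while (!head.compareAndSet(current, current.next));
        return current.value;
    }

    public static void main(String[] args) throws InterruptedException {
        LockFreeStack<Integer> stack = new LockFreeStack<>();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1_000; i++) {
                    stack.push(i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        int count = 0;
        while (stack.pop() != null) {
            count++;
        }
        System.out.println("popped " + count + " elements");
    }
}
```

No locks are taken anywhere: a thread that loses a race simply rereads the head and retries.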

In November 2004, Nailgun, a "client, protocol, and server for running Java programs from the command line without incurring the JVM startup overhead" was publicly released.

Scripts in which per-application JVM startup dominates resource use see runtime performance improvements of one to two orders of magnitude.

For programs in which memory is a critical factor for choosing between languages and runtime environments, a cost/benefit analysis is needed.

Performance of trigonometric functions is poor compared to C, because Java has strict specifications for the results of mathematical operations, which may not correspond to the underlying hardware implementation.

[65] On the x87 floating point subset, Java since 1.4 does argument reduction for sin and cos in software,[66] causing a big performance hit for values outside the range.
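Whether a given JVM shows this slowdown can be probed with a rough timing sketch like the one below. The class name, the argument ranges ("small" values near zero versus "large" values around one million) and the iteration counts are assumptions chosen for illustration, and no particular result is claimed: JVMs that do not use x87 instructions for Math.sin may show little or no difference.

```java
// Rough probe of Math.sin cost for small versus large arguments; all constants are arbitrary.
final class SinRangeDemo {

    /** Sums sin(base + i * 0.001) for i in [0, count). */
    static double sumSines(double base, int count) {
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            sum += Math.sin(base + i * 1e-3);
        }
        return sum;
    }

    /** Returns the elapsed nanoseconds for one call to sumSines. */
    static long timeNanos(double base, int count) {
        long start = System.nanoTime();
        double sum = sumSines(base, count);
        long elapsed = System.nanoTime() - start;
        // Use the result so the computation cannot be eliminated.
        if (Double.isNaN(sum)) {
            throw new IllegalStateException("unexpected NaN");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int count = 500;
        // Warm up both cases so that JIT compilation does not dominate the timings.
        for (int i = 0; i < 20_000; i++) {
            sumSines(0.0, count);
            sumSines(1.0e6, count);
        }
        System.out.println("small arguments: " + timeNanos(0.0, count) + " ns");
        System.out.println("large arguments: " + timeNanos(1.0e6, count) + " ns");
        // Math.sin and StrictMath.sin should agree closely whichever path is taken.
        System.out.println("difference at 1e10: "
                + Math.abs(Math.sin(1.0e10) - StrictMath.sin(1.0e10)));
    }
}
```

The final line checks accuracy rather than speed: the strict result specification is the reason the JVM cannot simply use whatever the hardware instruction returns for large arguments.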

However, benchmarks comparing the performance of Swing versus the Standard Widget Toolkit, which delegates the rendering to the native GUI libraries of the operating system, show no clear winner, and the results greatly depend on the context and the environments.

In 2008[74] and 2009,[75][76] a cluster based on Apache Hadoop (an open-source high-performance computing project written in Java) was able to sort a terabyte and a petabyte of integers the fastest.