TensorFlow 14: Using the XLA Compiler for JIT Acceleration

Overview

XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra that optimizes TensorFlow computations.

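XLA uses JIT compilation to analyze the TensorFlow graph that the user builds at runtime, specialize it for the actual runtime dimensions and types, fuse multiple operations together, and generate efficient native code for them. Targets include devices such as CPUs and GPUs as well as custom accelerators (for example, Google's TPU).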
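To make "fusing multiple operations" concrete, here is a minimal pure-Python sketch. It only illustrates the idea and is not how XLA is implemented; the function names `unfused_bias_relu` and `fused_bias_relu` are hypothetical. The unfused version materializes an intermediate list for each operation, while the fused version produces the same result in a single pass.

```python
def unfused_bias_relu(values, bias):
    """Apply add-bias, then ReLU, as two separate passes."""
    # Pass 1: the add produces a full intermediate buffer.
    shifted = [v + bias for v in values]
    # Pass 2: the ReLU reads that buffer and produces another one.
    return [max(s, 0.0) for s in shifted]


def fused_bias_relu(values, bias):
    """Apply add-bias and ReLU in one pass, with no intermediate list."""
    return [max(v + bias, 0.0) for v in values]


if __name__ == "__main__":
    data = [-2.0, -0.5, 0.0, 1.5]
    # Both versions must agree; only the number of passes differs.
    assert unfused_bias_relu(data, 0.5) == fused_bias_relu(data, 0.5)
    print(fused_bias_relu(data, 0.5))
```

In a real compiler the saving comes from avoiding the memory traffic of the intermediate buffer, which this toy example can only hint at.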

XLA is currently experimental. Most use cases will not see an improvement in performance (faster execution or lower memory usage).

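Code example

The code comes from tensorflow/examples/tutorials/mnist/mnist_softmax_xla.py in the TensorFlow source tree.

It works on the same principles as the previous posts in this series, so the points they share are not repeated here.

Enabling JIT compilation

JIT can be turned on at the session level as follows: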

    config = tf.ConfigProto()
    config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
    sess = tf.Session(config=config)

Recording metadata and the timeline file

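Metadata records the time and memory consumed during a run. This information can be exported and saved as a timeline file, which can then be viewed in the Chrome browser.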

    run_metadata = tf.RunMetadata()
    sess = tf.Session(config=config)
    sess.run(train_step,
             feed_dict={x: batch_xs, y_: batch_ys},
             options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
             run_metadata=run_metadata)
    trace = timeline.Timeline(step_stats=run_metadata.step_stats)

An earlier post in this series wrote the metadata into the TensorBoard event log.

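Here it is written to a timeline file on disk instead. The file is in JSON format and can be visualized in Chrome: open "chrome://tracing" in the browser and drag the file onto the page to see how long each part of the run took. The interface is similar to the one used for performance profiling on Android.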
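Besides the visual view, the timeline file can also be inspected programmatically. The sketch below assumes the file follows the Chrome trace event format, with a top-level `traceEvents` list in which complete events have `ph` equal to `"X"` and a `dur` field in microseconds. The helper name `summarize_trace` and the sample events are hypothetical, and a real TensorFlow trace may contain other event types that this sketch simply skips.

```python
import json
from collections import defaultdict


def summarize_trace(trace_json):
    """Return (event name, total duration in microseconds), longest first."""
    events = json.loads(trace_json).get("traceEvents", [])
    totals = defaultdict(int)
    for event in events:
        # Only complete events ("ph" == "X") carry a duration.
        if event.get("ph") == "X" and "dur" in event:
            totals[event.get("name", "<unnamed>")] += event["dur"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data standing in for timeline.ctf.json.
    sample = json.dumps({"traceEvents": [
        {"name": "MatMul", "ph": "X", "ts": 0, "dur": 120},
        {"name": "Add", "ph": "X", "ts": 120, "dur": 30},
        {"name": "MatMul", "ph": "X", "ts": 150, "dur": 80},
        {"name": "process_name", "ph": "M"},
    ]})
    for name, total in summarize_trace(sample):
        print(name, total)
```

To try it on a real trace, read the contents of the exported file and pass the string to `summarize_trace`.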

Complete code

I added code that writes the graph structure to a TensorBoard file. The rest is essentially unchanged.

    """Simple MNIST classifier example with JIT XLA and timelines."""
    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import argparse
    import sys

    import tensorflow as tf

    from tensorflow.examples.tutorials.mnist import input_data
    from tensorflow.python.client import timeline

    FLAGS = None


    def main(_):
        # Import data
        mnist = input_data.read_data_sets(FLAGS.data_dir)

        # Create the model
        x = tf.placeholder(tf.float32, [None, 784])
        w = tf.Variable(tf.zeros([784, 10]))
        b = tf.Variable(tf.zeros([10]))
        y = tf.matmul(x, w) + b

        # Define loss and optimizer
        y_ = tf.placeholder(tf.int64, [None])

        # The raw formulation of cross-entropy,
        #
        #   tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.nn.softmax(y)),
        #                                 reduction_indices=[1]))
        #
        # can be numerically unstable.
        #
        # So here we use tf.losses.sparse_softmax_cross_entropy on the raw
        # logit outputs of 'y', and then average across the batch.
        cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y)
        train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

        config = tf.ConfigProto()
        jit_level = 0
        if FLAGS.xla:
            # Turns on XLA JIT compilation.
            jit_level = tf.OptimizerOptions.ON_1

        config.graph_options.optimizer_options.global_jit_level = jit_level
        run_metadata = tf.RunMetadata()
        sess = tf.Session(config=config)
        tf.global_variables_initializer().run(session=sess)

        # Write the graph structure so it can be inspected in TensorBoard.
        writer = tf.summary.FileWriter(FLAGS.log_dir + '/train', sess.graph)
        writer.close()

        # Train
        train_loops = 1000
        for i in range(train_loops):
            batch_xs, batch_ys = mnist.train.next_batch(100)

            # Create a timeline for the last loop and export to json to view with
            # chrome://tracing/.
            if i == train_loops - 1:
                sess.run(train_step,
                         feed_dict={x: batch_xs, y_: batch_ys},
                         options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
                         run_metadata=run_metadata)
                trace = timeline.Timeline(step_stats=run_metadata.step_stats)
                with open('timeline.ctf.json', 'w') as trace_file:
                    trace_file.write(trace.generate_chrome_trace_format())
            else:
                sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

        # Test trained model
        correct_prediction = tf.equal(tf.argmax(y, 1), y_)
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        print(sess.run(accuracy,
                       feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
        sess.close()


    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument(
            '--data_dir',
            type=str,
            default='./data',
            help='Directory for storing input data')
        # Note: argparse's type=bool treats any non-empty string as True,
        # so passing "--xla False" still leaves XLA enabled.
        parser.add_argument(
            '--xla',
            type=bool,
            default=True,
            help='Turn xla via JIT on')
        parser.add_argument(
            '--log_dir',
            type=str,
            default='./logs',
            help='Directory to put the log data.')
        FLAGS, unparsed = parser.parse_known_args()
        tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

References

Introductions on Tencent Cloud:

XLA Overview
Using JIT Compilation
Using AOT compilation

Introduction on the Google China developer site: XLA - TensorFlow Compiler