Wednesday, March 5, 2014

Big speedup for training Random Forests in scikit-learn 0.15

Until recently, wiseRF was clearly the fastest Random Forest implementation for Python (and thus the best library for dealing with larger in-memory datasets). Though scikit-learn has had tree ensembles for the past several years, their performance was typically at least an order of magnitude worse than wiseRF's (a boon to wiseRF's marketing team). The sklearn developers seemed to shake off their tree-building sluggishness with a Cython rewrite in the 0.14 release.

Unfortunately, as Yisheng and I discovered while working on CudaTree, even the faster Cython tree builder can still be significantly slower than wiseRF. Why is there still a performance gap when both libraries now use native implementations? wiseRF is probably doing something smarter with its choice of algorithms and/or data layout, but the iron curtain of closed-source software keeps us from finding out exactly what's going on.

It turns out that one important choice for building trees efficiently is the algorithm used to sort candidate splitting thresholds. The upcoming 0.15 release of scikit-learn will include some cache-friendly changes to how their algorithm sorts data. These modifications seem to have finally closed the gap with wiseRF.
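To make the role of sorting concrete, here is a minimal sketch in plain NumPy (emphatically not scikit-learn's actual Cython code) of the standard exhaustive split search for a single feature: sort the samples by feature value once, then sweep the sorted order while updating class counts incrementally, so that each candidate threshold is scored in constant time.

    import numpy as np

    def best_split_for_feature(values, labels, n_classes):
        # labels must be integer class ids in [0, n_classes).
        # Sorting samples by feature value is the per-node, per-feature
        # hot spot that the 0.15 release reorganizes for cache friendliness.
        order = np.argsort(values)
        values, labels = values[order], labels[order]

        # Class counts on each side of the candidate split, updated one
        # sample at a time as the sweep advances.
        left = np.zeros(n_classes)
        right = np.bincount(labels, minlength=n_classes).astype(float)
        n = len(labels)
        best_gini, best_threshold = np.inf, None

        for i in range(n - 1):
            left[labels[i]] += 1
            right[labels[i]] -= 1
            if values[i] == values[i + 1]:
                continue  # no threshold can separate identical values
            n_left, n_right = i + 1, n - i - 1
            gini = (n_left * (1.0 - ((left / n_left) ** 2).sum()) +
                    n_right * (1.0 - ((right / n_right) ** 2).sum())) / n
            if gini < best_gini:
                best_gini = gini
                best_threshold = (values[i] + values[i + 1]) / 2.0
        return best_gini, best_threshold

Since this sort dominates the cost of the split search, how it touches memory matters as much as its asymptotic complexity.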

Below are the benchmark times from the CudaTree paper, with the current development branch of scikit-learn included under the label scikit-learn 0.15. The takeaway is that the new release will build Random Forests 2x-6x faster than the old one, and that the remaining performance differences between scikit-learn, wiseRF, and CudaTree are no longer significant.

Training times for 100 trees grown on a 6-core Xeon E5-2630 machine with an NVIDIA Titan graphics card:

Dataset            wiseRF 1.5.11   scikit-learn 0.14   scikit-learn 0.15   CudaTree 0.6
ImageNet subset    23s             50s                 13s                 25s
CIFAR-100 (raw)    160s            502s                181s                197s
covertype          107s            463s                73s                 67s
poker              117s            415s                99s                 59s
PAMAP2             1,066s          7,630s              1,683s              934s
intrusion          667s            1,528s              241s                199s
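
If you want a rough sense of these numbers on your own hardware, a minimal timing harness along the following lines should do. The synthetic arrays below are a stand-in for the real datasets (which you would load yourself), and whether the original benchmarks used all 6 cores via n_jobs is my assumption:

    import time
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-in data; substitute one of the datasets from the table.
    rng = np.random.RandomState(0)
    X = rng.rand(10000, 57).astype(np.float32)
    y = rng.randint(0, 7, size=10000)

    # 100 trees, as in the benchmarks; n_jobs=-1 uses every available core.
    forest = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    start = time.time()
    forest.fit(X, y)
    print("trained 100 trees in %.1fs" % (time.time() - start))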

Information about the datasets used above:

Name               Features   Samples   Classes   Description
ImageNet subset    4,096      10,000    10        Random subset of 10 labels from the 1,000-category ImageNet data set, processed by the convolutional filters of a trained convolutional neural network (amazingly, it attains the same accuracy!)
CIFAR-100          3,072      50k       100       Same as CIFAR-10, but with more samples and more labels.
covertype          57         581k      7         Identify tree cover from domain-specific features.
poker              11         1M        10        Poker hands.
PAMAP2             52         2.87M     13        Physical activity monitoring.
intrusion          41         5M        24        Network intrusion.

Comments:

  1. I can confirm the speedup, at least in the small case of the MNIST digits dataset.
    http://rexdouglass.com/fastest-random-forest-sklearn/

    Two anecdotal observations I haven't had time to properly test yet:
    1) Sklearn's memory footprint tends to be higher.
    2) The training time for wiseRF is lower for data with both many features and many classes.

    Using the development version of Sklearn from a few weeks ago, I haven't been able to completely switch from wiseRF for some of my applications, but I'm optimistic.

  2. You may be interested in a library I just wrote that improves the performance of regression tree/ensemble evaluation for scikit-learn: https://github.com/ajtulloch/sklearn-compiledtrees/.

    Benchmarks indicate a 5x-8x improvement in prediction speed.
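
    Usage looks roughly like this sketch; the class name CompiledRegressionPredictor is taken from the README, so check the repository for the current API.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    import compiledtrees  # package from the repository above

    # Toy regression data, just to have a fitted ensemble to compile.
    rng = np.random.RandomState(0)
    X, y = rng.rand(1000, 10), rng.rand(1000)

    clf = RandomForestRegressor(n_estimators=100).fit(X, y)
    # Compiles the fitted trees to native code for faster prediction;
    # the class name here is from the README and may change.
    compiled = compiledtrees.CompiledRegressionPredictor(clf)
    print(compiled.predict(X[:5]))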

  3. Hi,

    This may not be the right forum to ask a CudaTree question, but I am a bit stuck.
    I am quite a newbie to CUDA. I was trying the CudaTree library and managed to install it on Windows with PyCUDA. I tried the example,

    from cudatree import load_data, RandomForestClassifier

    x_train, y_train = load_data("iris")
    forest = RandomForestClassifier(n_estimators=50, verbose=True, debug=True, bootstrap=False)
    forest.fit(x_train, y_train, bfs_threshold=4196)
    forest.predict(x_train)

    but got an exception:

    Traceback (most recent call last):
      File "estimate_threshold.py", line 50, in <module>
        rf.fit(x[:100], y[:100])
      File "D:\data\CudaTree-master\cudatree\random_forest.py", line 330, in fit
        tree.fit(self.samples, self.target, si, n_samples)
      File "D:\data\CudaTree-master\cudatree\random_tree.py", line 468, in fit
        self.__compile_kernels()
      File "D:\data\CudaTree-master\cudatree\random_tree.py", line 242, in __compile_kernels
        cuda.memcpy_htod(const_sorted_indices_, np.uint64(self.sorted_indices_gpu_.ptr))
    pycuda._driver.LogicError: cuMemcpyHtoD failed: invalid value


    Any pointers, please?

    Replies
    1. Solved it. I think I had a 32-bit/64-bit issue. It works after changing to 'np.uint32'.
