Kaggle is hosting the Mercedes-Benz Greener Manufacturing competition. The data set is small and relatively simple, which makes it a good fit for a quick weekend project.
As usual, the data needs some pre-processing before modeling. In this case, the categorical variables have to be one-hot encoded.
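A minimal sketch of the one-hot step with pandas, assuming the train and test files share their feature columns and only the train file has the target y. The tiny frames and the column list ["X0", "X1"] are hypothetical stand-ins for the real data. Encoding train and test together guarantees both end up with the same dummy columns, even when a category level appears in only one of them.

```python
import pandas as pd

# Hypothetical toy frames standing in for the competition's train/test files.
train = pd.DataFrame({"ID": [0, 1, 2], "y": [100.0, 95.5, 110.2],
                      "X0": ["a", "b", "a"], "X1": ["c", "c", "d"], "X10": [0, 1, 0]})
test = pd.DataFrame({"ID": [3, 4], "X0": ["b", "e"], "X1": ["d", "c"], "X10": [1, 1]})


def one_hot(train, test, cat_cols):
    """One-hot encode cat_cols consistently across train and test."""
    # Stack both frames (keyed so they can be split again) so get_dummies
    # sees every level, e.g. "e" that only occurs in test.
    combined = pd.concat([train.drop(columns=["y"]), test], keys=["train", "test"])
    encoded = pd.get_dummies(combined, columns=cat_cols)
    return encoded.loc["train"], encoded.loc["test"]


cat_cols = ["X0", "X1"]  # assumed categorical columns for this toy example
X_train, X_test = one_hot(train, test, cat_cols)
print(X_train.columns.tolist())
```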
For cross-validation, I use 5-fold CV.
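As an illustration of what 5-fold splitting does, here is a small NumPy sketch: the rows are shuffled once, cut into five roughly equal folds, and each fold serves as the validation set exactly once while the other four are used for training. The function name kfold_indices is hypothetical; in practice sklearn.model_selection.KFold(n_splits=5, shuffle=True) does the same job.

```python
import numpy as np


def kfold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for shuffled k-fold CV."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)          # shuffle row indices once
    folds = np.array_split(order, n_splits)     # near-equal folds
    for i in range(n_splits):
        valid_idx = folds[i]
        # Every other fold goes into the training set for this split.
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != i])
        yield train_idx, valid_idx


splits = list(kfold_indices(12, n_splits=5))
for train_idx, valid_idx in splits:
    print(len(train_idx), len(valid_idx))
```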
One advantage of the library is that, if you provide a validation set, it prints the evaluation metric of the current model on that set at each iteration. This makes it easy to spot over-fitting or under-fitting while training is still running.
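In the code below, the main parameters I tuned to avoid over-fitting are subsample=0.8 (each round is fit on a random 80% of the training rows) and lambda=10 (an L2 penalty on the leaf weights). That said, a grid search would be a more systematic way to tune the parameters.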
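A grid search simply tries every combination of candidate values and keeps the one with the best cross-validated score. Below is a minimal, library-agnostic sketch; grid_search and toy_cv_error are hypothetical names. In a real run, score_fn would wrap something like 5-fold cross-validation (for example xgb.cv with nfold=5) and return the mean validation error; here a toy function stands in so the example runs on its own. sklearn's GridSearchCV is a ready-made alternative.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Return (best_params, best_score), minimising score_fn over all combinations."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # lower is better, e.g. mean CV RMSE
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_cv_error(p):
    # Hypothetical stand-in for a cross-validated error.
    return (p["subsample"] - 0.8) ** 2 + (p["lambda"] - 10) ** 2 / 100


grid = {"subsample": [0.6, 0.8, 1.0], "lambda": [1, 10, 100]}
best, score = grid_search(toy_cv_error, grid)
print(best, score)
```

The number of fits grows multiplicatively with each parameter added, which matters when every evaluation is a full 5-fold run.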
I also tried a neural network with Keras, but deeper networks were generally unstable. My guess is that the data set is too small for a deep network with that many parameters.
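To make the "too many parameters" point concrete, a fully connected layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases, which is also what Keras's model.count_params() reports for Dense layers. The sketch below adds these up; the layer widths are hypothetical, chosen only to show how quickly the count outgrows a small training set.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases in a fully connected net with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical: ~500 one-hot inputs, three hidden layers, one regression output.
print(dense_param_count([500, 256, 128, 64, 1]))
```

Comparing that figure with the number of training rows gives a rough sense of how under-determined the network is.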