| Linear Learner | Regression & Classification | RecordIO-wrapped protobuf, CSV | balance_multiclass_weights, learning_rate, mini_batch_size, l1, wd | Single or multi-machine CPU or GPU (multi-GPU does not help) |
| XGBoost | Classification & Regression | CSV, libsvm, recordIO-protobuf, Parquet | subsample, eta, gamma, alpha, lambda, eval_metric, scale_pos_weight, max_depth | CPU for multi-instance training; single-instance GPU available |
| Seq2Seq | Sequence-to-Sequence tasks | recordIO-protobuf | batch_size, optimizer_type, learning_rate, num_layers_encoder, num_layers_decoder | GPU instance types only (P3, for example) |
| DeepAR | Time Series Forecasting | JSON Lines (optionally gzip-compressed) or Parquet | context_length, epochs, mini_batch_size, learning_rate, num_cells | Single or multi-machine CPU or GPU |
| BlazingText | Text Classification, Word2Vec | Supervised mode: text file with one sentence per line, each prefixed with `__label__<label>`; Word2Vec mode: text file with one training sentence per line | mode, learning_rate, window_size, vector_dim, negative_samples, epochs, word_ngrams | Single CPU or GPU instance; multiple CPU instances for batch_skipgram |
| Object2Vec | Object embeddings | JSON Lines | dropout, early stopping, epochs, learning_rate, batch_size, layers, activation function, optimizer, weight decay, enc1_network, enc2_network | Single-machine CPU or GPU (multi-GPU OK) |
| Object Detection | Object detection in images | RecordIO or image format (JPG or PNG) | mini_batch_size, learning_rate, optimizer | GPU instances for training (multi-GPU and multi-machine OK); CPU or GPU for inference |
| Image Classification | Image Classification | Apache MXNet RecordIO, raw JPG or PNG, Augmented Manifest Image Format | batch size, learning_rate, optimizer, optimizer-specific parameters | GPU instances for training (P2, P3; multi-GPU and multi-machine OK); CPU or GPU for inference (C4, P2, P3) |
| Semantic Segmentation | Pixel-level object classification | JPG images, PNG annotations, label maps, Augmented Manifest Image Format | epochs, learning_rate, batch_size, optimizer, algorithm, backbone | GPU instances for training (P2 or P3), single machine only; inference on CPU (C5 or M5) or GPU (P2 or P3) |
| Random Cut Forest | Anomaly Detection | recordIO-protobuf, CSV | num_trees, num_samples_per_tree | M4, C4, C5 (training); ml.c5.xlarge (inference) |
| Neural Topic Model | Topic Modeling | recordIO-protobuf, CSV | mini_batch_size, learning_rate, num_topics | GPU (training), CPU (inference) |
| LDA | Topic Modeling | recordIO-protobuf, CSV | num_topics, alpha0 | Single-instance CPU (training) |
| KNN | Classification, Regression | recordIO-protobuf, CSV | k, sample_size | CPU or GPU (training), CPU (inference) |
| K-Means | Clustering | recordIO-protobuf, CSV | k, mini_batch_size, extra_center_factor, init_method | CPU or GPU (training) |
| PCA | Dimensionality Reduction | recordIO-protobuf, CSV | algorithm_mode, subtract_mean | GPU or CPU (depending on data specifics) |
| Factorization Machines | Sparse Data, Recommendations | recordIO-protobuf (float32) | Initialization methods and the properties of each method | CPU (recommended), GPU (dense data) |
| IP Insights | Anomaly Detection | CSV | num_entity_vectors, vector_dim, epochs, learning_rate, batch_size | GPU (recommended), CPU |
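
Two of the text-based input formats in the table are easy to get wrong by hand, so here is a minimal sketch of how records for them might be produced. It assumes the DeepAR JSON Lines layout with `start` and `target` fields and the BlazingText supervised layout with a `__label__` prefix. The helper names `deepar_record` and `blazingtext_line` are hypothetical and are not part of any AWS SDK, and the sample values are made up for illustration.

```python
import json


def deepar_record(start, target):
    """Serialize one time series as a single JSON Lines record.

    `start` is the timestamp of the first observation and `target`
    is the sequence of observed values (hypothetical helper).
    """
    return json.dumps({"start": start, "target": list(target)})


def blazingtext_line(label, sentence):
    """Format one supervised-mode training line.

    The label is written with the `__label__` prefix, followed by the
    space-separated tokens of the sentence (hypothetical helper).
    """
    tokens = sentence.split()
    return f"__label__{label} " + " ".join(tokens)


if __name__ == "__main__":
    # One record per line, as JSON Lines requires.
    print(deepar_record("2024-01-01 00:00:00", [1.0, 2.5, 3.0]))
    # One labeled sentence per line.
    print(blazingtext_line("positive", "Great product , works well"))
```

Each function returns exactly one line of output, so writing a training file is a matter of joining the results with newlines. Tokenization here is a plain whitespace split; a real pipeline would likely apply its own tokenizer before formatting.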