| Algorithm | Purpose | Training Input Formats | Key Hyperparameters | Instance Types (Training & Inference) |
|---|---|---|---|---|
| Linear Learner | Regression & Classification | RecordIO-wrapped protobuf, CSV | balance_multiclass_weights, learning_rate, mini_batch_size, l1, wd | Single or multi-machine CPU or GPU (multi-GPU does not help) |
| XGBoost | Classification & Regression | CSV, libsvm, RecordIO-protobuf, Parquet | subsample, eta, gamma, alpha, lambda, eval_metric, scale_pos_weight, max_depth | CPU for multi-instance training; single-instance GPU available |
| Seq2Seq | Sequence-to-sequence tasks | RecordIO-protobuf | batch_size, optimizer_type, learning_rate, num_layers_encoder, num_layers_decoder | GPU instance types only (P3, for example) |
| DeepAR | Time Series Forecasting | JSON Lines (optionally Gzip-compressed) or Parquet | context_length, epochs, mini_batch_size, learning_rate, num_cells | Single or multi-machine CPU or GPU |
| BlazingText | Text Classification, Word2Vec | Supervised mode: one sentence per line; Word2Vec: text file with one training sentence per line | mode, learning_rate, window_size, vector_dim, negative_samples, epochs, word_ngrams | Single CPU or GPU; multiple CPU instances for batch_skipgram |
| Object2Vec | Object embeddings | JSON Lines | dropout, early stopping, epochs, learning_rate, batch_size, layers, activation function, optimizer, weight decay, enc1_network, enc2_network | Single-machine CPU or GPU (multi-GPU OK) |
| Object Detection | Object detection in images | RecordIO or image format (jpg or png) | mini_batch_size, learning_rate, optimizer | GPU instances for training (multi-GPU and multi-machine OK); CPU or GPU for inference |
| Image Classification | Image classification | Apache MXNet RecordIO, raw jpg or png, augmented manifest image format | batch size, learning_rate, optimizer, optimizer-specific parameters | GPU instances for training (P2, P3; multi-GPU and multi-machine OK); CPU or GPU for inference (C4, P2, P3) |
| Semantic Segmentation | Pixel-level object classification | JPG images, PNG annotations, label maps, augmented manifest image format | epochs, learning_rate, batch_size, optimizer, algorithm, backbone | GPU instances for training (P2 or P3) on a single machine only; inference on CPU (C5 or M5) or GPU (P2 or P3) |
| Random Cut Forest | Anomaly Detection | RecordIO-protobuf, CSV | num_trees, num_samples_per_tree | M4, C4, C5 (training); ml.c5.xlarge (inference) |
| Neural Topic Model | Topic Modeling | RecordIO-protobuf, CSV | mini_batch_size, learning_rate, num_topics | GPU (training), CPU (inference) |
| LDA | Topic Modeling | RecordIO-protobuf, CSV | num_topics, alpha0 | Single-instance CPU (training) |
| KNN | Classification, Regression | RecordIO-protobuf, CSV | k, sample_size | CPU or GPU (training), CPU (inference) |
| K-Means | Clustering | RecordIO-protobuf, CSV | k, mini_batch_size, extra_center_factor, init_method | CPU or GPU (training) |
| PCA | Dimensionality Reduction | RecordIO-protobuf, CSV | algorithm_mode, subtract_mean | GPU or CPU (depending on data specifics) |
| Factorization Machines | Sparse Data, Recommendations | RecordIO-protobuf (float32) | initialization methods, properties of each method | CPU (recommended), GPU (dense data) |
| IP Insights | Anomaly Detection | CSV | num_entity_vectors, vector_dim, epochs, learning_rate, batch_size | GPU (recommended), CPU |
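
As a rough illustration of how the Key Hyperparameters column might be used, the sketch below builds a hyperparameter dictionary for XGBoost and checks its keys against the names listed in the table. The hyperparameter names come from the XGBoost row; the values, the `XGBOOST_TABLE_HYPERPARAMETERS` set, and the `check_hyperparameters` helper are hypothetical and exist only for this example. It uses plain Python and does not call the SageMaker SDK.

```python
# Hyperparameter names copied from the XGBoost row of the table above.
# This set is an illustrative assumption, not an exhaustive list of
# everything the algorithm accepts.
XGBOOST_TABLE_HYPERPARAMETERS = {
    "subsample",
    "eta",
    "gamma",
    "alpha",
    "lambda",
    "eval_metric",
    "scale_pos_weight",
    "max_depth",
}


def check_hyperparameters(candidate: dict, known: set) -> list:
    """Return the sorted names in `candidate` that are not in `known`."""
    return sorted(name for name in candidate if name not in known)


# Example values are placeholders chosen for illustration only;
# suitable values depend on the dataset.
xgboost_hyperparameters = {
    "max_depth": 5,
    "eta": 0.2,
    "gamma": 4,
    "subsample": 0.8,
    "eval_metric": "auc",
    "scale_pos_weight": 3,
}

unknown = check_hyperparameters(xgboost_hyperparameters, XGBOOST_TABLE_HYPERPARAMETERS)
if unknown:
    print("Names not listed in the table:", unknown)
else:
    print("All hyperparameter names appear in the table.")
```

A dictionary of this shape is the usual way hyperparameters are handed to a training job, so a check like this can catch a misspelled name before a job is launched. Names missing from the table are not necessarily invalid, since the table lists only key hyperparameters.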