A clear, practical walkthrough of the ML ideas and AWS services that show up again and again on AIF-C01. Built for quick review, not to read like a wall of notes.
Before any topic, fix this mental model in your head:
Every exam question is testing one or more of these four layers. When you read a question, mentally walk down this ladder.
The "type" answers one question: What kind of data did the model learn from?
The intuition: Imagine teaching a child what a cat is by showing them 1,000 photos labeled "cat" and 1,000 photos labeled "not cat." The labels are the supervision. The child (model) learns to map photo → label.
Mechanics:
Learn a mapping: input → output.

Two flavors of supervised learning:
| Flavor | Output | Example |
|---|---|---|
| Classification | A category | Spam or not spam |
| Regression | A number | Predicted house price |
Real examples:
The intuition: Same child, but now you dump a pile of photos in front of them with no labels and say: "Sort these into groups however you want." The child might separate by color, by animal type, by background. They find structure that was already there but hidden.
Mechanics:
Real examples:
The intuition: Think of training a dog. The dog tries something → you give a treat (reward) or say "no" (penalty). Over many trials, the dog learns which actions lead to treats. There's no labeled dataset; there's a goal and feedback.
Mechanics:
Real examples:
The intuition: You have 1,000 medical X-rays carefully labeled by a radiologist (very expensive!) and 100,000 unlabeled X-rays sitting in a database. Throwing away the unlabeled ones is wasteful. Semi-supervised learning uses both.
Mechanics:
Real examples:
The "problem type" answers: What shape is the output?
This is different from "type of ML." A supervised problem can be classification or regression; those are different problem types.
Output is discrete (a finite set of choices).
| Input | Output |
|---|---|
| Email text | Spam / Not spam |
| Transaction | Fraud / Legitimate |
| X-ray image | Disease / No disease / Inconclusive |
Output is continuous (a number on a scale).
| Input | Output |
|---|---|
| House details | Price ($427,500) |
| Last 30 days of sales | Revenue forecast ($1.2M) |
| Weather data | Energy demand (450 MWh) |
Key distinction from classification: "Will this customer churn?" = classification. "How many days until they churn?" = regression.
No labels exist. Model finds groups.
Examples:
Find rare events that don't match the normal pattern.
Examples:
Why it's special: You usually have very few anomaly examples, so pure supervised learning struggles. It's often handled as an unsupervised or one-class problem.
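To make this concrete, here is a minimal unsupervised sketch using scikit-learn's IsolationForest (a stand-in for the same idea; SageMaker's built-in equivalent is Random Cut Forest). The data is synthetic and purely illustrative.

```python
# Sketch: unsupervised anomaly detection with IsolationForest (scikit-learn).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=50, scale=5, size=(995, 2))    # typical transactions
weird = rng.uniform(low=200, high=300, size=(5, 2))    # a few rare outliers
X = np.vstack([normal, weird])

detector = IsolationForest(contamination=0.01, random_state=0)  # expect ~1% anomalies
flags = detector.fit_predict(X)                                  # -1 = anomaly, 1 = normal
print((flags == -1).sum(), "points flagged as anomalous")
```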
Predict what a user will like based on their history and similar users' behavior.
Examples:
| Problem | ML Type | Problem Type |
|---|---|---|
| Spam filter | Supervised | Classification |
| Predict tomorrow's stock price | Supervised | Regression |
| Group products by similarity | Unsupervised | Clustering |
| Find rare network attacks | Unsupervised (usually) | Anomaly detection |
| "You may also like..." | Specialized | Recommendation |
This is the lifecycle of every ML system. Memorize the order.
Gather raw data: databases, logs, images, audio, IoT sensors, clickstream, public datasets.
Clean the raw data so the model can use it.
| Task | What it does |
|---|---|
| Remove duplicates | Same row twice will mislead training |
| Handle missing values | Fill (imputation) or drop |
| Normalize / standardize | Scale numeric values (e.g., height in cm and salary in $ are on wildly different scales, which is bad for many models) |
| Convert formats | Image → tensor, text → tokens |
| Remove noise | Drop irrelevant or corrupted rows |
| Encode categories | "Country = India" → one-hot or embedding |
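A minimal preprocessing sketch with pandas and scikit-learn covering imputation, scaling, and one-hot encoding; the column names and values are hypothetical.

```python
# Sketch: impute missing values, scale numerics, one-hot encode a category.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "height_cm": [170, 182, None, 165],            # has a missing value
    "salary": [40_000, 85_000, 60_000, 52_000],    # wildly different scale from height
    "country": ["India", "US", "India", "DE"],     # categorical
})

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # handle missing values
    ("scale", StandardScaler()),                   # normalize / standardize
])
preprocess = ColumnTransformer([
    ("num", numeric, ["height_cm", "salary"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["country"]),  # encode categories
])

X = preprocess.fit_transform(df)
print(X.shape)  # 4 rows: 2 scaled numeric columns + 3 one-hot columns
```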
Transform raw fields into useful features that help the model learn.
Why features matter: A model can't extract every pattern on its own. Smart features make patterns obvious.
Examples:
- date_of_birth = 1995-03-12 → feature: age = 30
- transaction_timestamp → features: hour_of_day, day_of_week, is_weekend
- address → features: city, pincode, is_metro
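A small pandas sketch of the same idea, deriving age and time-of-day features from raw fields; the field names mirror the examples above and the reference date is made up.

```python
# Sketch: feature engineering with pandas (field names and dates are illustrative).
import pandas as pd

df = pd.DataFrame({
    "date_of_birth": pd.to_datetime(["1995-03-12", "1988-07-30"]),
    "transaction_timestamp": pd.to_datetime(["2025-01-04 21:15", "2025-01-06 09:05"]),
})

now = pd.Timestamp("2025-06-01")
df["age"] = (now - df["date_of_birth"]).dt.days // 365        # raw DOB -> age
df["hour_of_day"] = df["transaction_timestamp"].dt.hour       # time-of-day pattern
df["day_of_week"] = df["transaction_timestamp"].dt.dayofweek  # 0 = Monday
df["is_weekend"] = df["day_of_week"] >= 5                     # weekend flag
print(df[["age", "hour_of_day", "day_of_week", "is_weekend"]])
```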
The model learns by adjusting its internal parameters to minimize prediction error.

| Component | Meaning |
|---|---|
| Algorithm | The learning method (e.g., XGBoost, neural network, k-means) |
| Training data | Examples used to teach |
| Parameters | Internal numbers the model learns (weights, biases) |
| Loss function | Measures how wrong the predictions are |
| Optimizer | Adjusts parameters to reduce loss (e.g., gradient descent) |
| Hyperparameters | Settings YOU choose before training (learning rate, batch size) |
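A toy NumPy sketch of how these pieces fit together: a loss function (MSE), an optimizer (plain gradient descent), learned parameters (w, b), and a hyperparameter you choose (the learning rate). The data is synthetic.

```python
# Toy training loop: learn w and b for y = w*x + b with gradient descent.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 5.0 + rng.normal(0, 1, size=100)  # ground truth: w=3, b=5, plus noise

w, b = 0.0, 0.0              # parameters: learned by the optimizer
learning_rate = 0.01         # hyperparameter: chosen by you before training

for _ in range(2000):        # training loop
    pred = w * x + b
    error = pred - y
    loss = np.mean(error ** 2)           # loss function: mean squared error
    grad_w = 2 * np.mean(error * x)      # gradient of the loss w.r.t. w
    grad_b = 2 * np.mean(error)          # gradient of the loss w.r.t. b
    w -= learning_rate * grad_w          # optimizer step: nudge parameters
    b -= learning_rate * grad_b          # in the direction that reduces loss

print(round(w, 2), round(b, 2))          # close to 3.0 and 5.0
```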
Test the trained model on data it has never seen. This tells you if it actually generalizes or if it just memorized.
Metrics depend on the problem type β we'll cover these in detail in Topic 10.
Put the model into production so users / apps can use it.
| Type | When to use |
|---|---|
| Real-time inference | Need an answer in <1 second (chatbot, fraud check) |
| Batch inference | Score millions of records overnight (no rush) |
| Async inference | Large input, takes minutes to process (video analysis) |
| Edge deployment | Model runs on the device itself (phone, IoT sensor) |
After deployment, watch for problems:
| Issue | What it means |
|---|---|
| Data drift | Input data has changed (e.g., a new product line the model never saw) |
| Model drift / concept drift | Relationship between input and output has changed (e.g., COVID changed shopping patterns) |
| Bias | Model treats groups unfairly |
| Latency | Predictions getting slower |
| Cost | Endpoint is too expensive |
| Errors | Failed inference requests |
When monitoring detects problems, you retrain the model with new data. This is the loop.
Adding the correct answer to each training example. Required for supervised learning.
AWS service: SageMaker Ground Truth. It coordinates human labelers (Mechanical Turk-style) and uses ML to speed up the process (active learning + auto-labeling).
You split your dataset into 3 parts:
| Split | Purpose | Typical size |
|---|---|---|
| Training set | Model learns from this | 70–80% |
| Validation set | Tune hyperparameters; pick best model | 10–15% |
| Test set | Final unbiased evaluation, used once | 10–15% |
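A minimal scikit-learn sketch of a 70/15/15 split; the dataset here is just a convenient built-in example.

```python
# Sketch: split a dataset into train / validation / test (roughly 70/15/15).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# First carve out 30% that the model never trains on...
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.30, random_state=42)
# ...then split that 30% in half: validation (tune) and test (final, used once).
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.50, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # roughly 70% / 15% / 15% of the rows
```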
The problem: Fraud detection has 1% fraud, 99% legit. A naive classifier that always predicts "legit" gets 99% accuracy and is useless.
Fixes:
| Technique | What it does |
|---|---|
| Oversampling minority class | Duplicate (or synthesize) fraud examples |
| Undersampling majority class | Drop some legit examples |
| SMOTE | Generates synthetic minority samples |
| Class weights | Tell the model "mistakes on fraud cost 99x more" |
| Use F1 / Recall instead of accuracy | Fix the metric, not just the data |
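A short scikit-learn sketch of the class-weights fix on synthetic ~1% "fraud" data; the numbers and separability are illustrative.

```python
# Sketch: class weights on imbalanced data; judge with recall, not accuracy.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic data: ~1% positive ("fraud") class.
X, y = make_classification(n_samples=20_000, weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

# The weighted model typically catches more of the rare positives (higher recall),
# at the cost of more false alarms.
print("recall (plain):   ", recall_score(y_te, plain.predict(X_te)))
print("recall (weighted):", recall_score(y_te, weighted.predict(X_te)))
```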
Create modified versions of existing data to expand your training set artificially.
For images: rotate, crop, flip, change brightness, add noise, zoom.
For text: synonym replacement, back-translation, random word dropout.
For audio: time-stretching, pitch-shifting, adding background noise.
Purpose: Reduces overfitting and helps generalization. The model sees more "variety" without you collecting new data.
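A minimal image-augmentation sketch, assuming torchvision is available; the exact transforms and settings are illustrative.

```python
# Sketch: common image augmentations applied on the fly during training (torchvision).
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                # flip
    transforms.RandomRotation(degrees=15),                 # rotate
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # change brightness / contrast
    transforms.RandomResizedCrop(size=224),                # crop + zoom
    transforms.ToTensor(),
])

# Typically passed to a dataset so each epoch sees slightly different images, e.g.:
# dataset = torchvision.datasets.ImageFolder("train/", transform=augment)
```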
This distinction shows up constantly.
Numbers the model learns automatically during training.
Examples:
You don't set these. The optimizer does.
Numbers YOU choose before (or during) training.
Examples:
You tune these. This is called hyperparameter tuning: try different combinations and see which gives the best validation performance.
| Type | Set by | Example |
|---|---|---|
| Parameter | Model (during training) | Neural network weights |
| Hyperparameter | You (before training) | Learning rate |
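A tiny scikit-learn sketch of the distinction: C and max_iter are hyperparameters you pick up front; coef_ and intercept_ are parameters the optimizer learns.

```python
# Sketch: hyperparameters are set before training; parameters come out of training.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameters: chosen by you before training.
model = LogisticRegression(C=0.1, max_iter=5000)

model.fit(X, y)

# Parameters: learned by the optimizer during training.
print(model.coef_.shape)   # learned weights, one per feature
print(model.intercept_)    # learned bias
```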
The model learns.
| Factor | Training |
|---|---|
| Compute | Very high (GPUs) |
| Time | Long (hours to weeks) |
| Cost | High |
| Data | Large dataset |
| Output | A trained model |
| Frequency | Periodic (once, then retrained) |
The model predicts on new inputs.
| Factor | Inference |
|---|---|
| Compute | Lower (but scales with traffic) |
| Time | Should be fast |
| Cost | Depends on volume |
| Data | One input at a time (or a batch) |
| Output | A prediction |
| Frequency | Continuous / on-demand |
| Need | AWS option |
|---|---|
| Immediate (<1s) response | Real-time inference (SageMaker Real-time Endpoint) |
| Score a huge dataset offline | Batch Transform |
| Large payloads / long processing, async OK | Async Inference |
| Sporadic, unpredictable traffic | Serverless inference |
This mapping is on the exam in many forms. Memorize it.
This is the #1 ML concept. The exam will test it many ways.
The intuition: A student who memorizes the answer key but doesn't understand the subject. Aces practice tests, fails the real exam.
The model: Learns the noise in the training data, not just the pattern. Performs great on training data, poorly on new data.
Signs:
Causes:
Fixes (memorize these; an exam favorite):
| Fix | How it helps |
|---|---|
| More training data | Harder to memorize a bigger set |
| Regularization (L1, L2) | Penalizes overly complex models |
| Reduce model complexity | Smaller network, fewer features |
| Dropout (neural nets) | Randomly turns off neurons during training |
| Early stopping | Stop training when validation accuracy plateaus |
| Cross-validation | Better estimate of true performance |
| Data augmentation | Create variations of existing data |
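A minimal Keras sketch showing three of these fixes in one small model (L2 regularization, dropout, early stopping), assuming TensorFlow is installed; the layer sizes and settings are illustrative.

```python
# Sketch: L2 regularization + dropout + early stopping in a small Keras model.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        64, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(0.01)),  # L2: penalize large weights
    tf.keras.layers.Dropout(0.5),                            # dropout: randomly disable neurons
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(               # early stopping: halt when
    monitor="val_loss", patience=3, restore_best_weights=True)  # validation stops improving

# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, callbacks=[early_stop])
```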
The intuition: A student who studied only the chapter titles. Bad at practice tests, bad at the real exam.
The model: Too simple to capture the pattern. Performs poorly on both training and test data.
Signs:
Causes:
Fixes:
Closely related to overfitting/underfitting, but a deeper view.
Error from oversimplified assumptions. The model is too rigid to capture reality.
Example: Trying to fit a straight line to data that's shaped like a curve. No matter how much data you give it, a line can't bend.
High bias = underfitting.
Error from being too sensitive to the specific training set. Tiny changes in training data → wildly different model.
Example: A super-flexible squiggly curve that passes through every training point exactly. Slightly different data → totally different squiggle.
High variance = overfitting.
You can't easily have zero bias AND zero variance. Reducing one often increases the other.
| Situation | Bias | Variance | Outcome |
|---|---|---|---|
| Underfitting | High | Low | Model too simple |
| Overfitting | Low | High | Model too complex |
| Sweet spot | Low | Low | Generalizes well ✅ |
The art of ML is finding the sweet spot.
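A small scikit-learn sketch of the tradeoff: fit polynomials of increasing degree to the same noisy curve and compare training vs test scores. The data is synthetic and the degrees are illustrative.

```python
# Sketch: bias vs variance, shown by varying polynomial degree.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, size=30)   # curved ground truth + noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree,
          "train R2:", round(model.score(X_tr, y_tr), 2),
          "test R2:", round(model.score(X_te, y_te), 2))
# degree 1: poor on both (high bias, underfits: a line can't bend)
# degree 15: near-perfect on train, worse on test (high variance, overfits)
# a middle degree lands nearest the sweet spot for this data
```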
This is where many candidates lose points. Master this section.
For binary classification, every prediction lands in one of 4 boxes:
|  | Predicted Positive | Predicted Negative |
|---|---|---|
| Actually Positive | True Positive (TP) ✅ | False Negative (FN) ❌ (missed it) |
| Actually Negative | False Positive (FP) ❌ (false alarm) | True Negative (TN) ✅ |
Memorize this. All metrics derive from it.
Plain English: Of all predictions, how many were right?
Use when: Classes are roughly balanced (50/50 or close).
Plain English: Of everything I flagged as positive, how many actually were?
Use when: False positives are expensive. You want to be sure when you say "positive."
Classic example (spam filter):
Plain English: Of all the actual positives out there, how many did I catch?
Use when: False negatives are expensive. You can't afford to miss a positive case.
Classic example (cancer screening):
You can usually trade one for the other by adjusting the decision threshold.
Choose based on what's costlier in your domain.
Plain English: Harmonic mean of precision and recall. A single number that balances both.
Use when:
ROC curve: Plots Recall vs False Positive Rate at every possible threshold.
AUC (Area Under Curve): A single number from 0 to 1.
Use when:
Plain English: "How well does the model separate positives from negatives, regardless of where I draw the line?"
| Metric | Meaning |
|---|---|
| MAE (Mean Absolute Error) | Average of \|prediction - actual\| |
| MSE (Mean Squared Error) | Average of (prediction - actual)²; penalizes big errors more |
| RMSE | Square root of MSE; same units as the target |
| R² | How much of the variance the model explains (0 to 1) |
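The same idea for regression, on a handful of made-up house prices:

```python
# Sketch: regression metrics (scikit-learn).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [410_000, 250_000, 330_000, 515_000]   # actual house prices
y_pred = [427_500, 240_000, 300_000, 505_000]   # model predictions

mae = mean_absolute_error(y_true, y_pred)       # average size of the error
mse = mean_squared_error(y_true, y_pred)        # squares errors, so big misses dominate
rmse = np.sqrt(mse)                             # back in the target's units (dollars)
r2 = r2_score(y_true, y_pred)                   # share of variance explained

print(mae, rmse, round(r2, 3))
```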
| Scenario | Best Metric |
|---|---|
| Balanced classes | Accuracy |
| False positive is dangerous | Precision |
| False negative is dangerous | Recall |
| Imbalanced data, balance both | F1 |
| Compare binary classifiers, threshold-independent | AUC-ROC |
| Spam filter | Precision |
| Medical diagnosis / cancer detection | Recall |
| Fraud detection | F1 or Recall |
| Regression problem | RMSE / MAE / R² |
You don't need to derive backprop. You need to know what each piece does.
A single neuron computes a weighted sum of its inputs plus a bias, then passes the result through an activation function: output = activation(w·x + b).
| Layer | Job |
|---|---|
| Input layer | Receives raw features |
| Hidden layer(s) | Learn intermediate patterns (more layers = "deeper" = deep learning) |
| Output layer | Produces the final prediction |
Without activation functions, a neural network is just a glorified linear model. Activations let it learn curves and complex shapes.
| Activation | When used |
|---|---|
| ReLU (Rectified Linear Unit) | Default for hidden layers in most modern networks |
| Sigmoid | Output layer for binary classification (squashes to 0–1, interpretable as probability) |
| Softmax | Output layer for multi-class classification (gives probabilities that sum to 1) |
| Tanh | Older networks, some sequence models |
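A NumPy sketch of what one neuron computes and what the common activations do; the weights and inputs are made up.

```python
# Sketch: one neuron = weighted sum + bias, then an activation.
import numpy as np

def relu(z):
    return np.maximum(0, z)           # default for hidden layers

def sigmoid(z):
    return 1 / (1 + np.exp(-z))       # binary output: squashes to a 0-1 "probability"

def softmax(z):
    e = np.exp(z - np.max(z))         # multi-class output: probabilities that sum to 1
    return e / e.sum()

x = np.array([0.5, -1.2, 3.0])        # inputs (features)
w = np.array([0.8, 0.1, -0.4])        # weights (learned parameters)
b = 0.2                               # bias (learned parameter)

z = np.dot(w, x) + b                  # weighted sum of inputs plus bias
print(relu(z), sigmoid(z))            # the activation turns the sum into the neuron's output
print(softmax(np.array([2.0, 1.0, 0.1])))   # e.g. three class probabilities summing to 1
```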
The intuition: The network makes a prediction. You compare it to the truth. The error gets propagated backward through the network, and each weight is nudged in the direction that would have reduced the error.
This is how a network learns.
Three families dominate the exam.
Best for: images.
Why: A regular neural network treats each pixel independently. A CNN uses convolutional filters that detect local patterns (edges, textures, shapes) and stacks them into higher-level concepts (eye → face).
Use cases:
Best for: sequential / time-ordered data.
Why: RNNs have a "memory": output at time t depends on input at time t AND the hidden state from time t-1. This lets them process sequences.
Use cases:
Weakness: Struggles with long-range dependencies. If a sentence is 50 words long, by the time the RNN reaches word 50, it has largely forgotten word 1. Transformers fixed this.
Best for: language and generative AI. The architecture behind every modern LLM (GPT, Claude, Llama, etc.).
Why they dominate: Instead of processing tokens one at a time like RNNs, transformers use an attention mechanism that lets every token directly look at every other token. This means:
Use cases:
| Data shape | Use |
|---|---|
| Images, video | CNN |
| Time series, sensor streams | RNN (or modern variants) |
| Text, language, generative tasks | Transformer |
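To make the attention idea concrete, here is a tiny NumPy sketch of scaled dot-product attention (the core operation inside a transformer); the sequence length, dimensions, and values are illustrative.

```python
# Sketch: scaled dot-product attention. Every token scores every other token.
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d = 4, 8                       # 4 tokens, 8-dimensional representations
rng = np.random.default_rng(0)
Q = rng.normal(size=(seq_len, d))       # queries
K = rng.normal(size=(seq_len, d))       # keys
V = rng.normal(size=(seq_len, d))       # values

scores = Q @ K.T / np.sqrt(d)           # each token's similarity to every other token
weights = softmax(scores, axis=-1)      # attention weights; each row sums to 1
output = weights @ V                    # each token's output mixes information from all tokens
print(weights.round(2))
```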
MLOps = DevOps applied to ML. Building and shipping models reliably.
Track every version of every model.
Why:
| Stage | Meaning |
|---|---|
| CI (Continuous Integration) | Test your code, data pipelines, and model training scripts automatically on every change |
| CD (Continuous Delivery) | Deploy new models safely (canary, blue-green) |
| CT (Continuous Training) | Retrain automatically when triggers fire |
| Monitor | Why |
|---|---|
| Accuracy / business metric | Is the model still useful? |
| Latency | Is inference still fast? |
| Errors | Are requests failing? |
| Data drift | Has input distribution shifted? |
| Model drift | Has prediction quality declined? |
| Bias | Is the model fair across groups? |
You retrain when:
AWS service: SageMaker Pipelines automates this whole loop.
SageMaker AI is AWS's flagship ML platform. It covers the full ML lifecycle: prepare data, build, train, tune, deploy, monitor.
The exam will test which feature of SageMaker handles which part of the lifecycle. Memorize these.
Three levels of control:
| Option | Control | Effort | When to use |
|---|---|---|---|
| Built-in algorithms | Low | Low | Standard problems (XGBoost, K-Means, etc.) |
| Script mode | Medium | Medium | Bring your own TensorFlow / PyTorch / scikit-learn script, SageMaker manages infrastructure |
| Custom containers | High | High | Full control: bring your own Docker image |
Common built-in algorithms:
| Problem | Built-in algorithm |
|---|---|
| Classification | XGBoost |
| Regression | Linear Learner |
| Clustering | K-Means |
| Anomaly detection | Random Cut Forest |
| Recommendation | Factorization Machines |
| Time series | DeepAR |
| Image classification | ResNet (built-in) |
| Topic modeling | LDA, Neural Topic Model |
| Deployment | When | Examples |
|---|---|---|
| Real-time Endpoint | Need an answer in <1 sec | Fraud check at payment, chatbot, product recs on a website |
| Batch Transform | Score huge dataset offline, no rush | Score all customers overnight, monthly churn batch |
| Async Inference | Large payload OR long processing, async OK | Video analysis, large document processing, 1GB inputs |
| Serverless Inference | Sporadic / unpredictable traffic, want to skip server management | Internal tool used a few times a day |
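A minimal sketch with the SageMaker Python SDK: train the built-in XGBoost algorithm and deploy a real-time endpoint. The role ARN, S3 paths, instance types, and data-format details are placeholders, not a production recipe.

```python
# Sketch: built-in XGBoost training job + real-time endpoint (SageMaker Python SDK).
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"   # placeholder role ARN

xgb_image = image_uris.retrieve("xgboost", region=session.boto_region_name, version="1.7-1")

estimator = Estimator(
    image_uri=xgb_image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/models/",                        # placeholder bucket
    hyperparameters={"objective": "binary:logistic", "num_round": 100},
)
estimator.fit({"train": "s3://my-bucket/train/"})                # training job on S3 data

# Real-time inference: a persistent endpoint for sub-second responses.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
# For offline bulk scoring instead, create a Batch Transform job via estimator.transformer(...).
```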
Exam pattern:
These are API-based services. You don't train a model. You send data → AWS returns the AI result. Think of them as "AI as a service."
| Service | Purpose |
|---|---|
| Amazon Bedrock | Managed access to foundation models (Claude, Llama, Titan, Stable Diffusion, etc.) for generative AI apps. Pair with Knowledge Bases for RAG. |
| Amazon Q | Generative AI assistant for businesses (Q Business) and developers (Q Developer). |
| Amazon Lookout for Vision | Industrial defect detection in images |
| Amazon Lookout for Equipment | Predictive maintenance from sensor data |
| Amazon CodeWhisperer / Q Developer | AI code suggestions in IDE |
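A minimal boto3 sketch of the "send data → get result" pattern with two of these services; the bucket and object names are placeholders, and AWS credentials must already be configured.

```python
# Sketch: calling pre-built AI services directly via boto3.
import boto3

# Comprehend: sentiment / entities / key phrases in text.
comprehend = boto3.client("comprehend")
sentiment = comprehend.detect_sentiment(
    Text="The delivery was late and the package was damaged.",
    LanguageCode="en",
)
print(sentiment["Sentiment"])            # e.g. NEGATIVE

# Rekognition: detect objects in an image stored in S3 (placeholder bucket/key).
rekognition = boto3.client("rekognition")
labels = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-bucket", "Name": "photo.jpg"}},
    MaxLabels=5,
)
print([label["Name"] for label in labels["Labels"]])   # detected objects
```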
This is the table you must know cold. Every exam has 5+ questions that map directly to this.
| Use case | AWS service |
|---|---|
| Detect objects / faces in images or videos | Rekognition |
| Moderate unsafe image / video content | Rekognition |
| Extract text from scanned documents | Textract |
| Extract tables / forms / key-value from invoices | Textract |
| Convert speech / audio to text | Transcribe |
| Convert text to natural-sounding speech | Polly |
| Translate text between languages | Translate |
| Sentiment / entities / key phrases in text | Comprehend |
| Build chatbot / voice bot | Lex |
| Forecast future demand / sales | Forecast |
| Recommend products / content to users | Personalize |
| Search across enterprise documents | Kendra |
| Detect fraud | Fraud Detector (legacy) or SageMaker |
| Detect anomalies in business metrics | Lookout for Metrics |
| Build / train / deploy custom ML model | SageMaker AI |
| Label training data | SageMaker Ground Truth |
| Auto-build ML models | SageMaker Autopilot |
| Detect bias / explain predictions | SageMaker Clarify |
| Monitor drift in production model | SageMaker Model Monitor |
| Store / reuse ML features | SageMaker Feature Store |
| Use pretrained / foundation models | SageMaker JumpStart (or Bedrock for hosted FMs) |
| Automate ML workflow | SageMaker Pipelines |
| Real-time predictions | SageMaker Real-time Endpoint |
| Offline bulk predictions | SageMaker Batch Transform |
| Long-running / large-payload inference | SageMaker Async Inference |
| Generative AI app using foundation models | Amazon Bedrock |
| Detect manufacturing defects in images | Lookout for Vision |
| Predictive maintenance from sensors | Lookout for Equipment |
Accuracy is almost never the correct answer on the exam. If the dataset is imbalanced (fraud, disease, etc.), use F1, Recall, or AUC-ROC.
Mnemonic: Transcribe writes down what was said. Polly reads things out loud.
| Situation | Choose |
|---|---|
| There's a ready API for this exact task | Pre-built AI service |
| You need a custom model on your own data | SageMaker AI |
| You want full ML lifecycle control | SageMaker AI |
| Team has no ML expertise | Pre-built AI service or Autopilot / Canvas |
A common wrong answer: "add regularization." That's the fix for overfitting. For underfitting: reduce regularization, add model complexity, or train longer.
The first answer is rarely "tune the model"; it's "collect more representative data" or "use SageMaker Clarify to detect and mitigate bias."
| Concept | Remember this |
|---|---|
| Supervised | Labeled data |
| Unsupervised | No labels, find patterns |
| Reinforcement | Agent learns by reward |
| Semi-supervised | Few labels + many unlabeled |
| Classification | Predict category |
| Regression | Predict number |
| Clustering | Group similar things |
| Anomaly detection | Find unusual behavior |
| Recommendation | Suggest relevant items |
| Overfitting | Memorized training data, fails on new |
| Underfitting | Too simple to learn |
| Bias | Too simple assumptions (underfit) |
| Variance | Too sensitive to training data (overfit) |
| Precision | Avoid false positives |
| Recall | Avoid false negatives |
| F1 | Balance precision and recall |
| AUC-ROC | Class separation ability |
| CNN | Images |
| RNN | Sequences |
| Transformer | Language / GenAI / LLMs |
| Parameter | Model learns |
| Hyperparameter | You tune |
| Training | Model learns |
| Inference | Model predicts |
| Real-time inference | <1 second response |
| Batch Transform | Offline, large dataset |
| Async Inference | Large payload, long processing |
| Ground Truth | Label data |
| Autopilot | AutoML |
| Clarify | Bias / explainability |
| Model Monitor | Drift monitoring |
| Feature Store | Reuse features across teams |
| JumpStart | Pretrained models, foundation models |
| Pipelines | ML workflow automation |
| Bedrock | Foundation models as a service |
| Rekognition | Image / video |
| Textract | Document / OCR |
| Comprehend | NLP analysis |
| Transcribe | Speech → Text |
| Polly | Text → Speech |
| Translate | Language translation |
| Lex | Chatbots |
| Forecast | Time-series forecasting |
| Personalize | Recommendations |
| Kendra | Enterprise search |
| Lookout for Metrics | Anomalies in KPIs |
You will not fail this exam on definitions. You will fail it on service selection under pressure. So focus accordingly.
Answer these in your head. If you stumble, re-read the relevant topic.
If you got 8+ correct, you're in great shape for Day 1. Don't try to do practice MCQs yet; let this material settle. Tomorrow, hit practice questions hard.
End of Day 1 guide. Re-read Topics 19 + 21 right before sleep; the last thing you read sticks best.