Comprehensive Deep Learning and Neural Networks Concepts and Applications

Computer Science - Artificial Intelligence

Created by user_hodr, 7 months ago

Cards in this deck (46)
Compared to classical machine learning, deep learning often requires _____
To reduce the loss step by step, in which direction does the gradient descent algorithm step at every iteration?
Which claim is correct when choosing a learning rate for training a neural network?
Which claim is correct regarding the choice of learning rate based on the gradient's absolute value?
Which claim is WRONG about the use of labeled data in different learning paradigms?
Which claim is WRONG about the branches of deep learning?
When updating parameters using gradient descent, which way of calculating loss works better for efficiency and robustness?
In mini-batch SGD training, why is it important to shuffle the training data before every epoch?
Logistic Regression is widely used to solve which type of problem?
In information theory, which event carries more information?
Which statement is true about activation functions in neural networks?
Which case is an example of overfitting in a machine learning model?
What approach could be used to handle overfitting in a machine learning model?
All regularizations (e.g., L1 norm, L2 norm) penalize larger parameters. Is this statement true or false?
Besides penalizing larger parameters, which regularization makes parameters more sparse?
In Backpropagation, which claim is true about the use of information in forward and backward passes?
As an activation function, does tanh avoid the vanishing gradient problem?
As an activation function, does ReLU solve the vanishing gradient problem?
About SGD optimization, which statement is NOT correct?
Which statement about the learning rate in Stochastic Gradient Descent (SGD) optimization is correct?
Does MaxPooling preserve detected features and downsample the feature map (image)?
If the input volume of an image is 227x227x3, and we apply 96 11x11 filters with stride 4, how many parameters are there?
In a CNN, can two convolutional layers be connected directly without a pooling layer in between?
In CNN design, does the fully connected layer usually contain more parameters than the convolutional layers?
What is the purpose of the ReLU activation function in a CNN?
Which statement is true about the convolution layer in neural networks?
What is the main advantage of using dropout in a CNN?
Given two stacked dilated convolution layers, each with kernel size 3x3, stride 1, and dilation 2, what is the size of the receptive field?
Given a convolution layer with 3 input channels, 64 output channels, kernel size 4x4, stride 2, dilation 3, and padding 1, what is the parameter size?
In PyTorch, which layer configuration halves the input size?
In the design of an auto-encoder, should the encoder and decoder follow the exact same structure?
How can you identify activation in relation to function and gradient?
How is an autoencoder model usually trained?
Which claim is true about attention and self-attention in neural networks?
What's the major purpose of multi-head attention in neural networks?
In the transformer neural network architecture, do the encoder blocks usually use the identical neural network structure?
In the transformer neural network architecture, the output of the final encoder block will go to _____
In an autoregressive model, does the output variable at the current step depend only on the hidden states at all previous steps?
In the Transformer, how does the decoder use the information (features) from the encoder?
In the policy gradient approach for reinforcement learning, the reward R(τ^n) is considered based on _____
In the two major approaches of reinforcement learning, which is usually more sample-efficient?
In Q-Learning, which method is more scalable for predicting a Q-value for a pair of (state, action)?
Describe the process of training in relation to model and layer?
In the two major approaches of reinforcement learning, which one is on-policy training?
For discrete-event modeling, what approach do we often use?
Why is using stochastic gradient descent to train a generator based on the following loss function inefficient?
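
The gradient descent and learning-rate cards above can be illustrated with a minimal sketch. It is not taken from the deck: the toy loss f(w) = (w - 3)^2, the function names gradient_descent and toy_grad, and the two learning rates are all assumptions chosen for illustration. Each iteration moves the parameter against the gradient, and the learning rate scales the size of that step.

```python
def gradient_descent(grad, w0, lr, steps):
    """Repeatedly step against the gradient: w <- w - lr * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


# Hypothetical toy loss f(w) = (w - 3)^2; its gradient is 2 * (w - 3)
# and its minimum is at w = 3.
def toy_grad(w):
    return 2.0 * (w - 3.0)


# Same start and step count, two assumed learning rates.
w_small_lr = gradient_descent(toy_grad, w0=0.0, lr=0.1, steps=100)
w_large_lr = gradient_descent(toy_grad, w0=0.0, lr=1.1, steps=100)

print(abs(w_small_lr - 3.0))  # distance to the minimum after a small learning rate
print(abs(w_large_lr - 3.0))  # distance to the minimum after a large learning rate
```

On this particular toy loss, every step multiplies the distance to the minimum by |1 - 2 * lr|, so the run with lr = 0.1 converges and the run with lr = 1.1 overshoots further on each step. Real networks have no such closed form, so this only shows the general trade-off.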
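
The convolution cards above ask about parameter counts, output sizes, and receptive fields. The arithmetic behind them can be sketched as below. The helper names conv_param_count, conv_output_size, and receptive_field are hypothetical, and the sketch assumes ordinary (non-grouped) convolutions with one bias per filter. The deck's own answer choices are not visible here, so it may count parameters differently (for example, without biases).

```python
def conv_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a standard 2D convolution (assumes groups=1).

    Stride, padding, and dilation do not change the count.
    """
    kh, kw = kernel_size
    weights = out_channels * in_channels * kh * kw
    return weights + (out_channels if bias else 0)


def conv_output_size(n, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a convolution."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def receptive_field(layers):
    """Receptive field along one dimension of stacked conv layers.

    `layers` is a list of (kernel, stride, dilation) tuples, input side first.
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride  # spacing between adjacent outputs, in input pixels
    return rf


# 96 filters of size 11x11 over a 3-channel input.
print(conv_param_count(3, 96, (11, 11), bias=False))  # 34848 weights
print(conv_param_count(3, 96, (11, 11)))              # 34944 with biases

# 3 input channels, 64 output channels, 4x4 kernel.
print(conv_param_count(3, 64, (4, 4)))                # 3136 with biases

# Two stacked 3x3 layers, stride 1, dilation 2.
print(receptive_field([(3, 1, 2), (3, 1, 2)]))        # 9

# One assumed configuration that halves a 32-pixel side: 3x3, stride 2, padding 1.
print(conv_output_size(32, 3, stride=2, padding=1))   # 16
```

The output-size formula follows the one given in the PyTorch Conv2d documentation. The halving configuration is only one of several that work for even input sizes.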
