Unit 3 RNN
1. Introduction to Sequential Data and Time Series
1.1 What is Sequential Data?
Sequential data refers to data that is ordered in time or position.
Unlike traditional datasets where individual samples are independent of each other, in sequential data the order of elements matters because each element depends (partly or fully) on previous ones.
• Example in text:
“The dog chased the cat.”
If we shuffle the words, the meaning changes completely.
• Example in finance:
Today’s stock price depends on yesterday’s price and trends.
• Example in healthcare:
Heart rate patterns (ECG) are meaningful only when observed over time.
Key property: Past influences the future.
1.2 Types of Sequential Data
1. Time Series Data
o Data collected over time at equal or unequal intervals.
o Examples: stock prices, daily temperature, electricity consumption.
o Usually represented as:
x_1, x_2, x_3, …, x_T
where T = number of time steps (see the code sketch after this list).
2. Event Sequences
o Data generated by a sequence of events (not necessarily evenly spaced).
o Examples: user clicks on a website, network log events, earthquake occurrences.
3. Natural Language Sequences
o Words or characters arranged in a meaningful order.
o Examples: sentences, paragraphs, speeches.
o Contextual dependency is very strong (meaning depends on word order).
4. Biological Sequences
o Sequences in biology and medicine.
o Examples: DNA/RNA sequences, amino acid chains, ECG/EEG signals.
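For illustration, here is a minimal NumPy sketch of how a univariate time series x_1, …, x_T can be stored as an array and sliced into sliding windows of past values paired with the next value (the numbers and the window size of 3 are made up):
import numpy as np

# A univariate time series x_1, ..., x_T (values are made up, e.g., daily temperatures)
series = np.array([21.0, 22.5, 23.1, 22.8, 24.0, 25.2, 24.7, 23.9])

# Turn it into supervised pairs: a window of the last 3 values predicts the next value
window = 3
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

print(X.shape, y.shape)  # (5, 3) (5,)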
1.3 Challenges in Sequential Data
1. Dependency
o Future values depend on past ones.
o Example: predicting the next word in a sentence requires context from previous words.
2. Noise
o Real-world sequential data often contains random fluctuations.
o Example: sudden stock market drops due to news events.
3. Irregular Spacing
o Not all data comes at fixed intervals.
o Example: hospital visits (patients may come irregularly).
4. Long-Range Dependencies
o Sometimes distant past events affect current outcomes.
o Example: In a story, understanding the ending may require recalling the beginning.
5. High Dimensionality
o Many sequences involve multiple features.
o Example: weather data (temperature, humidity, wind speed, pressure).
1.4 Examples
1. Stock Market Data
o Each day’s stock price depends on previous days’ values, economic conditions, and news.
o If plotted over time, we see trends, seasonality, and sudden fluctuations.
2. ECG (Electrocardiogram) Signals
o ECG records heart activity over time.
o Doctors analyze sequences of peaks and troughs to detect abnormalities.
o Machine learning models trained on sequential ECG data can predict arrhythmias.
1.5 Diagrams
Diagram 1: Stock Price Time Series
Diagram 2: Sequential Text Flow
2. Recurrent Neural Networks (RNNs)
2.1 Why Feedforward Networks Fail for Sequences
A traditional Feedforward Neural Network (FNN) assumes that:
• Inputs are independent of each other.
• Order of inputs does not matter.
This assumption breaks down for sequential data, because:
1. Context is important – The meaning of a word depends on previous words. Example: in “I am going to the bank”, “bank” could mean a river bank or a financial bank depending on context.
2. Fixed-size input/output – FNNs expect fixed-size input vectors, but sequences may be long or short.
3. No memory – FNNs cannot remember past inputs to influence future outputs.
Example: In predicting the next word:
• Input sequence = “The cat sat on the …”
• A feedforward net cannot use the order/context of previous words effectively.
Hence, we need models with memory of past inputs → RNNs.
2.2 RNN Architecture
An RNN introduces recurrence:
• At each time step t, it takes the current input x_t and the previous hidden state h_{t-1} to compute the new hidden state h_t.
• This hidden state acts as a memory of past inputs.
Formulas
1. Hidden state update:
h_t = tanh(W_{xh} · x_t + W_{hh} · h_{t-1} + b_h)
2. Output at time step t:
y_t = W_{hy} · h_t + b_y
Where:
• x_t = input at time step t, h_t = hidden state at time step t
• W_{xh}, W_{hh}, W_{hy} = input-to-hidden, hidden-to-hidden, and hidden-to-output weight matrices
• b_h, b_y = bias vectors
The same weights are used across all time steps → parameter sharing.
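A minimal NumPy sketch of this update, assuming a tanh activation and randomly initialized weights (the dimensions and initialization are illustrative, not from the text):
import numpy as np

input_dim, hidden_dim, output_dim = 4, 8, 3   # illustrative sizes
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))  # hidden-to-output weights
b_h, b_y = np.zeros(hidden_dim), np.zeros(output_dim)

def rnn_step(x_t, h_prev):
    # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h);  y_t = W_hy h_t + b_y
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t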
2.3 Unrolled RNN
Instead of drawing the recurrent loop, we can “unroll” the RNN across time steps:
x1 → [h1] → y1
       ↓
x2 → [h2] → y2
       ↓
x3 → [h3] → y3
       ↓
      ...
Each hidden state h_t depends on both x_t and the previous state h_{t-1}.
Diagram: Unrolled RNN showing inputs (x1, x2, …), hidden states (h1, h2, …), and outputs (y1, y2, …).
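Continuing the NumPy sketch from Section 2.2 (reusing the rnn_step function and weights defined there), unrolling is just a loop that feeds each input together with the previous hidden state into the same step function, so the weights are shared across time:
T = 5
xs = rng.normal(size=(T, input_dim))  # toy input sequence (illustrative)
h = np.zeros(hidden_dim)              # initial hidden state h_0

hidden_states, outputs = [], []
for t in range(T):
    h, y = rnn_step(xs[t], h)   # h_t depends on x_t and h_{t-1}
    hidden_states.append(h)
    outputs.append(y)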
2.4 Backpropagation Through Time (BPTT)
Training RNNs involves minimizing a loss function summed over all time steps, e.g.
L = Σ_{t=1}^{T} L_t
where L_t measures the error between the predicted output y_t and the true target at step t (e.g., cross-entropy for word prediction).
We compute gradients using Backpropagation Through Time (BPTT):
1. Unroll the RNN across all time steps.
2. Apply standard backpropagation from last time step to the first.
3. Update weights using gradient descent.
Problem: Gradients involve repeated multiplication of weight matrices across time → leads to exploding or vanishing gradients.
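A quick numerical sketch of this effect (using an arbitrary orthogonal matrix scaled by 0.9 or 1.1, purely for illustration): repeated multiplication drives the product's norm toward zero when the scale is below 1, and makes it blow up when the scale is above 1.
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # orthogonal matrix: all singular values equal 1

for scale, label in [(0.9, "vanishing"), (1.1, "exploding")]:
    W_hh = scale * Q                 # recurrent weights with singular values 0.9 or 1.1
    prod = np.eye(8)
    for _ in range(50):              # 50 time steps of backpropagation
        prod = prod @ W_hh           # repeated multiplication, as in BPTT
    print(label, np.linalg.norm(prod, 2))   # ~5e-3 vs ~1.2e+2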
2.5 Exploding and Vanishing Gradients
1. Vanishing Gradients
o When gradients shrink exponentially during backpropagation.
o Caused when eigenvalues of the weight matrix W_{hh} are < 1.
o Formula:
∂h_T/∂h_1 = ∏_{t=2}^{T} ∂h_t/∂h_{t-1}, where each factor contains W_{hh}
→ repeated multiplication by factors with magnitude < 1 → very small values → model cannot learn long-term dependencies.
2. Exploding Gradients
o When gradients grow exponentially large.
o Happens if eigenvalues of W_{hh} are > 1.
o Causes unstable training, very large weight updates.
Solutions:
• Gradient clipping (for exploding gradients).
• LSTM/GRU (for vanishing gradients).
Diagram: Graph showing vanishing vs exploding gradient effect over time steps.
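To make the first solution concrete, here is a minimal sketch of gradient clipping with a Keras optimizer (clipnorm and clipvalue are standard Keras optimizer arguments; the thresholds 1.0 and 0.5 are arbitrary choices for illustration):
from tensorflow.keras.optimizers import Adam

# Clip the global gradient norm to 1.0 before every weight update,
# so a single very large gradient cannot destabilize training.
optimizer = Adam(learning_rate=1e-3, clipnorm=1.0)

# Alternative: clip each gradient element to the range [-0.5, 0.5]
# optimizer = Adam(learning_rate=1e-3, clipvalue=0.5)

# model.compile(loss='categorical_crossentropy', optimizer=optimizer)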
2.6 Real-World Applications of RNNs
1. Text Prediction & Language Modeling
o Predicting the next word in a sentence.
o Example: Google’s early language models (before Transformers).
2. Speech Recognition
o Convert spoken words into text.
o RNNs can model the sequence of sound features over time.
3. Handwriting Recognition
o Recognizing characters written in a continuous sequence.
4. Financial Time-Series Forecasting
o Predicting stock prices or currency exchange rates.
2.7 Summary
• RNNs solve the limitation of feedforward nets by adding memory.
• They process sequential inputs using hidden states and shared weights.
• Training uses BPTT, but faces vanishing/exploding gradient problems.
• Widely applied in NLP, speech recognition, and forecasting.
• Limitations of RNNs motivated more advanced architectures (LSTM, GRU).
3. Long Short-Term Memory (LSTM)
3.1 Motivation for LSTMs
Recurrent Neural Networks (RNNs) are powerful for modeling sequential data. However, they face a major limitation:
• Vanishing gradients → Long-term dependencies are forgotten.
• Exploding gradients → Training becomes unstable.
Example: If you want to predict the meaning of the last word in “The clouds are in the sky, and it is going to ___”, the model must remember that “clouds” appeared long ago. A simple RNN struggles to preserve this memory.
LSTMs were designed to overcome this by introducing a memory cell and gates that regulate information flow.
3.2 LSTM Gate Mechanism
An LSTM cell has three key gates:
1. Forget Gate – decides what information to discard.
2. Input Gate – decides what new information to store.
3. Output Gate – decides what to output.
3.3 Step-by-Step Forward Pass (Math)
1. Forget previous info
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
2. Update memory cell
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
The cell state c_t carries forward both long-term and short-term context.
3. Compute hidden state
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t ⊙ tanh(c_t)
This mechanism allows LSTMs to remember information for hundreds of time steps.
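A minimal NumPy sketch of one LSTM step following these equations (the weights are randomly initialized placeholders, purely for illustration):
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input_dim, hidden_dim = 4, 8
rng = np.random.default_rng(0)

# One weight matrix and bias per gate; each acts on the concatenation [h_{t-1}, x_t]
W_f, W_i, W_c, W_o = (rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim)) for _ in range(4))
b_f, b_i, b_c, b_o = (np.zeros(hidden_dim) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)         # forget gate: what to discard from c_{t-1}
    i_t = sigmoid(W_i @ z + b_i)         # input gate: what new information to store
    c_tilde = np.tanh(W_c @ z + b_c)     # candidate memory content
    c_t = f_t * c_prev + i_t * c_tilde   # new cell state
    o_t = sigmoid(W_o @ z + b_o)         # output gate: what to expose as h_t
    h_t = o_t * np.tanh(c_t)             # new hidden state
    return h_t, c_t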
3.4 Example: Predicting Next Word in a Sentence
Suppose we train an LSTM on news articles.
• Input sequence: “Stock market sees sharp rise in”
• LSTM keeps track of context (finance-related words).
• Predicted next word: “shares” or “prices”.
Unlike an RNN, the LSTM remembers that “stock market” implies financial terms even after many words.
3.5 Diagram: LSTM Cell
Diagram: Flow inside an LSTM cell, with the forget, input, and output gates regulating the cell state.
3.6 Python Snippet (Keras LSTM for Text Generation)
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
# Example: Text generation with LSTM
vocab_size = 5000 # Assume 5000 unique words
embedding_dim = 64
hidden_units = 128
model = Sequential([
    Embedding(vocab_size, embedding_dim, input_length=50),
    LSTM(hidden_units, return_sequences=False),
    Dense(vocab_size, activation='softmax')
])
# Use 'sparse_categorical_crossentropy' instead if targets are integer word indices
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
# After training, we can predict the next word:
# input_seq = np.array([[word1, word2, ..., word50]])
# predicted_word = np.argmax(model.predict(input_seq))