AI Practice and Technologies Interacting with Humans and the Real World — Revision Sheet

1. 📌 Essentials

NLP enables machines to interpret, understand, and generate human language naturally.
Classical NLP pipeline: tokenization, morphology/POS tagging, syntax, semantics, pragmatics.
Word embeddings (static and contextual) represent words as vectors capturing meaning.
Transformers revolutionized NLP with parallel processing and self-attention.
Large Language Models (LLMs): BERT (understanding), GPT (generation), T5 (text-to-text); together they cover understanding and generating language.
Practical tools: spaCy (fast, production-oriented pipelines combining statistical models and rules), Hugging Face (state-of-the-art neural models).
Responsible NLP: address bias, fairness, privacy, and energy consumption.
Challenges include ambiguity, context-dependence, and intent recognition.
Hierarchical flow: raw text → structured understanding → response or action.
Future trends: instruction tuning, retrieval augmentation, multilingual models, safety.

2. 🧩 Key Structures & Components

Tokenization — splits text into units (words, subwords).
Morphology & POS tagging — identifies grammatical forms and parts of speech.
Syntax parsing — builds sentence structure (trees, dependencies).
Semantics mapping — assigns meaning to words/phrases.
Pragmatics — infers speaker intent based on context.
Word Embeddings — dense vector representations of words.
Static embeddings — Word2Vec, GloVe; limited polysemy handling.
Contextual embeddings — BERT, GPT; dynamic, context-aware.
Neural sequence models — RNNs, LSTMs, attention mechanisms.
Transformers — parallel, self-attention-based models.
Large Language Models — encoder-only, decoder-only, encoder–decoder architectures.
Tools — spaCy, Hugging Face Transformers.
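
Worked sketch (tokenization): the difference between word-level and subword-level units can be illustrated with a few lines of plain Python. This is a minimal sketch, not how spaCy or Hugging Face tokenizers are implemented. The function names, the toy vocabulary, and the "##" continuation prefix (borrowed from the WordPiece convention) are assumptions made for illustration.

```python
import re

def word_tokenize(text):
    """Split text into lowercase word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def subword_tokenize(word, vocab):
    """Greedy longest-match-first split of one word into subword pieces.

    Pieces that continue a word carry a '##' prefix (WordPiece-style).
    Returns ['[UNK]'] when the word cannot be covered by the vocabulary.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces

# Hypothetical toy vocabulary; real vocabularies are learned from a corpus.
TOY_VOCAB = {"token", "##ization", "un", "##believ", "##able", "is", "fun"}

words = word_tokenize("Tokenization is fun!")
subwords = [subword_tokenize(w, TOY_VOCAB) for w in words]
print(words)
print(subwords)
```

The word tokenizer keeps "tokenization" whole, while the subword tokenizer splits it into known pieces, which is how subword methods avoid out-of-vocabulary words.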

3. 🔬 Functions, Mechanisms & Relationships

Pipeline flow: raw text → tokenization → morphology/POS → syntax parsing → semantics → pragmatics.
Embeddings: convert words into vectors; proximity indicates similarity.
Static vs. contextual embeddings: static (Word2Vec) are fixed; contextual (BERT) change with context.
Neural models: process sequences, with RNNs/LSTMs capturing order; attention highlights relevant info.
Transformers: use self-attention to model global context in parallel.
LLMs: scale models for diverse tasks—classification, translation, generation.
Tools: implement NLP tasks efficiently; spaCy for speed, Hugging Face for flexibility.
Responsible NLP: balance accuracy, interpretability, and ethical considerations.
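
Worked sketch (self-attention): the statement that Transformers use self-attention to model global context can be made concrete with scaled dot-product attention. This is a simplified sketch under stated assumptions: queries, keys and values are all set to the input vectors themselves (a real Transformer applies learned projection matrices and multiple heads), and the input vectors are made-up toy values.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X.

    X is a list of token vectors of equal dimension d.
    Returns (outputs, weights), where weights[i][j] is how much
    token i attends to token j.
    """
    d = len(X[0])
    outputs, weights = [], []
    for q in X:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, X)) for i in range(d)])
    return outputs, weights

# Toy 2-dimensional token vectors (illustrative values only).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(X)
for row in weights:
    print([round(w, 3) for w in row])
```

Each output row depends on every input token at once, and the rows can be computed independently of one another, which is the source of the parallelism mentioned above.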

4. 📊 Comparative Table

Item	Key Features	Notes / Differences
Classical NLP pipeline	Tokenization → Morphology/POS → Syntax → Semantics → Pragmatics	Layered analysis from raw text to meaning
Bag of Words / TF–IDF	Unordered, simple, fast; weights important words	Ignores word order and structure
Static embeddings	Word2Vec, GloVe; fixed vectors for words	Limited by polysemy; context-independent
Contextual embeddings	BERT, GPT; dynamic, context-dependent	Handle polysemy; adapt meaning based on context
Neural sequence models	RNNs, LSTMs; process sequences with memory	Struggle with long-range dependencies; LSTMs mitigate but do not eliminate this
Attention mechanisms	Focus on relevant parts of input	Improve relevance in sequence processing
Transformers	Parallel, self-attention; foundation of modern NLP	Efficient, scalable, handle long-range dependencies
Large Language Models	Encoder-only (BERT), decoder-only (GPT), encoder–decoder (T5)	Encoder-only suits understanding tasks, decoder-only suits generation, encoder–decoder suits sequence-to-sequence tasks such as translation
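
Worked sketch (Bag of Words / TF-IDF): the table row on Bag of Words says it ignores word order and that TF-IDF weights important words. The sketch below shows both points on three toy sentences. The function names and sentences are illustrative assumptions, and the formula used (term frequency divided by document length, times log(N / document frequency)) is only one of several common TF-IDF variants.

```python
import math
from collections import Counter

def bag_of_words(tokens):
    """Count token occurrences; word order is discarded."""
    return Counter(tokens)

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = count of term in document / document length
    idf = log(number of documents / number of documents containing term)
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        result.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return result

# Toy corpus (made-up sentences for illustration).
docs = [
    "dog bites man".split(),
    "man bites dog".split(),
    "the cat sleeps".split(),
]

# Opposite meanings, identical bag-of-words representation.
print(bag_of_words(docs[0]) == bag_of_words(docs[1]))

weights = tf_idf(docs)
print(weights[0])
print(weights[2])
```

Words that occur in only one document ("cat") receive a higher weight than words shared across documents ("dog"), while the two sentences with reversed word order remain indistinguishable.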

5. 🗂️ Hierarchical Diagram (ASCII)

NLP & HCI
 ├─ Interaction paradigms
 │   ├─ Button/menu commands
 │   └─ Natural language understanding
 ├─ Classical pipeline
 │   ├─ Tokenization
 │   ├─ Morphology & POS
 │   ├─ Syntax parsing
 │   ├─ Semantics mapping
 │   └─ Pragmatic inference
 ├─ Word representations
 │   ├─ Bag of Words / TF–IDF
 │   ├─ Static embeddings (Word2Vec, GloVe)
 │   └─ Contextual embeddings (BERT, GPT)
 ├─ Neural models
 │   ├─ RNNs / LSTMs
 │   ├─ Attention mechanisms
 │   └─ Transformers
 ├─ Large language models
 │   ├─ Encoder-only (BERT)
 │   ├─ Decoder-only (GPT)
 │   └─ Encoder–decoder (T5, BART)
 ├─ Practical tools
 │   ├─ spaCy
 │   └─ Hugging Face
 └─ Responsible NLP
     ├─ Accuracy vs interpretability
     ├─ Bias, fairness, privacy
     └─ Sustainability

6. ⚠️ High-Yield Pitfalls & Confusions

Confusing static and contextual embeddings; static cannot handle polysemy well.
Overlooking the importance of syntax parsing in semantic understanding.
Assuming larger models always outperform smaller ones without considering resource constraints.
Misinterpreting bag of words as capturing syntax or context.
Ignoring bias and fairness issues in large models.
Believing tokenization is trivial; it varies greatly across languages.
Overestimating the interpretability of neural models.
Confusing encoder-only (BERT) with decoder-only (GPT) architectures.
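
Worked sketch (static embeddings and polysemy): the first pitfall can be seen in a toy lookup table. The vectors below are invented for illustration and do not come from Word2Vec or GloVe. The point is only that a static lookup returns the same vector for "bank" whatever the surrounding words are, so cosine similarity cannot separate the two senses.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 2-dimensional static embeddings (made-up values).
STATIC = {
    "bank": [0.5, 0.5],
    "river": [0.1, 0.9],
    "money": [0.9, 0.1],
}

def static_embed(tokens):
    """Look up each known token; context plays no role."""
    return {t: STATIC[t] for t in tokens if t in STATIC}

sent_a = static_embed("she sat on the river bank".split())
sent_b = static_embed("the bank lent her money".split())

# Same vector for both senses of "bank".
print(sent_a["bank"] == sent_b["bank"])
print(round(cosine(STATIC["bank"], STATIC["river"]), 3))
print(round(cosine(STATIC["bank"], STATIC["money"]), 3))
```

A contextual model would instead produce a different vector for "bank" in each sentence, because its output depends on the neighbouring tokens.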

7. ✅ Final Exam Checklist

Understand the stages of the classical NLP pipeline.
Differentiate between static and contextual word embeddings.
Know key models: RNNs, LSTMs, Transformers, BERT, GPT.
Be familiar with practical NLP tools: spaCy, Hugging Face.
Recognize the importance of responsible AI: bias, fairness, privacy.
Comprehend how attention mechanisms improve relevance.
Know evaluation metrics: F1, BLEU, ROUGE.
Be aware of sustainability practices: model compression, caching.
Understand future trends: instruction tuning, retrieval-augmented generation.
Grasp the hierarchical flow from raw text to meaningful response.
Recognize challenges: ambiguity, context-dependence, multilinguality.
Know the differences between model architectures and their applications.
Be prepared to discuss ethical considerations in deploying NLP systems.
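
Worked sketch (F1): of the evaluation metrics listed in the checklist, F1 is the harmonic mean of precision and recall and is easy to compute by hand. The sketch below assumes a binary classification task; the function name and the label lists are made up for illustration. BLEU and ROUGE are n-gram overlap metrics and are not covered here.

```python
def precision_recall_f1(gold, predicted, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy labels: 3 true positives, 1 false positive, 1 false negative.
gold = [1, 1, 1, 0, 0, 1]
pred = [1, 0, 1, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(gold, pred)
print(precision, recall, f1)
```

With 3 true positives, 1 false positive and 1 false negative, precision and recall are both 3/4, so F1 is also 0.75.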

End of Revision Sheet
