Machine Learning for Language Technology (Candidate Program, 2016)
An Introductory Course
Credits: 7,5 hp
Teacher: Marina Santini, santinim [AT] stp.lingfil.uu.se
Venue: Department of Linguistics and Philology, Hus 9.
Syllabus: 5LN454 (Swedish).
News
- Course motto: "Verba volant, (trans)scripta manent" ;-) The original Latin proverb "Verba volant, scripta manent" literally means "spoken words fly away, written words remain".
- 2017-01-23: Assignments have been graded. The debriefing is online.
- 2016-12-15: This course is now closed. Students can start submitting the home assignments; the deadlines are listed at the bottom of this page.
- The first lecture is scheduled for Mon, 7 November 2016.
This course is a gentle introduction to the theoretical foundations of machine learning and to the Weka Machine Learning Workbench.
IMPORTANT: Be sure to attend the introductory lecture on Mon, 7 November 2016, in which the structure and organization of the course will be explained. The course follows the "Flipped Classroom" learning strategy and relies on the Scalable Learning platform.
WARNING: Watching the pre-recorded lectures, answering the online quizzes, and reading the slides, handouts and required literature (see below) must be completed BEFORE the in-class lab on the listed date. In-class labs focus on practical tasks that presuppose the theoretical knowledge acquired at home. Attending the labs without home preparation is ineffective and is therefore discouraged. Lab tasks should be carried out in groups (2 or 3 students per group).
- The course is taught in English.
- Course Timetable
Schedule and List of Topics
Last Updated: 23 Jan 2017
Lect | Date | Time | Room | Content | Required Reading |
---|---|---|---|---|---|
1 | 7/11 | 13:15-15:00 | 9-2043 (Chomsky) | In-class: Introduction to the Course (slides); The Flipped Classroom (video); What is Machine Learning (slides) | Handouts (1, 2); Witten et al. (2011): Ch 1; Optional: What's Machine Learning by Andrew Ng |
2 | 10/11 | 10:15-12:00 | 9-2042 (Turing) | Online: Basic Concepts (slides: 1, 2). In-class: Lab 1 | Handout; Daumé III (2015: 8-10; 19-24; 26-28); Witten et al. (2011: Ch 2; Ch 11: 407-410) |
3 | 14/11 | 10:15-12:00 | 9-2042 (Turing) | Online: Decision Trees (slides: 1, 2, 3). In-class: Feedback, Lab 2 | Transcripts; Daumé III (2015: 10-18); Witten et al. (2011: 99-108; 192-203); Optional: Mitchell (1997: Ch 3) |
4 | 21/11 | 13:15-15:00 | 9-2043 (Chomsky) | Online: Evaluation (slides: 1, 2, 3). In-class: Feedback, Lab 3 | Transcripts; Daumé III (2015: 60-67); Witten et al. (2011: Ch 5) |
5 | 24/11 | 10:15-12:00 | 9-2042 (Turing) | Online: k-Nearest Neighbours (slides: 1 [JNivre]; 2). In-class: Lab 4 | Transcripts [JNivre]; Daumé III (2015: 26-32, excl. 2.4); Witten et al. (2011: 131-138) |
6 | 28/11 | 13:15-15:00 | 9-2043 (Chomsky) | Online: Naive Bayes (slides: 1 [JNivre]; 2); Feature Representation & Selection (slides). In-class: Feedback, Lab 5 | Transcripts [JNivre]; Handout; Daumé III (2015: 53-59; 107-110); Witten et al. (2011: 90-99) |
7 | 05/12 | 13:15-15:00 | 9-2043 (Chomsky) | Online: Generative vs Discriminative, Linear Models (slides [JNivre], excl. Loglinear & Log. Regr.); Perceptron (slides: 1-2 [JNivre]); Feature Transformation (self-study) (slides). In-class: Feedback, Lab 6 | Handouts (1, 2, 3, 4); Daumé III (2015: 39-52); Witten et al. (2011: 305-308; 314-315; 322-323; 328-329; 331-332; 334) |
8 | 08/12 | 10:15-12:00 | 9-2042 (Turing) | Online: k-Means Clustering (slides: 1 [Andrew Ng]; 2). In-class: Lab 7, Feedback | Transcripts; Daumé III (2015: 32-33); Witten et al. (2011: 138-141) |
9 | 12/12 | 13:15-15:00 | 9-2043 (Chomsky) | Online: Hierarchical Clustering (slides). In-class: Lab 8 | Witten et al. (2011: 273-284); Evaluation of Clustering |
10 | 13/12 | | Online | Online: A few words about ML4LT Assignments (Pdf); A few things to remember (slides: 1, 2, 3); Optional lab, Putting It All Together: Weka Tutorial by Svetlana S. Aksenova (also suitable for Weka 3.6 or later versions) | Domingos (2012) |
Expected Learning Outcomes
In order to pass the course, a student must be able to:
* apply basic principles of machine learning to natural language data;
* evaluate the performance of machine learning schemes;
* use standard off-the-shelf software for machine learning;
* apply supervised and unsupervised models for classification.
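The labs and assignments build these classifiers in Weka, but the underlying idea of a supervised model can be sketched in a few lines of plain Python. The toy example below is a minimal k-Nearest Neighbours classifier (the topic of Lecture 5); the data and function names are hypothetical illustrations, not part of the course material:

```python
# Minimal k-Nearest Neighbours over 2-D points, standard library only.
# Illustrative sketch; the course labs themselves use Weka, not Python.
from math import dist

# Hypothetical toy training set: (feature vector, class label).
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
         ((5.0, 5.0), "b"), ((4.8, 5.2), "b")]

def knn_predict(x, train, k=1):
    """Return the majority label among the k training points nearest to x."""
    neighbours = sorted(train, key=lambda item: dist(x, item[0]))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

print(knn_predict((1.1, 0.9), train))  # nearest neighbour belongs to class "a"
```

Note that k-NN does no training at all: every prediction simply measures distances to the stored examples, which is why it is often called a "lazy" learner.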
Examination and Grading Criteria
The course is examined by means of 3 home assignments:
- A few words about ML4LT Assignments (Pdf) (see also Lect 10)
- Assignment 1: Decision Trees and k-Nearest Neighbours
- Assignment 2: Naive Bayes
- Assignment 3: k-Means and Hierarchical Clustering
- Reflections about the Assignments: Debriefing
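As a companion to the clustering topics of Assignment 3, the core loop of k-Means (Lloyd's algorithm) can be sketched in plain Python on hypothetical 1-D data; the assignment itself is carried out in Weka, so this is purely illustrative:

```python
# Minimal k-Means (Lloyd's algorithm) on 1-D data, standard library only.
# Illustrative sketch with hypothetical data; the assignment uses Weka.
def kmeans_1d(points, centroids, iters=10):
    """Alternate two steps: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iters):
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # Keep a centroid in place if no points were assigned to it.
        centroids = [sum(ps) / len(ps) if ps else c
                     for c, ps in clusters.items()]
    return sorted(centroids)

# Two well-separated groups around 1.0 and 9.0 (hypothetical values).
print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], centroids=[0.0, 5.0]))
```

Unlike the supervised assignments, no labels are involved here: the algorithm discovers the two groups from the data alone, which is the sense in which clustering is "unsupervised".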
Assignments' Deadlines
- 18 Dec 2016: Ass 1 and Ass 2
- 15 Jan 2017: Ass 3
- 24 Feb 2017: Final submission date for all assignments
The following grades will be used:
- Underkänd (U) [Fail]
- Godkänt (G) [Pass]
- Väl Godkänt (VG) [Distinction]
Attendance
There is a mandatory 80% attendance requirement both for the lectures delivered through the online platform AND for the in-class lab sessions, which take place at the Department of Linguistics and Philology, Uppsala University, in Room 9-2043 (Chomsky) or Room 9-2042 (Turing) (see schedule).
Coursework and Reference List (Required Reading)
- Watching pre-recorded lectures on the Scalable Learning platform is required (see Attendance requirements above).
- Answering the online quizzes is required. Quizzes are not graded.
- Attendance at the in-class labs is required (see Attendance requirements above). Lab tasks are not graded.
- Required reading includes the references listed below:
- Handouts and Transcripts
- Hal Daumé III (2015). A Course in Machine Learning. Copyright © 2015. Only the chapters specified in the timetable.
- Ian H. Witten, Eibe Frank, Mark A. Hall (2011). Data Mining: Practical Machine Learning Tools and Techniques. 3rd Edition. Morgan Kaufmann Publishers. Only chapters specified in the timetable. You can also use the 2nd edition (freely available online).
- Pedro Domingos (2012). A Few Useful Things to Know about Machine Learning. Communications of the ACM, 55(10), 78-87.
- Evaluation of Clustering, in C. D. Manning, P. Raghavan & H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press. Website: http://informationretrieval.org/
Additional References (Optional Reading, might require deeper mathematical background)
(Most recent first)
- Hofmann, M., & Chisholm, A. (Eds.). (2016). Text Mining and Visualization: Case Studies Using Open-Source Tools. CRC Press.
- Rogers, S., & Girolami, M. (2015). A first course in machine learning. CRC Press.
- Weiss, S. M., Indurkhya, N., & Zhang, T. (2015). Fundamentals of predictive text mining. London: Springer. Second Edition.
- Alpaydin, E. (2014). Introduction to machine learning. MIT press. Third Edition.
- Jebara, T. (2012). Machine learning: discriminative and generative. Springer Science & Business Media. Second Edition.
- Mohri M., Rostamizadeh A. and Talwalkar A. (2012) Foundations of Machine Learning. The MIT Press. Sample Chapter: Introduction
- Liu, B. (2011). Web data mining: exploring hyperlinks, contents, and usage data. Springer Science & Business Media. Second Edition.
- Smola, A., & Vishwanathan, S.V.M. (2008). Introduction to Machine Learning. Cambridge University Press.
- Emms, M., & Luz, S. (2007, August). Machine learning for natural language processing. ESSLLI. Pdf
- Mitchell, T. M. (1997). Machine learning. McGraw-Hill Science/Engineering/Math.
© 2016. UPPSALA UNIVERSITET, Institutionen för lingvistik och filologi
Box 635, 751 26 Uppsala, Sweden. Web page updated by: Marina Santini.