DATA SCIENCE FOR CRIME SCIENTISTS (ADVANCED CRIME ANALYSIS) 2018/2019
This is the companion website for the 2018/2019 module for third-year undergraduate students on the BSc in Crime Science at UCL.
Resources
The module handbook contains all the information on assessment, learning outcomes, and timetables, along with a general overview of the module. Use it as your go-to guide throughout the module.
Week 1: INTRODUCTION
- Lecture 1: Introduction (slides html), (pdf)
- Homework 1: Getting ready for R
- Homework 2: R in 12 Steps
Suggested reading:
- Williams, M. L., Burnap, P., & Sloan, L. (2017). Crime Sensing With Big Data: The Affordances and Limitations of Using Open-source Communications to Estimate Crime Patterns. The British Journal of Criminology, 57(2), 320–340. https://doi.org/10.1093/bjc/azw031
Tutorial:
- How to solve R data science problems, SOLUTIONS
- 17 Steps to investigate R dataframes - https://www.rstatisticsblog.com/r-tutorial/dataframe-manipulations/
Week 2: WEB SCRAPING 1
- Lecture 2: APIs and web-scraping (slides), (pdf)
- Homework: Getting API access, SOLUTIONS
Required reading/preparation:
- Pfeffer, J., Mayer, K., & Morstatter, F. (2018). Tampering with Twitter’s Sample API. EPJ Data Science, 7(1), 50. https://doi.org/10.1140/epjds/s13688-018-0178-0
Suggested reading:
- Solymosi, R., Bowers, K. J., & Fujiyama, T. (2018). Crowdsourcing Subjective Perceptions of Neighbourhood Disorder: Interpreting Bias in Open Data. The British Journal of Criminology, 58(4), 944–967. https://doi.org/10.1093/bjc/azx048
- Founta, A.-M., Djouvas, C., Chatzakou, D., Leontiadis, I., Blackburn, J., Stringhini, G., … Kourtellis, N. (2018). Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior. arXiv:1802.00393 [cs]. Retrieved from http://arxiv.org/abs/1802.00393
Week 3: WEB SCRAPING 2
Required reading:
- Mozilla MDN (2018). HTML basics. Retrieved January 6, 2019, from https://developer.mozilla.org/en-US/docs/Learn/Getting_started_with_the_web/HTML_basics
- R Web Scraping Tutorial with rvest. (2018, February 27). Retrieved January 20, 2019, from https://www.datacamp.com/community/tutorials/r-web-scraping-rvest
- Solares, J. R. A. (2017, August 2). Web scraping tutorial in R. Retrieved January 20, 2019, from https://towardsdatascience.com/web-scraping-tutorial-in-r-5e71fd107f32
- Dsilva, D. (2018, May 4). Learn To Create Your Own Datasets — Web Scraping in R. Retrieved January 20, 2019, from https://towardsdatascience.com/learn-to-create-your-own-datasets-web-scraping-in-r-f934a31748a5
- Wickham, H. (2016). rvest: Easily Harvest (Scrape) Web Pages. R package version 0.3.2. https://CRAN.R-project.org/package=rvest, pdf
Suggested reading:
- ElSherief, M., Kulkarni, V., Nguyen, D., Wang, W. Y., & Belding, E. (2018). Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media. arXiv:1804.04257 [cs]. Retrieved from http://arxiv.org/abs/1804.04257
Tutorial: Webscraping and APIs in R, SOLUTIONS
Week 4: TEXT DATA 1
- Lecture 4: Text data and text mining in R (slides), (pdf)
- Homework: Text data basics, SOLUTIONS
Required reading:
- Grolemund, G., & Wickham, H. (2018). Strings. In R for Data Science. Retrieved from https://r4ds.had.co.nz/
Suggested tutorials/reading:
- Replication of Chapter 5 of Quantitative Social Science: An Introduction. (n.d.). Retrieved January 26, 2019, from https://quanteda.io/articles/pkgdown/replication/qss.html
- Example: textual data visualization. (n.d.). Retrieved January 26, 2019, from https://quanteda.io/articles/pkgdown/examples/plotting.html
Tutorial: -
Week 5: TEXT DATA 2
- Lecture 5: Advanced text analysis in R (slides), (pdf)
- Tutorial: Data cleaning and preprocessing and text mining in R, (raw Rmd file), SOLUTIONS
Required reading:
- Kleinberg, B., Mozes, M., & Van der Vegt, I. (2018). Identifying the sentiment styles of YouTube’s vloggers. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 3581–3590. Retrieved from http://aclweb.org/anthology/D18-1394
- Pérez-Rosas, V., Kleinberg, B., Lefevre, A., & Mihalcea, R. (2018). Automatic Detection of Fake News. In Proceedings of the 27th International Conference on Computational Linguistics (pp. 3391–3401). Santa Fe, New Mexico, USA: Association for Computational Linguistics. Retrieved from http://aclweb.org/anthology/C18-1287
Week 6: MACHINE LEARNING 1
- Lecture 6: Machine learning in R 1 (slides), (pdf)
- Homework: Supervised machine learning in R, (SOLUTIONS)
Required reading:
- http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
- Kuhn, M., & Johnson, K. (2013). Applied Predictive Modeling. New York: Springer-Verlag. Retrieved from https://www.springer.com/de/book/9781461468486
- Chapter: “Introduction”
- Chapter: “A Short Tour of the Predictive Modeling Process”
Recommended reading:
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). New York: Springer-Verlag. Retrieved from https://www.springer.com/de/book/9780387848570
- Chapter: “Overview of Supervised Learning”
- Chapter: “Linear Methods for Classification”
Week 7: MACHINE LEARNING 2
- Lecture 7: Unsupervised machine learning in R + performance metrics (slides), (pdf)
- Tutorial: Machine learning in R, (SOLUTIONS)
Required reading:
- Gatto, L. (n.d.). An Introduction to Machine Learning with R. Retrieved from https://lgatto.github.io/IntroMachineLearningWithR/unsupervised-learning.html
- Chapter 4: Unsupervised learning
Recommended:
- DataCamp course on Unsupervised Learning in R https://www.datacamp.com/courses/unsupervised-learning-in-r
Week 8: PROMISES AND PROBLEMS
- Lecture 8: Advances, promises and problems of data science for crime science (slides), (pdf)
- No tutorial
- Homework: peer-feedback + your project + revision
Required reading:
- Coveney, P. V., Dougherty, E. R., & Highfield, R. R. (2016). Big data need big theory too. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2080), 20160153. https://doi.org/10.1098/rsta.2016.0153
Recommended reading:
- Quijano-Sánchez, L., Liberatore, F., Camacho-Collados, J., & Camacho-Collados, M. (2018). Applying automatic text-based detection of deceptive language to police reports: Extracting behavioral patterns from a multi-step classification model to understand how we lie to the police. Knowledge-Based Systems, 149, 155–168. https://doi.org/10.1016/j.knosys.2018.03.010
- Kadar, C., & Pletikosa, I. (2018). Mining large-scale human mobility data for long-term crime prediction. EPJ Data Science, 7(1), 26. https://doi.org/10.1140/epjds/s13688-018-0150-z
- Burnap, P., & Williams, M. L. (2016). Us and them: identifying cyber hate on Twitter across multiple protected characteristics. EPJ Data Science, 5(1), 11. https://doi.org/10.1140/epjds/s13688-016-0072-6
Week 9: RECAP, CASE STUDIES, PEER-FEEDBACK
Module convenor and author: Bennett Kleinberg (bennett.kleinberg@ucl.ac.uk)
Department of Security and Crime Science, UCL