Computer vision is a core area of computing focused on enabling machines to interpret and analyze images and video. With the rapid growth of digital photography, social networks, search engines, and camera-equipped devices, the ability to process and understand visual data has become essential across domains such as robotics, medical imaging, image search, and photo management.
Python has emerged as a widely used language in scientific computing and data analysis, offering a rich ecosystem of open-source libraries for numerical computation, image processing, and machine learning. Learning computer vision through Python therefore combines technical depth with accessibility, for learners and practitioners alike.
About the book
Programming Computer Vision with Python by Jan Erik Solem is a practical introduction to computer vision using the Python programming language. The book is designed as an accessible, hands-on entry point into the field, while still presenting sufficient theoretical and algorithmic foundations to support further study and experimentation.
The text emphasizes an exploratory approach. Readers are encouraged to follow examples directly on their own computers. All code is presented and explained, allowing readers to reproduce results and extend the examples independently.
The book is intended for students, researchers, and enthusiasts who want to understand how computer vision algorithms work and how to implement them using Python and its scientific libraries. Familiarity with Python and basic numerical computing concepts is expected, as the book makes use of libraries such as NumPy, SciPy, Matplotlib, PIL, and OpenCV.
This edition is a pre-production draft released under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
What you will learn
Readers will gain practical experience implementing and understanding core computer vision techniques, including:
- Image handling and processing using Python libraries such as PIL, NumPy, SciPy, and Matplotlib.
- Feature detection and description, including Harris corner detection and SIFT (Scale-Invariant Feature Transform), as well as image matching.
- Geometric transformations such as homographies, image warping, and panorama creation.
- Camera models, camera calibration, pose estimation, and augmented reality fundamentals.
- Multiple-view geometry concepts such as epipolar geometry, 3D structure computation, stereo images, and reconstruction.
- Image clustering and classification using methods such as K-means, hierarchical clustering, spectral clustering, k-nearest neighbors, Bayes classifiers, and support vector machines.
- Image search techniques, including content-based image retrieval, visual words, indexing, and ranking.
- Image segmentation methods such as graph cuts, clustering-based segmentation, and variational methods.
- Practical use of OpenCV with Python, including video processing and object tracking.
The book also includes appendices on installing required packages and working with image datasets.
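As a flavor of the geometric transformations listed above, the following is a minimal sketch of applying a homography to 2D points with NumPy. It is not taken from the book: the function name `apply_homography` and the example matrix are assumptions chosen for illustration only.

```python
import numpy as np


def apply_homography(H, points):
    """Map an (N, 2) array of 2D points through a 3x3 homography H.

    Hypothetical helper for illustration; not the book's implementation.
    """
    points = np.asarray(points, dtype=float)
    # Lift to homogeneous coordinates by appending a column of ones.
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    # Each row is a point, so multiply by the transpose of H.
    mapped = homogeneous @ np.asarray(H, dtype=float).T
    # Divide by the last coordinate to return to ordinary 2D coordinates.
    return mapped[:, :2] / mapped[:, 2:3]


# Assumed example: scale by 2, then translate by (10, 5).
H = np.array([[2.0, 0.0, 10.0],
              [0.0, 2.0, 5.0],
              [0.0, 0.0, 1.0]])
corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])
print(apply_homography(H, corners))
```

The division by the last coordinate is what separates a general homography from an affine map: for the matrix above it is a no-op, but it matters as soon as the bottom row of `H` differs from `[0, 0, 1]`.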
Table of contents
- Preface
  - Prerequisites and Overview
  - Introduction to Computer Vision
  - Python and NumPy
  - Notation and Conventions
  - Acknowledgments
- 1 Basic Image Handling and Processing
  - 1.1 PIL – the Python Imaging Library
  - 1.2 Matplotlib
  - 1.3 NumPy
  - 1.4 SciPy
  - 1.5 Advanced example: Image de-noising
- 2 Local Image Descriptors
  - 2.1 Harris corner detector
  - 2.2 SIFT – Scale-Invariant Feature Transform
  - 2.3 Matching Geotagged Images
- 3 Image to Image Mappings
  - 3.1 Homographies
  - 3.2 Warping images
  - 3.3 Creating Panoramas
- 4 Camera Models and Augmented Reality
  - 4.1 The Pin-hole Camera Model
  - 4.2 Camera Calibration
  - 4.3 Pose Estimation from Planes and Markers
  - 4.4 Augmented Reality
- 5 Multiple View Geometry
  - 5.1 Epipolar Geometry
  - 5.2 Computing with Cameras and 3D Structure
  - 5.3 Multiple View Reconstruction
  - 5.4 Stereo Images
- 6 Clustering Images
  - 6.1 K-means Clustering
  - 6.2 Hierarchical Clustering
  - 6.3 Spectral Clustering
- 7 Searching Images
  - 7.1 Content-based Image Retrieval
  - 7.2 Visual Words
  - 7.3 Indexing Images
  - 7.4 Searching the Database for Images
  - 7.5 Ranking Results using Geometry
  - 7.6 Building Demos and Web Applications
- 8 Classifying Image Content
  - 8.1 K-Nearest Neighbors
  - 8.2 Bayes Classifier
  - 8.3 Support Vector Machines
  - 8.4 Optical Character Recognition
- 9 Image Segmentation
  - 9.1 Graph Cuts
  - 9.2 Segmentation using Clustering
  - 9.3 Variational Methods
- 10 OpenCV
  - 10.1 The OpenCV Python Interface
  - 10.2 OpenCV Basics
  - 10.3 Processing Video
  - 10.4 Tracking
  - 10.5 More Examples
- A Installing Packages
  - A.1 NumPy and SciPy
  - A.2 Matplotlib
  - A.3 PIL
  - A.4 LibSVM
  - A.5 OpenCV
  - A.6 VLFeat
  - A.7 PyGame
  - A.8 PyOpenGL
  - A.9 Pydot
  - A.10 Python-graph
  - A.11 Simplejson
  - A.12 PySQLite
  - A.13 CherryPy
- B Image Datasets
  - B.1 Flickr
  - B.2 Panoramio
  - B.3 Oxford Visual Geometry Group
  - B.4 University of Kentucky Recognition Benchmark Images
  - B.5 Other
- C Image Credits
Book details
- Title: Programming Computer Vision with Python
- Author(s): Jan Erik Solem
- Main category: Artificial Intelligence
- Subcategory: Computer Vision
- Language: English
- License: Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License