Name: Programming Computer Vision with Python
Author: Jan Erik Solem

Computer vision is a core area of computing focused on enabling machines to interpret and analyze images and video. With the rapid growth of digital photography, social networks, search engines, and camera-equipped devices, the ability to process and understand visual data has become essential across domains such as robotics, medical imaging, image search, and photo management.

Python has emerged as a widely used language in scientific computing and data analysis, offering a rich ecosystem of open-source libraries for numerical computation, image processing, and machine learning. That ecosystem makes Python a practical vehicle for learning computer vision: algorithms can be implemented and tested in a few lines of code without giving up technical depth.

About the book

Programming Computer Vision with Python by Jan Erik Solem is a practical introduction to computer vision using the Python programming language. The book is designed as an accessible, hands-on entry point into the field, while still presenting sufficient theoretical and algorithmic foundations to support further study and experimentation.

The text emphasizes an exploratory approach. Readers are encouraged to follow examples directly on their own computers. All code is presented and explained, allowing readers to reproduce results and extend the examples independently.

The book is intended for students, researchers, and enthusiasts who want to understand how computer vision algorithms work and how to implement them using Python and its scientific libraries. Familiarity with Python and basic numerical computing concepts is expected, as the book makes use of libraries such as NumPy, SciPy, Matplotlib, PIL, and OpenCV.

This edition is a pre-production draft released under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.

What you will learn

Readers will gain practical experience implementing and understanding core computer vision techniques, including:

- Image handling and processing using Python libraries such as PIL, NumPy, SciPy, and Matplotlib.
- Feature detection and description, including Harris corner detection and SIFT (Scale-Invariant Feature Transform), as well as image matching.
- Geometric transformations such as homographies, image warping, and panorama creation.
- Camera models, camera calibration, pose estimation, and augmented reality fundamentals.
- Multiple-view geometry concepts such as epipolar geometry, 3D structure computation, stereo images, and reconstruction.
- Image clustering and classification using methods such as K-means, hierarchical clustering, spectral clustering, k-nearest neighbors, Bayes classifiers, and support vector machines.
- Image search techniques, including content-based image retrieval, visual words, indexing, and ranking.
- Image segmentation methods such as graph cuts, clustering-based segmentation, and variational methods.
- Practical use of OpenCV with Python, including video processing and object tracking.
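
To give a flavor of the clustering topic listed above, here is a minimal sketch of K-means (Lloyd's algorithm) applied to per-image color features. It is not code from the book: the `kmeans` function, the `features` data, and the choice of initial centroids are all hypothetical, and it uses only the Python standard library so it runs without installing anything, whereas the book works with NumPy and SciPy.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on a list of equal-length tuples.

    `centroids` is the initial guess; returns (centroids, labels).
    Illustrative sketch only, not the book's implementation.
    """
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical data: the mean RGB color of six small images,
# three reddish and three bluish.
features = [
    (200, 30, 40), (210, 25, 35), (190, 40, 50),
    (20, 30, 220), (35, 45, 200), (25, 20, 210),
]
centroids, labels = kmeans(features, centroids=[features[0], features[3]])
print(labels)
```

With real images, the feature tuples would typically come from color histograms or local descriptors rather than hand-written values, and the initial centroids would usually be chosen at random.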

The book also includes appendices on installing required packages and working with image datasets.

Table of contents

Preface
- Prerequisites and Overview
- Introduction to Computer Vision
- Python and NumPy
- Notation and Conventions
- Acknowledgments
1 Basic Image Handling and Processing
- 1.1 PIL – the Python Imaging Library
- 1.2 Matplotlib
- 1.3 NumPy
- 1.4 SciPy
- 1.5 Advanced example: Image de-noising
2 Local Image Descriptors
- 2.1 Harris corner detector
- 2.2 SIFT – Scale-Invariant Feature Transform
- 2.3 Matching Geotagged Images
3 Image to Image Mappings
- 3.1 Homographies
- 3.2 Warping images
- 3.3 Creating Panoramas
4 Camera Models and Augmented Reality
- 4.1 The Pin-hole Camera Model
- 4.2 Camera Calibration
- 4.3 Pose Estimation from Planes and Markers
- 4.4 Augmented Reality
5 Multiple View Geometry
- 5.1 Epipolar Geometry
- 5.2 Computing with Cameras and 3D Structure
- 5.3 Multiple View Reconstruction
- 5.4 Stereo Images
6 Clustering Images
- 6.1 K-means Clustering
- 6.2 Hierarchical Clustering
- 6.3 Spectral Clustering
7 Searching Images
- 7.1 Content-based Image Retrieval
- 7.2 Visual Words
- 7.3 Indexing Images
- 7.4 Searching the Database for Images
- 7.5 Ranking Results using Geometry
- 7.6 Building Demos and Web Applications
8 Classifying Image Content
- 8.1 K-Nearest Neighbors
- 8.2 Bayes Classifier
- 8.3 Support Vector Machines
- 8.4 Optical Character Recognition
9 Image Segmentation
- 9.1 Graph Cuts
- 9.2 Segmentation using Clustering
- 9.3 Variational Methods
10 OpenCV
- 10.1 The OpenCV Python Interface
- 10.2 OpenCV Basics
- 10.3 Processing Video
- 10.4 Tracking
- 10.5 More Examples
A Installing Packages
- A.1 NumPy and SciPy
- A.2 Matplotlib
- A.3 PIL
- A.4 LibSVM
- A.5 OpenCV
- A.6 VLFeat
- A.7 PyGame
- A.8 PyOpenGL
- A.9 Pydot
- A.10 Python-graph
- A.11 Simplejson
- A.12 PySQLite
- A.13 CherryPy
B Image Datasets
- B.1 Flickr
- B.2 Panoramio
- B.3 Oxford Visual Geometry Group
- B.4 University of Kentucky Recognition Benchmark Images
- B.5 Other
C Image Credits

Book details

Title: Programming Computer Vision with Python
Author(s): Jan Erik Solem
Main category: Artificial Intelligence
Subcategory: Computer Vision
Language: English
License: Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License


Legal notice: This book is shared for educational purposes only. The content is distributed under Creative Commons licenses or with explicit permission from the author. FreeProgrammingBooks may host files that comply with their respective licenses.
