Creating an image corpus

Project goal

An image corpus is a standard dataset developed for a specific task, such as evaluating image classification algorithms. It allows those algorithms to be evaluated uniformly, objectively, and consistently. In addition to raw image files, the corpus provides metadata about each image. For example, the class to which an image belongs is one piece of metadata.

Images in the corpus share certain characteristics. For instance, they all have the same size (i.e., the same number of rows and columns), are all either color or grayscale images, and are consistently named. Each image may also be associated with metadata such as date of acquisition, modality of acquisition, copyright owner, and an image annotation in natural-language text.
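To make the idea of "consistent naming plus per-image metadata" concrete, here is a minimal sketch using only the Python standard library. The record fields mirror the metadata listed above, but the schema, the field names, the `make_filename` naming scheme, and the helper functions are all hypothetical choices for illustration, not a required format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ImageRecord:
    """One row of corpus metadata. Field names are a hypothetical schema."""
    filename: str
    width: int             # number of columns
    height: int            # number of rows
    color_mode: str        # "gray" or "rgb"
    acquisition_date: str  # ISO 8601 date, e.g. "2024-03-01"
    modality: str          # e.g. "smartphone camera"
    copyright_owner: str
    annotation: str        # free-text description of the image


def make_filename(task: str, index: int, variant: str) -> str:
    """Build a consistent name such as 'histeq_0007_under.png' (hypothetical scheme)."""
    return f"{task}_{index:04d}_{variant}.png"


def is_uniform(records: list[ImageRecord]) -> bool:
    """True if every record has the same size and color mode (vacuously True if empty)."""
    return len({(r.width, r.height, r.color_mode) for r in records}) <= 1


def to_csv(records: list[ImageRecord]) -> str:
    """Serialize the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(ImageRecord)])
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
    return buf.getvalue()


# Usage: two reference images with identical size and color mode.
records = [
    ImageRecord(make_filename("histeq", i, "ref"), 512, 512, "gray",
                "2024-03-01", "smartphone camera", "Team 3", "a dim hallway")
    for i in range(2)
]
print(is_uniform(records))  # True
print(to_csv(records))
```

Storing the metadata as a CSV file next to the images keeps it readable in any spreadsheet tool; a JSON file would work just as well.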

The Modified National Institute of Standards and Technology (MNIST) dataset consists of 70,000 small, square, \(28 \times 28\) pixel grayscale images of handwritten decimal digits (\(0, 1, \ldots, 9\)): 60,000 training images and 10,000 test images. ImageNet is another large-scale dataset for benchmarking image classification and object recognition algorithms. It features over 14 million images, each manually annotated to indicate which objects are present in it. Object bounding boxes are also provided for over one million images.

Project description

Choose an image processing/computer vision task, for example, histogram equalization. Consider how you would teach histogram equalization to a learner who has never heard the phrase. Brainstorm what kinds of pedagogical aids would help to teach and learn histogram equalization effectively, then develop such an image corpus. Approach this task from the learner’s perspective.

Here is a process that will help determine which types of images the corpus should include. What is histogram equalization? It is an image enhancement technique that improves the visual quality of images; essentially, it performs contrast enhancement. To illustrate contrast enhancement, we need images that are underexposed as well as overexposed. How many of each kind do we need? Should we include perfect (well-exposed) images along with underexposed and overexposed variants of them? How much underexposure? How much overexposure? Should the degree of underexposure/overexposure fall within a predefined range of values? If multiple histogram equalization algorithms are available, how does the image corpus help to distinguish between “good” and “not so good” algorithms?
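For a learner who has never seen histogram equalization, a small worked implementation can make "contrast enhancement" tangible. The sketch below is the classic global method (remap each intensity through the normalized cumulative histogram), written in pure Python and assuming 8-bit grayscale images stored as lists of rows. The function names are illustrative, and using the standard deviation of intensities (RMS contrast) as a yardstick for comparing algorithms is an assumption, not a prescribed grading criterion; a single number does not capture visual quality.

```python
from statistics import pstdev


def equalize(image: list[list[int]], levels: int = 256) -> list[list[int]]:
    """Classic global histogram equalization of a grayscale image.

    `image` is a list of rows of integer intensities in [0, levels - 1].
    """
    # 1. Histogram: count how many pixels take each intensity value.
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1
    # 2. Cumulative distribution function (running sum of the histogram).
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    n = running  # total number of pixels
    if n == 0:
        return []
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        # Constant image: a single intensity, so there is no range to stretch.
        return [list(row) for row in image]
    # 3. Lookup table: the darkest present value maps to 0, the brightest to levels - 1.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min))) for c in cdf]
    return [[lut[v] for v in row] for row in image]


def rms_contrast(image: list[list[int]]) -> float:
    """Population standard deviation of intensities, one simple contrast measure."""
    return pstdev([v for row in image for v in row])


dark = [[50, 51], [52, 53]]  # a tiny, low-contrast "underexposed" image
print(equalize(dark))  # [[0, 85], [170, 255]]
print(rms_contrast(dark) < rms_contrast(equalize(dark)))  # True
```

Running the same measure on the output of each candidate algorithm, over every image in the corpus, is one way a corpus could help separate "good" from "not so good" algorithms.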

As the above considerations show, designing an image corpus for pedagogical purposes is an open-ended/divergent problem. Your team needs to carefully consider the various facets of the pedagogical task and justify your decisions.

How many images should be included in the corpus? It is difficult to specify an absolute number. The number of images depends on how many nuances the topic under consideration has.
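One way to make the count concrete is to fix a predefined grid of variants per base image, so that the corpus size is the number of base images times the number of variants. The sketch below assumes a simple scale-and-clip model of exposure (multiply intensities by a factor, clip at 255); that model, the factor values, and the variant names in `EXPOSURE_FACTORS` are hypothetical choices for illustration, not a camera model or a recommended range.

```python
def exposure_variant(image: list[list[int]], factor: float, max_value: int = 255) -> list[list[int]]:
    """Simulate an exposure change by scaling intensities and clipping at max_value.

    This linear scale-and-clip model is a simplifying assumption, not a camera model.
    """
    return [[min(max_value, round(v * factor)) for v in row] for row in image]


# Hypothetical predefined exposure grid: factors < 1 underexpose, > 1 overexpose.
EXPOSURE_FACTORS = {"under2": 0.25, "under1": 0.5, "ref": 1.0, "over1": 2.0, "over2": 4.0}


def build_variants(base_images: list[list[list[int]]]) -> dict[tuple[int, str], list[list[int]]]:
    """Return {(base_index, variant_name): image} for every base image and every factor."""
    return {
        (i, name): exposure_variant(img, factor)
        for i, img in enumerate(base_images)
        for name, factor in EXPOSURE_FACTORS.items()
    }


bases = [[[100, 200]], [[30, 60]]]  # two tiny 1x2 "reference" images
corpus = build_variants(bases)
print(len(corpus))  # 10 (2 base images x 5 variants)
print(corpus[(0, "over1")])  # [[200, 255]]
```

Adding a new nuance (say, a low-contrast variant) then means adding one entry to the grid, and the effect on corpus size is easy to see.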

Each team should work on a different image processing/computer vision task. Please post your team’s choice of task to the entire class so that other teams know which task you are working on. This prevents two teams from working on the same vision task.

There are numerous computer vision tasks for your team to consider, including different types of affine transforms (aspect-ratio-preserving image enlargement/shrinking, rotation, shear), image registration, camera calibration, various types of filters for removing different kinds of noise (using spatial and frequency domain techniques), image segmentation, boundary detection, image classification, and morphological image processing.

Because your image corpus will become an open-source dataset, it should not contain images that have copyright restrictions. With the ubiquity of high-quality digital cameras, you can easily create your own images.

Submitting your solution

Your response to this team project is simply the link to your image corpus, shared with the course TA. Post this link as your response to this project on Canvas. In addition to the images, your corpus should provide metadata about the images, along with a description of how you designed and developed the corpus and how the corpus can be used.

Reflecting on the learning experience and teamwork

Reflect on your solution to the problem and on what you learned through this project. Trust building, connectedness, and psychological safety are the foundational elements of teamwork. Reflect on the team dynamics you experienced in this project.

Team member contribution/effort assessment

Use the following rubric to rate yourself and each of your team members on a scale of 1 to 5 for individual contribution to the project (1 = poor, 5 = outstanding). Also, please provide a rationale for each rating.

Responsibilities/performance | Self-assessment: rating | Self-assessment: rationale | Team member 1: rating | Team member 1: rationale | Team member 2: rating | Team member 2: rationale
Attended classes and group meetings | | | | | |
Initiated communication and interaction with group members | | | | | |
Contributed and committed to the group project | | | | | |
Provided comments and feedback in group work | | | | | |
Demonstrated a positive attitude toward the project | | | | | |
Any other comments | | | | | |
