GENERATING REPRESENTATIVE SETS AND SUMMARIES FOR LARGE COLLECTION OF IMAGES: EVALUATION (3)

Human-Based Evaluation

In this section we present the human-based evaluation results together with their statistical analysis. We first analyze the participants' ratings and then the number of images in each summary. We evaluate each of the different summaries individually and then report the common observations. Recall that we evaluate six different result sets across summaries of different sizes: three random window results, namely I5-66, I3-75 and I5-85, and three sequential window results, namely S3-66, S5-75 and S5-85.

Figure 8 shows the participants' ratings for the summary of 10 images. One can observe that I5-66 received the highest number of "worst" votes. Similarly, the result sets with the most "bad", "medium" and "good" votes are I5-85, I3-75 and S3-66, respectively. The result set with the most "excellent" votes is S5-85.


Figure 8: Participant ratings for the summary of 10 images

Figure 9 shows the participants' ratings for the summary of 15 images. One can observe that I5-66 again received the highest number of "worst" votes, and it also received the most "bad" votes; the result sets with the most "medium" and "good" votes are I3-75 and S5-75, respectively. The result set with the most "excellent" votes is again S5-85 for the summary of 15 images.
Figure 9: Participant ratings for the summary of 15 images

Figure 10 shows the participants' ratings for the summary of 20 images. One can observe that I5-66 again received the highest number of "worst" votes, consistent with the 10- and 15-image results. For the "bad" rating, I3-75 and S5-75 tied for the highest number of votes, and for the "medium" rating, I5-66, I3-75 and S3-66 tied for the highest. S5-85 received the most "good" and "excellent" votes.
Figure 10: Participant ratings for the summary of 20 images

General observations about the summaries of 10, 15 and 20 images are as follows:

• The random window result sets received mostly negative ratings, with a high portion of the votes falling in the worst, bad and medium categories, while the sequential window result sets received mostly positive ratings, with a high portion of the votes in the medium, good and excellent categories.

• If we examine the random windows and the sequential windows separately, we observe one common pattern: as coverage increases, the result sets with higher coverage secure better, more positive ratings.

To quantify these two general observations, we calculated a score for each summary based on the evaluators' ratings. For the ratings worst, bad, medium, good and excellent, we assigned the integers 1, 2, 3, 4 and 5, respectively, and devised a formula for calculating the score of each result set.

The formula is as follows:

Total score = [ ((NE_worst * 1) + (NE_bad * 2) + (NE_medium * 3) + (NE_good * 4) + (NE_excellent * 5)) * 100 ] / Total NE

NE_x: the number of evaluators who gave rating x; Total NE: the total number of evaluators who voted for the result set.

The total score for each result set is computed by multiplying the number of evaluators' votes in each category by the assigned integer, summing these products, multiplying the sum by 100, and dividing by the total number of evaluators who voted. The reason for multiplying by 100 and dividing by the total number of evaluators who voted for the particular set is to achieve a uniform scoring pattern: after users have looked at the 10- or 15-image summaries, they might already be satisfied and skip the larger ones, so the total number of evaluator votes for the 15- and 20-image summaries could be lower than for the 10-image summary.
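To make the calculation concrete, below is a minimal Python sketch of the scoring formula. The weights and the normalization follow the formula above; the function name total_score and the vote counts in the example are hypothetical, not taken from the actual evaluation data.

```python
# Rating weights as defined in the text: worst=1, bad=2, medium=3, good=4, excellent=5.
RATING_WEIGHTS = {"worst": 1, "bad": 2, "medium": 3, "good": 4, "excellent": 5}

def total_score(votes):
    """Compute the total score for one result set.

    votes: dict mapping each rating category to the number of
    evaluators (NE) who chose it. The weighted sum is multiplied
    by 100 and divided by the total number of evaluators who voted,
    so result sets with different numbers of voters stay comparable.
    """
    total_ne = sum(votes.values())
    weighted = sum(RATING_WEIGHTS[rating] * ne for rating, ne in votes.items())
    return weighted * 100 / total_ne

# Hypothetical example: 20 evaluators rate one result set.
votes = {"worst": 0, "bad": 1, "medium": 3, "good": 7, "excellent": 9}
print(total_score(votes))  # (0*1 + 1*2 + 3*3 + 7*4 + 9*5) * 100 / 20 = 420.0
```

Because the weighted sum is divided by the number of voters for that particular set, a result set rated by fewer evaluators is not penalized; only the distribution of its ratings matters.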