[KPhotoAlbum] Search performance (at least) linear in number of search terms?
rlk at alum.mit.edu
Mon Jul 10 02:00:21 CEST 2017
I'm reorganizing my database as follows:
I have traditionally applied a keyword to each set of photos. When I
select the photos I'm using, I apply a second keyword named "$keyword
selection" (for appropriate value of $keyword).
This led to a lot of clutter, so I've created a new keyword named "All
Selections", which I apply to every selected image. I can then find the
images I want by searching for the AND of the set's keyword and All
Selections.
I went about this via the search dialog, by selecting all of the
keywords named "$keyword selection" (something like 100 keywords,
IIRC), and then applying the new keyword to the search results. This
procedure was very time-consuming; it took maybe a minute on my Core
i7-920XM. That seems too long for 222000 images and 100 keywords (each
image usually has no more than two keywords). I'm going through one kcachegrind
trace (which took me about 30 minutes to collect, using only 40
keywords) and haven't been able thus far to work out quite what's
going on in there. For reference, it made about 80 million calls each
to DB::ImageInfo::hasCategoryInfo(QString const&, QString const&) and
DB::ImageInfo::hasCategoryInfo(QString const&, QSet<QString> const&).
It appears that most of the work is done by intersecting the set of
tags applied to the image and the set of tags being searched for, but
the way this intersection is done results in a lot of string hashes
being calculated (as opposed to the hashes being cached).
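To make the hashing cost concrete, here is a minimal sketch of that kind of string-based matching. It is not KPhotoAlbum's code: std::string and std::unordered_set stand in for QString and QSet, and CountingHash, g_hashCalls, TagSet and matchesAny are hypothetical names used only for illustration. The counting hasher simply makes visible how often a string gets hashed during lookups.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical instrumentation: counts how often a string is hashed.
static std::size_t g_hashCalls = 0;

struct CountingHash {
    std::size_t operator()(const std::string &s) const {
        ++g_hashCalls;                        // one more full pass over the string
        return std::hash<std::string>{}(s);
    }
};

// Stand-in for the set of tags being searched for (assumption, not the real type).
using TagSet = std::unordered_set<std::string, CountingHash>;

// Returns true if any of the image's tags is in the searched set.
// Every find() has to hash its argument again, because nothing
// remembers the hash of a tag between lookups.
bool matchesAny(const std::vector<std::string> &imageTags, const TagSet &searched)
{
    for (const auto &tag : imageTags) {
        if (searched.find(tag) != searched.end())
            return true;
    }
    return false;
}
```

With this shape, the number of string hashes grows with the number of images times the tags per image, even when the same few tag strings recur on every image.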
Furthermore, since each keyword (or other category member) has a
unique integer assigned, the matching could be done by integer
comparisons. That wouldn't work if someone wanted to do a search on a
wildcard/regexp, but it appears we don't allow that anyway.
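The integer idea could look roughly like the sketch below. It assumes each keyword is mapped to a small integer once, and that images store those integers rather than strings; TagIds, makeSearchMask and matchesAnyId are hypothetical names, not existing KPhotoAlbum functions, and the real ID assignment in the database may work differently.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical interning table: maps each keyword to a small integer ID.
// The string is hashed here, once per idFor() call, not once per image.
class TagIds {
public:
    int idFor(const std::string &name)
    {
        auto it = m_ids.find(name);
        if (it != m_ids.end())
            return it->second;
        const int id = static_cast<int>(m_ids.size());  // next unused ID
        m_ids.emplace(name, id);
        return id;
    }
    std::size_t size() const { return m_ids.size(); }

private:
    std::unordered_map<std::string, int> m_ids;
};

// Build a table with one flag per known ID; true means "searched for".
std::vector<bool> makeSearchMask(const std::vector<int> &searchedIds, std::size_t idCount)
{
    std::vector<bool> mask(idCount, false);
    for (int id : searchedIds)
        mask[static_cast<std::size_t>(id)] = true;
    return mask;
}

// True if any of the image's tag IDs is flagged. Only integer indexing
// happens here; no string is hashed or compared.
bool matchesAnyId(const std::vector<int> &imageTagIds, const std::vector<bool> &mask)
{
    for (int id : imageTagIds) {
        const std::size_t index = static_cast<std::size_t>(id);
        if (index < mask.size() && mask[index])
            return true;
    }
    return false;
}
```

The search terms are translated to IDs and turned into a mask once per search, after which each image costs a couple of array lookups. As noted above, this only covers exact matches; a wildcard or regexp search would still need the strings.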
So I'm still trying to puzzle my way through this.
Robert Krawitz <rlk at alum.mit.edu>
*** MIT Engineers A Proud Tradition http://mitathletics.com ***
Member of the League for Programming Freedom -- http://ProgFree.org
Project lead for Gutenprint -- http://gimp-print.sourceforge.net
"Linux doesn't dictate how I work, I dictate how Linux works."