Tuesday, March 27, 2012

Manga Mania: Of Old Boots and Live Fish

What happens when you download a million pages of manga and start playing with them using some very sophisticated computer equipment?

Exploring one million manga pages on the 287 megapixel HIPerSpace

Frankly, that’s not at all clear to me. But it sure would be fun to play around with the analytical toys created by the Software Studies Initiative at the University of California – San Diego. Here’s the main project page, with a brief description. The page also contains a link to an article, which I’ve downloaded but not read, and a slide presentation, which I have examined. The presentation's worth a look, and perhaps more than that depending on your interests, temperament, and conceptual style.

It has the feel of a fishing expedition, a fairly well-funded expedition with lots of nice gear and some pretty clever fishermen. The thing about fishing expeditions, though, is you don’t know what you’re going to get, an old rubber boot or a live fish. They seem to have landed both, but can’t tell the difference.

Here’s one of their visualizations, of style space:

[Figure: one million manga pages plotted by standard deviation (X) against entropy (Y); file manga.pages.1_million.Xstdev_Yentropy.jpeg]

Each point is a single page. The X axis is identified as “standard deviation,” which has to do with how many grey values there are on a page. “The pages on the left progressively have fewer grey values; the pages on the right have a both black and white.” The Y axis is associated with entropy, which is glossed as indicating the “presence of textures which likely also means more detail, more realism and more production labor.”
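To make those two axes concrete, here is a minimal sketch of how such measures are commonly computed. It rests on assumptions: the project does not define its terms, so I take "standard deviation" to mean the spread of a page's grey values and "entropy" to mean the Shannon entropy of its grey-level histogram, a usual definition in image processing. The function names and the three synthetic "pages" are mine, for illustration only.

```python
import numpy as np

def grey_std(page):
    """Standard deviation of the grey values on one page."""
    return float(np.std(page))

def grey_entropy(page, levels=256):
    """Shannon entropy, in bits, of the page's grey-level histogram."""
    counts = np.bincount(page.ravel(), minlength=levels)
    p = counts[counts > 0] / page.size      # probabilities of the grey levels present
    return float((p * np.log2(1.0 / p)).sum())

# Three synthetic 64x64 "pages" (assumed stand-ins, not real manga scans).
rng = np.random.default_rng(0)
flat = np.full((64, 64), 128, dtype=np.uint8)     # one uniform grey
halves = np.zeros((64, 64), dtype=np.uint8)       # left half black...
halves[:, 32:] = 255                              # ...right half white
noise = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # busy "texture"

for name, page in [("flat", flat), ("halves", halves), ("noise", noise)]:
    print(f"{name:7s} std={grey_std(page):6.1f}  entropy={grey_entropy(page):4.2f} bits")
```

The half-black, half-white page has the largest standard deviation but only one bit of entropy, while the noisy page has a smaller standard deviation and much higher entropy. On this reading, at least, the two axes do measure different things. Whether high entropy amounts to "realism" is another question.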

OK, so that’s style space as structured by two visual features. Here’s what they say about that (from the web page):
What do we learn from this visualization? It suggests that the very concept of style as it is normally employed becomes problematic then we start analyzing large cultural data sets. The concept assumes that we can partition a set of works into a small number of discrete categories. However, if we find a very large number of variations with very small differences between them (such as in this case of 1 million Manga pages), it is no longer possible to talk about "style" of these works. Instead, it is better to use visualization and/or mathematical models to describe the space of possible and realized variations.

Well . . . . I’m not sure that any skilled and experienced analyst of art believes that “we can partition a set of works into a small number of discrete [stylistic] categories.” Though we often write as though we believe such a thing, we also acknowledge that these are very fuzzy categories and not discrete at all. We've all read Wittgenstein, or someone who's read him, and know about family resemblance. Given that, I’m all but willing to grant that final sentence for nothing.

But I’m not willing simply to toss the notion of style, as they seem to suggest. Perhaps style is best measured on 193 independent dimensions rather than two. The effect of projecting style measurements into only two dimensions is to lose most of the differences between styles.

In fact, they’ve used a supercomputer to score each page on eight dimensions, listed (but not explained) in the slide show as: Brightness mean, Std, Entropy, Sobel (the amount of edges found), Contrast, Correlation, Energy, and Homogeneity. Visualizing distributions in eight dimensions is, however, something of a challenge.
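One standard way to squeeze eight dimensions onto a flat plot is principal component analysis, which also reports how much of the variation survives the squeeze. What follows is only a sketch under stated assumptions: I am not claiming the project used PCA, the feature table is random numbers standing in for real page scores, and the name pca_project is my own.

```python
import numpy as np

def pca_project(x, k=2):
    """Project rows of x onto their first k principal components.

    Returns the k-dimensional coordinates and the fraction of the
    total variance that those k components retain.
    """
    centered = x - x.mean(axis=0)
    scaled = centered / centered.std(axis=0)   # put all features on one scale
    _, s, vt = np.linalg.svd(scaled, full_matrices=False)
    coords = scaled @ vt[:k].T                 # coordinates in the top-k directions
    retained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return coords, float(retained)

# Hypothetical stand-in for the real data: 1000 "pages", 8 feature scores each.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 8))

coords, retained = pca_project(features, k=2)
print(coords.shape)
print(f"variance retained in two dimensions: {retained:.0%}")
```

With eight unrelated features, as in this made-up table, two dimensions keep only a little more than a quarter of the variance. Real page features are surely correlated with one another, so a real projection would keep more, but the number is worth checking before concluding anything from a two-dimensional picture.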

But they don’t really toss the notion at all. They keep it around and use it in various ways in their slide presentation as they discuss gender difference, user-supplied genre tags, features of specific titles, and this and that. There’s something going on here, but just what, well, they’ve got a bit of work to do in order to sort out the old rubber boots from the live fish. Right now they seem to be in the thick of it, just pulling out whatever their lines snag. One day, though, I expect them to have a tasty fish fry.

3 comments:

  1. I had a similar set of doubts when I read over their work. I was perturbed in particular by the lack of definitions, specifically of "entropy" -- it is NOT obvious to me that entropy has anything to do with realism. Nor was I clear about what they expected or wanted to find. But, like fishing with a bent pin in a canal, sooner or later one will haul in something -- maybe an old boot, maybe a catfish, maybe, well, who knows what.

    Tim Perper

    1. Yeah, I'd have liked a definition of entropy as well, and a discussion. I assume they use that term because their image processing routine takes that mathematical form, as defined over the color values of pixels. But I'd like to have a much better idea what it means. The slide show did have a pair of slides, one identified as displaying high entropy pages and the other as displaying low entropy pages and, yes, there was a visible difference. The high entropy pages were more richly textured, but there's more to realism than that.
