Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion GroupsGeneralPHPASPPerlColdFusionFlashHTML, CSS, ScriptsBrowsers

Webmaster Forum / Perl / Modules / December 2006



Tip: Looking for answers? Try searching our database.

Accomodate for poor db design using Sort::Maker?

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
DJ Stunks - 09 Dec 2006 05:48 GMT
Hi all,  topical question here regarding Sort::Maker

I have a poorly designed db table over which I have no control.  The
data isn't normalized but for each row I'd like to return the column
numbers which have the highest number of counts, and the highest column
number with any counts at all.  I refer to these values as mode and max
respectively, but please refer to my code and it's output below for a
better understanding.  (I have provided sample input data and mimicked
the fetchrow_hashref function provided by DBI)

Both sorts seem klunky, and after the discussion about the efficiency
gains given by Sort::Maker in another thread I'd like to take advantage
if I can (the sorts must be performed for each row returned by the
database).

I'm not sure how to pass in the information in the $hashref to
make_sorter.  Advice would be greatly appreciated.

TIA,
-jp

Code:
 #!/usr/bin/perl

 use strict;
 use warnings;

 print "id\tmode\tmax\n";
 while ( my $hashref = fetchrow_hashref() ) {
     my $id   = $hashref->{id};

     # mode = col number with greatest count
     my $mode = ( map  { $_->[0] }
                 sort { $b->[1] <=> $a->[1] }
                 map  { [ $_, $hashref->{"col$_"} ] } 1..5 )[0];

     # max = highest col number with any count
     my $max  = ( sort { $b <=> $a }
                 map  { $_ if $hashref->{"col$_"} > 0 } 1..5 )[0];

     $max = 1 if $max eq '';   #all zeros

     print "$id\t$mode\t$max\n";
 }

 { my @columns;

   sub fetchrow_hashref {

     # silence warning that $line may be equal to 0 below
     no warnings 'misc';
     my $line = <DATA> or return;

     if ( $. == 1 ) {
         @columns = split ' ', substr $line, 1;
         return fetchrow_hashref();
     }

     my %hash;
     @hash{ @columns } = split ' ', $line;

     return \%hash;
 } }

 __DATA__
 #id     col1    col2    col3    col4    col5
 1       2       8       8       0       0
 2       5       0       0       0       0
 3       3       2       8       2       5
 4       4       3       8       2       0
 5       5       7       7       0       0
 6       9       5       6       0       0
 7       6       4       5       7       0
 8       2       3       0       0       0
 9       9       2       3       3       5
 10      0       4       1       0       0
 11      0       0       0       0       0
 12      1       1       1       1       1

Output:
 id      mode    max
 1       2       3
 2       1       1
 3       3       5
 4       3       4
 5       2       3
 6       1       3
 7       4       4
 8       2       2
 9       1       5
 10      2       3
 11      1       1
 12      1       5
John W. Krahn - 09 Dec 2006 17:05 GMT
> Hi all,  topical question here regarding Sort::Maker
>
[quoted text clipped - 37 lines]
>
>       $max = 1 if $max eq '';   #all zeros

If you want efficiency then you should just use loops instead of sorting:

   my ( $mode, $curr ) = ( 1, 0 );
   for ( 1 .. 5 ) {
       if ( $hashref->{ "col$_" } > $curr ) {
           $curr = $hashref->{ "col$_" };
           $mode = $_;
           }
       }

   my $max = 1;
   for ( reverse 1 .. 5 ) {
       if ( $hashref->{ "col$_" } > 0 ) {
           $max = $_;
           last;
           }
       }

John
Signature

Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order.       -- Larry Wall

harryfmudd [AT] comcast [DOT] net - 09 Dec 2006 19:42 GMT
>>Hi all,  topical question here regarding Sort::Maker
>>
[quoted text clipped - 57 lines]
>
> John

Have you looked at List::Util? It's a standard module in Perl 5.8, and
contains a max function. So the first loop becomes something like

use List::Util qw{max};

my $mode = max (map {$hashref->{"col$_"}} 1..5);

Assuming the only zeros are trailing, the second loop could be

my $max = grep {$_ > 0} map {$hashref->{"col$_"}} 1..5

since the value of a list in scalar context is the number of elements in
the list. If you have zeros embedded, this won't give the right answer,
since what it really does is find the number of values > 0, not the
highest index with a value > 0.

Of course, for either of these you might want to make the hashref into a
real list, to avoid mapping twice.

Tom Wyant
John W. Krahn - 10 Dec 2006 07:09 GMT
> Have you looked at List::Util? It's a standard module in Perl 5.8, and
> contains a max function. So the first loop becomes something like
[quoted text clipped - 14 lines]
> Of course, for either of these you might want to make the hashref into a
> real list, to avoid mapping twice.

The OP's code produces:

id      mode    max
1       2       3
2       1       1
3       3       5
4       3       4
5       2       3
6       1       3
7       4       4
8       2       2
9       1       5
10      2       3
11      1       1
12      1       5

But your's produces:

id      mode    max
1       8       3
2       5       1
3       8       5
4       8       4
5       7       3
6       9       3
7       7       4
8       3       2
9       9       5
10      4       2
11      0       0
12      1       5

So neither $mode nor $max is correct for all lines.

John
Signature

Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order.       -- Larry Wall

harryfmudd [AT] comcast [DOT] net - 11 Dec 2006 17:43 GMT
>>Have you looked at List::Util? It's a standard module in Perl 5.8, and
>>contains a max function. So the first loop becomes something like
[quoted text clipped - 50 lines]
>
> John

My mistake, corrected by DJ Stunks in adjacent post; I was returning the
mode, rather than the index of the mode. Etc.

Tom Wyant
DJ Stunks - 10 Dec 2006 17:22 GMT
> >>for each row I'd like to return the column
> >>numbers which have the highest number of counts, and the highest column
[quoted text clipped - 44 lines]
>
> my $max = grep {$_ > 0} map {$hashref->{"col$_"}} 1..5

I appreciate the input, I don't think I was clear enough about what I
was looking for.  The counts themselves don't really matter, what I was
looking for was the column numbers associated with 1) highest count
($mode); and 2) the highest column number with any count at all ($max).

John's solution worked well.  I was only thinking of sort() because I
wanted largest of one thing with respect to another thing.  I couldn't
think of a way to get max() to operate like that without going through
all the columns twice (once to find the max, and a second time to find
which column it was associated with).

Thanks,
-jp
DJ Stunks - 10 Dec 2006 17:15 GMT
> > for each row I'd like to return the column
> > numbers which have the highest number of counts, and the highest column
> > number with any counts at all.  I refer to these values as mode and max
> > respectively

> >   print "id\tmode\tmax\n";
> >   while ( my $hashref = fetchrow_hashref() ) {
[quoted text clipped - 28 lines]
>             }
>         }

Thanks John, I knew something was nagging me about my approach, but
once you're in a mindset sometimes it's hard to break out.  Your right
of course, why sort all the elements when I'm only interested in one.

I altered your two loops to return both numbers in one loop (in
actuality there are 60 columns, not just 5, and the $mode and $max
usually occur below 30):

    my ( $max, $mode, $curr ) = ( 1,1,0 );
    for my $i ( 1 .. 5 ) {
        my $count = $hashref->{"col$i"};

        if ( $count > $curr ) {
            $curr = $count;
            $mode = $i;
        }
       
        if ( $count > 0 ) {
            $max = $i;
        }
    }

Thanks again,
-jp
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.