IPC::Shareable
Snorik - 18 Sep 2008 14:55 GMT
Hello everyone,
I am trying to speed up a few Perl scripts by forking them. Unfortunately, I need to pass data back to the parent. I am using named pipes in other places, but this time I wanted to use shared memory (this being on *nix).
The first case is basically traversing a HUGE directory tree, looking for certain files and returning them.
The idea is to fork one find per starting point in the directory tree, gather all the files found by each child process in an array, and then store a reference to that array as a value in a hash.
For this, I have done:
sub get_fbas_for_rg {
    my $rg = shift;
    my @children;
    use IPC::Shareable;
    use Data::Dumper;
    use constant MYGLUE => 'Test';
    my %fba_hash;
    my $handle = tie (%fba_hash, IPC::Shareable, MYGLUE, {create => 1, mode => 0666})
        or die "cannot tie to shared memory: $! \n";

    my @ggs = qx (ls /default/main/www/$rg | grep -v STAGING | grep -v WORKAREA | grep -v EDITION);

    foreach my $gg (@ggs) {
        chomp $gg;
        my $gg_fba = $gg."_FBAs";

        my $pid = fork();

        if ($pid) {
            push(@children, $pid);
        }
        elsif ($pid == 0) {
            my @fbas = (qx (/usr/bin/find /default/main/www/$rg/$gg/WORKAREA/workarea/$gg_fba -type f ));
            $handle->shlock();
            push (@{$fba_hash{$gg}}, @fbas);
            $handle->shunlock();
            exit (0);
        }
        else {
            print STDERR "\nERROR: fork failed: $!\n";
        }
    }

    foreach (@children) {
        waitpid($_, 0);
    }
    return %fba_hash;
}
Now, if I call this function, it seems to work fine, except that the hash values contain only the scalar value of each array (its size); at least that is what Data::Dumper tells me:
$VAR1 = {
    'dir1' => 3858,
    'dir2' => 2394,
    'dir3' => 2075
};
This is what I do in the script:
my %fbas = TestPackage::get_fbas_for_rg("test");
print Dumper \%fbas;
foreach my $gg (keys %fbas) {
    print $gg."\n";
    foreach my $fba (sort @{$fbas{$gg}}) {
        print $fba."\n";
    }
}
The foreach loop does not return anything (understandable since the hash value only contains the scalar of the array). Again, my question is: how do I manage to receive the actual array in the calling script instead of just a hash containing my designated keys and the sizes of the arrays as values?
I would be very grateful for some help.
Ben Morrow - 18 Sep 2008 16:18 GMT
Quoth Snorik <clauskick@hotmail.com>:
> Hello everyone,
> [quoted text clipped - 18 lines]
> use IPC::Shareable;
> use Data::Dumper;
It's best not to 'use' modules inside a sub (except for lowercase pragmata which have lexical effect). It gives the false impression that the exporter subs are only available in that sub.
> use constant MYGLUE => 'Test';
> my %fba_hash;
> [quoted text clipped - 3 lines]
> my @ggs = qx (ls /default/main/www/$rg | grep -v STAGING | grep -v WORKAREA | grep -v EDITION);
use File::Slurp qw/read_dir/;
my @ggs = grep !/EDITION/, grep !/WORKAREA/, grep !/STAGING/, read_dir "/default/main/www/$rg";
or
grep !/EDITION|WORKAREA|STAGING/,
of course.
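For readers without File::Slurp installed, the same filtering can be sketched with the core opendir/readdir functions. This is only an illustration under assumptions: the directory names are made up, and a temporary directory stands in for /default/main/www/$rg.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical layout standing in for /default/main/www/$rg.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/$_" or die "mkdir: $!" for qw(siteA siteB STAGING WORKAREA EDITION);

# readdir also returns '.' and '..', which read_dir would have skipped.
opendir(my $dh, $dir) or die "opendir: $!";
my @ggs = sort grep { !/^\.\.?$/ && !/EDITION|WORKAREA|STAGING/ } readdir $dh;
closedir $dh;

print "@ggs\n";    # prints "siteA siteB"
```

Unlike the `ls | grep` pipeline, the entries come back without trailing newlines, so no chomp is needed afterwards.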
> my @fbas = (qx (/usr/bin/find /default/main/www/$rg/$gg/WORKAREA/workarea/$gg_fba -type f ));
I would use File::Find::Rule for this.
my @fbas = File::Find::Rule->file
    ->in("/default/main/www/$rg/$gg/WORKAREA/workarea/$gg_fba");
> $handle->shlock();
> push (@{$fba_hash{$gg}}, @fbas);
You cannot assign a ref to an IPC::Shareable tied hash. The other process has no way of following that ref: it refers to data structures that aren't in shared memory. I would suggest using Storable:
use Storable qw/freeze/;
$fba_hash{$gg} = freeze \@fbas;
and then retrieve it with
use Storable qw/thaw/;
my @fbas = @{ thaw $fba_hash{$gg} };
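A minimal sketch of that freeze/thaw round trip, using only the core Storable module. A plain (untied) hash stands in for the shared one here, and the file names are invented for illustration.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical file list standing in for the output of find.
my @fbas = ('/tmp/a.txt', '/tmp/b.txt', '/tmp/c.txt');

# freeze() takes a reference and returns a plain byte string.
# A string is a scalar, so it can be stored as a hash value.
my %store;
$store{dir1} = freeze(\@fbas);

# thaw() rebuilds a fresh array from that string.
my @copy = @{ thaw($store{dir1}) };

print scalar(@copy), " entries, first is $copy[0]\n";
# prints "3 entries, first is /tmp/a.txt"
```

The point of the detour is that the hash value is now self-contained data rather than a pointer into one process's memory.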
Ben
 Signature Although few may originate a policy, we are all able to judge it. Pericles of Athens, c.430 B.C. ben@morrow.me.uk
Snorik - 19 Sep 2008 16:29 GMT
> Quoth Snorik <clausk...@hotmail.com>:
> [quoted text clipped - 24 lines]
> pragmata which have lexical effect). It gives the false impression that
> the exporter subs are only available in that sub.
OK, that is a useful remark, I will keep that in mind.
> > use constant MYGLUE => 'Test';
> > my %fba_hash;
> [quoted text clipped - 5 lines]
>
> use File::Slurp qw/read_dir/;
*snip usage of File::Slurp*
Thanks for pointing that out, that module really helps a lot! I never knew it existed.
> > my @fbas = (qx (/usr/bin/find /default/main/
> > www/$rg/$gg/WORKAREA/workarea/$gg_fba -type f ));
> [quoted text clipped - 3 lines]
> my @fbas = File::Find::Rule->file
> ->in("/default/main/www/$rg/$gg/WORKAREA/workarea/$gg_fba");
Again, thanks for pointing that out. This is so much more elegant than the normal File::Find way.
> > $handle->shlock();
> > push (@{$fba_hash{$gg}}, @fbas);
>
> You cannot assign a ref to an IPC::Shareable tied hash. The other
> process has no way of following that ref: it refers to data structures
> that aren't in shared memory. I would suggest using Storable:
OK, so if I may rephrase in order to check whether I have actually understood: all that can be tied in that hash is the scalar of the array (its size); I cannot use it to follow a ref to the actual array.
> use Storable qw/freeze/;
> [quoted text clipped - 5 lines]
>
> my @fbas = @{ thaw $fba_hash{$gg} };
I have a question concerning this (I just had a look at the Storable documentation, but this does not really clear things up):
So Storable persists (and of course serializes) any data structure; that means I can store the hash to disk (or memory, hopefully memory?). How can I retrieve this in the calling script, as this sub is going to live in a module itself? I must admit, this is my first attempt at IPC myself.
I would be very grateful for an answer.
Snorik
Ben Morrow - 19 Sep 2008 17:21 GMT
Quoth Snorik <clauskick@hotmail.com>:
> > You cannot assign a ref to an IPC::Shareable tied hash. The other
> > process has no way of following that ref: it refers to data structures
> [quoted text clipped - 4 lines]
> All that can be tied in that hash is the scalar of the array (its
> size), I cannot use it to follow a ref to the actual array.
Yes. I don't entirely understand why the value stored was scalar(@array): I would have expected it to be the stringification of the ref. I guess it's to do with how IPC::Shareable interprets its arguments.
> > use Storable qw/freeze/;
> > [quoted text clipped - 11 lines]
> So Storable persists (and of course serializes) any datastructure;that
> means I can store the hash to disk (or memory, hopefully memory?).
Yes. You use store/retrieve to save to and load from disk; you use freeze/thaw to save to and load from memory.
> How can I retrieve this in the calling script, as this sub is going to
> live in a module itself? I must admit, this is my first attempt at IPC
> myself.
If you store it with 'freeze', you get it out again with 'thaw'.
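The two pairs can be compared side by side in a short sketch. It uses only core modules; the hash contents are made up, and a temporary file stands in for whatever path a real script would choose.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Storable qw(store retrieve freeze thaw);

# Hypothetical data: directory name => list of files.
my %fbas = (dir1 => ['a', 'b'], dir2 => ['c']);

# Disk: store() writes the structure to a file, retrieve() reads it back.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;
store(\%fbas, $path);
my $from_disk = retrieve($path);

# Memory: freeze() returns a string, thaw() turns it back into a structure.
my $from_memory = thaw(freeze(\%fbas));

print "disk:   @{ $from_disk->{dir1} }\n";
print "memory: @{ $from_memory->{dir1} }\n";
```

Both calls hand back a reference to a new copy of the data, so the result has to be dereferenced (as above) or assigned with `%{ ... }`.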
Ben
 Signature I touch the fire and it freezes me, [ben@morrow.me.uk] I look into it and it's black. Why can't I feel? My skin should crack and peel--- I want the fire back... BtVS, 'Once More With Feeling'
Snorik - 19 Sep 2008 19:19 GMT
> > So Storable persists (and of course serializes) any datastructure;that
> > means I can store the hash to disk (or memory, hopefully memory?).
>
> Yes. You use store/retrieve to save to and load from disk; you use
> freeze/thaw to save to and load from memory.
OK, thanks for that, I will read the documentation and actually try to understand it.
> > How can I retrieve this in the calling script, as this sub is going to
> > live in a module itself? I must admit, this is my first attempt at IPC
> > myself.
>
> If you store it with 'freeze', you get it out again with 'thaw'.
Yes, I have understood that, but if I freeze a hash in one script, how can I thaw it in the other script? I do not have the reference? I tried to use a tied variable for that, figuring that this should work this time, but this failed unfortunately.
xhoster@gmail.com - 19 Sep 2008 19:32 GMT
> > If you store it with 'freeze', you get it out again with 'thaw'.
>
> Yes, I have understood that, but if I freeze a hash in one script, how
> can I thaw it in the other script?
When you freeze, you get serialized data, which is just a string. You pass that string to the other script using shared memory (or pipes).
> I do not have the reference?
That is what thaw does. It makes a reference again out of the serialized data. Obviously it isn't the same reference, but a reference to a deep copy of the original data.
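A rough sketch of the pipe variant, using only core Perl: the child freezes an array and writes the string to a pipe, and the parent reads it and thaws it. The data is invented, and a real script would need error handling for short reads and failed children.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my @original = ('one', 'two', 'three');   # hypothetical payload

pipe(my $reader, my $writer) or die "pipe failed: $!";
binmode $_ for $reader, $writer;          # frozen data is binary

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: serialize and write the string to the pipe.
    close $reader;
    print {$writer} freeze(\@original);
    close $writer;
    exit 0;
}

# Parent: read the whole string until EOF, then rebuild the data.
close $writer;
my $frozen = do { local $/; <$reader> };
close $reader;
waitpid($pid, 0);

my $copy = thaw($frozen);
print "got @$copy\n";
print "same reference? ", ($copy == \@original ? "yes" : "no"), "\n";
```

The last line prints "no": the parent receives equal contents in a new array, not the child's array itself.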
Xho
 Signature -------------------- http://NewsReader.Com/ -------------------- The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Ted Zlatanov - 18 Sep 2008 16:19 GMT
S> use IPC::Shareable;
Try IPC::ShareLite or even Tie::ShareLite (easiest, hash interface). They work better for me.
S> Now, If I call this function, it seems to work fine, only the hash
S> values contain only the scalars of the array, at least that is what
S> Data Dumper tells me:
S> $VAR1 = {
S>     'dir1' => 3858,
S>     'dir2' => 2394,
S>     'dir3' => 2075
S> };
You're assigning @array to the hash value; the value can only be a scalar so you get the size of the array instead of its contents.
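The scalar-context rule mentioned here can be seen with an ordinary, untied hash. This sketch only illustrates that rule; it does not reproduce what IPC::Shareable does internally with the original push.

```perl
use strict;
use warnings;

my @files = ('a.txt', 'b.txt', 'c.txt');
my %h;

$h{count} = @files;      # scalar context: stores 3, the number of elements
$h{ref}   = \@files;     # a reference is a scalar that points at the array
$h{copy}  = [@files];    # anonymous array holding a copy of the elements

print "count: $h{count}\n";          # prints "count: 3"
print "via ref: @{ $h{ref} }\n";     # prints "via ref: a.txt b.txt c.txt"
```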
See the Tie::ShareLite docs, especially section 'REFERENCES,' for a better solution.
Ted
Snorik - 22 Sep 2008 13:30 GMT
> S> use IPC::Shareable;
> [quoted text clipped - 16 lines]
> See the Tie::ShareLite docs, especially section 'REFERENCES,' for a
> better solution.
Hello,
OK, this works. However, it is very slow if I use a normal hash (the hash has about 10000 entries). I have yet to get it to work with hash references.
Thanks for your help!
Ted Zlatanov - 22 Sep 2008 15:11 GMT
>> See the Tie::ShareLite docs, especially section 'REFERENCES,' for a
>> better solution.
S> ok, this works - however, it is very slow if I use a normal hash (the
S> hash has about 10000 entries).
S> I have yet to get it to work with hash references.
I've only used it with hashes of up to 1000 entries, but I'm surprised it's very slow. Can you show your code so we can see if the problem is in the module or in your code?
Ted
Snorik - 22 Sep 2008 15:57 GMT
> >> See the Tie::ShareLite docs, especially section 'REFERENCES,' for a
> >> better solution.
> [quoted text clipped - 6 lines]
> it's very slow. Can you show your code so we can see if the problem is
> in the module or in your code?
Hello,
okay, it appears even a Solaris system needs a reboot some time - now it is pretty fast: 19 seconds for 16k entries (and that includes Data::Dumper printing out the hash and forking about 30 child processes).
================================
I do the following now:
    if ($pid) {
        push(@children, $pid);
    }
    elsif ($pid == 0) {
        use File::Find::Rule;
        my @fbas = File::Find::Rule->file->in("/default/main/www/$rg/$gg/WORKAREA/workarea/$gg_fba");
        $ipc->lock(LOCK_EX);
        $shared{$gg} = \@fbas;
        $ipc->unlock();
        exit (0);
    }
    else {
        print STDERR "\nERROR: fork failed: $!\n";
    }
}
foreach (@children) {
    waitpid($_, 0);
}
return %shared;
And in the calling script:
my %fba_ref = Package::get_fbas_for_rg("dir1");
print Dumper \%fba_ref;
=================================
I am wondering: Do I even need the locks for the hash reference, does this lock the entire hash, or solely the key in question? Does this still go faster?
Snorik - 23 Sep 2008 10:56 GMT
Ok, when testing with another directory tree, I saw the following:
IPC::Sha(in cleanup) IPC::ShareLite store() error: No space left on device at /opt/iw-home/iw-perl/site/lib/Tie/ShareLite.pm line 366
Then, on retry, it worked again - therefore my question: Do I have to clean up anything after I used it?
I tried using "destroy => 'yes'", but that yields "IPC::ShareLite fetch() error: Invalid argument at /opt/iw-home/iw-perl/site/lib/Tie/ShareLite.pm line 342", telling me (not sure) that the object is destroyed too early.
Some explanation what to do for this would be great :)
Ted Zlatanov - 23 Sep 2008 14:23 GMT
S> Ok, when testing with another directory tree, I saw the following:
S> IPC::Sha(in cleanup) IPC::ShareLite store() error: No space left on
S> device at /opt/iw-home/iw-perl/site/lib/Tie/ShareLite.pm line 366
S> Then, on retry, it worked again - therefore my question: Do I have to
S> clean up anything after I used it?
S> I tried using "destroy => 'yes'", but that yields "IPC::ShareLite
S> fetch() error: Invalid argument at /opt/iw-home/iw-perl/site/lib/Tie/
S> ShareLite.pm line 342", telling me (not sure) that the object is
S> destroyed too early.
S> Some explanation what to do for this would be great :)
You need to increase your shared memory. This is different for every OS. For Solaris, see e.g. http://publib.boulder.ibm.com/tividd/td/ITAME/SC32-1351-00/en_US/HTML/am51_perftune34.htm
Give it at least 10 times the default, unless your system is short on memory.
Ted
Ted Zlatanov - 23 Sep 2008 14:27 GMT
S> if ($pid)
S> {
S>     push(@children, $pid);
S> }
S> elsif ($pid == 0)
S> {
S>     use File::Find::Rule;
S>     my @fbas = File::Find::Rule->file->in("/default/main/www/$rg/$gg/
S> WORKAREA/workarea/$gg_fba");
S>     $ipc->lock(LOCK_EX);
S>     $shared{$gg} = \@fbas;
S>     $ipc->unlock();
S>     exit (0);
S> }
S> else
S> {
S>     print STDERR "\nERROR: fork failed: $!\n";
S> }
S> }
S> foreach (@children)
S> {
S>     waitpid($_, 0);
S> }
S> return %shared;
S> And in the calling script:
S> my %fba_ref = Package::get_fbas_for_rg("dir1");
S> print Dumper \%fba_ref;
S> =================================
S> I am wondering: Do I even need the locks for the hash reference,
Yes.
S> does this lock the entire hash, or solely the key in question?
The whole hash.
S> Does this still go faster?
Than what? I'm not sure what we're measuring.
Your code looks OK. You should probably destroy the hash in the main process after the children are done. This is not showing all your code; your initialization should have an ID for the shared memory segment you're using. If you use the same ID, you will reuse the same space, so you don't have to destroy it if you'll be reusing it regularly. Sometimes people use the process ID $$ as the shared memory ID, and in that case you'll keep allocating more memory every time.
Ted
Snorik - 23 Sep 2008 15:42 GMT
> S> if ($pid)
> S> {
> [quoted text clipped - 43 lines]
> Your code looks OK. You should probably destroy the hash in the main
> process after the children are done.
OK, but how do I return it to the calling script otherwise? The idea of all of this is to encapsulate the action completely and just return something from the module without anyone having to bother about what happens inside there.
> This is not showing all your code;
> your initialization should have an ID for the shared memory segment
> you're using. If you use the same ID, you will reuse the same space, so
> you don't have to destroy it if you'll be reusing it regularly.
I just noticed that reusing it means that for a second run with other parameters, I also receive part of the result from the first one. So, I better not keep it :)
For initialization, I just use the following:
my $ipc = tie %shared, 'Tie::ShareLite', -key=>1971, -mode=>0600, -create=>'yes', -destroy =>'no'
    or die("Could not tie to shared memory: $!")
> Sometimes people use the process ID $$ as the shared memory ID, and in
> that case you'll keep allocating more memory every time.
I do not want to keep it, how can I clean it up properly (again, hidden inside the module if possible) and still return a valid reference from the routine in the module?
Snorik - 23 Sep 2008 16:15 GMT
> > S> if ($pid)
> > S> {
> [quoted text clipped - 70 lines]
> hidden inside the module if possible) and still return a valid
> reference from the routine in the module?
To illustrate, I want to do this (this is most of the calling script):
my %fba_ref = Package::get_fbas_for_rg("directory");
print Dumper \%fba_ref;
Trying to clear the memory like this (hardcoded id, I know, but if I clear it every time I use it it should not be a problem - right?):
Package::cleanMemory(1971);
returns a "permission denied" error.
cleanMemory just looks like this:
{
    my $handle = shift;
    use IPC::ShareLite;
    my $share = IPC::ShareLite->new( -key => $handle, -create => 'yes', -destroy => 'no')
        or die "caution: ".$!;
    $share->destroy();
}
I figured, I can just reuse the shared memory segment and just destroy it - but I seem to be too dense to get that.
Ted Zlatanov - 24 Sep 2008 19:51 GMT
S> On Sep 23, 3:27 pm, Ted Zlatanov <t...@lifelogs.com> wrote:
>> This is not showing all your code;
>> your initialization should have an ID for the shared memory segment
>> you're using. If you use the same ID, you will reuse the same space, so
>> you don't have to destroy it if you'll be reusing it regularly.
S> I just noticed that reusing it means that for a second run with other
S> parameters, I also receive part of the result from the first one. So,
S> I better not keep it :)
S> For initialization, I just use the following:
S> my $ipc = tie %shared, 'Tie::ShareLite', -key=>1971, -mode=>0600, -
S> create=>'yes', -destroy =>'no' or die("Could not tie to shared memory:
S> $!")
Do this in the parent (the one which forks all the others) with -destroy => 'yes'
If you want to keep running children after the parent script is gone, you'll have to have a cleanup function or script, which is called before the next run happens. It's exactly like leaving files in /tmp after you're done processing them.
Ted
Snorik - 26 Sep 2008 14:28 GMT
> S> On Sep 23, 3:27 pm, Ted Zlatanov <t...@lifelogs.com> wrote:
> [quoted text clipped - 12 lines]
> S> create=>'yes', -destroy =>'no' or die("Could not tie to shared memory:
> S> $!")
Ted, sorry it took me some time to respond...
> Do this in the parent (the one which forks all the others) with
> -destroy => 'yes'
I think I am doing this in the parent, that is, before the fork() occurs (which should be the parent):
my $ipc = tie %shared, 'Tie::ShareLite', -key=>1971, -mode=>0600, -create=>'yes', -destroy =>'yes'
    or die("Could not tie to shared memory: $!");
use File::Slurp qw/read_dir/;
my @ggs = [... removed to shorten - it does return an array]
foreach my $gg (@ggs) {
    chomp $gg;
    my $pid = fork();
    if ($pid) {
        push(@children, $pid);
    }
    elsif ($pid == 0) {
        use File::Find::Rule;
        [here stuff happens...]
I receive this for each element of the array:
IPC::ShareLite fetch() error: Invalid argument at [..]Tie/ShareLite.pm line 342
To me that says that the object is already destroyed?
*snip useful advice about child processes*
Thank you for that gem of useful information!
Ted Zlatanov - 26 Sep 2008 21:36 GMT Rather than explaining how to fix your example, here's a working program that shows how to use Tie::ShareLite. Each child will lock, write 'hello' to the key of its PID, then unlock. You should lock and unlock around every access to the tied hash. The parent waits for the children to finish and then prints the summary (also locking and unlocking, to protect from other processes that might be accessing that shared memory).
The get_ipc() function is just for convenience. Key 1971 is just an example from the Tie::ShareLite docs, you can use any value.
Note I clear %shared every time I start. It's not destroyed at the program's end. Setting destroy to yes in the parent doesn't work for me, and I didn't debug it (no time :)
Ted
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use Tie::ShareLite;

my %shared;
my @children;

my $ipc = get_ipc(1);
%shared = ();

foreach (1..10) {
    my $pid = fork();
    if ($pid) {
        push(@children, $pid);
    }
    elsif ($pid == 0) {
        my $ipc = get_ipc();
        $ipc->lock();
        $shared{$$} = 'hello';
        $ipc->unlock();
        exit;
    }
}

waitpid $_,0 foreach @children;

$ipc->lock();
print Dumper \%shared;
$ipc->unlock();

sub get_ipc {
    my $init = shift @_ ? 'yes' : 'no';
    my $ipc = tie %shared, 'Tie::ShareLite', -key=>1971, -mode=>0600, -create=>$init, -destroy =>'no'
        or die("Could not tie to shared memory: $!");
    return $ipc;
}
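The reason for locking around every access can be shown without any shared-memory module. The sketch below swaps in a different mechanism, flock() on a temporary file, purely as an analogy: five children each increment a counter twenty times, and the exclusive lock keeps each read-modify-write step from interleaving with another process. The counts and file are illustrative assumptions, not part of the program above.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);
use File::Temp qw(tempfile);
use POSIX ();

# A counter file stands in for the shared hash.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "0\n";
close $fh;

my @children;
for (1 .. 5) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        for (1 .. 20) {
            open my $c, '+<', $path or die "open: $!";
            flock($c, LOCK_EX) or die "flock: $!";   # wait for exclusive access
            my $n = <$c>;
            chomp $n;
            seek($c, 0, SEEK_SET);
            truncate($c, 0);
            print {$c} $n + 1, "\n";
            close $c;                                # flushes, then releases the lock
        }
        POSIX::_exit(0);                             # skip the parent's cleanup handlers
    }
    push @children, $pid;
}
waitpid($_, 0) for @children;

open my $in, '<', $path or die "open: $!";
chomp(my $total = <$in>);
close $in;
print "total: $total\n";    # 5 children x 20 increments = 100
```

Without the flock() call, two children could read the same value and both write back the same increment, so the total would typically come out lower.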
Snorik - 29 Sep 2008 13:22 GMT
> Rather than explaining how to fix your example, here's a working program
> that shows how to use Tie::ShareLite. Each child will lock, write
> 'hello' to the key of its PID, then unlock. You should lock and unlock
> around every access to the tied hash. The parent waits for the children
> to finish and then prints the summary (also locking and unlocking, to
> protect from other processes that might be accessing that shared memory).
OK.
> The get_ipc() function is just for convenience. Key 1971 is just an
> example from the Tie::ShareLite docs, you can use any value.
I know, I know.
> Note I clear %shared every time I start.
OK, I was not aware I could just do that. I thought I needed to actually remove the contents of the shared memory segment somehow.
> It's not destroyed at the
> program's end. Setting destroy to yes in the parent doesn't work for
> me, and I didn't debug it (no time :)
I have the nagging feeling that this does not work. But I like your solution to that, it is very simple yet powerful.
Anyhow, stuff works like a charm now, thank you so much for your time, I owe you a beer :-)
Ted Zlatanov - 23 Sep 2008 14:29 GMT
S> I do the following now:
S> if ($pid)
S> {
S>     push(@children, $pid);
S> }
S> elsif ($pid == 0)
S> {
S>     use File::Find::Rule;
S>     my @fbas = File::Find::Rule->file->in("/default/main/www/$rg/$gg/
S> WORKAREA/workarea/$gg_fba");
S>     $ipc->lock(LOCK_EX);
S>     $shared{$gg} = \@fbas;
S>     $ipc->unlock();
S>     exit (0);
S> }
S> else
S> {
S>     print STDERR "\nERROR: fork failed: $!\n";
S> }
S> }
By the way, I forgot to mention: you can just put the file list in a file, and have the file name in the shared memory. That way you get locking (only the process with the shared memory lock can open or write to files), your shared memory usage is low, and you don't have to worry about serializing large amounts of data. This is what I've used in the past for database loaders. It works well.
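A simplified sketch of the file-based idea, using only core modules. It departs from the description above in one respect, stated plainly: instead of passing each file name through shared memory, the name is derived from the hash key, so no shared segment is needed at all. The keys and file lists are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use POSIX ();

my $dir = tempdir(CLEANUP => 1);

# Hypothetical per-directory results that each child would normally compute.
my %work = (dir1 => ['a', 'b', 'c'], dir2 => ['d']);
my @children;

for my $key (sort keys %work) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: write one entry per line to its own file.
        open my $out, '>', "$dir/$key.list" or die "open: $!";
        print {$out} "$_\n" for @{ $work{$key} };
        close $out or die "close: $!";
        POSIX::_exit(0);    # skip the parent's cleanup handlers
    }
    push @children, $pid;
}
waitpid($_, 0) for @children;

# Parent: read each child's file back into an array ref.
my %result;
for my $key (sort keys %work) {
    open my $in, '<', "$dir/$key.list" or die "open: $!";
    chomp(my @lines = <$in>);
    close $in;
    $result{$key} = \@lines;
}
print "$_: @{ $result{$_} }\n" for sort keys %result;
```

Because every child writes to a different file, no locking is required in this variant, and the parent gets a plain hash of array refs that it can return to the caller.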
Ted
Snorik - 23 Sep 2008 15:37 GMT
> S> I do the following now:
> [quoted text clipped - 24 lines]
> about serializing large amounts of data. This is what I've used in the
> past for database loaders. It works well.
That is a good idea and what I am going to do!