Discussion:
Multithreading support
Michael Darling
2014-04-16 02:23:33 UTC
Any thoughts of making Argyll threaded?

I'm in the middle of a many-hour colprof run. (I know the docs say -qu over
-qh doesn't do anything but run slower, but I want to see it.)

I know speeding up -qu must not be high on the priority list, but this
(presumably) would speed up the other modes as well.

CPU usage is bouncing between 12% and 13%. With 8 logical cores on my
machine, that is about one core's worth, so it looks like it isn't threaded.
I'm assuming the operation here has parallel elements that would benefit
from threading, but I could of course be wrong.
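
As a quick sanity check on that reading (a sketch, assuming the monitor reports usage as a percentage of all logical cores combined), one fully busy thread on an 8-core machine would show up as:

```python
# Share of total CPU that fully busy threads would show on a machine where
# usage is reported across all logical cores (an assumption about the monitor).
def busy_share_percent(busy_threads: int, logical_cores: int) -> float:
    return 100.0 * busy_threads / logical_cores

print(busy_share_percent(1, 8))  # prints 12.5
```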

The long stage is the "There are 2 rev cache instances with 10241 Mbytes
limit" part.

Also, the machine has 48GB of memory and it's only using 1.4GB. I'm not sure
whether using more would make a difference, or exactly what the rev cache
Mbytes limit is all about.
Vladimir Gajic
2014-04-16 09:28:24 UTC
Permalink
Hello Michael,

Check the performance tuning section of the Argyll docs. I had the same
problem some time ago. There are two parameters you can adjust to increase
performance:

ARGYLL_REV_ACC_GRID_RES_MULT and ARGYLL_REV_CACHE_MULT

Setting ARGYLL_REV_ACC_GRID_RES_MULT to a value of 2 roughly doubled the
performance for me (to approx. 200%). This parameter affects the B2A table
creation.
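
For reference, a minimal sketch of passing these two environment variables to a colprof run from a script. The variable names are the ones above; the helper names, the base name "myprofile" and any value for ARGYLL_REV_CACHE_MULT are placeholders rather than recommendations, so check the performance tuning page for sensible values.

```python
import os
import subprocess

def tuned_env(grid_res_mult=2, cache_mult=None):
    """Return a copy of the current environment with the Argyll tuning variables set."""
    env = dict(os.environ)
    env["ARGYLL_REV_ACC_GRID_RES_MULT"] = str(grid_res_mult)
    if cache_mult is not None:
        # Only set the cache multiplier when a value is explicitly given.
        env["ARGYLL_REV_CACHE_MULT"] = str(cache_mult)
    return env

def run_colprof(args, env):
    """Run colprof with the given argument list (requires Argyll on the PATH)."""
    return subprocess.run(["colprof", *args], env=env, check=True)

# Example call, left commented out because it needs Argyll and a .ti3 file;
# "myprofile" is a placeholder base name:
# run_colprof(["-qh", "myprofile"], tuned_env(grid_res_mult=2))
```

Setting the variables in the shell before launching colprof has the same effect; the script form only makes it easier to compare timings for different values.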

Regards
Vladimir
--
Vladimir Gajic

[fineART imaging]
Grünebergstr. 85
22763 Hamburg
Germany
Phone: +4940 18174180
GSM: +49176 63645405
mailto: vgajic67-gM/Ye1E23mwN+***@public.gmane.org
Graeme Gill
2014-04-21 07:50:53 UTC
Post by Michael Darling
Any thoughts of making Argyll threaded?
Hi,

Well, it's certainly possible. Lots of things in the
profiling process are highly parallel in nature.
Post by Michael Darling
It's bouncing between 12% and 13% CPU usage. 8 logical cores on my
machine, so looks like it isn't threaded. I'm assuming the operation here
has parallel elements to benefit from threading, but I could of course be
wrong.
If you are talking about the B2A lookup step, then there are two parts:
setting up the acceleration structure, and actually doing the inverse
lookups. I haven't really thought about what's involved in threading
the former. Threading the latter is complicated by the computation cache:
updates to the cache would have to be protected, and it might be a challenge
to make sure that this didn't become the bottleneck. Coordinating
threads could be interesting too. Execution times would differ quite a lot
between threads, and what do you do about grid points that depend on a cache
entry that is in the process of being computed? I.e., how do you avoid
starving threads?
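
To make the cache problem concrete, here is a generic sketch (not Argyll's code, which is C; all names here are hypothetical) of a shared cache in which each entry is computed once and any thread that asks for an entry still being computed waits for that result instead of recomputing it. The lock protects only the dictionary, so the expensive computation runs outside it.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

class SharedCache:
    """Each key is computed once; late arrivals wait on the in-flight result."""

    def __init__(self, compute):
        self._compute = compute        # hypothetical expensive function
        self._lock = threading.Lock()  # protects the dictionary only
        self._entries = {}             # key -> Future holding the result

    def get(self, key):
        with self._lock:
            fut = self._entries.get(key)
            owner = fut is None
            if owner:
                # First thread to ask for this key becomes responsible for it.
                fut = Future()
                self._entries[key] = fut
        if owner:
            # Compute outside the lock so lookups of other keys are not blocked.
            try:
                fut.set_result(self._compute(key))
            except Exception as exc:
                fut.set_exception(exc)
        return fut.result()  # waiters block here until the owner finishes

calls = []
calls_lock = threading.Lock()

def expensive_cell(cell):
    # Stand-in for the real per-cell work; records how often each cell is computed.
    with calls_lock:
        calls.append(cell)
    return cell * cell

cache = SharedCache(expensive_cell)

def lookup(grid_point):
    # Hypothetical mapping: four neighbouring grid points share one cache cell.
    return cache.get(grid_point // 4)

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lookup, range(32)))
```

This only illustrates the coordination. A thread blocked in `fut.result()` is idle, which is exactly the starvation concern raised above; a real design would probably want such threads to pick up other grid points instead. Python threads also do not run CPU-bound work in parallel, so the sketch says nothing about achievable speed-up.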

I have no plans to try and do such work though - I currently have many much
more urgent tasks.

Graeme Gill.
