Performance#

Profile Decoder#

The default Python and Cython decoder can be profiled with Python’s standard cProfile. The output can be rendered as a sorted table and as a flame graph; both are generated below:

%%bash
python -m openpifpaf.predict coco/000000081988.jpg --no-download-progress --debug --profile-decoder
INFO:__main__:neural network device: cpu (CUDA available: False, count: 0)
INFO:__main__:Running Python 3.10.13
INFO:__main__:Running PyTorch 2.2.1+cpu
DEBUG:openpifpaf.show.painters:color connections = False, lw = 6, marker = 3
DEBUG:openpifpaf.network.factory:Shell(
  (base_net): ShuffleNetV2K(
    (input_block): Sequential(
      (0): Sequential(
        (0): Conv2d(3, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(24, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
    )
    (stage2): Sequential(
      (0): InvertedResidualK(
        (branch1): Sequential(
          (0): Conv2d(24, 24, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=24, bias=False)
          (1): BatchNorm2d(24, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Conv2d(24, 174, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (3): BatchNorm2d(174, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (4): ReLU(inplace=True)
        )
        (branch2): Sequential(
          (0): Conv2d(24, 174, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(174, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(174, 174, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=174, bias=False)
          (4): BatchNorm2d(174, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(174, 174, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(174, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
      (1): InvertedResidualK(
        (branch2): Sequential(
          (0): Conv2d(174, 174, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(174, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(174, 174, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=174, bias=False)
          (4): BatchNorm2d(174, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(174, 174, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(174, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
      (2): InvertedResidualK(
        (branch2): Sequential(
          (0): Conv2d(174, 174, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(174, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(174, 174, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=174, bias=False)
          (4): BatchNorm2d(174, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(174, 174, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(174, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
      (3): InvertedResidualK(
        (branch2): Sequential(
          (0): Conv2d(174, 174, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(174, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(174, 174, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=174, bias=False)
          (4): BatchNorm2d(174, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(174, 174, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(174, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
    )
    (stage3): Sequential(
      (0): InvertedResidualK(
        (branch1): Sequential(
          (0): Conv2d(348, 348, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=348, bias=False)
          (1): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (3): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (4): ReLU(inplace=True)
        )
        (branch2): Sequential(
          (0): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(348, 348, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=348, bias=False)
          (4): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
      (1): InvertedResidualK(
        (branch2): Sequential(
          (0): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(348, 348, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=348, bias=False)
          (4): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
      (2): InvertedResidualK(
        (branch2): Sequential(
          (0): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(348, 348, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=348, bias=False)
          (4): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
      (3): InvertedResidualK(
        (branch2): Sequential(
          (0): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(348, 348, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=348, bias=False)
          (4): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
      (4): InvertedResidualK(
        (branch2): Sequential(
          (0): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(348, 348, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=348, bias=False)
          (4): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
      (5): InvertedResidualK(
        (branch2): Sequential(
          (0): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(348, 348, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=348, bias=False)
          (4): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
      (6): InvertedResidualK(
        (branch2): Sequential(
          (0): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(348, 348, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=348, bias=False)
          (4): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
      (7): InvertedResidualK(
        (branch2): Sequential(
          (0): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(348, 348, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=348, bias=False)
          (4): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(348, 348, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(348, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
    )
    (stage4): Sequential(
      (0): InvertedResidualK(
        (branch1): Sequential(
          (0): Conv2d(696, 696, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=696, bias=False)
          (1): BatchNorm2d(696, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Conv2d(696, 696, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (3): BatchNorm2d(696, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (4): ReLU(inplace=True)
        )
        (branch2): Sequential(
          (0): Conv2d(696, 696, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(696, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(696, 696, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=696, bias=False)
          (4): BatchNorm2d(696, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(696, 696, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(696, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
      (1): InvertedResidualK(
        (branch2): Sequential(
          (0): Conv2d(696, 696, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(696, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(696, 696, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=696, bias=False)
          (4): BatchNorm2d(696, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(696, 696, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(696, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
      (2): InvertedResidualK(
        (branch2): Sequential(
          (0): Conv2d(696, 696, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(696, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(696, 696, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=696, bias=False)
          (4): BatchNorm2d(696, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(696, 696, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(696, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
      (3): InvertedResidualK(
        (branch2): Sequential(
          (0): Conv2d(696, 696, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(696, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Conv2d(696, 696, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=696, bias=False)
          (4): BatchNorm2d(696, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (5): Conv2d(696, 696, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (6): BatchNorm2d(696, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (7): ReLU(inplace=True)
        )
      )
    )
    (conv5): Sequential(
      (0): Conv2d(1392, 1392, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(1392, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (head_nets): ModuleList(
    (0): CompositeField4(
      (dropout): Dropout2d(p=0.0, inplace=False)
      (conv): Conv2d(1392, 340, kernel_size=(1, 1), stride=(1, 1))
      (upsample_op): PixelShuffle(upscale_factor=2)
    )
    (1): CompositeField4(
      (dropout): Dropout2d(p=0.0, inplace=False)
      (conv): Conv2d(1392, 608, kernel_size=(1, 1), stride=(1, 1))
      (upsample_op): PixelShuffle(upscale_factor=2)
    )
  )
)
DEBUG:openpifpaf.decoder.factory:head names = ['cif', 'caf']
DEBUG:openpifpaf.signal:subscribe to eval_reset
DEBUG:openpifpaf.decoder.pose_similarity:valid keypoints = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
DEBUG:openpifpaf.visualizer.base:cif: indices = []
DEBUG:openpifpaf.show.painters:color connections = True, lw = 2, marker = 6
DEBUG:openpifpaf.show.painters:color connections = False, lw = 6, marker = 3
DEBUG:openpifpaf.visualizer.base:cif: indices = []
DEBUG:openpifpaf.visualizer.base:caf: indices = []
DEBUG:openpifpaf.show.painters:color connections = True, lw = 2, marker = 6
DEBUG:openpifpaf.show.painters:color connections = False, lw = 6, marker = 3
DEBUG:openpifpaf.visualizer.base:cif: indices = []
DEBUG:openpifpaf.show.painters:color connections = True, lw = 2, marker = 6
DEBUG:openpifpaf.show.painters:color connections = False, lw = 6, marker = 3
DEBUG:openpifpaf.visualizer.base:cif: indices = []
DEBUG:openpifpaf.visualizer.base:caf: indices = []
DEBUG:openpifpaf.show.painters:color connections = True, lw = 2, marker = 6
DEBUG:openpifpaf.show.painters:color connections = False, lw = 6, marker = 3
DEBUG:openpifpaf.decoder.factory:created 2 decoders
INFO:openpifpaf.decoder.factory:No specific decoder requested. Using the first one from:
  --decoder=cifcaf:0
  --decoder=posesimilarity:0
Use any of the above arguments to select one or multiple decoders and to suppress this message.
INFO:openpifpaf.predictor:neural network device: cpu (CUDA available: False, count: 0)
DEBUG:openpifpaf.transforms.pad:valid area before pad: [  0.   0. 639. 426.], image size = (640, 427)
DEBUG:openpifpaf.transforms.pad:pad with (0, 3, 1, 3)
DEBUG:openpifpaf.transforms.pad:valid area after pad: [  0.   3. 639. 426.], image size = (641, 433)
DEBUG:openpifpaf.decoder.decoder:nn processing time: 450.4ms
DEBUG:openpifpaf.decoder.decoder:parallel execution with worker <openpifpaf.decoder.decoder.DummyPool object at 0x7100ce5c19c0>
DEBUG:openpifpaf.decoder.multi:task 0
DEBUG:openpifpaf.decoder.cifcaf:cpp annotations = 5 (9.3ms)
INFO:openpifpaf.decoder.cifcaf:annotations 5: [16, 14, 13, 12, 12]
INFO:openpifpaf.profiler:writing profile file profile_decoder.prof
         366 function calls in 0.010 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.010    0.010    0.010    0.010 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/openpifpaf/decoder/cifcaf.py:224(__call__)
        5    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/openpifpaf/annotation.py:17(__init__)
       15    0.000    0.000    0.000    0.000 {method 'numpy' of 'torch._C.TensorBase' objects}
       10    0.000    0.000    0.000    0.000 {built-in method numpy.asarray}
       10    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/openpifpaf/visualizer/cif.py:47(predicted)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:283(__init__)
        2    0.000    0.000    0.000    0.000 {method 'flush' of '_io.TextIOWrapper' objects}
        2    0.000    0.000    0.000    0.000 {method 'unbind' of 'torch._C.TensorBase' objects}
       10    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/numpy/core/fromnumeric.py:71(_wrapreduction)
        1    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/openpifpaf/decoder/cifcaf.py:276(<listcomp>)
       15    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/torch/_tensor.py:1058(__array__)
       10    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/numpy/core/fromnumeric.py:2177(sum)
        1    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/openpifpaf/visualizer/caf.py:47(predicted)
        5    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/openpifpaf/visualizer/base.py:114(indices)
        1    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/torch/_tensor.py:996(__len__)
       10    0.000    0.000    0.000    0.000 {built-in method numpy.zeros}
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:1724(isEnabledFor)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:1549(findCaller)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:359(getMessage)
        5    0.000    0.000    0.000    0.000 {method 'tolist' of 'numpy.ndarray' objects}
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/posixpath.py:52(normcase)
       24    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:1680(callHandlers)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/torch/_tensor.py:1012(__iter__)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:1088(emit)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:1600(_log)
       19    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
       10    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/numpy/core/fromnumeric.py:72(<dictcomp>)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/posixpath.py:140(basename)
        1    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:1455(debug)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:1077(flush)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:955(handle)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:1585(makeRecord)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:431(_format)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/posixpath.py:117(splitext)
       15    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:219(_acquireLock)
        3    0.000    0.000    0.000    0.000 {built-in method torch._C._get_tracing_state}
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:1626(handle)
       16    0.000    0.000    0.000    0.000 {built-in method torch._C._has_torch_function_unary}
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:665(format)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/genericpath.py:121(_splitext)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:160(<lambda>)
        4    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:919(release)
        1    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/openpifpaf/visualizer/cifhr.py:17(predicted)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:119(getLevelName)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:1710(getEffectiveLevel)
        6    0.000    0.000    0.000    0.000 {method 'rfind' of 'str' objects}
        6    0.000    0.000    0.000    0.000 {method 'acquire' of '_thread.RLock' objects}
        1    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:1467(info)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:423(usesTime)
        4    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:806(filter)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:643(usesTime)
        4    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:912(acquire)
        1    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/openpifpaf/visualizer/cif.py:54(_confidences)
       10    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/numpy/core/fromnumeric.py:2172(_sum_dispatcher)
        6    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:932(format)
        2    0.000    0.000    0.000    0.000 {built-in method posix.getpid}
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:228(_releaseLock)
        3    0.000    0.000    0.000    0.000 {method 'dim' of 'torch._C.TensorBase' objects}
       10    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/threading.py:1430(current_thread)
        5    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/openpifpaf/visualizer/base.py:118(<listcomp>)
        1    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/openpifpaf/visualizer/cif.py:63(_regressions)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/threading.py:1129(name)
        1    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/openpifpaf/visualizer/caf.py:54(_confidences)
        2    0.000    0.000    0.000    0.000 {built-in method time.perf_counter}
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:438(format)
        6    0.000    0.000    0.000    0.000 {method 'release' of '_thread.RLock' objects}
        4    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/openpifpaf/headmeta.py:21(stride)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/posixpath.py:41(_get_sep)
        2    0.000    0.000    0.000    0.000 {built-in method sys._getframe}
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:649(formatMessage)
        2    0.000    0.000    0.000    0.000 {method 'find' of 'str' objects}
        5    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/openpifpaf/visualizer/caf.py:65(_regressions)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/multiprocessing/process.py:37(current_process)
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/logging/__init__.py:1307(disable)
        2    0.000    0.000    0.000    0.000 {method 'write' of '_io.TextIOWrapper' objects}
        4    0.000    0.000    0.000    0.000 {built-in method _thread.get_ident}
        6    0.000    0.000    0.000    0.000 {built-in method posix.fspath}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.iter}
        2    0.000    0.000    0.000    0.000 /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/multiprocessing/process.py:189(name)
        2    0.000    0.000    0.000    0.000 {built-in method time.time}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



DEBUG:openpifpaf.decoder.decoder:time: nn = 450.7ms, dec = 11.9ms
INFO:openpifpaf.predictor:batch 0: coco/000000081988.jpg
/home/runner/work/openpifpaf/openpifpaf/src/openpifpaf/csrc/src/cif_hr.cpp:102: UserInfo: resizing cifhr buffer
/home/runner/work/openpifpaf/openpifpaf/src/openpifpaf/csrc/src/occupancy.cpp:53: UserInfo: resizing occupancy buffer
!flameprof profile_decoder.prof > profile_decoder_flame.svg

Figure: flame graph rendered from profile_decoder.prof (profile_decoder_flame.svg)
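
The same .prof file can also be inspected programmatically with Python’s standard pstats module, for example to re-sort the table by cumulative time. This is only a minimal sketch; the sort key and the number of printed entries are arbitrary choices:

import pstats

# Load the profile written by --profile-decoder and print the
# 15 most expensive entries sorted by cumulative time.
stats = pstats.Stats("profile_decoder.prof")
stats.strip_dirs().sort_stats("cumulative").print_stats(15)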

A second output is generated by the Autograd Profiler. It can only be viewed in the Chrome browser:

  • open chrome://tracing

  • click “Load” in the top left corner

  • select decoder_profile.1.json

This is the same type of plot that is used to trace the training of a batch. An example of such a plot is shown below.
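For reference, a Chrome-trace JSON of this kind can also be produced by hand with PyTorch’s Autograd Profiler. The sketch below uses a small stand-in model and output file name purely for illustration; it is not openpifpaf’s own profiling code path:

import torch

# Toy stand-in model; --profile-decoder writes the equivalent trace
# for the real network and decoder automatically.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
x = torch.randn(1, 3, 64, 64)

with torch.autograd.profiler.profile() as prof:
    model(x)

# The resulting JSON can be loaded via chrome://tracing.
prof.export_chrome_trace("trace_example.json")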

Profile Training#

For a training batch, the Chrome trace looks like this:

Figure: Chrome trace of a training batch (train_trace)
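
A trace of this shape can be reproduced for any training step by wrapping the forward pass, backward pass and optimizer step in the Autograd Profiler. This is a generic sketch with a stand-in model, loss and file name, not the exact code used to generate the figure above:

import torch

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(2, 3, 64, 64)

with torch.autograd.profiler.profile() as prof:
    out = model(x)
    loss = out.mean()  # stand-in loss for illustration
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Open the exported JSON in chrome://tracing to inspect the training step.
prof.export_chrome_trace("train_trace_example.json")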