
feat(medcat): CU-869cgny1k Add pipe tokenizer speed #370

Open
mart-r wants to merge 7 commits into main from feat/medcat/CU-869cgny1k-add-pipe-tokenizer-speed


@mart-r mart-r commented Mar 18, 2026

This PR extends the existing pipeline speed monitoring utilities to also track the tokenizer component, which was previously excluded from timing and profiling functionality.

The PR also includes some protocol and logging improvements, along with tests for the new behaviour.
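To illustrate the general idea of timing the tokenizer alongside the other pipeline stages, here is a minimal sketch. It is not MedCAT's implementation: `ToyPipeline`, `timed_pipeline` and `_timed` are hypothetical names invented for this example, and the assumption is simply that the tokenizer and each component are callables that can be wrapped temporarily.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: a tokenizer plus later stages."""

    def __init__(self) -> None:
        self.tokenizer: Callable = str.split
        self.components: list[Callable] = [
            lambda tokens: [t.lower() for t in tokens],
        ]

    def __call__(self, text: str) -> list[str]:
        doc = self.tokenizer(text)
        for component in self.components:
            doc = component(doc)
        return doc


def _timed(name: str, func: Callable,
           timings: dict[str, list[float]]) -> Callable:
    """Wrap func so each call appends its duration (in ms) to timings[name]."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            timings.setdefault(name, []).append(elapsed_ms)
    return wrapper


@contextmanager
def timed_pipeline(pipe: ToyPipeline) -> Iterator[dict[str, list[float]]]:
    """Time the tokenizer and every component while the block is active."""
    timings: dict[str, list[float]] = {}
    original_tokenizer = pipe.tokenizer
    original_components = list(pipe.components)
    # The tokenizer is wrapped in the same way as the other stages.
    pipe.tokenizer = _timed("tokenizer", original_tokenizer, timings)
    pipe.components = [
        _timed(f"component_{i}", comp, timings)
        for i, comp in enumerate(original_components)
    ]
    try:
        yield timings
    finally:
        # Always restore the unwrapped callables, even if the block raised.
        pipe.tokenizer = original_tokenizer
        pipe.components = original_components
```

Used as `with timed_pipeline(pipe) as timings: pipe("some text")`, the `timings` dict ends up with one list of millisecond durations per stage, including the tokenizer.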

Example code / usage
from medcat.components.types import CoreComponentType
from medcat.cat import CAT
from medcat.pipeline.speed_utils import (
    pipeline_per_doc_timer,
    pipeline_timer_averaging_docs,
    profile_pipeline_component,
)

path = ".temp/20230227__kch_gstt_trained_model_f76d2121b77c3e9a.zip"

print("Loading")
cat = CAT.load_model_pack(path)
print("RUN!")

texts = [
  """
  Description: Intracerebral hemorrhage (very acute clinical changes occurred immediately).
CC: Left hand numbness on presentation; then developed lethargy later that day.

HX: On the day of presentation, this 72 y/o RHM suddenly developed generalized weakness and lightheadedness, and could not rise from a chair. Four hours later he experienced sudden left hand numbness lasting two hours. There were no other associated symptoms except for the generalized weakness and lightheadedness. He denied vertigo.

He had been experiencing falling spells without associated LOC up to several times a month for the past year.

MEDS: procardia SR, Lasix, Ecotrin, KCL, Digoxin, Colace, Coumadin.

PMH: 1)8/92 evaluation for presyncope (Echocardiogram showed: AV fibrosis/calcification, AV stenosis/insufficiency, MV stenosis with annular calcification and regurgitation, moderate TR, Decreased LV systolic function, severe LAE. MRI brain: focal areas of increased T2 signal in the left cerebellum and in the brainstem probably representing microvascular ischemic disease. IVG (MUGA scan)revealed: global hypokinesis of the LV and biventricular dysfunction, RV ejection Fx 45% and LV ejection Fx 39%. He was subsequently placed on coumadin severe valvular heart disease), 2)HTN, 3)Rheumatic fever and heart disease, 4)COPD, 5)ETOH abuse, 6)colonic polyps, 7)CAD, 8)CHF, 9)Appendectomy, 10)Junctional tachycardia.
  """,
  """
  Patient Information:

Name: John Parkinson
Date of Birth: February 12, 1958
Gender: Male
Address: 789 Wellness Lane, Healthville, HV 56789
Phone: (555) 555-1234
Email: john.parkinson@email.com
Emergency Contact:

Name: Mary Parkinson
Relationship: Spouse
Phone: (555) 555-5678
Insurance Information:

Insurance Provider: HealthWell Assurance
Policy Number: HW765432109
Group Number: G876543
Medical History:

Allergies:

None reported
Medications:

Levodopa/Carbidopa for Parkinson's disease symptoms
Pramipexole for restless legs syndrome
Lisinopril for hypertension
Atorvastatin for hyperlipidemia
Metformin for Type 2 Diabetes
Medical Conditions:

Parkinson's Disease (diagnosed on June 20, 2015)
Hypertension
Hyperlipidemia
Type 2 Diabetes
Osteoarthritis
Vital Signs:

Blood Pressure: 130/80 mmHg
Heart Rate: 72 bpm
Temperature: 98.4°F
Respiratory Rate: 18 breaths per minute
Recent Inpatient Stay (Dates: September 1-10, 2023):

Reason for Admission: Acute exacerbation of Parkinson's symptoms, pneumonia, and uncontrolled diabetes.

Interventions:

Neurology Consultation for Parkinson's disease management adjustments.
Antibiotic therapy for pneumonia.
Continuous glucose monitoring and insulin therapy for diabetes control.
Physical therapy sessions to maintain mobility.
Complications:

Delirium managed with close monitoring and appropriate interventions.
Discharge Plan:

Medication adjustments for Parkinson's disease.
Follow-up appointments with neurologist, endocrinologist, and primary care.
Home health care for continued physical therapy.
Follow-up Visits:

Date: October 15, 2023

Reason for Visit: Post-discharge Follow-up
Notes: Stable Parkinson's symptoms, pneumonia resolved. Adjusted diabetes medications for better control.
Date: December 5, 2023

Reason for Visit: Neurology Follow-up
Notes: Fine-tuned Parkinson's medication regimen. Recommended ongoing physical therapy.
  """
]*5

print("\n***PER DOC\n")
with pipeline_per_doc_timer(cat.pipe):
    for text in texts:
        cat.get_entities(text)

print("\n***Every 0.3s\n")
with pipeline_timer_averaging_docs(cat.pipe, show_frequency_secs=0.3):
    for text in texts:
        cat.get_entities(text)

print("\n***Every 4 docs\n")
with pipeline_timer_averaging_docs(cat.pipe, show_frequency_docs=4):
    for text in texts:
        cat.get_entities(text)

print("\n***Defaults to 100 docs\n")
with pipeline_timer_averaging_docs(cat.pipe):
    for text in texts:
        cat.get_entities(text)

print("\n***Look at specific component\n")
with profile_pipeline_component(cat.pipe, CoreComponentType.ner):
    for text in texts:
        cat.get_entities(text)

print("\n***Look at tokenizer\n")
with profile_pipeline_component(cat.pipe, 'tokenizer'):
    for text in texts:
        cat.get_entities(text)
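The averaging variants in the script above report a summary over several documents rather than one line per document. The following sketch shows one way such a summary could be computed with the standard library. `summarize_ms` and `AveragingTimer` are hypothetical helpers written for this illustration, not part of `medcat.pipeline.speed_utils`; the default of 100 documents mirrors the default mentioned in the script, and flushing leftover timings at the end is an assumption.

```python
import statistics


def summarize_ms(name: str, durations_ms: list[float]) -> str:
    """Format min/mean/median/max for one component's per-document timings."""
    if not durations_ms:
        raise ValueError("need at least one timing")
    return (
        f"Component {name} took (min/mean/median/max): "
        f"{min(durations_ms):.3f}ms / {statistics.mean(durations_ms):.3f}ms / "
        f"{statistics.median(durations_ms):.3f}ms / {max(durations_ms):.3f}ms "
        f"over {len(durations_ms)} docs"
    )


class AveragingTimer:
    """Hypothetical collector: reports every `every_docs` timings and on flush."""

    def __init__(self, name: str, every_docs: int = 100) -> None:
        self.name = name
        self.every_docs = every_docs
        self._pending: list[float] = []
        self.reports: list[str] = []

    def add(self, duration_ms: float) -> None:
        """Record one document's timing; report once the batch is full."""
        self._pending.append(duration_ms)
        if len(self._pending) >= self.every_docs:
            self.flush()

    def flush(self) -> None:
        """Emit a summary for any pending timings and reset the buffer."""
        if self._pending:
            self.reports.append(summarize_ms(self.name, self._pending))
            self._pending = []
```

With `every_docs=4` and ten documents, this produces summaries over 4, 4 and 2 documents once `flush()` is called at the end. Note that with only two samples the mean and the median are always equal.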
Example output
% python .temp/2026_03_16_speed/run_test.py                        
Loading
/Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/util.py:922: UserWarning: [W095] Model 'en_core_web_md' (3.1.0) was trained with spaCy v3.1.0 and may not be 100% compatible with the current version (3.8.7). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
warnings.warn(warn_msg)
RUN!

***PER DOC

Component Tokenizer:SpacyTokenizer took 38.346ms
Component tagging:tag-and-skip-tagger took 0.763ms
Component token_normalizing:token_normalizer took 7.950ms
Component ner:cat_ner took 2053.910ms
Component linking:medcat2_linker took 29.364ms
Component meta_cat.Presence took 96.318ms
Component meta_cat.Subject took 69.876ms
Component meta_cat.Time took 69.034ms
Component Tokenizer:SpacyTokenizer took 39.176ms
Component tagging:tag-and-skip-tagger took 1.098ms
Component token_normalizing:token_normalizer took 5.057ms
Component ner:cat_ner took 5.233ms
Component linking:medcat2_linker took 38.468ms
Component meta_cat.Presence took 95.118ms
Component meta_cat.Subject took 90.860ms
Component meta_cat.Time took 91.139ms
Component Tokenizer:SpacyTokenizer took 22.520ms
Component tagging:tag-and-skip-tagger took 0.666ms
Component token_normalizing:token_normalizer took 6.469ms
Component ner:cat_ner took 3.243ms
Component linking:medcat2_linker took 28.185ms
Component meta_cat.Presence took 69.021ms
Component meta_cat.Subject took 68.427ms
Component meta_cat.Time took 67.621ms
Component Tokenizer:SpacyTokenizer took 31.301ms
Component tagging:tag-and-skip-tagger took 1.040ms
Component token_normalizing:token_normalizer took 4.983ms
Component ner:cat_ner took 4.001ms
Component linking:medcat2_linker took 34.966ms
Component meta_cat.Presence took 90.066ms
Component meta_cat.Subject took 90.146ms
Component meta_cat.Time took 91.601ms
Component Tokenizer:SpacyTokenizer took 22.944ms
Component tagging:tag-and-skip-tagger took 0.694ms
Component token_normalizing:token_normalizer took 5.912ms
Component ner:cat_ner took 2.979ms
Component linking:medcat2_linker took 27.106ms
Component meta_cat.Presence took 67.966ms
Component meta_cat.Subject took 67.439ms
Component meta_cat.Time took 68.057ms
Component Tokenizer:SpacyTokenizer took 31.161ms
Component tagging:tag-and-skip-tagger took 1.033ms
Component token_normalizing:token_normalizer took 5.185ms
Component ner:cat_ner took 4.120ms
Component linking:medcat2_linker took 36.009ms
Component meta_cat.Presence took 89.181ms
Component meta_cat.Subject took 90.559ms
Component meta_cat.Time took 89.316ms
Component Tokenizer:SpacyTokenizer took 22.298ms
Component tagging:tag-and-skip-tagger took 0.667ms
Component token_normalizing:token_normalizer took 5.692ms
Component ner:cat_ner took 2.917ms
Component linking:medcat2_linker took 27.720ms
Component meta_cat.Presence took 68.125ms
Component meta_cat.Subject took 69.701ms
Component meta_cat.Time took 68.739ms
Component Tokenizer:SpacyTokenizer took 32.212ms
Component tagging:tag-and-skip-tagger took 1.061ms
Component token_normalizing:token_normalizer took 5.165ms
Component ner:cat_ner took 4.292ms
Component linking:medcat2_linker took 36.161ms
Component meta_cat.Presence took 89.833ms
Component meta_cat.Subject took 90.303ms
Component meta_cat.Time took 89.141ms
Component Tokenizer:SpacyTokenizer took 22.373ms
Component tagging:tag-and-skip-tagger took 0.672ms
Component token_normalizing:token_normalizer took 5.706ms
Component ner:cat_ner took 2.862ms
Component linking:medcat2_linker took 27.491ms
Component meta_cat.Presence took 68.918ms
Component meta_cat.Subject took 69.006ms
Component meta_cat.Time took 68.867ms
Component Tokenizer:SpacyTokenizer took 30.985ms
Component tagging:tag-and-skip-tagger took 1.033ms
Component token_normalizing:token_normalizer took 5.322ms
Component ner:cat_ner took 4.044ms
Component linking:medcat2_linker took 35.904ms
Component meta_cat.Presence took 91.034ms
Component meta_cat.Subject took 89.671ms
Component meta_cat.Time took 91.067ms

***Every 0.3s

Component Tokenizer:SpacyTokenizer took (min/mean/median/max): 23.163ms / 27.156ms / 27.156ms / 31.150ms over 2 docs and a total of 0.305s
Component tagging:tag-and-skip-tagger took (min/mean/median/max): 0.690ms / 0.863ms / 0.863ms / 1.037ms over 2 docs and a total of 0.307s
Component token_normalizing:token_normalizer took (min/mean/median/max): 4.632ms / 5.302ms / 5.302ms / 5.973ms over 2 docs and a total of 0.311s
Component ner:cat_ner took (min/mean/median/max): 3.231ms / 3.610ms / 3.610ms / 3.989ms over 2 docs and a total of 0.315s
Component linking:medcat2_linker took (min/mean/median/max): 27.793ms / 31.211ms / 31.211ms / 34.629ms over 2 docs and a total of 0.350s
Component meta_cat.Presence took (min/mean/median/max): 71.409ms / 80.950ms / 80.950ms / 90.491ms over 2 docs and a total of 0.440s
Component meta_cat.Subject took (min/mean/median/max): 70.504ms / 80.282ms / 80.282ms / 90.060ms over 2 docs and a total of 0.531s
Component meta_cat.Time took (min/mean/median/max): 69.545ms / 79.794ms / 79.794ms / 90.043ms over 2 docs and a total of 0.621s
Component Tokenizer:SpacyTokenizer took (min/mean/median/max): 22.832ms / 22.832ms / 22.832ms / 22.832ms over 1 docs and a total of 0.341s
Component tagging:tag-and-skip-tagger took (min/mean/median/max): 0.700ms / 0.700ms / 0.700ms / 0.700ms over 1 docs and a total of 0.340s
Component token_normalizing:token_normalizer took (min/mean/median/max): 5.967ms / 5.967ms / 5.967ms / 5.967ms over 1 docs and a total of 0.342s
Component ner:cat_ner took (min/mean/median/max): 2.991ms / 2.991ms / 2.991ms / 2.991ms over 1 docs and a total of 0.341s
Component linking:medcat2_linker took (min/mean/median/max): 27.430ms / 27.430ms / 27.430ms / 27.430ms over 1 docs and a total of 0.333s
Component meta_cat.Presence took (min/mean/median/max): 68.034ms / 68.034ms / 68.034ms / 68.034ms over 1 docs and a total of 0.311s
Component meta_cat.Presence took (min/mean/median/max): 89.948ms / 89.948ms / 89.948ms / 89.948ms over 1 docs and a total of 0.302s
Component meta_cat.Subject took (min/mean/median/max): 67.649ms / 78.594ms / 78.594ms / 89.538ms over 2 docs and a total of 0.613s
Component meta_cat.Time took (min/mean/median/max): 67.732ms / 79.448ms / 79.448ms / 91.164ms over 2 docs and a total of 0.614s
Component Tokenizer:SpacyTokenizer took (min/mean/median/max): 22.689ms / 26.897ms / 26.897ms / 31.106ms over 2 docs and a total of 0.614s
Component tagging:tag-and-skip-tagger took (min/mean/median/max): 0.673ms / 0.857ms / 0.857ms / 1.041ms over 2 docs and a total of 0.614s
Component token_normalizing:token_normalizer took (min/mean/median/max): 4.512ms / 5.154ms / 5.154ms / 5.797ms over 2 docs and a total of 0.614s
Component ner:cat_ner took (min/mean/median/max): 2.913ms / 3.466ms / 3.466ms / 4.019ms over 2 docs and a total of 0.614s
Component linking:medcat2_linker took (min/mean/median/max): 26.648ms / 30.610ms / 30.610ms / 34.573ms over 2 docs and a total of 0.613s
Component meta_cat.Presence took (min/mean/median/max): 67.642ms / 67.642ms / 67.642ms / 67.642ms over 1 docs and a total of 0.310s
Component meta_cat.Presence took (min/mean/median/max): 91.833ms / 91.833ms / 91.833ms / 91.833ms over 1 docs and a total of 0.309s
Component meta_cat.Subject took (min/mean/median/max): 67.274ms / 78.864ms / 78.864ms / 90.453ms over 2 docs and a total of 0.620s
Component meta_cat.Time took (min/mean/median/max): 69.214ms / 79.525ms / 79.525ms / 89.836ms over 2 docs and a total of 0.618s
Component Tokenizer:SpacyTokenizer took (min/mean/median/max): 22.538ms / 27.274ms / 27.274ms / 32.011ms over 2 docs and a total of 0.618s
Component tagging:tag-and-skip-tagger took (min/mean/median/max): 0.665ms / 0.852ms / 0.852ms / 1.039ms over 2 docs and a total of 0.618s
Component token_normalizing:token_normalizer took (min/mean/median/max): 4.905ms / 5.356ms / 5.356ms / 5.807ms over 2 docs and a total of 0.618s
Component ner:cat_ner took (min/mean/median/max): 2.895ms / 3.629ms / 3.629ms / 4.363ms over 2 docs and a total of 0.618s
Component linking:medcat2_linker took (min/mean/median/max): 27.117ms / 31.800ms / 31.800ms / 36.483ms over 2 docs and a total of 0.619s
Component meta_cat.Presence took (min/mean/median/max): 71.199ms / 71.199ms / 71.199ms / 71.199ms over 1 docs and a total of 0.313s
Component meta_cat.Presence took (min/mean/median/max): 90.035ms / 90.035ms / 90.035ms / 90.035ms over 1 docs and a total of 0.306s
Component meta_cat.Subject took (min/mean/median/max): 69.607ms / 80.140ms / 80.140ms / 90.673ms over 2 docs and a total of 0.620s
Component meta_cat.Time took (min/mean/median/max): 68.386ms / 78.681ms / 78.681ms / 88.976ms over 2 docs and a total of 0.619s
Component Tokenizer:SpacyTokenizer took (min/mean/median/max): 22.893ms / 27.188ms / 27.188ms / 31.483ms over 2 docs and a total of 0.619s
Component tagging:tag-and-skip-tagger took (min/mean/median/max): 0.662ms / 0.855ms / 0.855ms / 1.048ms over 2 docs and a total of 0.619s
Component token_normalizing:token_normalizer took (min/mean/median/max): 4.742ms / 5.422ms / 5.422ms / 6.102ms over 2 docs and a total of 0.619s
Component ner:cat_ner took (min/mean/median/max): 2.907ms / 3.465ms / 3.465ms / 4.022ms over 2 docs and a total of 0.619s
Component linking:medcat2_linker took (min/mean/median/max): 27.179ms / 30.992ms / 30.992ms / 34.805ms over 2 docs and a total of 0.619s
Component meta_cat.Presence took (min/mean/median/max): 68.956ms / 68.956ms / 68.956ms / 68.956ms over 1 docs and a total of 0.311s
Component meta_cat.Presence took (min/mean/median/max): 91.592ms / 91.592ms / 91.592ms / 91.592ms over 1 docs and a total of 0.307s
Component meta_cat.Subject took (min/mean/median/max): 68.729ms / 80.102ms / 80.102ms / 91.476ms over 2 docs and a total of 0.619s
Component meta_cat.Time took (min/mean/median/max): 68.207ms / 79.131ms / 79.131ms / 90.054ms over 2 docs and a total of 0.620s
Component Tokenizer:SpacyTokenizer took (min/mean/median/max): 31.678ms / 31.678ms / 31.678ms / 31.678ms over 1 docs and a total of 0.597s
Component tagging:tag-and-skip-tagger took (min/mean/median/max): 1.106ms / 1.106ms / 1.106ms / 1.106ms over 1 docs and a total of 0.596s
Component token_normalizing:token_normalizer took (min/mean/median/max): 4.803ms / 4.803ms / 4.803ms / 4.803ms over 1 docs and a total of 0.590s
Component ner:cat_ner took (min/mean/median/max): 4.065ms / 4.065ms / 4.065ms / 4.065ms over 1 docs and a total of 0.587s
Component linking:medcat2_linker took (min/mean/median/max): 34.672ms / 34.672ms / 34.672ms / 34.672ms over 1 docs and a total of 0.560s

***Every 4 docs

Component Tokenizer:SpacyTokenizer took (min/mean/median/max): 22.309ms / 26.872ms / 26.742ms / 31.696ms over 4 docs and a total of 0.912s
Component tagging:tag-and-skip-tagger took (min/mean/median/max): 0.658ms / 0.862ms / 0.866ms / 1.058ms over 4 docs and a total of 0.913s
Component token_normalizing:token_normalizer took (min/mean/median/max): 4.507ms / 5.123ms / 5.149ms / 5.686ms over 4 docs and a total of 0.918s
Component ner:cat_ner took (min/mean/median/max): 2.894ms / 3.564ms / 3.491ms / 4.383ms over 4 docs and a total of 0.922s
Component linking:medcat2_linker took (min/mean/median/max): 26.542ms / 30.899ms / 30.730ms / 35.594ms over 4 docs and a total of 0.957s
Component meta_cat.Presence took (min/mean/median/max): 67.741ms / 79.015ms / 79.231ms / 89.857ms over 4 docs and a total of 1.047s
Component meta_cat.Subject took (min/mean/median/max): 67.417ms / 79.139ms / 79.424ms / 90.292ms over 4 docs and a total of 1.137s
Component meta_cat.Time took (min/mean/median/max): 67.980ms / 79.434ms / 80.096ms / 89.565ms over 4 docs and a total of 1.226s
Component Tokenizer:SpacyTokenizer took (min/mean/median/max): 22.288ms / 26.771ms / 26.814ms / 31.169ms over 4 docs and a total of 1.225s
Component tagging:tag-and-skip-tagger took (min/mean/median/max): 0.654ms / 0.859ms / 0.862ms / 1.059ms over 4 docs and a total of 1.225s
Component token_normalizing:token_normalizer took (min/mean/median/max): 4.477ms / 5.139ms / 5.113ms / 5.854ms over 4 docs and a total of 1.225s
Component ner:cat_ner took (min/mean/median/max): 2.913ms / 3.478ms / 3.498ms / 4.000ms over 4 docs and a total of 1.225s
Component linking:medcat2_linker took (min/mean/median/max): 26.557ms / 30.536ms / 30.527ms / 34.532ms over 4 docs and a total of 1.225s
Component meta_cat.Presence took (min/mean/median/max): 67.579ms / 79.429ms / 79.113ms / 91.913ms over 4 docs and a total of 1.225s
Component meta_cat.Subject took (min/mean/median/max): 67.621ms / 78.774ms / 78.865ms / 89.745ms over 4 docs and a total of 1.225s
Component meta_cat.Time took (min/mean/median/max): 67.162ms / 79.470ms / 79.928ms / 90.861ms over 4 docs and a total of 1.226s
Component Tokenizer:SpacyTokenizer took (min/mean/median/max): 22.412ms / 26.769ms / 26.769ms / 31.125ms over 2 docs and a total of 0.928s
Component tagging:tag-and-skip-tagger took (min/mean/median/max): 0.662ms / 0.858ms / 0.858ms / 1.054ms over 2 docs and a total of 0.927s
Component token_normalizing:token_normalizer took (min/mean/median/max): 4.485ms / 5.066ms / 5.066ms / 5.646ms over 2 docs and a total of 0.922s
Component ner:cat_ner took (min/mean/median/max): 2.952ms / 3.471ms / 3.471ms / 3.989ms over 2 docs and a total of 0.918s
Component linking:medcat2_linker took (min/mean/median/max): 26.663ms / 30.481ms / 30.481ms / 34.298ms over 2 docs and a total of 0.884s
Component meta_cat.Presence took (min/mean/median/max): 68.214ms / 78.489ms / 78.489ms / 88.763ms over 2 docs and a total of 0.794s
Component meta_cat.Subject took (min/mean/median/max): 69.157ms / 79.076ms / 79.076ms / 88.994ms over 2 docs and a total of 0.704s
Component meta_cat.Time took (min/mean/median/max): 67.862ms / 78.981ms / 78.981ms / 90.100ms over 2 docs and a total of 0.613s

***Defaults to 100 docs

Component Tokenizer:SpacyTokenizer took (min/mean/median/max): 22.270ms / 27.276ms / 27.293ms / 32.783ms over 10 docs and a total of 3.066s
Component tagging:tag-and-skip-tagger took (min/mean/median/max): 0.661ms / 0.885ms / 0.883ms / 1.207ms over 10 docs and a total of 3.066s
Component token_normalizing:token_normalizer took (min/mean/median/max): 4.618ms / 5.241ms / 5.285ms / 5.891ms over 10 docs and a total of 3.066s
Component ner:cat_ner took (min/mean/median/max): 2.860ms / 3.550ms / 3.538ms / 4.320ms over 10 docs and a total of 3.066s
Component linking:medcat2_linker took (min/mean/median/max): 26.338ms / 30.630ms / 30.490ms / 35.826ms over 10 docs and a total of 3.066s
Component meta_cat.Presence took (min/mean/median/max): 67.224ms / 78.771ms / 78.480ms / 90.658ms over 10 docs and a total of 3.066s
Component meta_cat.Subject took (min/mean/median/max): 66.939ms / 78.822ms / 79.799ms / 89.304ms over 10 docs and a total of 3.066s
Component meta_cat.Time took (min/mean/median/max): 68.715ms / 79.368ms / 79.550ms / 89.931ms over 10 docs and a total of 3.066s

***Look at specific component

Component ner:cat_ner profile (by tottime):
       155521 function calls in 0.068 seconds

 Ordered by: internal time
 List reduced from 39 to 20 due to restriction <20>

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     10    0.012    0.001    0.067    0.007 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/components/ner/vocab_based_ner.py:30(predict_entities)
   9195    0.010    0.000    0.010    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/tokens/underscore.py:23(__init__)
   5635    0.005    0.000    0.015    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokens.py:44(norm)
   9195    0.005    0.000    0.007    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/tokens/underscore.py:46(__getattr__)
   8265    0.005    0.000    0.006    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/cdb/cdb.py:51(has_subname)
   1585    0.004    0.000    0.010    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/components/ner/vocab_based_annotator.py:19(annotate_name)
   1585    0.003    0.000    0.004    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokenizers.py:72(create_entity)
  16405    0.003    0.000    0.004    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokens.py:52(base)
   3560    0.003    0.000    0.010    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokens.py:36(to_skip)
  25330    0.002    0.000    0.002    0.000 {built-in method builtins.len}
   3570    0.002    0.000    0.002    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokens.py:260(__iter__)
   5635    0.002    0.000    0.019    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokens.py:60(text_versions)
   5635    0.002    0.000    0.002    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokens.py:64(lower)
   1585    0.001    0.000    0.012    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/components/ner/vocab_based_annotator.py:62(maybe_annotate_name)
   9195    0.001    0.000    0.001    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/tokens/underscore.py:92(_get_key)
  17990    0.001    0.000    0.001    0.000 /Users/martratas/.pyenv/versions/3.12.10/lib/python3.12/typing.py:2187(cast)
   9160    0.001    0.000    0.001    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokens.py:96(index)
   3170    0.001    0.000    0.001    0.000 /Users/martratas/.pyenv/versions/3.12.10/lib/python3.12/logging/__init__.py:1517(debug)
   4580    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
   2640    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}



Component ner:cat_ner profile (by cumtime):
       155521 function calls in 0.068 seconds

 Ordered by: cumulative time
 List reduced from 39 to 20 due to restriction <20>

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     10    0.000    0.000    0.068    0.007 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/components/types.py:150(__call__)
     10    0.012    0.001    0.067    0.007 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/components/ner/vocab_based_ner.py:30(predict_entities)
   5635    0.002    0.000    0.019    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokens.py:60(text_versions)
   5635    0.005    0.000    0.015    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokens.py:44(norm)
   1585    0.001    0.000    0.012    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/components/ner/vocab_based_annotator.py:62(maybe_annotate_name)
   1585    0.004    0.000    0.010    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/components/ner/vocab_based_annotator.py:19(annotate_name)
   9195    0.010    0.000    0.010    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/tokens/underscore.py:23(__init__)
   3560    0.003    0.000    0.010    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokens.py:36(to_skip)
   9195    0.005    0.000    0.007    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/tokens/underscore.py:46(__getattr__)
   8265    0.005    0.000    0.006    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/cdb/cdb.py:51(has_subname)
  16405    0.003    0.000    0.004    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokens.py:52(base)
   1585    0.003    0.000    0.004    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokenizers.py:72(create_entity)
   3570    0.002    0.000    0.002    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokens.py:260(__iter__)
  25330    0.002    0.000    0.002    0.000 {built-in method builtins.len}
   5635    0.002    0.000    0.002    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokens.py:64(lower)
   9195    0.001    0.000    0.001    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/tokens/underscore.py:92(_get_key)
  17990    0.001    0.000    0.001    0.000 /Users/martratas/.pyenv/versions/3.12.10/lib/python3.12/typing.py:2187(cast)
   3170    0.001    0.000    0.001    0.000 /Users/martratas/.pyenv/versions/3.12.10/lib/python3.12/logging/__init__.py:1517(debug)
   9160    0.001    0.000    0.001    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokens.py:96(index)
   4580    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}




***Look at tokenizer

Component Tokenizer:SpacyTokenizer profile (by tottime):
       38760 function calls (37940 primitive calls) in 0.281 seconds

 Ordered by: internal time
 List reduced from 127 to 20 due to restriction <20>

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     50    0.180    0.004    0.181    0.004 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/layers/maxout.py:45(forward)
     10    0.025    0.002    0.026    0.003 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/pipeline/attributeruler.py:136(match)
     10    0.015    0.001    0.015    0.001 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/ml/staticvectors.py:40(forward)
     10    0.010    0.001    0.010    0.001 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/pipeline/attributeruler.py:145(set_annotations)
   3520    0.009    0.000    0.019    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/pipeline/lemmatizer.py:173(rule_lemmatize)
   2055    0.005    0.000    0.006    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/lang/en/lemmatizer.py:8(is_base_form)
     10    0.004    0.000    0.023    0.002 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/pipeline/lemmatizer.py:98(__call__)
     40    0.004    0.000    0.004    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/layers/hashembed.py:59(forward)
     10    0.003    0.000    0.003    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/backends/ops.py:259(affine)
     10    0.002    0.000    0.281    0.028 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/language.py:1020(__call__)
   3750    0.002    0.000    0.003    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/lookups.py:109(get)
    170    0.002    0.000    0.002    0.000 {method 'reduce' of 'numpy.ufunc' objects}
     10    0.001    0.000    0.002    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/language.py:1114(make_doc)
     50    0.001    0.000    0.003    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/numpy/_core/_methods.py:151(_var)
     50    0.001    0.000    0.007    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/layers/layernorm.py:23(forward)
     50    0.001    0.000    0.001    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/layers/layernorm.py:58(_begin_update_scale_shift)
   8345    0.001    0.000    0.001    0.000 {method 'get' of 'dict' objects}
     40    0.001    0.000    0.001    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/layers/expand_window.py:25(_expand_window_floats)
     20    0.001    0.000    0.001    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/numpy/_core/shape_base.py:295(hstack)
   6580    0.001    0.000    0.001    0.000 {method 'lower' of 'str' objects}



Component Tokenizer:SpacyTokenizer profile (by cumtime):
       38760 function calls (37940 primitive calls) in 0.281 seconds

 Ordered by: cumulative time
 List reduced from 127 to 20 due to restriction <20>

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     10    0.000    0.000    0.281    0.028 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/medcat/tokenizing/spacy_impl/tokenizers.py:87(__call__)
     10    0.002    0.000    0.281    0.028 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/language.py:1020(__call__)
     20    0.000    0.000    0.217    0.011 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/model.py:330(predict)
 230/20    0.001    0.000    0.217    0.011 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/layers/chain.py:48(forward)
 640/40    0.001    0.000    0.217    0.005 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/model.py:307(__call__)
     10    0.000    0.000    0.213    0.021 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/pipeline/tok2vec.py:108(predict)
     40    0.000    0.000    0.200    0.005 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/layers/with_array.py:32(forward)
     50    0.180    0.004    0.181    0.004 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/layers/maxout.py:45(forward)
     20    0.000    0.000    0.143    0.007 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/layers/with_array.py:70(_list_forward)
     40    0.000    0.000    0.138    0.003 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/layers/residual.py:28(forward)
     20    0.000    0.000    0.057    0.003 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/layers/with_array.py:87(_ragged_forward)
     10    0.000    0.000    0.036    0.004 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/pipeline/attributeruler.py:120(__call__)
     10    0.025    0.002    0.026    0.003 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/pipeline/attributeruler.py:136(match)
     10    0.004    0.000    0.023    0.002 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/pipeline/lemmatizer.py:98(__call__)
  20/10    0.000    0.000    0.021    0.002 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/layers/concatenate.py:56(forward)
   3520    0.009    0.000    0.019    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/pipeline/lemmatizer.py:173(rule_lemmatize)
     10    0.015    0.001    0.015    0.001 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/ml/staticvectors.py:40(forward)
     10    0.010    0.001    0.010    0.001 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/pipeline/attributeruler.py:145(set_annotations)
     50    0.001    0.000    0.007    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/thinc/layers/layernorm.py:23(forward)
   2055    0.005    0.000    0.006    0.000 /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/medcat-v2/.venv312/lib/python3.12/site-packages/spacy/lang/en/lemmatizer.py:8(is_base_form)
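The profile listings above use the report layout of Python's `pstats` module ("Ordered by: internal time", "List reduced from ... due to restriction <20>"). As a rough sketch of how comparable reports can be produced for any callable, assuming the standard `cProfile` and `pstats` modules are used, consider the following. `profile_calls` is a hypothetical helper for illustration, not the function added in this PR.

```python
import cProfile
import io
import pstats
from typing import Callable


def profile_calls(func: Callable[[str], object], texts: list[str],
                  limit: int = 20) -> dict[str, str]:
    """Profile func over texts; return reports sorted by tottime and cumtime."""
    profiler = cProfile.Profile()
    for text in texts:
        # Only the call itself is profiled; statistics accumulate across texts.
        profiler.enable()
        try:
            func(text)
        finally:
            profiler.disable()
    reports: dict[str, str] = {}
    for key in ("tottime", "cumtime"):
        buffer = io.StringIO()
        stats = pstats.Stats(profiler, stream=buffer)
        # Sort by the chosen key and keep at most `limit` rows.
        stats.sort_stats(key).print_stats(limit)
        reports[key] = buffer.getvalue()
    return reports
```

Sorting by `tottime` highlights functions that are expensive in their own body, while `cumtime` highlights call chains that are expensive overall, which is why both views are useful side by side.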




