Optical Character Recognition (OCR) extracts text from an image or scanned document and converts it into a machine-readable format.
PaddleOCR was developed by the Chinese company Baidu in September 2020. It is built on Baidu's PaddlePaddle deep learning framework.
In [73]:
Namespace(benchmark=False, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='/home/webtunix/.paddleocr/2.2.0.2/ocr/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_thresh=0.9, cpu_threads=10, det=True, det_algorithm='DB', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_limit_side_len=960, det_limit_type='max', det_model_dir='/home/webtunix/.paddleocr/2.2.0.2/ocr/det/en/en_ppocr_mobile_v2.0_det_infer', det_sast_nms_thresh=0.2, det_sast_polygon=False, det_sast_score_thresh=0.5, drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_polygon=True, e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, gpu_mem=500, help='==SUPPRESS==', image_dir=None, ir_optim=True, label_list=['0', '180'], lang='en', layout_path_model='lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config', max_batch_size=10, max_text_length=25, min_subgraph_size=10, output='./output/table', precision='fp32', process_id=0, rec=True, rec_algorithm='CRNN', rec_batch_num=6, rec_char_dict_path='/home/webtunix/.local/lib/python3.6/site-packages/paddleocr/ppocr/utils/en_dict.txt', rec_char_type='ch', rec_image_shape='3, 32, 320', rec_model_dir='/home/webtunix/.paddleocr/2.2.0.2/ocr/rec/en/en_number_mobile_v2.0_rec_infer', save_log_path='./log_output/', show_log=True, table_char_dict_path=None, table_char_type='en', table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=True, use_dilation=False, use_gpu=True, use_mp=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=True)
[2021/08/13 19:21:01] root DEBUG: dt_boxes num : 10, elapse : 0.4250295162200928
[2021/08/13 19:21:01] root DEBUG: cls num : 10, elapse : 0.06311750411987305
[2021/08/13 19:21:01] root DEBUG: rec_res num : 10, elapse : 0.16244864463806152
[[[104.0, 93.0], [256.0, 103.0], [254.0, 134.0], [102.0, 124.0]], ('MALA', 0.98211896)]
[[[66.0, 161.0], [265.0, 157.0], [266.0, 181.0], [66.0, 184.0]], ('00418-00-6418', 0.9912834)]
[[[60.0, 360.0], [152.0, 360.0], [152.0, 378.0], [60.0, 378.0]], ('BINTLAR', 0.9820388)]
[[[158.0, 364.0], [200.0, 364.0], [200.0, 378.0], [158.0, 378.0]], ('RN', 0.8610319)]
[[[56.0, 403.0], [140.0, 406.0], [139.0, 427.0], [55.0, 424.0]], ('LOT2272', 0.99408233)]
[[[54.0, 429.0], [265.0, 427.0], [266.0, 455.0], [54.0, 457.0]], ('4B400PETALING JAYA', 0.9694132)]
[[[521.0, 438.0], [663.0, 436.0], [663.0, 450.0], [521.0, 452.0]], ('WARGANEGARA', 0.9971807)]
[[[55.0, 461.0], [162.0, 461.0], [162.0, 485.0], [55.0, 485.0]], ('SELANGOR', 0.9964472)]
[[[586.0, 454.0], [697.0, 452.0], [697.0, 466.0], [587.0, 468.0]], ('PEREMPUAN', 0.9980258)]
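The notebook cell that produced this output is not included here. As a rough sketch only, a call along the following lines would be expected to give output of this shape with the PaddleOCR 2.2.x Python package. The function names `run_ocr` and `confident_lines` and the 0.9 threshold are illustrative assumptions, not the author's code, and later PaddleOCR releases return results in a different layout.

```python
def run_ocr(image_path, use_angle_cls=True, lang="en"):
    """Run detection, optional angle classification and recognition on one image.

    Hypothetical wrapper: assumes the paddleocr 2.2.x package is installed.
    """
    # Imported lazily so the helper below works without PaddleOCR installed.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(use_angle_cls=use_angle_cls, lang=lang)
    # In the 2.2.x releases this returns one [box, (text, score)] entry per text line.
    return ocr.ocr(image_path, cls=use_angle_cls)


def confident_lines(result, min_score=0.9):
    """Keep only entries whose recognition score is at least min_score."""
    return [entry for entry in result if entry[1][1] >= min_score]


# Two entries copied from the output above, used here in place of a real image.
sample = [
    [[[104.0, 93.0], [256.0, 103.0], [254.0, 134.0], [102.0, 124.0]], ("MALA", 0.98211896)],
    [[[158.0, 364.0], [200.0, 364.0], [200.0, 378.0], [158.0, 378.0]], ("RN", 0.8610319)],
]

for box, (text, score) in confident_lines(sample):
    print(f"{text}\t{score:.3f}\ttop-left corner {box[0]}")
```

The `drop_score=0.5` value in the Namespace dump is the library's own cut-off for discarding low-confidence lines. The helper applies a stricter, user-chosen threshold on top of it.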
In [79]:
Namespace(benchmark=False, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='/home/webtunix/.paddleocr/2.2.0.2/ocr/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_thresh=0.9, cpu_threads=10, det=True, det_algorithm='DB', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_limit_side_len=960, det_limit_type='max', det_model_dir='/home/webtunix/.paddleocr/2.2.0.2/ocr/det/en/en_ppocr_mobile_v2.0_det_infer', det_sast_nms_thresh=0.2, det_sast_polygon=False, det_sast_score_thresh=0.5, drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_polygon=True, e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, gpu_mem=500, help='==SUPPRESS==', image_dir=None, ir_optim=True, label_list=['0', '180'], lang='en', layout_path_model='lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config', max_batch_size=10, max_text_length=25, min_subgraph_size=10, output='./output/table', precision='fp32', process_id=0, rec=True, rec_algorithm='CRNN', rec_batch_num=6, rec_char_dict_path='/home/webtunix/.local/lib/python3.6/site-packages/paddleocr/ppocr/utils/en_dict.txt', rec_char_type='ch', rec_image_shape='3, 32, 320', rec_model_dir='/home/webtunix/.paddleocr/2.2.0.2/ocr/rec/en/en_number_mobile_v2.0_rec_infer', save_log_path='./log_output/', show_log=True, table_char_dict_path=None, table_char_type='en', table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=False, use_dilation=False, use_gpu=True, use_mp=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=True)
[2021/08/13 19:23:56] root WARNING: Since the angle classifier is not initialized, the angle classifier will not be uesd during the forward process
[2021/08/13 19:23:57] root DEBUG: dt_boxes num : 10, elapse : 0.32827091217041016
[2021/08/13 19:23:57] root DEBUG: rec_res num : 10, elapse : 0.12045407295227051
[2021/08/13 19:23:57] root WARNING: Since the angle classifier is not initialized, the angle classifier will not be uesd during the forward process
[2021/08/13 19:23:57] root DEBUG: dt_boxes num : 10, elapse : 0.34050536155700684
[2021/08/13 19:23:57] root DEBUG: rec_res num : 10, elapse : 0.16071629524230957
[[[104.0, 93.0], [256.0, 103.0], [254.0, 134.0], [102.0, 124.0]], ('MALA', 0.98211896)]
[[[66.0, 161.0], [265.0, 157.0], [266.0, 181.0], [66.0, 184.0]], ('00418-00-6418', 0.9912834)]
[[[60.0, 360.0], [152.0, 360.0], [152.0, 378.0], [60.0, 378.0]], ('BINTLAR', 0.9820388)]
[[[158.0, 364.0], [200.0, 364.0], [200.0, 378.0], [158.0, 378.0]], ('RN', 0.8610319)]
[[[56.0, 403.0], [140.0, 406.0], [139.0, 427.0], [55.0, 424.0]], ('LOT2272', 0.99408233)]
[[[54.0, 429.0], [265.0, 427.0], [266.0, 455.0], [54.0, 457.0]], ('4B400PETALING JAYA', 0.9694132)]
[[[521.0, 438.0], [663.0, 436.0], [663.0, 450.0], [521.0, 452.0]], ('WARGANEGARA', 0.9971807)]
[[[55.0, 461.0], [162.0, 461.0], [162.0, 485.0], [55.0, 485.0]], ('SELANGOR', 0.9964472)]
[[[586.0, 454.0], [697.0, 452.0], [697.0, 466.0], [587.0, 468.0]], ('PEREMPUAN', 0.9980258)]
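The two runs differ only in `use_angle_cls`, and the second log has no `cls` stage. To compare the per-stage timings, the DEBUG lines can be parsed with a small script such as the sketch below. The helper name `stage_times` and the regular expression are assumptions based on the log format shown in this post, not part of PaddleOCR.

```python
import re
from collections import defaultdict

# Matches DEBUG lines such as "root DEBUG: dt_boxes num : 10, elapse : 0.42".
STAGE_RE = re.compile(r"DEBUG: (\w+) num : (\d+), elapse : ([0-9.]+)")


def stage_times(log_text):
    """Return {stage name: [elapsed seconds, ...]} for every DEBUG timing line."""
    times = defaultdict(list)
    for stage, _count, elapsed in STAGE_RE.findall(log_text):
        times[stage].append(float(elapsed))
    return dict(times)


# Timing lines copied from the two runs above.
with_cls = """
[2021/08/13 19:21:01] root DEBUG: dt_boxes num : 10, elapse : 0.4250295162200928
[2021/08/13 19:21:01] root DEBUG: cls num : 10, elapse : 0.06311750411987305
[2021/08/13 19:21:01] root DEBUG: rec_res num : 10, elapse : 0.16244864463806152
"""
without_cls = """
[2021/08/13 19:23:57] root DEBUG: dt_boxes num : 10, elapse : 0.32827091217041016
[2021/08/13 19:23:57] root DEBUG: rec_res num : 10, elapse : 0.12045407295227051
[2021/08/13 19:23:57] root DEBUG: dt_boxes num : 10, elapse : 0.34050536155700684
[2021/08/13 19:23:57] root DEBUG: rec_res num : 10, elapse : 0.16071629524230957
"""

for name, log in (("use_angle_cls=True", with_cls), ("use_angle_cls=False", without_cls)):
    for stage, values in sorted(stage_times(log).items()):
        mean = sum(values) / len(values)
        print(f"{name}: {stage} mean {mean:.3f} s over {len(values)} pass(es)")
```

A single image is too small a sample to draw conclusions about speed. The recognized text and scores are identical in both runs.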
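I have extracted the text from an image file into a machine-readable format. Each result line pairs the four corner points of a detected text box with the recognized string and its confidence score.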
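To pass the extracted text to other tools, the result entries can be converted to JSON. The following is a minimal sketch, assuming the `[box, (text, score)]` layout seen in the output above. The function name `to_records` and the field names are illustrative choices, not a PaddleOCR format.

```python
import json


def to_records(result):
    """Convert [box, (text, score)] entries into JSON-friendly dictionaries."""
    records = []
    for box, (text, score) in result:
        xs = [point[0] for point in box]
        ys = [point[1] for point in box]
        records.append({
            "text": text,
            # float() guards against NumPy scalar types, which json cannot serialize.
            "confidence": float(score),
            "corners": [[float(x), float(y)] for x, y in box],
            # Axis-aligned bounds of the quadrilateral: [left, top, right, bottom].
            "bounds": [min(xs), min(ys), max(xs), max(ys)],
        })
    return records


# Two entries copied from the output above.
sample = [
    [[[55.0, 461.0], [162.0, 461.0], [162.0, 485.0], [55.0, 485.0]], ("SELANGOR", 0.9964472)],
    [[[586.0, 454.0], [697.0, 452.0], [697.0, 466.0], [587.0, 468.0]], ("PEREMPUAN", 0.9980258)],
]

print(json.dumps(to_records(sample), indent=2))
```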