Hyunsang/pdf_to_pptx

HyunsangJoo committed 9 days ago

Commit 209d412 (1 parent: 8a27d1f)

Add options, test local UI, update README

Files changed (3)

README.md +52 -0
app.py +33 -5
dots_ocr/utils/pptx_generator.py +53 -22

README.md CHANGED

@@ -12,3 +12,55 @@ short_description: Convert pdf/image to pptx with text ready to edit.
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

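+## 🏗 Overview of the full process (4 stages)
+The process is like **[taking photos] → [detective work] → [drawing a blueprint] → [assembling the building]**.
+### Stage 1: Taking photos (image preparation) 📸
+*   **Handled by:** `dots_ocr/utils/doc_utils.py`
+*   **What happens:** The AI cannot easily read a PDF file directly, so every page is converted into a **high-resolution image (photo)**, much like scanning a book.
+*   **Key technique:** Pages are rendered at **2x zoom** so that small text stays legible.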
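As a rough illustration of what the 2x zoom means for image size, the sketch below assumes the renderer's base resolution is 72 DPI (one pixel per PDF point), which is the usual PDF convention. `rendered_size` and `effective_dpi` are hypothetical helpers, not functions from `doc_utils.py`.

```python
import math

PDF_POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def rendered_size(width_pt: float, height_pt: float, zoom: float = 2.0) -> tuple:
    """Pixel size of a page rendered at `zoom` pixels per PDF point."""
    return math.ceil(width_pt * zoom), math.ceil(height_pt * zoom)


def effective_dpi(zoom: float = 2.0) -> float:
    """Resolution that corresponds to a given zoom factor."""
    return PDF_POINTS_PER_INCH * zoom


# A US Letter page is 612 x 792 points.
print(rendered_size(612, 792))  # (1224, 1584)
print(effective_dpi())          # 144.0
```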
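+### Stage 2: The detective robot's analysis (coordinate & content extraction) 🕵️
+*   **Handled by:** `dots_ocr/parser.py`, `dots_ocr/model/inference.py`
+*   **What happens:** A smart AI (the detective) looks at the photo and finds two things.
+    1.  **Reading the text:** "This says 'Hello'."
+    2.  **Locating it (coordinates):** "This text sits 10 units from the left edge of the page and 20 units from the top."
+*   **Key technique:** Images are sent to the AI wrapped in special tokens such as `<|img|>`.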
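+### Stage 3: Drawing the blueprint (data cleanup) 📝
+*   **Handled by:** `dots_ocr/utils/output_cleaner.py`
+*   **What happens:** The jumbled information the AI found is organized into a clean **blueprint (JSON)**.
+    *   "Box 1: position [10, 20], content 'Title'"
+    *   "Box 2: position [50, 80], content 'Body'"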
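A blueprint for one page might look like the following. The field names `bbox`, `category`, and `text` mirror the variable names used in `pptx_generator.py`, but the exact JSON schema and the category values shown here are assumptions.

```python
import json

# Hypothetical layout JSON for one page; bbox is [x1, y1, x2, y2] in pixels.
page_json = """
[
  {"bbox": [10, 20, 300, 60], "category": "Title", "text": "Title"},
  {"bbox": [50, 80, 500, 400], "category": "Text", "text": "Body"}
]
"""

boxes = json.loads(page_json)
for i, box in enumerate(boxes, start=1):
    x1, y1, x2, y2 = box["bbox"]
    # Width and height follow from the two corner points.
    print(f"Box {i}: top-left=({x1}, {y1}), size={x2 - x1}x{y2 - y1}, text={box['text']!r}")
```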
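+### Stage 4: The architect's assembly (building the PPT) 🔨
+*   **Handled by:** `dots_ocr/utils/pptx_generator.py`
+*   **What happens:** A blank PPT slide is created, and text boxes and tables are placed on it according to the blueprint.
+*   **Key techniques:**
+    *   **Ratio calculation:** Even when the image size and the PPT size differ, positions are computed as ratios (%) so that everything lands in the right place.
+    *   **Font calculation:** A suitable font size is computed automatically so that text does not overflow its box.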
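The font-size step can be approximated with a simple fit test. The sketch below is not the project's `_calculate_font_size`; it is a hypothetical heuristic that assumes an average glyph width of 0.6 em and a line height of 1.2 em, and shrinks the size until the wrapped text is estimated to fit the box.

```python
import math


def estimate_font_size_pt(box_width_pt: float, box_height_pt: float, text: str,
                          min_pt: float = 6.0, max_pt: float = 44.0,
                          char_width_em: float = 0.6,
                          line_height_em: float = 1.2) -> float:
    """Largest size (in 0.5 pt steps) at which the text is estimated to fit."""
    size = max_pt
    while size > min_pt:
        # How many average-width glyphs fit on one line at this size.
        chars_per_line = max(1, int(box_width_pt / (size * char_width_em)))
        # Each paragraph wraps independently; empty paragraphs still take a line.
        line_count = sum(max(1, math.ceil(len(p) / chars_per_line))
                         for p in text.split("\n"))
        if line_count * size * line_height_em <= box_height_pt:
            return size
        size -= 0.5
    return min_pt


print(estimate_font_size_pt(400, 100, "Title"))    # 44.0
print(estimate_font_size_pt(100, 20, "x" * 200))   # 6.0
```

The glyph-width ratio is a crude average; full-width CJK characters are closer to 1 em, so a real implementation would likely weight them differently.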
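+---
+## 🔍 Deep dive: the secrets of coordinates and sizes
+### Q1. How are coordinates measured?
+*   **Origin:** The **top-left corner** of the page is `(0, 0)`.
+*   **Direction:** x increases to the right, and y increases downward.
+*   **Unit:** Positions are counted in **pixels**, the individual dots of the image.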
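+### Q2. Doesn't enlarging the image break the coordinates?
+No, because positions are handled as **ratios (scale)**.
+*   **Small image:** position 10 out of a width of 100 (10%)
+*   **Large image:** position 20 out of a width of 200 (10%)
+*   **Conclusion:** The size changes, but the fact that the point is "10% of the way across" does not, so it lands in the same place in the PPT.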
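The same arithmetic in code, as a minimal sketch: `to_relative` and `to_slide` are hypothetical names, and the slide width of 9,144,000 EMU assumes a 10-inch slide (914,400 EMU per inch).

```python
def to_relative(x_px: float, image_width_px: float) -> float:
    """Express a pixel position as a fraction of the image width."""
    return x_px / image_width_px


def to_slide(x_px: float, image_width_px: float, slide_width: int) -> int:
    """Map a pixel position onto a slide of the given width."""
    return round(to_relative(x_px, image_width_px) * slide_width)


small = to_relative(10, 100)   # small image
large = to_relative(20, 200)   # same page rendered at twice the size
print(small, large)            # 0.1 0.1

slide_width_emu = 9_144_000    # assumed 10-inch slide
print(to_slide(10, 100, slide_width_emu))  # 914400
print(to_slide(20, 200, slide_width_emu))  # 914400
```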
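+### Q3. How is the PPT page size determined?
+The program chooses based on the situation.
+1.  **When there is a background image:** The original image size is used as-is. (Most accurate!)
+2.  **When there is no image:** The size is estimated from the farthest coordinates (right-most and bottom-most).
+3.  **Final setting:** To suit PowerPoint, the **width is fixed at 10 inches (about 25.4 cm)**, and the height is scaled automatically to preserve the aspect ratio.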
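The three rules above can be sketched as follows. The helper names are hypothetical and this is not the code in `pptx_generator.py`; it only assumes that lengths are expressed in EMU (914,400 per inch), the unit that python-pptx uses for slide dimensions.

```python
from typing import Optional, Sequence, Tuple

EMU_PER_INCH = 914_400
SLIDE_WIDTH_EMU = 10 * EMU_PER_INCH  # width fixed at 10 inches


def page_size_px(image_size: Optional[Tuple[int, int]],
                 bboxes: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Rule 1: use the image size; rule 2: fall back to the farthest bbox corner."""
    if image_size is not None:
        return image_size
    width = max((b[2] for b in bboxes), default=0)
    height = max((b[3] for b in bboxes), default=0)
    return width, height


def slide_geometry(page_w: float, page_h: float) -> Tuple[int, int, float, float]:
    """Rule 3: return (slide_width, slide_height, scale_x, scale_y) in EMU."""
    if page_w <= 0 or page_h <= 0:
        raise ValueError("page size must be positive")
    slide_h = round(SLIDE_WIDTH_EMU * page_h / page_w)  # keep the aspect ratio
    return SLIDE_WIDTH_EMU, slide_h, SLIDE_WIDTH_EMU / page_w, slide_h / page_h


print(page_size_px(None, [[10, 20, 300, 60], [50, 80, 500, 400]]))  # (500, 400)
print(slide_geometry(1000, 1500))  # (9144000, 13716000, 9144.0, 9144.0)
```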

app.py CHANGED

@@ -781,7 +781,9 @@ def save_results(
     all_results: List[Dict],
     file_stem: str,
     include_background: bool,
-    images: List[Image.Image]
 ) -> Tuple[Optional[str], Optional[str], List[str], List[List[Dict]]]:
     """Result-saving logic, split out to support partial saves"""
     try:
@@ -799,7 +801,11 @@ def save_results(
             background_images = images[:processed_count] if include_background else []
             page_count, box_count = build_pptx_from_results(
-                all_results, background_images, Path(pptx_path)
             )
             print(f"📊 PPTX saved: {page_count} pages, {box_count} text boxes")
@@ -838,6 +844,8 @@ def process_document(
     prompt_mode: str,
     quality_mode: str,
     include_background: bool,
 ) -> Tuple[Optional[str], Optional[str], Optional[str], str, Any, str]:
     """Main document-processing function"""
@@ -863,6 +871,7 @@ def process_document(
     print(f"   Prompt mode: {prompt_mode}")
     print(f"   Quality mode: {quality_mode} ({target_max_pixels} pixels)")
     print(f"   Include background: {include_background}")
     print("=" * 60)
     # Convert the prompt mode
@@ -927,7 +936,12 @@ def process_document(
         # Save results (on normal completion)
         pptx_path, json_path, layout_img_paths, json_data = save_results(
-            all_results, file_stem, include_background, images
         )
         # Markdown output
@@ -961,7 +975,12 @@ def process_document(
         if all_results:
             print(f"⚠️ Error occurred! Saving results for the {len(all_results)} pages processed so far...")
             pptx_path, json_path, layout_img_paths, json_data = save_results(
-                all_results, file_stem, include_background, images
             )
             combined_markdown = "\n\n---\n\n".join(all_markdown) if all_markdown else f"*Error during processing: {str(e)}*"
@@ -1119,6 +1138,15 @@ with gr.Blocks(title="PDF/Image to PPTX Converter") as demo:
                     label="Include background image in PPTX",
                     value=True
                 )
             run_btn = gr.Button("🚀 Convert", variant="primary", size="lg")
@@ -1177,7 +1205,7 @@ with gr.Blocks(title="PDF/Image to PPTX Converter") as demo:
     run_btn.click(
         fn=process_document,
-        inputs=[file_input, prompt_mode, quality_mode, include_background],
         outputs=[pptx_output, json_output, processed_image, extracted_content, layout_json, log_output]
     )

     all_results: List[Dict],
     file_stem: str,
     include_background: bool,
+    images: List[Image.Image],
+    use_dark_mode: bool = False,
+    show_border: bool = False
 ) -> Tuple[Optional[str], Optional[str], List[str], List[List[Dict]]]:
     """Result-saving logic, split out to support partial saves"""
     try:
             background_images = images[:processed_count] if include_background else []
             page_count, box_count = build_pptx_from_results(
+                all_results,
+                background_images,
+                Path(pptx_path),
+                use_dark_mode=use_dark_mode,
+                show_border=show_border
             )
             print(f"📊 PPTX saved: {page_count} pages, {box_count} text boxes")
     prompt_mode: str,
     quality_mode: str,
     include_background: bool,
+    use_dark_mode: bool,
+    show_border: bool,
 ) -> Tuple[Optional[str], Optional[str], Optional[str], str, Any, str]:
     """Main document-processing function"""
     print(f"   Prompt mode: {prompt_mode}")
     print(f"   Quality mode: {quality_mode} ({target_max_pixels} pixels)")
     print(f"   Include background: {include_background}")
+    print(f"   Style options: dark mode={use_dark_mode}, border={show_border}")
     print("=" * 60)
     # Convert the prompt mode
         # Save results (on normal completion)
         pptx_path, json_path, layout_img_paths, json_data = save_results(
+            all_results,
+            file_stem,
+            include_background,
+            images,
+            use_dark_mode=use_dark_mode,
+            show_border=show_border
         )
         # Markdown output
         if all_results:
             print(f"⚠️ Error occurred! Saving results for the {len(all_results)} pages processed so far...")
             pptx_path, json_path, layout_img_paths, json_data = save_results(
+                all_results,
+                file_stem,
+                include_background,
+                images,
+                use_dark_mode=use_dark_mode,
+                show_border=show_border
             )
             combined_markdown = "\n\n---\n\n".join(all_markdown) if all_markdown else f"*Error during processing: {str(e)}*"
                     label="Include background image in PPTX",
                     value=True
                 )
+                with gr.Row():
+                    use_dark_mode = gr.Checkbox(
+                        label="Dark mode (black background / white text)",
+                        value=False
+                    )
+                    show_border = gr.Checkbox(
+                        label="Show text box borders",
+                        value=False
+                    )
             run_btn = gr.Button("🚀 Convert", variant="primary", size="lg")
     run_btn.click(
         fn=process_document,
+        inputs=[file_input, prompt_mode, quality_mode, include_background, use_dark_mode, show_border],
         outputs=[pptx_output, json_output, processed_image, extracted_content, layout_json, log_output]
     )

dots_ocr/utils/pptx_generator.py CHANGED

@@ -254,22 +254,30 @@ def _add_textbox(
     scale_y,
     category: str = "",
     page_height: int = 0,
-    use_dark_bg: bool = False
 ) -> None:
     """
     Add a text box (AutoSize forced on; final version with corrected ordering)
     """
     try:
-        left = int(bbox[0] * scale_x)
-        top = int(bbox[1] * scale_y)
-        width = int((bbox[2] - bbox[0]) * scale_x)
-        height = int((bbox[3] - bbox[1]) * scale_y)
         if width <= 0 or height <= 0:
             return
         textbox = slide.shapes.add_textbox(left, top, width, height)
         # 1. Configure the text frame
         text_frame = textbox.text_frame
         text_frame.clear()
@@ -301,7 +309,7 @@ def _add_textbox(
         run.font.size = _calculate_font_size(width, height, cleaned_text, category, is_bold=is_bold)
         # 4. Color and background
-        if use_dark_bg:
             run.font.color.rgb = RGBColor(255, 255, 255)
             textbox.fill.solid()
             textbox.fill.fore_color.rgb = RGBColor(0, 0, 0)
@@ -345,7 +353,9 @@ def _add_table(
     bbox,
     html_text,
     scale_x,
-    scale_y
 ) -> None:
     """Add a table to a PPTX slide"""
     try:
@@ -360,11 +370,11 @@ def _add_table(
         if rows == 0 or cols == 0:
             return
-        # 2. Compute position and size
-        left = int(bbox[0] * scale_x)
-        top = int(bbox[1] * scale_y)
-        width = int((bbox[2] - bbox[0]) * scale_x)
-        height = int((bbox[3] - bbox[1]) * scale_y)
         # 3. Create the table
         graphic_frame = slide.shapes.add_table(rows, cols, left, top, width, height)
@@ -387,17 +397,34 @@ def _add_table(
                 for paragraph in cell.text_frame.paragraphs:
                     paragraph.font.size = Pt(9)
-                    # Header row (first row) styling: black background + white text + bold
                     if r_idx == 0:
                         paragraph.font.bold = True
-                        paragraph.font.color.rgb = RGBColor(255, 255, 255)  # white text
-                        cell.fill.solid()
-                        cell.fill.fore_color.rgb = RGBColor(0, 0, 0)       # black background
                     else:
-                        # Remaining rows: white background (default) + black text
-                        paragraph.font.color.rgb = RGBColor(0, 0, 0)
-                        cell.fill.solid()
-                        cell.fill.fore_color.rgb = RGBColor(255, 255, 255)
     except Exception as e:
         print(f"Table add failed: {e}")
@@ -407,6 +434,8 @@ def build_pptx_from_results(
     parse_results: List[Dict],
     background_images: List[Image.Image],
     output_path: Path,
 ) -> Tuple[int, int]:
     """Build a PPTX from the parse results"""
     prs = Presentation()
@@ -467,7 +496,7 @@ def build_pptx_from_results(
                 if category == "Table":
                     if not text.strip(): continue
-                    _add_table(slide, bbox, text, scale_x, scale_y)
                     total_boxes += 1
                     continue
@@ -477,7 +506,9 @@ def build_pptx_from_results(
                 _add_textbox(
                     slide, bbox, text, scale_x, scale_y,
                     category=category,
-                    page_height=page_height
                 )
                 total_boxes += 1

     scale_y,
     category: str = "",
     page_height: int = 0,
+    use_dark_mode: bool = False,
+    show_border: bool = False
 ) -> None:
     """
     Add a text box (AutoSize forced on; final version with corrected ordering)
     """
     try:
+        # [Fix] Use round() instead of int() truncation to improve coordinate precision
+        left = int(round(bbox[0] * scale_x))
+        top = int(round(bbox[1] * scale_y))
+        width = int(round((bbox[2] - bbox[0]) * scale_x))
+        height = int(round((bbox[3] - bbox[1]) * scale_y))
         if width <= 0 or height <= 0:
             return
         textbox = slide.shapes.add_textbox(left, top, width, height)
+        # [Added] Border option
+        if show_border:
+            line = textbox.line
+            line.color.rgb = RGBColor(200, 200, 200) # light gray border
+            line.width = Pt(1)
         # 1. Configure the text frame
         text_frame = textbox.text_frame
         text_frame.clear()
         run.font.size = _calculate_font_size(width, height, cleaned_text, category, is_bold=is_bold)
         # 4. Color and background
+        if use_dark_mode:
             run.font.color.rgb = RGBColor(255, 255, 255)
             textbox.fill.solid()
             textbox.fill.fore_color.rgb = RGBColor(0, 0, 0)
     bbox,
     html_text,
     scale_x,
+    scale_y,
+    use_dark_mode: bool = False,
+    show_border: bool = False
 ) -> None:
     """Add a table to a PPTX slide"""
     try:
         if rows == 0 or cols == 0:
             return
+        # 2. Compute position and size (with rounding)
+        left = int(round(bbox[0] * scale_x))
+        top = int(round(bbox[1] * scale_y))
+        width = int(round((bbox[2] - bbox[0]) * scale_x))
+        height = int(round((bbox[3] - bbox[1]) * scale_y))
         # 3. Create the table
         graphic_frame = slide.shapes.add_table(rows, cols, left, top, width, height)
                 for paragraph in cell.text_frame.paragraphs:
                     paragraph.font.size = Pt(9)
+                    # Header row (first row) styling
                     if r_idx == 0:
                         paragraph.font.bold = True
+                        if use_dark_mode:
+                            # Dark-mode header: white text / dark gray background
+                            paragraph.font.color.rgb = RGBColor(255, 255, 255)
+                            cell.fill.solid()
+                            cell.fill.fore_color.rgb = RGBColor(50, 50, 50)
+                        else:
+                            # Default header: white text / black background
+                            paragraph.font.color.rgb = RGBColor(255, 255, 255)
+                            cell.fill.solid()
+                            cell.fill.fore_color.rgb = RGBColor(0, 0, 0)
                     else:
+                        # Remaining rows
+                        if use_dark_mode:
+                            # Dark-mode body: white text / black background
+                            paragraph.font.color.rgb = RGBColor(255, 255, 255)
+                            cell.fill.solid()
+                            cell.fill.fore_color.rgb = RGBColor(0, 0, 0)
+                        else:
+                            # Default body: black text / white background
+                            paragraph.font.color.rgb = RGBColor(0, 0, 0)
+                            cell.fill.solid()
+                            cell.fill.fore_color.rgb = RGBColor(255, 255, 255)
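+        # Tables have borders by default, so the show_border option is not applied to tables;
+        # a separate style could be added if needed (the parameter is accepted here only for consistency with the text box)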
     except Exception as e:
         print(f"Table add failed: {e}")
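The four color branches in `_add_table` above form a small lookup on two booleans. As a sketch of one possible refactor, not something this commit does, the choice could be isolated in a pure function; `cell_colors` and the constant names are hypothetical, while the RGB values are the ones used in the diff.

```python
from typing import Tuple

RGB = Tuple[int, int, int]

WHITE: RGB = (255, 255, 255)
BLACK: RGB = (0, 0, 0)
DARK_GRAY: RGB = (50, 50, 50)


def cell_colors(is_header: bool, use_dark_mode: bool) -> Tuple[RGB, RGB]:
    """Return (font_rgb, fill_rgb) for a table cell."""
    if is_header:
        # Headers always use white text; only the fill changes with the mode.
        return WHITE, (DARK_GRAY if use_dark_mode else BLACK)
    if use_dark_mode:
        return WHITE, BLACK
    return BLACK, WHITE


font_rgb, fill_rgb = cell_colors(is_header=True, use_dark_mode=True)
print(font_rgb, fill_rgb)  # (255, 255, 255) (50, 50, 50)
```

The caller would then pass each tuple to `RGBColor(*font_rgb)` and `RGBColor(*fill_rgb)`, which keeps the styling rules testable without building a presentation.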
     parse_results: List[Dict],
     background_images: List[Image.Image],
     output_path: Path,
+    use_dark_mode: bool = False,
+    show_border: bool = False
 ) -> Tuple[int, int]:
     """Build a PPTX from the parse results"""
     prs = Presentation()
                 if category == "Table":
                     if not text.strip(): continue
+                    _add_table(slide, bbox, text, scale_x, scale_y, use_dark_mode=use_dark_mode, show_border=show_border)
                     total_boxes += 1
                     continue
                 _add_textbox(
                     slide, bbox, text, scale_x, scale_y,
                     category=category,
+                    page_height=page_height,
+                    use_dark_mode=use_dark_mode,
+                    show_border=show_border
                 )
                 total_boxes += 1
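The switch from `int(...)` to `int(round(...))` in `_add_textbox` and `_add_table` matters because `int()` truncates toward zero, so every coordinate was biased downward by up to one unit. A small sketch of the difference follows; the scale value is an illustrative assumption (a 10-inch slide in EMU divided by a 1224-pixel-wide image), and the helper names are hypothetical.

```python
def to_emu_truncated(value_px: float, scale: float) -> int:
    """Old behavior: drop the fractional part."""
    return int(value_px * scale)


def to_emu_rounded(value_px: float, scale: float) -> int:
    """New behavior: round to the nearest integer."""
    return int(round(value_px * scale))


scale = 9_144_000 / 1224   # assumed EMU per pixel, about 7470.59
print(to_emu_truncated(100, scale))  # 747058
print(to_emu_rounded(100, scale))    # 747059
```

For built-in floats, Python 3's `round(x)` already returns an `int`, so the outer `int()` is redundant but harmless. Note also that `round` resolves exact .5 ties to the nearest even integer.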