How to convert .md files to .html files on the front end

Author: 前端小智  Source: 大迁世界

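A .md file is written in Markdown, a lightweight markup language. Compared with HTML it is simpler and faster to write, mainly in two respects: the number of markup symbols and how they are written.

Number of markup symbols: an HTML document needs a large set of tags, plus CSS to control style and layout, whereas a Markdown document covers the same everyday cases with only about four basic markers.
How the markup is written: HTML content must be marked at both the start and the end, as in <h1>This is a web page</h1>, whereas Markdown only needs a marker at the start, as in # This is an md document. The sections below describe how to convert a .md file to a .html file.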
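Before turning to the tools, the start-marker idea can be illustrated with a toy converter. This is only a sketch: the class name HeadingSketch and the method convertLine are made up for illustration, it handles heading lines only, and it does not escape HTML special characters. Real converters such as the ones described below do far more.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library mentioned in this article
class HeadingSketch {
    // One to six '#', at least one whitespace character, then the heading text
    private static final Pattern HEADING = Pattern.compile("^(#{1,6})\\s+(.*)$");

    /** Converts a single Markdown heading line to HTML; other lines are returned unchanged. */
    static String convertLine(String line) {
        Matcher m = HEADING.matcher(line);
        if (!m.matches()) {
            return line;
        }
        int level = m.group(1).length();
        // The single start marker becomes a matching pair of open and close tags
        return "<h" + level + ">" + m.group(2).trim() + "</h" + level + ">";
    }

    public static void main(String[] args) {
        System.out.println(convertLine("# This is an md document"));
    }
}
```

A line such as "#Title" (no space after the marker) is left unchanged by this sketch, because the pattern requires whitespace after the # characters.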

Method 1: the i5ting_toc plugin

npm must be installed first (it ships with Node.js); then install the i5ting_toc plugin:

npm install i5ting_toc -g

Run the following command to generate the HTML file. First change into the directory that contains the file (here ** stands for the file name):

i5ting_toc -f **.md

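Note: when writing special symbols in a Markdown document, remember to add a space after them. Tip: to quickly open a command prompt in the current directory on Windows, select the directory, hold Shift, right-click, and choose "Open command window here" (or "Open PowerShell window here").

Method 2: gitbook

Node must be installed here as well. Then run: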

npm i gitbook gitbook-cli -g

Generate the Markdown files. This command creates the corresponding .md files; write your content in them:

gitbook init

Convert the Markdown to HTML. This generates a _book directory; open it to see the HTML files.

gitbook build

Method 3: front-end code

The idea is to run a Node.js server that reads the .md file and converts it to an HTML fragment. The browser sends an Ajax request for the fragment and renders it into the page.

Node code:

var express = require('express');
var fs = require('fs');
var bodyParser = require('body-parser');
// Package that converts Markdown to HTML. Calling marked(data) directly matches
// older releases; newer releases export it as require('marked').marked.
var marked = require('marked');
var app = express();

app.use(express.static('src')); // serve static files from src/
var urlencodedParser = bodyParser.urlencoded({ extended: false });

app.get('/getMdFile', urlencodedParser, function (req, res) {
    var data = fs.readFileSync('src/test.md', 'utf-8'); // read the local .md file
    // Sent as a plain string, so the page parses the JSON itself
    res.end(JSON.stringify({
        body: marked(data)
    }));
});

// Start listening on the port
var server = app.listen(8088, function () {
    var host = server.address().address;
    var port = server.address().port;
    console.log("App running at http://%s:%s", host, port);
});

Front-end HTML:

<div id="content">
  <h1 class="title">md-to-HTML web app</h1>
  <div id="article"> </div>
</div>
<script type="text/JavaScript" src="js/jquery-1.11.3.min.js"></script>
<script>
  var article = document.getElementById('article');
  $.ajax({
    url: "/getMdFile",
    success: function (result) {
      console.log('Data fetched successfully');
      article.innerHTML = JSON.parse(result).body;
    },
    error: function (err) {
      console.log(err);
      article.innerHTML = '<p>Failed to fetch data</p>';
    }
  });
</script>

html2pdf

pdfbox

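PDFBox is a Java library for creating, modifying, and extracting content from PDF files. It is an Apache Software Foundation project released under the Apache License 2.0.
PDFBox provides a set of APIs for extracting text and images, adding and removing pages, reading PDF metadata, encrypting PDF files, and more.

Main dependencies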

        <!-- Library used to convert HTML to XML -->
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.17.1</version>
        </dependency>

        <!-- Third-party wrapper around pdfbox that provides HTML-to-PDF conversion -->
        <dependency>
            <groupId>com.openhtmltopdf</groupId>
            <artifactId>openhtmltopdf-pdfbox</artifactId>
            <version>1.0.10</version>
        </dependency>

Test code

        // Needs java.io.{File, FileOutputStream, IOException}, java.nio.file.Paths,
        // org.jsoup.Jsoup, org.jsoup.nodes.Document and
        // com.openhtmltopdf.pdfboxout.PdfRendererBuilder

        // Java version
        String version = System.getProperty("java.specification.version");

        // Operating system type
        String platform = System.getProperty("os.name", "");
        platform = platform.toLowerCase().contains("window") ? "win" : "linux";

        // Current working directory
        String current = System.getProperty("user.dir");

        System.out.println(String.format("current=%s", current));

        // Path of the HTML file
        File index = Paths.get(current, "..", "index.html").toFile();
        if (!index.exists()) {
            System.out.println(String.format("file not exist,file=%s", index.getAbsolutePath()));
            return;
        }

        File file = Paths.get(current, String.format("java%s_%s.pdf", version, platform)).toFile();
        // try-with-resources closes the output stream even if rendering fails
        try (FileOutputStream stream = new FileOutputStream(file)) {
            Document doc = Jsoup.parse(index, "UTF-8");
            // Output well-formed XML so that unclosed tags are completed
            doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml);

            PdfRendererBuilder builder = new PdfRendererBuilder();

            // NOTE fonts: every font used in the document must be loaded manually
            builder.useFont(Paths.get(current, "..", "fonts", "simsun.ttc").toFile(), "SimSun");
            builder.useFont(Paths.get(current, "..", "fonts", "msyh.ttc").toFile(), "font-test");
            builder.useFont(Paths.get(current, "..", "fonts", "msyh.ttc").toFile(), "Microsoft YaHei UI");

            // NOTE base URL used to resolve relative resources
            String baseUrl = Paths.get(current, "..").toUri().toString();
            builder.withHtmlContent(doc.html(), baseUrl);

            builder.toStream(stream);
            builder.run();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }

Preview of the results

pdfbox-demo/java1.8_win.pdf · yjihrp/linux-html2pdf-demo - Gitee.com

pdfbox-demo/java11_linux.pdf · yjihrp/linux-html2pdf-demo - Gitee.com

Utilities

# Inspect the internal structure of a PDF
java -jar pdfbox-app debug path-to-pdf/test.pdf
java -jar debugger-app path-to-pdf/test.pdf

Test results


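Recently we had a requirement in which the front end uploads a manuscript in Word format. After the upload, the user can view the manuscript directly in the browser and quote it directly in a rich-text editor. What the back end stores in the database therefore has to be HTML, which means converting Word to HTML.

After some investigation, there were two options for handling Word and HTML. Option 1: do the conversion on the front end. Option 2: upload the Word document to the back end, which converts it and returns the result to the front end. Feedback on option 1 reported many problems, so we did not adopt it. We decided to convert to HTML on the back end and return it to the front end for preview. Once the customer has confirmed in the preview that the formatting is fine, the HTML is saved to the back end. (Word involves many kinds of content, such as images, Visio diagrams, and tables, and converting these complex elements to HTML can cause many formatting problems, hence the preview step.)

Plain text in Word is not much of a problem. The main work is handling non-text elements such as images, videos, and tables. For this article I handled only images, as follows: the back end extracts the images from the Word file (the imported jar already provides this), uploads them to the server, obtains their absolute paths, and puts those into the HTML. The HTML returned to the front end can then be previewed directly.

The Maven dependencies are as follows: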

        <poi-scratchpad.version>3.14</poi-scratchpad.version>
        <poi-ooxml.version>3.14</poi-ooxml.version>
        <xdocreport.version>1.0.6</xdocreport.version>
        <poi-ooxml-schemas.version>3.14</poi-ooxml-schemas.version>
        <ooxml-schemas.version>1.3</ooxml-schemas.version>
        <jsoup.version>1.11.3</jsoup.version>

        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-scratchpad</artifactId>
            <version>${poi-scratchpad.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>${poi-ooxml.version}</version>
        </dependency>
        <dependency>
            <groupId>fr.opensagres.xdocreport</groupId>
            <artifactId>xdocreport</artifactId>
            <version>${xdocreport.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml-schemas</artifactId>
            <version>${poi-ooxml-schemas.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>ooxml-schemas</artifactId>
            <version>${ooxml-schemas.version}</version>
        </dependency>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>${jsoup.version}</version>
        </dependency>

Word-to-HTML conversion works differently for Word 2003 and Word 2007 because their file formats differ. The utility class is as follows:

It is used like this:

public String uploadSourceNews(MultipartFile file)  {
        String fileName = file.getOriginalFilename();
        // A name without a dot gets an empty suffix instead of making substring(-1) throw
        int dot = fileName == null ? -1 : fileName.lastIndexOf(".");
        String suffixName = dot < 0 ? "" : fileName.substring(dot);
        if (!".doc".equals(suffixName) && !".docx".equals(suffixName)) {
            throw new UploadFileFormatException();
        }
        // Images go into a per-month directory under imageDir
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyyMM");
        String dateDir = formatter.format(LocalDate.now());
        String directory = imageDir + "/" + dateDir + "/";
        String content = null;
        try (InputStream inputStream = file.getInputStream()) {
            // suffixName includes the leading dot, so compare against ".doc"
            if (".doc".equals(suffixName)) {
                content = wordToHtmlUtil.Word2003ToHtml(inputStream, imageBucket, directory, Constants.HTTPS_PREFIX + imageVisitHost);
            } else {
                content = wordToHtmlUtil.Word2007ToHtml(inputStream, imageBucket, directory, Constants.HTTPS_PREFIX + imageVisitHost);
            }
        } catch (Exception ex) {
            logger.error("word to html exception, detail:", ex);
            return null;
        }
        return content;
    }
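The method above chooses the converter from the file-name suffix, which trusts whatever name the client sends. As a possible complement, the leading bytes of the upload can be checked too. The sketch below is an assumption-laden illustration, not part of the original utility class: WordFormatSniffer and its methods are hypothetical names. It relies only on two well-known signatures: binary .doc files are OLE2 compound files starting with D0 CF 11 E0, and .docx files are ZIP packages starting with "PK" 03 04.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; the names are illustrative only
class WordFormatSniffer {
    enum WordFormat { DOC, DOCX, UNKNOWN }

    // OLE2 compound file signature (binary .doc, but also .xls and .ppt)
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0};
    // ZIP local file header signature (.docx, but also any other ZIP archive)
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies a file by its first four bytes. */
    static WordFormat detect(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return WordFormat.DOC;
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return WordFormat.DOCX;
        }
        return WordFormat.UNKNOWN;
    }

    /** Peeks at the stream without consuming it; the stream must support mark/reset. */
    static WordFormat detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        in.mark(4);
        byte[] head = new byte[4];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                break; // stream ended before four bytes were available
            }
            read += n;
        }
        in.reset();
        return read < head.length ? WordFormat.UNKNOWN : detect(head);
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The signatures identify the container only, so a match narrows the format down but does not prove the file is a Word document. It is probably best used alongside the suffix check rather than instead of it.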

A short introduction to how doc and docx files are stored:

DOCX is an XML-based word-processing file format developed by Microsoft. DOCX files differ from DOC files in that they store data as separate files and folders inside a compressed package. Versions of Microsoft Office earlier than Office 2007 do not support DOCX files, because DOCX is XML-based whereas those earlier versions save a DOC file as a single binary file.

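You may ask: the file clearly ends in .docx, so how can it be an XML format?

It is simple: pick any docx file, right-click it, and open it with an archive tool. You will see a directory structure like this:

So although a docx looks like a single document, it is actually a compressed archive.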
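The same check can be done in code. The following is a minimal sketch using only java.util.zip; the class DocxEntryLister and its methods are hypothetical names. To stay runnable without a real document, it builds a tiny stand-in archive in memory with two entry names that a real .docx typically contains ([Content_Types].xml and word/document.xml). For a real file, pass a FileInputStream to listEntries instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper; the names are illustrative only
class DocxEntryLister {
    /** Returns the entry names found in a ZIP stream such as a .docx file. */
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Builds a small stand-in archive so the sketch runs without a real .docx. */
    static byte[] sampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"[Content_Types].xml", "word/document.xml"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<xml/>".getBytes(StandardCharsets.UTF_8)); // placeholder content
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Convenience wrapper that lists the stand-in archive without checked exceptions. */
    static List<String> listSampleEntries() {
        try {
            return listEntries(new ByteArrayInputStream(sampleArchive()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(listSampleEntries());
    }
}
```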

Reference:

https://www.cnblogs.com/ct-csu/p/8178932.html
