
microsoft/markitdown: Python tool for converting office files and documents to Markdown.

The MarkItDown library is a utility for converting various file types to Markdown (e.g., for indexing or text analysis).

It currently supports:

  • PDF (.pdf)
  • PowerPoint (.pptx)
  • Word (.docx)
  • Excel (.xlsx)
  • Images (EXIF metadata, and OCR)
  • Audio (EXIF metadata, and speech transcription)
  • HTML (special handling of Wikipedia, etc.)
  • Various text-based formats (csv, json, xml, etc.)

The API is simple:

from markitdown import MarkItDown

markitdown = MarkItDown()
result = markitdown.convert("test.xlsx")
print(result.text_content)
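
The same call extends naturally to a whole directory. The following is a minimal sketch, not part of the project itself: `convert_folder` and its `suffixes` default are hypothetical names introduced here for illustration, and the only MarkItDown behavior it assumes is what the snippet above shows, namely that `convert(path)` returns an object with a `text_content` attribute.

```python
from pathlib import Path


def convert_folder(converter, folder, suffixes=(".pdf", ".docx", ".pptx", ".xlsx")):
    """Convert each matching file in `folder`; return {file name: Markdown text}.

    `converter` is any object with a `convert(path)` method whose result
    exposes `.text_content` (as in the MarkItDown example above).
    This helper is hypothetical and not part of the markitdown package.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subdirectories and files with other extensions.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            results[path.name] = converter.convert(str(path)).text_content
        except Exception as exc:
            # Record the failure as None and keep going with the other files.
            print(f"skipped {path.name}: {exc}")
            results[path.name] = None
    return results
```

With the package installed, passing `MarkItDown()` as `converter` and a directory path as `folder` should yield one Markdown string per supported file, with `None` marking any file that failed to convert.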

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) that states that you have the right to, and essentially grant us rights to, use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, the CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Just follow the instructions given by the bot. You only need to do this once for all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.



2024-12-13 18:02:03
