What a library is

A SharePoint document library is where documents are stored and organized within a site. Libraries are typically organized by the type of information they contain — a Policies library, a Project Files library, an Operational Documents library — rather than by who owns the content or what folder it used to live in on a file share.

Every library provides:

File storage. With check-in/check-out and a recycle bin for deleted files.
Version history. Major and minor versions, configurable retention.
Metadata fields. Properties applied to every file in the library.
Views and filtering. Multiple ways to look at the same content.
Permissions. Inherited from the site by default, with the option to break inheritance.
Microsoft 365 integration. Office desktop apps, Teams files tab, Power Platform, search, Copilot.

Rule of thumb

If the site is the workspace, the library is the cabinet where documents live. A cabinet can have shelves (folders) or labels on the files (metadata) — usually both, in different proportions.

When to create a separate library

A common mistake is to put everything in one library and rely on folders to keep it organized. Separate libraries are cheap to create and dramatically improve usability when content has different governance needs.

Create a separate library when:

Different groups of users need different access.
Content needs different metadata structures.
Content has different retention or compliance requirements.
Volume is large enough that one library becomes unwieldy.
Different lifecycle rules apply (e.g., active vs. archive content).
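
The checklist above reads as an any-of rule: one clear difference is usually enough to justify a second library. A minimal Python sketch of that reasoning follows. The ContentProfile fields, the reasons_to_separate helper, and the sample values are all hypothetical, invented for illustration; none of them are SharePoint objects or settings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContentProfile:
    """Hypothetical description of one group of content (illustration only)."""
    audience: frozenset[str]          # groups of users who need access
    metadata_fields: frozenset[str]   # fields users tag and filter by
    retention: str                    # retention or compliance rule that applies
    lifecycle: str                    # e.g. "active" or "archive"


def reasons_to_separate(a: ContentProfile, b: ContentProfile,
                        combined_is_unwieldy: bool = False) -> list[str]:
    """Return the checklist criteria that argue for two libraries instead of one."""
    reasons = []
    if a.audience != b.audience:
        reasons.append("different access")
    if a.metadata_fields != b.metadata_fields:
        reasons.append("different metadata structure")
    if a.retention != b.retention:
        reasons.append("different retention or compliance")
    if combined_is_unwieldy:
        # Judged by the designer; no numeric threshold is assumed here.
        reasons.append("volume")
    if a.lifecycle != b.lifecycle:
        reasons.append("different lifecycle")
    return reasons


# Sample values are made up for the example.
policies = ContentProfile(
    audience=frozenset({"All Staff"}),
    metadata_fields=frozenset({"Policy Owner", "Effective Date", "Review Date"}),
    retention="keep 7 years",
    lifecycle="active",
)
project_files = ContentProfile(
    audience=frozenset({"Project Team"}),
    metadata_fields=frozenset({"Project", "Phase", "Deliverable"}),
    retention="keep 3 years",
    lifecycle="active",
)

print(reasons_to_separate(policies, project_files))
```

An empty result suggests the two groups can share a library and be distinguished with metadata instead.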

Watch out

Avoid recreating deep folder hierarchies from file shares. Folders are a single dimension; metadata is multi-dimensional. Five well-designed libraries with metadata almost always outperform one library with five folder levels.

Folders vs. metadata

There are two ways to organize content inside a library: folders and metadata. Most real-world libraries use both, but the balance matters.

	Folders	Metadata
Strengths	Familiar, easy to explain. Useful when content has a single, stable category. Permissions can be inherited at folder level (use sparingly).	Multiple ways to find the same document. Filter, group, sort, search by any field. Easier to govern at scale. Works well for large libraries and knowledge bases.
Weaknesses	One structure only. Hard to filter across categories. Hard to reorganize. Encourages duplication ('I'll just put a copy here too').	Requires planning and design. Users must enter values correctly. Too much metadata can overwhelm and slow data entry.
Best for	Small or simple libraries; staging content during migration; content that genuinely has one home.	Large libraries; knowledge bases; anything searched or filtered routinely; content with cross-cutting categories.

A simple test: if users would naturally want to find the same document by more than one category — by department and by year and by document type — folders alone won't get them there. Metadata will.
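
The test above can be made concrete with a small sketch. It models a library as a flat list of documents, each carrying a folder path and a few metadata fields. The document names, field names, and the filter_by and group_by helpers are assumptions made up for this illustration, not SharePoint APIs.

```python
from collections import defaultdict

# Hypothetical documents; names, paths, and field values are illustrative only.
documents = [
    {"name": "Travel Policy.docx", "folder": "HR/2024/Policies",
     "Department": "HR", "Year": 2024, "DocumentType": "Policy"},
    {"name": "Budget Report.xlsx", "folder": "Finance/2024/Reports",
     "Department": "Finance", "Year": 2024, "DocumentType": "Report"},
    {"name": "Expense Policy.docx", "folder": "Finance/2023/Policies",
     "Department": "Finance", "Year": 2023, "DocumentType": "Policy"},
]


def filter_by(docs, **criteria):
    """Return names of documents whose metadata matches every given field."""
    return [d["name"] for d in docs
            if all(d.get(field) == value for field, value in criteria.items())]


def group_by(docs, field):
    """Group document names by one metadata field, like a grouped view."""
    groups = defaultdict(list)
    for d in docs:
        groups[d[field]].append(d["name"])
    return dict(groups)


# The folder path answers one question: "what is under HR/2024/Policies?"
# Metadata answers each of these with a single filter on the same content.
all_policies = filter_by(documents, DocumentType="Policy")
finance_2024 = filter_by(documents, Department="Finance", Year=2024)
by_year = group_by(documents, "Year")

print(all_policies)   # ['Travel Policy.docx', 'Expense Policy.docx']
print(finance_2024)   # ['Budget Report.xlsx']
print(by_year)
```

In the folder layout, "all policies" are spread across every Department/Year branch. With metadata, the same question is one filter, and the grouping can change without moving a single file.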

Rule of thumb

Folders are fine for small, simple libraries. Metadata is required for libraries that need to scale, surface in search, or feed automation. Don't try to enforce metadata where folders are sufficient — but don't pretend folders will scale beyond a few thousand documents either.

Content types — repeatable metadata

A content type defines a type of information and the metadata associated with it. A 'Policy' content type might require Policy Owner, Effective Date, Review Date, and Document Status. A 'Project Document' content type might require Project, Phase, Deliverable, and Status. Applying a content type to a library means every document of that type in the library shares the same schema.
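
One way to picture a content type is as a named schema with required fields. The sketch below uses the 'Policy' example from the paragraph above. The ContentType class and the missing_fields helper are hypothetical names for illustration; in SharePoint the equivalent check is the required-column validation the library performs, not code you write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentType:
    """Hypothetical model of a content type: a name plus its required fields."""
    name: str
    required_fields: tuple


POLICY = ContentType(
    name="Policy",
    required_fields=("Policy Owner", "Effective Date", "Review Date", "Document Status"),
)


def missing_fields(content_type, properties):
    """List required fields that are absent or empty on a document."""
    return [f for f in content_type.required_fields if not properties.get(f)]


# A draft with only two of the four required values filled in (sample values).
draft = {"Policy Owner": "J. Smith", "Effective Date": "2025-01-01"}

print(missing_fields(POLICY, draft))   # ['Review Date', 'Document Status']
```

Because the schema belongs to the type rather than to any one library, the same check applies wherever a Policy document is stored.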

Why content types matter:

Standardize document structure and metadata across sites and libraries.
Promote consistent search, filtering, and reporting.
Enable governance, automation, and lifecycle management at the type level.
Make it easier to apply retention labels, sensitivity labels, and Power Platform automations.

Watch-outs:

Don't try to model every information type at the enterprise level. Start with the high-value cases (Policy, Procedure, Report, Contract).
Allow flexibility for site-specific libraries to add their own metadata.
Libraries must be configured to use content types (management of content types is switched on in the library's advanced settings), and the default 'Document' content type usually needs to be removed.
Overly complex content types make libraries harder for users. More metadata isn't always better.

With Kybera Impact

Enterprise content types are part of the Kybera Impact Information Model, deployed once at the tenant level and synced down to sites. The Library Catalog standardizes which content types are applied to which library templates. This means a 'Policies' library looks the same wherever it is provisioned, which makes search, retention, and reporting predictable.

Taxonomy — shared vocabulary

A taxonomy is a standardized list of terms used to classify information across libraries and sites. Examples: a list of departments, a list of clinical topics, a list of product lines, a list of fiscal years. Taxonomies live in the SharePoint Term Store and are referenced by Managed Metadata fields on libraries and content types.
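
The effect of a controlled vocabulary can be sketched in a few lines. The example assumes two small term sets and a synonym table. The terms, the TERM_SETS and SYNONYMS structures, and the resolve_term helper are all hypothetical and heavily simplified; the real Term Store is managed through the SharePoint admin tools, not through a dictionary like this.

```python
# Hypothetical term sets; in SharePoint these would live in the Term Store.
TERM_SETS = {
    "Department": {"Finance", "Human Resources", "Operations"},
    "Document Type": {"Policy", "Procedure", "Report", "Contract"},
}

# Hypothetical synonyms, each mapped to the preferred term.
SYNONYMS = {
    "Department": {"HR": "Human Resources", "Ops": "Operations"},
}


def resolve_term(term_set, value):
    """Return the preferred term for a value, or None if it is not in the term set."""
    value = value.strip()
    preferred = SYNONYMS.get(term_set, {}).get(value, value)
    return preferred if preferred in TERM_SETS[term_set] else None


print(resolve_term("Department", "HR"))          # Human Resources
print(resolve_term("Department", "Marketing"))   # None
```

A free-text field would accept "HR", "Human Resources", and "Marketing" alike. A managed field resolves the first two to one term and rejects the third, which is what keeps filtering and reporting consistent.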

Why taxonomies matter:

Promote consistent tagging — everyone uses the same term for the same concept.
Improve search, filtering, and reporting.
Enable shared terminology across departments.
Support translation in multilingual environments.

Watch-outs:

Don't try to model every possible term. Keep taxonomies focused on commonly used categories.
Allow site-specific extensions where the enterprise term set isn't sufficient.
Term store maintenance is real work — assign an owner.

Phasing

Most organizations should start with minimal metadata and a small enterprise taxonomy (Document Type, Department, Status). Introduce more taxonomies and content types in Phase 2, once actual usage shows where the value is.

Document sets — grouped metadata

A document set groups related documents together while sharing common metadata. The set carries its own properties, and every document inside it inherits the ones configured as shared. This is useful for structured document packages (case files, project records, employee files, OHS incident packages) where the same metadata applies to every document in the package and would otherwise have to be entered manually for each one.
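
The enter-once, inherit-everywhere behaviour can be sketched as follows. The DocumentSet class, its methods, and the incident values are hypothetical and exist only to illustrate the idea; SharePoint handles this through the document set's own settings rather than through code.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentSet:
    """Hypothetical model: shared properties are entered once on the set."""
    name: str
    shared: dict
    documents: dict = field(default_factory=dict)

    def add(self, filename, own_properties=None):
        """Add a document with only its own, document-specific properties."""
        self.documents[filename] = dict(own_properties or {})

    def properties_of(self, filename):
        """Effective properties: the document's own values plus the set's shared values."""
        # Shared values are applied last, so the set wins on any overlap.
        return {**self.documents[filename], **self.shared}


# Sample values are made up for the example.
incident = DocumentSet("Incident 0042",
                       shared={"Incident ID": "0042", "Site": "Plant A"})
incident.add("Witness Statement.docx", {"Document Type": "Statement"})
incident.add("Photo 1.jpg")

print(incident.properties_of("Photo 1.jpg"))

# Updating the set once updates what every document in it reports.
incident.shared["Status"] = "Closed"
print(incident.properties_of("Witness Statement.docx"))
```

Nobody tagged the photo, yet it still carries the incident's identifying metadata. That is the difference between a document set and a folder.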

Why document sets matter:

Reduce the metadata-tagging burden — enter once, inherit everywhere in the set.
Ensure consistent tagging across related documents.
Support views, filtering, and search across the set.
Enable automation (Power Automate workflows triggered at the set level).
Improve discoverability for search, Copilot, and compliance tools.

Watch-outs:

Best for structured document packages, not general-purpose storage.
Requires planning and configuration — not enabled by default on a site.
Requires user training to be used correctly. Without training, document sets are often misused as folders.

With Kybera Impact

Document Sets and Document ID are enabled as part of every Impact-provisioned site through the Workspaces template. The Repositories module ships document set templates for common scenarios (employee files, project records, policy packages) so site owners don't need to configure them from scratch.

Auto-extraction with SharePoint Premium (Syntex)

SharePoint Premium (formerly Syntex) provides AI-driven document processing — extracting metadata from document content automatically and applying it to library fields. This is particularly valuable for high-volume libraries (invoices, contracts, forms) where manual metadata entry is impractical.

Two notes on positioning:

Don't configure auto-extraction across a thousand sites. The pattern is: pick up high-value content (invoices, contracts), move it to a centralized library, run extraction once, distribute the results.
Pricing has shifted from per-user licensing to pay-as-you-go billing (per page processed, per signature). This makes Syntex easier to adopt for targeted use cases without committing the whole tenant.

Watch out

Don't try to introduce Syntex/SharePoint Premium in Phase 1. Get the foundational metadata, content types, and library structure right first. Auto-extraction is a Phase 2+ optimization — it works well when there's a clear target structure to extract into, and badly when the library design is still in flux.

Designing a library structure (workshop pattern)

A repeatable five-step pattern works for most departments:

1. Understand the business. Engage department leadership, key stakeholders, operational users. Identify types of information, how they're used, access requirements, compliance considerations.
2. Define the target structure. What are the main content types? Should they be in separate libraries? Who needs access? What metadata helps users find content?
3. Review existing content. Analyze file shares and legacy SharePoint sites. Identify ROT (redundant, outdated, trivial) content. Don't bring forward what shouldn't survive.
4. Validate and refine. Walk through real examples with stakeholders. Adjust based on feedback before any content moves.
5. Implement and evolve. Create libraries and metadata. Begin migration. Monitor usage and refine. Most designs need a meaningful adjustment within 90 days of go-live.
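
For the content review step, a first pass at ROT can often be automated from a file share inventory. The sketch below is one possible approach, not a prescribed method. It assumes an inventory has already been exported with a content hash, a modified date, and a size per file. The FileRecord shape, the classify_rot helper, and the thresholds in the usage example are all assumptions to be replaced with your own policy decisions, and file size is only a crude proxy for "trivial".

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical inventory row exported from a file share scan."""
    path: str
    content_hash: str
    last_modified: date
    size_bytes: int


def classify_rot(records, today, stale_after_days, trivial_below_bytes):
    """Tag each path as 'redundant', 'outdated', 'trivial', or 'keep'.

    Thresholds are supplied by the caller; they are policy decisions,
    not values prescribed by SharePoint.
    """
    seen_hashes = set()
    result = {}
    # Oldest first, so the earliest copy is kept and later copies are redundant.
    for r in sorted(records, key=lambda r: (r.last_modified, r.path)):
        if r.content_hash in seen_hashes:
            result[r.path] = "redundant"
            continue
        seen_hashes.add(r.content_hash)
        if r.size_bytes < trivial_below_bytes:
            result[r.path] = "trivial"
        elif (today - r.last_modified).days > stale_after_days:
            result[r.path] = "outdated"
        else:
            result[r.path] = "keep"
    return result


# Sample inventory; paths, hashes, and dates are made up for the example.
records = [
    FileRecord("Policies/Travel.docx", "a1", date(2024, 6, 1), 48_000),
    FileRecord("Old/Travel - Copy.docx", "a1", date(2024, 6, 3), 48_000),
    FileRecord("Archive/Plan 2015.docx", "b2", date(2015, 3, 1), 90_000),
    FileRecord("Temp/~notes.tmp", "c3", date(2024, 5, 1), 12),
]

report = classify_rot(records, today=date(2025, 1, 1),
                      stale_after_days=3 * 365, trivial_below_bytes=1024)
for path, verdict in report.items():
    print(f"{verdict:10} {path}")
```

Treat the output as a worklist for stakeholders to confirm in the validation step, not as a deletion list.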

Discussion Questions

• How much metadata do we want to introduce in Phase 1, vs. defer to Phase 2?

• Where are folders acceptable in our environment, and where do we want metadata-only?

• What enterprise content types are worth defining at the tenant level (Policy, Procedure, Report, Contract, ...)?

• What enterprise taxonomies are worth standardizing centrally — Document Type, Department, Fiscal Year, Status?

• Where do document sets add real value (employee files, project records, case files)?

• Are we ready to invest in user training on metadata, or do we need to design for very-low-effort tagging?

• Where is auto-extraction (SharePoint Premium / Syntex) a Phase 2 candidate?