Dataset Store
Book
Expert responses in various fields, including law, lifestyle, finance, and health (over 2.3 million counseling conversations)
Books available for AI training
TAG: Book, E-book, Printed book, Korean book
Format
• Scanned physical books
• Customized structured data (e.g., JSON)
Volume
Language Offered
Purchase Procedure
Category Selection
• Confirm the required book categories and conditions (e.g., engineering textbooks, economics, liberal arts, foreign language study books)
• Review the need for data cleansing
Book List Selection
• Provide book lists based on the requested categories and conditions (licensing terms vary by title)
• Select books for purchase from the provided list
Format Agreement
• Original files must be destroyed after extraction and cleansing, within the agreed timeframe
• If structured data is requested, additional cleansing costs will be quoted separately
Additional Processing / Refinement
• Discuss detailed processing and refinement criteria (text, images, tables, footnotes, etc.)
Category and Condition Examples
Category
Humanities
Literature
Religion
History
Biography
Society
Science
Computer · Internet
Linguistics
Economy · Business
Dictionaries
Education
Foreign Books
Travel · Maps
Hobbies · Leisure
Family · Health · Lifestyle
Arts · Popular Culture
Self-published Works
Adult
Textbooks
Condition Setting 1
Publication Date: Books published within the last year
Target Audience: No restrictions
Original Format: Prioritize EPUB format for easier text extraction
Others: Maximize text acquisition within budget
Condition Setting 2
Publication Date: No restrictions
Target Audience: Professional books such as university textbooks
Original Format: No restrictions (prefer higher text quality)
Others:
• Secure as many diverse topics as possible within the designated category
• If multiple editions exist, retain only the latest edition and remove duplicates
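The edition rule in Condition Setting 2 can be illustrated with a short sketch. It assumes each catalog entry carries a title, an author, and a numeric edition; the `Book` class, its fields, and the sample titles are hypothetical and not part of the actual catalog format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    # Hypothetical record layout; the real book list may use other fields.
    title: str
    author: str
    edition: int


def keep_latest_editions(books):
    """Keep one entry per (title, author): the one with the highest edition."""
    latest = {}
    for book in books:
        # Normalize case and whitespace so trivially different spellings match.
        key = (book.title.strip().casefold(), book.author.strip().casefold())
        current = latest.get(key)
        if current is None or book.edition > current.edition:
            latest[key] = book
    return list(latest.values())


# Made-up sample entries, for illustration only.
catalog = [
    Book("Intro to Circuits", "Kim", 2),
    Book("Intro to Circuits", "Kim", 3),
    Book("Intro to Circuits", "Kim", 3),  # exact duplicate
    Book("Microeconomics", "Lee", 1),
]
selected = keep_latest_editions(catalog)
```

Matching on normalized title and author is a simplification; in practice an identifier such as an ISBN family or publisher series would likely be a more reliable key.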
Features
Application Fields
Improve Generalization
Train models on Q&A collected from diverse fields and numerous users. This improves the model's generalization, so it responds reliably to new subjects and unfamiliar question formats.
LLM Performance Improvement
The data has a clear and simple Q&A pair structure, so it can be used directly in techniques such as Few-shot Learning or Instruction Tuning without complex preprocessing.
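As a rough sketch of how a Q&A pair might feed these two techniques, the snippet below turns pairs into instruction-tuning records and into a few-shot prompt. The `instruction`/`input`/`output` field names follow a common community convention and are an assumption, not the dataset's actual schema; the function names and sample pairs are hypothetical placeholders.

```python
import json


def to_instruction_record(question, answer, field):
    """Wrap one Q&A pair in an instruction-tuning style record (assumed schema)."""
    return {
        "instruction": f"Answer the following {field} question.",
        "input": question.strip(),
        "output": answer.strip(),
    }


def build_few_shot_prompt(examples, new_question):
    """Concatenate solved Q&A pairs, then leave the final answer open."""
    parts = [f"Q: {q.strip()}\nA: {a.strip()}" for q, a in examples]
    parts.append(f"Q: {new_question.strip()}\nA:")
    return "\n\n".join(parts)


# Placeholder pairs, not taken from the dataset.
pairs = [
    ("Example question one?", "Example answer one."),
    ("Example question two?", "Example answer two."),
]

# One JSON object per line is a common layout for fine-tuning files.
jsonl = "\n".join(
    json.dumps(to_instruction_record(q, a, "finance"), ensure_ascii=False)
    for q, a in pairs
)
prompt = build_few_shot_prompt(pairs, "Example question three?")
```

Because each item is already a question paired with an answer, both conversions are plain string formatting with no parsing step.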
Training of NLP Models
The data reflects real-world Q&A patterns. Training on it develops both Natural Language Understanding (NLU) and Natural Language Generation (NLG) at the same time, which improves the accuracy of the model's responses.
Also applicable to a variety of other use cases.