""" config file for gpt2-fedibooks this file is a simple python file that will be imported by gpt2-fedibooks. it looks for the following keys: datadir: PathLike # the working directory of fedibooks. it will recursively create all directories needed, and exclusively create files there. parsing_arbitrary_exclude_fn: callable # this function takes a string (the post) and returns True if that post should be excluded parsing_exclude_mentions: bool # strip out any word starting with @ parsed_posts_file: str # filename that the outbox parser will save into tokenizer_output_prefix: str # file prefix for the merges and vocab files generated by the tokenizer model_size: enum[str] # CURRENTLY NOT IMPLEMENTED! s/m/l/xl to pick the gpt2 model size (124M, 355M, 774M, and 1558M) model_folder: str # name of the folder (relative to datadir) for trained model storage use_gpu: bool # NOT YET IMPLEMENTED! prompt_before_training: bool training_block_size: int training_num_workers: int # seems to have absolutely no effect? training_batch_size: int training_num_steps: int training_sample_frequency: int # print out sample generations every n training steps training_save_frequency: int # save model snapshots every n steps generation_zwsp_mentions: bool # add a zero width space after every @ in generated texts generation_prompt: str | None # prompt for gpt2 generation generation_include_prompt: bool # whether to include the prompt in the output generation_max_length: int generation_temperature: fload # 0.0 to 1.0, how "crazy" the output is, higher is more configuration is done by defining python variables / functions, like so: ```py model_size = 'l' training_block_size = 64 generation_temperature = 0.8 prompt_before_training = False def parsing_arbitrary_exclude_fn(post): return random.randint(0, 1) ``` any defined variables that aren't in the above list will be ignored. any scripts are possible. """