# Title
"Bitcoin Forum analysis underscores enduring and substantial carbon footprint of Bitcoin"
- Cyrille Grumbach, ETH Zurich, Switzerland (cgrumbach@ethz.ch)
- Didier Sornette, Southern University of Science and Technology, China (didier@sustech.edu.cn)
# Folders
## scraper
The file main.ipynb is used to scrape the forum.
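For orientation, here is a minimal sketch of the kind of request/parse loop such a scraper performs. The board URL, pagination offsets, and CSS selectors below are illustrative assumptions, not the exact logic of main.ipynb.

```python
# Minimal sketch of a forum-scraping loop. The board URL, pagination step,
# and CSS selectors are assumptions for illustration; the actual logic lives
# in scraper/main.ipynb.
import time

import requests
from bs4 import BeautifulSoup

BOARD_URL = "https://bitcointalk.org/index.php?board=14.{offset}"  # mining board (assumed)

def scrape_board_page(offset: int) -> list[dict]:
    """Fetch one board page and return the thread links found on it."""
    response = requests.get(BOARD_URL.format(offset=offset), timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    threads = []
    for link in soup.select("span a[href*='topic=']"):  # selector is a guess
        threads.append({"title": link.get_text(strip=True), "url": link["href"]})
    return threads

if __name__ == "__main__":
    for offset in range(0, 120, 40):  # first few pages only, as a demo
        print(scrape_board_page(offset))
        time.sleep(1)  # be polite to the server
```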
## hardwarelist
Includes the following:
- pmaxv1 folder: Contains the maximum hardware efficiency for each date, alongside some manually added updates originally made by Cyrille.
- get_hwd_asicminervalue.js and get_hwd_bitcoinwiki.js: Scripts to paste into the browser console at the URLs listed inside each file; they extract the hardware efficiency tables.
- hardware_asicminervalue.txt and hardware_bitcoinwiki.txt: The raw output of the above scripts.
- 1_cleanup_hardware_table.ipynb: Cleans up the raw output to create hardware_asicminervalue.csv and hardware_bitcoinwiki.csv.
- 2_merge_tables.ipynb: Merges the two tables into hardware_merged.csv.
- 3_paper_list.ipynb: Creates four outputs: (1) the hardware table in the appendix; (2) pmaxv2.csv, which uses hardware_merged.csv to build an improved table of the maximum hardware efficiency for each date; (3) the pmax evolution table for the paper; (4) paper_list.csv, which is later used to create an Excel sheet.
- 4_create_pmaxv3.ipynb: Creates pmaxv3.csv, which takes, for each date, the maximum of the pmaxv1.csv and pmaxv2.csv values (see the sketch after this list).
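As a rough illustration of the per-date maximum computed in 4_create_pmaxv3.ipynb, the following sketch assumes both inputs are CSVs with date and efficiency columns; the actual file paths and column names may differ.

```python
# Hedged sketch of the per-date maximum described above. Column names
# ("date", "efficiency") and file paths are assumptions; see
# 4_create_pmaxv3.ipynb for the actual implementation.
import pandas as pd

pmax_v1 = pd.read_csv("pmaxv1.csv", parse_dates=["date"])
pmax_v2 = pd.read_csv("pmaxv2.csv", parse_dates=["date"])

# Align the two tables on date, then keep the larger of the two values.
merged = pmax_v1.merge(pmax_v2, on="date", how="outer", suffixes=("_v1", "_v2"))
merged["efficiency"] = merged[["efficiency_v1", "efficiency_v2"]].max(axis=1)

merged[["date", "efficiency"]].sort_values("date").to_csv("pmaxv3.csv", index=False)
```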
## bitcoinforum
### 1_forum_dataset
Contains the raw HTML from the forum and the code to parse it and combine it into data frames.
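A minimal sketch of the parsing step, assuming one saved thread page and generic CSS selectors (the repository's own parsing code defines the real file names and selectors):

```python
# Illustrative sketch of turning a saved thread page into a posts DataFrame.
# The file name, the "div.post" selector, and the "data-author" attribute are
# assumptions, not the repository's actual parsing rules.
import pandas as pd
from bs4 import BeautifulSoup

with open("thread_0001.html", encoding="utf-8") as fh:
    soup = BeautifulSoup(fh.read(), "html.parser")

rows = []
for post in soup.select("div.post"):  # assumed post container
    rows.append({
        "author": post.get("data-author", ""),   # assumed attribute
        "text": post.get_text(" ", strip=True),  # flatten the post body
    })

posts = pd.DataFrame(rows)
print(posts.head())
```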
### 2_train_set_creation
Combines the forum sections into one, truncates long threads, passes a random sample to GPT-4 to build the training set for Mistral 7B, and also creates the inputs that will be given to Mistral 7B after training.
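A hedged sketch of the GPT-4 labelling step, assuming an OPENAI_API_KEY environment variable, a threads.csv input with a text column, and an illustrative prompt; none of these names are taken from the actual notebook:

```python
# Hedged sketch of labelling a random sample of threads with GPT-4 to build
# the Mistral 7B training set. The prompt, file names, and column names are
# assumptions, not the exact ones used in this step.
import pandas as pd
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
threads = pd.read_csv("threads.csv")  # assumed combined forum sections

MAX_CHARS = 6000  # crude truncation of long threads (assumption)
sample = threads.sample(n=100, random_state=0)

labels = []
for text in sample["text"]:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Extract the mining hardware mentioned in this thread."},
            {"role": "user", "content": text[:MAX_CHARS]},
        ],
    )
    labels.append(response.choices[0].message.content)

sample.assign(label=labels).to_csv("train_set.csv", index=False)
```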
### 3_training
Trains Mistral 7B using LoRA on the dataset generated earlier and saves the merged model.
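A minimal LoRA fine-tuning sketch in the spirit of this step, using Unsloth and TRL. Hyperparameters, the dataset path, and the text field are assumptions, and the SFTTrainer signature varies slightly across trl versions:

```python
# Hedged sketch of a LoRA fine-tune with Unsloth and TRL. Hyperparameters,
# the dataset path, and the "text" field are assumptions; the trl API also
# differs slightly across versions.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mistralai/Mistral-7B-v0.1",
    max_seq_length=4096,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("json", data_files="train_set.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(output_dir="outputs", per_device_train_batch_size=2,
                           num_train_epochs=1, learning_rate=2e-4),
)
trainer.train()

# Merge the LoRA adapters into the base weights and save the result.
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
```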
### 4_inference
Runs inference with the trained Mistral 7B on the inputs.csv file created in part 2.
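For illustration only, the sketch below runs the merged model over inputs.csv with plain Hugging Face transformers; the repository itself uses SGLang for inference (see the installation section), and the column names and generation settings are assumptions:

```python
# Generic inference sketch over inputs.csv using Hugging Face transformers.
# The repository serves the merged model with SGLang instead; the "text"
# column, prompt format, and generation settings here are assumptions.
import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("merged_model")
model = AutoModelForCausalLM.from_pretrained("merged_model", device_map="auto")

inputs_df = pd.read_csv("inputs.csv")  # created in part 2
outputs = []
for prompt in inputs_df["text"]:
    batch = tokenizer(prompt, return_tensors="pt").to(model.device)
    generated = model.generate(**batch, max_new_tokens=256)
    outputs.append(tokenizer.decode(generated[0], skip_special_tokens=True))

inputs_df.assign(output=outputs).to_csv("raw_output.csv", index=False)
```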
### 5_processing_extracted_data
Includes the following files:
- 1_processing.ipynb: Takes the raw output from Mistral 7B and converts it into hardware_instances.csv.
- 2_create_mapping.ipynb: Uses GPT-4 to map the hardware names to those of the efficiency table.
- 3_add_efficiency.ipynb: Merges the mapped hardware instances with the efficiency table to get hardware_instances_with_efficiency.csv (see the sketch after this list).
- 4_visualizations.ipynb, not_usable_threads.txt, hardware_instances_inc_threads.csv: Only used for debugging.
- hardware_mapping.py: Automatically generated by step 3.
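A hedged sketch of the merge in 3_add_efficiency.ipynb, assuming hardware_mapping.py exposes a dict called HARDWARE_MAPPING and that the tables share the column names used below (both are assumptions):

```python
# Hedged sketch of attaching an efficiency value to each mapped hardware
# instance. The HARDWARE_MAPPING dict and the column names ("hardware",
# "canonical_name", "efficiency") are assumptions.
import pandas as pd

from hardware_mapping import HARDWARE_MAPPING  # assumed forum name -> canonical name dict

instances = pd.read_csv("hardware_instances.csv")
efficiency = pd.read_csv("hardware_merged.csv")

# Map the raw forum spellings onto the canonical names of the efficiency table,
# then pull in the efficiency values with a left join.
instances["canonical_name"] = instances["hardware"].map(HARDWARE_MAPPING)
merged = instances.merge(efficiency, on="canonical_name", how="left")
merged.to_csv("hardware_instances_with_efficiency.csv", index=False)
```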
### 6_merging
Averages the forum efficiency on a monthly basis, then merges it with the Bitcoin price, hashrate, coins per block, and maximum hardware efficiency to create monthly_stuff.csv.
monthly_stuff.csv contains the columns date, price, hashrate, coins_per_block, efficiency, and max_efficiency.
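A rough sketch of the monthly aggregation and merge, assuming a market_data.csv file that already holds monthly price, hashrate, and coins_per_block values; file and column names other than those listed above are assumptions:

```python
# Hedged sketch of the monthly averaging and merging behind monthly_stuff.csv.
# Input file names, and the assumption that market_data.csv is already
# monthly, are illustrative only.
import pandas as pd

forum = pd.read_csv("hardware_instances_with_efficiency.csv", parse_dates=["date"])
market = pd.read_csv("market_data.csv", parse_dates=["date"])  # assumed monthly price/hashrate/coins_per_block
pmax = pd.read_csv("pmaxv3.csv", parse_dates=["date"])

# Average the forum-derived efficiency per calendar month (month-start index).
monthly_eff = (forum.set_index("date")["efficiency"]
                    .resample("MS").mean()
                    .rename("efficiency")
                    .reset_index())

monthly = (monthly_eff
           .merge(market, on="date", how="left")
           .merge(pmax.rename(columns={"efficiency": "max_efficiency"}), on="date", how="left"))

monthly[["date", "price", "hashrate", "coins_per_block", "efficiency", "max_efficiency"]] \
    .to_csv("monthly_stuff.csv", index=False)
```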
## plots
Includes the following:
- carbon-comparison folder: Contains the 17 sources used to create the carbon comparison table.
- carbonintensity.html: Cambridge's table of yearly gCO2e/kWh values, found at https://ccaf.io/cbnsi/cbeci/ghg/methodology
- appendix2.ipynb: Creates all plots in Appendix 2.
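As a simple usage example of monthly_stuff.csv, the following sketch plots the forum-derived efficiency against the maximum hardware efficiency; the actual appendix2.ipynb figures use their own styling and labels:

```python
# Illustrative plot of forum efficiency vs. maximum hardware efficiency from
# monthly_stuff.csv; not a reproduction of the paper's figures.
import matplotlib.pyplot as plt
import pandas as pd

monthly = pd.read_csv("monthly_stuff.csv", parse_dates=["date"])

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(monthly["date"], monthly["efficiency"], label="forum efficiency")
ax.plot(monthly["date"], monthly["max_efficiency"], label="max hardware efficiency")
ax.set_xlabel("date")
ax.set_ylabel("efficiency")
ax.legend()
fig.tight_layout()
fig.savefig("efficiency_over_time.png", dpi=200)
```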
# System requirements
Running Mistral 7B's training or inference requires an NVIDIA GPU with at least 24 GB of VRAM (a Runpod instance also works).
Everything else can be run on a normal desktop/laptop computer with Python 3.10 installed.
# Operating system
Code unrelated to training or inference of Mistral 7B has been tested on Windows 10.
Code for Mistral 7B training and inference has been tested on Runpod instances.
# Installation guide for software dependencies
For the code unrelated to training or inference of Mistral 7B, use the packages listed in requirements.txt.
## Installation guide for Mistral 7B training and inference
Set up a Runpod instance with the axolotl Docker image, then install Unsloth following the instructions at https://github.com/unslothai/unsloth
Also install SGLang for inference.
## Typical install time on a "normal" desktop computer
For the code unrelated to training or inference of Mistral 7B, the install time is around 5 minutes.
For Mistral 7B training and inference, the install time is around 1 hour.
# Demo
## Instructions to run on data
Run the code in the order listed in the Folders section above.
Note: three files normally take a long time to run. Each of these includes a DEMO_MODE constant at the top; when it is turned on, the file runs on a tiny subset of the data (see the sketch at the end of this subsection). The original runtimes are as follows:
- The scraper takes over 12 hours to run.
- Creating the training set for Mistral 7B takes around 3 hours and costs about $10 in OpenAI credits.
- Mapping the hardware names to those of the efficiency table takes around 3 hours and also costs about $10 in OpenAI credits.
All other files can be run in a few minutes.
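As a hypothetical illustration of how the DEMO_MODE switch works, each slow notebook can subset its input right after loading it; the file name and subset size below are placeholders:

```python
# Hypothetical illustration of the DEMO_MODE switch; the input file and the
# subset size differ per notebook.
import pandas as pd

DEMO_MODE = True  # set to False to reproduce the full (slow) run

threads = pd.read_csv("threads.csv")  # assumed input of the given notebook
if DEMO_MODE:
    threads = threads.head(20)  # tiny subset so the notebook finishes in minutes
```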
## Expected output
You should re-obtain the CSV files already in the folders and the plots used in the paper.
## Expected run time for demo on a "normal" desktop computer
Running every notebook on a "normal" desktop computer takes around 10 minutes (excluding the training and inference of Mistral 7B).
## Instructions for use on custom data
The code is designed only to analyze the mining section of bitcointalk.org.
# Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grants No. T2350710802 and No. U2039202), the Shenzhen Science and Technology Innovation Commission Project (Grants No. GJHZ20210705141805017 and No. K23405006), and the Center for Computational Science and Engineering at Southern University of Science and Technology. The authors acknowledge T. Laborie for excellent research assistance and Y. Cui and M. von Krosigk for helpful comments. Any errors are our own.