Posts

100 Bitcoins Forgone for Science

Feb 26, 2019
Bitcoin

This post is just another piece of my serious nonsense. All of a sudden, I wanted to know how many Bitcoins I could have mined since 2012. I have known about Bitcoin since it came into existence in 2009, but have never really put any effort into mining. Instead, I was fascinated by the idea of using distributed (volunteer) computing to solve scientific problems. For example, BOINC and related projects like World Community Grid use computing power donated from around the world to search for effective treatments for cancer and HIV/AIDS, low-cost water filtration systems, new materials for capturing solar energy efficiently, etc. ...

Bayes' Theorem and Medical Testing

Feb 14, 2019

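Over the past few days an article titled 《流感下的北京中年》 (roughly, "Middle-Aged in Beijing Under the Flu") has flooded WeChat Moments, and it leaves a bitter taste. I can imagine some readers rushing to the hospital for check-ups to see whether they are sick. Bayes' theorem is useful here for two reasons: first, a positive test does not mean you really have the disease; second, running several different tests to confirm a diagnosis is necessary. Take the alpha-fetoprotein (AFP) test for liver cancer as an example, and let:

$$C=\text{the person tested has liver cancer}$$
$$\overline{C}=\text{the person tested does not have liver cancer}$$
$$A=\text{the AFP test is positive}$$
$$\overline{A}=\text{the AFP test is negative}$$

Historical statistics show that

$$P(A|C)=0.95$$
$$P(\overline{A}|\overline{C})=0.90$$

and the incidence of liver cancer among local residents is known to be

$$P( C )=0.0004$$

If someone's AFP test comes back positive, how likely is it that they have liver cancer, i.e. what is $P(C|A)$? By Bayes' formula:

$$P(C|A)=\frac{P( C )P(A|C)}{ P( C )P(A|C)+P(\overline{C})P(A|\overline{C}) }=0.0038$$

That is, even though they tested positive on a highly accurate AFP test, the actual probability that they have liver cancer is only 0.38%. Why? Although $P(A|\overline{C})=0.1$ is not large (this is the case where the person does not have liver cancer but tests positive, i.e. the test result is wrong), people with liver cancer are rare ($P( C )=0.0004$). This makes the false-positive term $P(\overline{C})P(A|\overline{C})$ relatively large, which in turn makes $P(C|A)$ very small.

Put another way, take 10,000 people. About 4 of them have liver cancer, while about 1,000 test positive without having liver cancer. So when someone's AFP test is positive, they are far more likely to be in the group of test errors than to actually have liver cancer. This is the effect of what we already know, the prior probability $P( C )$: when there is no reason to suspect liver cancer in the person tested, a positive result from a highly accurate test says very little.

This does not mean the test is useless. Doctors usually start with other, auxiliary examinations and order the AFP test only when they suspect that a patient may have liver cancer, at which point the incidence for that patient is already much higher. You can think of it this way: the person has not changed, but the population they belong to is no longer "local residents" but "people suspected of having liver cancer". If the probability of liver cancer among suspected patients is 0.5, then $P( C )=0.5$ and $P(C|A)$ works out to about 0.9, which is quite high.

Some people read a few books, then look at one disease and think they have it, look at another and think they have that one too, when in fact nothing is wrong with them at all. Their mistake is believing that they already belong to the "suspected patients" population rather than to "local residents", and the prior probabilities, the incidence $P( C )$, of these two populations are very different. That said, if you feel unwell you should still go to the hospital. Leave professional matters to the professionals.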
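The two posterior probabilities in this post can be checked with a few lines of Python. This is only a minimal sketch: the function name `posterior` and its parameter names are made up for illustration, with `sensitivity` standing for $P(A|C)$ and `specificity` for $P(\overline{A}|\overline{C})$.

```python
def posterior(prior, sensitivity, specificity):
    """Return P(C|A), the probability of disease given a positive test."""
    true_pos = prior * sensitivity                    # P(C) * P(A|C)
    false_pos = (1.0 - prior) * (1.0 - specificity)   # P(not C) * P(A|not C)
    return true_pos / (true_pos + false_pos)


# Prior taken from the general population of local residents.
general = posterior(prior=0.0004, sensitivity=0.95, specificity=0.90)

# Prior taken from patients a doctor already suspects of liver cancer.
suspected = posterior(prior=0.5, sensitivity=0.95, specificity=0.90)

print(f"{general:.4f}")    # 0.0038
print(f"{suspected:.4f}")  # 0.9048
```

The test itself is identical in both calls; only the prior changes, which is the whole point of the argument.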

Secret Macros to Use in WRDS Cloud

Jan 27, 2019
WRDS/SAS Tutorial
WRDS, SAS

When a SAS program is submitted to WRDS Cloud for remote execution, a small script named autoexec.sas runs before everything else. The content of this small script is as below:

    * The library name definitions below are used by SAS;
    * Assign default libref for WRDS (Wharton Research Data Services);
    %include '/wrds/lib/utility/wrdslib.sas' ;
    options sasautos=('/wrds/wrdsmacros/', SASAUTOS) MAUTOSOURCE;

What it does is basically make the libnames available, so that when we want to, for example, access the Compustat funda dataset, we just need to write: ...

Access WordPress Database via SSH Tunnel on Mac

Jan 7, 2019
SSH, WordPress

To access the MySQL database behind a cloud-hosted WordPress site from a remote Mac, the process is relatively simple: establish an SSH tunnel and then use any browser to visit phpMyAdmin. First, establish the SSH tunnel to the host machine:

    ssh -N -L SOURCE-PORT:127.0.0.1:DESTINATION-PORT -i KEYFILE USERNAME@SERVER-IP

Because by default phpMyAdmin is running on port 8888, the above command becomes:

    ssh -N -L 8888:127.0.0.1:80 -i KEYFILE USERNAME@SERVER-IP

Second, open a browser and visit 127. ...

Reconciliation of Black-Scholes Variants

Apr 17, 2018
Option Pricing, Black-Scholes

This post is just to show that the different variants of the Black-Scholes formula are in fact the same.

$S$: Underlying share price
$t$: Time to maturity
$\sigma$: Standard deviation of underlying share price
$K$: Exercise price
$r_f$: Risk-free rate

Variant 1

This is the one shown in our formula sheet, and is also the traditional presentation of the Black-Scholes model.

$$ \begin{equation} C=SN(d_1)-N(d_2)Ke^{-r_f t} \end{equation} $$

$$ \begin{equation} d_1=\frac{\ln(\frac{S}{K})+(r_f+\frac{\sigma^2}{2})t}{\sigma \sqrt{t}} \end{equation} $$ ...
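Variant 1 can be evaluated numerically with a short Python sketch. The excerpt above is cut off before $d_2$ is defined, so the code assumes the standard relation $d_2=d_1-\sigma\sqrt{t}$. The function names `norm_cdf` and `bs_call` and the sample inputs are made up for illustration.

```python
import math


def norm_cdf(x):
    """Standard normal CDF N(x), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, t, r_f, sigma):
    """European call price under Variant 1: C = S*N(d1) - N(d2)*K*exp(-r_f*t)."""
    d1 = (math.log(S / K) + (r_f + sigma ** 2 / 2.0) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)  # assumed: standard definition, not shown in the excerpt
    return S * norm_cdf(d1) - norm_cdf(d2) * K * math.exp(-r_f * t)


# Illustrative inputs only: at-the-money call, one year to maturity.
price = bs_call(S=100.0, K=100.0, t=1.0, r_f=0.05, sigma=0.2)
print(f"{price:.4f}")  # 10.4506
```

Using `math.erf` keeps the sketch free of third-party dependencies; with SciPy installed, `scipy.stats.norm.cdf` would serve the same purpose.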

Bloomberg BQuant (BQNT)

Apr 6, 2018
Bloomberg, Python

Bloomberg is developing a new function in the Terminal called BQuant (BQNT), available under the Bloomberg Anywhere license. I happened to be able to test it thanks to a fund manager, and I think it could be a future way of using the Bloomberg Terminal.

Background

Bloomberg recently made JupyterLab available inside the Terminal and invited partners to test it out. This function is named BQuant, or BQNT<GO>. It is still under heavy development, but the idea is just great. ...

Handy Stata Code to Generate Fama-French Industry Classification from SIC Code

Feb 26, 2018
SIC, STATA, Fama-French

For it to be handy next time, here’s the Fama-French 48 Industries classification. Here’s the Stata program to create the Fama-French 48 Industries from SIC codes. Basic usage is:

    ffind sic, generate("FF48") type(48)

where sic is the SIC code, FF48 is the name of the generated industry variable, and we are using the 48-industry classification. Alternatively, one can choose 5, 10, 12, 17, 30, 38 or 49 industries.

/**************************************** * ffind. ...